Arm joins the Confidential Computing party

Arm’s announcement of Realms isn’t just about the Edge

The Confidential Computing Consortium is a Linux Foundation project designed to encourage open source projects around confidential computing. Arm has been part of the consortium for a while – in fact, the company is a Premier Member – but things got interesting on the 30th March, 2021. That’s when Arm announced their latest architecture: Armv9. Armv9 includes a new set of features, called Realms. There’s not a huge amount of information in the announcement about Realms, but Arm is clear that this is their big play into Confidential Computing:

To address the greatest technology challenge today – securing the world’s data – the Armv9 roadmap introduces the Arm Confidential Compute Architecture (CCA).

I happen to live about 30 minutes’ drive from the main Arm campus in Cambridge (UK, of course), and know a number of Arm folks professionally and socially – I think I may even have interviewed for a job with them many moons ago – but I don’t want to write a puff piece about the company or the technology[1]. What I’m interested in, instead, is the impact this announcement is likely to have on the Confidential Computing landscape.

For some time, Arm’s architecture has included an element called TrustZone, which provides a number of security capabilities, but TrustZone isn’t a TEE (Trusted Execution Environment) on its own. A TEE is the generally accepted unit of confidential computing – the minimum building block on which you can build. It is arguably possible to construct TEEs using TrustZone, but that’s not what it’s designed for, and Arm’s decision to introduce Realms strongly suggests that they want to address this, as the press release bears out.

Why is all this important? I suspect that few of you have laptops or desktops that run on Arm (Raspberry Pi machines apart – see below). Few of the servers in the public cloud run Arm, and Realms are probably not aimed particularly at your mobile phone (for which TrustZone is a better fit). Why, then, is Arm bothering to make a fuss about this and to put such an enormous design effort into this new technology? There are two answers, it seems to me, one of which is probably pretty much a sure thing, and the other of which is more of a competitive gamble.

Answer 1 – the Edge

Despite recent intrusions by both AMD and Intel into the Edge space, the market is dominated by Arm-based[3] devices. And Edge security is huge, partly because we’re just seeing a large increase in the number of Edge devices, and partly because security is really hard at the Edge, where devices are more difficult to defend, both logically (they’re on remote networks, more vulnerable to malicious attack) and physically (many are out of the control of their owners, living on customer premises, up utility poles, on gas pipelines or in sports stadia, just to give a few examples). One of the problems that confidential computing aims to solve is the issue that, traditionally, once an attacker has physical access to a system, it should be considered compromised. TEEs allow some strong mitigations against that problem (at least against most attackers and timeframes), so making it easy to create and use TEEs on the Edge makes a lot of sense. With the addition of Realms to the Armv9 architecture, Arm is signalling its intent to address security on the Edge, and to defend and consolidate its position as leader in the market.

Answer 2 – the Cloud

I mentioned above that few public cloud hosts run Arm – this is true, but it’s likely to change. Arm would certainly like to see it change, and to see its chipsets move into the cloud mainstream. There has been a lot of work to improve support for server-scale Arm within Linux (in fact, open source support for Arm is generally excellent, not least because of the success of Arm-based chips in Raspberry Pi machines). Amazon Web Services (AWS) started offering Arm-based servers to customers as long ago as 2018. This is a market in which Arm would clearly love to be more active and carve out a larger share, and the growing importance of confidential computing in the cloud (both public and private) means that having a strong story in this space was important: Realms are Arm’s answer to this.

What next?

An announcement of an architecture is not the same as availability of hardware or software to run on it. We can expect it to be quite a few months before we see production chips running Armv9, though evaluation hardware should be available to trusted partners well before that, and software emulation for various components of the architecture will probably come even sooner. This means that those interested in working with Realms should be able to get things moving and have something ready pretty much by the time production hardware is available. We’ll need to see how easy they are to use, what performance impact they have, etc., but Arm do have an advantage here: as they are not the first into the confidential computing space, they’ve had the opportunity to watch Intel and AMD and see what has worked, and what hasn’t, both technically and in terms of what the market seems to like. I have high hopes for Arm Realms, and Enarx, the open source confidential computing project with which I’m closely involved, has plans to support them when we can: our architecture was designed with multi-platform support from the beginning.


1 – I should also note that I participated in a panel session on Confidential Computing which was put together by Arm for their “Arm Vision Day”, but I was in no way compensated for this[2].

2 – in fact, the still for the video is such a terrible picture of me that I think maybe I have grounds to sue for it to be taken down.

3 – Arm doesn’t manufacture chips itself: it licenses its designs to other companies, who create, manufacture and ship devices themselves.

The importance of hardware End of Life

Security considerations are important when considering End of Life.

Linus Torvalds’ announcement this week that Itanium support is “orphaned” in the Linux kernel means that we shouldn’t expect further maintenance of it, and that support may eventually be dropped altogether. In 2019, the floppy disk driver was similarly marked as orphaned. In this article, I want to make the case that security considerations are important when considering End of Life for hardware platforms and components.

Dropping support for hardware which customers aren’t using is understandable if you’re a proprietary company and can decide what platforms and components to concentrate on, but why do so in open source software? Open source enthusiasts are likely to keep running old hardware for years – sometimes decades after anybody stopped producing it. There’s a vibrant community, in fact, of enthusiasts who enjoy resurrecting old hardware and getting it running (and I mean really old: EDSAC (1947) old), some of whom enjoy getting Linux running on it, and some of whom enjoy running it on Linux – by which I mean emulating the old hardware on modern Linux systems. It’s a fascinating set of communities, and if it’s your sort of thing, I encourage you to have a look.

But what about dropping open source software support (which tends to centre around Linux kernel support) for hardware which isn’t ancient, but is no longer manufactured and/or has a small or dwindling user base? One reason you might give would be that the size of the kernel for “normal” users (users of more recent hardware) is impacted by support for old hardware. This would be true if you had to compile the kernel with all options in it, but Linux distributions like Fedora, Ubuntu, Debian and RHEL already pare down the number of supported systems to something which they deem sensible, and it’s not that difficult to compile a kernel which cuts that down even further – my main home system is an AMD box (with AMD graphics card) running a kernel which I’ve compiled without most Intel-specific drivers, for instance.

There are other reasons, though, for dropping support for old hardware, and considering that it has met its End of Life. Here are three of the most important.

Resources

My first point isn’t specifically security related, but is an important consideration: while there are many volunteers (and paid folks!) working on the Linux kernel, we (the community) don’t have an unlimited number of skilled engineers. Many older hardware components and architectures are maintained by teams of dedicated people, and the option exists for communities who rely on older hardware to fund resources to ensure that they keep running, are patched against security holes, etc. Once there ceases to be sufficient funding to keep these types of resources available, however, hardware is likely to become “orphaned”, as in the case of Itanium.

There is also a secondary impact, in that however modularised the kernel is, there is likely to be some requirement for resources and time to coordinate testing, patching, documentation and other tasks associated with kernel modules, which needs to be performed by people who aren’t associated with that particular hardware. The community is generally very generous with its time and understanding around such issues, but once the resources and time required to keep such components “current” reaches a certain level in relation to the amount of use being made of the hardware, it may not make sense to continue.

Security risk to named hardware

People expect the software they run to maintain certain levels of security, and the Linux kernel is no exception. Over the past 5-10 years or so, there’s been a surge in work to improve security for all hardware and platforms which Linux supports. A good example of a feature which is applicable across multiple platforms is Address Space Layout Randomisation (ASLR). The problem here is not only that there may be some such changes which are not applicable to older hardware platforms – meaning that Linux is less secure when running on older hardware – but also that, even when it is possible, the resources required to port the changes, or just to test that they work, may be unavailable. This relates to the point about resources above: even when there’s a core team dedicated to the hardware, they may not include security experts able to port and verify security features.
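To make the ASLR example a little more concrete: on Linux, the kernel exposes its ASLR setting through the sysctl file `/proc/sys/kernel/randomize_va_space`, whose values are described in the kernel’s sysctl documentation. The Python sketch below reads and interprets that setting. The helper names are hypothetical, and note that it only reports what the kernel says is configured; confirming that randomisation actually works on a given platform is exactly the per-platform testing work described above.

```python
from pathlib import Path

# Linux sysctl that controls ASLR for user-space processes.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value, following the kernel's sysctl documentation.
ASLR_LEVELS = {
    0: "disabled: no address space randomisation",
    1: "conservative: stack, mmap base and VDSO randomised",
    2: "full: conservative randomisation plus heap (brk) randomisation",
}


def describe_aslr(raw_value: str) -> str:
    """Translate the raw contents of randomize_va_space into a description."""
    try:
        level = int(raw_value.strip())
    except ValueError:
        return f"unrecognised value: {raw_value!r}"
    return ASLR_LEVELS.get(level, f"unknown level {level}")


def read_aslr_setting(path: Path = ASLR_SYSCTL) -> str:
    """Read the live setting, or explain why it is not available."""
    if not path.exists():
        return "not available (not Linux, or /proc not mounted)"
    return describe_aslr(path.read_text())


if __name__ == "__main__":
    print("ASLR:", read_aslr_setting())
```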

The problem goes beyond this, however, in that it is not just new security features which are an issue. Over the past week, issues were discovered in the popular sudo tool which ships with most Linux systems, and in libgcrypt, a cryptographic library used by some Linux components. The sudo problem was years old, and the libgcrypt one so new that few distributions had taken the updated version, and neither of them is directly related to the Linux kernel, but we know that bugs – security bugs – can exist in the Linux kernel for many years before being discovered and patched. The ability to create and test these patches across the range of supported hardware depends, yet again, not just on availability of the hardware to test it on, or enthusiastic volunteers with general expertise in the platform, but on security experts willing, able and with the time to do the work.

Security risks to other hardware – and beyond

There is a final – and possibly surprising – point, which is that continuing support for old hardware may sometimes have a negative impact on security for other hardware, even if resources are available to test and implement changes. In order to make improvements to certain features and functionality in the kernel, significant architectural changes are sometimes needed. The best-known example (though not necessarily directly security-related) is the Big Kernel Lock, or BKL, an architectural feature of the Linux kernel until 2.6.39 in 2011, which had been introduced to aid concurrency management, but ended up having significant negative impacts on performance.

In some cases, older hardware may be unable to accept such changes, or, even worse, maintaining support for older hardware may impose such constraints on architectural changes – or require such baroque and complex work-arounds – that it is in the best interests of the broader security of the kernel to drop support. Luckily, the Linux kernel’s modular design means that such cases should be few and far between, but they do need to be taken into consideration.
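For readers who haven’t met the idea, the difference between one big lock and finer-grained locking, which is what replacing the BKL involved, can be sketched in a few lines of Python. This illustrates the locking pattern only and is not kernel code: the `BigLockCounters` and `FineGrainedCounters` classes are hypothetical. With a single global lock, every update to any counter waits behind every other update; per-key locks let unrelated updates proceed independently. Both give correct results. (In CPython, the interpreter’s own global lock means the fine-grained version won’t actually run faster here; the point is the structure, not a benchmark.)

```python
from __future__ import annotations

import threading
from collections import defaultdict


class BigLockCounters:
    """Every update, to any key, takes the same global lock (BKL-style)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: defaultdict[str, int] = defaultdict(int)

    def increment(self, key: str) -> None:
        with self._lock:  # all threads contend on this single lock
            self._counts[key] += 1

    def get(self, key: str) -> int:
        with self._lock:
            return self._counts[key]


class FineGrainedCounters:
    """Each key has its own lock, so updates to different keys don't contend."""

    def __init__(self, keys: list[str]) -> None:
        self._locks = {key: threading.Lock() for key in keys}
        self._counts = {key: 0 for key in keys}

    def increment(self, key: str) -> None:
        with self._locks[key]:  # only threads touching the same key contend
            self._counts[key] += 1

    def get(self, key: str) -> int:
        with self._locks[key]:
            return self._counts[key]


def _worker(counters, key: str, n: int) -> None:
    for _ in range(n):
        counters.increment(key)


def hammer(counters, keys: list[str], per_thread: int = 1000) -> None:
    """Start one thread per key, each incrementing its own key per_thread times."""
    threads = [
        threading.Thread(target=_worker, args=(counters, key, per_thread))
        for key in keys
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    keys = ["net", "disk", "tty"]
    for counters in (BigLockCounters(), FineGrainedCounters(keys)):
        hammer(counters, keys)
        print(type(counters).__name__, [counters.get(k) for k in keys])
```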

Conclusion

Some of the arguments I’ve made above apply not only to hardware, but to software as well: people often keep wanting to run software well past its expected support life. The difference with software is that it is often possible to emulate the hardware or software environment on which it is expected to run, typically via virtual machines (VMs). Maintaining these environments is a challenge in itself, but may actually offer a viable alternative to trying to keep old hardware running.

End of Life is an important consideration for hardware and software, and, much as we may enjoy nursing old hardware along, it doesn’t make sense to delay the inevitable – End of Life – beyond a certain point. When that point is will depend on many things, but security considerations should be included.

Immutability: my favourite superpower

As a security guy, I approve of defence in depth.

I’m a recent but dedicated convert to Silverblue, which I run on my main home laptop and which I’ll be putting onto my work laptop when I’m due a hardware upgrade in a few months’ time.  I wrote an article about Silverblue over at Enable Sysadmin, and over the weekend, I moved one of my kids’ laptops over to it as well.  You can learn more about Silverblue over at the main Silverblue site, but in terms of usability, look and feel, it’s basically a version of Fedora.  There’s one key difference, however, which is that the operating system is mounted read-only, meaning that it’s immutable.

What does “immutable” mean?  It means that it can’t be changed.  To be more accurate, in a software context, it generally means that something can’t be changed during run-time.

Important digression – constant immutability

I realised as I wrote that final sentence that it might be a little misleading.  Many programming languages have the concept of “constants”.  A constant is a variable (or set, or data structure) which is constant – that is, not variable.  You can assign a value to a constant, and generally expect it not to change.  But – and this depends on the language you are using – it may be that the constant is not immutable.  This seems to go against common sense[1], but that’s just the way that some languages are designed.  The bottom line is this: if you have a variable that you intend to be immutable, check the syntax of the programming language you’re using and take any specific steps needed to maintain that immutability.
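Python is a good illustration. Its “constants” are a naming convention plus, optionally, a `typing.Final` annotation, which static type checkers such as mypy honour but the interpreter ignores at run time; and even then, it is the name that is “final”, not the object it refers to. The sketch below contrasts a “constant” list, which can still be modified, with a tuple and a frozen dataclass, where Python does enforce immutability. The names are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Final

# typing.Final tells static checkers not to allow rebinding the name, but it
# does nothing at run time and never makes the object itself immutable.
ALLOWED_USERS: Final = ["alice", "bob"]
ALLOWED_USERS.append("mallory")  # accepted: the list is still mutable

# A tuple has no mutating methods and rejects item assignment.
FROZEN_USERS: Final = ("alice", "bob")


@dataclass(frozen=True)
class Config:
    """Attribute assignment on a frozen dataclass raises FrozenInstanceError."""

    admin: str
    max_sessions: int


def tuple_is_mutable(t: tuple) -> bool:
    """Return True if item assignment on the tuple succeeds."""
    try:
        t[0] = "mallory"  # type: ignore[index]
    except TypeError:
        return False
    return True


def config_is_mutable(cfg: Config) -> bool:
    """Return True if attribute assignment on the instance succeeds."""
    try:
        cfg.admin = "mallory"  # type: ignore[misc]
    except FrozenInstanceError:
        return False
    return True


if __name__ == "__main__":
    print(ALLOWED_USERS)                          # ['alice', 'bob', 'mallory']
    print(tuple_is_mutable(FROZEN_USERS))         # False
    print(config_is_mutable(Config("alice", 5)))  # False
```

Even these are only shallowly immutable: a tuple that contains a list still lets you change the contents of that list.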

Operating System immutability

In Silverblue’s case, it’s the operating system that’s immutable.  You install applications in containers (of which more later), using Flatpak, rather than onto the root filesystem.  This means not only that the installation of applications is isolated from the core filesystem, but also that the ability for malicious applications to compromise your system is significantly reduced.  It’s not impossible[2], but the risk is significantly lower.
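You can see this for yourself. The Python sketch below (the helper name is hypothetical) asks the kernel, via `os.statvfs()`, whether the filesystem holding a given path is mounted read-only. On an OSTree-based system such as Silverblue, we would expect `/usr`, where the operating system content lives, to report read-only, while locations such as `/var` stay writable; treat that layout as an assumption to check on your own system rather than something this sketch guarantees.

```python
import os
import sys


def is_read_only(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only.

    os.statvfs() is only available on Unix-like systems; its f_flag field
    has the ST_RDONLY bit set for read-only mounts.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


if __name__ == "__main__":
    # Default paths are a hypothetical selection; pass others as arguments.
    for p in sys.argv[1:] or ["/usr", "/etc", "/var"]:
        if os.path.exists(p):
            print(f"{p}: {'read-only' if is_read_only(p) else 'writable'}")
```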

How do you update your system, then?  Well, what you do is create a new boot image which includes any updated packages that are needed, and when you’re ready, you boot into that.  Silverblue provides simple tools to do this: it’s arguably less hassle than the standard way of upgrading your system.  This approach also makes it very easy to maintain different versions of an operating system, or installations with different sets of packages.  If you need to test an application in a particular environment, you boot into the image that reflects that environment, and do the testing.  Another environment?  Another image.
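The shape of this update model (never patch the running system; build a new image, boot into it, and keep the old one for rollback) can be captured in a small, hypothetical Python model. It is much simplified and is not how rpm-ostree, the tool Silverblue uses, is actually implemented: each deployment is an immutable set of package versions, an update produces a new deployment rather than modifying the booted one, and rolling back simply means booting the previous deployment again.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deployment:
    """An immutable OS image: a fixed, sorted set of (package, version) pairs."""

    packages: tuple[tuple[str, str], ...]

    @classmethod
    def from_dict(cls, pkgs: dict[str, str]) -> Deployment:
        return cls(tuple(sorted(pkgs.items())))

    def as_dict(self) -> dict[str, str]:
        return dict(self.packages)


@dataclass
class DeploymentManager:
    """Keeps every deployment and records which one is booted."""

    deployments: list[Deployment] = field(default_factory=list)  # oldest first
    booted: int = 0  # index of the running deployment

    @property
    def current(self) -> Deployment:
        return self.deployments[self.booted]

    def stage_update(self, upgrades: dict[str, str], remove: tuple[str, ...] = ()) -> Deployment:
        """Build a new deployment from the booted one; the booted one is never modified."""
        pkgs = self.current.as_dict()
        pkgs.update(upgrades)
        for name in remove:
            pkgs.pop(name, None)
        new = Deployment.from_dict(pkgs)
        self.deployments.append(new)
        return new

    def reboot_into_latest(self) -> None:
        self.booted = len(self.deployments) - 1

    def rollback(self) -> None:
        """Boot the previous deployment again: nothing has to be un-patched."""
        if self.booted > 0:
            self.booted -= 1


if __name__ == "__main__":
    # Hypothetical package versions, for illustration only.
    base = Deployment.from_dict({"kernel": "5.10.1", "openssl": "1.1.1h"})
    mgr = DeploymentManager([base])
    mgr.stage_update({"openssl": "1.1.1i"})
    mgr.reboot_into_latest()
    print(mgr.current.as_dict()["openssl"])  # 1.1.1i
    mgr.rollback()
    print(mgr.current.as_dict()["openssl"])  # 1.1.1h
```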

We’re more interested in the security properties that this offers us, however.  Not only is it very difficult to compromise the core operating system as a standard user[3], but you are always operating in a known environment, and knowability is very much a desirable property for security, as you can test, monitor and perform forensic analysis from a known configuration.  From a security point of view (let alone what other benefits it delivers), immutability is definitely an asset in an operating system.
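A known configuration is what makes monitoring and forensics tractable: you can record what the system should look like and compare against it later. The Python sketch below shows the idea as a simple file-integrity baseline built from SHA-256 digests. The function names are hypothetical, and a real deployment would more likely use an established integrity-monitoring tool and keep the baseline somewhere the monitored host cannot modify.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each regular file under root (as a relative path) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def diff_snapshots(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed or modified relative to the known-good baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(
            name for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }
```

In use, you would take a `snapshot()` of, say, `/usr` on a freshly booted image, store it safely, and later compare a new snapshot of the same deployment against it with `diff_snapshots()`. On an immutable system, any non-empty result is worth investigating.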

Container immutability

This isn’t the place to describe containers (also known as “Linux containers” or, less frequently or accurately these days, “Docker containers”) in detail, but they are basically collections of software that you package as images and then run as workloads on a host server (in Kubernetes, containers are grouped and scheduled together as “pods”).  One of the great things about containers is that they’re generally very fast to spin up (provision and execute) from an image, and another is that the format of that image – the packaging format – is well-defined, so it’s easy to create the images themselves.

From our point of view, however, what’s great about containers is that you can choose to use them immutably.  In fact, that’s the way they’re generally used: using mutable containers is generally considered an anti-pattern.  The standard (and “correct”) way to use containers is to bundle each application component and required dependencies into a well-defined (and hopefully small) container, and deploy that as required.  The way that containers are designed doesn’t mean that you can’t change any of the software within the running container, but the way that they run discourages you from doing that, which is good, as you definitely shouldn’t.  Remember: immutable software gives better knowability, and improves your resistance to run-time compromise.  Instead, given how lightweight containers are, you should design your application in such a way that if you need to, you can just kill the container instance and replace it with an instance from an updated image.

This brings us to two of the reasons that you should never run containers with root privilege:

  • there’s a temptation for legitimate users to use that privilege to update software in a running container, reducing knowability, and possibly introducing unexpected behaviour;
  • there are many more opportunities for compromise if a malicious actor – human or automated – can change the underlying software in the container.
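In Kubernetes, for instance, both properties can be requested per container through its `securityContext`: `readOnlyRootFilesystem` keeps the container’s filesystem immutable at run time, while `runAsNonRoot`, `allowPrivilegeEscalation: false` and `privileged: false` keep it away from root privilege. The Python sketch below builds such a container spec as a plain dictionary and checks a spec for those settings. The image name and helper functions are hypothetical, and in a real cluster you would typically enforce this with an admission policy rather than a script.

```python
from __future__ import annotations

import json

# Settings that keep a container immutable and unprivileged at run time.
REQUIRED_SETTINGS = {
    "readOnlyRootFilesystem": True,     # no writes to the container's filesystem
    "runAsNonRoot": True,               # refuse to start if it would run as UID 0
    "allowPrivilegeEscalation": False,  # block setuid-style privilege gains
    "privileged": False,                # no privileged access to the host
}


def hardened_container(name: str, image: str) -> dict:
    """Build a container spec (as used inside a Kubernetes pod) with the settings above."""
    return {
        "name": name,
        "image": image,  # pin a specific tag or digest: replace, don't patch
        "securityContext": dict(REQUIRED_SETTINGS),
    }


def violations(container: dict) -> list[str]:
    """List the settings that are missing or differ from REQUIRED_SETTINGS."""
    ctx = container.get("securityContext", {})
    return [
        f"{key} should be {expected}"
        for key, expected in REQUIRED_SETTINGS.items()
        if ctx.get(key) != expected
    ]


if __name__ == "__main__":
    spec = hardened_container("web", "registry.example.com/web:1.4.2")  # hypothetical image
    print(json.dumps(spec, indent=2))
    print(violations(spec))                              # []
    print(violations({"name": "legacy", "image": "x"}))  # all four settings flagged
```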

Double immutability with Silverblue

I mentioned above that Silverblue runs applications in containers.  This means that you have two levels of security provided by default when you run applications on a Silverblue system:

  1. the operating system immutability;
  2. the container immutability.

As a security guy, I approve of defence in depth, and this is a classic example of that property.  I also like the fact that I can control what I’m running – and what versions – with a great deal more ease than if I were on a standard operating system.


1 – though, to be fair, the phrases “programming language” and “common sense” are rarely used positively in the same sentence in my experience.

2 – we generally try to avoid the word “impossible” when describing attacks or vulnerabilities in security.

3 – as with many security issues, once you have sudo or root access, the situation is significantly degraded.

I’m turning off your security.

“Don’t worry, I know what I’m doing.”

Today’s security story is people turning security off.  For me, the fact that it’s even a story is the story.  This particular story is covered in The Register, who explain (to nobody’s surprise) that some of the patches to fix issues identified in CPUs (think Spectre, Meltdown, etc.) can actually slow down the applications running on them.  The problem is that, in some cases, they don’t slow them down a little bit, but rather a lot.  By which I mean up to 50%.  And if you’ve bought expensive hardware – or rented it[1] – then you’d generally prefer it if it runs your applications/programs/workloads quickly, rather than just half as fast as they might run.

And so you turn off the security patches.  Your decision: fine.

No, stop: this isn’t what has happened.

The mythical “you”, the person running the workload, isn’t the person who makes the decision, in most cases, because it’s been made for you.  This is the real story.

Linus Torvalds, and a bunch of other experts in the Linux kernel[2], have decided that although the patch that could make your workloads secure is available, the functionality that does it should be “off” by default.  They reason – quite correctly, in my opinion – that the vast majority of people running workloads won’t easily be able to turn this functionality on themselves.

They also reason – again, correctly, in my opinion – that most people will care more about how quickly their workloads run than about how secure they are.  I’m not happy about this, but that’s the way it is.

What I worry about is the final step in the logic to making the decision.  I’m going to quote Linus:

“Have you seen any actual realistic attacks for normal human users?” he asked. “Things where the kernel should actually care? The JavaScript thing is for the browser to fix up, not for the kernel to say ‘now everything should run up to 50 per cent slower.'”

I get the reasoning behind this, but I don’t like it.  To give some context, somebody came up with an example attack which could compromise certain workloads, and Linus points out that there are better ways to fix this attack than fixing it in the kernel. My concerns are two-fold:

  1. although there may be better places to fix that particular attack, a kernel-level fix is likely to fix an entire class of attacks, meaning better protection for users who are using any application which might include an attack vector.
  2. pointing out that there haven’t been any attacks yet not only ignores the fact that there is a future out there[3] but also points malicious actors in the direction of a likely attack vector.

Now, I know that the more dedicated malicious actors are already looking for these things, but do we really need to advertise?

What’s my fix?

I don’t have one, or at least not an easy one.

Somebody, somewhere, needs to decide whether security is turned on or off.  What I’d honestly like to see is an easier set of controls to allow people to turn on or off security, and to understand the trade-offs when they do that.  The problems with that are:

  • the trade-offs are often much more complex than just “fast and insecure” or “slow and secure”, and are really difficult to explain.
  • in order to make a sensible decision about trade-offs, people need to understand risk.  And people are awful at understanding risk.

And there’s a “chicken and egg problem”[7] here: people won’t understand risk until they are offered the chance to make decisions, but there’s little incentive to offer them complex decisions unless they understand risk.

My plea?  Where possible, expose risk, and explain what it is.  And if you’re turning off security-related functionality, make it easy to turn back on for those who need it.
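On Linux, there is at least one small, concrete way to expose this risk: the kernel reports the status of each known CPU vulnerability under `/sys/devices/system/cpu/vulnerabilities/`, one file per issue, with contents such as “Not affected”, “Vulnerable…” or “Mitigation: …”. (Newer kernels also accept a single `mitigations=` boot parameter to switch most of these mitigations off or on together.) The Python sketch below summarises those files. The function names are hypothetical, and which files appear depends on the kernel version and the CPU.

```python
from __future__ import annotations

from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status: str) -> str:
    """Reduce a status line to 'ok', 'mitigated', 'vulnerable' or 'unknown'."""
    s = status.strip()
    if s.startswith("Not affected"):
        return "ok"
    if s.startswith("Mitigation"):
        return "mitigated"
    if s.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def mitigation_report(directory: Path = VULN_DIR) -> dict[str, tuple[str, str]]:
    """Map each reported vulnerability to (category, raw status line)."""
    if not directory.is_dir():
        return {}  # not Linux, or a kernel without this reporting
    report = {}
    for entry in sorted(directory.iterdir()):
        status = entry.read_text().strip()
        report[entry.name] = (classify(status), status)
    return report


if __name__ == "__main__":
    for name, (category, status) in mitigation_report().items():
        print(f"{name:30} {category:11} {status}")
```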


1 – a quick heads-up: this is what “deploying to the cloud” actually is.

2 – what sits at the bottom of many of the workloads that are running in servers.

3 – hopefully.  If the Three Minute Warning[4] sounds while you’re reading this, you may wish to duck and cover.  You can come back to it later[6].

4 – “… sounds like this …”[5].

5 – 80s reference.

6 – or not.  See [3].

7 – for non-native English readers, this means “a problem where the solution requires two pieces, both of which are dependent on each other”.

6 types of attack: learning from Supermicro, State Actors and silicon

… it could have happened, and it could be happening now.

Last week, Bloomberg published a story detailing how Chinese state actors had allegedly forced employees of Supermicro (or companies subcontracting to them) to insert a small chip – the silicon in the title – into motherboards destined for Apple and Amazon.  The article talked about how an investigation into these boards had uncovered this chip and the steps that Apple, Amazon and others had taken.  The story was vigorously denied by Supermicro, Apple and Amazon, but that didn’t stop Supermicro’s stock price from tumbling by over 50%.

I have heard strong views expressed by people with expertise in the topic on both sides of the argument: that it probably didn’t happen, and that it probably did.  One side argues that the denials by Apple and Amazon, for instance, might have been impacted by legal “gagging orders” from the US government.  An opposing argument suggests that the Bloomberg reporters might have confused this story with a similar one that occurred a few months ago.  Whether this particular story is correct in every detail, or a fabrication – intentional or unintentional – is not what I’m interested in at this point.  What I’m interested in is not whether it did happen in this instance: the clear message is that it could have happened, and it could be happening now.

I’ve written before about State Actors, and whether you should worry about them.  There’s another question which this story brings up, which is possibly even more germane: what can you do about it if you are worried about them?  This breaks down further into two questions:

  • how can I tell if my systems have been compromised?
  • what can I do if I discover that they have?

The first of these is easily enough to keep us occupied for now[1], so let’s spend some time on that.  First, let’s define six types of compromise, think about how they might be carried out, and then consider the questions above for each:

  • supply-chain hardware compromise;
  • supply-chain firmware compromise;
  • supply-chain software compromise;
  • post-provisioning hardware compromise;
  • post-provisioning firmware compromise;
  • post-provisioning software compromise.

There isn’t space in this article to go into these types of attack in detail, so I’ll provide an overview of each instead[2].

Terms

  • Supply-chain – all of the steps up to when you start actually running a system.  From manufacture through installation, including vendors of all hardware components and all software, OEMs, integrators and even shipping firms that have physical access to any pieces of the system.  For all supply-chain compromises, the key question is the extent to which you, the owner of a system, can trust every single member of the supply chain[3].
  • Post-provisioning – any point after which you have installed the hardware, put all of the software you want on it, and started running it: the time during which you might consider the system “under your control”.
  • Hardware – the physical components of a system.
  • Software – software that you have installed on the system and over which you have some control: typically the Operating System and application software.  The amount of control depends on factors such as whether you use proprietary or open source software, and how much of it is produced, compiled or checked by you.
  • Firmware – special software that controls how the hardware interacts with the standard software on the machine, the hardware that comprises the system, and external systems.  It is typically provided by hardware vendors, and its operation is opaque to owners and operators of the system.

Compromise types

See the table at the bottom of this article for a short summary of the points below.

  1. Supply-chain hardware – there are multiple opportunities in the supply chain to compromise hardware, but the harder such compromises are made to detect, the more difficult they are to perform.  The attack described in the Bloomberg story would be extremely difficult to detect, but the addition of a keyboard logger to a keyboard just before delivery (for instance) would be correspondingly simpler.
  2. Supply-chain firmware – of all the options, this has the best return on investment for an attacker.  Assuming good access to an appropriate part of the supply chain, inserting firmware that (for instance) impacts network performance or leaks data over a wifi connection is relatively simple.  The difficulty in detection comes from the fact that although it is possible for the owner of the system to check that the firmware is what they think it is, that measurement only confirms that the firmware matches what the vendor says it supplied.  So the “medium” rating relates only to firmware that was implanted by members of the supply chain who did not source the original firmware: otherwise, it’s “high”.
  3. Supply-chain software – by this, I mean software that comes installed on a system when it is delivered.  Some organisations will insist in “clean” systems being delivered to them[4], and will install everything from the Operating System upwards themselves.  This means that they basically now have to trust their Operating System vendor[5], which is maybe better than trusting other members of the supply chain to have installed the software correctly.  I’d say that it’s not too simple to mess with this in the supply chain, if only because checking isn’t too hard for the legitimate members of the chain.
  4. Post-provisioning hardware – this is where somebody with physical access to your hardware – after it’s been set up and is running – inserts or attaches hardware to it.  I nearly gave this a “high” rating for difficulty below, assuming that we’re talking about servers, rather than laptops or desktop systems, as one would hope that your servers are well-protected, but the ease with which attackers have shown that they can typically get physical access to systems using techniques like social engineering means that I’ve downgraded this to “medium”.  Detection, on the other hand, should be fairly simple given sufficient resources (hence the “medium” rating), and although I don’t believe anybody who says that a system is “tamper-proof”, tamper-evidence is a much simpler property to achieve.
  5. Post-provisioning firmware – when you patch your Operating System, it will often also patch firmware on the rest of your system.  This is generally a good thing to do, as patches may provide security, resilience or performance improvements, but you’re stuck with the same problem as with supply-chain firmware that you need to trust the vendor: in fact, you need to trust both your Operating System vendor and their relationship with the firmware vendor.
  6. Post-provisioning software – is it easy to compromise systems via their Operating System and/or application software?  Yes: this we know.  Luckily – though depending on the sophistication of the attack – there are generally good tools and mechanisms for detecting such compromises, including behavioural monitoring.
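The firmware check described in point 2, and the reason it only moves the trust problem rather than solving it, can be sketched as follows. This assumes, hypothetically, that the vendor publishes a SHA-256 digest for each firmware image; the function name and data are illustrative. A match tells you that the image you hold is the one the vendor says it shipped, and nothing about whether the vendor’s own image is trustworthy.

```python
import hashlib
import hmac
import sys
from pathlib import Path


def firmware_matches(image: bytes, published_sha256_hex: str) -> bool:
    """Check an image against the SHA-256 digest its vendor published.

    A match shows only that this is the image the vendor describes. If the
    vendor's build, or the published digest itself, is compromised, this
    check still passes.
    """
    actual = hashlib.sha256(image).hexdigest()
    expected = published_sha256_hex.strip().lower()
    # compare_digest does a constant-time comparison; not essential for a
    # public digest, but a sensible habit when comparing hashes.
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    # Hypothetical usage: python check_firmware.py firmware.bin <published-sha256>
    if len(sys.argv) == 3:
        ok = firmware_matches(Path(sys.argv[1]).read_bytes(), sys.argv[2])
        print("matches the vendor's published digest" if ok else "MISMATCH")
```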

Table

Compromise type             Attacker difficulty   Detection difficulty
Supply-chain hardware       High                  High
Supply-chain firmware       Low                   Medium
Supply-chain software       Medium                Medium
Post-provisioning hardware  Medium                Medium
Post-provisioning firmware  Medium                Medium
Post-provisioning software  Low                   Low
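The conclusion that follows rests on a simple comparison across this table: for which compromise types is the attack rated easier to carry out than to detect? Encoding the table as data makes that check explicit. This is illustrative only: the ratings are the article’s qualitative judgements, mapped to numbers purely so that they can be compared.

```python
from __future__ import annotations

# Qualitative ratings from the table, mapped to an ordinal scale.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}

# compromise type -> (attacker difficulty, detection difficulty)
RATINGS = {
    "Supply-chain hardware": ("High", "High"),
    "Supply-chain firmware": ("Low", "Medium"),
    "Supply-chain software": ("Medium", "Medium"),
    "Post-provisioning hardware": ("Medium", "Medium"),
    "Post-provisioning firmware": ("Medium", "Medium"),
    "Post-provisioning software": ("Low", "Low"),
}


def easier_to_attack_than_detect(ratings: dict[str, tuple[str, str]]) -> list[str]:
    """Return the compromise types where attacking is rated easier than detecting."""
    return [
        name
        for name, (attack, detect) in ratings.items()
        if LEVEL[attack] < LEVEL[detect]
    ]


if __name__ == "__main__":
    print(easier_to_attack_than_detect(RATINGS))  # ['Supply-chain firmware']
```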

Conclusion

What are your chances of spotting a compromise on your system?  I would argue that they are generally pretty much in line with the difficulty of performing the attack in the first place: with the glaring exception of supply-chain firmware.  We’ve seen attacks of this type, and they’re very difficult to detect.  The good news is that there is some good work going on to help detection of these types of attacks, particularly in the world of Linux[7] and open source.  In the meantime, I would argue our best forms of defence are currently:

  • for supply-chain: build close relationships, use known and trusted suppliers.  You may want to restrict as much as possible of your supply chain to “friendly” regimes if you’re worried about State Actor attacks, but this is very hard in the global economy.
  • for post-provisioning: lock down your systems as much as possible – both physically and logically – and use behavioural monitoring to try to detect anomalies in what you expect them to be doing.
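Behavioural monitoring can be as sophisticated as you like, but the core idea is simple: record what “normal” looks like, then flag departures from it. The sketch below is a deliberately naive illustration, assuming a single numeric metric (hypothetically, outbound connections per hour) and a three-standard-deviation threshold; both the metric and the threshold are my assumptions, not a recommendation.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarise known-good observations as (mean, sample standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No observed variation: anything different from the baseline is unusual
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical hourly counts of outbound connections during a known-good week
    normal = [40, 42, 38, 41, 39, 43, 40, 37, 44, 41]
    baseline = build_baseline(normal)
    print(is_anomalous(42, baseline))   # False
    print(is_anomalous(120, baseline))  # True
```

Real deployments track many signals at once and have to cope with baselines that drift over time; the point here is only that detection depends on having a trustworthy picture of expected behaviour in the first place.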

1 – I’ll try to write something on this other topic in a different article.

2 – depending on interest, I’ll also consider a series of articles to go into more detail on each.

3 – how certain are you, for instance, that your delivery company won’t give your own government’s security services access to the boxes containing your equipment before they deliver them to you?

4 – though see above: what about the firmware?

5 – though you can always compile your own Operating System if you use open source software[6].

6 – oh, you didn’t compile your compiler yourself?  All bets off, then…

7 – yes, “GNU Linux”.

Disbelieving the many eyes hypothesis

There is a view that because Open Source Software is subject to review by many eyes, all the bugs will be ironed out of it. This is a myth.

The hypothesis of not trusting code even when many people review it

Writing code is hard.  Writing secure code is harder: much harder.  And before you get there, you need to think about design and architecture.  When you’re writing code to implement security functionality, it’s often based on architectures and designs which have been pored over and examined in detail.  They may even reflect standards which have gone through worldwide review processes and are generally considered perfect and unbreakable*.

However good those designs and architectures are, though, there’s something about putting things into actual software that’s, well, special.  With the exception of software proven to be mathematically correct**, being able to write software which accurately implements the functionality you’re trying to realise is somewhere between a science and an art.  This is no surprise to anyone who’s actually written any software, tried to debug software or divine software’s correctness by stepping through it.  It’s not the key point of this post either, however.

Nobody*** actually believes that the software that comes out of this process is going to be perfect, but everybody agrees that software should be made as close to perfect and bug-free as possible.  It is for this reason that code review is a core principle of software development.  And luckily – in my view, at least – much of the code that we use these days in our day-to-day lives is Open Source, which means that anybody can look at it, and it’s available for tens or hundreds of thousands of eyes to review.

And herein lies the problem.  There is a view that because Open Source Software is subject to review by many eyes, all the bugs will be ironed out of it.  This is a myth.  A dangerous myth.  The problems with this view are at least twofold.  The first is the “if you build it, they will come” fallacy.  I remember when there was a list of all the websites in the world, and if you added your website to that list, people would visit it****.  In the same way, the number of Open Source projects was (maybe) once so small that there was a good chance that people might look at and review your code.  Those days are past – long past.  The second is that, for many areas of security functionality – implementing crypto primitives is a good example – the number of suitably qualified eyes is low.
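A toy probability model makes the point about qualified eyes concrete. It assumes, purely for illustration, that each of n reviewers independently spots a given bug with the same probability p, so the chance that at least one of them finds it is 1 - (1 - p)^n. Real reviewers are neither independent nor identical, and the figures below are hypothetical, chosen only to show the shape of the argument.

```python
def p_bug_found(reviewers: int, p_each: float) -> float:
    """Probability that at least one reviewer spots a given bug.

    Toy model: each reviewer independently finds the bug with probability p_each.
    """
    if reviewers < 0 or not 0.0 <= p_each <= 1.0:
        raise ValueError("reviewers must be >= 0 and p_each must be in [0, 1]")
    return 1.0 - (1.0 - p_each) ** reviewers


if __name__ == "__main__":
    # Two suitably qualified reviewers, each fairly likely to spot a subtle bug
    print(f"{p_bug_found(2, 0.3):.2f}")        # 0.51
    # A thousand casual readers, each very unlikely to spot it
    print(f"{p_bug_found(1000, 0.0005):.2f}")  # 0.39
    # Ten qualified reviewers
    print(f"{p_bug_found(10, 0.3):.2f}")       # 0.97
```

Under these assumptions, a thousand eyes that don’t know what they’re looking for do worse than ten that do, and zero eyes (the “if you build it” case) find nothing at all.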

Don’t think that I am in any way suggesting that the problem is any lesser in proprietary code: quite the opposite.  Not only are the designs and architectures in proprietary software often hidden from review, but you have fewer eyes available to look at the code, and the dangers of hierarchical pressure and groupthink are dramatically increased.  “Proprietary code is more secure” is less myth, more fake news.  I completely understand why companies like to keep their security software secret – and I’m afraid that the “it’s to protect our intellectual property” line is too often a platitude they tell themselves, when really, it’s just unsafe to release it.  So for me, it’s Open Source all the way when we’re looking at security software.

So, what can we do?  Well, companies and other organisations that care about security functionality can – and have, I believe, a responsibility to – expend resources on checking and reviewing the code that implements that functionality.  That is part of what Red Hat, the organisation for whom I work, is committed to doing.  Alongside that, we, the Open Source community, can find – and are finding – ways to support critical projects and improve the amount of review that goes into that code*****.  And we should encourage academic organisations to train students in the black art of security software writing and review, not to mention highlighting the importance of Open Source Software.

We can do better – and we are doing better.  Because what we need to realise is that the reason the “many eyes hypothesis” is a myth is not that many eyes won’t improve code – they will – but that we don’t have enough expert eyes looking.  Yet.


* Yeah, really: “perfect and unbreakable”.  Let’s just pretend that’s true for the purposes of this discussion.

** …and which still relies on the design and architecture actually to do what you want – or think you want – of course, so good luck.

*** nobody who’s actually written more than about 5 lines of code (or more than 6 characters of Perl)

**** I added one.  They came.  It was like some sort of magic.

***** see, for instance, the Linux Foundation’s Core Infrastructure Initiative