Security, cost and usability (pick 2)

If we cannot be explicit that there is a trade-off, it’s always security that loses.

Everybody wants security: why wouldn’t you? Let’s role-play: you’re a software engineer on a project to create a security product. There comes a time in the product life-cycle when it’s nearly due, and, as usual, time is tight. So you’re in the regular project meeting and the product manager’s there, so you ask them what they want you to do: should you prioritise security? The product manager is very clear[1]: they will tell you that they want the product as secure as possible – and they’re right, because that’s what customers want. I’ve never spoken to a customer (and I’ve spoken to lots of customers over the years) who said that they’d prefer a product which wasn’t as secure as possible. But there’s a problem, which is that all customers also want their products tomorrow – in fact, most customers want their products today, if not yesterday.

Luckily, products can generally be produced more quickly if more resources are applied (though Frederick Brooks’ The Mythical Man Month tells us that simple application of more engineers is actually likely to have a negative impact), so the requirement for speed of delivery can be translated to cost. There’s another thing that customers want, however, and that is for products to be easy to use: who wants to get a new product and then, when it arrives, for it to take months to integrate or for it to be almost impossible for their employees to run it as they expect?

So, to clarify, customers want a security product to be the following:

  1. secure – security is a strong requirement for many enterprises and organisations[3], and although we shouldn’t ever use the word secure on its own, that’s still what customers want;
  2. cheap – nobody wants to pay more than the minimum they can;
  3. usable – everybody likes simple-to-use, easy-to-integrate applications.

There’s a problem, however, which is that out of the three properties above, you can only choose two for any application or project. You say this to your product manager (who’s always right, remember[1]), and they’ll say: “don’t be ridiculous! I want all three”.

But it just doesn’t work like that: why? Here’s my take on the reasons. Security, simply stated, is designed to stop people doing things. Stated from the point of view of a user, the effect of security is to reduce usability. “Doing security” is generally around applying controls to actions in a system – whether by users or non-human entities – and the simplest way to apply it is “blanket security” – defaulting to blocking or denying actions. This is sometimes known as failing safe or failing closed.

Let’s take an example: you have a simple internal network in your office and you wish to implement a firewall between your network and the Internet, to stop malicious actors from probing your internal machines and to stop compromised systems on the internal network from communicating out to the Internet. “Easy,” you think, and set up a DENY ALL rule for connections originating outside the firewall, and a DENY ALL rule for connections originating inside the firewall, with the addition of an ALLOW rule for all outgoing port 443 connections to ensure that people can use web browsers to make HTTPS connections. You set up the firewall, and get ready to head home, knowing that your work is done. But then the problems arise:

  • it turns out that some users would like to be able to send email, which requires a different outgoing port number;
  • sending email often goes hand in hand with receiving email, so you need to allow incoming connections to your mail server;
  • one of your printers has been compromised, and is making connections over port 443 to an external botnet;
  • in order to administer the pay system, your accountant – who is not a full-time employee, and works from home – needs to access your network via a VPN, which requires the ability to accept an incoming connection.

Your “easy” just became more difficult – and it’s going to get more difficult still as more users start encountering what they will see as your attempts to make their day-to-day revenue-generating lives more difficult.
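
To make the trade-off concrete, here’s a minimal sketch of that rule set in illustrative Python – it isn’t any real firewall’s configuration language, and the port numbers and rule format are my assumptions:

```python
# A minimal sketch of the policy above: default deny in both directions,
# plus the exceptions that accumulate. Ports and ordering are illustrative.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    direction: str        # "in" or "out"
    port: Optional[int]   # None matches any port
    action: str           # "ALLOW" or "DENY"

# Day one: deny everything, allow outgoing HTTPS.
rules = [
    Rule("out", 443, "ALLOW"),
    Rule("in", None, "DENY"),
    Rule("out", None, "DENY"),
]

# A few weeks later, the exceptions have piled up:
rules = [
    Rule("out", 443, "ALLOW"),    # web browsing - and, unfortunately, the
                                  # compromised printer's traffic too
    Rule("out", 587, "ALLOW"),    # outgoing mail submission
    Rule("in", 25, "ALLOW"),      # incoming mail to the mail server
    Rule("in", 1194, "ALLOW"),    # the accountant's VPN connection
    Rule("in", None, "DENY"),
    Rule("out", None, "DENY"),
]

def decide(direction: str, port: int) -> str:
    """First matching rule wins, as in most packet filters; fail closed otherwise."""
    for rule in rules:
        if rule.direction == direction and rule.port in (port, None):
            return rule.action
    return "DENY"

print(decide("out", 443))   # ALLOW
print(decide("in", 22))     # DENY
```

Notice that the port-based rules do nothing about the compromised printer: its botnet traffic on port 443 looks just like legitimate web browsing.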

This is a very simple scenario, but it’s clear that in order to allow people actually to use a system, you need to spend a lot more time understanding how security will interact with it, and how people’s experience of the measures you put in place will be impacted. Usability and user experience (“UX”) is a complex field on its own, but when you combine it with the extra requirements around security, things become even more tricky.

You need both to manage the requirements of users to whom the security measures should be transparent (“TLS encryption should be on by default”) and those who may need much more control (“developers need to be able to select the TLS cipher suite options when connecting to a vendor’s database”), so you need to understand the different personae[4] you are targeting for your application. You also need to understand the different failure modes, and what the correct behaviour should be: if authentication fails three times in a row, should the medical professional who is trying to get a rush blood test result be locked out of the system, or should the result be provided, and a message sent to an administrator, for example? There will be more decisions to make, based on what your application does, the security policies of your customers, their risk profiles, and more. All of these investigations and decisions take time, and time equates to money. What is more, they also require expertise – both in terms of security but also usability – and that is in itself expensive.
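
As a sketch of what making such failure-mode decisions explicit might look like, here’s some illustrative Python – the personae, thresholds and the notify_admin helper are hypothetical, not recommendations for any real system:

```python
# A minimal sketch (hypothetical personae, thresholds and helpers) of making
# the failure-mode decision explicit, rather than hard-coding "always lock out".

from dataclasses import dataclass

@dataclass
class FailurePolicy:
    max_attempts: int
    fail_open: bool      # True: degrade gracefully (allow + audit), False: lock out
    alert_admin: bool

# Illustrative only - not a recommendation for any real clinical system.
POLICIES = {
    "clinician_urgent_results": FailurePolicy(max_attempts=3, fail_open=True, alert_admin=True),
    "back_office_user": FailurePolicy(max_attempts=3, fail_open=False, alert_admin=True),
}

def notify_admin(persona: str, attempts: int) -> None:
    print(f"ALERT: {attempts} failed authentication attempts for persona '{persona}'")

def handle_auth_failure(persona: str, attempts: int) -> str:
    policy = POLICIES[persona]
    if attempts < policy.max_attempts:
        return "retry"
    if policy.alert_admin:
        notify_admin(persona, attempts)
    return "allow_with_audit" if policy.fail_open else "lock_out"

print(handle_auth_failure("clinician_urgent_results", 3))  # allow_with_audit (after alert)
print(handle_auth_failure("back_office_user", 3))          # lock_out (after alert)
```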

So, you have three options:

  1. choose usability and cost – you can prioritise usability and low cost, but you won’t be able to apply security as you might like;
  2. choose security and cost – in this case, you can apply more security to the system, but you need to be aware that usability – and therefore your customer’s acceptance of the system – will suffer;
  3. choose usability and security – I wish this was the one that we chose every time: you decide that you’re willing to wait longer or pay more for a more secure product, which people can use.

I’m not going to pretend that these are easy decisions, nor that they are always clear cut. And a product manager’s job is sometimes to make difficult choices – hopefully ones which can be re-balanced in a later release, but difficult choices nevertheless. It’s really important, however, that anyone involved in security – as an engineer, as a UX expert, as a product manager, as a customer – understands the trade-off here. If we cannot be explicit that there is a trade-off, then the trade-off will be made silently, and in my experience, it’s always security that loses.


1 – and right: product managers are always right[2].

2 – I know: I used to be a product manager.

3 – and the main subject of this blog, so it shouldn’t be a surprise that I’m writing about it.

4 – or personas if you really, really must. I got an “A” in Latin O level, and I’m not letting this one go.

Do I trust this package?

The area of software supply chain management is growing in importance.

This isn’t one of those police dramas where a suspect parcel arrives at the precinct and someone realises just in time that it may be a bomb – what we’re talking about here is open source software packages (though the impact on your application may be similar if you’re not sufficiently suspicious). Open source software is everywhere these days – which is great – but how can you be sure that you should trust the software you’ve downloaded to do what you want? The area of software supply chain management – of which this discussion forms a part – is fairly newly visible in the industry, but is growing in importance. We’re going to consider a particular example.

There’s a huge conversation to be had here about what trust means (see my article “What is trust?” as a starting point, and I have a forthcoming book on Trust in Computing and the Cloud for Wiley), but let’s assume that you have a need for a library which provides some cryptographic protocol implementation. What do you need to know, and what are your choices? We’ll assume, for now, that you’ve already made what is almost certainly the right choice, and gone with an open source implementation (see many of my articles on this blog for why open source is just best for security), and that you don’t want to be building everything from source all the time: you need something stable and maintained. What should be your source of a new package?

Option 1 – use a vendor

There are many vendors out there now who provide open source software through a variety of mechanisms – typically subscription. Red Hat, my employer (see standard disclosure), is one of them. In this case, the vendor will typically stand behind the fitness for use of a particular package, provide patches, etc. This is your easiest and best choice in many cases. There may be times, however, when you want to use a package which is not provided by a vendor, or not packaged by your vendor of choice: what do you do then? Equally, what decisions do vendors need to make about how to trust a package?

Option 2 – delve deeper

This is where things get complex. So complex, in fact, that I’m going to be examining them at some length in my book. For the purposes of this article, though, I’ll try to be brief. We’ll start with the assumption that there is a single maintainer of the package, and multiple contributors. The contributors provide code (and tests and documentation, etc.) to the project, and the maintainer provides builds – binaries/libraries – for you to consume, rather than your taking the source code and compiling it yourself (which is actually what a vendor is likely to do, though they still need to consider most of the points below). This is a library to provide cryptographic capabilities, so it’s fairly safe to assume that we care about its security. There are at least five specific areas which we need to consider in detail, all of them relying on the maintainer to a large degree (I’ve used the example of security here, though very similar considerations exist for almost any package): let’s look at the issues.

  1. build – how is the package that you are consuming created? Is the build process performed on a “clean” (that is, non-compromised) machine, with the appropriate compilers and libraries (there’s a turtles problem here!)? If the binary is created with untrusted tools, then how can we trust it at all? So what measures does the maintainer take to ensure the “cleanness” of the build environment? It would be great if the build process were documented as a repeatable build, so that those who want to check it can do so.
  2. integrity – this is related to build, in that we want to be sure that the source code inputs to the build process – the code coming, for instance, from a git repository – are what we expect. If, somehow, compromised code is injected into the build process, then we are in a very bad position. We want to know exactly which version of the source code is being used as the basis for the package we are consuming so that we can track features – and bugs. As above, having a repeatable build is a great bonus here (there’s a small verification sketch after this list).
  3. responsiveness – this is a measure of how responsive – or not – the maintainer is to changes. Generally, we want stable features, tied to known versions, but a quick response to bug and (in particular) security patches. If the maintainer doesn’t accept patches in a timely manner, then we need to worry about the security of our package. We should also be asking questions like, “is there a well-defined security disclosure or vulnerability management process?” (See my article Security disclosure or vulnerability management?), and, if so, “is it followed?”
  4. provenance – all code is not created equal, and one of the things of which a maintainer should be keeping track is the provenance of contributors. If a large amount of code in a part of the package which provides particularly sensitive features is suddenly submitted by an unknown contributor with a pseudonymous email address and no history of contributions of security functionality, this should raise alarm bells. On the other hand, if there is a group of contributors employed by a company with a history of open source contributions and well-reviewed code who submit a large patch, this is probably less troublesome. This is a difficult issue to manage, and there are typically no definite “OK” or “no-go” signs, but the maintainer’s awareness and management of contributors and their contributions is an important point to consider.
  5. expertise – this is the most tricky. You may have a maintainer who is excellent at managing all of the points above, but is just not an expert in certain aspects of the functionality of the contributed code. As a consumer of the package, however, I need to be sure that it is fit for purpose, and that may include, in the case of the security-related package we’re considering, being assured that the correct cryptographic primitives are used, that bounds-checking is enforced on byte streams, that proper key lengths are used or that constant time implementations are provided for particular primitives. This is very, very hard, and the job of maintainer can easily become a full-time one if they are acting as the expert for a large and/or complex project. Indeed, best practice in such cases is to have a team of trusted, experienced experts who work either as co-maintainers or as a senior advisory group for the project. Alternatively, having external people or organisations (such as industry bodies) perform audits of the project at critical junctures – when a major release is due, or when an important vulnerability is patched, for instance – allows the maintainer to share this responsibility. It’s important to note that the project does not become magically “secure” just because it’s open source (see Disbelieving the many eyes hypothesis), but that the community, when it comes together, can significantly improve the assurance that consumers of a project can have in the packages which it produces.
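
As a deliberately basic example of the integrity point above (item 2), here’s a sketch of the simplest check a consumer can perform: comparing the digest of a downloaded artifact against the value the maintainer published. The package name and digest are hypothetical, and a bare hash only shows that the download matches what was published – it says nothing about the cleanliness of the build environment, which is why signed, repeatable builds matter too.

```python
# A minimal integrity check: does the artifact we downloaded match the digest
# the maintainer published? File name and expected digest are hypothetical.

import hashlib

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Ideally obtained over a separate, authenticated channel, and ideally signed.
EXPECTED_DIGEST = "placeholder-digest-published-by-maintainer"

artifact = "libexamplecrypto-1.2.3.tar.gz"   # hypothetical package name
actual = sha256_of(artifact)
if actual != EXPECTED_DIGEST:
    raise SystemExit(f"integrity check failed for {artifact}: got {actual}")
print(f"{artifact}: digest matches the published value")
```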

Once we consider these areas, we then need to work out how we measure and track each of them. Who is in a position to judge the extent to which any particular maintainer is fulfilling each of the areas? How much can we trust them? These are complex issues, about which much more needs to be written, but I am passionate about exposing the importance of explicit trust in computing, particularly in open source. There is work going on around open source supply chain management, for instance the new (at time of writing) Project Rekor – but there is lots of work still to be done.

Remember, though: when you take a package – whether library or executable – please consider what you’re consuming, what about it you can trust, and on what assurances that trust is founded.

Security is (only) subjective

What aspects of security does it provide?

This article covers ground covered in more detail within (but is not quite an excerpt from) my forthcoming book on Trust in Computing and the Cloud for Wiley.

In 1985, the US Department of Defense [sic] published the “Orange Book”[1], officially named Trusted Computer System Evaluation Criteria. It was a guide to how to create a “trusted system”, and was hugely influential within the IT and security industry as a whole. Eight years later, in 1993, Dorothy Denning published a devastating critique of the Orange Book called A New Paradigm for Trusted Systems[2]. It is a brilliant step-by-step analysis of why the approach taken by the DoD was fundamentally flawed. Denning starts:

“The current paradigm for trusted computer systems holds that trust is a property of a system. It is a property that can be formally modeled, specified, and verified. It can be designed into a system using a rigorous design methodology.”

Later, she explains why this just doesn’t work in the real world:

“The current paradigm of treating trust as a property is inconsistent with the way trust is actually established in the world. It is not a property, but rather an assessment that is based on experience and shared through networks of people in the world-wide market. It is a declaration made by an observer, rather than a property of the observed.”

Demolishing the idea that trust is an inherent property of a system, and making it relational instead, changed the way that systems designed for security would be considered (and ushered in a new approach by the US Government and associated organisations, known as Common Criteria). Denning was writing about trust, but very similar issues exist around the concept of “security”. Too often, security is considered an inherent or intrinsic property of a system: “it’s secure”, someone will say, or “this fix will secure your computer”. It isn’t, and it won’t.

The first problem with such statements is that it’s not clear what “secure” means. There are a number of properties associated with systems that are relevant to security: three of the ones most often quoted are confidentiality, integrity and availability (which I discuss in more detail in the post The Other CIA: Confidentiality, Integrity and Availability). Specifying which of these you’re interested in removes the temptation just to say that something is “secure”, and if someone says, “it provides security”, we’re now in a position to start asking what that assertion actually means. Which aspects of security does it provide?

I also don’t think it makes sense to say that a system is “confidential” or “available” (there’s no obvious equivalent adjective for integrity – “integral” means something rather different): what we may be able to say is that it exhibits properties associated with confidentiality, integrity and availability, or better, that it has measures associated with it which are designed and intended to provide confidentiality, integrity and availability. These measures can be listed, examined and evaluated, hopefully against well-defined criteria.
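
To make that less abstract, here’s a sketch of what listing and evaluating measures might look like – every measure and requirement below is invented for illustration:

```python
# A minimal sketch: describe a system by the measures it claims for each
# security-related property, then compare those claims against a particular
# consumer's requirements. All entries are illustrative.

MEASURES = {
    "confidentiality": [
        "TLS 1.3 on all external connections",
        "AES-256 encryption of data at rest",
    ],
    "integrity": [
        "signed releases, verified at install time",
        "append-only audit log",
    ],
    "availability": [
        "active/passive failover across two sites",
        "rate limiting on public endpoints",
    ],
}

def gaps(requirements: dict) -> dict:
    """Per property, which required measures does this system not claim?"""
    return {
        prop: [m for m in needed if m not in MEASURES.get(prop, [])]
        for prop, needed in requirements.items()
    }

# Different consumers bring different requirements - which is the point.
print(gaps({"confidentiality": ["AES-256 encryption of data at rest"]}))
print(gaps({"confidentiality": ["AES-256 encryption of data at rest",
                                "hardware-backed key storage"]}))
```

The claims are explicit, so they can be examined and compared against a particular consumer’s requirements, rather than the system simply being declared “secure”.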

This seems like a much better approach: not only have we addressed the suggestion that there is such a thing as “security” that we can apply to a system, but following Denning, we have also challenged the suggestion that it is inherent to – or in – a system. Instead, we have introduced the alternative approach of describing security-related properties which can be subjected to scrutiny by the users of the system. This allows the type of relational understanding of security that Denning was proposing, but it also raises the possibility of differing parties having different views of the security (or not) of a system, depending on who they are, and how it is going to be used.

It turns out, when you think about it, that this makes a lot of sense. A laptop which provides sufficient confidentiality, integrity and availability protection for the computing needs of my retired uncle may not provide sufficient protections for the uses to which an operative of a government security service might put it[3]. Equally, a system which a telecommunications company runs in a physically protected data centre may well be considered to have appropriate security protections, whereas the same system, attached to a pole somewhere on a residential street, might not. The measures applied to provide the protections associated with the properties (e.g. 128-bit AES encryption for confidentiality) may be objectively specifiable, but the extent to which they provide “security” is not, because they are relative to specific requirements.

One last point, and it’s one which regular readers of my blog will be unsurprised to see: how can you assess the applicability of a system’s security properties to your requirements if it is not open? Open source helps significantly with security. Yes, there are assessment regimes to say that systems meet certain criteria – and sometimes these can be very helpful – but they are generally broad criteria, and difficult to apply to your specific use cases. Equally, most are just a starting point, and many such certified systems will require “exceptions” to be met in order to function in the real world, exceptions which require significant expertise to understand, judge and apply safely (that is, with appropriate levels of risk). If the system you want to use is open, then you, a party who you trust, or the wider community can evaluate the appropriateness of controls and measures, and make an informed decision about whether a system’s security properties are what you need. Without open source, this is impossible.


1 – it had an orange cover.

2 – Denning, Dorothy E. (1993) A New Paradigm for Trusted Systems [online]. Available at: https://www.researchgate.net/publication/234793347_A_New_Paradigm_for_Trusted_Systems [Accessed 3 Apr. 2020]

3 – I’m assuming that my uncle isn’t an operative of a government security service[4].

4 – or at least that his security needs are reduced in retirement[5].

5 – that is, if he has really retired…

Taking some time

I’m going to practice what I preach, and not write.

I’m going to practice what I preach, this week, and not write a full article. I’ve had a stressful and busy few weeks, including needing to spend some extra time with the family (nothing scary or earth-shattering – we just needed some family time), and I think the best thing for me to do today is not spend time writing an article. Let me point you instead at some I’ve written in the past.

On self-care:

On security:

On trust:

Keep safe, and look after yourself, dear reader!

Formal verification … or Ken Thompson?

“You can’t trust code that you did not totally create yourself” – Ken Thompson.

This article is an edited excerpt from my forthcoming book on Trust in Computing and the Cloud for Wiley.

How can we be sure that the code we’re running does what we think it does? One of the answers – or partial answers – to that question is “formal verification.” Formal verification is an important field of study, applying mathematics to computing, and it aims to start with proofs – at best, with an equivalent level of assurance to that of formal mathematical proofs – of the correctness of algorithms to be implemented in code to ensure that they perform the operations expected and set forth in a set of requirements. Though implementation of code can often fall down in the actual instructions created by a developer or set of developers – the programming – mistakes are equally possible at the level of the design of the code to be implemented in the first place, and so this must be a minimum step before looking at any actual implementations. What is more, these types of mistakes can be all the harder to spot, as even if the developer has introduced no bugs in the work they have done, the implementation will be flawed by virtue of it being incorrectly defined in the first place. It is with an acknowledgement of this type of error, and an intention of reducing or eliminating it, that formal verification starts, but some areas go much further, with methods to examine concrete implementations and make statements about their correctness with regards to the algorithms which they are implementing.

Where we can make these work, they are extremely valuable, and the sort of places that they are applied are exactly where we would expect: for systems where security is paramount, and to prove the correctness of cryptographic designs and implementations. Another major focus of formal verification is software for safety systems, where the “correct” operation of the system – by which we mean “as designed and expected” – is vital. Examples might include oil refineries, fire suppression systems, nuclear power station management, aircraft flight systems and electrical grid management – unsurprisingly, given the composition of such systems, formal verification of hardware is also an important field of study. The practical application of formal verification methods to software is, however, more limited than we might like. As Alessandro Abate notes in a paper on formal verification of software:

“Two known shortcomings of standard techniques in formal verification are the limited capability to provide system-level assertions, and the scalability to large, complex models.”

To these shortcomings we can add another, extremely significant one: how sure can you be that what you are running is what you think you are running? Surely knowing what you are running is exactly why we write software, look at the source, and then compile it under our control? That, certainly, is the basic starting point for software that we care about.

The problem is arguably one of layers and dependencies, and was outlined by Ken Thompson, one of the founders of modern computing, in the lecture he gave at his acceptance of the Turing Award in 1983. It is short, stands as one of the establishing artefacts of computing security, and has weathered the tests of time: I have no hesitation in recommending that all readers of this blog read it: Reflections on Trusting Trust. In it, he describes how careful placing of malicious code in the C standard compiler could lead to vulnerabilities (his specific example is in account login code) which are not only undetectable by those without access to the source code, but also not removable. The final section of the paper is entitled “Moral”, and Thompson starts with these words:

“The moral is obvious. You can’t trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code.”

However, as he goes on to point out, there is nothing special about the compiler:

“I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect.”
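
To give a flavour of the construction Thompson describes – and only a flavour, since this toy skips the self-reproducing trick that makes the real attack invisible in source – here is an illustrative Python sketch in which “compiling” simply returns source text:

```python
# A deliberately simplified toy showing the *shape* of Thompson's attack.
# Here "compiling" just returns the source text; the real attack lives in a
# binary compiler and reproduces itself, which this sketch does not attempt.

BACKDOOR = 'if password == "letmein": return True  # injected'

def evil_compile(source: str, program_name: str) -> str:
    if program_name == "login":
        # Trojan 1: every compiled login program accepts a master password.
        source = source.replace("# check password here", BACKDOOR)
    if program_name == "compiler":
        # Trojan 2: a recompiled compiler would re-insert both trojans, so
        # removing them from the compiler's *source* is not enough.
        source += "\n# (would re-insert trojans 1 and 2 here)"
    return source

login_source = """def login(user, password):
    # check password here
    return check_database(user, password)
"""

print(evil_compile(login_source, "login"))
```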

It is for the reasons noted by Thompson that open source software – and hardware – is so vital to the field of computer security, and to our task of defining and understanding what “trust” means in the context of computing. Just relying on the “open source-ness” of your code is not enough: there is more work to be done in understanding your stack, the community and your requirements, but without the ability to look at the source code of all the layers of software and hardware on which you are running code, you can have only reduced trust that what you are running is what you think you should be running, whether you have performed formal verification on it or not.


Why Enarx is open

It’s not just our coding that we do in the open.

When Nathaniel McCallum and I embarked on the project which is now called Enarx, we made one decision right at the beginning: the code for Enarx would be open source, a stance fully supported by our employer Red Hat (see standard disclaimer). All of it, and for ever. That’s a decision that we’ve not regretted at any point, and it’s something we stand behind. As soon as we had enough code for a demo, and were ready to show it, we created a repository on GitHub and made it public. There’s a very small exception, which is that there are some details of upcoming chip features that are shared with us under NDA[1], where, if we write code for them, publishing that code would be a breach of the NDA. But where this applies (which is rarely), we are absolutely clear with the various vendors that we intend to make the code open as soon as possible, and lobby them to release details as early as they can (which may be earlier than they might prefer), so that more experts can look over both their designs and our code.

Auditability and trust

This brings us to possibly the most important reasons for making Enarx open source: auditability and trust. Enarx is a security-related project, and I believe passionately not only that security should be done in the open, but that if anybody is actually going to trust their sensitive data, algorithms and workloads to a piece of software, then they want to be in a position where as many experts as possible have looked at it, scrutinised it, criticised it and improved it: whether that is the people running the software, their employees, contractors or (even better) the wider security community. The more people who check the code, the happier you should be to trust it. This is important for any piece of security software, but vital for software such as Enarx which is designed to protect your most sensitive workloads.

Bug-catching

There are bugs in Enarx. I know: I’m writing some of the code[2] and I found one yesterday (which I’d put in), just as I was about to give a demo[3]. It is very, very difficult to write perfect code, and we know that if we make our source open, then more people can help us fix issues.

Commonwealth

For Nathaniel and me, open source is an ethical issue, and we make no apologies for that. I think it’s the same for most, if not all, of the team working on Enarx. This includes a number of Red Hat employees (see standard disclaimer), so shouldn’t come as a surprise, but we have non-Red Hat contributors from a number of backgrounds, and we feel that Enarx should be a Common Good, and contribute to the commonwealth of intellectual property out there.

More brain power

Making something open source doesn’t just make it easier to fix bugs: it can improve the quality of what you produce in general. The more brain power you have to apply to the problem, the better your chances of making something great – assuming that the brain power is applied efficiently (not always an easy task!). We had a design meeting yesterday where one of the participants said towards the end, “I’m sure I could implement some of this, but don’t know a huge amount about this topic, and I’m worried that I’m not contributing to this discussion.” In fact, they had, by asking questions and clarifying some points, and we assured them that we wanted to include experienced, senior developers for their expertise and knowledge, and to pull out assumptions and to validate the design, and not because we expected everybody to be experts in all parts of the project. Having bright people around, involved in design and coding, spreads expertise and knowledge, and helps keep the work from becoming an insulated, isolated “ivory tower” construction, understood by few, and almost impossible to validate.

Not just code

It’s not just our coding that we do in the open. We manage our architecture in the open, our design meetings, our protocol design, our design methodology[4], our documentation, our bug-tracking, our chat, our CI/CD processes: all of it is open. The one exception is our vulnerability management process, which needs to have the opportunity for confidential exposure for a limited time.

We also take diversity seriously, and the project contributors are subject to the Contributor Covenant Code of Conduct.

In short, Enarx is an open project. I’m sure we could do better, and we’ll strive for that, but our underlying principles are that open is good in general, and vital for security. If you agree, please come and visit!


1 – Non-Disclosure Agreement.

2 – to the surprise of many of the team, including myself. At least it’s not in Perl.

3 – I fixed it. Admittedly after the demo.

4 – we’ve just moved to a Sprint pattern – the details of which we designed and agreed in the open.

They won’t get security right

Save users from themselves: make it difficult to do the wrong thing.

I’m currently writing a book on Trust in Computing and the Cloud – I’ve mentioned it before – and I confidently expect to reach 50% of my projected word count today, as I’m on holiday, have more time to write it, and got within about 850 words of the goal yesterday. Little boast aside, one of the topics that I’ve been writing about is the need to consider the contexts in which the systems you design and implement will be used.

When we design systems, there’s a temptation – a laudable one, in many cases – to provide all of the features and functionality that anyone could want, to implement all of the requests from customers, to accept every enhancement request that comes in from the community. Why is this? Well, for a variety of reasons, including:

  • we want our project or product to be useful to as many people as possible;
  • we want our project or product to match the capabilities of another competing one;
  • we want to help other people and be seen as responsive;
  • it’s more interesting implementing new features than marking an existing set complete, and settling down to bug fixing and technical debt management.

These are all good – or at least understandable – reasons, but I want to argue that there are times that you absolutely should not give in to this temptation: that, in fact, on every occasion that you consider adding a new feature or functionality, you should step back and think very hard about whether your product would be better if you rejected it.

Don’t improve your product

This seems, on the face of it, to be insane advice, but bear with me. One of the reasons is that many techies (myself included) are more interested in getting code out of the door than weighing up alternative implementation options. Another reason is that every opportunity to add a new feature is also an opportunity to deal with technical debt or improve the documentation and architectural information about your project. But the other reasons are to do with security.

Reason 1 – attack surface

Every time that you add a feature, a new function, a parameter on an interface or an option on the command line, you increase the attack surface of your code. Whether by fuzzing, targeted probing or careful analysis of your design, the larger the attack surface of your code, the more opportunities there are for attackers to find vulnerabilities, create exploits and mount attacks on instances of your code. Strange as it may seem, adding options and features for your customers and users can often be doing them a disservice: making them more vulnerable to attacks than they would have been if you had left well enough alone.

If we do not need an all-powerful administrator account after initial installation, then it makes sense to delete it after it has done its job. If logging of all transactions might yield information of use to an attacker, then it should be disabled in production. If older versions of cryptographic functions might lead to protocol attacks, then it is better to compile them out than just to turn them off in a configuration file. All of these lead to reductions in attack surface which ultimately help safeguard users and customers.
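
Here’s a sketch of the difference between the last two approaches – a capability switched off in configuration versus one that was never shipped at all. The feature names are invented:

```python
# A minimal sketch: a capability that is merely disabled is still present in
# the attack surface; one that was never shipped is not. Names are invented.

FEATURES_BUILT_IN = {"tls13_handshake"}   # what this build actually contains
FEATURES_ENABLED = {"tls13_handshake"}    # what configuration switches on

def negotiate(requested: str) -> str:
    if requested not in FEATURES_BUILT_IN:
        # "Compiled out": no code path exists, nothing to re-enable or abuse.
        return "unsupported"
    if requested not in FEATURES_ENABLED:
        # Merely disabled: one configuration change (or downgrade trick) away.
        return "disabled"
    return "negotiated"

print(negotiate("sslv3_handshake"))   # unsupported - reduced attack surface
print(negotiate("tls13_handshake"))   # negotiated
```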

Reason 2 – saving users from themselves

The other reason is also about helping your users and customers: saving them from themselves. There is a dictum – somewhat unfair – within computing that “users are stupid”. While this is overstating the case somewhat, it is fairer to note that Murphy’s Law holds in computing as it does everywhere else: “Anything that can go wrong, will go wrong”. Specific to our point, some user somewhere can be counted upon to use the system that you are designing, implementing or operating in ways which are at odds with your intentions. IT security experts, in particular, know that we cannot stop people doing the wrong thing, but where there are opportunities to make it difficult to do the wrong thing, then we should embrace them.

Not adding features, disabling capabilities and restricting how your product is used might seem counter-intuitive, but if it leads to a safer user experience and fewer vulnerabilities associated with your product or project, then in the end, everyone benefits. And you can use the time to go and write some documentation. Or go to the beach. Enjoy!

What’s a vulnerability?

Often, it’s the terms which seem familiar that are most confusing

I was discussing a document with some colleagues recently, and wanted to point them at an article I’d written with some definitions of terms like “vulnerability”, “exploit” and “attack”, only to get somewhat annoyed when I discovered that I’d never written it. So this week’s post attempts to remedy that, though I’m going to address them in reverse order, and I’m going to add an extra one: “mitigation”.

The world of IT security is full of lots of terms, some familiar, some less so. Often, it’s the terms which seem familiar that are most confusing, because they may mean something other than what you think. The three that I think it’s important to define here are the ones I noted above (and mitigation, because it’s often used in the same contexts). Before we do that, let’s just define quickly what we’re talking about when we mean security.

CIA

We often talk about three characteristics of a system that we want to protect or maintain in order to safeguard its security: C, I and A.

  • “C” stands for confidentiality. Unauthorised entities should not be able to access information or processes.
  • “I” stands for integrity. Unauthorised entities should not be able to change information or processes.
  • “A” stands for availability. Unauthorised entities should not be able to impact the ability of authorised entities to access information or processes.

I’ve gone into more detail in my article The Other CIA: Confidentiality, Integrity and Availability, but that should keep us going for now. I should also say that there are various other definitions around relevant characteristics that we might want to protect, but CIA gives us a good start.

Mitigation

A mitigation[3] in this context is a technique to reduce the impact of an attack on whichever of C, I or A is (or are) affected.

Attack

An attack[1] is an action or set of actions which affects the C, the I or the A of a system. It’s the breaking of the protections applied to safeguard confidentiality, integrity or availability. An attack is possible through the use of an exploit.

Exploit

An exploit is a mechanism to affect the C, the I or the A of a system, allowing an attack. It is a technique, or set of techniques, employed in an attack, and is possible through the exploiting[2] of a vulnerability.

Vulnerability

A vulnerability is a flaw in a system which allows the breaking of protections applied to the C, the I or the A of a system. It allows attacks, which take place via exploits.

One thing that we should note is that a vulnerability is not necessarily a flaw in software: it may be in hardware or firmware, or be exposed as an emergent characteristic of a system, in which case it’s a design problem. Equally, it may expose a protocol design issue, so even if the software (+ hardware, etc.) is a correct implementation of the protocol, a security vulnerability exists. This is, in fact, very common, particularly where cryptography is involved: cryptography is hard to get right.
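
As one way of fixing the relationships between these terms, here’s a small illustrative sketch – the example vulnerability, exploit and mitigation are invented:

```python
# A minimal sketch of how the terms relate: a vulnerability enables an exploit,
# an exploit enables an attack on C, I and/or A, and a mitigation reduces the
# attack's impact. All the concrete details are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class Vulnerability:
    description: str     # e.g. a missing bounds check
    located_in: str      # software, hardware, firmware, protocol or design

@dataclass
class Exploit:
    uses: Vulnerability
    technique: str

@dataclass
class Attack:
    via: Exploit
    affects: set = field(default_factory=set)   # subset of {"C", "I", "A"}

@dataclass
class Mitigation:
    reduces: Attack
    measure: str

vuln = Vulnerability("missing bounds check in packet parser", "software")
expl = Exploit(uses=vuln, technique="oversized packet overwrites return address")
attack = Attack(via=expl, affects={"C", "I"})
mitigation = Mitigation(reduces=attack, measure="memory-safe parser rewrite")
print(sorted(attack.affects))   # ['C', 'I']
```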


1 – a successful one, anyway.

2 – hence the name.

3 – I’ve written more about mitigation in Mitigate or remediate?.

Enarx news: WebAssembly + SGX

We can now run a WebAssembly workload in a TEE

Regular readers will know that I’m one of the co-founders of the Enarx project, and write about our progress fairly frequently. You’ll find some more information in these articles (newest first):

I won’t spend time going over what the project is about here, as you can read more in the articles above, and lots more over at our GitHub repository (https://enarx.io), but the aim of the project is to allow you to run sensitive workloads in Trusted Execution Environments (TEEs) from various silicon vendors. The runtime we’ll be providing is WebAssembly (using the WASI interface), and the TEEs that we’re targeting to start with are AMD’s SEV and Intel’s SGX – though you, the client running the workload, shouldn’t care which is being used at any particular point, as we abstract all of that away.

It’s a complicated system, with lots of moving parts and a fairly complex full-stack architecture. Things have been moving really fast recently, partly due to a number of new contributors to the project (which recently reached 150 stars on our main GitHub repository), and one of the important developments is the production of a new component architecture, included below.

I’ve written before about how important it is to have architectural diagrams for projects (and particularly open source projects), and so I’m really pleased that we have this updated – and much more detailed – version of the diagram. I won’t go into it in detail here, but the only trusted components (from the point of view of Enarx and a client using Enarx) are those in green: the Enarx client agent, the Attestation measurement database and the TEE instance containing the Application, Wasm, Shim, etc., which we call the Keep.

It’s only just over a couple of months since we announced our most recent milestone: running binaries within TEEs, including both SEV and SGX. This was a huge deal for us, but then, at the end of last week, one of the team, Daiki Ueno, managed to get a WebAssembly binary loaded using the Wasm/WASI layers and to run it. In other words, we can now run a WebAssembly workload in a TEE. Over the past 10 minutes, I’ve just managed to replicate that on my own machine (for my own interest, not that I doubted the results!). Circled, below, are the components which are involved in this feat.

I should make it clear that none of these components is complete or stable – they are all still very much in development, and, in fact, even the running test yielded around 40 lines of error messages! – but the fact that we’ve got a significant part of the stack working together is an enormous deal. The Keep loader creates an SGX TEE instance, loads the Shim and the Wasm/WASI components into the Keep and starts them running. The App Loader (one of the Wasm components) loads the Application, and it runs.

It’s not just an enormous deal due to the amount of work that it’s taken and the great teamwork that’s been evident throughout – though those are true as well – but also because it means we’re getting much closer to having something that people can try out for themselves. The steps to just replicating the test were lengthy and complex, and my next step is to see if I can make some changes and create my own test, which may be quite a marathon, but I’d like to stress that my technical expertise lies closer to the application layer than the low-level pieces, which means that if I can make this work now (with a fair amount of guidance), then we shouldn’t be too far off being in a position where pretty much anyone with a decent knowledge of the various pieces can do the same.

We’re still quite a way off being able to provide a simple file with 5-10 steps to creating and running your own workload in a Keep, but that milestone suddenly feels like it’s in sight. In the meantime, there’s loads of design work, documentation, infrastructure automation, testing and good old development to be done: fancy joining us?

In praise of triage

It’s all too easy to prioritise based on the “golfing test”.

Not all bugs are created equal.

Some bugs need fixing now, some bugs can wait. Some bugs are in your implementation, some are in the underlying design. Some bugs will annoy a few customers, some will destroy your business.

Bugs come in all shapes and sizes, and one of the tasks of a product owner, product manager, chief architect – whoever makes the call about where to assign resources – is to decide which ones to address in which order: to prioritise them. The problem is deciding how to prioritise them. It’s all too easy to prioritise based on the “golfing test”: your CEO meets someone on the golf course who mentions that his or her company loves your product, except for one tiny issue. The CEO comes back, and makes it clear that fixing this “major bug” is now your one and only task until it’s done, and your world is turned upside down. You have to fix the bug as quickly as possible, with no thought to the impact it has on the rest of the project, or the immense pile of technical debt that’s just been accrued. You don’t want to live in this world. What, then, is the alternative?

The answer – though it’s only the beginning of the answer – is triage. Triage (from the French for “separating out”) comes from the world of battlefield medicine. When deciding which wounded soldiers to treat, rapid (hopefully objective) assessments are carried out, allowing a quick sorting of each soldier, typically into categories such as “not urgent: wait”, “urgent: treat immediately” and “not saveable: do not treat”. We can apply the same to software bugs in order to decide what to treat (fix) and with what priority. The important thing is not so much the categories – which will vary based on your context – but the assessment criteria, and how they are applied. Here is a list of just some of the possible criteria:

  • likely monetary impact per customer
  • number of customers impacted
  • reputational impact on your organisation
  • ease to fix
  • impact on system security
  • impact on system performance
  • impact on system stability
  • annoyance of the CEO at not being listened to.

We do not, of course, only need to apply one of these: a number of them can be combined with a weighting system, though the more you add, the less clear your priorities will be, and the more likely it is that someone will “put a finger on the scales” – tweak the numbers to give the outcome they want. Another important point about the categories that you decide to apply is that they should be as measurable as you can make them, to allow scoring to be as objective as possible. I wrote a review of the book Building Evolutionary Architectures a while ago: the methodology adopted there, where you measure and test in order to meet specific criteria, is exactly the sort of approach you should be choosing when designing your triage system.
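
Here’s a sketch of what such a weighted scoring might look like – the criteria, weights, 0–10 scales and example bugs are entirely illustrative:

```python
# A minimal sketch of a weighted triage score. Criteria, weights, 0-10 scales
# and the example bugs are all illustrative - the point is that the inputs are
# explicit, measurable where possible, and recorded.

WEIGHTS = {
    "monetary_impact_per_customer": 0.25,
    "customers_impacted": 0.25,
    "reputational_impact": 0.15,
    "security_impact": 0.20,
    "stability_impact": 0.10,
    "ease_to_fix": 0.05,   # cheap fixes get a small boost
}

def triage_score(bug: dict) -> float:
    """Each criterion is pre-scored 0-10; a higher total means fix it sooner."""
    return sum(weight * bug.get(criterion, 0) for criterion, weight in WEIGHTS.items())

BUGS = {
    "golf-course complaint": {
        "monetary_impact_per_customer": 2, "customers_impacted": 1,
        "reputational_impact": 3, "ease_to_fix": 8,
    },
    "authentication bypass in API": {
        "monetary_impact_per_customer": 7, "customers_impacted": 9,
        "reputational_impact": 9, "security_impact": 10,
    },
}

for name, bug in sorted(BUGS.items(), key=lambda item: triage_score(item[1]), reverse=True):
    print(f"{triage_score(bug):5.2f}  {name}")
```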

This is (ostensibly) a blog about security, and so you might expect me to say that “security always wins”, but that should absolutely not be the approach you take. Security might be the most important category for you (that is, carry the most weight), but you need to understand why that is the case – at this particular time – and what exactly you mean by “security”. The “security of the system” is not an objective measure: in order to mean anything, such a phrase needs to reference measurements that can be made (“resistance to physical tampering”, “resistance to brute force attacks”, “number of PhD students likely to be needed to reverse engineer our ‘secure’ protocol”[1]). More importantly, it may be that at this point in your organisation’s life, the damage done by lack of stability or decreased performance outweighs the impact of a security bug. If that’s the case, then your measurements should encapsulate that information and lead you to prioritise bugs with impact in these categories over security issues[3].

There’s one proviso that I feel I need to put in at this point, and it’s about the power of what, in Agile Methodology terms, is called the Product Owner. This is the person who represents the users of the product/project, and should have final say about the direction of development in terms of features, functionality and, most relevant in this discussion, bug-fixing. As noted above, this may be an architect, product manager or someone enjoying another title, but their role should be clear: they get to call the shots. There are times when this person goes against the evidence provided by the triage, and makes a decision to prioritise a particular bug over others despite the outcome of the measurements. This is typically very painful for the technical team[4], but, when it comes down to it, as the product owner, they get to decide. The technical team – after appropriate warnings and discussion[5] – must be ready to step aside and accept the decision. Such decisions (and related discussions) should be recorded, and the product owner must be ready to stand or fall based on the outcome, but that is their job. Triage is a guide, and there are occasions when there are measurements which cannot be easily made objectively, and which sit outside the expertise or scope of knowledge of the technical team. If this sort of decision keeps being made, and you think you know better, you may have a future in technical product management, where people with a view of both the technical and the business side of technology are much in demand. In the end, though, the product owner will need to justify their decision to management, and if they get it wrong, then they must be ready to take the blame (this is one reason why you should make sure that you’ve recorded the process taken to get to this decision – you don’t want to take the blame for a poor decision which you advised against).

So: go out and design a triage process, be ready to follow it, and be ready to defend it. Oh, and one last point: you might want to buy a set of golf clubs.

—–

1 – this last one is a joke: don’t design your own protocol, or if you do, make it open and have it peer-reviewed[2].

2 – and then throw it away and use an open source implementation of a better, more thoroughly-reviewed one.

3 – much as it pains me to say it.

4 – I’ve been on both sides of these decisions: I know.

5 – often rather heated, in my experience.