Why I’m writing a book about trust

Who spends their holiday in their office? Authors.

Last week, I was on holiday. That is, I took 5 days off work in which I could have relaxed, read novels, watched TV, played lots of games, gone for long walks on the countryside and (in happier, less Covid-19 times) have spend time away from home, wrapped up warm enjoying a view of waves crashing onto a beach. Instead, I spent those 5 days – and fair amount of the 4 weekend days that bracketed them – squirreled away in my office, sitting at my computer. Which is rather similar to what I would have been doing if I hadn’t taken the time off.

The reason I did this is that I’m writing a book at the moment. Last week I managed to write well over 12,500 words of it, taking me to more than 80% of my projected word count (over 82% if you could bibliographic material), so the time felt well-spent. But why am I doing this in the first place?

I’m doing it at one level because I have a contract to do it. Around July-August last year (2019), I sent emails to a few (3-4) publishers pitching the idea for a book. I included a detailed Table of Contents, evidence that I’ve written before (including some links to this blog), and a bit about me, including a link to my LinkedIn profile. One of the replies I had (within 24 hours, to my amazement) was from Jim Minatel, a commissioning editor from Wiley Technology. Over a number of weeks, I talked to him and editors from other publishers, finally signing a contract with Wiley for a book on Trust in Computing and the Cloud, with a planned word count of 125,000 words. I’m not going to provide details of the contract, but I can say that :

  1. I don’t expect it to make me rich (which was never the point anyway);
  2. it has options for another book or books (I must be insane even to be considering the idea, but Jim and I have already have some preliminary conversations);
  3. there are clauses in it about film/movie rights (nobody, but nobody, is ever going to want to make or watch a film about this book: fascinating as it may be, Tom Clancy or John Grisham it is not).

This kind of explains why I spent last week closeted in my office, tapping away on a computer keyboard, but why did I get in touch with these publishers in the first place, pitching the idea of a book that is consuming a fair number of my non-work waking hours?

The basic reason is that I got cross. In fact, I got so annoyed about something that I went to a couple of people – my boss was one, a good friend with a publishing background was another – and announced that I planned to write a book. I was half hoping that they would dissuade me, but they both enthusiastically endorsed the idea, which meant that I had, at least in my head, now committed to doing it. This was in early May 2019, and it took me a couple of months to gather my thoughts, put together some materials and a find few candidate publishers before actually pitching the book to them and ending up with a contract.

But what actually got me to a position where I was cross enough about something to pitch an idea which would take up so much of my time and energy? The answer? I was tipped over the edge by hearing someone speak about trust in a way that made it clear that they had no idea what they were talking about. It was a session at a conference – I can remember the conference, but I can’t remember the session or the speaker – where the subject was security: IT security. This is my field, so that’s good. What wasn’t good was what the speaker said when speaking about trust: it didn’t hold together, it wasn’t consistent with how I felt about trust, and I didn’t feel that it was broadly applicable to computing or the Cloud.

This made me cross. And then I thought: why is there no consensus on what we mean by “trust”? Why do people talk about “zero trust” without really knowing what trust actually means? Why is there no literature on this subject that I’ve been thinking about for nearly 20 years? Why do I keep having conversations with people where they agree that trust is really interesting, but we discover that we don’t have a common starting point as there’s no theoretical underpinning describing exactly we’re discussing? Why are people deploying important workloads and designing business-critical systems without a good framework around what seems like a fundamental concept that everybody is always eager to include in their architectures? Why are we pushing forward with Confidential Computing when people don’t even understand the impact of the trust relationships which underlie it?

And I realised that I could keep asking these questions, and keep having these increasingly frustrating conversations, whilst waiting for somebody to publish some sort of definitive meisterwerk on the subject, or I could just admit that no-one was going to do it, and that I might get on and write something, even if it was never going to be the perfect treatment of what is, after all, a very complex subject.

And so that’s what I decided to do. I thought about what interests me in the field of trust in the realms of computing and the Cloud, about what I’ve heard people talk very badly about, and about what I’d had interesting conversations, decided that I might have something to say about all of those, and then put together a book structure. Wiley liked the idea, and asked me to flesh it out and then write something. It turns out that there’s loads of literature around human-to-human and human-to-organisational trust, and also on human-to-computer system trust, but very little on how computers can trust each other. Given how many organisations run much of their business in the Cloud these days, and the complex trust relationships that exist there, I wanted to write something about how to manage and understand these.

These are topics I’ve thought about (and, increasingly, written about) for around 20 years, since I did some research into the possibility of a PhD (which never materialised) in a related topic. They’ve stayed with me since, and I was involved in some theoretical and standards-based work around trust while involved with the ETSI NFV group nearly 10 years ago. I’m not pretending that I’m perfectly qualified to discuss this topic, but then again, I’m not sure that anybody is, and I feel that putting out some sort of book on this topic makes sense, if only to get the conversation started, and to give people an opportunity to converse with a shared language. The book starts with some theoretical underpinnings, looks at some of the technologies, what their implications are, the place of open source, the commercial and organisational impacts, and then suggests some future and frameworks. I hope to have the manuscript (well, typescript) completed and with Wiley by mid-spring (Northern Hemisphere) 2021: I don’t know when it’s actually likely to appear in print.

I hope people find it interesting, and that it acts as a catalyst for further discussion. I don’t expect it to be the last word on the subject – in fact I hope it’s not – but I do hope that it forces more people to realise that trust is really important in our world of computers, security and risk, and currently ill-understood. And if you happen to be a successful producer of Hollywood blockbusters, then I’m available to talk. Just as soon as I get these last couple of chapters submitted. ..

How open source builds distributed trust

Trust in open source is a positive feedback loop

This is an edited excerpt from my forthcoming book on Trust in Computing and the Cloud for Wiley, and leads on from a previous article I wrote called Trust & choosing open source.

In that article, I asked the question: what are we doing when we say “I trust open source software”? In reply, I suggested that what we are doing is making a determination that enough of the people who have written and tested it have similar requirements to mine, and that their expertise, combined, is such that the risk to my using the software is acceptable. I also introduced the idea of distributed trust.

The concept of distributing trust across a community is an application of the theory of the wisdom of the crowd, posited by Aristotle, where the assumption is that the opinions of many typically show more wisdom than the opinion of one, or a few. While demonstrably false in its simplest form in some situations – the most obvious example being examples of popular support for totalitarian regimes – this principle can provide a very effective mechanism for establishing certain information.

This distillation of collective experience allows what we refer as distributed trust, and is collected through numerous mechanisms on the Internet. Some, like TripAdvisor or Glassdoor, record information about organisations or the services they provide, while others, like UrbanSitter or LinkedIn, allow users to add information about specific people (see, for instance, LinkedIn’s “Recommendations” and “Skills & Endorsements” sections in individual’s profiles). The benefits that can accrue from of these examples are significantly increased by the network effect, as the number of possible connections between members increases exponentially as the number of members increases. Other examples of distributed trust include platforms like Twitter, where the number of followers that an account receives can be seen as a measure of their reputation, and even of their trustworthiness, a calculation which we should view with a strong degree of scepticism: indeed, the company Twitter felt that it had to address the social power of accounts with large numbers of followers and instituted a “verified accounts” mechanism to let people know “that an account of public interest is authentic”. Interestingly, the company had to suspend the service after problems related to users’ expectations of exactly what “verified” meant or implied: a classic case of differing understandings of context between different groups.

Where is the relevance to open source, then? The community aspect of open source is actually a driver towards building distributed trust. This is because once you become a part of the community around an open source project, you assume one or more of the roles that you start trusting once you say that you “trust” an open source project (see my previous article): examples include: architect, designer, developer, reviewer, technical writer, tester, deployer, bug reporter or bug fixer. The more involvement you have in a project, the more one becomes part of the community, which can, in time, become a community of practice. Jean Lave and Etienne Wenger introduced the concept of communities of practice in their book Situated Learning: Legitimate Peripheral Participation, where groups evolve into communities as their members share a passion and participate in shared activities, leading to their improving their skills and knowledge together. The core concept here is that as participants learn around a community of practice, they become members of it at the same time:

“Legitimate peripheral participation refers both to the development of knowledgeably skilled identities in practice and to the reproduction and transformation of communities of practice.”

Wenger further explored the concept of communities of practice, how they form, requirements for their health and how they encourage learning in Communities of Practice: Learning, meaning and identity. He identified negotiability of meaning (“why are we working together, what are we trying to achieve?”) as core to a community of practice, and noted that without engagement, imagination and alignment by individuals, communities of practice will not be robust.

We can align this with our views of how distributed trust is established and built: when you realise that your impact on open source can be equal to that of others, the distributed trust relationships that you hold to members of a community become less transitive (second- or third-hand or even more remote), and more immediate. You understand that the impact that you can have on the creation, maintenance, requirements and quality of the software which you are running can be the same as all of those other, previously anonymous contributors with whom you are now forming a community of practice, or whose existing community of practice you are joining. Then you yourself become part of a network of trust relationships which are distributed, but at less of a remove to that which you experience when buying and operating proprietary software. The process does not stop there, however: a common property of open source projects is cross-pollination, where developers from one project also work on others. This increases as the network effect of multiple open source projects allow reuse and dependencies on other projects to rise, and greater take-up across the entire set of projects.

It is easy to see why many open source contributors become open source enthusiasts or evangelists not just for a single project, but for open source as a whole. In fact, work by Mark Granovetter suggests that too many strong ties within communities can lead to cliques and stagnation, but weak ties provide for movement of ideas and trends around communities. This awareness of other projects and the communities that exist around them, and the flexibility of ideas across projects, leads to distributed trust being able to be extended (albeit with weaker assurances) beyond the direct or short-chain indirect relationships that contributors experience within projects of which they have immediate experience and out towards other projects where external observation or peripheral involvement shows that similar relationships exist between contributors. Put simply, the act of being involved in an open source project and building trust relationships through participation leads to stronger distributed trust towards similar open source projects, or just to other projects which are similarly open source.

What does this mean for each of us? It means that the more we get involved in open source, the more trust we can have in open source, as there will be a corresponding growth in the involvement – and therefore trust – of other people in open source. Trust in open source isn’t just a network effect: it’s a positive feedback loop!

Do I trust this package?

The area of software supply chain management is growing in importance.

This isn’t one of those police dramas where a suspect parcel arrives at the precinct and someone realises just in time that it may be a bomb – what we’re talking about here is open source software packages (though the impact on your application may be similar if you’re not sufficiently suspicious). Open source software is everywhere these days – which is great – but how can you be sure that you should trust the software you’ve downloaded to do what you want? The area of software supply chain management – of which this discussion forms a part – is fairly newly visible in the industry, but is growing in importance. We’re going to consider a particular example.

There’s a huge conversation to be had here about what trust means (see my article “What is trust?” as a starting point, and I have a forthcoming book on Trust in Computing and the Cloud for Wiley), but let’s assume that you have a need for a library which provides some cryptographic protocol implementation. What do you need to know, and what are you choices? We’ll assume, for now, that you’ve already made what is almost certainly the right choice, and gone with an open source implementation (see many of my articles on this blog for why open source is just best for security), and that you don’t want to be building everything from source all the time: you need something stable and maintained. What should be your source of a new package?

Option 1 – use a vendor

There are many vendors out there now who provide open source software through a variety of mechanisms – typically subscription. Red Hat, my employer (see standard disclosure) is one of them. In this case, the vendor will typically stand behind the fitness for use of a particular package, provide patches, etc.. This is your easiest and best choice in many cases. There may be times, however, when you want to use a package which is not provided by a vendor, or not packaged by your vendor of choice: what do you do then? Equally, what decisions do vendors need to make about how to trust a package?

Option 2 – delve deeper

This is where things get complex. So complex, in fact, that I’m going to be examining them at some length in my book. For the purposes of this article, though, I’ll try to be brief. We’ll start with the assumption that there is a single maintainer of the package, and multiple contributors. The contributors provide code (and tests and documentation, etc.) to the project, and the maintainer provides builds – binaries/libraries – for you to consume, rather than your taking the source code and compiling it yourself (which is actually what a vendor is likely to do, though they still need to consider most of the points below). This is a library to provide cryptographic capabilities, so it’s fairly safe to assume that we care about its security. There are at least five specific areas which we need to consider in detail, all of them relying on the maintainer to a large degree (I’ve used the example of security here, though very similar considerations exist for almost any package): let’s look at the issues.

  1. build – how is the package that you are consuming created? Is the build process performed on a “clean” (that is, non-compromised) machine, with the appropriate compilers and libraries (there’s a turtles problem here!)? If the binary is created with untrusted tools, then how can we trust it at all, so what measures does the maintainer take to ensure the “cleanness” of the build environment? It would be great if the build process is documented as a repeatable build, so that those who want to check it can do so.
  2. integrity – this is related to build, in that we want to be sure that the source code inputs to the build process – the code coming, for instance, from a git repository – are what we expect. If, somehow, compromised code is injected into the build process, then we are in a very bad position. We want to know exactly which version of the source code is being used as the basis for the package we are consuming so that we can track features – and bugs. As above, having a repeatable build is a great bonus here.
  3. responsiveness – this is a measure of how responsive – or not – the maintainer is to changes. Generally, we want stable features, tied to known versions, but a quick response to bug and (in particular) security patches. If the maintainer doesn’t accept patches in a timely manner, then we need to worry about the security of our package. We should also be asking questions like, “is there a well-defined security disclosure of vulnerability management process?” (See my article Security disclosure or vulnerability management?), and, if so, “is it followed”?
  4. provenance – all code is not created equal, and one of the things of which a maintainer should be keeping track is the provenance of contributors. If a large amount of code in a part of the package which provides particularly sensitive features is suddenly submitted by an unknown contributor with a pseudonymous email address and no history of contributions of security functionality, this should raise alarm bells. On the other hand, if there is a group of contributors employed by a company with a history of open source contributions and well-reviewed code who submit a large patch, this is probably less troublesome. This is a difficult issue to manage, and there are typically no definite “OK” or “no-go” signs, but the maintainer’s awareness and management of contributors and their contributions is an important point to consider.
  5. expertise – this is the most tricky. You may have a maintainer who is excellent at managing all of the points above, but is just not an expert in certain aspects of the functionality of the contributed code. As a consumer of the package, however, I need to be sure that it is fit for purpose, and that may include, in the case of the security-related package we’re considering, being assured that the correct cryptographic primitives are used, that bounds-checking is enforced on byte streams, that proper key lengths are used or that constant time implementations are provided for particular primitives. This is very, very hard, and the job of maintainer can easily become a full-time one if they are acting as the expert for a large and/or complex project. Indeed, best practice in such cases is to have a team of trusted, experienced experts who work either as co-maintainers or as a senior advisory group for the project. Alternatively, having external people or organisations (such as industry bodies) perform audits of the project at critical junctures – when a major release is due, or when an important vulnerability is patched, for instance – allows the maintainer to share this responsibility. It’s important to note that the project does not become magically “secure” just because it’s open source (see Disbelieving the many eyes hypothesis), but that the community, when it comes together, can significantly improve the assurance that consumers of a project can have in the packages which is produces.

Once we consider these areas, we then need to work out how we measure and track each of them. Who is in a position to judge the extent to which any particular maintainer is fulfilling each of the areas? How much can we trust them? These are a set of complex issues, and one about which much more needs to be written, but I am passionate about exposing the importance of explicit trust in computing, particularly in open source. There is work going on around open source supply chain management, for instance the new (at time of writing Project Rekor – but there is lots of work still to be done.

Remember, though: when you take a package – whether library or executable – please consider what you’re consuming, what about it you can trust, and on what assurances that trust is founded.

Security is (only) subjective

What aspects of security does it provide?

This article covers ground covered in more detail within (but is not quite an excerpt from) my forthcoming book on Trust in Computing and the Cloud for Wiley.

In 1985, the US Department of Defense [sic] published the “Orange Book”[1], officially named Trusted Computer System Evaluation Criteria. It was a guide to how to create a “trusted system”, and was hugely influential within the IT and security industry as a whole. Eight years later, in 1993, Dorothy Denny published a devastating critique of the Orange Book called A New Paradigm for Trusted Systems[1]. It is a brilliant step-by-step analysis of why the approach taken by the DoD was fundamentally flawed. Denning starts:

“The current paradigm for trusted computer systems holds that trust is a property of a system. It is a property that can be formally modeled, specified, and verified. It can be designed into a system using a rigorous design methodology.”

Later, she explains why this just doesn’t work in the real world:

“The current paradigm of treating trust as a property is inconsistent, with the way trust is actually established in the world. It is not a property, but rather an assessment that is based on experience and shared through networks of people in the world-wide market. It is a declaration made by an observer, rather than a property of the observed.”

Demolishing the idea that trust is an inherent property of a system, and making it relational instead, changed the way that systems designed for security would be considered (and ushered in a new approach by the US Government and associated organisations, known was Common Criteria). Denning was writing about trust, but very similar issues exist around the concept of “security”. Too often, security is considered an inherent or intrinsic property of a system: “it’s secure”, someone will say, or “this fix will secure your computer”. It isn’t, and it won’t.

The first problem with such statements is that it’s not clear what “secure” means. There are a number of properties associated with systems that are relevant to security: three of the ones most often quoted are confidentiality, integrity and availability (which I discuss in more detail in the post The Other CIA: Confidentiality, Integrity and Availability). Specifying which of these you’re interested in removes the temptation just to say that something is “secure”, and if someone says, “it provides security”, we’re now in a position to start asking what that assertion actually means. Which aspects of security does it provide?

I also don’t think it makes sense to say that a system is “confidential” or “available” (there’s no obvious equivalent adjective for integrity – “integral” means something rather different): what we may be able to say is that it exhibits properties associated with confidentiality, integrity and availability, or better, that it has measures associated with it which are designed and intended to provide confidentiality, integrity and availability. These measures can be listed, examined and evaluated, hopefully against well-defined criteria.

This seems like a much better approach: not only have we addressed the suggestion that there is such as thing as “security” that we can apply to a system, but following Denning, we have also challenged the suggestion that it is inherent to – or in – a system. Instead, we have introduced the alternative approach of describing security-related properties which can be subjected to scrutiny by the users of the system. This allows the type of relational understanding of security that Denning was proposing, but it also raises the possibility of differing parties having different views of the security (or not) of a system, depending on who they are, and how it is going to be used.

It turns out, when you think about it, that this makes a lot of sense. A laptop which provides sufficient confidentiality, integrity and availability protection for the computing needs of my retired uncle may not provide sufficient protections for the uses to which an operative of a government security service might put it[3]. Equally, a system which a telecommunications company runs in a physically protected data centre may well be considered to have appropriate security protections, whereas the same system, attached to a pole somewhere on a residential street, might not. The measures applied to provide the protections associated with the properties (e.g. 128 bit AES encryption for the confidentiality) may be objectively specifiable, but the extent to which they provide “security” is not, because they are relative to specific requirements.

One last point, and it’s one which regular readers of my blog will be unsurprised to see: how can you assess the applicability of a system’s security properties to your requirements if it is not open? Open source helps significantly with security. Yes, there are assessment regimes to say that systems meet certain criteria – and sometimes these can be very helpful – but they are generally broad criteria, and difficult to apply to your specific use cases. Equally, most are just a starting point, and many such certified systems will require “exceptions” to be met in order to function in the real world, exceptions which require significant expertise to understand, judge and apply safely (that is, with appropriate levels of risk). If the system you want to use is open, then you, a party who you trust, or the wider community can evaluate the appropriateness of controls and measures, and make an informed decision about whether a system’s security properties are what you need. Without open source, this is impossible.


1 – it had an orange cover.

2 – Denning, Dorothy E. (1993) A New Paradigm for Trusted Systems [online]. Available at: https://www.researchgate.net/publication/234793347_A_New_Paradigm_for_Trusted_Systems [Accessed 3 Apr. 2020]

3 – I’m assuming that my uncle isn’t an operative of a government security service[4].

4 – or at least that his security needs are reduced in retirement[5].

5 – that is, if he has really retired…

Taking some time

I’m going to practice what I preach, and not write.

I’m going to practice what I preach, this week, and not write a full article. I’ve had a stressful and busy few weeks, including needing to spend some extra time with the family (nothing scary or earth-shattering – we just needed some family time), and I think the best thing for me to do today is not spend time writing an article. Let me point you instead at some I’ve written in the past.

On self-care:

On security:

On trust:

Keep safe, and look after yourself, dear reader!

Measured and trusted boot

What they give you – and don’t.

Sometimes I’m looking around for a subject to write about, and realise that there’s one which I assume that I’ve covered, but, on searching, discover that I haven’t. Such a one is “measured boot” and “trusted boot” – sometimes, misleadingly, referred to as “secure boot”. There are specific procedures which use these terms with capital letters – e.g. Secure Boot – which I’m going to try to avoid discussing in this post. I’m more interested in the generic processes, and a major potential downfall, than in trying to go into the ins and outs of specifics. What follows is a (heavily edited) excerpt from my forthcoming book on Trust in Computing and the Cloud for Wiley.

In order to understand what measured boot and trusted boot aim to achieve, let’s have a look at the Linux virtualisation stack: the components you run if you want to be using virtual machines (VMs) on a Linux machine. This description is arguably over-simplified, but we’re not interested here in the specifics (as I noted above), but in what we’re trying to achieve. We’ll concentrate on the bottom four layers (at a rather simple level of abstraction): CPU/management engine; BIOS/EFI; Firmware; and Hypervisor, but we’ll also consider a layer just above the CPU/management engine, where we interpose a TPM (a Trusted Platform Module) and some instructions for how to perform one of our two processes. Once the system starts to boot, the TPM is triggered, and then starts its work (alternative roots of trust such as HSMs might also be used, but we will use TPMs, the most common example in this context, as our example).

In both cases, the basic flow starts with the TPM performing a measurement of the BIOS/EFI layer. This measurement involves checking the binary instructions to be carried out by this layer, and then creating a cryptographic hash of the binary image. The hash that’s produced is then stored in one of several “PCR slots” in the TPM. These can be thought of as pieces of memory which can be read later on, either by the TPM for its purposes, or by entities external to the TPM, but which cannot be changed once they have been written. This provides assurances that once a value is written to a PCR by the TPM, it can be considered constant for the lifetime of the system until power-off or reboot.

After measuring the BIOS/EFI layer, the next layer (Firmware) is measured. In this case, the resulting hash is combined with the previous hash (which was stored in the PCR slot) and then itself stored in a PCR slot. The process continues until all of the layers involved in the process have been measured, and the results of the hashes stored. There are (sometimes quite complex) processes to set up the original TPM values (I’ve missed out some of the more low-level steps in the process for simplicity) and to allow (hopefully authorised) changes to the layers for upgrading or security patching, for example. What this process “measured boot” allows is for entities to query the TPM after the process has completed, and check whether the values in the PCR slots correspond to the expected values, pre-calculated with “known good” versions of the various layers – that is, pre-checked versions whose provenance and integrity have already been established. Various protocols exist to allow parties external to the system to check the values (e.g. via a network connection) that the TPM attests to being correct: the process of receiving and checking such values from an external system is known as “remote attestation”.

This process – measured boot – allows us to find out whether the underpinnings of our system – the lowest layers – are what we think they are, but what if they’re not? Measured boot (unsurprisingly, given the name) only measures, but doesn’t perform any other actions. The alternative, “trusted boot” goes a step further. When a trusted boot process is performed, the process not only measures each value, but also performs a check against a known (and expected!) good value at the same time. If the check fails, then the process will halt, and the booting of the system will fail. This may sound like a rather extreme approach to take to a system, but sometimes it is absolutely the right one. Where the system under consideration may have been compromised – which is one likely inference that you can make from the failure of a trusted boot process – then it is better that it not be available at all than to be running based on flawed expectations.

This is all very well if I’m the owner of the system which is being measured, have checked all of the various components being measured (and the measurements), and so can be happy that what’s being booted it what I want[1]. But what if I’m actually using a system on the cloud, for instance, or any system owned and managed by someone elese? In that case, I’m trusting the cloud provider (or owner/manager) with two things:

  1. do all the measuring correctly, and report correct results to me;
  2. actually to have built something which I should be trusting in the first place!

This is the problem with the nomenclature “trusted boot”, and, even worse, “secure boot”. Both imply that an absolute, objective property of a system has been established – it is “trusted” or “secure” – when this is clearly not the case. Obviously, it would be unfair to expect the designers of such processes to name them after the failure states – “untrusted boot” or “insecure boot” – but unless I can be very certain that I trust the owner of the system to do step 2 entirely correctly (and in my best interests, as user of the system, rather than theirs, and owner) then we can make no stronger assertions. There is an enormous temptation to take a system which has gone through a trusted boot process and to label it a “trusted system”, where the very best assertion we can make is that the particular layers measured in the measured and/or trusted boot process have been asserted to be those which the process expected to be present. Such a process says nothing at all about the fitness of the layers to provide assurances of behaviour, nor about the correctness (or fitness to provide assurances of behaviour) of any subsequent layers on top of those.

It’s important to note that designers of TPMs are quite clear what is being asserted, and that assertions about trust should be made carefully and sparingly. Unluckily, however, the complexities of systems, the general low level of understanding of trust, and the complexities of context and transitive trust make it very easy for designers and implementors of systems to do the wrong thing, and to assume that any system which has successfully performed a trusted boot process can be considered “trusted”. It is also extremely important to remember that TPMs, as hardware roots of trust, offer us one of the best mechanisms for we have for establishing a chain of trust in systems that we may be designing or implementing, and I plan to write an article about them soon.


1 – although this turns out to be much harder to do that you might expect!

Formal verification … or Ken Thompson?

“You can’t trust code that you did not totally create yourself” – Ken Thompson.

This article is an edited excerpt from my forthcoming book on Trust in Computing and the Cloud for Wiley.

How can we be sure that the code we’re running does what we think it does? One of the answers – or partial answers – to that question is “formal verification.” Formal verification is an important field of study, applying mathematics to computing, and it aims to start with proofs – at best, with an equivalent level of assurance to that of formal mathematical proofs – of the correctness of algorithms to be implemented in code to ensure that they perform the operations expected and set forth in a set of requirements. Though implementation of code can often fall down in the actual instructions created by a developer or set of developers – the programming – mistakes are equally possible at the level of the design of the code to be implemented in the first place, and so this must be a minimum step before looking at any actual implementations. What is more, these types of mistakes can be all the more hard to spot, as even if the developer has introduced no bugs in the work they have done, the implementation will be flawed by virtue of it being incorrectly defined in the first place. It is with an acknowledgement of this type of error, and an intention of reducing or eliminating it, that formal verification starts, but some areas go much further, with methods to examine concrete implementations and make statements about their correctness with regards to the algorithms which they are implementing.

Where we can make these work, they are extremely valuable, and the sort of places that they are applied are exactly where we would expect: for systems where security is paramount, and to prove the correctness of cryptographic designs and implementations. Another major focus of formal verification is software for safety systems, where the “correct” operation of the system – by which we mean “as designed and expected” – is vital. Examples might include oil refineries, fire suppression systems, nuclear power station management, aircraft flight systems and electrical grid management – unsurprisingly, given the composition of such systems, formal verification of hardware is also an important field of study. The practical application of formal verification methods to software is, however, more limited than we might like. As Alessandro Abate notes in a paper on formal verification of software:

“Two known shortcomings of standard techniques in formal verification are the limited capability to provide system-level assertions, and the scalability to large, complex models.”

To these shortcomings we can add another, extremely significant one: how sure can you be that what you are running is what you think you are running? Surely knowing what you are running is exactly why we write software, look at the source, and then compile it under our control? That, certainly, is the basic starting point for software that we care about.

The problem is arguably one of layers and dependencies, and was outlined by Ken Thompson, one of the founders or modern computing, in the lecture he gave at his acceptance of the Turing Award in 1983. It is short, stands as one of the establishing artefacts of computing security, and has weathered the tests of time: I have no hesitation in recommending that all readers of this blog read it: Reflections on Trusting Trust. In it, he describes how careful placing of malicious code in the C standard compiler could lead to vulnerabilities (his specific example is in account login code) which are not only undetectable by those without access to the source code, but also not removable. The final section of the paper is entitled “Moral”, and Thompson starts with these words:

“The moral is obvious. You can’t trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code.”

However, as he goes on to point out, here is nothing special about the compiler:

“I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect.”

It is for this the reasons noted by Thompson that open source software – and hardware – is so vital to the field of computer security, and to our task of defining and understanding what “trust” means in the context of computing. Just relying on the “open source-ness” of your code is not enough: there is more work to be done in understanding your stack, the community and your requirements, but without the ability to look at the source code of all the layers of software and hardware on which you are running code, then you can have only reduced trust that what you are running is what you think you should be running, whether you have performed formal verification on it or not.


An Enarx milestone: binaries

Demoing the same binary in very different TEEs.

This week is Red Hat Summit, which is being held virtually for the first time because of the Covid-19 crisis. The lock-down has not affected the productivity of the Enarx team, however (at least not negatively), as we have a very exciting demo that we will be showing at Summit. This post should be published at 1100 EDT, 1500 BST, 1400 GMT on Tuesday, 2020-04-28, which is the time that the session which Nathaniel McCallum and I recorded will be released to the world. I hope to be able to link to that once it’s released to the world. But what will we be showing?

Well, to set the scene, and to discover a little more about the Enarx project, you might want to read these articles first (also available in Japanese – visit each article of a link):

Enarx, as you’ll discover, is about running workloads in TEEs (Trusted Execution environments), using WebAssembly, in what we call “Keeps”. It’s a mammoth job, particularly as we’re abstracting away the underlying processor architectures (currently two: Intel’s SGX and AMD’s SEV), so that you, the user, don’t need to worry about them: all you need to do is write and compile your application, then request that it be deployed. Enarx, then, has lots of moving parts, and one of the key tasks for us has been to start the work to abstract away the underlying processor architectures so that we can prepare the runtime layers on top. Here’s a general picture of the software layers, and how they sit on top of the hardware platforms:

What we’re announcing – and demoing – today is that we have an initial implementation of code to allow us to abstract away process-based and VM-based types of architecture (with examples for SGX and SEV), so that we can do this:

This seems deceptively simple, but what’s actually going on under the covers is rather more than is exposed in the picture above. The reality is more like this:

This gives more detail: the application that’s running on both architectures (SGX on the left, SEV on the right) is the very same ELF static-PIE binary. To be clear, this is not only the same source code, compiled for different platforms, but exactly the same binary, with the very same hash signature. What’s pretty astounding about this is that in order to make it run on both platforms, the engineering team has had to write two sets of seriously low-level code, including more than a little Assembly language, providing the “plumbing” to allow the binary to run on both.

This is a very big deal, because although we’ve only implemented a handful of syscalls on each platform – enough to make our simple binary run and print out a message – we now have a framework on which we know we can build. And what’s next? Well, we need to expand that framework so that we can then build the WebAssembly layers which will allow WebAssembly applications to run on top:

There’s a long way to go, but this milestone shows that we have an initial framework which we can improve, and on which we can build.

What’s next?

What’s exciting about this milestone from our point of view is that we think it puts Enarx at a stage where more people can join and take part. There’s still lots of low-level work to be done, but it’s going to be easier to split up now, and also to start some of the higher level work, too. Enarx is completely open source, and we do all of our design work in the open, along with our daily stand-ups. You’re welcome to browse our documentation, RFCs (mostly in draft at the moment), raise issues, and join our calls. You can find loads more information on the Enarx wiki: we look forward to your involvement in the project.

Last, and not least, I’d like to take a chance to note that we now have testing/CI/CD resources available for the project with both Intel SGX and AMD SEV systems available to us, all courtesy of Packet. This is amazingly generous, and we both thank them and encourage you to visit them and look at their offerings for yourself!

Isolationism – not a 4 letter word (in the cloud)

Things are looking up if you’re interested in protecting your workloads.

In the world of international relations, economics and fiscal policy, isolationism doesn’t have a great reputation. I could go on, I suppose, if I did some research, but this is a security blog[1], and international relations, fascinating area of study though it is, isn’t my area of expertise: what I’d like to do is borrow the word and apply it to a different field: computing, and specifically cloud computing.

In computing, isolation is a set of techniques to protect a process, application or component from another (or a set of the former from a set of the latter). This is pretty much always a good thing – you don’t want another process interfering with the correct workings of your one, whether that’s by design (it’s malicious) or in error (because it’s badly designed or implemented). Isolationism, therefore, however unpopular it may be on the world stage, is a policy that you generally want to adopt for your applications, wherever they’re running.

This is particularly important in the “cloud”. Cloud computing is where you run your applications or processes on shared infrastructure. If you own that infrastructure, then you might call that a “private cloud”, and infrastructure owned by other people a “public cloud”, but when people say “cloud” on its own, they generally mean public clouds, such as those operated by Amazon, Microsoft, IBM, Alibaba or others.

There’s a useful adage around cloud computing: “Remember that the cloud is just somebody else’s computer”. In other words, it’s still just hardware and software running somewhere, it’s just not being run by you. Another important thing to remember about cloud computing is that when you run your applications – let’s call them “workloads” from here on in – on somebody else’s cloud (computer), they’re unlikely to be running on their own. They’re likely to be running on the same physical hardware as workloads from other users (or “tenants”) of that provider’s services. These two realisations – that your workload is on somebody else’s computer, and that it’s sharing that computer with workloads from other people – is where isolation comes into the picture.

Workload from workload isolation

Let’s start with the sharing problem. You want to ensure that your workloads run as you expect them to do, which means that you don’t want other workloads impacting on how yours run. You want them to be protected from interference, and that’s where isolation comes in. A workload running in a Linux container or a Virtual Machine (VM) is isolated from other workloads by hardware and/or software controls, which try to ensure (generally very successfully!) that your workload receives the amount of computing time it should have, that it can send and receive network packets, write to storage and the rest without interruption from another workload. Equally important, the confidentiality and integrity of its resources should be protected, so that another workload can’t look into its memory and/or change it.

The means to do this are well known and fairly mature, and the building blocks of containers and VMs, for instance, are augmented by software like KVM or Xen (both open source hypervisors) or like SELinux (an open source capabilities management framework). The cloud service providers are definitely keen to ensure that you get a fair allocation of resources and that they are protected from the workloads of other tenants, so providing workload from workload isolation is in their best interests.

Host from workload isolation

Next is isolating the host from the workload. Cloud service providers absolutely do not want workloads “breaking out” of their isolation and doing bad things – again, whether by accident or design. If one of a cloud service provider’s host machines is compromised by a workload, not only can that workload possibly impact other workloads on that host, but also the host itself, other hosts and the more general infrastructure that allows the cloud service provider to run workloads for their tenants and, in the final analysis, make money.

Luckily, again, there are well-known and mature ways to provide host from workload isolation using many of the same tools noted above. As with workload from workload isolation, cloud service providers absolutely do not want their own infrastructure compromised, so they are, of course, going to make sure that this is well implemented.

Workload from host isolation

Workload from host isolation is more tricky. A lot more tricky. This is protecting your workload from the cloud service provider, who controls the computer – the host – on which your workload is running. The way that workloads run – execute – is such that such isolation is almost impossible with standard techniques (containers, VMs, etc.) on their own, so providing ways to ensure and prove that the cloud service provider – or their sysadmins, or any compromised hosts on their network – cannot interfere with your workload is difficult.

You might expect me to say that providing this sort of isolation is something that cloud service providers don’t care about, as they feel that their tenants should trust them to run their workloads and just get on with it. Until sometime last year, that might have been my view, but it turns out to be wrong. Cloud service providers care about protecting your workloads from the host because it allows them to make more money. Currently, there are lots of workloads which are considered too sensitive to be run on public clouds – think financial, health, government, legal, … – often due to industry regulation. If cloud service providers could provide sufficient isolation of workloads from the host to convince tenants – and industry regulators – that such workloads can be safely run in the public cloud, then they get more business. And they can probably charge more for these protections as well! That doesn’t mean that isolating your workloads from their hosts is easy, though.

There is good news, however, for both cloud service providers and their teants, which is that there’s a new set of hardware techniques called TEEs – Trusted Execution Environments – which can provide exactly this sort of protection[2]. This is rapidly maturing technology, and TEEs are not easy to use – in that it can not only be difficult to run your workload in a TEE, but also to ensure that it’s running in a TEE – but when done right, they do provide the sorts of isolation from the host that a workload wants in order to maintain its integrity and confidentiality[3].

There are a number of projects looking to make using TEEs easier – I’d point to Enarx in particular – and even an industry consortium to promote open TEE adoption, the Confidential Computing Consortium. Things are looking up if you’re interested in protecting your workloads, and the cloud service providers are on board, too.


1 – sorry if you came here expecting something different, but do stick around and have a read: hopefully there’s something of interest.

2 – the best known are Intel’s SGX and AMD’s SEV.

3 – availability – ensuring that it runs fairly – is more difficult, but as this is a property that is also generally in the cloud service provider’s best interest, and something that can can control, it’s not generally too much of a concern[4].

4 – yes, there are definitely times when it is, but that’s a story for another article.

A cybersecurity tip from Hazzard County

Don’t place that bet in Boss Hogg’s betting saloon: you know he’s up to no good!

It’s a slightly guilty secret, but I used to love watching The Dukes of Hazzard in the early 80’s (the first series started in late 1979, but I suspect that it didn’t make it to the UK until the next year at the earliest).  It all seemed very glamourous, and there were lots of fast car chases.  And a basset hound, which was an extra win.  To say this was early days for cybersecurity would be an understatement, and though there are references in the Wikipedia plot summaries to computers, I can’t honestly say that I remember any of those particular episodes.

One episode has stuck with me, however, for reasons that I can’t fathom.  It’s called “Hazzard Hustle” and (*SPOILER ALERT*) in it, Boss Hogg sets up a crooked betting saloon.  The swindle (if I remember it correctly) is that he controls and delays the supposedly live feeds to the TVs in the saloon, which means that he has access to results before they come in.  Needless to say, the Duke boys (probably aided and abetted by Daisy Duke) get the better of him in the end, and everything turns out OK (for them, not Boss Hogg).

“What can this have to do with cybersecurity?” you have every right to ask.  Well, the answer is reporting and monitoring channels.  Monitoring is important because without it, there is no way for us to check that what we believe should be happening actually is.  The opportunities for direct sensory monitoring of actions in computer-based systems are limited: if I request via a web browser that a banking application transfers funds between one account and another, the only visible effect that I am likely to see is an acknowledgement on the screen. Until I actually try to spend or withdraw that money, I realistically have no way to be assured that the transactions has taken place.

Let’s take an example from the human realm.  It is as if I have a trust relationship with somebody around the corner of a street, out of view, that she will raise a red flag at the stroke of noon, and I have a friend, standing on the corner, who will watch her, and tell me when and if she raises the flag. I may be happy with this arrangement, but only because I have a trust relationship to the friend: that friend is acting as a trusted channel for information.

The word “friend” was chosen carefully above, because there is a trust relationship already implicit in the term. The same is not true for the word “somebody”, which I used to describe the person who was to raise the flag. The situation as described above is likely to make our minds presume that there is a fairly high probability that the trust relationship I have to the friend is sufficient to assure me that he will pass the information correctly. But what if my friend is actually a business partner of the flag-waver? Given our human understanding of the trust relationships typically involved with business partnerships, we may immediately begin to assume that my friend’s motivations in respect to correct reporting are not neutral.

The channels for reporting on actions – monitoring them – are vitally important within cybersecurity, and it is both easy and dangerous to fall into the trap of assuming that they are neutral, and that the only important one is between me and the acting party. In reality, the trust relationship that I have to a set of channels is key to the maintenance of the trust relationships that I have to the key entity that they monitor. In trust relationships involving computer systems, there are often multiple entities or components involved in actions, and these form a “chain of trust”, where each link depends on the other, and the chain is typically only as strong as the weakest of its links.  Don’t forget that.  Oh, and don’t place that bet in Boss Hogg’s betting saloon: you know he’s up to no good!