Web3 plus Confidential Computing

Technologies, when combined, sometimes yield fascinating – and commercially exciting – results.

Mike Bursell

Introduction

One of the things that I enjoy the most is taking two different technologies, accelerating them at speed and seeing what comes out when they hit, rather in the style of a particle physicist with a Large Hadron Collider. Technologies which may not seem to be obvious fits for each other, when combined, sometimes yield fascinating – and commercially exciting – results, and the idea of putting Web3 and Confidential Computing together is certainly one of those occasions. Like most great ideas, once someone explained it to me, it was a “oh, well, of course that’s going to make sense!” moment, and I’m hoping that this article, which attempts to explain the combination of the two technologies, will give you the same reaction. I’ll start with an introduction to the two technologies separately, why they are interesting from a business context, and then look at what happens when you put them together. We’ll finish with more of a description of a particular implementation: that of Super Protocol, using the Polygon blockchain.

Business context

Introduction to the technologies

In this section, we look at blockchain in general and Web3 in particular, followed by a description of the key aspects of Confidential Computing. If you’re already an expert in either of these technologies, feel free to skip these, of course.

Blockchain

Blockchains offer a way for groups of people to agree about the truth of key aspects of the world. They let people say: “the information that is part of that blockchain is locked in, and we – the other people who use it and I – believe that is correct and represents a true version of certain facts.” This is a powerful capability, but how does it arise? The key point about a blockchain is that it is immutable. More specifically, anything that is placed on the blockchain can’t be changed without such a change being obvious to anybody with access to it. And another key point about many blockchains is that they are public – that is, anybody with access to the Internet and the relevant software is able to access them. Such blockchains are sometimes called “permissionless”, in juxtaposition to blockchains to which only authorised entities have access, which are known as “permissioned”. In both cases, the act of putting something on a blockchain is very important – if we want to view blockchains as providing a source of truth about the world – then the ability to put something onto the blockchain is a power that comes with great responsibility. The various consensus mechanisms employed vary between implementations but all of them aim for consensus among the parties that are placing their trust in the blockchain, a consensus that what is being represented is correct and valid. Once such a consensus has been met a cryptographic hash is used to seal the latest information and anchor it to previous parts of the blockchain, adding a new block to it.

While this provides enough for some use cases, the addition of smart contracts provides a new dimension of capabilities. I’ve noted before that smart contracts aren’t very well named (they’re arguably neither smart nor contracts!), but what they basically allow is for programs and their results to be put on a blockchain. If I create a smart contract and there’s consensus that it allows deterministic results from known inputs, and it’s put onto the blockchain, then that means that when it’s run, if people can see the inputs – and be assured that the contract was run correctly, a point to which we’ll be returning later in this article – then they will happy to put the results of that smart contract on the blockchain. What we’ve just created is a way to create data that is known to be correct and valid, and which we can be happy to put directly on the blockchain without further checking: the blockchain can basically add results to itself!

Web3

Blockchains and smart contracts, on their own, are little more than a diverting combination of cryptography and computing: it’s the use cases that make things interesting. The first use case that everyone thinks of is crypto-currency, the use of blockchains to create wholly electronic currencies that can be (but don’t have) divorced from centralised government-backed banking systems. (Parenthetically, the fact that the field of use and study of these crypto-currencies has become known to its enthusiasts as “crypto” drives most experts in the much older and more academic field of cryptology wild.)

There are other uses of blockchains and smart contracts, however, and the one which occupies our attention here is Web3. I’m old (I’m not going to give a precise age, but let’s say early-to-mid Gen X, shall we?), so I cut my professional teeth on the technologies that make up what are now known as Web1. Web1 was the world of people running their own websites with fairly simple static pages and CGI interactions with online databases. Web2 came next and revolves around centralised platforms – often cloud-based – and user-generated data, typically processed and manipulated by large organisations. While data and information may be generated by users, it’s typically sucked into the platforms owned by these large organisations (banks, social media companies, governments, etc.), and passes almost entirely out of user control. Web3 is the next iteration, and the big change is that it’s a move to decentralised services, transparency and user control of data. Web3 is about open protocols – data and information isn’t owned by those processing it: Web3 provides a language of communication and says “let’s start here”. And Web3 would be impossible without the use of blockchains and smart contracts.

Confidential Computing

Confidential Computing is a set of technologies that arose in the mid 2010s, originally to address a number of the problems that people started to realise were associated with cloud computing and Web2. As organisations moved their applications to the cloud, it followed that the data they were processing also moved there, and this caused issues. It’s probably safe to say that the first concerns that surfaced were around the organisations’ own data. Keeping financial data, intellectual property, cryptographic keys and the like safe from prying eyes on servers operated in clouds owned and managed by completely different companies, sometimes in completely different jurisdictions, started to become a worry. But that worry was compounded by the rising tide of regulation being enacted to protect the data not of the organisations, but of the customers who they (supposedly) served. This, and the growing reputational damage associated with the loss of private data, required technologies that would allow the safeguarding of sensitive data and applications from the cloud service providers and, in some cases, from the organisations who “owned” – or at least processed – that data themselves.

Confidential Computing requires two main elements. The first is a hardware-based Trusted Execution Environment (TEE): a set of capabilities on a chip (typically a CPU or GPU at this point) that can isolate applications and their data from the rest of the system running them, including administrators, the operating system and even the lowest levels of the computer, the kernel itself. Even someone with physical access to the machine cannot overcome the protection that a TEE provides, except in truly exceptional circumstances. The second element is remote attestation. It’s all very well setting up a TEE on a system in, say, a public cloud, but how can you know that it’s actually in place or even that the application you wanted to load into it is the one that’s actually running? Remote attestation addresses this problem in a multi-step process. There are a number of ways to manage this, but the basic idea is that the application in the TEE asks the CPU (which understands how this works) to create a measurement of some or all of the memory in the TEE. The CPU does this, and signs this with a cryptographic key, creating an attestation measurement. This measurement is then passed to a different system (hence “remote”), which checks it to see if it conforms to the expectations of the party (or parties) running the application and, if it does, provides a verification confirms that all is well. This basically allows a certificate to be created that attests to the correctness of the CPU, the validity of the TEE’s configuration and the state of any applications or data within the TEE.

With these elements – TEEs and remote attestation – in place, organisations can use Confidential Computing to prove to themselves, their regulators and their customers that no unauthorised peeking or tampering is possible with those sensitive applications and data that need to be protected.

Combining blockchain & CC

One thing – possibly the key thing – about Web3 is that it’s decentralised. That means that anyone can offer to provide services and, most importantly, computing services, to anybody else. This means that you don’t need to go to one of the big (and expensive) cloud service providers to run your application – you can run a DApp (Decentralised Application) – or a standard application such as a simple container image – on the hardware of anyone willing to host it. The question, of course, is whether you can trust them with your application and your data; and the answer, of course in many, if not most, use cases, is “no”. Cloud service providers may not be entirely worthy of organisations’ trust – hence the need for Confidential Computing – but at least they are publicly identifiable, have reputations and are both shameable and suable. It’s very difficult to say the same about in a Web3 world about a provider of computing resources who may be anonymous or pseudonymous and with whom you have never had any interactions before – nor are likely to have any in the future. And while there is sometimes scepticism about whether independent actors can create complex computational infrastructure, we only need look at the example of Bitcoin and other cryptocurrency miners, who have built computational resources which rival those of even the largest cloud providers.

Luckily for Web3, it turns out that Confidential Computing, while designed primarily for Web2, has just the properties needed to allow us to build systems that do allow us to do Web3 computing with confidence (I’ll walk through some of the key elements of one such implementation – by Super Protocol – below). TEEs allow DApps to be isolated from the underlying hardware and system software and remote attestation can provide assurances to clients that everything has been set up correctly (and a number of other properties besides).

Open source

There is one important characteristic that Web3 and Confidential Computing share that is required to ensure the security and transparency that is a key to a system that combines them: open source software. Where software is proprietary and closed from scrutiny (this is the closed from which open source is differentiated), the development of trust in the various components and how they interact is impossible. Where proprietary software might allow trust in a closed system of actors and clients who already have trust with each other – or external mechanisms to establish it – the same is not true in a system such as Web3 whose very decentralised nature doesn’t allow for such centralised authorities.

Open source software is not automatically or by its very nature more secure than proprietary software – it is written by humans, after all (for now!), and – but its openness and availability to scrutiny means that experts can examine it, check it and, where necessary, fix it. This allows the open source community and those that interact with it to establish that it is worthy of trust in particular contexts and use cases (see Chapter 9: Open Source and Trust in my book for more details of how this can work). Confidential Computing – using TEEs and remote attestation – can provide cryptographic assurances not only the elements of a Web3 system are valid and have appropriate security properties, but also that the components of the TEE itself do as well.

Some readers may have noted the apparent circularity in this set-up – there are actually two trust relationships that are required for Confidential Computing to work: in the chip manufacturer and in the attestation verification service. The first of these is unavoidable with current systems, while the other can be managed in part by performing the attestation oneself. It turns out that allowing the creation of trust relationships between mutually un-trusting parties is extremely complex, but one way that this can be done is what we will now address.

Super Protocol’s approach

Super Protocol have created a system which uses Confidential Computing to allow execution of complex applications to be made within a smart contract on the blockchain and for all the parties in the transaction to have appropriate trust in the performance and result of that execution without having to know or trust each other. The key layers are:

Client Infrastructure, allowing a client to interact with the blockchain, initiate an instance and interact with it
Blockchain, including smart contracts
Various providers (TEE, Data, Solution, Storage).

Central to Super Protocol’s approach are two aspects of the system: that it is open source, and that remote attestation is required to allow the client to have sufficient assurance of the system’s security. Smart contracts – themselves open source – allow the resources made available by the various actors and combined into an offer that is placed on the blockchain and is available to anyone with access to the blockchain – to execute it, given sufficient resources from all involved. What makes this approach a Web3 approach, and differentiates it from a more Web2 system, is that none of these actors needs to be connected contractually.

Benefits of This Approach

How does this approach help? Well, you don’t need to store or process data (which may be sensitive or just very large) locally: TEEs can handle it, providing confidentiality and integrity assurances that would otherwise be impossible. And communications between the various applications are also encrypted transparently, reducing or removing risks of data leakage and exposure, without requiring complex key management by users, but keeping the flexibility and exposure offered by decentralisation and Confidential Computing.

But the step change that this opens up is the network effect enabled by the possibility of building huge numbers of interconnected Web3 agents and applications, operating with the benefits of integrity and confidentiality offered by Confidential Computing, and backed up by remote attestation. One of the recurring criticisms of Web2 ecosystems is their fragility and lack of flexibility (not to mention the problems of securing them in the first place): here we have an opportunity to create complex, flexible and robust ecosystems where decentralised agents and applications can collaborate, with privacy controls designed in and clearly defined security assurances and policies.

Technical details

In this section, I dig a little further into some of the technical details of Super Protocol’s system. It is, of course, not the only approach to combining Confidential Computing and Web3, but it is available right now, seems carefully architected and designed with security foremost in mind and provides a good example of the technologies and the complexities involved.

You can think of Super Protocol’s service as being in two main parts: on-chain and off-chain. The marketplace, with smart contract offers, sits on an Ethereum blockchain, and the client interacts with that, never needing to know the details of how and where their application instance is running. The actual running applications are off-chain, supported by other infrastructure to allow initial configuration and then communication services between clients and running applications. The “bridge” between the two parts, which moves from an offer to an actual running instance of the application, is a component called a Trusted Loader, which sets up the various parts of the application and sets it running. The data it is managing contains sensitive information such as cryptographic keys which need to be protected as they provide security for all the other parts of the system and the Trusted Loader also manages the important actions of hash verification (ensuring that what is being loaded what was originally offered) and order integrity (ensuring that no changes can be made whilst the loading is taking place and execution starting).

Trusted Loader – configuration and deployment with Data and Application information into TEE instance

But what is actually running? The answer is that the unit of execution for an application in this service is a Kubernetes Pod, so each application is basically a container image which is run within a Pod, which itself executes within a TEE, isolating it from any unauthorised access. This Pod itself is – of course! – measured, creating an attestation measurement that can now be verified by clients of the application. We should also remember that the application itself – the container image – needs protection as well. This is part of the job of the Trusted Loader, as the container image is stored encrypted, and the Trusted Loader has appropriate keys to decrypt this and other resources required to allow execution. This is not the only thing that the Trusted Loader does: it also gathers and sets up resources from the smart contract for networking and storage, putting everything together, setting it running and connecting the client to the running instance.

There isn’t space in this article to go into deeper detail of how the system works, but by combining the capabilities offered by Confidential Computing and a system of cryptographic keys and certificates, the overall system enforces a variety of properties that are vital for sensitive, distributed and decentralised Web3 applications.

Decentralised storage: secrets are kept in multiple places instead of one, making them harder to access, steal or leak.
Developer independence: creators of applications can’t access these secrets, continuing the lack of need for trust relationships between the various actors. In other words, each instance of an application is isolated from its creator, maintaining data confidentiality.
Unique secrets: Each application gets its own unique secrets that nobody else can use or see and which are not shared between instances.

Thanks

Thanks to Super Protocol for sponsoring this article. Although they made suggestions and provided assistance around the technical details, this article represents my views, the text is mine and final editorial control (and with it the blame for any mistakes!) rests with the author.

Photo by Rukma Pratista on Unsplash

Zero trust and Confidential Computing

Confidential Computing can provide two properties which are excellent starting points for zero/explicit trust.

I’ve been fairly scathing about “zero trust” before – see, for instance, my articles Thinking beyond “zero-trust” and “Zero-trust”: my love/hate relationship – and my view of how the industry talks about it hasn’t changed much. I still believe, in particular, that:

the original idea, as conceived, has a great deal of merit;
few people really understand what it means;
it’s become an industry bandwagon that is sorely abused by some security companies;
it would be better called “explicit trust”.

The reason for this last is that it’s impossible to have zero trust: any entity or component has to have some level of trust in the other entities/components with which it interacts. More specifically, it has to maintain trust relationships – and what they look like, how they’re established, evaluated, maintained and destroyed is the core point of discussion of my book Trust in Computer Systems and the Cloud. If you’re interested in a more complete and reasoned criticism of zero trust, you’ll find that in Chapter 5: The Importance of Systems.

But, as noted above, I’m actually in favour of the original idea of zero trust, and that’s why I wanted to write this article about how zero trust and Confidential Computing, when combined, can actually provide some real value and improvements over standard distributed architectures (particularly in the Cloud).

An important starting point, however, is to note that I’ll be using this definition of Confidential Computing:

Confidential Computing is the protection of data in use by performing computation in a hardware-based, attested Trusted Execution Environment.
Confidential Computing Consortium, https://confidentialcomputing.io/about/

Confidential Computing, as thus described, can provide two properties which are excellent starting points for components wishing to exercise zero/explicit trust, which we’ll examine individually:

isolation from the host machine/system, particularly in terms of confidentiality of data;
cryptographically verifiable identity.

Isolation

One of the main trust relationships that any executing component must establish and maintain is with the system that is providing the execution capabilities – the machine on which it is running (or virtual machine – but that presents similar issues). When you say that your component has “zero trust”, but has to trust the host machine on which it is running to maintain the confidentiality of the code and/or data associated with the component, then you have to accept the fact that you do actually have an enormous trust relationship: with the machine and whomever administers/controls it (and that includes anyone who may have compromised it). This can hardly form the basis for a “zero trust” architecture – but what can be done about it?

Where Confidential Computing helps is by allowing isolation from the machine which is doing the execution. The component still needs to trust the CPU/firmware that’s providing the execution context – something needs to run the code, after all! – but we can shrink that number of trust relationships required significantly, and provide cryptographic assurances to base this relationship on (see Attestation, below).

Knowing that a component is isolated from another component allows that component to have assurances about how it will operate and also allows other components to build a trust relationship with that component with the assurance that it is acting with its own agency, rather than under that of a malicious actor.

Attestation

Attestation is the mechanism by which an entity can receive assurances that a Confidential Computing component has been correctly set up and can provide the expected properties of data confidentiality and integrity and code integrity (and in some cases, confidentiality). These assurances are bound to a particular Confidential Computing component (and the Trusted Execution Environment in which it executes) cryptographically, which allows for another property to be provided as well: a unique identity. If the attesting service bind this identity cryptographically to the Confidential Computing component by means of, for instance, a standard X.509 certificate, then this can provide one of the bases for trust relationships both to and from the component.

Establishing a “zero trust” relationship

These properties allow zero (or “explicit”) trust relationships to be established with components that are operating within a Confidential Computing environment, and to do so in ways which have previously been impossible. Using classical computing approaches, any component is at the mercy of the environment within which it is executing, meaning that any trust relationship that is established to it is equally with the environment – that is, the system that is providing its execution environment. This is far from a zero trust relationship, and is also very unlikely to be explicit!

In a Confidential Computing environment, components can have a small number of trust relationships which are explicitly noted (typically these include the attestation service, the CPU/firmware provider and the provider of the executing code), allowing for a much better-defined trust architecture. It may not be exactly “zero trust”, but it is, at least, heading towards “minimal trust”.

Functional vs non-functional requirements: a dangerous dichotomy?

Non-functional requirements are at least as important as functional requirements.

Imagine you’re thinking about an application or a system: how would you describe it? How would you explain what you want it to do? Most of us, I think, would start with statements like:

it should read JPEGs and output SVG images;
it should buy the stocks I tell it to when they reach a particular price;
it should take a customer’s credit history and decide whether to approve a loan application;
it should ensure that the car maintains a specific speed unless the driver presses the brakes or disengages the system;
it should level me up when I hit 10,000 gold bars mined;
~~it should take a prompt and output several hundred words about a security topic that sound as if I wrote them~~;
it should strike out any text which would give away its non-human status.

These are all requirements on the system. Specifically, they are functional requirements: they are things that an application or a system should do based on the state of inputs and outputs to which it is exposed.

Now let’s look at another set of requirements: requirements which are important to the correct operation of the system, but which aren’t core to what it does. These are non-functional requirements, in that they don’t describe the functions it performs, but its broader operation. Here are some examples:

it should not leak cryptographic keys if someone performs a side-channel attack on it;
it should be able to be deployed on premises or in the Cloud;
it should be able to manage 30,000 transactions a second;
it should not slow stop a user’s phone from receiving a phone call when it is running;
it should not fail catastrophically, but degrade its performance gracefully under high load;
~~it should be allowed to empty the bank accounts of its human masters~~;
it should recover from unexpected failures, such as its operator switching off the power in a panic on seeing unexpected financial transactions.

You may notice that some of the non-functional requirements are expressed as negatives – “it should not” – this is fairly common, and though functional requirements are sometimes expressed in the negative, it is more rare.

So now we come to the important question, and the core of this article: which of the above lists is more important? Is it the list with the functional requirements or the non-functional requirements? I think that there’s a fair case to be made for the latter: the non-functional requirements. Even if that’s not always the case, my (far too) many years of requirements gathering (and requirements meeting) lead me to note that while there may be a core set of functional requirements that typically are very important, it’s very easy for a design, architecture or specification to collect more and more functional requirements which pale into insignificance against some of the non-functional requirements that accrue.

But the problem is that non-functional requirements are almost always second-class citizens when compared to functional requirements on an application or system. They are are often collected after the functional requirements – if at all – and are often the first to be discarded when things get complicated. They also typically require input from people with skill sets outside the context of the application or system: for instance, it may not be obvious to the designer of a back-end banking application that they need to consider data-in-use protection (such as Confidential Computing) when they are collecting requirements of an application which will initially be run in an internal data centre.

Agile and DevOps methodologies can be relevant in these contexts, as well. On the one hand, ensuring that the people who will be operating an application or system is likely to focus their minds on some of the non-functional requirements which might impact them if they are not considered early enough. On the other hand, however, a model of development where the the key performance indicator is having something that runs means that the functional requirements are fore-grounded (“yes, you can log in – though we’re not actually checking passwords yet…”).

What’s the take-away from this article? It’s to consider non-functional requirements as at least as important as functional requirements. Alongside that, it’s vital to be aware that the people in charge of designing, architecting and specifying an application or system may not be best placed to collect all of the broader requirements that are, in fact, core to its safe and continuing (business critical) operation.

Enarx 0.3.0 (Chittorgarh Fort)

Write some applications and run them in an Enarx Keep.

I usually post on a Tuesday, but this week I wanted to wait for a significant event: the release Enarx v0.3.0, codenamed “Chittorgarh Fort”. This happened after I’d gone to bed, so I don’t feel too bad about failing to post on time. I announced Enarx nearly three years ago, in the article Announcing Enarx on the 7th May 2019. and it’s admittedly taken us a long time to get to where we are now. That’s largely because we wanted to do it right, and building up a community, creating a start-up and hiring folks with the appropriate skills is difficult. The design has evolved over time, but the core principles and core architecture are the same as when we announced the project.

You can find more information about v0.3.0 at the release page, but I thought I’d give a few details here and also briefly add to what’s on the Enarx blog about the release.

What’s Enarx?

Enarx is a deployment framework for running applications within Trusted Execution Environments (TEEs). We provide a WebAssembly runtime and – this is new functionality that we’ve started adding in this release – attestation so that you can be sure that your application is protected within a TEE instance.

What’s new in v0.3.0?

A fair amount of the development for this release has been in functionality which won’t be visible to most users, including a major rewrite of the TEE/host interface component that we call sallyport. You will, however, notice that TLS support has been added to network connections from applications within the Keep. This is transparent to the application, so “Where does the certificate come from?” I hear you ask. The answer to that is from the attestation service that’s also part of this release. We’ll be talking more about that in further releases and articles, but key to the approach we’re taking is that interactions with the service (we call it the “Steward”) is pretty much transparent to users and applications.

How can I get involved?

What can you do to get involved? Well, visit the Enarx website, look at the code and docs over at our github repositories (please star the project!), get involved in the chat. The very best thing you can do, having looked around, is to write some applications and run them in an Enarx Keep. And then tell us about your experience. If it worked first time, then wow! We’re still very much in development, but we want to amass a list of applications that are known to work within Enarx, so tell us about it. If it doesn’t work, then please also tell us about it, and have a look at our issues page to see if you’re the first person to run across this problem. If you’re not, then please add your experiences to an existing issue, but if you are, then create a new one.

Enarx isn’t production ready, but it’s absolutely ready for initial investigations (as shown by our interns, who created a set of demos for v0.2.0, curated and aided by our community manager Nick Vidal).

Why Chittorgarh Fort?

It’s worth having a look at the Wikipedia entry for the fort: it’s really something! We decided, when we started creating official releases, that we wanted to go with the fortification theme that Enarx has adopted (that’s why you deploy applications to Enarx Keeps – a keep is the safest part of a castle). We started with Alamo, then went to Balmoral Castle, and then to Chittorgarh Fort (we’re trying to go with alphabetically sequential examples as far as we can!). I suggested Chittorgarh Fort to reflect the global nature of our community, which happens to include a number of contributors from India.

Who was involved?

I liked the fact that the Enarx blog post mentioned the names of some (most?) of those involved, so I thought I’d copy the list of github account names from there, with sincere thanks:

@MikeCamel @npmccallum @haraldh @connorkuehl @lkatalin @mbestavros @wgwoods @axelsimon @ueno @ziyi-yan @ambaxter @squidboylan @blazebissar @michiboo @matt-ross16 @jyotsna-penumaka @steveeJ @greyspectrum @rvolosatovs @lilienbm @CyberEpsilon @kubkon @nickvidal @uudiin @zeenix @sagiegurari @platten @greyspectrum @bstrie @jarkkojs @definitelynobody @Deepansharora27 @mayankkumar2 @moksh-pathak

Rahultalreja11 at English Wikipedia, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

Cloud security asymmetry

We in the security world have to make people understand this issue.

My book, Trust in Computer Systems and the Cloud, is due out in the next few weeks, and I was wondering as I walked the dogs today (a key part of the day for thinking!) what the most important message in the book is. I did a bit of thinking and a bit of searching, and decided that the following two paragraphs expose the core thesis of the book. I’ll quote them below and then explain briefly why (the long explanation would require me to post most of the book here!). The paragraph is italicised in the book.

“A CSP [Cloud Service Provider] can have computational assurances that a tenant’s workloads cannot affect its hosts’ normal operation, but no such computational assurances are available to a tenant that a CSP’s hosts will not affect their workloads’ normal operation.

In other words, the tenant has to rely on commercial relationships for trust establishment, whereas the CSP can rely on both commercial relationships and computational techniques. Worse yet, the tenant has no way to monitor the actions of the CSP and its host machines to establish whether the confidentiality of its workloads has been compromised (though integrity compromise may be detectable in some situations): so even the “trust, but verify” approach is not available to them.”

What does this mean? There is, in cloud computing, a fundamental asymmetry: CSPs can protect themselves from you (their customer), but you can’t protect yourself from them.

Without Confidential Computing – the use of Trusted Execution Environments to protect your workloads – there are no technical measures that you can take which will stop Cloud Service Providers from looking into and/or altering not only your application, but also the data it is processing, storing and transmitting. CSPs can stop you from doing the same to them using standard virtualisation techniques, but those techniques provide you with no protection from a malicious or compromised host, or a malicious or compromised CSP.

I attended a conference recently attended by lots of people whose job it is to manage and process data for their customers. Many of them do so in the public cloud. And a scary number of them did not understand that all of this data is vulnerable, and that the only assurances they have are commercial and process-based.

We in the security world have to make people understand this issue, and realise that if they are looking after our data, they need to find ways to protect it with strong technical controls. These controls are few:

architectural: never deploy sensitive data to the public cloud, ever.
HSMs: use Hardware Security Modules. These are expensive, difficult to use and don’t scale, but they are appropriate for some sensitive data.
Confidential Computing: use Trusted Execution Environments (TEEs) to protect data and applications in use[1].

Given my interest – and my drive to write and publish my book – it will probably come as no surprise that this is something I care about: I’m co-founder of the Enarx Project (an open source Confidential Computing project) and co-founder and CEO of Profian (a start-up based on Enarx). But I’m not alone: the industry is waking up to the issue, and you can find lots more about the subject at the Confidential Computing Consortium‘s website (including a list of members of the consortium). If this matters to you – and if you’re an enterprise company who uses the cloud, it almost certainly already does, or will do so – then please do your research and consider joining as well. And my book is available for pre-order!

Win a copy of my book!

What’s better than excerpts? That’s right: the entire book.

As regular readers of this blog will know, I’ve got a book coming out with Wiley soon. It’s called “Trust in Computer Systems and the Cloud”, and the publisher’s blurb is available here. We’ve now got to the stage where we’ve completed not only the proof-reading for the main text, but also the front matter (acknowledgements, dedication, stuff like that), cover and “praise page”. I’d not heard the term before, but it’s where endorsements of the book go, and I’m very, very excited by the extremely kind comments from a variety of industry leaders which you’ll find quoted there and, in some cases, on the cover. You can find a copy of the cover (without endorsement) below.

Trust book front cover (without endorsement)

I’ve spent a lot of time on this book, and I’ve written a few articles about it, including providing a chapter index and summary to let you get a good idea of what it’s about. More than that, some of the articles here actually contain edited excerpts from the book.

What’s better than excerpts, though? That’s right: the entire book. Instead of an article today, however, I’m offering the opportunity to win a copy of the book. All you need to do is follow this blog (with email updates, as otherwise I can’t contact you), and when it’s published (soon, we hope – the March date should be beaten), I’ll choose one lucky follower to receive a copy.

No Wiley employees, please, but other than that, go for it, and I’ll endeavour to get you a copy as soon as I have any available. I’ll try to get it to you pretty much anywhere in the world, as well. So far, it’s only available in English, so apologies if you were hoping for an immediate copy in another language (hint: let me know, and I’ll lobby my publisher for a translation!).

Trust book – chapter index and summary

I thought it might be interesting to provide the chapter index and a brief summary of each chapter addresses.

In a previous article, I presented the publisher’s blurb for my upcoming book with Wiley, Trust in Computer Systems and the Cloud. I thought it might be interesting, this time around, to provide the chapter index of the book and to give a brief summary of what each chapter addresses.

While it’s possible to read many of the chapters on their own, I haved tried to maintain a logical progression of thought through the book, building on earlier concepts to provide a framework that can be used in the real world. It’s worth noting that the book is not about how humans trust – or don’t trust – computers (there’s a wealth of literature around this topic), but about how to consider the issue of trust between computing systems, or what we can say about assurances that computing systems can make, or can be made about them. This may sound complex, and it is – which is pretty much why I decided to write the book in the first place!

Introduction
- Why I think this is important, and how I came to the subject.
Chapter 1 – Why Trust?
- Trust as a concept, and why it’s important to security, organisations and risk management.
Chapter 2 – Humans and Trust
- Though the book is really about computing and trust, and not humans and trust, we need a grounding in how trust is considered, defined and talked about within the human realm if we are to look at it in our context.
Chapter 3 – Trust Operations and Alternatives
- What are the main things you might want to do around trust, how can we think about them, and what tools/operations are available to us?
Chapter 4 – Defining Trust in Computing
- In this chapter, we delve into the factors which are specific to trust in computing, comparing and contrasting them with the concepts in chapter 2 and looking at what we can and can’t take from the human world of trust.
Chapter 5 – The Importance of Systems
- Regular readers of this blog will be unsurprised that I’m interested in systems. This chapter examines why systems are important in computing and why we need to understand them before we can talk in detail about trust.
Chapter 6 – Blockchain and Trust
- This was initially not a separate chapter, but is an important – and often misunderstood or misrepresented – topic. Blockchains don’t exist or operate in a logical or computational vacuum, and this chapter looks at how trust is important to understanding how blockchains work (or don’t) in the real world.
Chapter 7 – The Importance of Time
- One of the important concepts introduced earlier in the book is the consideration of different contexts for trust, and none is more important to understand than time.
Chapter 8 – Systems and Trust
- Having introduced the importance of systems in chapter 5, we move to considering what it means to have establish a trust relationship from or to a system, and how the extent of what is considered part of the system is vital.
Chapter 9 – Open Source and Trust
- Another topc whose inclusion is unlikely to surprise regular readers of this blog, this chapter looks at various aspects of open source and how it relates to trust.
Chapter 10 – Trust, the Cloud, and the Edge
- Definitely a core chapter in the book, this addresses the complexities of trust in the modern computing environments of the public (and private) cloud and Edge networks.
Chapter 11 – Hardware, Trust, and Confidential Computing
- Confidential Computing is a growing and important area within computing, but to understand its strengths and weaknesses, there needs to be a solid theoretical underpinning of how to talk about trust. This chapter also covers areas such as TPMs and HSMs.
Chapter 12 – Trust Domains
- Trust domains are a concept that allow us to apply the lessons and frameworks we have discussed through the book to real-world situations at large scale. They also allow for modelling at the business level and for issues like risk management – introduced at the beginning of the book – to be considered more explicitly.
Chapter 13 – A World of Explicit Trust
- Final musings on what a trust-centric (or at least trust-inclusive) view of the world enables and hopes for future work in the field.
References
- List of works cited within the book.

Trust book preview

What it means to trust in the context of computer and network security

Just over two years ago, I agreed a contract with Wiley to write a book about trust in computing. It was a long road to get there, starting over twenty years ago, but what pushed me to commit to writing something was a conference I’d been to earlier in 2019 where there was quite a lot of discussion around “trust”, but no obvious underlying agreement about what was actually meant by the term. “Zero trust”, “trusted systems”, “trusted boot”, “trusted compute base” – all terms referencing trust, but with varying levels of definition, and differing understanding if what was being expected, by what components, and to what end.

I’ve spent a lot of time thinking about trust over my career and also have a major professional interest in security and cloud computing, specifically around Confidential Computing (see Confidential computing – the new HTTPS? and Enarx for everyone (a quest) for some starting points), and although the idea of a book wasn’t a simple one, I decided to go for it. This week, we should have the copy-editing stage complete (technical editing already done), with the final stage being proof-reading. This means that the book is close to down. I can’t share a definitive publication date yet, but things are getting there, and I’ve just discovered that the publisher’s blurb has made it onto Amazon. Here, then, is what you can expect.

Learn to analyze and measure risk by exploring the nature of trust and its application to cybersecurity

Trust in Computer Systems and the Cloud delivers an insightful and practical new take on what it means to trust in the context of computer and network security and the impact on the emerging field of Confidential Computing. Author Mike Bursell’s experience, ranging from Chief Security Architect at Red Hat to CEO at a Confidential Computing start-up grounds the reader in fundamental concepts of trust and related ideas before discussing the more sophisticated applications of these concepts to various areas in computing.

The book demonstrates in the importance of understanding and quantifying risk and draws on the social and computer sciences to explain hardware and software security, complex systems, and open source communities. It takes a detailed look at the impact of Confidential Computing on security, trust and risk and also describes the emerging concept of trust domains, which provide an alternative to standard layered security.

Foundational definitions of trust from sociology and other social sciences, how they evolved, and what modern concepts of trust mean to computer professionals
A comprehensive examination of the importance of systems, from open-source communities to HSMs, TPMs, and Confidential Computing with TEEs.
A thorough exploration of trust domains, including explorations of communities of practice, the centralization of control and policies, and monitoring

Perfect for security architects at the CISSP level or higher, Trust in Computer Systems and the Cloud is also an indispensable addition to the libraries of system architects, security system engineers, and master’s students in software architecture and security.

Does my TCB look big in this?

The smaller your TCB the less there is to attack, and that’s a good thing.

This isn’t the first article I’ve written about Trusted Compute Bases (TCBs), so if the concept is new to you, I suggest that you have a look at What’s a Trusted Compute Base? to get an idea of what I’ll be talking about here. In that article, I noted the importance of the size of the TCB: “what you want is a small, easily measurable and easily auditable TCB on which you can build the rest of your system – from which you can build a ‘chain of trust’ to the other parts of your system about which you care.” In this article, I want to take some time to discuss the importance of the size of a TCB, how we might measure it, and how difficult it can be to reduce the TCB size. Let’s look at all of those issues in order.

Size does matter

However you measure it – and we’ll get to that below – the size of the TCB matters for two reasons:

the larger the TCB is, the more bugs there are likely to be;
the larger the TCB is, the larger the attack surface.

The first of these is true of any system, and although there may be ways of reducing the number of bugs, proving the correctness of all or, more likely, part of the system, bugs are both tricky to remove and resilient – if you remove one, you may well be introducing another (or worse, several). Now, the kinds or bugs you have, and the number of them, can be reduced through a multitude of techniques, from language choice (choosing Rust over C/C++ to reduce memory allocation errors, for instance) to better specification and on to improved test coverage and fuzzing. In the end, however, the smaller the TCB, the less code (or hardware – we’re considering the broader system here, don’t forget), you have to trust, the less space there is for there to be bugs in it.

The concept of an attack surface is important, and, like TCBs, one I’ve introduced before (in What’s an attack surface?). Like bugs, there may be no absolute measure of the ratio of danger:attack surface, but the smaller your TCB, well, the less there is to attack, and that’s a good thing. As with bug reduction, there are number of techniques you may want to apply to reduce your attack surface, but the smaller it is, then, by definition, the fewer opportunities attackers have to try to compromise your system.

Measurement

Measuring the size of your TCB is really, really hard – or, maybe I should say that coming up with an absolute measure that you can compare to other TCBs is really, really hard. The problem is that there are so many measurements that you might take. The ones you care about are probably those that can be related to attack surface – but there are so many different attack vectors that might be relevant to a TCB that there are likely to be multiple attack surfaces. Let’s look at some of the possible measurements:

number of API methods
amount of data that can be passed across each API method
number of parameters that can be passed across each API method
number of open network sockets
number of open local (e.g. UNIX) sockets
number of files read from local storage
number of dynamically loaded libraries
number of DMA (Direct Memory Access) calls
number of lines of code
amount of compilation optimisation carried out
size of binary
size of executing code in memory
amount of memory shared with other processes
use of various caches (L1, L2, etc.)
number of syscalls made
number of strings visible using strings command or similar
number of cryptographic operations not subject to constant time checks

This is not meant to be an exhaustive list, but just to show the range of different areas in which vulnerabilities might appear. Designing your application to reduce one may increase another – one very simple example being an attempt to reduce the number of API calls exposed by increasing the number of parameters on each call, another being to reduce the size of the binary by using more dynamically linked libraries.

This leads us to an important point which I’m not going to address in detail in this article, but which is fundamental to understanding TCBs: that without a threat model, there’s actually very little point in considering what your TCB is.

Reducing the TCB size

We’ve just seen one of the main reasons that reducing your TCB size is difficult: it’s likely to involve trade-offs between different measures. If all you’re trying to do is produce competitive marketing material where you say “my TCB is smaller than yours”, then you’re likely to miss the point. The point of a TCB is to have a well-defined computing base which can protect against specific threats. This requires you to be clear about exactly what functionality requires that it be trusted, where it sits in the system, and how the other components in the system rely on it: what trust relationships they have. I was speaking to a colleague just yesterday who was relaying a story of software project who said, “we’ve reduced our TCB to this tiny component by designing it very carefully and checking how we implement it”, but who overlooked the fact that the rest of the stack – which contained a complete Linux distribution and applications – could be no more trusted than before. The threat model (if there was one – we didn’t get into details) seemed to assume that only the TCB would be attacked, which missed the point entirely: it just added another “turtle” to the stack, without actually fixing the problem that was presumably at issue: that of improving the security of the system.

Reducing the TCB by artificially defining what the TCB is to suit your capabilities or particular beliefs around what the TCB specifically should be protecting against is not only unhelpful but actively counter-productive. This is because it ignores the fact that a TCB is there to serve the needs of a broader system, and if it is considered in isolation, then it becomes irrelevant: what is it acting as a base for?

In conclusion, it’s all very well saying “we have a tiny TCB”, but you need to know what you’re protecting, from what, and how.

Dependencies and supply chains

A dependency on a component is one which that an application or component needs to work

Supply chain security is really, really hot right now. It’s something which folks in the “real world” of manufactured things have worried about for years – you might be surprised (and worried) how much effort aircraft operators need to pay to “counterfeit parts”, and I wish I hadn’t just searched online the words “counterfeit pharmaceutical” – but the world of software had a rude wake-up call recently with the Solarwinds hack (or crack, if you prefer). This isn’t the place to go over that: you’ll be able to find many, many articles if you search on that. In fact, many companies have been worrying about this for a while, but the change is that it’s an issue which is now out in the open, giving more leverage to those who want more transparency around what software they consume in their products or services.

When we in computing (particularly software) think about supply chains, we generally talk about “dependencies”, and I thought it might be useful to write a short article explaining the main dependency types.

What is a dependency?

A dependency on a component is one which that an application or component needs to work, and they are generally considered to come in two types:

build-time dependencies
run-time dependencies.

Let’s talk about those two types in turn.

Build-time dependencies

these are components which are required in order to build (typically compile and package) your application or library. For example, if I’m writing a program in Rust, I have a dependency on the compiler if I want to create an application. I’m actually likely to have many more run-time dependencies, however. How those dependencies are made visible to me will depend on the programming language and the environment that I’m building in.

Some languages, for instance, may have filesystem support built in, but others will require you to “import” one or more libraries in order to read and write files. Importing a library basically tells your build-time environment to look somewhere (local disk, online repository, etc.) for a library, and then bring it into the application, allowing its capabilities to be used. In some cases, you will be taking pre-built libraries, and in others, your build environment may insist on building them itself. Languages like Rust have clever environments which can look for new versions of a library, download it and compile it without your having to worry about it yourself (though you can if you want!).

To get back to your file system example, even if the language does come with built-in filesystem support, you may decide to import a different library – maybe you need some fancy distributed, sharded file system, for instance – from a different supplier. Other capabilities may not be provided by the language, or may be higher-level capabilities: JSON serialisation or HTTPS support, for instance. Whether that library is available in open source may have a large impact on your decision as to whether or not to use it.

Build-time dependencies, then, require you to have the pieces you need – either pre-built or in source code form – at the time that you’re building your application or library.

Run-time dependencies

Run-time dependencies, as the name implies, only come into play when you actually want to run your application. We can think of there being two types of run-time dependency:

service dependency – this may not be the official term, but think of an application which needs to write some data to a window on a monitor screen: in most cases, there’s already a display manager and a window manager running on the machine, so all the application needs to do is contact it, and communicate the right data over an API. Sometimes, the underlying operating system may need to start these managers first, but it’s not the application itself which is having to do that. These are local services, but remote services – accessing a database, for instance – operate in the same sort of way. As long as the application can contact and communicate with the database, it doesn’t need to do much itself. There’s still a dependency, and things can definitely go wrong, but it’s a weak coupling to an external application or service.
dynamic linking – this is where an application needs access to a library at run-time, but rather than having added it at build-time (“static linking”), it relies on the underlying operating system to provide a link to the library when it starts executing. This means that the application doesn’t need to be as large (it’s not “carrying” the functionality with it when it’s installed), but it does require that the version that the operating system provides is compatible with what’s expected, and does the right thing.

Conclusion

I’ve resisted the temptation to go into the impacts of these different types of dependency in terms of their impact on security. That’s a much longer article – or set of articles – but it’s worth considering where in the supply chain we consider these dependencies to live, and who controls them. Do I expect application developers to check every single language dependency, or just imported libraries? To what extent should application developers design in protections from malicious (or just poorly-written) dynamically-linked libraries? Where does the responsibility lie for service dependencies – particularly remote ones?

These are all complex questions: the supply chain is not a simple topic (partly because there is not just one supply chain, but many of them), and organisations need to think hard about how they work.