An Enarx milestone: binaries

Demoing the same binary in very different TEEs.

This week is Red Hat Summit, which is being held virtually for the first time because of the Covid-19 crisis. The lock-down has not affected the productivity of the Enarx team, however (at least not negatively), as we have a very exciting demo that we will be showing at Summit. This post should be published at 1100 EDT, 1500 BST, 1400 GMT on Tuesday, 2020-04-28, which is the time that the session which Nathaniel McCallum and I recorded will be released to the world. I hope to be able to link to that once it’s released to the world. But what will we be showing?

Well, to set the scene, and to discover a little more about the Enarx project, you might want to read these articles first (also available in Japanese – visit each article of a link):

Enarx, as you’ll discover, is about running workloads in TEEs (Trusted Execution environments), using WebAssembly, in what we call “Keeps”. It’s a mammoth job, particularly as we’re abstracting away the underlying processor architectures (currently two: Intel’s SGX and AMD’s SEV), so that you, the user, don’t need to worry about them: all you need to do is write and compile your application, then request that it be deployed. Enarx, then, has lots of moving parts, and one of the key tasks for us has been to start the work to abstract away the underlying processor architectures so that we can prepare the runtime layers on top. Here’s a general picture of the software layers, and how they sit on top of the hardware platforms:

What we’re announcing – and demoing – today is that we have an initial implementation of code to allow us to abstract away process-based and VM-based types of architecture (with examples for SGX and SEV), so that we can do this:

This seems deceptively simple, but what’s actually going on under the covers is rather more than is exposed in the picture above. The reality is more like this:

This gives more detail: the application that’s running on both architectures (SGX on the left, SEV on the right) is the very same ELF static-PIE binary. To be clear, this is not only the same source code, compiled for different platforms, but exactly the same binary, with the very same hash signature. What’s pretty astounding about this is that in order to make it run on both platforms, the engineering team has had to write two sets of seriously low-level code, including more than a little Assembly language, providing the “plumbing” to allow the binary to run on both.

This is a very big deal, because although we’ve only implemented a handful of syscalls on each platform – enough to make our simple binary run and print out a message – we now have a framework on which we know we can build. And what’s next? Well, we need to expand that framework so that we can then build the WebAssembly layers which will allow WebAssembly applications to run on top:

There’s a long way to go, but this milestone shows that we have an initial framework which we can improve, and on which we can build.

What’s next?

What’s exciting about this milestone from our point of view is that we think it puts Enarx at a stage where more people can join and take part. There’s still lots of low-level work to be done, but it’s going to be easier to split up now, and also to start some of the higher level work, too. Enarx is completely open source, and we do all of our design work in the open, along with our daily stand-ups. You’re welcome to browse our documentation, RFCs (mostly in draft at the moment), raise issues, and join our calls. You can find loads more information on the Enarx wiki: we look forward to your involvement in the project.

Last, and not least, I’d like to take a chance to note that we now have testing/CI/CD resources available for the project with both Intel SGX and AMD SEV systems available to us, all courtesy of Packet. This is amazingly generous, and we both thank them and encourage you to visit them and look at their offerings for yourself!

No security without an architecture

Your diagrams don’t need to be perfect. But they do need to be there.

I attended a virtual demo this week. It didn’t work, but none of us was stressed by that: it was an internal demo, and these things happen. Luckily, the members of the team presenting the demo had lots of information about what it would have shown us, and a particularly good architectural diagram to discuss. We’ve all been in the place where the demo doesn’t work, and all felt for the colleague who was presenting the slidedeck, and on whose screen a message popped up a few slides in, saying “Demo NO GO!” from one of her team members.

After apologies, she asked if we wanted to bail completely, or to discuss the information they had to hand. We opted for the latter – after all, most demos which aren’t foregrounding user experience components don’t show much beyond terminal windows that most of us could fake up in half an hour or so anyway. She answered a couple of questions, and then I piped up with one about security.

This article could have been about the failures in security in a project which was showing an early demo: another example of security being left till late (often too late) in the process, at which point it’s difficult and expensive to integrate. However, it’s not. It clear that thought had been given to specific aspects of security, both on the network (in transit) and in storage (at rest), and though there was probably room for improvement (and when isn’t there?), a team member messaged me more documentation during the call which allowed me to understand of the choices the team had made.

What this article is about is the fact that we were able to have a discussion at all. The slidedeck included an architecture diagram showing all of the main components, with arrows showing the direction of data flows. It was clear, colour-coded to show the provenance of the different components, which were sourced from external projects, which from internal, and which were new to this demo. The people on the call – all technical – were able to see at a glance what was going on, and the team lead, who was providing the description, had a clear explanation for the various flows. Her team members chipped in to answer specific questions or to provide more detail on particular points. This is how technical discussions should work, and there was one thing in particular which pleased me (beyond the fact that the project had thought about security at all!): that there was an architectural diagram to discuss.

There are not enough security experts in the world to go around, which means that not every project will have the opportunity to get every stage of their design pored over by a member of the security community. But when it’s time to share, a diagram is invaluable. I hate to think about the number of times I’ve been asked to look at project in order to give my thoughts about security aspects, only to find that all that’s available is a mix of code and component documentation, with no explanation of how it all fits together and, worse, no architecture diagram.

When you’re building a project, you and your team are often so into the nuts and bolts that you know how it all fits together, and can hold it in your head, or describe the key points to a colleague. The problem comes when someone needs to ask questions of a different type, or review the architecture and design from a different slant. A picture – an architectural diagram – is a great way to educate external parties (or new members of the project) in what’s going on at a technical level. It also has a number of extra benefits:

  • it forces you to think about whether everything can be described in this way;
  • it forces you to consider levels of abstraction, and what should be shown at what levels;
  • it can reveal assumptions about dependencies that weren’t previously clear;
  • it is helpful to show data flows between the various components
  • it allows for simpler conversations with people whose first language is not that of your main documentation.

To be clear, this isn’t just a security problem – the same can go for other non-functional requirements such as high-availability, data consistency, performance or resilience – but I’m a security guy, and this is how I experience the issue. I’m also aware that I have a very visual mind, and this is how I like to get my head around something new, but even for those who aren’t visually inclined, a diagram at least offers the opportunity to orient yourself and work out where you need to dive deeper into code or execution. I also believe that it’s next to impossible for anybody to consider all the security implications (or any of the higher-order emergent characteristics and qualities) of a system of any significant complexity without architectural diagrams. And that includes the people who designed the system, because no system exists on its own (or there’s no point to it), so you can’t hold all of those pieces in your head of any length of time.

I’ve written before about the book Building Evolutionary Architectures, which does a great job in helping projects think about managing requirements which can morph or change their priority, and which, unsurprisingly, makes much use of architectural diagrams. Enarx, a project with which I’m closely involved, has always had lots of diagrams, and I’m aware that there’s an overhead involved here, both in updating diagrams as designs change and in considering which abstractions to provide for different consumers of our documentation, but I truly believe that it’s worth it. Whenever we introduce new people to the project or give a demo, we ensure that we include at least one diagram – often more – and when we get questions at the end of a presentation, they are almost always preceded with a phrase such as, “could you please go back to the diagram on slide x?”.

I nearly published this article without adding another point: this is part of being “open”. I’m a strong open source advocate, but source code isn’t enough to make a successful project, or even, I would add, to be a truly open source project: your documentation should not just be available to everybody, but accessible to everyone. If you want to get people involved, then providing a way in is vital. But beyond that, I think we have a responsibility (and opportunity!) towards diversity within open source. Providing diagrams helps address four types of diversity (at least!):

  • people whose first language is not the same as that of your main documentation (noted above);
  • people who have problems reading lots of text (e.g. those with dyslexia);
  • people who think more visually than textually (like me!);
  • people who want to understand your project from different points of view (e.g. security, management, legal).

If you’ve ever visited a project on github (for instance), with the intention of understanding how it fits into a larger system, you’ll recognise the sigh of relief you experience when you find a diagram or two on (or easily reached from) the initial landing page.

And so I urge you to create diagrams, both for your benefit, and also for anyone who’s going to be looking at your project in the future. They will appreciate it (and so should you). Your diagrams don’t need to be perfect. But they do need to be there.

Enarx goes multi-platform

Now with added SGX!

Yesterday, Nathaniel McCallum and I presented a session “Confidential Computing and Enarx” at Open Source Summit Europe. As well as some new information on the architectural components for an Enarx deployment, we had a new demo. What’s exciting about this demo was that it shows off attestation and encryption on Intel’s SGX. Our initial work focussed on AMD’s SEV, so this is our first working multi-platform work flow. We’re very excited, and particularly as this week a number of the team will be attending the first face to face meetings of the Confidential Computing Consortium, at which we’ll be submitting Enarx as a project for contribution to the Consortium.

The demo had been the work of several people, but I’d like to call out Lily Sturmann in particular, who got things working late at night her time, with little time to spare.

What’s particularly important about this news is that SGX has a very different approach to providing a TEE compared with the other technology on which Enarx was previously concentrating, SEV. Whereas SEV provides a VM-based model for a TEE, SGX works at the process level. Each approach has different advantages and offers different challenges, and the very different models that they espouse mean that developers wishing to target TEEs have some tricky decisions to make about which to choose: the run-time models are so different that developing for both isn’t really an option. Add to that the significant differences in attestation models, and there’s no easy way to address more than one silicon platform at a time.

Which is where Enarx comes in. Enarx will provide platform independence both for attestation and run-time, on process-based TEEs (like SGX) and VM-based TEEs (like SEV). Our work on SEV and SGX is far from done, but also we plan to support more silicon platforms as they become available. On the attestation side (which we demoed yesterday), we’ll provide software to abstract away the different approaches. On the run-time side, we’ll provide a W3C standardised WebAssembly environment to allow you to choose at deployment time what host you want to execute your application on, rather than having to choose at development time where you’ll be running your code.

This article has sounded a little like a marketing pitch, for which I apologise. As one of the founders of the project, alongside Nathaniel, I’m passionate about Enarx, and would love you, the reader, to become passionate about it, too. Please visit enarx.io for more information – we’d love to tell you more about our passion.

What’s a Trusted Compute Base?

Tamper-evidence, auditability and measurability are three important properties.

A few months ago, in an article called “Turtles – and chains of trust“, I briefly mentioned Trusted Compute Bases, or TCBs, but then didn’t go any deeper.  I had a bit of a search across the articles on this blog, and realised that I’ve never gone into this topic in much detail, which feels like a mistake, so I’m going to do it now.

First of all, let’s think about computer systems.  When I talk about systems, I’m being both quite specific (see Systems security – why it matters) and quite broad (I don’t just mean computer that sits on your desk or in a data centre, but include phones, routers, aircraft navigation devices – pretty much anything that has a set of chips inside it).  There are surely some systems that you don’t rely on too much to do important things, but in most cases, you’re going to care if they go wrong, or, more relevant to this discussion, if they get compromised.  Even the most benign of systems – a smart light-bulb, for instance – can become a nightmare if compromised.  Even if you don’t particularly care whether you can continue to use it in the way it was intended, there are still worries about its misuse in the case of compromise:

  1. it may become a “jumping off point” for malicious attacks into your network or other systems;
  2. it may be used as part of a botnet, piggybacking on your network to attack other systems (leading to sanctions against your legitimate systems from outside);
  3. it may be used as part of a botnet, using up resources such as network bandwidth, storage or electricity (leading to resource constraints or increased charges).

For any systems dealing with sensitive data – anything from your messages to loved ones on your phone through intellectual property secrets for a manufacturing organisation through to National Security data for government department – these issues are compounded.  In order to protect your system, you can’t just say “this system is secure” (lovely as that would be).  What can you do to start making statement about the general security of a system?

The stack

Systems consist of multiple components, and modern computing systems are typically composed from multiple layers (one of my favourite xkcd comics, Stack, shows some of them).  What’s relevant from the point of view of this article is that, on the whole, the different layers of the stack start up – boot up – from the bottom upwards.  This means, following the “bottom turtle” rule (see the Turtles article referenced above), that we need to ensure that the bottom layer is as secure as possible.  In fact, in order to build a system in which we can have assurance that it will behave as expected and designed (in other words, a system in which we can have a trust relationship), we need to build a Trusted Compute Base.  This should have at least the following set of properties: tamper-evidence, auditability and measurability, all of which are related to each other.

Tamper-evidence

We want to know if the TCB – on which we are building everything else – has a problem.  Specifically, we need a set of layers or components that we are pretty sure have not been compromised, or which, if compromised, will be tamper-evident:

  • fail in expected ways,
  • refuse to start, or
  • flag that they have been compromised.

It turns out that this is not easy, and typically becomes more difficult as you move up the stack – partly because you’re adding more layers, and partly because those layers tend to get more complex.

Our TCB should have the properties listed above (around failure, refusing to start or compromise-flagging), and be as small as possible.  This seems the wrong way around: surely you would want to ensure that as much of your system was trusted as possible?  In fact, what you want is a small, easily measurable and easily auditable TCB on which you can build the rest of your system – from which you can build a “chain of trust” to the other parts of your system about which you care.  Auditability and measurability are the other two properties that you want in a TCB, and these two properties are why open source is a very useful tool in your toolkit when building a TCB.

Auditability (and open source)

Auditability means that you – or someone else who you trust to do the job – can look into the various components of the TCB and assure yourself that they have been written, compiled and  are executing properly.  As I explained in Of projects, products and (security) community, the person may not always be you, or even someone in your organisation, but if you’re using widely deployed open source software, the rest of the community can be doing that auditing for you, which is a win for you and – if you contribute your knowledge back into the community – for everybody else as well.

Auditability typically gets harder the further you go down the stack – partly because you’re getting closer and closer to bits – ones and zeros – and to actual electrons, and partly because there is very little truly open source hardware out there at the moment.  However, the more that we can see and audit of the TCB, the more confidence we can have in it as a building block for the rest of our system.

Measurability (and open source)

The other thing you want to be able to do is ensure that your TCB really is your TCB.  Tamper-evidence is related to this, but that’s a run-time property only (for software components, at least).  Being able to measure when you provision your system and then to check that what you originally loaded is still what you think it should be when you boot it is a very important property of a TCB.  If what you’re running is open source, you can check it yourself, against your own measurements and those of the community, and if changes are made – by you or others – those changes can be checked (as part of auditing) and then propagated through measurement checking to the rest of the community.  Equally important – and much more difficult – is run-time measurability.  This turns out to be very difficult to do, although there are some techniques emerging which are beginning to get traction – for now, we tend to rely on tamper-evidence, which is easier in hardware than software.

Summary

Trusted Compute Bases (TCBs) are a key concept in building systems that we hope will behave in ways we expect – or allow us to find out when they are not.  Tamper-evidence, auditability and measurability are three important properties that they should display, and it turns out that open source is an important factor in helping us ensure two of those.

 

 

 

Enarx for everyone (a quest)

In your backpack, the only tool that you have to protect you is Enarx…

You are stuck in a deep, dark wood, with spooky noises and roots that seem to move and trip you up.  Behind every tree malevolent eyes look out at you.  You look in your backpack and realise that the only tool that you have for your protection is Enarx, the trusty open source project given you by the wizened old person at the beginning of your quest.  Hard as you try, you can’t remember what it does, or how to use it.  You realise that now is that time to find out.

What do you do next?

  • If you are a business person, go to 1. Why I need Enarx to reduce business risk.
  • If you are an architect, go to 2. How I can use Enarx to protect sensitive data.
  • If you are a techy, go to 3. Tell me more about Enarx technology (I can take it).

1. Why I need Enarx to reduce business risk

You are the wise head upon which your business relies to consider and manage risk.  One of the problems that you run into is that you have sensitive data that needs to be protected.  Financial data, customer data, legal data, payroll data: it’s all at risk of compromise if it’s not adequately protected.  Who can you trust, however?  You want to be able to use public clouds, but the risks of keeping and processing information on systems which are not under your direct control are many and difficult to quantify.  Even your own systems are vulnerable to outdated patches, insider attacks or compromises: confidentiality is difficult to ensure, but vital to your business.

Enarx is a project which allows you to run applications in the public cloud, on your premises – or wherever else – with significantly reduced and better quantifiable risk.  It uses hardware-based security called “Trust Execution Environments” from CPU manufacturers, and cuts out many of the layers that can be compromised.  The only components that do need to be trusted are fully open source software, which means that they can be examined and audited by industry experts and your own teams.

Well done: you found out about Enarx.  Continue to 6. Well, what’s next?


2. How I can use Enarx to protect sensitive data

You are the expert architect who has to consider the best technologies and approaches for your organisation.  You worry about where best to deploy sensitive applications and data, given the number of layers in the stack that may have been compromised, and the number of entities – human and machine – that have the opportunity to peek into or mess with the integrity of your applications.  You can’t control the public cloud, nor know exactly what the stack it’s running is, but equally, the resources required to ensure that you can run sufficient numbers of hardened systems on premises are growing.

Enarx is an open source project which uses TEEs (Trusted Execution Environments), to allow you to run applications within “Keeps” on systems that you don’t trust.  Enarx manages the creation of these Keeps, providing cryptographic confidence that the Keeps are using valid CPU hardware and then encrypting and provisioning your applications and data to the Keep using one-time cryptographic keys.  Your applications run without any of the layers in the stack (e.g. hypervisor, kernel, user-space, middleware) being able to look into the Keep.  The Keep’s run-time can accept applications written in many different languages, including Rust, C, C++, C#, Go, Java, Python and Haskell.  It allows you to run on TEEs from various CPU manufacturers without having to worry about portability: Enarx manages that for you, along with attestation and deployment.

Well done: you found out about Enarx.  Continue to 6. Well, what’s next?


3. Tell me more about Enarx technology (I can take it)

You are a wily developer with technical skills beyond the ken of most of your peers.  A quick look at the github pages tells you more: Enarx is an open source project to allow you to deploy and applications within TEEs (Trusted Execution Environments).

  • If you’d like to learn about how to use Enarx, proceed to 4. I want to use Enarx.
  • If you’d like to learn about contributing to the Enarx project, proceed to 5. I want to contribute to Enarx.

Well done: you found out about Enarx.  Continue to 6. Well, what’s next?


4. I want to use Enarx

You learn good news: Enarx is designed to be easy to use!

If you want to run applications that process sensitive data, or which implement sensitive algorithms themselves, Enarx is for you.  Enarx is a deployment framework for applications, rather than a development framework.  What this means is that you don’t have to write to particular SDKs, or manage the tricky attestation steps required to use TEEs.  You write your application in your favourite language, and as long as it has WebAssembly as a compile target, it should run within an Enarx “Keep”.  Enarx even manages portability across hardware platforms, so you don’t need to worry about that, either.  It’s all open source, so you can look at it yourself, audit it, or even contribute (if you’re interested in that, you might want to proceed to 5. I want to contribute to Enarx).

Well done: you found out about Enarx.  Continue to 6. Well, what’s next?


5. I want to contribute to Enarx

Enarx is an open source project (under the Apache 2.0 licence), and we welcome contributions, whether you are a developer, tester, documentation guru or other enthusiastic bod with an interest in providing a way for the rest of the world to up the security level of the applications they’re running with minimal effort.  There are various components to Enarx, including attestation, hypervisor work, uni-kernel and WebAssembly run-time pieces.  We want to provide a simple and flexible framework to allow developers and operations folks to deploy applications to TEEs on any supported platform without recompilation, having to choose an obscure language or write to a particular SDK.  Please have a look around our github site and get in touch if you’re in a position to contribute.

Well done: you found out about Enarx.  Continue to 6. Well, what’s next?


6. Well, what’s next?

You now know enough to understand how Enarx can help you: well done!  At time of writing, Enarx is still in development, but we’re working hard to make it available to all.

We’ve known for a long time that we need encryption for data at rest and in transit: Enarx helps you do encryption for data in use.

For more information, you may wish to visit:

Immutability: my favourite superpower

As a security guy, I approve of defence in depth.

I’m a recent but dedicated convert to Silverblue, which I run on my main home laptop and which I’ll be putting onto my work laptop when I’m due a hardware upgrade in a few months’ time.  I wrote an article about Silverblue over at Enable Sysadmin, and over the weekend, I moved the laptop that one of my kids has over to it as well.  You can learn more about Silverblue over at the main Silverblue site, but in terms of usability, look and feel, it’s basically a version of Fedora.  There’s one key difference, however, which is that the operating system is mounted read-only, meaning that it’s immutable.

What does “immutable” mean?  It means that it can’t be changed.  To be more accurate, in a software context, it generally means that something can’t be changed during run-time.

Important digression – constant immutability

I realised as I wrote that final sentence that it might be a little misleading.  Many  programming languages have the concept of “constants”.  A constant is a variable (or set, or data structure) which is constant – that is, not variable.  You can assign a value to a constant, and generally expect it not to change.  But – and this depends on the language you are using – it may be that the constant is not immutable.  This seems to go against common sense[1], but that’s just the way that some languages are designed.  The bottom line is this: if you have a variable that you intend to be immutable, check the syntax of the programming language you’re using and take any specific steps needed to maintain that immutability if required.

Operating System immutability

In Silverblue’s case, it’s the operating system that’s immutable.  You install applications in containers (of which more later), using Flatpak, rather than onto the root filesystem.  This means not only that the installation of applications is isolated from the core filesystem, but also that the ability for malicious applications to compromise your system is significantly reduced.  It’s not impossible[2], but the risk is significantly lower.

How do you update your system, then?  Well, what you do is create a new boot image which includes any updated packages that are needed, and when you’re ready, you boot into that.  Silverblue provides simple tools to do this: it’s arguably less hassle than the standard way of upgrading your system.  This approach also makes it very easy to maintain different versions of an operating system, or installations with different sets of packages.  If you need to test an application in a particular environment, you boot into the image that reflects that environment, and do the testing.  Another environment?  Another image.

We’re more interested in the security properties that this offers us, however.  Not only is it very difficult to compromise the core operating system as a standard user[3], but you are always operating in a known environment, and knowability is very much a desirable property for security, as you can test, monitor and perform forensic analysis from a known configuration.  From a security point of view (let alone what other benefits it delivers), immutability is definitely an asset in an operating system.

Container immutability

This isn’t the place to describe containers (also known as “Linux containers” or, less frequently or accurately these days, “Docker containers) in detail, but they are basically collections of software that you create as images and then run workloads on a host server (sometimes known as a “pod”).  One of the great things about containers is that they’re generally very fast to spin up (provision and execute) from an image, and another is that the format of that image – the packaging format – is well-defined, so it’s easy to create the images themselves.

From our point of view, however, what’s great about containers is that you can choose to use them immutably.  In fact, that’s the way they’re generally used: using mutable containers is generally considered an anti-pattern.  The standard (and “correct”) way to use containers is to bundle each application component and required dependencies into a well-defined (and hopefully small) container, and deploy that as required.  The way that containers are designed doesn’t mean that you can’t change any of the software within the running container, but the way that they run discourages you from doing that, which is good, as you definitely shouldn’t.  Remember: immutable software gives better knowability, and improves your resistance to run-time compromise.  Instead, given how lightweight containers are, you should design your application in such a way that if you need to, you can just kill the container instance and replace it with an instance from an updated image.

This brings us to two of the reasons that you should never run containers with root privilege:

  • there’s a temptation for legitimate users to use that privilege to update software in a running container, reducing knowability, and possibly introducing unexpected behaviour;
  • there are many more opportunities for compromise if a malicious actor – human or automated – can change the underlying software in the container.

Double immutability with Silverblue

I mentioned above that Silverblue runs applications in containers.  This means that you have two levels of security provided as default when you run applications on a Silverblue system:

  1. the operating system immutability;
  2. the container immutability.

As a security guy, I approve of defence in depth, and this is a classic example of that property.  I also like the fact that I can control what I’m running – and what versions – with a great deal more ease than if I were on a standard operating system.


1 – though, to be fair, the phrases “programming language” and “common sense” are rarely used positively in the same sentence in my experience.

2 – we generally try to avoid the word “impossible” when describing attacks or vulnerabilities in security.

3 – as with many security issues, once you have sudo or root access, the situation is significantly degraded.

Building Evolutionary Architectures – for security and for open source

Consider the fitness functions, state them upfront, have regular review.

Ford, N., Parsons, R. & Kua, P. (2017) Building Evolution Architectures: Support Constant Change. Sebastapol, CA: O’Reilly Media.

https://www.oreilly.com/library/view/building-evolutionary-architectures/9781491986356/

This is my first book review on this blog, I think, and although I don’t plan to make a habit of it, I really like this book, and the approach it describes, so I wanted to write about it.  Initially, this article was simply a review of the book, but as I got into it, I realised that I wanted to talk about how the approach it describes is applicable to a couple of different groups (security folks and open source projects), and so I’ve gone with it.

How, then, did I come across the book?  I was attending a conference a few months ago (DeveloperWeek San Diego), and decided to go to one of the sessions because it looked interesting.  The speaker was Dr Rebecca Parsons, and I liked what she was talking about so much that I ordered this book, whose subject was the topic of her talk, to arrive at home by the time I would return a couple of days later.

Building Evolutionary Architectures is not a book about security, but it deals with security as one application of its approach, and very convincingly.  The central issue that the authors – all employees of Thoughtworks – identifies is, simplified, that although we’re good at creating features for applications, we’re less good at creating, and then maintaining, broader properties of systems. This problem is compounded, they suggest, by the fast and ever-changing nature of modern development practices, where “enterprise architects can no longer rely on static planning”.

The alternative that they propose is to consider “fitness functions”, “objectives you want your architecture to exhibit or move towards”.  Crucially, these are properties of the architecture – or system – rather than features or specific functionality.  Tests should be created to monitor the specific functions, but they won’t be your standard unit tests, nor will they necessarily be “point in time” tests.  Instead, they will measure a variety of issues, possibly over a period of time, to let you know whether your system is meeting the particular fitness functions you are measuring.  There’s a lot of discussion of how to measure these fitness functions, but I would have liked even more: from my point of view, it was one of the most valuable topics covered.

Frankly, the above might be enough to recommend the book, but there’s more.  They advocate strongly for creating incremental change to meet your requirements (gradual, rather than major changes) and “evolvable architectures”, encouraging you to realise that:

  1. you may not meet all your fitness functions at the beginning;
  2. applications which may have met the fitness functions at one point may cease to meet them later on, for various reasons;
  3. your architecture is likely to change over time;
  4. your requirements, and therefore the priority that you give to each fitness function, will change over time;
  5. that even if your fitness functions remain the same, the ways in which you need to monitor them may change.

All of these are, in my view, extremely useful insights for anybody designing and building a system: combining them with architectural thinking is even more valuable.

As is standard for modern O’Reilly books, there are examples throughout, including a worked fake consultancy journey of a particular company with specific needs, leading you through some of the practices in the book.  At times, this felt a little contrived, but the mechanism is generally helpful.  There were times when the book seemed to stray from its core approach – which is architectural, as per the title – into explanations through pseudo code, but these support one of the useful aspects of the book, which is giving examples of what architectures are more or less suited to the principles expounded in the more theoretical parts.  Some readers may feel more at home with the theoretical, others with the more example-based approach (I lean towards the former), but all in all, it seems like an appropriate balance.  Relating these to the impact of “architectural coupling” was particularly helpful, in my view.

There is a useful grounding in some of the advice in Conway’s Law (“Organizations [sic] which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”) which led me to wonder how we could model open source projects – and their architectures – based on this perspective.  There are also (as is also standard these days) patterns and anti-patterns: I would generally consider these a useful part of any book on design and architecture.

Why is this a book for security folks?

The most important thing about this book, from my point of view as a security systems architect, is that it isn’t about security.  Security is mentioned, but is not considered core enough to the book to merit a mention in the appendix.  The point, though, is that the security of a system – an embodiment of an architecture – is a perfect example of a fitness function.  Taking this as a starting point for a project will help you do two things:

  • avoid focussing on features and functionality, and look at the bigger picture;
  • consider what you really need from security in the system, and how that translates into issues such as the security posture to be adopted, and the measurements you will take to validate it through the lifecycle.

Possibly even more important than those two points is that it will force you to consider the priority of security in relation to other fitness functions (resilience, maybe, or ease of use?) and how the relative priorities will – and should – change over time.  A realisation that we don’t live in a bubble, and that our priorities are not always that same as those of other stakeholders in a project, is always useful.

Why is this a book for open source folks?

Very often – and for quite understandable and forgiveable reasons – the architectures of open source projects grow organically at first, needing major overhauls and refactoring at various stages of their lifecycles.  This is not to say that this doesn’t happen in proprietary software projects as well, of course, but the sometimes frequent changes in open source projects’ emphasis and requirements, the ebb and flow of contributors and contributions and the sometimes, um, reduced levels of documentation aimed at end users can mean that features are significantly prioritised over what we could think of as the core vision of the project.  One way to remedy this would be to consider the appropriate fitness functions of the project, to state them upfront, and to have a regular cadence of review by the community, to ensure that they are:

  • still relevant;
  • correctly prioritised at this stage in the project;
  • actually being met.

If any of the above come into question, it’s a good time to consider a wider review by the community, and maybe a refactoring or partial redesign of the project.

Open source projects have – quite rightly – various different models of use and intended users.  One of the happenstances that can negatively affect a project is when it is identified as a possible fit for a use case for which it was not originally intended.  Academic software which is designed for accuracy over performance might not be a good fit for corporate research, for instance, in the same way that a project aimed at home users which prioritises minimal computing resources might not be appropriate for a high-availability enterprise roll-out.  One of the ways of making this clear is by being very clear up-front about the fitness functions that you expect your project to meet – and, vice versa, about the fitness functions you are looking to fulfil when you are looking to select a project.  It is easy to focus on features and functionality, and to overlook the more non-functional aspects of a system, and fitness functions allow us to make some informed choices about how to balance these decisions.

Oh, how I love my TEE (or do I?)

Trusted Execution Environments use chip-level instructions to allow you to create enclaves of higher security

I realised just recently that I’ve not written yet about Trusted Execution Environments (TEEs) on this blog.  This is a surprise, honestly, because TEEs are fascinating, and I spend quite a lot of my professional time thinking – and sometimes worrying – about them.  So what, you may ask, is a TEE?

Let’s look at one of the key use cases first, and then get to what a Trusted Execution Environment is.  A good place to start it the “Cloud”, which, as we all know, is just somebody else’s computer.  What this means is that if you’re running an application (let’s call it a “workload”) in the Cloud – AWS, Azure, whatever – then what you’re doing is trusting somebody else to take the constituent parts of that workload – its code and its data – and run them on their computer.  “Yay”, you may be thinking, “that means that I don’t have to run it in my computer: it’s all good.”  I’m going to take issue with the “all good” bit of that statement.  The problem is that the company – or people within that company – who run your workload on their computer (let’s call it a “host”) can, if they so wish, look inside it, change it, and stop it running.  In other words, they can break all three classic “CIA” properties of security: confidentiality (by looking inside it); integrity (by changing it); and availability (by stopping it running).  This is because the way that workloads run on hosts – whether in hardware-mediated virtual machines, within containers or on bare-metal – all allow somebody with sufficient privilege on that machine to do all of the bad things I’ve just mentioned.

And these are bad things.  We don’t tend to care about them too much as individuals – because the amount of value a cloud provider would get from bothering to look at our information is low – but as businesses, we really should be worried.

I’m afraid that the problem doesn’t go away if you run your systems internally.  Remember that anybody with sufficient access to hosts can look inside and tamper with your workloads?  Well, are you happy that you sysadmins should all have access to your financial results?  Merger and acquisition details?  Pay roll?  Because if you have this kind of data running on your machines on your own premises, then they do have access to all of those.

Now, there are a number of controls that you can put in place to help with this – not least background checks and Acceptable Use Policies – but TEEs aim to solve this problem with technology.  Actually, they only really aim to solve the confidentiality and integrity pieces, so we’ll just have to assume for now that you’re going to be in a position to notice if your sales order process fails to run due to malicious activity (for instance).  Trusted Execution Environments use chip-level instructions to allow you to create enclaves of higher security where processes can execute (and data can be processed) in ways that mean that even privileged users of the host cannot attack their confidentiality or integrity.  To get a little bit technical, these enclaves are memory pages with particular controls on them such that they are always encrypted except when they are actually being processed by the chip.

The two best-known TEE implementations so far are Intel’s SGX and AMD’s SEV (though other silicon vendors are beginning to talk about their alternatives).  Both Intel and AMD are aiming to put these into server hardware and create an ecosystem around their version to make it easy for people to run workloads (or components of workloads) within them.  And the security community is doing what it normally does (and, to be clear, absolutely should be doing), and looking for vulnerabilities in the implementation.  So far, most of the vulnerabilities that have been identified are within Intel’s SGX – though I’m not in a position to say whether that’s because the design and implementation is weaker, or just because the researchers have concentrated on the market leader in terms of server hardware.  It looks like we need to go through a cycle or two of the technologies before the industry is convinced that we have a working design and implementation that provides the levels of security that are worth deploying.  There’s also work to be done to provide sufficiently high quality open source software and drivers to support TEEs for wide deployment.

Despite the hopes of the silicon vendors, it may be some time before TEEs are in common usage, but people are beginning to sit up and take notice, partly because there’s so much interest in moving workloads to the Cloud, but still serious concerns about the security of your sensitive processes and data when they’re there.  This has got to be a good thing, and I think it’s really worth considering how you might start designing and deploying workloads in new ways once TEEs actually do become commonly available.

The “invisible” trade-off? Security.

“For twenty years, people have been leaving security till last.”

Colleague (in a meeting): “For twenty years, people have been leaving security till last.”

Me (in response): “You could have left out those last two words.”

This article will be a short one, and it’s a plea.  It’s also not aimed at my regular readership, because if you’re part of my regular readership, then you don’t need telling.  Many of the articles on this blog, however, are written with the express intention of meeting two criteria:

  1. they should be technical credible[1].
  2. you should be able to show them to your parents or to your manager[2].

I suspect that it’s your manager, this time round, who I’ll be targeting, but I don’t want to make assumptions about your parents’ roles or influence, so let’s leave it open.

The issue I want to address this week is the impact of not placing security firmly at the beginning, middle and end of any system or application design process.  As we all know, security isn’t something that you can bolt onto the end of a project and hope that you’ll be OK.  Equally, if you think about it only at the beginning, you’ll find that by the end, your requirements, use cases, infrastructure or personae will have changed[3], and what you planned at the beginning is no longer fit for purpose.  After all, if you know that your functional requirements will change (and everybody knows this), then why would your non-functional requirements be subject to the same drift?

The problem is that security, being a non-functional requirement[4], doesn’t get the up-front visibility that it needs.  And, because it’s difficult to do well, and it’s often the responsibility of a non-core team member “flown in” as a consultant or expert for a small percentage design meetings, security is the area that it’s easy to decide to let slide a bit.  Or a lot.  Or completely.

If there’s a trade-off around features, functionality or resource location, it’s likely to be security, and often, nobody even raises the point that there has been a trade-off: it’s completely invisible (this is one of the reasons Why I love technical debt).  This is also the reason that whenever I look at a system, I try to think “what were the decisions made about security?”, because, too often, no decisions were made about security at all.

So, if you’re a manager[6], and you’re involved with designing a system or application, don’t let security be the invisible trade-off.  I’m not saying that it needs to be the be-all and end-all of the project, but at least ensure that you think about it.  Thank you.


1 – they should be accurate, to be honest, but I also try not to dive deeper into technical topics than is absolutely required for context.

2 – to be clear, this isn’t about making them work- and parent-safe, but about presenting the topics in a manner that is approachable by non-experts.

3 – or, equally likely, all of them.

4 – I don’t mean that security doesn’t function correctly[5], but rather that it’s not one of the key functions of the system or application that’s being designed.

5 – though, now you mention it…

6 – or parent – see above.

Is homogeneity bad for security?

Can it really be good for security to have such a small number of systems out there?

For the last three years, I’ve attended the Linux Security Summit (though it’s not solely about Linux, actually), and that’s where I am for the first two days of this week – the next three days are taken up with the Open Source Summit.  This year, both are being run both in North America and in Europe – and there was a version of the Open Source Summit in Asia, too.  This is all good, of course: the more people, and the more diversity we have in the community, the stronger we’ll be.

The question of diversity came up at the Linux Security Summit today, but not in the way you might necessarily expect.  As with most of the industry, this very technical conference (there’s a very strong Linux kernel developer bias) is very under-represented by women, ethnic minorities and people with disabilities.  It’s a pity, and something we need to address, but when a question came up after someone’s talk, it wasn’t diversity of people’s background that was being questioned, but of the systems we deploy around the world.

The question was asked of a panel who were talking about open firmware and how making it open source will (hopefully) increase the security of the system.  We’d already heard how most systems – laptops, servers, desktops and beyond – come with a range of different pieces of firmware from a variety of different vendors.  And when we talk about a variety, this can easily hit over 100 different pieces of firmware per system.  How are you supposed to trust a system with some many different pieces?  And, as one of the panel members pointed out, many of the vendors are quite open about the fact that they don’t see themselves as security experts, and are actually asking the members of open source projects to design APIs, make recommendations about design, etc..

This self-knowledge is clearly a good thing, and the main focus of the panel’s efforts has been to try to define a small core of well-understood and better designed elements that can be deployed in a more trusted manner.   The question that was asked from the audience was in response to this effort, and seemed to me to be a very fair one.  It was (to paraphrase slightly): “Can it really be good for security to have such a small number of systems out there?”  The argument – and it’s a good one in general – is that if you have a small number of designs which are deployed across the vast majority of installations, then there is a real danger that a small number of vulnerabilities can impact on a large percentage of that install base.

It’s a similar problem in the natural world: a population with a restricted genetic pool is at risk from a successful attacker: a virus or fungus, for instance, which can attack many individuals due to their similar genetic make-up.

In principle, I would love to see more diversity of design within computing, and particular security, but there are two issues with this:

  1. management: there is a real cost to managing multiple different implementations and products, so organisations prefer to have a smaller number of designs, reducing the number of the tools to manage them, and the number of people required to be trained.
  2. scarcity of resources: there is a scarcity of resources within IT security.  There just aren’t enough security experts around to design good security into systems, to support them and then to respond to attacks as vulnerabilities are found and exploited.

To the first issue, I don’t see many easy answers, but to the second, there are three responses:

  1. find ways to scale the impact of your resources: if you open source your code, then the number of expert resources available to work on it expands enormously.  I wrote about this a couple of years ago in Disbelieving the many eyes hypothesis.  If your code is proprietary, then the number of experts you can leverage is small: if it is open source, you have access to almost the entire worldwide pool of experts.
  2. be able to respond quickly: if attacks on systems are found, and vulnerabilities identified, then the ability to move quickly to remedy them allows you to mitigate significantly the impact on the installation base.
  3. design in defence in depth: rather than relying on one defence to an attack or type of attack, try to design your deployment in such a way that you have layers of defence. This means that you have some time to fix a problem that arises before catastrophic failure affects your deployment.

I’m hesitant to overplay the biological analogy, but the second and third of these seem quite similar to defences we see in nature.  The equivalent to quick response is to have multiple generations in a short time, giving a species the opportunity to develop immunity to a particular attack, and defence in depth is a typical defence mechanism in nature – think of human’s ability to recognise bad meat by its smell, taste its “off-ness” and then vomit it up if swallowed.  I’m not quite sure how this particular analogy would map to the world of IT security (though some of the practices you see in the industry can turn your stomach), but while we wait to have a bigger – and more diverse pool of security experts, let’s keep being open source, let’s keep responding quickly, and let’s make sure that we design for defence in depth.