Of different types of trust

What is doing the trusting, and what does the word actually even mean?

As you may have noticed if you regularly read this blog, it’s not uncommon for me to talk about trust.  In fact, one of the earliest articles that I posted – over two years ago, now – was entitled “What is trust?”.  I started thinking about this topic seriously nearly twenty years ago, specifically when thinking about peer to peer systems, and how they might establish trust relationships, and my interest has continued since, with a particular fillip during my time on the Security Working Group for ETSI NFV[1], where we had some very particular issues that we wanted to explore and I had the opportunity to follow some very interesting lines of thought.  More recently, I introduced Enarx, whose main starting point is that we want to reduce the number of trust relationships that you need to manage when you deploy software.

It was around the time of the announcement that I realised quite how much of my working life I’ve spent thinking and talking about trust;

  • how rarely most other people seem to have done the same;
  • how little literature there is on the subject; and
  • how interested people often are to talk about it when it comes up in a professional setting.

I’m going to clarify the middle bullet point in a minute, but let me get to my point first, which is this: I want to do a lot more talking about trust via this blog, with the possible intention of writing a book[2] on the subject.

Here’s the problem, though.  When you use the word trust, people think that they know what you mean.  It turns out that the almost never do.  Let’s try to tease out some of the reasons for that by starting with four fairly innocuously simple-looking statements:

  1. I trust my brother and my sister.
  2. I trust my bank.
  3. My bank trusts its IT systems.
  4. My bank’s IT systems trust each other.

When you make four statements like this, it quickly becomes clear that something different is going on in each case.  I stand by my definition of trust and the three corollaries, as expressed in “What is trust?”.  I’ll restate them here in case you can’t be bothered to follow the link:

  • “Trust is the assurance that one entity holds that another will perform particular actions according to a specific expectation.”
  • My first corollary: “Trust is always contextual.”
  • My second corollary:” One of the contexts for trust is always time”.
  • My third corollary: “Trust relationships are not symmetrical.”

These all hold true for each of the statements above – although they may not be self-evident in the rather bald way that I’ve put them.  What’s more germane to the point I want to make today, however, and hopefully obvious to you, dear reader[4], is that the word “trust” signifies something very different in each of the four statements.

  • Case 1 – my trusting my brother and sister.  This is about trust between individual humans – specifically my trust relationship to my brother, and my trust relationship to my sister.
  • Case 2 – my trusting my bank.  This is about trust between an individual and an organisation: specifically a legal entity with particular services and structure.
  • Case 3 – the bank trusting its IT systems.  This is about an organisation trusting IT systems, and it suddenly feels like we’ve moved into a very different place from the initial two cases.  I would argue that there’s a huge difference between the first and second case as well, actually, but we are often lulled into false sense of equivalence because when we interact with a bank, it’s staffed by people, and also has many of the legal protections afforded to an individual[5]. There are still humans in this case, though, in that one assumes that it is the intention of certain humans who represent the bank to have a trust relationship with certain IT systems.
  • Case 4 – the IT systems trusting each other.  We’re really not in Kansas anymore with this statement[6].  There are no humans involved in this set of trust relationships, unless you’re attributing agency to specific systems, and if so, which? What, then, is doing the trusting, and what does the word actually even mean?

It’s clear, then, that we can’t just apply the same word, “trust” to all of these different contexts and assume that it means the same thing in each case.  We need to differentiate between them.

I stated, above, that I intended to clarify my statement about the lack of literature around trust.  Actually, there’s lots and lots of literature around trust, but it deals almost exclusively with cases 1 and 2 above.  This is all well and good, but we spend so much time talking about trust with regards to systems (IT or computer systems) that we deserve, as a community, some clarity about what we mean, what assumptions we’re making, and what the ramifications of those assumptions are.

That, then, is my mission.  It’s certainly not going to be the only thing that I write about on this blog, but when I do write about trust, I’m going to try to set out my stall and add some better definition and clarification to what I – and we – are talking about.


0 – apropos of nothing in particular, I often use pixabay for my images.  This is one of the suggestions if you search on “trust”, but what exactly is going on here?  The child is trusting the squirrel thing to do what?  Not eat its nose?  Not stick its claws up its left nostril?  I mean, really?

1 – ETSI is a telco standards body, NFV is “Network Function Virtualisation”.

2 – which probably won’t just consist of a whole bunch of these articles in a random order, with the footnotes taken out[3].

3 – because, if nothing else, you know that I’m bound to keep the footnotes in.

4 – I always hope that there’s actually more than one of you, but maybe it’s just me, the solipsist, writing for a world conjured by my own brain.

5 – or it may do, depending on your jurisdiction.

6 – I think I’ve only been to Kansas once, actually.

Are my messages safe? No, but…

“Are any of these messaging services secure?”

Today brought another story about insecurity of a messenger app, and by a brilliant coincidence, I’m listening to E.L.O.’s “Secret Messages” as I start to compose this post. This article isn’t, however, about my closet 70s musical tastes[1], but about the messages you send from your mobile phone, tablet or computer to friends, families and colleagues, and how secure they are.

There are loads of options out there for messaging services, with some of the better-known including WhatsApp, Facebook Messenger, Google Chat, Signal and Telegram. Then there’s good old SMS. First question first: do I use any of these myself? Absolutely. I also indulge in Facebook, LinkedIn and Twitter. Do I trust these services? Let’s get back to this question later.

A more pressing question might be: “are any of these messaging services secure?” It turns out that this is a really simple question to answer: of course they’re not. No service is “secure”: it’s a key principle of IT security that there is no “secure”. This may sound like a glib – and frankly unhelpful – answer, but it’s not supposed to be. Once you accept that there is no perfectly secure system, you’re forced to consider what you are trying to achieve, and what risks you’re willing to take. This is a recurring theme of this blog, so regular readers shouldn’t be surprised.

Most of the popular messaging services can be thought of as consisting of at least seven components. Let’s assume that Alice is sending a message from her phone to Bob’s phone. Here’s what the various components might look like:

  1. Alice’s messenger app
  2. Alice’s phone
  3. Communications channel Alice -> server
  4. Server
  5. Communications channel server -> Bob
  6. Bob’s phone
  7. Bob’s messenger app

Each of these is a possible attack surface: combined, they make up the attack surface for what we can think of as the Alice <-> Bob and messaging system.

Let’s start in the middle, with the server. For Alice and Bob to be happy with the security of the system for their purposes, they must be happy that this server has sufficiently secure to cope with whatever risks they need to address. So, it may be that they trust that the server (which will be run, ultimately, by fallible and possibly subornable humans who also are subject to legal jurisdiction(s)) is not vulnerable. Not vulnerable to whom? Hacktivists? Criminal gangs? Commercial competitors? State actors? Legal warrants from the server’s jurisdiction? Legal warrants from Alice or Bob’s jurisdiction(s)? The likelihood of successful defence against each of these varies, and the risk posed to Alice and Bob by each is also different, and needs to be assessed, even if that assessment is “we can ignore this”.

Each of the other components is subject to similar questions. For the communication channels, we will assume that they’re encrypted, but we have to be sure that the cryptography and cryptographic protocols have been correctly implemented, and that all keys are appropriately protected by all parties. The messaging apps must be up to date, well designed and well implemented. Obviously, if they’re open source, you have a much, much better chance of being sure of the security of both software (never, ever use cryptography or protocols which have not been not open sourced and peer reviewed: just don’t). The phones in which the software is running must also be uncompromised – not to mention protected by Alice and Bob from physical tampering and delivered new to them from the manufacturer with no vulnerabilities[2].

How sure are Alice and Bob of all of the answers to all of these questions? The answer, I would submit, is pretty much always going to be “not completely”. Does this mean that Alice and Bob should not use messaging services? Not necessarily. But it does mean that they should consider what messages they should exchange via any particular messaging service. They might be happy to arrange a surprise birthday party for a colleague, but not to exchange financial details of a business deal. They might be happy to schedule a trip to visit a Non-Governmental Organisation to discuss human rights, but not to talk about specific cases over the messaging service.

This is the view that I take: I consider what information I’m happy to transfer over or store on messaging services and social media platforms. There are occasions where I may happy to pass sensitive data across messaging services, but break the data up between different services (using “different channels” in the relevant parlance): using one service for a username and another for the associated password, for instance. I still need to be careful about shared components: the two phones in the example above might qualify, but I’ve reduced the shared attack surface, and therefore the risk. I’m actually more likely to require that the password is exchanged over a phone call, and if I’m feeling particularly paranoid, I’ll use a different phone to receive that call.

My advice, therefore, is this:

  1. Keep your devices and apps up to date;
  2. Evaluate the security of your various messaging service options;
  3. Consider the types of information that you’ll be transferring and/or storing;
  4. Think about the risks you’re willing to accept;
  5. Select the appropriate option on a case by case basis:
  6. Consider using separate channels where particularly sensitive data can be split for added security.

1 – I’m also partial to 1920’s Jazz and a bit of Bluegrass, as it happens.

2 – yeah, right.

Announcing Enarx

If I’ve managed the process properly, this article should be posting at almost exactly the time that we show a demo at Red Hat Summit 2019 in Boston.  That demo, to be delivered by my colleague Nathaniel McCallum, will be of an early incarnation of Enarx, a project that a few of us at Red Hat have been working on for a few months now, and which we’re ready to start announcing to the world.  We have code, we have a demo, we have a github repository, we have a logo: what more could a project want?  Well, people – but we’ll get to that.

What’s the problem?

When you run software (a “workload”) on a system (a “host”) on the cloud or on your own premises, there are lots and lots of layers.  You often don’t see those layers, but they’re there.  Here’s an example of the layers that you might see in a standard cloud virtualisation architecture.  The different colours represent different entities that “own”  different layers or sets of layers.

classic-cloud-virt-arch

Here’s a similar diagram depicting a standard cloud container architecture.  As before, each different colour represents a different “owner” of a layer or set of layers.

cloud-container-arch

These owners may be of very different types, from hardware vendors to OEMs to Cloud Service Providers (CSPs) to middleware vendors to Operating System vendors to application vendors to you, the workload owner.  And for each workload that you run, on each host, the exact list of layers is likely to be different.  And even when they’re the same, the versions of the layers instances may be different, whether it’s a different BIOS version, a different bootloader, a different kernel version or whatever else.

Now, in many contexts, you might not worry about this and your Cloud Service Provider goes out of its way to abstract these layers and their version details away from you.  But this is a security blog, for security people, and that means that anybody who’s reading this probably does care.

The reason we care is not just the different versions and the different layers, but the number of different things – and different entities – that we need to trust if we’re going to be happy running any sort of sensitive workload on these types of stacks.  I need to trust every single layer, and the owner of every single layer, not only to do what they say they will do, but also not to be compromised.  This is a big stretch when it comes to running my sensitive workloads.

What’s Enarx?

Enarx is a project which is trying to address this problem of having to trust all of those layers.  We made the decision that we wanted to allow people running workloads to be able to reduce the number of layers – and owners – that they need to trust to the absolute minimum.  We plan to use Trusted Execution Environments (“TEEs” – see Oh, how I love my TEE (or do I?)), and to provide an architecture that looks a little more like this:

reduced-arch

In a world like this, you have to trust the CPU and firmware, and you need to trust some middleware – of which Enarx is part – but you don’t need to trust all of the other layers, because we will leverage the capabilities of the TEE to ensure the integrity and confidentiality of your application.  The Enarx project will provide attestation of the TEE, so that you know you’re running on a true and trusted TEE, and will provide open source, auditable code to help you trust the layer directly beneath you application.

The initial code is out there – working on AMD’s SEV TEE at the moment – and enough of it works now that we’re ready to tell you about it.

Making sure that your application meets your own security requirements is down to you.  🙂

How do I find out more?

Easiest is to visit the Enarx github: https://github.com/enarx.

We’ll be adding more information there – it’s currently just code – but bear with us: there are only a few of us on the project at the moment. A blog is on the list of things we’d like to have, but I thought I’d start here for now.

We’d love to have people in the community getting involved in the project.  It’s currently quite low-level, and requires quite a lot of knowledge to get running, but we’ll work on that.  You will need some specific hardware to make it work, of course.  Oh, and if you’re an early boot or a low-level kvm hacker, we’re particularly interested in hearing from you.

I will, of course, respond to comments on this article.

 

Thinking beyond “zero-trust”

The components in which you have constant trust relationships are “islands in the stream”.

I wrote a fairly complex post a few months ago called “Zero-trust”: my love/hate relationship, in which I discussed in some details what “zero-trust” networks are, and why I’m not convinced.  The key point turned out to be that I’m not happy about the way the word “zero” is being thrown around here, as I think what’s really going on is explicit trust.

Since then, there hasn’t been a widespread movement away from using the term, to my vast lack of surprise.  In fact, I’ve noticed the term “zero-trust” being used in a different context: in p2p (peer-to-peer) and Web 3.0 discussions.  The idea is that there are some components of the ecosystem that we don’t need to trust: they’re “just there”, doing what they were designed to do, and are basically neutral in terms of the rest of the actors in the network.  Now, I like the idea that there are neutral components in the ecosystem: I think it’s a really important distinction to make to other parts of the system.  What I’m not happy about is the suggestion that we have zero trust in those components.  For me, these are the components that we must trust the most of all of the entities in the system.  If they don’t do what we expect them to do, then everything falls apart pretty quickly.  I think the same argument probably applies to “zero-trust” networking, too.

I started thinking quite hard about this, and I think I understand where the confusion arises.  I’ve spent a lot of time over nearly twenty years thinking about trust, and what it means.  I described my definition of trust in another post, “What is trust?” (which goes into quite a lot of detail, and may be worth reading for a deeper understanding of what I’m going on about here):

  • “Trust is the assurance that one entity holds that another will perform particular actions according to a specific expectation.”

For the purposes of this discussion, it’s the words “will perform particular actions according to a specific expectation” that are key here.  This sounds to me as exactly what is being described in the requirement above that components are “doing what they’re designed to do”.  It is this trust in their correct functioning which is a key foundation in the systems being described.  As someone with a background in security, I always (try to) have these sorts of properties in mind when I consider a system: they are, as above, typically foundational.

What I think most people are interested in, however – because it’s a visible and core property of many p2p systems – is the building, maintaining and decay of trust between components.  In this equation, the components have zero change in trust unless there’s a failure in the system (which, being a non-standard state, is not a property that is top-of-mind).  If you’re interested in a p2p world where you need constantly to be evaluating and re-evaluating the level of trust you have in other actors, then the components in which you have (hopefully) constant trust relationships are “islands in the stream”.  If they can truly be considered neutral in terms of their trust – they are neither able to be considered “friendly” nor “malevolent” as they are neither allied to nor can be suborned by any of the actors – then their static nature is uninteresting in terms of the standard operation of the system which you are building.

This does not mean that they are uninteresting or unimportant, however.  Their correct creation and maintenance are vital to the system itself.  It’s for this reason that I’m unhappy about the phrase “zero-trust”, as it seems to suggest that these components are not worthy of our attention.  As a security bod, I reckon they’re among the most fascinating parts of any system (partly because, if I were a Bad Person[tm], they would be my first point of attack).  So you can be sure that I’m going to be keeping an eye out for these sorts of components, and trying to raise people’s awareness of their importance.  You can trust me.

6 types of attack: learning from Supermicro, State Actors and silicon

… it could have happened, and it could be happening now.

Last week, Bloomberg published a story detailing how Chinese state actors had allegedly forced employees of Supermicro (or companies subcontracting to them) to insert a small chip – the silicon in the title – into motherboards destined for Apple and Amazon.  The article talked about how an investigation into these boards had uncovered this chip and the steps that Apple, Amazon and others had taken.  The story was vigorously denied by Supermicro, Apple and Amazon, but that didn’t stop Supermicro’s stock price from tumbling by over 50%.

I have heard strong views expressed by people with expertise in the topic on both sides of the argument: that it probably didn’t happen, and that it probably did.  One side argues that the denials by Apple and Amazon, for instance, might have been impacted by legal “gagging orders” from the US government.  An opposing argument suggests that the Bloomberg reporters might have confused this story with a similar one that occurred a few months ago.  Whether this particular story is correct in every detail, or a fabrication – intentional or unintentional – is not what I’m interested in at this point.  What I’m interested in is not whether it did happen in this instance: the clear message is that it could have happened, and it could be happening now.

I’ve written before about State Actors, and whether you should worry about them.  There’s another question which this story brings up, which is possibly even more germane: what can you do about it if you are worried about them?  This breaks down further into two questions:

  • how can I tell if my systems have been compromised?
  • what can I do if I discover that they have?

The first of these is easily enough to keep us occupied for now [1], so let’s spend some time on that.  First, let’s first define six types of compromise, think about how they might be carried out, and then consider the questions above for each:

  • supply-chain hardware compromise;
  • supply-chain firmware compromise;
  • supply-chain software compromise;
  • post-provisioning hardware compromise;
  • post-provisioning firmware compromise;
  • post-provisioning software compromise.

This article doesn’t provide sufficient space to go into detail of these types of attack, and provides an overview of each, instead[2].

Terms

  • Supply-chain – all of the steps up to when you start actually running a system.  From manufacture through installation, including vendors of all hardware components and all software, OEMs, integrators and even shipping firms that have physical access to any pieces of the system.  For all supply-chain compromises, the key question is the extent to which you, the owner of a system, can trust every single member of the supply chain[3].
  • Post-provisioning – any point after which you have installed the hardware, put all of the software you want on it, and started running it: the time during which you might consider the system “under your control”.
  • Hardware – the physical components of a system.
  • Software – software that you have installed on the system and over which you have some control: typically the Operating System and application software.  The amount of control depends on factors such as whether you use proprietary or open source software, and how much of it is produced, compiled or checked by you.
  • Firmware – special software that controls how the hardware interacts with the standard software on the machine, the hardware that comprises the system, and external systems.  It is typically provided by hardware vendors and its operation opaque to owners and operators of the system.

Compromise types

See the table at the bottom of this article for a short summary of the points below.

  1. Supply-chain hardware – there are multiple opportunities in the supply chain to compromise hardware, but the more hard they are made to detect, the more difficult they are to perform.  The attack described in the Bloomberg story would be extremely difficult to detect, but the addition of a keyboard logger to a keyboard just before delivery (for instance) would be correspondingly more simple.
  2. Supply-chain firmware – of all the options, this has the best return on investment for an attacker.  Assuming good access to an appropriate part of the supply chain, inserting firmware that (for instance) impacts network performance or leaks data over a wifi connection is relatively simple.  The difficulty in detection comes from the fact that although it is possible for the owner of the system to check that the firmware is what they think it is, what that measurement confirms is only that the vendor has told them what they have supplied.  So the “medium” rating relates only to firmware that was implanted by members in the supply chain who did not source the original firmware: otherwise, it’s “high”.
  3. Supply-chain software – by this, I mean software that comes installed on a system when it is delivered.  Some organisations will insist in “clean” systems being delivered to them[4], and will install everything from the Operating System upwards themselves.  This means that they basically now have to trust their Operating System vendor[5], which is maybe better than trusting other members of the supply chain to have installed the software correctly.  I’d say that it’s not too simple to mess with this in the supply chain, if only because checking isn’t too hard for the legitimate members of the chain.
  4. Post-provisioning hardware – this is where somebody with physical access to your hardware – after it’s been set up and is running – inserts or attaches hardware to it.  I nearly gave this a “high” rating for difficulty below, assuming that we’re talking about servers, rather than laptops or desktop systems, as one would hope that your servers are well-protected, but the ease with which attackers have shown that they can typically get physical access to systems using techniques like social engineering, means that I’ve downgraded this to “medium”.  Detection, on the other hand, should be fairly simple given sufficient resources (hence the “medium” rating), and although I don’t believe anybody who says that a system is “tamper-proof”, tamper-evidence is a much simpler property to achieve.
  5. Post-provisioning firmware – when you patch your Operating System, it will often also patch firmware on the rest of your system.  This is generally a good thing to do, as patches may provide security, resilience or performance improvements, but you’re stuck with the same problem as with supply-chain firmware that you need to trust the vendor: in fact, you need to trust both your Operating System vendor and their relationship with the firmware vendor.
  6. Post-provisioning software – is it easy to compromise systems via their Operating System and/or application software?  Yes: this we know.  Luckily – though depending on the sophistication of the attack – there are generally good tools and mechanisms for detecting such compromises, including behavioural monitoring.

Table

 

Compromise type Attacker difficulty Detection difficulty
Supply-chain hardware High High
Supply-chain firmware Low Medium
Supply-chain software Medium Medium
Post-provisioning hardware Medium Medium
Post-provisioning firmware Medium Medium
Post-provisioning software Low Low

Conclusion

What are your chances of spotting a compromise on your system?  I would argue that they are generally pretty much in line with the difficulty of performing the attack in the first place: with the glaring exception of supply-chain firmware.  We’ve seen attacks of this type, and they’re very difficult to detect.  The good news is that there is some good work going on to help detection of these types of attacks, particularly in the world of Linux[6] and open source.  In the meantime, I would argue our best forms of defence are currently:

  • for supply-chain: build close relationships, use known and trusted suppliers.  You may want to restrict as much as possible of your supply chain to “friendly” regimes if you’re worried about State Actor attacks, but this is very hard in the global economy.
  • for post-provisioning: lock down your systems as much as possible – both physically and logically – and use behavioural monitoring to try to detect anomalies in what you expect them to be doing.

1 – I’ll try to write something on this other topic in a different article.

2 – depending on interest, I’ll also consider a series of articles to go into more detail on each.

3 – how certain are you, for instance, that your delivery company won’t give your own government’s security services access to the boxes containing your equipment before they deliver them to you?

4 – though see above: what about the firmware?

5 – though you can always compile your own Operating System if you use open source software[6].

6 – oh, you didn’t compile your compiler yourself?  All bets off, then…

7 – yes, “GNU Linux”.

Wow: autonomous agents!

The problem is not the autonomy. The problem isn’t even particularly with the intelligence…

Autonomous, intelligent agents offer some great opportunities for our digital lives*.  There, look, I said it.  They will book meetings for us, negotiate cheap holidays, order our children’s complete school outfit for the beginning of term, and let us know when it’s time to go to the nurse for our check-up.  Our business lives, our personal lives, our family relationships – they’ll all be revolutionised by autonomous agents.  Autonomous agents will learn our preferences, have access to our diaries, pay for items, be able to send messages to our friends.

This is all fantastic, and I’m very excited about it.  The problem is that I’ve been excited about it for nearly 20 years, when I was involved in a project around autonomous agents in Java.  It was very neat then, and it’s still very neat now***.

Of course, technology has moved on.  Some of the underlying capabilities are much more advanced now than then.  General availability of APIs, consistency of data formats, better Machine Learning (or Artificial Intelligence, if you must), less computationally expensive cryptography, and the rise of blockchains and distributed ledgers: they all bring the ability for us to build autonomous agents closer than ever before.  We talked about disintermediation back in the day, and that looked plausible.  We really can build scalable marketplaces now in ways which just weren’t as feasible two decades ago.

The problem, though, isn’t the technology.  It was never the technology.  We could have made the technology work 20 years ago, even if it wasn’t as fast, secure or wide-ranging as it could be today.  It isn’t even vested interests from the large platform players, who arguably own much of this space at the moment – though these interests are much more consolidated than they were when I was first looking at this issue.

The problem is not the autonomy.  The problem isn’t even particularly with the intelligence: you can program as much or as little in as you want, or as the technology allows.  The problem is with the agency.

How much of my life do I want to hand over to what’s basically a ‘bot?  Ignore***** the fact that these things will get hacked******, and assume we’re talking about normal, intended usage.  What does “agency” mean?  It means acting for someone: being their agent – think of what actors’ agents do, for example.  When I engage a lawyer or a builder or an accountant to do something for me, or when an actor employs an agent for that matter, we’re very clear about what they’ll be doing.  This is to protect both me and them from unintended consequences.  There’s a huge legal corpus around defining, in different fields, exactly the scope of work to be carried out by a person or a company who is acting as an agent.  There are contracts, and agreed restitutions – basically punishments – for when things go wrong.  Say that an accountant buys 500 shares in a bank, and then I turn round and say that she never had the authority to do so: if we’ve set up the relationship correctly, it should be entirely clear whether or not she did, and whose responsibility it is to deal with any fall-out from that purchase.

Now think about that in terms of autonomous, intelligent agents.  Write me that contract, and make it equivalent in software and the legal system.  Tell me what happens when things go wrong with the software.  Show me how to prove that I didn’t tell the agent to buy those shares.  Explain to me where the restitution lies.

And these are arguably the simple problems.  How to I rebuild the business reputation that I’ve built up over the past 15 years when my agent posts on Twitter a tweet about how I use a competitor’s products, when I’m just trialling them for interest?  How does an agent know not to let my wife see the diary entry for my meeting with that divorce lawyer*******?  What aspects of my browsing profile are appropriate for suggesting – or even buying – online products or services with my personal or business credit card*********?  And there’s the classic “buying flowers for the mistress and having them sent to the wife” problem**********.

I don’t think we have an answer to these questions: not even close.  You know that virtual admin assistant we’ve been promised in sci-fi movies for decades now: the one with the futuristic haircut who appears as a hologram outside our office?  Holograms – nearly.  Technology behind it – pretty much.  Trust, reputation and agency?  Nowhere near.

 


*I hate this word: “digital”.  Well, not really, but it’s used far too much as a shorthand for “newest technology”**.

**”Digital businesses”.  You mean, unlike all the analogue ones?  Come on.

***this is one of those words that my kids hate me using.  There are two types of word that come into this category: old words and new words.  Either I’m showing how old I am, or I’m trying to be hip****, which is arguably worse.  I can’t win.

****yeah, they don’t say hip.  That’s one of the “old person words”.

*****for now, at least.  Let’s not forget it.

******_everything_  gets hacked*******.

*******I could say “cracked”, but some of it won’t be malicious, and hacking might be positive.

********I’m not.  This is an example.

*********this isn’t even about “dodgy” things I might have been browsing on home time.  I may have been browsing for analyst services, with the intent to buy a subscription: how sure am I that the agent won’t decide to charge these to my personal credit card when it knows that I perform other “business-like” actions like pay for business-related books myself sometimes?

**********how many times do I have to tell you, darling…?

Isolationism

… what’s the fun in having an Internet if you can’t, well, “net” on it?

Sometimes – and I hope this doesn’t come as too much of a surprise to my readers – sometimes, there are bad people, and they do bad things with computers.  These bad things are often about stopping the good things that computers are supposed to be doing* from happening properly.  This is generally considered not to be what you want to happen**.

For this reason, when we architect and design systems, we often try to enforce isolation between components.  I’ve had a couple of very interesting discussions over the past week about how to isolate various processes from each other, using different types of isolation, so I thought it might be interesting to go through some of the different types of isolation that we see out there.  For the record, I’m not an expert on all different types of system, so I’m going to talk some history****, and then I’m going to concentrate on Linux*****, because that’s what I know best.

In the beginning

In the beginning, computers didn’t talk to one another.  It was relatively difficult, therefore, for the bad people to do their bad things unless they physically had access to the computers themselves, and even if they did the bad things, the repercussions weren’t very widespread because there was no easy way for them to spread to other computers.  This was good.

Much of the conversation below will focus on how individual computers act as hosts for a variety of different processes, so I’m going to refer to individual computers as “hosts” for the purposes of this post.  Isolation at this level – host isolation – is still arguably the strongest type available to us.  We typically talk about “air-gapping”, where there is literally an air gap – no physical network connection – between one host and another, but we also mean no wireless connection either.  You might think that this is irrelevant in the modern networking world, but there are classes of usage where it is still very useful, the most obvious being for Certificate Authorities, where the root certificate is so rarely accessed – and so sensitive – that there is good reason not to connect the host on which it is stored to be connected to any other computer, and to use other means, such as smart-cards, a printer, or good old pen and paper to transfer information from it.

And then…

And then came networks.  These allow hosts to talk to each other.  In fact, by dint of the Internet, pretty much any host can talk to any other host, given a gateway or two.  So along came network isolation to try to stop tha.  Network isolation is basically trying to re-apply host isolation, after people messed it up by allowing hosts to talk to each other******.

Later, some smart alec came up with the idea of allowing multiple processes to be on the same host at the same time.  The OS and kernel were trusted to keep these separate, but sometimes that wasn’t enough, so then virtualisation came along, to try to convince these different processes that they weren’t actually executing alongside anything else, but had their own environment to do their old thing.  Sadly, the bad processes realised this wasn’t always true and found ways to get around this, so hardware virtualisation came along, where the actual chips running the hosts were recruited to try to convince the bad processes that they were all alone in the world.  This should work, only a) people don’t always program the chips – or the software running on them – properly, and b) people decided that despite wanting to let these processes run as if they were on separate hosts, they also wanted them to be able to talk to processes which really were on other hosts.  This meant that networking isolation needed to be applied not just at the host level, but at the virtual host level, as well******.

A step backwards?

Now, in a move which may seem retrograde, it occurred to some people that although hardware virtualisation seemed like a great plan, it was also somewhat of a pain to administer, and introduced inefficiencies that they didn’t like: e.g. using up lots of RAM and lots of compute cycles.  These were often the same people who were of the opinion that processes ought to be able to talk to each other – what’s the fun in having an Internet if you can’t, well, “net” on it?  Now we, as security folks, realise how foolish this sounds – allowing processes to talk to each other just encourages the bad people, right? – but they won the day, and containers came along. Containers allow lots of processes to be run on a host in a lightweight way, and rely on kernel controls – mainly namespaces – to ensure isolation********.  In fact, there’s more you can do: you can use techniques like system call trapping to intercept the things that processes are attempting and stop them if they look like the sort of things they shouldn’t be attempting*********.

And, of course, you can write frameworks at the application layer to try to control what the different components of an application system can do – that’s basically the highest layer, and you’re just layering applications on applications at this point.

Systems thinking

So here’s where I get to the chance to mention one of my favourite topics: systems.  As I’ve said before, by “system” here I don’t mean an individual computer (hence my definition of host, above), but a set of components that work together.  The thing about isolation is that it works best when applied to a system.

Let me explain.  A system, at least as I’d define it for the purposes of this post, is a set of components that work together but don’t have knowledge of external pieces.  Most important, they don’t have knowledge of different layers below them.  Systems may impose isolation on applications at higher layers, because they provide abstractions which allow higher systems to be able to ignore them, but by virtue of that, systems aren’t – or shouldn’t be – aware of the layers below them.

A simple description of the layers – and it doesn’t always hold, partly because networks are tricky things, and partly because there are various ways to assemble the stack – may look like this.

Application (top layer)
Container
System trapping
Kernel
Hardware virtualisation
Networking
Host (bottom layer)

As I intimated above, this is a (gross) simplification, but the point holds that the basic rule is that you can enforce isolation upwards in the layers of the stack, but you can’t enforce it downwards.  Lower layer isolation is therefore generally stronger than higher layer isolation.   This shouldn’t come as a huge surprise to anyone who’s used to considering network stacks – the principle is the same – but it’s helpful to lay out and explain the principles from time to time, and the implications for when you’re designing and architecting.

Because if you are considering trust models and are defining trust domains, you need to be very, very careful about defining whether – and how – these domains spread across the layer boundaries.  If you miss a boundary out when considering trust domains, you’ve almost certainly messed up, and need to start again.  Trust domains are important in this sort of conversation because the boundaries between trust domains are typically where you want to be able to enforce and police isolation.

The conversations I’ve had recently basically ran into problems because what people really wanted to do was apply lower layer isolation from layers above which had no knowledge of the bottom layers, and no way to reach into the control plane for those layers.  We had to remodel, and I think that we came up with some sensible approaches.  It was as I was discussing these approaches that it occurred to me that it would have been a whole lot easier to discuss them if we’d started out with a discussion of layers: hence this blog post.  I hope it’s useful.

 


*although they may well not be, because, as I’m pretty sure I’ve mentioned before on this blog, the people trying to make the computers do the good things quite often get it wrong.

**unless you’re one of the bad people. But I’m pretty sure they don’t read this blog, so we’re OK***.

***if you are a bad person, and you read this blog,  would you please mind pretending, just for now, that you’re a good person?  Thank you.  It’ll help us all sleep much better in our beds.

****which I’m absolutely going to present in an order that suits me, and generally neglect to check properly.  Tough.

*****s/Linux/GNU Linux/g; Natch.

******for some reason, this seemed like a good idea at the time.

*******for those of you who are paying attention, we’ve got to techniques like VXLAN and SR-IOV.

********kernel purists will try to convince you that there’s no mention of containers in the Linux kernel, and that they “don’t really exist” as a concept.  Try downloading the kernel source and doing a search for “container” if you want some ammunition to counter such arguments.

*********this is how SELinux works, for instance.