Isolationism – not a 4 letter word (in the cloud)

Things are looking up if you’re interested in protecting your workloads.

In the world of international relations, economics and fiscal policy, isolationism doesn’t have a great reputation. I could go on, I suppose, if I did some research, but this is a security blog[1], and international relations, fascinating are of study though it is, isn’t my area of expertise: what I’d like to do is borrow the world and apply it to a different field: computing, and specifically cloud computing.

In computing, isolation is a set of techniques to protect a process, application or component from another (or a set of the former from a set of the latter). This is pretty much always a good thing – you don’t want another process interfering with the correct workings of your one, whether that’s by design (it’s malicious) or in error (because it’s badly designed or implemented). Isolationism, therefore, however unpopular it may be on the world stage, is a policy that you generally want to adopt for your applications, wherever they’re running.

This is particularly important in the “cloud”. Cloud computing is where you run your applications or processes on shared infrastructure. If you own that infrastructure, then you might call that a “private cloud”, and infrastructure owned by other people a “public cloud”, but when people say “cloud” on its own, they generally mean public clouds, such as those operated by Amazon, Microsoft, IBM, Alibaba or others.

There’s a useful adage around cloud computing: “Remember that the cloud is just somebody else’s computer”. In other words, it’s still just hardware and software running somewhere, it’s just not being run by you. Another important thing to remember about cloud computing is that when you run your applications – let’s call them “workloads” from here on in – on somebody else’s cloud (computer), they’re unlikely to be running on their own. They’re likely to be running on the same physical hardware as workloads from other users (or “tenants”) of that provider’s services. These two realisations – that your workload is on somebody else’s computer, and that it’s sharing that computer with workloads from other people – is where isolation comes into the picture.

Workload from workload isolation

Let’s start with the sharing problem. You want to ensure that your workloads run as you expect them to do, which means that you don’t want other workloads impacting on how yours run. You want them to be protected from interference, and that’s where isolation comes in. A workload running in a Linux container or a Virtual Machine (VM) is isolated from other workloads by hardware and/or software controls, which try to ensure (generally very successfully!) that your workload receives the amount of computing time it should have, that it can send and receive network packets, write to storage and the rest without interruption from another workload. Equally important, the confidentiality and integrity of its resources should be protected, so that another workload can’t look into its memory and/or change it.

The means to do this are well known and fairly mature, and the building blocks of containers and VMs, for instance, are augmented by software like KVM or Xen (both open source hypervisors) or like SELinux (an open source capabilities management framework). The cloud service providers are definitely keen to ensure that you get a fair allocation of resources and that they are protected from the workloads of other tenants, so providing workload from workload isolation is in their best interests.

Host from workload isolation

Next is isolating the host from the workload. Cloud service providers absolutely do not want workloads “breaking out” of their isolation and doing bad things – again, whether by accident or design. If one of a cloud service provider’s host machines is compromised by a workload, not only can that workload possibly impact other workloads on that host, but also the host itself, other hosts and the more general infrastructure that allows the cloud service provider to run workloads for their tenants and, in the final analysis, make money.

Luckily, again, there are well-known and mature ways to provide host from workload isolation using many of the same tools noted above. As with workload from workload isolation, cloud service providers absolutely do not want their own infrastructure compromised, so they are, of course, going to make sure that this is well implemented.

Workload from host isolation

Workload from host isolation is more tricky. A lot more tricky. This is protecting your workload from the cloud service provider, who controls the computer – the host – on which your workload is running. The way that workloads run – execute – is such that such isolation is almost impossible with standard techniques (containers, VMs, etc.) on their own, so providing ways to ensure and prove that the cloud service provider – or their sysadmins, or any compromised hosts on their network – cannot interfere with your workload is difficult.

You might expect me to say that providing this sort of isolation is something that cloud service providers don’t care about, as they feel that their tenants should trust them to run their workloads and just get on with it. Until sometime last year, that might have been my view, but it turns out to be wrong. Cloud service providers care about protecting your workloads from the host because it allows them to make more money. Currently, there are lots of workloads which are considered too sensitive to be run on public clouds – think financial, health, government, legal, … – often due to industry regulation. If cloud service providers could provide sufficient isolation of workloads from the host to convince tenants – and industry regulators – that such workloads can be safely run in the public cloud, then they get more business. And they can probably charge more for these protections as well! That doesn’t mean that isolating your workloads from their hosts is easy, though.

There is good news, however, for both cloud service providers and their teants, which is that there’s a new set of hardware techniques called TEEs – Trusted Execution Environments – which can provide exactly this sort of protection[2]. This is rapidly maturing technology, and TEEs are not easy to use – in that it can not only be difficult to run your workload in a TEE, but also to ensure that it’s running in a TEE – but when done right, they do provide the sorts of isolation from the host that a workload wants in order to maintain its integrity and confidentiality[3].

There are a number of projects looking to make using TEEs easier – I’d point to Enarx in particular – and even an industry consortium to promote open TEE adoption, the Confidential Computing Consortium. Things are looking up if you’re interested in protecting your workloads, and the cloud service providers are on board, too.


1 – sorry if you came here expecting something different, but do stick around and have a read: hopefully there’s something of interest.

2 – the best known are Intel’s SGX and AMD’s SEV.

3 – availability – ensuring that it runs fairly – is more difficult, but as this is a property that is also generally in the cloud service provider’s best interest, and something that can can control, it’s not generally too much of a concern[4].

4 – yes, there are definitely times when it is, but that’s a story for another article.

A cybersecurity tip from Hazzard County

Don’t place that bet in Boss Hogg’s betting saloon: you know he’s up to no good!

It’s a slightly guilty secret, but I used to love watching The Dukes of Hazzard in the early 80’s (the first series started in late 1979, but I suspect that it didn’t make it to the UK until the next year at the earliest).  It all seemed very glamourous, and there were lots of fast car chases.  And a basset hound, which was an extra win.  To say this was early days for cybersecurity would be an understatement, and though there are references in the Wikipedia plot summaries to computers, I can’t honestly say that I remember any of those particular episodes.

One episode has stuck with me, however, for reasons that I can’t fathom.  It’s called “Hazzard Hustle” and (*SPOILER ALERT*) in it, Boss Hogg sets up a crooked betting saloon.  The swindle (if I remember it correctly) is that he controls and delays the supposedly live feeds to the TVs in the saloon, which means that he has access to results before they come in.  Needless to say, the Duke boys (probably aided and abetted by Daisy Duke) get the better of him in the end, and everything turns out OK (for them, not Boss Hogg).

“What can this have to do with cybersecurity?” you have every right to ask.  Well, the answer is reporting and monitoring channels.  Monitoring is important because without it, there is no way for us to check that what we believe should be happening actually is.  The opportunities for direct sensory monitoring of actions in computer-based systems are limited: if I request via a web browser that a banking application transfers funds between one account and another, the only visible effect that I am likely to see is an acknowledgement on the screen. Until I actually try to spend or withdraw that money, I realistically have no way to be assured that the transactions has taken place.

Let’s take an example from the human realm.  It is as if I have a trust relationship with somebody around the corner of a street, out of view, that she will raise a red flag at the stroke of noon, and I have a friend, standing on the corner, who will watch her, and tell me when and if she raises the flag. I may be happy with this arrangement, but only because I have a trust relationship to the friend: that friend is acting as a trusted channel for information.

The word “friend” was chosen carefully above, because there is a trust relationship already implicit in the term. The same is not true for the word “somebody”, which I used to describe the person who was to raise the flag. The situation as described above is likely to make our minds presume that there is a fairly high probability that the trust relationship I have to the friend is sufficient to assure me that he will pass the information correctly. But what if my friend is actually a business partner of the flag-waver? Given our human understanding of the trust relationships typically involved with business partnerships, we may immediately begin to assume that my friend’s motivations in respect to correct reporting are not neutral.

The channels for reporting on actions – monitoring them – are vitally important within cybersecurity, and it is both easy and dangerous to fall into the trap of assuming that they are neutral, and that the only important one is between me and the acting party. In reality, the trust relationship that I have to a set of channels is key to the maintenance of the trust relationships that I have to the key entity that they monitor. In trust relationships involving computer systems, there are often multiple entities or components involved in actions, and these form a “chain of trust”, where each link depends on the other, and the chain is typically only as strong as the weakest of its links.  Don’t forget that.  Oh, and don’t place that bet in Boss Hogg’s betting saloon: you know he’s up to no good!

Timely risk or risky times?

Being aware of “the long game”.

On Friday, 29th November 2019, Jack Merritt and Saskia Jones were killed in a terrorist attack.  A number of members of the public (some with with improvised weapons) and of the emergency services acted with great heroism.  I wanted to mention the mention the names of the victims and to praise those involved in stopping him before mentioning the name of the attacker: Usman Khan.  The victims, the attacker were taking part in an offender rehabilitation conference to help offenders released from prison to reintegrate into society: Khan had been convicted to 16 years in prison for terrorist offences.

There’s an important formula that everyone involved in risk – and given that IT security is all about mitigating risk, that’s anyone involved in security – should know. It’s usually expressed thus:

Risk = likelihood x impact

Sometimes likelihood is sometimes expressed as “probability”, impact as “consequence” or “loss”, and I’ve seen some other variants as well, but the version above is generally sufficient for most purposes.

Using the formula

How should you use the formula? Well, it’s most useful for comparing risks and deciding how to mitigate them. Humans are terrible at calculating risk, and any tools that help them[1] is good.  In order to use this formula correctly, you want to compare risks over the same time period.  You could say that almost any eventuality may come to pass over the lifetime of the universe, but comparing the risk of losing broadband access to the risk of your lead developer quitting for another company between the Big Bang and the eventual heat death of the universe is probably not going to give you much actionable information.

Let’s look at the two variables that we need to have in order to calculate risk.  We’ll start with the impact, because I want to devote most of this article to the other part: likelihood.

Impact is what the damage will be if the risk happens.  In a business context, you want to look at the risk of your order system being brought down for a week by malicious attackers.  You might calculate that you would lose £15,000 in orders.  On top of that, there might be a loss of reputation which you might calculate at £30,000.  Fixing the problem might add £10,000.  Add these together, and the impact is £55,000.

What’s the likelihood?  Well, remember that we need to consider a particular time period.  What you choose will depend on what you’re interested in, but a classic use is for budgeting, and so the length of time considered is often a year.  “What is the likelihood of my order system being brought down for a week by malicious attackers over the next twelve months?” is the question you want to ask.  If you decide that it’s 0.005 (or 0.5%), then your risk is calculated thus:

Risk = 0.005 x 55,000

Risk = 275

The units don’t really matter, because what you want to do is compare risks.  If the risk of your order system being brought down through hardware failure is higher (say 500), then you should probably balance the amount of resources you assign to mitigate these risks accordingly.

Time, reputation, trust and risk

What I’m interested in is a set of rather more complicated risks, however: those associated with human behaviour.  I’m very interested in trust, and one of the interesting things about trust is how we decide to trust people.  One way is by their reputation: if someone keeps behaving well over a long period, then we tend to trust them more – or if badly, then to trust them less[2].  If we trust someone more, our calculation of risk is likely to be strongly based on that trust, as our view of the likelihood of a behaviour at odds with the reputation that person holds will be informed by that.

This makes sense: in the absence of perfect information about humans, their motivations and intentions, our view of risk must be based on something, and reputation is actually a fairly good measure for that.  We might say that the likelihood of a customer defaulting on payment terms reduces year by year as we start to think of them as a “trusted customer”.  As the likelihood reduces, we may decide to increase the amount we lend to them – and thereby the impact of defaulting – to keep the risk about the same, year on year.

The risk here is what is sometimes called “playing the long game”.  Humans sometimes manipulate their reputation, or build up a reputation, in order to perform an action once they have gained trust.  Online sellers my make lots of “good” sales in order to get a 5 star rating over time, only to wait and then make a set of “bad” sales, where they don’t ship goods at all, and then just pocket the money.  Or, they may make many small sales in order to build up a good reputation, and then use that reputation to make one big sale which they have no intention of fulfilling.  Online selling sites are wise to some of these tricks, and have algorithms to try to protect buyers (in fact, the same behaviour can be used by sellers in some cases), but these are not perfect.

I’d like to come back to the London Bridge attack.  In this case, it seems likely that the attacker bided his time over many years, behaving well, and raising his “reputation” among those who knew him – the prison staff, parole board, rehabilitation conference organisers, etc. – so that he had the opportunity to perform one major action at odds with that reputation.  The heroism of those around him stopped him being as successful as he may have hoped, but still at the cost of two innocent lives and several serious injuries.

There is no easy way to deal with such issues.  We need reputation, and we need to allow people to show that they have changed and can be integrated into society, but when we make risk calculations based on reputation in any sphere, we should take care to consider whether actors are playing a long game, and what the possible ramifications would be if they were to act at odds with that reputation.

I noted above that humans are bad at calculating risk, and to follow our example of the non-defaulting customer, one mistake might be to increase the credit we give to that customer beyond the balance of the increase of reputation: actually accepting higher risk than we would have done previously, because we consider them trustworthy.  If we do this, we’ve ceased to use the risk formula, and have started to act irrationally.  Don’t do that.

 


1 – OK, then: “us”.

2 – I’m writing this in the lead up to a UK General Election, and it occurs to me that we actually don’t apply this to most of our politicians.

Who do you trust on trust?

(I’m hoping it’s me.)

I’ve been writing about trust on this blog for a little over two years now. It’s not the only topic, but it’s one about which I’m passionate. I’ve been thinking about issues around trust, particularly in regards to computing and security, for nearly 20 years, and it’s something I care about a lot. I care about it so much that I’m writing a book about it.

In fact, I care about it maybe a little too much. I was at a conference earlier this year and – in a move that will come as little surprise to regular readers of this blog[1] – actually ended up getting quite cross about it. The problem is that lots of people talk about trust, but they either don’t really know what they’re talking about, or they really don’t know what they’re talking about. To be clear, I mean different things by those two statements. Some people know their subject, but their subject isn’t really trust. Other people don’t know their subject, but then again, the thing they think they’re talking about often isn’t trust either. Some people talk about “zero trust“, when I really need to look beyond that concept, and discuss implicit vs explicit trust. People ignore the importance of establishing trust. People ignore the importance of decaying trust. People assume that transitive trust is the same as direct trust. People ignore context. All of these are important, and arguably, its not their fault. There’s actually very little detailed writing about trust outside the social sciences. Given how much discussion there is of trust, trusted computing, trusted systems and the like within the world of IT security, there’s astonishingly little theoretical underpinning of the concept, which means that there’s very little agreement as to what is really meant. And, it turns out, although it seems that trust within the social sciences is quite like trust within computing, it really isn’t.

Anyway, there were people at this conference earlier this year who said things about trust which strongly suggested to me that it would be helpful if there were a good underpinning that people could read and discuss and disagree with: a book, in fact, about trust in computing. I got so annoyed that I made a decision to tell two people – my boss and one of the editors of Opensource.com – that I planned to write a book about it. I’m not sure whether they really believed me, but I ended up putting together a Table of Contents. And then looking for a publisher, and then sending several publishers a copy of the ToC and some further thoughts about what a book might look like, and word count estimates, and a list of possible reader types and markets.

And then someone offered me a contract. This was a little bit of surprise, but after some discussion and negotiation, I’m now contracted to write a book on trust for Wiley. I’m absolutely going to continue to publish this blog, and I’ll continue to write about trust here. And, on occasion, something a little bit more random. I don’t pretend to know everything about the subject, and writing about it here allows me to explore some of the more tricky issues. I hope you’ll join me for the ride – and if you have suggestions or questions, I’d love to hear about them.


1 – or my wife and kids.

What’s a Trusted Compute Base?

Tamper-evidence, auditability and measurability are three important properties.

A few months ago, in an article called “Turtles – and chains of trust“, I briefly mentioned Trusted Compute Bases, or TCBs, but then didn’t go any deeper.  I had a bit of a search across the articles on this blog, and realised that I’ve never gone into this topic in much detail, which feels like a mistake, so I’m going to do it now.

First of all, let’s think about computer systems.  When I talk about systems, I’m being both quite specific (see Systems security – why it matters) and quite broad (I don’t just mean computer that sits on your desk or in a data centre, but include phones, routers, aircraft navigation devices – pretty much anything that has a set of chips inside it).  There are surely some systems that you don’t rely on too much to do important things, but in most cases, you’re going to care if they go wrong, or, more relevant to this discussion, if they get compromised.  Even the most benign of systems – a smart light-bulb, for instance – can become a nightmare if compromised.  Even if you don’t particularly care whether you can continue to use it in the way it was intended, there are still worries about its misuse in the case of compromise:

  1. it may become a “jumping off point” for malicious attacks into your network or other systems;
  2. it may be used as part of a botnet, piggybacking on your network to attack other systems (leading to sanctions against your legitimate systems from outside);
  3. it may be used as part of a botnet, using up resources such as network bandwidth, storage or electricity (leading to resource constraints or increased charges).

For any systems dealing with sensitive data – anything from your messages to loved ones on your phone through intellectual property secrets for a manufacturing organisation through to National Security data for government department – these issues are compounded.  In order to protect your system, you can’t just say “this system is secure” (lovely as that would be).  What can you do to start making statement about the general security of a system?

The stack

Systems consist of multiple components, and modern computing systems are typically composed from multiple layers (one of my favourite xkcd comics, Stack, shows some of them).  What’s relevant from the point of view of this article is that, on the whole, the different layers of the stack start up – boot up – from the bottom upwards.  This means, following the “bottom turtle” rule (see the Turtles article referenced above), that we need to ensure that the bottom layer is as secure as possible.  In fact, in order to build a system in which we can have assurance that it will behave as expected and designed (in other words, a system in which we can have a trust relationship), we need to build a Trusted Compute Base.  This should have at least the following set of properties: tamper-evidence, auditability and measurability, all of which are related to each other.

Tamper-evidence

We want to know if the TCB – on which we are building everything else – has a problem.  Specifically, we need a set of layers or components that we are pretty sure have not been compromised, or which, if compromised, will be tamper-evident:

  • fail in expected ways,
  • refuse to start, or
  • flag that they have been compromised.

It turns out that this is not easy, and typically becomes more difficult as you move up the stack – partly because you’re adding more layers, and partly because those layers tend to get more complex.

Our TCB should have the properties listed above (around failure, refusing to start or compromise-flagging), and be as small as possible.  This seems the wrong way around: surely you would want to ensure that as much of your system was trusted as possible?  In fact, what you want is a small, easily measurable and easily auditable TCB on which you can build the rest of your system – from which you can build a “chain of trust” to the other parts of your system about which you care.  Auditability and measurability are the other two properties that you want in a TCB, and these two properties are why open source is a very useful tool in your toolkit when building a TCB.

Auditability (and open source)

Auditability means that you – or someone else who you trust to do the job – can look into the various components of the TCB and assure yourself that they have been written, compiled and  are executing properly.  As I explained in Of projects, products and (security) community, the person may not always be you, or even someone in your organisation, but if you’re using widely deployed open source software, the rest of the community can be doing that auditing for you, which is a win for you and – if you contribute your knowledge back into the community – for everybody else as well.

Auditability typically gets harder the further you go down the stack – partly because you’re getting closer and closer to bits – ones and zeros – and to actual electrons, and partly because there is very little truly open source hardware out there at the moment.  However, the more that we can see and audit of the TCB, the more confidence we can have in it as a building block for the rest of our system.

Measurability (and open source)

The other thing you want to be able to do is ensure that your TCB really is your TCB.  Tamper-evidence is related to this, but that’s a run-time property only (for software components, at least).  Being able to measure when you provision your system and then to check that what you originally loaded is still what you think it should be when you boot it is a very important property of a TCB.  If what you’re running is open source, you can check it yourself, against your own measurements and those of the community, and if changes are made – by you or others – those changes can be checked (as part of auditing) and then propagated through measurement checking to the rest of the community.  Equally important – and much more difficult – is run-time measurability.  This turns out to be very difficult to do, although there are some techniques emerging which are beginning to get traction – for now, we tend to rely on tamper-evidence, which is easier in hardware than software.

Summary

Trusted Compute Bases (TCBs) are a key concept in building systems that we hope will behave in ways we expect – or allow us to find out when they are not.  Tamper-evidence, auditability and measurability are three important properties that they should display, and it turns out that open source is an important factor in helping us ensure two of those.

 

 

 

Humans and (being bad at) trust

Why “signing parties” were never a good idea.

I went to a party recently, and it reminded of quite how bad humans are at trust. It was a work “mixer”, and an attempt to get people who didn’t know each other well to chat and exchange some information. We were each given two cards to hang around our necks: one on which to write our own name, and the other on which we were supposed to collect the initials of those to whom we spoke (in their own hand). At the end of the event, the plan was to hand out rewards whose value was related to the number of initials collected. Pens/markers were provided.

I gamed the system by standing by the entrance, giving out the cards, controlling the markers and ensuring that everybody signed card, hence ending up with easily the largest number of initials of anyone at the party. But that’s not the point. Somebody – a number of people, in fact – pointed out the similarities between this and “key signing parties”, and that got me thinking. For those of you not old enough – or not security-geeky enough – to have come across these, they were events which were popular in the late nineties and early parts of the first decade of the twenty-first century[1] where people would get together, typically at a tech show, and sign each other’s PGP keys. PGP keys are an interesting idea whereby you maintain a public-private key pair which you use to sign emails, assert your identity, etc., in the online world. In order for this to work, however, you need to establish that you are who you say you are, and in order for this to work, you need to convince someone of this fact.

There are two easy ways to do this:

  1. meet someone IRL[2], get them to validate your public key, and sign it with theirs;
  2. have someone who knows the person you met in step 1 agree that they can probably trust you, as the person in step 1 did, and they trust them.

This is a form of trust based on reputation, and it turns out that it is a terrible model for trust. Let’s talk about some of the reasons for it not working. There are four main ones:

  • context
  • decay
  • transitive trust
  • peer pressure.

Let’s evaluate these briefly.

Context

I can’t emphasise this enough: trust is always, always contextual (see “What is trust?” for a quick primer). When people signed other people’s key-pairs, all they should really have been saying was “I believe that the identity of this person is as stated”, but signatures and encryption based on these keys was (and is) frequently misused to make statements about, or claim access to, capabilities that were not necessarily related to identity.

I lay some of the fault of this at the US alcohol consumption policy. Many (US) Americans use their driving licence/license as a form of authorisation: I am over this age, and am therefore entitled to purchase alcohol. It was designed to prove that their were authorised to drive, and nothing more than that, but you can now get a US driving licence to prove your age even if you can’t drive, and it can be used, for instance, as security identification for getting on aircraft at airportsThis is crazy, but partly explains why there is such a confusion between identification, authentication and authorisation.

Decay

Trust, as I’ve noted before in many articles, decays. Just because I trust you now (within a particular context) doesn’t mean that I should trust you in the future (in that or any other context). Mechanisms exist within the PGP framework to expire keys, but it was (I believe) typical for someone to resign a new set of keys just because they’d signed the previous set. If they were only being used for identity, then that’s probably OK – most people rarely change their identity, after all – but, as explained above, these key pairs were often used more widely.

Transitive trust

This is the whole “trusting someone because I trust you” problem. Again, if this were only about identity, then I’d be less worried, but given people’s lack of ability to specify context, and their equal inability to communicate that to others, the “fuzziness” of the trust relationships being expressed was only going to increase with the level of transitiveness, reducing the efficacy of the system as a whole.

Peer pressure

Honestly, this occurred to me due to my gaming of the system, as described in the second paragraph at the top of this article. I remember meeting people at events and agreeing to endorse their key-pairs basically because everybody else was doing it. I didn’t really know them, though (I hope) I had at least heard of them (“oh, you’re Denny’s friend, I think he mentioned you”), and I certainly shouldn’t have been signing their key-pairs. I am certain that I was not the only person to fall into this trap, and it’s a trap because humans are generally social animals[3], and they like to please others. There was ample opportunity for people to game the system much more cynically than I did at the party, and I’d be surprised if this didn’t happen from time to time.

Stepping back a bit

To be fair, it is possible to run a model like this properly. It’s possible to avoid all of these by insisting on proper contextual trust (with multiple keys for different contexts), by re-evaluating trust relationships on a regular basis, by being very careful about trusting people just due to their trusting someone else (or refusing to do so at all), and by refusing just to agree to trust someone because you’ve met them and they “seem nice”. But I’m not aware of anyone – anyone – who kept to these rules, and it’s why I gave up on this trust model over a decade ago. I suspect that I’m going to get some angry comments from people who assert that they used (and use) the system properly, and I’m sure that there are people out there who did and do: but as a widespread system, it was only going to work if the large majority of all users treated it correctly, and given human nature and failings, that never really happened.

I’m also not suggesting that we have many better models – but we really, really need to start looking for some, as this is important, and difficult stuff.


1 – I refuse to refer to these years the “aughts”.

2 – In Real Life – this used to be an actual distinction to online.

3 – even a large enough percentage of IT folks to make this a problem.

Breaking the security chain(s)

Your environment is n-dimensional – your trust must be, too.

One of the security principles by which we[1] live[2] is that security is only as strong as the weakest link in a chain.  That link is variously identified as:

  • your employees
  • external threat actors
  • all humans
  • lack of training
  • cryptography
  • logging
  • anti-virus
  • auditing capabilities
  • the development lifecycle
  • waterfall methodology
  • passwords
  • any other authentication mechanisms
  • electrical wiring
  • hurricanes
  • earthquakes
  • and pestilence.

Actually, I don’t think I’ve ever seen the last one mentioned, but it’s only a matter of time.  However, very rarely does anybody bother to identify exactly what the chain is that it is being broken by the weakest link splintering into a thousand pieces.

There are a number of candidates that spring to mind:

  1. your application flow.  This is rather an old-fashioned way of thinking of applications: that a program is started, goes through a set of actions, and then terminates, but to think more broadly about it, any action which causes an application to behave in unexpected or unintended ways is a possible security flow, whether that is a monolithic application, a set of microservices or an app on  mobile device.
  2. your software stack.  Depending on how you think about your stack, there are likely to be at least 5, maybe a dozen or maybe even scores of layers in your software stack (for an example with just a few simple layers, see Announcing Enarx).  However you think about it, you need to trust all of those layers to do what who expect them to do.  If one of them is compromised, malicious, or just poorly implemented or maintained, then you have a security issue.
  3. your hardware stack.  There was a time, barely five years ago, when most people (excepting us[1], of course), assumed that hardware did what we thought it was supposed to do, all of the time.  In fact, we should all have known better, given the Clipper Chip and the Pentium bug (to name just to famous examples), but with Spectre, Meltdown and a growing realisation that hardware isn’t as trustworthy as was previously thought, everybody needs to decide exactly what security they can trust in which components.
  4. your operational processes.  You can have the best software and hardware in the world, but if you don’t maintain it and operate it properly, it’s going to be full of holes.  Failing to invest in operations, monitoring, logging, auditing and the rest leaves you wide open.
  5. your supply chain. There’s a growing understanding in the industry that our software and hardware supply chains are possible points of failure[3].  Whether your vendor is entirely proprietary (in which case their security is largely opaque) or open source (in which case you’ve got a chance to be able to see what’s going on), errors or maliciousness in the supply chain can scupper any hopes you had of security for your deployment.
  6. your software and hardware lifecycle.  Developing software?  Patching it?  Upgrading hardware (or software)?  Unit testing?  Unit testingg?  We all know that a failure to manage the lifecycle of our environment can lead to security problems.

The point I’m trying to make above is that there’s no single chain.  Your environment is n-dimensional – your trust must be, too.  If you don’t think about all of these contexts – and there will be more beyond the half-dozen that I’ve just noted – then you can’t have a good chance of managing security in your environment.  I honestly don’t think that there’s any single weakest link in the chain, because there are always already multiple chains in play: our job is to think about as many of them as possible, and then manage an mitigate the risks associated with each.


1 – the mythical “IT security community”.

2 – you’re right: “which we live by” would sound much more natural.

3 – and a growing industry to try to provide fixes.