I’m turning off your security.

“Don’t worry, I know what I’m doing.”

Today’s security story is people turning security off.  For me, the fact that it’s even a story is the story.  This particular story is covered in The Register, who explain (to nobody’s surprise) that some of the patches to fix issues identified in CPUs (think Spectre, Meltdown, etc.) can actually slow down the applications running on them.  The problem is that, in some cases, they don’t slow them down a little bit, but rather a lot.  By which I mean up to 50%.  And if you’ve bought expensive hardware – or rented it[1] – then you’d generally prefer it if it runs your applications/programs/workloads quickly, rather than just half as fast as they might run.

And so you turn off the security patches.  Your decision: fine.

No, stop: this isn’t what has happened.

The mythical “you”, the person running the workload, isn’t the person who makes the decision, in most cases, because it’s been made for you.  This is the real story.

Linus Torvalds, and a bunch of other experts in the Linux kernel[2], have decided that although the patch that could make your workloads secure is available, the functionality that does it should be “off” by default.  They reason – quite correctly, in my opinion – that the vast majority of people running workloads won’t easily be able to turn this functionality on themselves.

They also reason – again, correctly, in my opinion – that most people will care more about how quickly their workloads run than about how secure they are.  I’m not happy about this, but that’s the way it is.

What I worry about is the final step in the logic to making the decision.  I’m going to quote Linus:

“Have you seen any actual realistic attacks for normal human users?” he asked. “Things where the kernel should actually care? The JavaScript thing is for the browser to fix up, not for the kernel to say ‘now everything should run up to 50 per cent slower.'”

I get the reasoning behind this, but I don’t like it.  To give some context, somebody came up with an example attack which could compromise certain workloads, and Linus points out that there are better ways to fix this attack than fixing it in the kernel. My concerns are two-fold:

  1. although there may be better places to fix that particular attack, a kernel-level fix is likely to fix an entire class of attacks, meaning better protection for users who are using any application which might include an attack vector.
  2. pointing out that there haven’t been any attacks yet not only ignores the fact that there is a future out there[3] but also points malicious actors in the direction of a likely attack vector.

Now, I know that the more dedicated malicious actors are already looking for these things, but do we really need to advertise?

What’s my fix?

I don’t have one, or at least not an easy one.

Somebody, somewhere, needs to decide whether security is turned on or off.  What I’d honestly like to see is an easier set of controls to allow people to turn security on or off, and to understand the trade-offs when they do so.  The problems with that are:

  • the trade-offs are often much more complex than just “fast and insecure” or “slow and secure”, and are really difficult to explain.
  • in order to make a sensible decision about trade-offs, people need to understand risk.  And people are awful at understanding risk.

And there’s a “chicken and egg problem”[7] here: people won’t understand risk until they are offered the chance to make decisions, but there’s little incentive to offer them complex decisions unless they understand risk.

My plea?  Where possible, expose risk, and explain what it is.  And if you’re turning off security-related functionality, make it easy to turn back on for those who need it.
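
As a small illustration of what “expose risk” might look like in practice, here’s a minimal sketch (Linux-only, and an illustration rather than anything authoritative) which reads the kernel’s own report of its CPU vulnerability mitigations from sysfs. The directory it reads is real on reasonably recent kernels – it arrived with the Meltdown/Spectre fixes – but the exact wording of each status string varies by kernel version, so treat the output as a hint, not a full risk report.

```python
# Minimal sketch: report the kernel's own view of CPU vulnerability
# mitigations (Linux only). On older kernels the sysfs directory below
# simply won't exist.
from pathlib import Path

VULNS = Path("/sys/devices/system/cpu/vulnerabilities")

def mitigation_status():
    """Map each vulnerability the kernel knows about to its reported
    status, e.g. 'Mitigation: PTI', 'Not affected' or 'Vulnerable'."""
    if not VULNS.is_dir():
        return {}
    return {f.name: f.read_text().strip() for f in sorted(VULNS.iterdir())}

if __name__ == "__main__":
    status = mitigation_status()
    if not status:
        print("This kernel doesn't report CPU vulnerability status.")
    for name, state in status.items():
        marker = "  " if state.startswith(("Mitigation", "Not affected")) else "!!"
        print(f"{marker} {name}: {state}")
```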


1 – a quick heads-up: this is what “deploying to the cloud” actually is.

2 – what sits at the bottom of many of the workloads that are running in servers.

3 – hopefully.  If the Three Minute Warning[4] sounds while you’re reading this, you may wish to duck and cover.  You can come back to it later[6].

4 – “… sounds like this …”[5].

5 – 80s reference.

6 – or not.  See [3].

7 – for non-native English readers, this means “a problem where the solution requires two pieces, both of which are dependent on each other”.

Why security policies are worthless

A policy, to have any utility at all, needs to exist in a larger context.

“We need a policy on that.” This is a phrase that seems to act as a universal panacea to too many managers. When a problem is identified, and the blame game has been exhausted, the way to sort things out, they believe, is to create a policy. If you’ve got a policy, then everything will be fine, because everything will be clear, and everyone will obey the policy, and nothing can go wrong.

Right[1].

The problem is that policy, on its own, is worthless.

A policy, to have any utility at all, needs to exist in a larger context, or, to think of it in a different way, to sit in a chain of artefacts.  It is its place in this chain that actually gives it meaning.  Let me explain.

When that manager said that they wanted a policy, what did that actually mean?  That rather depends on how wise the manager is[2].  Hopefully, the reason that the manager identified the need for a policy was because:

  1. they noticed that something had gone wrong that shouldn’t have done; and
  2. they wanted to have a way to make sure it didn’t happen again.

The problem with policy on its own is that it doesn’t actually help with either of those points.  What use does it have, then?

Governance

Let’s look at those pieces separately.  When we say that “something had gone wrong that shouldn’t have done”, what we’re saying is that there is some sort of model for what the world should look like, giving us often quite general advice on our preferred state.  Sometimes this is a legal framework, sometimes it’s a regulatory framework, and sometimes it’s a looser governance model.  The sort of thing I’d expect to see at this level would be statements like:

  • patient data must be secured against theft;
  • details of unannounced mergers must not be made available to employees who are not directors of the company;
  • only authorised persons may access the military base[4].

These are high level requirements, and are statements of intent.  They don’t tell you what to do in order to make these things happen (or not happen), they just tell you that you have to do them (or not do them).  I’m going to call collections of these types of requirements “governance models”.

Processes

At the other end of the spectrum, you’ve got the actual processes (in the broader sense of the term) required to make the general intent happen.  For the examples above, these might include:

  • AES-256 encryption using OpenSSL version 1.1.1 FIPS[5], with key patient-sym-current for all data on database patients-20162019;
  • documents Indigo-1, Indigo-3, Indigo-4 to be collected after meeting and locked in cabinet Socrates by CEO or CFO;
  • guards on duty at post Alpha must report any expired passes to the base commander and refuse entry to all those producing them.

These are concrete processes that must be followed, which will hopefully allow the statements of intent that we noted above to be carried out.

Policies and audit

The problem with both the governance statements and the processes identified is that they’re both insufficient.  The governance statements don’t tell you how to do what needs to be done, and the processes are context-less, which means that you can’t tell what they relate to, or how they’re helping.  It’s also difficult to know what to do if they fail: what happens, for example, if the base commander turns up with an expired pass?

Policies are what sit in the middle.  They provide guidance as to how to implement the governance model, and provide context for the processes.  They should give you enough detail to cover all eventualities, even if that’s to say “consult the Legal Department when unsure[6]”.  What’s more, they should be auditable.  If you can’t audit your security measures, then how can you be sure that your governance model is being followed?  But governance models, as we’ve already discovered, are at the level of intent – not the sort of thing that can be audited.

Policies, then, should provide enough detail that they can be auditable, but they should also preferably be separated enough from implementation that rules can be written that are applicable in different circumstances.  Here are a few policies that we might create to follow the first governance model examples above:

  • patient data must be secured against theft;
    • Policy 1: all data at rest must be encrypted by a symmetric key of at least 256 bits or equivalent;
    • Policy 2: all storage keys must be rotated at least once a month;
    • Policy 3: in the event of a suspected key compromise, all keys at the same level in that department must be rotated.

You can argue that these policies are too detailed, or you can argue that they’re not detailed enough: both arguments are fair, and level of detail (or granularity) should depend on the context and the use to which they are being put.  However, though I’m sure that all of the example policies I’ve given could be improved, I hope that they are all:

  • auditable;
  • able to be implemented by one or more well-defined processes;
  • understandable both by those concerned with the governance level and by those involved at the process implementation and operations level.
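
To make “auditable” slightly more concrete, here’s a minimal sketch of what an automated check of Policy 2 above might look like.  The key inventory and its field names are invented purely for illustration – a real check would query whatever key management system you actually use.

```python
# Minimal sketch: an automated audit check for "Policy 2: all storage
# keys must be rotated at least once a month". The inventory below is a
# stand-in for whatever your key management system would return.
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=31)

key_inventory = [  # hypothetical data, for illustration only
    {"name": "patient-sym-current", "last_rotated": "2018-10-01T00:00:00+00:00"},
    {"name": "patient-sym-archive", "last_rotated": "2018-05-14T00:00:00+00:00"},
]

def rotation_violations(keys, now=None):
    """Return (key name, age in days) for every key older than the policy allows."""
    now = now or datetime.now(timezone.utc)
    return [
        (key["name"], (now - datetime.fromisoformat(key["last_rotated"])).days)
        for key in keys
        if now - datetime.fromisoformat(key["last_rotated"]) > MAX_KEY_AGE
    ]

for name, age in rotation_violations(key_inventory):
    print(f"POLICY 2 VIOLATION: key '{name}' last rotated {age} days ago")
```

The point isn’t the code, of course: it’s that a policy written at this level of detail is something that a machine – or an auditor – can actually check.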

The value of auditing

I’ve written about auditing before (Confessions of an auditor).  I think it’s really important.  Done well, it should allow you to discover whether:

  1. your processes are covering all of the eventualities of your policies;
  2. your policies are actually being implemented correctly.

Auditing may also address whether your policies fully meet your governance model.  Auditing well is a skill, but in order to help your auditor – whether they are good at it or bad at it – having a clearly defined set of policies is a must.  But, as I pointed out at the beginning of this article, policies for policies’ sake are worthless: put them together with governance and processes, however, and they provide technical and business value.


1 – this is sarcasm.

2 – yes, I know.  But let’s not be rude if we can avoid it.  We want to help managers, because the more clue we deliver to them, the easier our lives will be[3].

3 – and you never know: one day, even you might be a manager.

4 – which is probably not a US military base, given the spelling of “authorised”.

5 – example only…

6 – this, in my experience, is the correct answer to many questions.

The 3 things you need to know about disk encryption

Use software encryption, preferably an open-source and audited solution.

It turns out that somebody – well, lots of people, in fact – failed to implement a cryptographic standard very well.  This isn’t a surprise, I’m afraid, but it’s bad news.  I’ve written before about how important it is to be using disk encryption, but it turns out that the advice I gave wasn’t sufficient, or detailed enough.

Here’s a bit of background.  There are two ways to do disk encryption:

  1. let the disk hardware (and firmware) manage it: HDD (hard disk drive), SSD (solid state drive) and hybrid (a mix of HDD and SSD technologies) manufacturers create drives which have encryption built in.
  2. allow your Operating System (e.g. Linux[1], OSX[2], Windows[3]) to do the job: the O/S will have a little bit of itself on the disk unencrypted, which will allow it to decrypt the rest of the disk (which is encrypted) when provided with a password or key.

You’d think, wouldn’t you, that option 1 would be the safest?  It should be quick, as it’s done in hardware, and, well, the companies who manufacture these disks will know what they’re doing, right?

No.

A paper (link opens a PDF file) written by some researchers in the Netherlands reveals some work that they did on several SSD drives to try to work out how good a job had been done on the encryption security.  They are all supposed to have implemented a fairly complex standard from the TCG[4] called Opal, but it seems that none of them did it right.  It turns out that someone with physical access to your hardware can, fairly trivially, decrypt what’s on your drive.  And they can do this without the password that you use to lock it or any associated key(s).  The simple lesson from this is that you shouldn’t trust hardware disk encryption.

So, software disk encryption is OK, then?

Also no.

Well, actually yes, as long as you’re not using Microsoft’s BitLocker in its default mode.  It turns out that BitLocker will just use hardware encryption if the drive it’s using supports it – in other words, by default, BitLocker hands the job straight back to the very hardware encryption we’ve just decided not to trust.

What about other options?  Well, you can tell BitLocker not to use hardware encryption, but only for a new installation: it won’t change on an existing disk.  The best option[5] is to use a software encryption solution which is open source and audited by the wider community.  LUKS is the default for most Linux distributions.  One suggested by the paper’s authors for Windows is VeraCrypt.  Can we be certain that there are no holes or mistakes in the implementation of these solutions?  No, we can’t, but the chances of security issues being found and fixed are much, much higher than for proprietary software[6].

What, then, are my recommendations?

  1. Don’t use hardware disk encryption.  It’s been shown to be flawed in many implementations.
  2. Don’t use proprietary software.  For anything, honestly, particularly anything security-related, but specifically not for disk encryption.
  3. If you have to use Windows, and are using BitLocker, run with VeraCrypt on top.
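
If you want to check whether a disk or partition is already using LUKS, here’s a minimal sketch that simply looks for the LUKS magic bytes at the start of a device.  Reading a block device directly needs root, the device path is only an example, and in practice the standard way to do this is just to run cryptsetup isLuks <device>.

```python
# Minimal sketch: does this device (or disk image) start with a LUKS
# header? Reading a block device directly requires root, and the
# default path below is just an example.
import sys

LUKS_MAGIC = b"LUKS\xba\xbe"  # magic bytes at the start of a LUKS header

def is_luks(device_path):
    with open(device_path, "rb") as dev:
        return dev.read(len(LUKS_MAGIC)) == LUKS_MAGIC

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda1"  # example only
    try:
        found = is_luks(path)
    except PermissionError:
        sys.exit(f"{path}: you'll need root to read the device directly")
    print(f"{path}: {'LUKS header found' if found else 'no LUKS header found'}")
```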

 


1 – GNU Linux.

2 – I’m not even sure if this is the OS that Macs run anymore, to be honest.

3 – not my thing either, but I’m pretty sure this is what it’s called.  Couldn’t be certain of the version, though.

4 – Trusted Computing Group.

5 – as noted by the paper’s authors, and heartily endorsed by me.

6 – I’m not aware of any problems with Macintosh-based implementations, but open source is just better – read the article linked from earlier in the sentence.

On being acquired – a personal view

It’s difficult to think of a better fit than IBM.

First off, today is one of those days when I need to point you at the standard disclaimer that the views expressed in this post are my own, and not necessarily those of my employers.  That said, I think that many of them probably align, but better safe than sorry[1].  Another note: I believe that all of the information in this article is public knowledge.

The news came out two days ago (last Sunday, 2018-10-28) that Red Hat, my employer, is being acquired by IBM for $34bn.  I didn’t know about the deal in advance (I’m not that exalted within the company hierarchy, which is probably a good thing, as all those involved needed to keep very tight-lipped about it, and that would have been hard), so the first intimation I got was when people started sharing stories from various news sites on internal chat discussions.  They (IBM) are quite clear about the fact that they are acquiring us for the people, which means that each of us (including me!) is worth around $2.6m, based on our current headcount.  Sadly, I don’t think it works quite like this, and certainly nobody has (yet) offered to pay me that amount[2].  IBM have also said that they intend to keep Red Hat operating as a separate entity within IBM.

How do I feel?  My initial emotion was shock.  It’s always a surprise when you get news that you weren’t expecting, and the message that we’d carried for a long time was that Red Hat would attempt to keep ploughing its own furrow[3] for as long as possible.  But I’d always known that, as a public company, we were available to be bought, if the money was good enough.  It appears on this occasion that it was.  And that emotion turned to interest as to what was going to happen next.

And do you know what?  It’s difficult to think of a better fit than IBM.  I’m not going to enumerate the reasons that I feel that other possible acquirers would have been worse, but here are some of the reasons that IBM, at least in this arrangement, is good:

  • they “get” open source, and have a long history of encouraging its use;
  • they seem to understand that Red Hat has a very distinctive culture, and want to encourage that, post-acquisition;
  • they have a hybrid cloud strategy and products, Red Hat has a hybrid cloud strategy and products: they’re fairly well-aligned;
  • we’re complementary in a number of sectors and markets;
  • they’re a much bigger player than us, and suddenly, we’ll have access to more senior people in new and exciting companies.

What about the impact on me, though?  Well, IBM takes security seriously.  IBM has some fantastic research and academic connections.  The group in which I work has some really bright and interesting people in it, and it’s difficult to imagine IBM wanting to break it up.  A number of the things I’m working on will continue to align with both Red Hat’s direction and IBM’s.  The acquisition will take up to a year to complete – assuming no awkward regulatory hurdles along the way – and not much is going to change in the day-to-day.  Except that I hope to get even better access to my soon-to-be-colleagues working in similar fields to me, but within IBM.

Will there be issues along the way?  Yes.  Will there be uncertainty?  Yes.  But do I trust that the leadership within Red Hat and IBM have an honest commitment to making things work in a way that will benefit Red Hatters?  Yes.

And am I looking to jump ship?  Oh, no.  Far too much interesting stuff to be doing.  We’ve got an interesting few months and years ahead of us.  My future looked red, until Sunday night.  Then maybe blue.  But now I’m betting on something somewhere between the two: go Team Purple.


1 – because, well, lawyers, the SEC, etc., etc.

2 – if it does, then, well, could somebody please contact me?

3 – doing its own thing independently.

“All systems nominal” – borrowing a useful phrase

Wouldn’t it be lovely if everything were functioning exactly as it should?

Wouldn’t it be lovely if everything were functioning exactly as it should? All the time. Alas, that is not to be our lot: we in IT security know that there’s always something that needs attention, something that’s not quite right, something that’s limping along and needs to be fixed.

The question I want to address in this article is whether that’s actually OK. I’ve written before about managed degradation (the idea that planning for when things go wrong is a useful and important practice[1]) in Service degradation: actually a good thing. The subject of this article, however, is living in a world where everything is running almost normally – or, more specifically, “nominally”. Most definitions of the word “nominal” focus on its meaning of “theoretical” or “token” value. A quick search of online dictionaries provided two definitions which were more relevant to the usage I’m going to be looking at today:

  • informal (chiefly in the context of space travel) functioning normally or acceptably[2].
  • being according to plan: satisfactory[3].

I’d like to offer a slightly different one:

  • within acceptable parameters for normal system operation.

I’ve seen “tolerances” used instead of “parameters”, and that works, too, but it’s not a word that I think we use much within IT security, so I lean towards “parameters”[4].

Utility

Why do I think that this is a useful concept? Because, as I noted above, we all know that it’s a rare day when everything works perfectly. But we find ways to muddle through, or we find enough bandwidth to make the backups happen without significant impact on database performance, or we only lose 1% of the credit card details collected that day[5]. All of these (except the last one) are fine, and if we are wise, we might start actually defining what counts as acceptable operation – nominal operation – for all of our users. This is what we should be striving for, and exactly how far off perfect operation we are will give us clues as to how much effort we need to expend to sort things out, and how quickly we need to perform mitigations.

In fact, many organisations which provide services do this already: that’s where SLAs (Service Level Agreements) come from. I remember, at school, doing some maths[6] around how food companies ensure they’re in little danger of getting into trouble for under-filling containers: by looking at the standard deviation of the filling process, they can work out how likely any given container is to be under the declared amount. This is similar, and the likelihood of hardware failures, based on similar calculations, is often factored into uptime planning.
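
Here’s a minimal sketch of that piece of school maths, with invented numbers: given the mean and standard deviation of a filling line, what fraction of containers will come out below the weight on the label?

```python
# Minimal sketch: probability of under-filling a container, assuming the
# filling process is roughly normally distributed. All figures invented.
from statistics import NormalDist

declared_weight = 500.0  # grams printed on the label
line_mean = 505.0        # what the line actually aims for
line_sd = 3.0            # standard deviation of the filling process

p_underfill = NormalDist(mu=line_mean, sigma=line_sd).cdf(declared_weight)
print(f"Chance a given container is under the declared weight: {p_underfill:.2%}")
```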

So far, much of the above seems to be about resilience: where does security come in? Well, your security components, features and functionality are subject to the same resiliency issues as any other part of your system. The problem is that if a security piece fails, it may well be a single point of failure, which means that although the rest of the system is operating at 99% performance, your security just hit zero.

These are the reasons that we perform failure analysis, and why we consider defence in depth. But when we’re looking at defence in depth, do we remember to perform second order analysis? For instance, if my back-up LDAP server for user authentication is running on older hardware, what are the chances that it will fail when put under load?

Broader usage

It should come as no surprise to regular readers that I want to expand the scope of the discussion beyond just hardware and software components in systems to the people who are involved, and beyond that to processes in general.

Train companies are all too aware of the impact on their services if a bad flu epidemic hits their drivers – or if the weather is good, so their staff prefer to enjoy the sunshine with their families rather than take voluntary overtime[7]. You may have considered the impact of a staff member or two being sick, but have you gone as far as modelling what would happen if four members of your team were sick and, just as importantly, how likely that is? Equally vital to consider may be issues of team dynamics, or even terrorist attacks or union disputes. What about external factors, like staff not being able to get into work because of train cancellations? What are the chances of broadband failures occurring at the same time, scuppering your fall-back plan of allowing key staff to work from home?
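
As a minimal sketch of the “how likely is that?” question, here’s the same sort of back-of-the-envelope calculation for staff sickness, with invented figures and the (unrealistic, as flu epidemics demonstrate) assumption that absences are independent:

```python
# Minimal sketch: if each member of a team of eight has a 3% chance of
# being off sick on any given day, independently, how likely is it that
# four or more are out at once? All figures invented; real absences are
# rarely independent.
from math import comb

team_size = 8
p_absent = 0.03   # per-person, per-day probability of absence
threshold = 4     # number of simultaneous absences we're worried about

p_at_least = sum(
    comb(team_size, k) * p_absent**k * (1 - p_absent)**(team_size - k)
    for k in range(threshold, team_size + 1)
)
print(f"P({threshold} or more of {team_size} absent on a given day): {p_at_least:.6%}")
```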

We can go deeper and deeper into this, and at some point it becomes pointless to go any further. But I believe that it’s useful to consider how far to go with it, and also to spend some time deciding exactly what you consider “nominal” operation to be, particularly for your security systems.


1 – I nearly wrote “art”.

2 – Oxford Dictionaries: https://en.oxforddictionaries.com/definition/nominal.

3 – Merriam-Webster: https://www.merriam-webster.com/dictionary/nominal.

4 – this article was inspired by the Public Service Broadcasting song Go. Listen to it: it rocks. And they’re great live, too.

5 – Note: this is a joke, and not a very funny one. You’ve probably just committed a GDPR breach, and you need to tell someone about it. Now.

6 – in the UK, and most countries speaking versions of Commonwealth English, we do more than one math. Because “mathematics”.

7 – this happened: I saw it on TV[8].

8 – so it must be true.

6 types of attack: learning from Supermicro, State Actors and silicon

… it could have happened, and it could be happening now.

Last week, Bloomberg published a story detailing how Chinese state actors had allegedly forced employees of Supermicro (or companies subcontracting to them) to insert a small chip – the silicon in the title – into motherboards destined for Apple and Amazon.  The article talked about how an investigation into these boards had uncovered this chip and the steps that Apple, Amazon and others had taken.  The story was vigorously denied by Supermicro, Apple and Amazon, but that didn’t stop Supermicro’s stock price from tumbling by over 50%.

I have heard strong views expressed by people with expertise in the topic on both sides of the argument: that it probably didn’t happen, and that it probably did.  One side argues that the denials by Apple and Amazon, for instance, might have been impacted by legal “gagging orders” from the US government.  An opposing argument suggests that the Bloomberg reporters might have confused this story with a similar one that occurred a few months ago.  Whether this particular story is correct in every detail, or a fabrication – intentional or unintentional – is not what I’m interested in at this point.  What I’m interested in is not whether it did happen in this instance: the clear message is that it could have happened, and it could be happening now.

I’ve written before about State Actors, and whether you should worry about them.  There’s another question which this story brings up, which is possibly even more germane: what can you do about it if you are worried about them?  This breaks down further into two questions:

  • how can I tell if my systems have been compromised?
  • what can I do if I discover that they have?

The first of these is easily enough to keep us occupied for now[1], so let’s spend some time on that.  Let’s first define six types of compromise, think about how they might be carried out, and then consider the questions above for each:

  • supply-chain hardware compromise;
  • supply-chain firmware compromise;
  • supply-chain software compromise;
  • post-provisioning hardware compromise;
  • post-provisioning firmware compromise;
  • post-provisioning software compromise.

There isn’t space in this article to go into detail on each of these types of attack, so instead I’ll give an overview of each[2].

Terms

  • Supply-chain – all of the steps up to when you start actually running a system.  From manufacture through installation, including vendors of all hardware components and all software, OEMs, integrators and even shipping firms that have physical access to any pieces of the system.  For all supply-chain compromises, the key question is the extent to which you, the owner of a system, can trust every single member of the supply chain[3].
  • Post-provisioning – any point after which you have installed the hardware, put all of the software you want on it, and started running it: the time during which you might consider the system “under your control”.
  • Hardware – the physical components of a system.
  • Software – software that you have installed on the system and over which you have some control: typically the Operating System and application software.  The amount of control depends on factors such as whether you use proprietary or open source software, and how much of it is produced, compiled or checked by you.
  • Firmware – special software that controls how the hardware interacts with the standard software on the machine, the hardware that comprises the system, and external systems.  It is typically provided by hardware vendors and its operation is opaque to owners and operators of the system.

Compromise types

See the table at the bottom of this article for a short summary of the points below.

  1. Supply-chain hardware – there are multiple opportunities in the supply chain to compromise hardware, but the harder a compromise is made to detect, the more difficult it is to perform.  The attack described in the Bloomberg story would be extremely difficult to detect, but the addition of a key logger to a keyboard just before delivery (for instance) would be correspondingly simpler.
  2. Supply-chain firmware – of all the options, this has the best return on investment for an attacker.  Assuming good access to an appropriate part of the supply chain, inserting firmware that (for instance) impacts network performance or leaks data over a wifi connection is relatively simple.  The difficulty in detection comes from the fact that although it is possible for the owner of the system to check that the firmware is what they think it is (there’s a minimal sketch of such a check after this list), what that measurement confirms is only that the firmware matches what the vendor says they supplied.  So the “medium” rating relates only to firmware that was implanted by members of the supply chain who did not source the original firmware: otherwise, it’s “high”.
  3. Supply-chain software – by this, I mean software that comes installed on a system when it is delivered.  Some organisations will insist on “clean” systems being delivered to them[4], and will install everything from the Operating System upwards themselves.  This means that they basically now have to trust their Operating System vendor[5], which is maybe better than trusting other members of the supply chain to have installed the software correctly.  I’d say that it’s not too simple to mess with this in the supply chain, if only because checking isn’t too hard for the legitimate members of the chain.
  4. Post-provisioning hardware – this is where somebody with physical access to your hardware – after it’s been set up and is running – inserts or attaches hardware to it.  I nearly gave this a “high” rating for difficulty below, assuming that we’re talking about servers, rather than laptops or desktop systems, as one would hope that your servers are well-protected, but the ease with which attackers have shown that they can typically get physical access to systems using techniques like social engineering means that I’ve downgraded this to “medium”.  Detection, on the other hand, should be fairly simple given sufficient resources (hence the “medium” rating), and although I don’t believe anybody who says that a system is “tamper-proof”, tamper-evidence is a much simpler property to achieve.
  5. Post-provisioning firmware – when you patch your Operating System, it will often also patch firmware on the rest of your system.  This is generally a good thing to do, as patches may provide security, resilience or performance improvements, but you’re stuck with the same problem as with supply-chain firmware – that you need to trust the vendor: in fact, you need to trust both your Operating System vendor and their relationship with the firmware vendor.
  6. Post-provisioning software – is it easy to compromise systems via their Operating System and/or application software?  Yes: this we know.  Luckily – though depending on the sophistication of the attack – there are generally good tools and mechanisms for detecting such compromises, including behavioural monitoring.
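
As promised in point 2, here’s a minimal sketch of that kind of firmware measurement: hash a dump of the firmware image and compare it against a hash published by the vendor.  The file name and the expected hash below are placeholders, and remember the caveat – a match only tells you that the image is what the vendor says they shipped, not that what they shipped is trustworthy.

```python
# Minimal sketch: compare a dumped firmware image against a
# vendor-published SHA-256 hash. Both the file name and the expected
# hash are placeholders.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

firmware_dump = "bios_dump.bin"   # placeholder: your dumped image
vendor_sha256 = "0123abcd..."     # placeholder: from the vendor's site

measured = sha256_of(firmware_dump)
if measured == vendor_sha256:
    print("Firmware image matches the vendor-published hash.")
else:
    print(f"MISMATCH: measured {measured}")
```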

Table

 

Compromise type               Attacker difficulty    Detection difficulty
Supply-chain hardware         High                   High
Supply-chain firmware         Low                    Medium
Supply-chain software         Medium                 Medium
Post-provisioning hardware    Medium                 Medium
Post-provisioning firmware    Medium                 Medium
Post-provisioning software    Low                    Low

Conclusion

What are your chances of spotting a compromise on your system?  I would argue that they are generally pretty much in line with the difficulty of performing the attack in the first place: with the glaring exception of supply-chain firmware.  We’ve seen attacks of this type, and they’re very difficult to detect.  The good news is that there is some good work going on to help detection of these types of attacks, particularly in the world of Linux[7] and open source.  In the meantime, I would argue that our best forms of defence are currently:

  • for supply-chain: build close relationships, use known and trusted suppliers.  You may want to restrict as much as possible of your supply chain to “friendly” regimes if you’re worried about State Actor attacks, but this is very hard in the global economy.
  • for post-provisioning: lock down your systems as much as possible – both physically and logically – and use behavioural monitoring to try to detect anomalies in what you expect them to be doing (there’s a minimal sketch of the idea below).
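
To illustrate the behavioural monitoring point, here’s a minimal sketch in which the “behaviour” being watched is nothing more than the set of TCP ports a host is listening on, compared against a recorded baseline.  It uses the third-party psutil library as an assumption – any inventory source would do – and the baseline is a placeholder; real monitoring is, of course, far richer than this.

```python
# Minimal sketch: flag listening TCP ports that aren't in a recorded
# baseline. Uses the third-party psutil library (pip install psutil);
# the baseline below is a placeholder for this particular host.
import psutil

EXPECTED_LISTENING_PORTS = {22, 443}  # placeholder baseline

def listening_ports():
    return {
        conn.laddr.port
        for conn in psutil.net_connections(kind="tcp")
        if conn.status == psutil.CONN_LISTEN
    }

unexpected = listening_ports() - EXPECTED_LISTENING_PORTS
for port in sorted(unexpected):
    print(f"ANOMALY: unexpected listener on TCP port {port}")
if not unexpected:
    print("All listening ports match the baseline.")
```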

1 – I’ll try to write something on this other topic in a different article.

2 – depending on interest, I’ll also consider a series of articles to go into more detail on each.

3 – how certain are you, for instance, that your delivery company won’t give your own government’s security services access to the boxes containing your equipment before they deliver them to you?

4 – though see above: what about the firmware?

5 – though you can always compile your own Operating System if you use open source software[6].

6 – oh, you didn’t compile your compiler yourself?  All bets off, then…

7 – yes, “GNU Linux”.