Is homogeneity bad for security?

Can it really be good for security to have such a small number of systems out there?

For the last three years, I’ve attended the Linux Security Summit (though it’s not solely about Linux, actually), and that’s where I am for the first two days of this week – the next three days are taken up with the Open Source Summit.  This year, both are being run in North America and in Europe – and there was a version of the Open Source Summit in Asia, too.  This is all good, of course: the more people, and the more diversity we have in the community, the stronger we’ll be.

The question of diversity came up at the Linux Security Summit today, but not in the way you might necessarily expect.  As with most of the industry, women, ethnic minorities and people with disabilities are very under-represented at this very technical conference (there’s a very strong Linux kernel developer bias).  It’s a pity, and something we need to address, but when a question came up after someone’s talk, it wasn’t diversity of people’s backgrounds that was being questioned, but diversity of the systems we deploy around the world.

The question was asked of a panel who were talking about open firmware and how making it open source will (hopefully) increase the security of the system.  We’d already heard how most systems – laptops, servers, desktops and beyond – come with a range of different pieces of firmware from a variety of different vendors.  And when we talk about a variety, this can easily hit over 100 different pieces of firmware per system.  How are you supposed to trust a system with so many different pieces?  And, as one of the panel members pointed out, many of the vendors are quite open about the fact that they don’t see themselves as security experts, and are actually asking the members of open source projects to design APIs, make recommendations about design, etc.

This self-knowledge is clearly a good thing, and the main focus of the panel’s efforts has been to try to define a small core of well-understood and better designed elements that can be deployed in a more trusted manner.   The question that was asked from the audience was in response to this effort, and seemed to me to be a very fair one.  It was (to paraphrase slightly): “Can it really be good for security to have such a small number of systems out there?”  The argument – and it’s a good one in general – is that if you have a small number of designs which are deployed across the vast majority of installations, then there is a real danger that a small number of vulnerabilities can impact on a large percentage of that install base.

It’s a similar problem in the natural world: a population with a restricted genetic pool is at risk from a successful attacker: a virus or fungus, for instance, which can attack many individuals due to their similar genetic make-up.

In principle, I would love to see more diversity of design within computing, and particularly security, but there are two issues with this:

  1. management: there is a real cost to managing multiple different implementations and products, so organisations prefer to have a smaller number of designs, reducing the number of tools needed to manage them and the number of people who need to be trained.
  2. scarcity of resources: there is a scarcity of resources within IT security.  There just aren’t enough security experts around to design good security into systems, to support them and then to respond to attacks as vulnerabilities are found and exploited.

To the first issue, I don’t see many easy answers, but to the second, there are three responses:

  1. find ways to scale the impact of your resources: if you open source your code, then the number of expert resources available to work on it expands enormously.  I wrote about this a couple of years ago in Disbelieving the many eyes hypothesis.  If your code is proprietary, then the number of experts you can leverage is small: if it is open source, you have access to almost the entire worldwide pool of experts.
  2. be able to respond quickly: if attacks on systems are found, and vulnerabilities identified, then the ability to move quickly to remedy them allows you to mitigate significantly the impact on the installation base.
  3. design in defence in depth: rather than relying on one defence to an attack or type of attack, try to design your deployment in such a way that you have layers of defence. This means that you have some time to fix a problem that arises before catastrophic failure affects your deployment.  There’s a small illustrative sketch of this after the list.
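
To make the third of these a little more concrete, here’s a minimal sketch in Python of what layered, independent checks on a single request path might look like.  Everything in it – the layer names, the thresholds, the keys – is hypothetical, and it’s an illustration rather than anything like a real implementation: the point is simply that no single control is relied upon on its own, so one failing layer results in a rejection and a log entry rather than a catastrophic failure.

```python
# A minimal defence-in-depth sketch. The checks, thresholds and keys below are
# entirely illustrative: the point is that no single control is relied upon on
# its own, so one failing layer buys you time rather than handing over the
# whole deployment.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("defence-in-depth")

MAX_REQUESTS_PER_MINUTE = 100                              # hypothetical rate limit
KNOWN_API_KEYS = {"alice-key", "bob-key"}                  # stand-in for real authentication
PERMITTED_ACTIONS = {"alice-key": {"read"},                # stand-in for real authorisation
                     "bob-key": {"read", "write"}}


def within_rate_limit(recent_request_count: int) -> bool:
    return recent_request_count <= MAX_REQUESTS_PER_MINUTE


def authenticated(api_key: str) -> bool:
    return api_key in KNOWN_API_KEYS


def authorised(api_key: str, action: str) -> bool:
    return action in PERMITTED_ACTIONS.get(api_key, set())


def valid_payload(payload: str) -> bool:
    return payload.isprintable() and len(payload) < 4096


def handle_request(api_key: str, action: str, payload: str,
                   recent_request_count: int) -> bool:
    """Apply each independent layer in turn; reject (and log) on the first failure."""
    layers = [
        ("rate limiting", within_rate_limit(recent_request_count)),
        ("authentication", authenticated(api_key)),
        ("authorisation", authorised(api_key, action)),
        ("input validation", valid_payload(payload)),
    ]
    for name, passed in layers:
        if not passed:
            log.warning("request rejected at the %s layer", name)
            return False
    log.info("request accepted: all layers passed")
    return True


if __name__ == "__main__":
    # "alice-key" authenticates but isn't permitted to write, so the
    # authorisation layer rejects the request and leaves an audit trail.
    handle_request("alice-key", "write", "hello", recent_request_count=3)
```

In a real deployment, of course, each of those layers would be a separate, independently managed control – a firewall, an identity provider, an input sanitiser – rather than four functions in one process, but the principle is the same.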

I’m hesitant to overplay the biological analogy, but the second and third of these seem quite similar to defences we see in nature.  The equivalent to quick response is to have multiple generations in a short time, giving a species the opportunity to develop immunity to a particular attack, and defence in depth is a typical defence mechanism in nature – think of humans’ ability to recognise bad meat by its smell, taste its “off-ness” and then vomit it up if swallowed.  I’m not quite sure how this particular analogy would map to the world of IT security (though some of the practices you see in the industry can turn your stomach), but while we wait for a bigger – and more diverse – pool of security experts, let’s keep being open source, let’s keep responding quickly, and let’s make sure that we design for defence in depth.

 

Single point of failure

Any failure which completely brings down a system for over 12 hours counts as catastrophic.

Yesterday[1], Gatwick Airport suffered a catastrophic failure. It wasn’t Air Traffic Control, it wasn’t security scanners, it wasn’t even check-in desk software, but the flight information boards. Catastrophic? Well, maybe the functioning of the airport wasn’t catastrophically affected, but the system itself was. For my money, any failure which completely brings down a system for over 12 hours (from 0430 to 1700 BST, reportedly), counts as catastrophic.

The failure has been blamed on damage to a fibre optic cable. It turned out that if this particular component of the system was brought down, then the system failed to operate as expected: it was a single point of failure. Now, in this case, it could be argued that the failure did not have a security impact: this was a resilience problem. Setting aside the fact that resilience and security are often bedfellows[2], many single points of failure absolutely are security issues, as they become obvious points of vulnerability for malicious actors to attack.

A key skill that needs to be grown with IT in general, but security in particular, is systems thinking, as I’ve discussed elsewhere, including in my first post on this blog: Systems security – why it matters. We need more systems engineers, and more systems architects. The role of systems architects, specifically, is to look beyond the single components that comprise a system, and to consider instead the behaviour of the system as a whole. This may mean looking past our first area of focus and out to, for instance, hardware or externally managed systems, to consider what the impact of failure, damage or compromise would be on the system’s overall operation.

Single points of failure are particularly awkward.  They crop up in all sorts of places, and they are a very good example of why diversity is important within IT security, and why you shouldn’t trust a single person – including yourself – to be the only person who looks at the security of a system.  My particular biases are towards crypto and software, for instance, so I’m more likely to miss a hardware or network point of failure than somebody with a different background to me.  Not to say that we shouldn’t try to train ourselves to think outside of whatever little box we come from – that’s part of the challenge and excitement of being a systems architect – but an acknowledgement of our own lack of expertise is in itself a realisation of our expertise: if you realise that you’re not an expert, you’re part way to becoming one.

I wanted to finish with an example of a single point of failure that is relevant to security, and exposes a process vulnerability.  The Register has a good write-up of the Foreshadow attack and its impact on SGX, Intel’s Trusted Execution Environment (TEE) capability.  What’s interesting, if the write-up is correct, is that what seems like a small break to a very specific part of the entire security chain means that you suddenly can’t trust anything.  The trust chain is broken, and you have to distrust everything you think you know.  This is a classic security problem – trust is a very tricky set of concepts – and one of the nasty things about it is that it may be entirely invisible to the user that an attack has taken place at all, particularly as the user, at this point, may have no visibility of the chain of trust that has been established – or not – up to the point that they are involved.  There’s a lot more to write about on this subject, but that’s for another day.  For now, if you’re planning to visit an airport, ensure that you have an app on your phone which will tell you your flight departure time and the correct gate.


1 – at time of writing, obviously.

2 – for non-native readers[3] , what I mean is that they are often closely related and should be considered together.

3 – and/or those unacquainted with my somewhat baroque language and phrasing habits[4].

4 – I prefer to double-dot when singing or playing Purcell, for instance[5].

5 – this is a very, very niche comment, for which slight apologies.

On holiday

Lots of people in the InfoSec world are at Black Hat and Def Con in Las Vegas this week, and there are more stories out there than you can shake a stick at.  I’m on holiday, and although it’s not as if I’m uninterested, I’ve decided to take the whole “not working” thing seriously, and I’m not going to blog about any of them this week.

My brush with GDPR

I ended up reporting a possible breach of data. My data.

Since the first appearance of GDPR[1], I’ve strenuously avoided any direct interaction with it if at all possible.  In particular, I’ve been careful to ensure that nobody is under any illusion that my role involves any responsibility for our company’s implementation of GDPR.  In this I have been largely successful.  I say largely, because the people who send spam don’t seem to have noticed[2]: I suspect that anybody with the word “security” in their title has had a similar experience.

Of course, I have a decent idea what GDPR is supposed to be about: making sure that data that organisations hold about people is only used as it should be, is kept up to date, and that people can find out exactly what information relating to them is held[3].

This week, I got more involved with GDPR than I’d expected: I ended up reporting a possible breach of data.  My data.  As it happens, my experience with the process was pretty good: so good, in fact, that I think it’s worth giving it as an example.

Breach!

A bit of scene-setting.  I live in the UK, which is (currently[4]) in the EU, which means that pretty much all companies and organisations here, including local government, are subject to the GDPR.  Last week, I had occasion to email a department within local government about an issue around services in my area.  Their website had suggested that they’d get back to me within 21 days or so, so I was slightly (and pleasantly) surprised when they replied within 5.

The email started so well: the title referred to the village in which I live.

It went downhill from there.  “Dear Mr Benedict[5]”, it ran.  I should be clear that I had used my actual name (which is not Benedict) for the purposes of this enquiry, so this was something of a surprise.  “Oh, well,” I thought to myself, “they’ve failed to mail merge the name field properly.” I read on.  “Here is the information you have requested about Ambridge…[5][6]”.  I do not live in Ambridge.  So far, this was just annoying: clearly the department had responded to the wrong query.  But it got worse.  “In particular regards to your residence, Willow Farm, Ambridge…[5]”

The department had sent me information which allowed me to identify Mr Benedict and his place of residence.  They had also failed to send me the information that I had requested.  What worried me more was that this might well not be an isolated event.  There was every chance that my name and address details had been sent to somebody else, and even that there was a cascade effect of private details being sent to email address after email address.  I mentioned this in annoyance to my wife – and she was the one to point out that it was a likely breach of GDPR.  “You should report it,” she said.

So I went to the local government office website and had a quick look around it.  Nothing obvious for reporting GDPR breaches.  I phoned the main number and got through to enquiries.  “I’d like to report a possible data breach, please,” I said.  “Could you put me through to whoever covers GDPR?”

To be honest, this was where I thought it would all go wrong.  It didn’t.  The person on the enquiries desk asked for more information.  I explained what department it was, about the email, and the fact that somebody else’s details had been exposed to me, I strongly suspected in breach of GDPR.

“Let me just see if I can find someone in our data team,” she said, and put me on hold.

I don’t know if you’ve ever been put on hold by someone in a local government office, but it’s rarely an event that should be greeted with rejoicing.  I prepared myself for a long wait, and was surprised when I was put through to someone fairly quickly.

The man to whom I spoke knew what he was doing.  In fact, he did an excellent job.  He took my details, he took details of the possible breach, he reassured me that this would be investigated.  He was polite, and seemed keen to get to the bottom of the affair.  He also immediately grasped what the problem was, and agreed that it needed to be investigated.  I’m not sure whether I was the first person ever to call up and so this was an adrenaline-fuelled roller coaster ride into uncharted territory[8] for him, or whether this was a routine conversation in the office, but he pitched his questions and responses at exactly the right level.  I offered to forward the relevant email to him so that he had the data himself.  He accepted.  The last point was the one that impressed me the most.  “Could you please delete the email from your system?” he asked.  This was absolutely the right request.  I agreed and did so.

And how slowly do the wheels of local government grind?  How long would it take me to get a response to my query?

I received a response the next day, from the department concerned.  They assured me that this was a one-off problem, that my personal data had not been compromised, and that there had not been a widespread breach, as I had feared.  They even sent me the information I had initially requested.

My conclusions from this?  They’re two-fold:

  1. From an individual’s point of view: yes, it is worth reporting breaches.  Action can, and should, be taken.  If you don’t get a good response, you may need to escalate, but you have rights, and organisations have responsibilities: exercise those rights, and hold the organisations to account.  You may be pleasantly surprised by the outcome, as I was.
  2. From an organisation’s point of view: make sure that people within your organisation know what to do if someone contacts them about a data breach, whether it’s covered by statutory regulations (like GDPR) or not.  This should include whoever answers your main enquiry line or receives messages to generic company email accounts, not just your IT or legal departments.  Educate everybody in the basics, and make sure that those tasked with dealing with issues are as well-trained and ready to respond as the people I encountered.

1 – General Data Protection Regulation.  Wouldn’t it be more fun if it were something like “Good Dogs Pee Regularly”?

2 – though GDPR does seem to have reduced the amount of spam, I think.

3 – if you’re looking for more information, I did write an article about this on Opensource.com earlier this year: Being open about data privacy.

4 – don’t start me.

5 – I’ve changed this information, for reasons which I hope are obvious.

6 – “dum-di-dum-di-dum-di-dum, dum-di-dum-di-da-da”[7]

7 – if you get this, then you either live in the UK (and probably listen to Radio 4), or you’re a serious Anglophile.

8 – hopefully not so uncharted for the designers and builders of the metaphorical roller coaster.

16 ways in which users are(n’t) like kittens

I’m going to exploit you all with an article about kittens and security.

It’s summer[1], it’s hot[2], nobody wants to work[3].  What we all want to do is look at pictures of cute kittens[5] and go “ahhh”.  So I’m going to exploit you all with an article about kittens and (vaguely about) security.  It’s light-hearted, it’s fluffy[6], and it has a picture of two of our cats at the top of it.  What’s not to like?

Warning: this article includes extreme footnoting, and may not be suitable for all readers[7].

Now, don’t get me wrong: I like users.  I realise the importance of users, really I do.  They are the reason we have jobs.  Unluckily, they’re often the reason we wish we didn’t have the jobs we do.  I’m surprised that nobody has previously bothered to compile a list comparing them with kittens[7.5], so I’ve done it for you.  For ease of reading, I’ve grouped ways in which users are like kittens towards the top of the table, and ways in which they’re unlike kittens towards the bottom[7.8].

Please enjoy this post, share it inappropriately on social media and feel free to suggest other ways in which kittens and users are similar or dissimilar.

Research findings

Hastily compiled table

Property | Users | Kittens
Capable of circumventing elaborate security measures | Yes | Yes
Take up all of your time | Yes | Yes
Do things they’re not supposed to | Yes | Yes
Forget all training instantly | Yes | Yes
Damage sensitive equipment | Yes | Yes
Can often be found on Facebook | Yes | Yes
Constantly need cleaning up after | Yes | Yes
Often seem quite stupid, but are capable of extreme cunning at inopportune moments | Yes | Yes
Can turn savage for no obvious reason | Yes | Yes
Can be difficult to tell apart[10] | Yes | Yes
Fluffy | No[8] | Yes
Fall asleep a lot | No[8] | Yes
Wake you up at night | No[9] | Yes
Like to have their tummy tickled | No[8] | Yes
Generally fun to be around | No[8] | Yes
Generally lovable | No[8] | Yes

1 – at time of writing, in the Northern Hemisphere, where I’m currently located.  Apologies to all those readers for whom it is not summer.

2 – see 1.

3 – actually, I don’t think this needs a disclaimer[4].

4 – sorry for wasting your time[4].

5 – for younger readers, “kittehs”.

6 – like the kittens.

7 – particularly those who object to footnotes.  You know who you are.

7.5 – actually, they may well have done, but I couldn’t be bothered to look[7.7]

7.7 – yes, I wrote the rest of the article first and then realised that I needed another footnote (now two), but couldn’t be bothered to renumber them all.  I’m lazy.

7.8 – you’re welcome[7.9].

7.9 – you know, this reminds me of programming BASIC in the old days, when it wasn’t easy to renumber your program, and you’d start out numbering in 10s, and then fill in the blanks and hope you didn’t need too many extra lines[7.95].

7.95 – b*gger.

8 – with some exceptions.

9 – unless you’re on support duty.  Then you can be pretty sure that they will.

10 – see picture.

11 – unused.

12 – intentionally left blank.

13 – unintentionally left blank.

Mitigate or remediate?

What’s the difference between mitigate and remediate?

I very, very nearly titled this article “The ‘aters gonna ‘ate”, and then thought better of it. This is a rare event, and I put it down to the extreme temperatures that we’re having here in the UK at the moment[1].

What prompted this article was reading something online today where I saw the word mitigate, and thought to myself, “When did I start using that word? It’s not a normal one to drop into conversation. And what about remediate? What’s the difference between mitigate and remediate? In fact, how well could I describe the difference between the two?” Both are quite jargon-y words, so, in the spirit of my recent article Jargon – a force for good or ill?, here’s my attempt at describing the difference, while also pointing out how important both are – along with a couple of other “-ate” words.

Let’s work backwards.

Remediate

Remediation is a set of actions to get things back the way they should be. It’s the last step in the process of recovery from an attack or other failure. The re- prefix here is the give-away: like resetting and reconciliation. When you’re remediating, there may be an expectation that you’ll be returning your systems to the same state they were in before the problem – a power failure, for example – but that’s not necessarily the case. What you should be focussing on is the service you’re providing, rather than the system(s) that are providing it. A set of steps for remediation might require you to replace your database with another one from a completely different provider[3], to add a load-balancer and to change all of your hardware, but you’re still remediating the problem that hit you. At the very least, if you’ve suffered an attack, you should make sure that you plug any hole that allowed the attacker in to start with[4].

Mitigate

Mitigation isn’t about making things better – it’s about reducing the impact of an attack or failure. Mitigation is the first set of steps you take when you realise that you’ve got a problem. You could argue that these things are connected, but mitigation isn’t about returning the service to normal (remediation), but about taking steps to reduce the impact of an attack. Those mitigations may be external – adding load-balancers to help deal with a DDoS attack, maybe – or internal – shutting down systems that have been actively compromised.

In fact, I’d argue that some mitigations might quite properly actually have an adverse effect on the service: there may be short term reductions in availability to ensure that long-term remediations can be performed. Planning for this is vitally important, as I’ve discussed previously, in Service degradation: actually a good thing.

The other -ates: update and operate

As I promised above, there are a couple of other words that are part of your response to an attack – or possible attack. The first is update. Updating is one of the key measures that you can put in place to reduce the chance of a successful attack. It’s the step you take before mitigation – because, if you’re lucky, you won’t need mitigation, because you’ll be immune from attack[5].

The second of these is operate. Operation is your normal state: it’s where you want to be. But operation doesn’t mean that you can just sit back and feel secure: you need to be keeping an eye on what’s going on, planning upgrades, considering mitigations and preparing for remediations. We too often think of the operate step as our Happy Place, where all is rosy, and we can sit back and watch the daisies grow. As DevOps (and DevSecOps) is teaching us, this is absolutely not the case: operate is very much a dynamic state, and not a passive one.


1 – 30C (~86F) is hot for the UK. Oh, and most houses (ours included) don’t have air conditioning[2].

2 – and I work from an office in the garden. Direct sunlight through the mainly glass wall is a blessing in the winter. Less so in the summer.

3 – hopefully open source, of course.

4 – hint: patch, patch, patch.

5 – well, that attack, at least. You’re never immune from all attacks: sorry.

Who do you trust: your data, or your enemy’s?

Our security logs define our organisational memory.

Imagine, just imagine, that you’re head of an organisation, and you suspect that there’s been some sort of attack on your systems[1].  You don’t know for sure, but a bunch of your cleverest folks are pretty sure that there are sufficient signs to merit an investigation.  So, you agree to allow that investigation, and it’s pretty clear from the outset – and becomes yet more clear as the investigation unfolds – that there was an attack on your organisation.  And as this investigation draws close to its completion, you happen to meet the head of another organisation, which happens to be not only your competitor, but also the party that your investigators are certain was behind the attack.  That person – the leader of your competitor – tells you that they absolutely didn’t perform an attack: no, sirree.  Who do you believe?  Your people, who have been laboring[2] away for months, or your competitor?

What a ridiculous question: of course you’d believe your own people over your competitor, right?

So, having set up such an absurd scenario[4], let’s look at a scenario which is actually much more likely.  Your systems have been acting strangely, and there seems to be something going on.  Based on the available information, you believe that you’ve been attacked, but you’re not sure.  Your experts think it’s pretty likely, so you approve an investigation.  And then one of your investigatory team comes to you to tell you some really bad news about the data.  “What’s the problem?” you ask.  “Is there no data?”

“No,” they reply, “it’s worse than that.  We’ve got loads of data, but we don’t know which is real.”

Our logs are our memory

There is a literary trope – one of my favourite examples is Margery Allingham’s The Traitor’s Purse – where a character realises that he or she is somebody other than who they think they are when they start to question their memories.  We are defined by our memories, and the same goes for organisational security.  Our security logs define our organisational memory.  If you cannot prove that the data you are looking at is[5] correct, then you cannot be sure what led to the state you are in now.

Let’s give a quick example.  In my organisation, I am careful to log every time I upgrade a piece of software, but I begin to wonder whether a particular application is behaving as expected.  Maybe I see some traffic to an external IP address which I’ve never seen before, for instance.  Well, the first thing I’m going to do is to check the logs to see whether somebody has updated the software with a malicious version.  Assuming that somebody has installed a malicious version, there are three likely outcomes at this point:

  1. you check the logs, and it’s clear that a non-authorised version was installed.  This isn’t good, because you now know that somebody has unauthorised access to your system, but at least you can do something about it.
  2. at least some of the logs are missing.  This is less good, because you really can’t be sure what’s gone on, but you now have a pretty strong suspicion that somebody with unauthorised access has deleted some logs to cover up their tracks, which means that you have a starting point for remediation.
  3. there’s nothing wrong.  All the logs look fine.  You’re now really concerned, as you can’t be sure of your own data – your organisation’s memories.  You may be looking at correct data, or you may be looking at incorrect data: data which has been written by your enemy.  Attackers can – and do – go into log files and change the data in them to obscure the fact that they have performed particular actions.  It’s actually one of the first steps that a competent attacker will perform.

In the scenario as defined, things probably aren’t too bad: you can check the checksum or hash of the installed software, and check that it’s the version you expect.  But what if the attacker has also changed the version of your checksum- or hash-checker so that, for packages associated with this particular application, it always returns what you expect to see?  This is not a theoretical attack, and nor is it the only approach that attackers have to try to muddy the waters as to what they’ve done.  And if it has happened, then you have no way of confirming that you have the correct version of the application.  You can try updating the checksum- or hash-checker, but what if the attacker has meddled with the software installer to ensure that it always installs their version…?
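
For what it’s worth, the basic check described above is simple enough to sketch.  Here’s one way of doing it in Python, comparing the SHA-256 digest of an installed file against a known-good value – the path and the expected digest are placeholders, not real values.  The caveat from the paragraph above still applies: if the attacker controls the host on which this runs, they may control the checker too, so in practice you’d want to run it from trusted media and take the reference value from somewhere the attacker can’t reach.

```python
# A minimal integrity-check sketch: compare the SHA-256 digest of an installed
# file against a known-good value. The path and expected digest are placeholders.
# Remember the caveat above: if the attacker owns this host, they may own this
# checker too, so run it from trusted media against reference values held
# somewhere the attacker can't reach.

import hashlib
import sys

TARGET_PATH = "/usr/local/bin/suspect-application"   # hypothetical installed binary
EXPECTED_SHA256 = "0" * 64                           # replace with the published digest


def sha256_of(path: str) -> str:
    """Hash the file in chunks so that large binaries don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    actual = sha256_of(TARGET_PATH)
    if actual == EXPECTED_SHA256:
        print("digest matches the expected value")
        sys.exit(0)
    print(f"MISMATCH: expected {EXPECTED_SHA256}, got {actual}")
    sys.exit(1)
```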

It’s a slippery slope, and bar wiping the entire system and reinstalling[6], there’s little you can do to be certain that you’ve cleared things up properly.  And in some cases, it may be too late: what if the attacker’s main aim was to exfiltrate some of your proprietary data, for example?

Lessons to learn, actions to take

Well, the key lesson is that your logs are important, and they need to be protected.  If you can’t trust your logs, then it can be very, very difficult not only to identify the extent of an attack, but also to remediate it or, in the worst case, even to know that you’ve been attacked at all.

And what can you do?  Well, there are many techniques that you can employ, and the best combination will depend on a number of questions, including your regulatory regime, your security posture, and what attackers you decide to defend against.  I’d start with a fairly simple combination:

  • move your most important logs off-system.  Where possible, host logs on different systems to the ones that are doing the reporting.  If you’re an attacker, it’s more difficult to jump from one system to another than it is to make changes to logs on a system which you’ve already compromised – there’s a small sketch of this after the list;
  • make your logs write-only.  This sounds crazy – how are you supposed to check logs if they can’t be read?  What this really means is that you separate read and write privileges on your logs so that only those with a need to read them can do so.  Writing is less worrisome – though there are attacks here, including filesystem starvation – because if you can’t see what you need to change, then it’s almost impossible to do so.  If you’re an attacker, you might be able to wipe some logs – see our case 2 above – but obscuring what you’ve actually done is more difficult.
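
As a rough illustration of the first of these, here’s a sketch in Python that copies security-relevant log records to a separate log host the moment they’re emitted, using the standard library’s syslog handler.  The hostname is a placeholder, and a real deployment would use TLS or a dedicated log-shipping agent (and proper access controls on the collector) rather than plain UDP syslog, but the shape is the important bit: the record leaves the box as soon as it’s written, so an attacker who compromises the reporting system later can’t quietly rewrite its history.

```python
# A minimal off-system logging sketch: every security-relevant event is copied
# to a separate log host as soon as it is emitted, as well as being written
# locally. "loghost.example.com" is a placeholder; a real deployment would use
# TLS or a dedicated log-shipping agent rather than plain UDP syslog.

import logging
import logging.handlers

security_log = logging.getLogger("security")
security_log.setLevel(logging.INFO)

# The remote copy is the one you treat as your organisational memory.
remote = logging.handlers.SysLogHandler(address=("loghost.example.com", 514))
remote.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
security_log.addHandler(remote)

# A local copy is still handy for day-to-day debugging, but it is the copy an
# attacker on this box can tamper with, so it is not the one you rely on.
local = logging.FileHandler("security.log")
local.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
security_log.addHandler(local)

# Example of the kind of event worth recording: the software upgrades
# mentioned earlier.
security_log.info("package upgraded: name=%s version=%s", "suspect-application", "1.2.3")
```

The second suggestion then becomes largely a question of who can read and write the collector’s storage, rather than something the reporting system itself has to get right.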

Exactly what steps you take are up to you, but remember: if you can’t trust your logs, you can’t trust your data, and if you can’t trust your data, you don’t know what has happened.  That’s what your enemy wants: confusion.


1 – like that’s ever going to happen.

2 – on this, very rare, occasion, I’m going to countenance[3] a US spelling.  I think you can guess why.

3 – contenance?

4 – I know, I know.

5 – are?

6 – you checked the firmware, right?  Hmm – maybe safer just to buy completely new hardware.