The Curious Incident of the Patch in the Night-Time

Gregory: “The patch did nothing in the night-time.”
Holmes: “That was the curious incident.”

To misquote Sir Arthur Conan-Doyle:

Gregory (cyber-security auditor) “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the patch in the night-time.”
Gregory: “The patch did nothing in the night-time.”
Holmes: “That was the curious incident.”

I considered a variety of (munged) literary titles to head up this blog, and settled on the one above or “We Need to Talk about Patching”.  Either way round, there’s something rotten in the state of patching*.

Let me start with what I hope is a fairly uncontroversial statement: “we all know that patches are important for security and stability, and that we should really take them as soon as they’re available and patch all of our systems”.

I don’t know about you, but I suspect you’re the same as me: I run ‘sudo dnf –refresh upgrade’** on my home machines and work laptop at least once every day that I turn them on.  I nearly wrote that when an update comes out to patch my phone, I take it pretty much immediately, but actually, I’ve been burned before with dodgy patches, and I’ll often have a check of the patch number to see if anyone has spotted any problems with it before downloading it. This feels like basic due diligence, particularly as I don’t have a “staging phone” which I could use to test pre-production and see if my “production phone” is likely to be impacted***.

But the overwhelming evidence from the industry is that people really don’t apply patches – including security patches – even though they understand that they ought to.  I plan to post another blog entry at some point about similarities – and differences – between patching and vaccinations, but let’s take as read, for now, the assumption that organisations know they should patch, and look at the reasons they don’t, and what we might do to improve that.

Why people don’t patch

Here are the legitimate reasons that I can think of for organisations not patching****.

  1. they don’t know about patches
    • not all patches are advertised well enough
    • organisations don’t check for patches
  2. they don’t know about their systems
    • incomplete knowledge of their IT estate
  3. legacy hardware
    • patches not compatible with legacy hardware
  4. legacy software
    • patches not  compatible with legacy software
  5. known impact with up-to-date hardware & software
  6. possible impact with up-to-date hardware & software

Some of these are down to the organisations, or their operating environment, clearly: 1b, 2, 3 and 4.  The others, however, are down to us as an industry.  What it comes down to is a balance of risk: the IT operations department doesn’t dare to update software with patches because they know that if the systems that they maintain go down, they’re in real trouble.  Sometimes they know there will be a problem (typically because they test patches in a staging environment of some type), and sometimes because they just don’t dare.  This may be because they are in the middle of their own software update process, and the combination of Operating System, middleware or integrated software updates with their ongoing changes just can’t be trusted.

What we can do

Here are some thoughts about what we as an industry can do to try to address this problem – or set of problems.

Staging

Staging – what is a staging environment for?  It’s for testing changes before they go into production, of course.  But what changes?  Changes to your software, or your suppliers’ software?  The answer has to be “both”, I think.  You may need separate estates so that you can look at changes of these two sets of software separately before seeing what combining them does, but in the end, it is the combination of the two that matters.  You may consider using the same estate at different times to test the different options, but that’s not an option for all organisations.

DevOps

DevOps shouldn’t just be about allowing agile development practices to become part of the software lifecycle: it should also be about allowing agile operational practices become a part of the software lifecycle.  DevOps can really help with patching strategy if you think of it this way.  Remember, in DevOps, everybody has responsibility.  So your DevOps pipeline the perfect way to test how changes in your software are affected by changes in the underlying estate.  And because you’re updating regularly, and have unit tests to check all the key functionality*****, any changes can be spotted and addressed quickly.

Dependencies

Patches sometimes have dependencies.  We should be clear when a patch requires other changes, resulting a large patchset, and when a large patchset just happens to be released because multiple patches are available.  Some dependencies may be outside the control of the vendor.  This is easier to test when your patch has dependencies on an underlying Operating System, for instance, but more difficult if the dependency is on the opposite direction.  If you’re the one providing the underlying update and the customer is using software that you don’t explicitly test, then it’s incumbent on you, I’d argue, to use some of the other techniques that I’ve outlined to help your customers understand likely impact.

Visibility of likely impact

One obvious option available to those providing patches is a good description of areas of impact.  You’d hope that everyone did this already, of course, but a brief line something like “this update is for the storage subsystem, and should affect only those systems using EXT3”, for instance, is a great help in deciding the likely impact of a patch.  You can’t always get it right – there may always be unexpected consequences, and vendors can’t test for all configurations.  But they should at least test all supported configurations…

Risk statements

This is tricky, and maybe political, but is it time that we started giving those customers who need it a little more detail about the likely impact of the changes within a patch?  It’s difficult to quantify, of course: a one-character change may affect 95% of the flows through a module, whereas what may seem like a simple functional addition to a customer may actually require thousands of lines of code.  But as vendors, we should have an idea of the impact of a change, and we ought to be considering how we expose that to customers.

Combinations

Beyond that, however, I think there are opportunities for customers to understand what the impact of not having accepted a previous patch is.  Maybe the risk of accepting patch A is low, but the risk of not accepting patch A and patch B is much higher.  Maybe it’s safer to accept patch A and patch C, but wait for a successor to patch B.  I’m not sure quite how to quantify this, or how it might work, but I think there’s grounds for research******.

Conclusion

Businesses have every right not to patch.  There are business reasons to balance the risk of patching against not patching.  But the balance is currently often tipped too far in direction of not patching.  Much too far.  And if we’re going to improve the state of IT security, we, the industry, need to do something about it.  By helping organisations with better information, by encouraging them to adopt better practices, by training them in how to assess risk, and by adopting better practices ourselves.

 


*see what I did there?

**your commands my vary.

***this almost sounds like a very good excuse for a second phone, though I’m not sure that my wife would agree.

****I’d certainly be interested to hear of others: please let me know via comments.

*****you do have these two things, right?  Because if you don’t, you’re really not doing DevOps.  Sorry.

******as soon as I wrote this, I realised that somebody’s bound to have done research on this issue.  Please let me know if you have: or know somebody who has.

 

“Zero-trust”: my love/hate relationship

… “explicit-trust networks” really is a much better way of describing what’s going on here.

A few weeks ago, I wrote a post called “What is trust?”, about how we need to be more precise about what we mean when we talk about trust in IT security.  I’m sure it’s case of confirmation bias*, but since then I’ve been noticing more and more references to “zero-trust networks”.  This both gladdens and annoys me, a set of conflicting emotions almost guaranteed to produce a new blog post.

Let’s start with the good things about the term.  “Zero-trust networks” are an attempt to describe an architectural approach which address the disappearance of macro-perimeters within the network.  In other words, people have realised that putting up a firewall or two between one network and another doesn’t have a huge amount of effect when traffic flows across an organisation – or between different organisations – are very complex and don’t just follow one or two easily defined – and easily defended – routes.  This problem is exacerbated when the routes are not only multiple – but also virtual.  I’m aware that all network traffic is virtual, of course, but in the old days**, even if you had multiple routing rules, ingress and egress of traffic all took place through a single physical box, which meant that this was a good place to put controls***.

These days (mythical as they were) have gone.  Not only do we have SDN (Software-Defined Networking) moving packets around via different routes willy-nilly, but networks are overwhelmingly porous.  Think about your “internal network”, and tell me that you don’t have desktops, laptops and mobile phones connected to it which have multiple links to other networks which don’t go through your corporate firewall.  Even if they don’t******, when they leave your network and go home for the night, those laptops and mobile phones – and those USB drives that were connected to the desktop machines – are free to roam the hinterlands of the Internet******* and connect to pretty much any system they want.

And it’s not just end-point devices, but components of the infrastructure which are much more likely to have – and need – multiple connections to different other components, some of which may be on your network, and some of which may not.  To confuse matters yet further, consider the “Rise of the Cloud”, which means that some of these components may start on “your” network, but may migrate – possibly in real time – to a completely different network.  The rise of micro-services (see my recent post describing the basics of containers) further exacerbates the problem, as placement of components seems to become irrelevant, so you have an ever-growing (and, if you’re not careful, exponentially-growing) number of flows around the various components which comprise your application infrastructure.

What the idea of “zero-trust networks” says about this – and rightly – is that a classical, perimeter-based firewall approach becomes pretty much irrelevant in this context.  There are so many flows, in so many directions, between so many components, which are so fluid, that there’s no way that you can place firewalls between all of them.  Instead, it says, each component should be responsible for controlling the data that flows in and out of itself, and should that it has no trust for any other component with which it may be communicating.

I have no problem with the starting point for this – which is as far as some vendors and architects take it: all users should always be authenticated to any system, and authorised before they access any service provided by that system. In fact, I’m even more in favour of extending this principle to components on the network: it absolutely makes sense that a component should control access its services with API controls.  This way, we can build distributed systems made of micro-services or similar components which can be managed in ways which protect the data and services that they provide.

And there’s where the problem arises.  Two words: “be managed”.

In order to make this work, there needs to be one or more policy-dictating components (let’s call them policy engines) from which other components can derive their policy for enforcing controls.  The client components must have a level of trust in these policy engines so that they can decide what level of trust they should have in the other components with which they communicate.

This exposes a concomitant issue: these components are not, in fact, in charge of making the decisions about who they trust – which is how “zero-trust networks” are often defined.  They may be in charge of enforcing these decisions, but not the policy with regards to the enforcement.  It’s like a series of military camps: sentries may control who enters and exits (enforcement), but those sentries apply orders that they’ve been given (policies) in order to make those decisions.

Here, then, is what I don’t like about “zero-trust networks” in a few nutshells:

  1. although components may start from a position of little trust in other components, that moves to a position of known trust rather than maintaining a level of “zero-trust”
  2. components do not decide what other components to trust – they enforce policies that they have been given
  3. components absolutely do have to trust some other components – the policy engines – or there’s no way to bootstrap the system, nor to enforce policies.

I know it’s not so snappy, but “explicit-trust networks” really is a much better way of describing what’s going on here.  What I do prefer about this description is it’s a great starting point to think about trust domains.  I love trust domains, because they allow you to talk about how to describe shared policy between various components, and that’s what you really want to do in the sort of architecture that’s I’ve talked about above.  Trust domains allow you to talk about issues such as how placement of components is often not irrelevant, about how you bootstrap your distributed systems, about how components are not, in the end, responsible for making decisions about how much they trust other components, or what they trust those other components to do.

So, it looks like I’m going to have to sit down soon and actually write about trust domains.  I’ll keep you posted.


*one of my favourite cognitive failures

**the mythical days that my children believe in, where people have bouffant hairdos, the Internet could fit on a single Winchester disk, and Linux Torvalds still lived in Finland.

***of course, there was no such perfect time – all I should need to say to convince you is one word: “Joshua”****

****yes, this is another filmic***** reference.

*****why, oh why doesn’t my spell-checker recognise this word?

******or you think they don’t – they do.

*******and the “Dark Web”: ooooOOOOoooo.

That Backdoor Fallacy revisited – delving a bit deeper

…if it breaks just once that becomes always,..

A few weeks ago, I wrote a post called The Backdoor Fallacy: explaining it slowly for governments.  I wish that it hadn’t been so popular.  Not that I don’t like the page views – I do – but because it seems that it was very timely, and this issue isn’t going away.  The German government is making the same sort of noises that the British government* was making when I wrote that post**.  In other words, they’re talking about forcing backdoors in encryption.  There was also an amusing/worrying story from slashdot which alleges that “US intelligence agencies” attempted to bribe the developers of Telegram to weaken the encryption in their app.

Given some of the recent press on this, and some conversations I’ve had with colleagues, I thought it was worth delving a little deeper***.  There seem to be three sets of use cases that it’s worth addressing, and I’m going to call them TSPs, CSPs and Other.  I’d also like to make it clear here that I’m talking about “above the board” access to encrypted messages: access that has been condoned by the relevant local legal system.  Not, in other words, the case of the “spooks”.  What they get up to is for another blog post entirely****.  So, let’s look at our three cases.

TSPs – telecommunications service providers

In order to get permission to run a telecommunications service(wired or wireless) in most (all?) jurisdictions, you need to get approval from the local regulator: a licence.  This licence is likely to include lots of requirements: a typical one is that you, the telco (telecoms company) must provide access at all times to emergency numbers (999, 911, 112, etc.).  And another is likely to be that, when local law enforcement come knocking with a legal warrant, you must give them access to data and call information so that they can basically do wire-taps.  There are well-established ways to do this, and fairly standard legal frameworks within which it happens: basically, if a call or data stream is happening on a telco’s network, they must provide access to it to legal authorities.  I don’t see an enormous change to this provision in what we’re talking about.

CSPs – cloud service providers

Things get a little more tricky where cloud service providers are concerned.  Now, I’m being rather broad with my definition, and I’m going to lump your Amazons, Googles, Rackspaces and such in with folks like Facebook, Microsoft and other providers who could be said to be providing “OTT” (Over-The-Top – in that they provide services over the top of infrastructure that they don’t own) services.  Here things are a little greyer*****.  As many of these companies (some of who are telcos, how also have a business operating cloud services, just to muddy the waters further) are running messaging, email services and the like, governments are very keen to apply similar rules to them as those regulating the telcos. The CSPs aren’t keen, and the legal issues around jurisdiction, geography and what the services are complicate matter.  And companies have a duty to their shareholders, many of whom are of the opinion that keeping data private from government view is to be encouraged.  I’m not sure how this is going to pan out, to be honest, but I watch it with interest.  It’s a legal battle that these folks need to fight, and I think it’s generally more about cryptographic key management – who controls the keys to decrypt customer information – than about backdoors in protocols or applications.

Other

And so we come to other.  This bucket includes everything else.  And sadly, our friends the governments want their hands on all of that everything else.    Here’s a little list of some of that everything else.  Just a subset.  See if you can see anything on the list that you don’t think there should be unfettered access to (and remember my previous post about how once access is granted, it’s basically game over, as I don’t believe that backdoors end up staying secret only to “approved” parties…):

  • the messages you send via apps on your phone, or tablet, or laptop or PC;
  • what you buy on Amazon;
  • your banking records – whether on your phone or at the bank;
  • your emails via your company VPN;
  • the stored texts on your phone when you enquired about the woman’s shelter
  • your emails to your doctor;
  • your health records – whether stored at your insurers, your hospital or your doctor’s surgery;
  • your browser records about emergency contraception services;
  • access to your video doorbell;
  • access to your home wifi network;
  • your neighbour’s child’s chat message to the ChildLine (a charity for abused children in the UK – similar exist elsewhere)
  • the woman’s shelter’s records;
  • the rape crisis charity’s records;
  • your mortgage details.

This is a short list.  I’ve chosen emotive issues, of course I have, but they’re all legal.  They don’t even include issues like extra-marital affairs or access to legal pornography or organising dissent against oppressive regimes, all of which might well edge into any list that many people might copmile.  But remember – if a backdoor is put into encryption, or applications, then these sorts of information will start leaking.  And they will leak to people you don’t want to have them.

Our lives revolve around the Internet and the services that run on top of it.  We have expectations of privacy.  Governments have an expectation that they can breach that privacy when occasion demands.  And I don’t dispute that such an expectation is valid.  The problem that this is not the way to do it, because of that phrase “when occasion demands”.  If the occasion breaks just once, then that becomes always, and not just to “friendly” governments.  To unfriendly governments, to criminals, to abusive partners and abusive adults and bad, bad people.  This is not a fight for us to lose.


*I’m giving the UK the benefit of the doubt here: as I write, it’s unclear whether we really have a government, and if we do, for how long it’ll last, but let’s just with it for now.

**to be fair, we did have a government then.

***and not just because I like the word “delving”.  Del-ving.  Lovely.

****one which I probably won’t be writing if I know what’s good for me.

*****I’m a Brit, so I use British spelling: get over it.

Helping our governments – differently

… we may live in a new security and terrorism landscape

Two weeks ago, I didn’t write a full post, because the Manchester arena bombing was too raw.  We are only a few days on from the London Bridge attack, and I could make the same decision, but think it’s time to recognise that we have a new reality that we need to face in Britain: that we may live in a new security and terrorism landscape.  The sorts of attacks – atrocities – that have been perpetrated over the past few weeks (and the police and security services say that despite three succeeding, they’ve foiled another five) are likely to keep happening.

And they’re difficult to predict, which means that they’re difficult to stop.  There are already renewed calls for tech companies* to provide tools to allow the Good Guys[tm**] to read the correspondence of the people who are going to commit terrorist acts.  The problem is that the preferred approach requested/demanded by governments seems to be backdoors in encryption and/or communications software, which just doesn’t work – see my post The Backdoor Fallacy – explaining it slowly for governments.  I understand that “reasonable people” believe that this is a solution, but it really isn’t, for all sorts of reasons, most of which aren’t really that technical at all.

So what can we do?  Three things spring to mind, and before I go into them, I’d like to make something clear, and it’s that I have a huge amount of respect for the men and women who make up our security services and intelligence community.  All those who I’ve met have a strong desire to perform their job to the best of their ability, and to help protect us from people and threats which could damage us, our property, and our way of life.  Many of these people and threats we know nothing about, and neither do we need to.  The job that the people in the security services do is vital, and I really don’t see any conspiracy to harm us or take huge amounts of power because it’s there for the taking.  I’m all for helping them, but not at the expense of the rights and freedoms that we hold dear.  So back to the question of what we can do.  And by “we” I mean the nebulous Security Community****.  Please treat these people with respect, and be aware they they work very, very hard, and often in difficult and stressful jobs*****.

The first is to be more aware of our environment.  We’re encouraged to do this in our daily lives (“Report unaccompanied luggage”…), but what more could we do in our professional lives?  Or what could we do in our daily lives by applying our professional capabilities and expertise to everyday activities?  What suspicious activities – from traffic on networks from unexpected place to new malware – might be a precursor to something else?  I’m not saying that we’re likely to spot the next terrorism attack – though we might – but helping to combat other crime more effectively both reduces the attack surface for terrorists and increases the available resourcing for counter-intelligence.

Second: there are, I’m sure, many techniques that are available to the intelligence community that we don’t know about.  But there is a great deal of innovation within enterprise, health and telco (to choose three sectors that I happen to know quite well******) that could well benefit our security services.  Maybe your new network analysis tool, intrusion detector, data aggregator has some clever smarts in it, or creates information which might be of interest to the security community.  I think we need to be more open to the idea of sharing these projects, products and skills – proactively.

The third is information sharing.  I work for Red Hat, an Open Source company which also tries to foster open thinking and open management styles.  We’re used to sharing, and industry, in general, is getting better about sharing information with other organisations, government and the security services.  We need to get better at sharing both active data from systems which are running as designed and bad data from systems that are failing, under attack or compromised.  Open, I firmly believe, should be our default state*******.

If we get better at sharing information and expertise which can help the intelligence services in ways which don’t impinge negatively on our existing freedoms, maybe we can reduce the calls for laws that will do so.  And maybe we can help stop more injuries, maimings and deaths.  Stand tall, stand proud.  We will win.


*who isn’t a tech company, these days, though?  If you sell home-made birthday cards on Etsy, or send invoices via email, are you a tech company?  Who knows.

**this an ironic tm***

***not that I don’t think that there are good guys – and gals – but just that it’s difficult to define them.  Read on: you’ll see.

****I’ve talked about this before – some day I’ll define it.

*****and most likely for less money than most of the rest of us.

******feel free to add or substitute your own.

*******OK, DROP for firewall and authorisation rules, but you get my point.

“What is trust?”

I trust my brother and my sister with my life.

Academic discussions about trust abound*.  Particularly in the political and philosophical spheres, the issue of how people trust in institutions, and when and where they don’t, is an important topic of discussion, particularly in the current political climate.  Trust is also a concept which is very important within security, however, and not always well-defined or understood.  It’s central,to my understanding of what security means, and how I discuss it, so I’m going to spend this post trying to explain what I mean by “trust”.

Here’s my definition of trust, and three corollaries.

  • “Trust is the assurance that one entity holds that another will perform particular actions according to a specific expectation.”
  • My first corollary**: “Trust is always contextual.”
  • My second corollary:” One of the contexts for trust is always time”.
  • My third corollary: “Trust relationships are not symmetrical.”

Why do we need this set of definitions?  Surely we all know what trust is?

The problem is that whilst humans are very good at establishing trust with other humans (and sometimes betraying it), we tend to do so in a very intuitive – and therefore imprecise – way.  “I trust my brother” is all very well as a statement, and may well be true, but such a statement is always made contextually, and that context is usually implicit.  Let me provide an example.

I trust my brother and my sister with my life.  This is literally true for me, and you’ll notice that I’ve already contextualised the statement already: “with my life”.  Let’s be a little more precise.  My brother is a doctor, and my sister a trained scuba diving professional.  I would trust my brother to provide me with emergency medical aid, and I would trust my sister to service my diving gear****.  But I wouldn’t trust my brother to service my diving gear, nor my sister to provide me with emergency medical aid.  In fact, I need to be even more explicit, because there are times which I would trust my sister in the context of emergency medical aid: I’m sure she’d be more than capable of performing CPR, for example.  On the other hand, my brother is a paediatrician, not a surgeon, so I’d not be very confident about allowing him to perform an appendectomy on me.

Let’s look at what we’ve addressed.  First, we dealt with my definition:

  • the entities are me and my siblings;
  • the actions ranged from performing an emergency appendectomy to servicing my scuba gear;
  • the expectation was actually fairly complex, even in this simple example: it turns out that trusting someone “with my life” can mean a variety of things from performing specific actions to remedy an emergency medical conditions to performing actions which, if neglected or incorrectly carried out, could cause death in the future.

We also addressed the first corollary:

  • the contexts included my having a cardiac arrest, requiring an appendectomy, and planning to go scuba diving.

Let’s add time – the second corollary:

  • my sister has not recently renewed her diving instructor training, so I might feel that I have less trust in her to service my diving gear than I might have done five years ago.

The third corollary is so obvious in human trust relationships that we often ignore it, but it’s very clear in our examples:

  • I’m neither a doctor nor a trained scuba diving instructor, so my brother and my sister trust me neither to provide emergency medical care nor to service their scuba gear.******

What does this mean to us in the world of IT security?  It means that we need to be a lot more precise about trust, because humans come to this arena with a great many assumptions.  When we talk about a “trusted platform”, what does that mean?  It must surely mean that the platform is trusted by an entity (the workload?) to perform particular actions (provide processing time and memory?) whilst meeting particular expectations (not inspecting program memory? maintaining the integrity of data?).  The context of what we mean for a “trusted platform” is likely to be very different between a mobile phone, a military installation and an IoT gateway.  And that trust may erode over time (are patches applied? is there a higher likelihood that an attacker my have compromised the platform a day, a month or a year after the workload was provisioned to it?).

We should also never simply say, following the third corollary, that “these entities trust each other”.  A web server and a browser may have established trust relationships, for example, but these are not symmetrical.  The browser has  probably established with sufficient assurance for the person operating it to give up credit card details that the web server represents the provider of particular products and services.  The web server has probably established that the browser currently has permission to access the account of the user operating it.

Of course, we don’t need to be so explicit every time we make such a statement.  We can explain these relationships in definitions of documents, but we must be careful to clarify what the entities, the expectations, the actions, the contexts and possible changes in context.  Without this, we risk making dangerous assumptions about how these entities operate and what breakdowns in trust mean and could entail.


*Which makes me thinks of rabbits.

**I’m hoping that we can all agree on these – otherwise we may need to agree on a corollary bypass.***

***I’m sorry.

****I’m a scuba diver, too.  At least in theory.*****

*****Bringing up children is expensive and time-consuming, it turns out.

******I am, however, a trained CFR, so I hope they’d trust me to perform CPR on them.

Service degradation: actually a good thing

…here’s the interesting distinction between the classic IT security mindset and that of “the business”: the business generally want things to keep running.

Well, not all the time, obviously*.  But bear with me: we spend most of our time ensuring that all of our systems are up and secure and working as expected, because that’s what we hope for, but there’s a real argument for not only finding out what happens when they don’t, and not just planning for when they don’t, but also planning for how they shouldn’t.  Let’s start by examining some techniques for how we might do that.

Part 1 – planning

There’s a story** that the oil company Shell, in the 1970’s, did some scenario planning that examined what were considered, at the time, very unlikely events, and which allowed it to react when OPEC’s strategy surprised most of the rest of the industry a few years later.  Sensitivity modelling is another technique that organisations use at the financial level to understand what impact various changes – in order fulfilment, currency exchange or interest rates, for instance – make to the various parts of their business.  Yet another is war gaming, which the military use to try to understand what will happen when failures occur: putting real people and their associated systems into situations and watching them react.  And Netflix are famous for taking this a step further in the context of the IT world and having a virtual Chaos Monkey (a set of processes and scripts) which they use to bring down parts of their systems in real time to allow them to understand how resilient they the wider system is.

So that gives us four approaches that are applicable, with various options for automation:

  1. scenario planning – trying to understand what impact large scale events might have on your systems;
  2. sensitivity planning – modelling the impact on your systems of specific changes to the operating environment;
  3. wargaming – putting your people and systems through simulated events to see what happens;
  4. real outages – testing your people and systems with actual events and failures.

Actually going out of your way to sabotage your own systems might seem like insane behaviour, but it’s actually a work of genius.  If you don’t plan for failure, what are you going to do when it happens?

So let’s say that you’ve adopted all of these practices****: what are you going to do with the information?  Well, there are some obvious things you can do, such as:

  • removing discovered weaknesses;
  • improving resilience;
  • getting rid of single points of failure;
  • ensuring that you have adequately trained staff;
  • making sure that your backups are protected, but available to authorised entities.

I won’t try to compile an exhaustive list, because there are loads books and articles and training courses about this sort of thing, but there’s another, maybe less obvious, course of action which I believe we must take, and that’s plan for managed degradation.

Part 2 – managed degradation

What do I mean by that?  Well, it’s simple.  We***** are trained and indoctrinated to take the view that if something fails, it must always “fail to safe” or “fail to secure”.  If something stops working right, it should stop working at all.

There’s value in this approach, of course there is, and we’re paid****** to ensure everything is secure, right?  Wrong.  We’re actually paid to help keep the business running, and here’s the interesting distinction between the classic IT security mindset and that of “the business”: the business generally want things to keep running.  Crazy, right?  “The business” want to keep making money and servicing customers even if things aren’t perfectly secure!  Don’t they know the risks?

And the answer to that question is “no”.  They don’t know the risks.  And that’s our real job: we need to explain the risks and the mitigations, and allow a balancing act to take place.  In fact, we’re always making those trade-offs and managing that balance – after all, the only truly secure computer is one with no network connection, no keyboard, no mouse and no power connection*******.  But most of the time, we don’t need to explain the decisions we make around risk: we just take them, following best industry practice, regulatory requirements and the rest.  Nor are the trade-offs usually so stark, because when failure strikes – whether through an attack, accident or misfortune – it’s often a pretty simple choice between maintaining a particular security posture and keeping the lights on.  So we need to think about and plan for some degradation, and realise that on occasion, we may need to adopt a different security posture to the perfect (or at least preferred) one in which we normally operate.

How would we do that?  Well, the approach I’m advocating is best described as “managed degradation”.  We allow our systems – including, where necessary our security systems – to degrade to a managed (and preferably planned) state, where we know that they’re not operating at peak efficiency, but where they are operating.  Key, however, is that we know the conditions under which they’re working, so we understand their operational parameters, and can explain and manage the risks associated with this new posture.  That posture may change, in response to ongoing events, and the systems and our responses to those events, so we need to plan ahead (using the techniques I discussed above) so that we can be flexible enough to provide real resiliency.

We need to find modes of operation which don’t expose the crown jewels******** of the business, but do allow key business operations to take place.  And those key business operations may not be the ones we expect – maybe it’s more important to be able to create new orders than to collect payments for them, for instance, at least in the short term.  So we need to discuss the options with the business, and respond to their needs.  This planning is not just security resiliency planning: it’s business resiliency planning.  We won’t be able to consider all the possible failures – though the techniques I outlined above will help us to identify many of them – but the more we plan for, the better we will be at reacting to the surprises.  And, possibly best of all, we’ll be talking to the business, informing them, learning from them, and even, maybe just a bit, helping them understand that the job we do does have some value after all.


*I’m assuming that we’re the Good Guys/Gals**.

**Maybe less story than MBA*** case study.

***There’s no shame in it.

****Well done, by the way.

*****The mythical security community again – see past posts.

******Hopefully…

*******Preferably at the bottom of a well, encased in concrete, with all storage already removed and destroyed.

********Probably not the actual Crown Jewels, unless you work at the Tower of London.

大勢がレビューしていても信用しないという仮説

大勢がレビューしているから、バグがないという考え方。これは神話です。

この記事は https://aliceevebob.com/2017/04/04/disbelieving-the-many-eyes-hypothesis/ を翻訳したものです。
コードを書くことは大変な作業です。
ソースコードを書くのは、もっと大変です。もっと、もっと大変なのです。

セキュリティの機能を実装するコードを書いているとわかりますが、このようなコードは詳細まで精査されたアーキテクチャとデザインに基づいています。

さらに世界中でレビューされるというプロセスを経た、パーフェクトで破られることがないスタンダードが反映されているかもしれません。
(「パーフェクトで破られることがない」まあ、本当かどうかは置いておいて、そう仮定しましょう)

が、このようにデザインとアーキテクチャが素晴らしくても、実際のソフトウェアに組み込むのはとても特殊なことです。

数学的にソフトウェアが正しいと証明されても、実現しようとしている機能が正しく実装できるソフトウェアを書くということは、科学と芸術の間にあるようなものです。(デザインとアーキテクチャに寄るとは言っても結局自分が何をしたいかに寄るのですが。)

ソフトウェアをデバッグしたり、ソフトウェアに踏み入ることでその正確さを推測しようとする人にとっては、当然のことです。ただ、このブログ記事のポイントではありません。

誰も(コードを5行以上書いたことがある人、Perlでだったら6行)実際にソフトウェアがこのプロセスでパーフェクトになるとは思わないでしょう。
しかしソフトウェアは出来るだけパーフェクトでバグがないよう作るべきだ、というのはわかってもらえると思います。
だからこそ、コードレビューはソフトウェア開発の中心なのです。

ラッキーなことに(少なくとも私的には)、私たちが日々使うコードのほとんどはオープンソースです。これは誰もがコード見ることができ、何千もの人にレビューしてもらうことができる、ということです。

ここで問題が生じます。
オープンソースがたくさんの人にレビューされるということは、全てのバグが解決されるという考え方があります。

これは神話です。とても危険な都市伝説です。

先ほどの考えには二面性があります。
一つ目は、「ビルドすれば、出来上がる」という間違った考え方です。
世界中に全てのウェブサイトがリストされていた昔、そこにあなたのウェブサイトを登録すれば皆さんに見てもらえると思った時代があったでしょう。
(私も登録しました。見てもらえました。これは奇跡です)

同じように、オープンソースプロジェクトは(おそらく)少なかったので、たくさんの人がコードレビューしてくれました。
このような時代は終わりました。ずっと昔に。

二つ目に、セキュリティの機能を、良い例として暗号化の基礎などですが、正しくレビューできる人はすごく少ないのです。

ベンダー特有のコードに問題が少ないという訳ではありません。むしろ、正反対です。
ベンダー特有のコードのデザインやアーキテクチャがレビューに対してオープンになっていない、ということではなく、コードを見てくれる人が少なくてヒエラルキーなプレッシャーと組織的な考え方に寄る危険性がとても大きいということなのです。

「ベンダーのコードは安全だ」というのは神話というより、フェイクニュースでしょうね。
多くの企業がセキュリティソフトを隠したい理由がよくわかります。
「知的財産の保護」と彼らがよく言うのは、実際リリースするのに安全ではないからなのではないか、と心配してしまいます。
私にとってセキュリティソフトとは、一貫して「オープンなソース」なのです。

では、何ができるでしょう。
セキュリティを気にする企業や組織はリソースを割いて機能を実装したりコードのチェックやレビューができるはずです。そして、そうする責任があると私は考えます。
その部分に私の勤めるRed Hatがコミットしているのです。

同時にオープンソースコミュニティはクリティカルなプロジェクトをサポートし、コードに含まれるレビューの数を増やそうとする方法を探しているのです。(例えば Linux FoundationCore Infrastructure Initiativeです)

さらに学術組織に、オープンソースソフトの大切さをハイライトすることはもちろん、セキュリティソフトの書き方とレビューという暗黒アートを
生徒に訓練させるよう推奨する必要があります。

改善できることはあるでしょう。事実、改善もしています。
「多くのレビュー(略)神話」はコードを改善できないという訳ではなく、むしろできるということを認識しなければいけません。ただ、もっとエキスパートのレビューが必要だということです。
元の記事:
https://aliceevebob.com/2017/04/04/disbelieving-the-many-eyes-hypothesis/
2017年4月4日 Mike Bursell

Disbelieving the many eyes hypothesis

There is a view that because Open Source Software is subject to review by many eyes, all the bugs will be ironed out of it. This is a myth.

大勢がレビューしていても信用しないという仮説

Writing code is hard.  Writing secure code is harder: much harder.  And before you get there, you need to think about design and architecture.  When you’re writing code to implement security functionality, it’s often based on architectures and designs which have been pored over and examined in detail.  They may even reflect standards which have gone through worldwide review processes and are generally considered perfect and unbreakable*.

However good those designs and architectures are, though, there’s something about putting things into actual software that’s, well, special.  With the exception of software proven to be mathematically correct**, being able to write software which accurately implements the functionality you’re trying to realise is somewhere between a science and an art.  This is no surprise to anyone who’s actually written any software, tried to debug software or divine software’s correctness by stepping through it.  It’s not the key point of this post either, however.

Nobody*** actually believes that the software that comes out of this process is going to be perfect, but everybody agrees that software should be made as close to perfect and bug-free as possible.  It is for this reason that code review is a core principle of software development.  And luckily – in my view, at least – much of the code that we use these days in our day-to-day lives is Open Source, which means that anybody can look at it, and it’s available for tens or hundreds of thousands of eyes to review.

And herein lies the problem.  There is a view that because Open Source Software is subject to review by many eyes, all the bugs will be ironed out of it.  This is a myth.  A dangerous myth.  The problems with this view are at least twofold.  The first is the “if you build it, they will come” fallacy.  I remember when there was a list of all the websites in the world, and if you added your website to that list, people would visit it****.  In the same way, the number of Open Source projects was (maybe) once so small that there was a good chance that people might look at and review your code.  Those days are past – long past.  Second, for many areas of security functionality – crypto primitives implementation is a good example – the number of suitably qualified eyes is low.

Don’t think that I am in any way suggesting that the problem is any lesser in proprietary code: quite the opposite.  Not only are the designs and architectures in proprietary software often hidden from review, but you have fewer eyes available to look at the code, and the dangers of hierarchical pressure and groupthink are dramatically increased.  “Proprietary code is more secure” is less myth, more fake news.  I completely understand why companies like to keep their security software secret – and I’m afraid that the “it’s to protect our intellectual property” line is too often a platitude they tell themselves, when really, it’s just unsafe to release it.  So for me, it’s Open Source all the way when we’re looking at security software.

So, what can we do?  Well, companies and other organisations that care about security functionality can – and have, I believe a responsibility to – expend resources on checking and reviewing the code that implements that functionality.  That is part of what Red Hat, the organisation for whom I work, is committed to doing.  Alongside that, we, the Open Source community, can – and are – finding ways to support critical projects and improve the amount of review that goes into that code*****.  And we should encourage academic organisations to train students in the black art of security software writing and review, not to mention highlighting the importance of Open Source Software.

We can do better – and we are doing better.  Because what we need to realise is that the reason the “many eyes hypothesis” is a myth is not that many eyes won’t improve code – they will – but that we don’t have enough expert eyes looking.  Yet.


* Yeah, really: “perfect and unbreakable”.  Let’s just pretend that’s true for the purposes of this discussion.

** …and which still relies on the design and architecture actually to do what you want – or think you want – of course, so good luck.

*** nobody who’s actually written more than about 5 lines of code (or more than 6 characters of Perl)

**** I added one.  They came.  It was like some sort of magic.

***** see, for instance, the Linux Foundation‘s Core Infrastructure Initiative

Ignorance as a virtue: being proud to say “I don’t know”

“I am the wisest man alive, for I know one thing, and that is that I know nothing.” Socrates

In order to be considered an expert in any field, you have to spend a lot of time learning things.  In fact, I’d argue that one of the distinguishing traits of someone who is – or could become – an expert is their willingness and enthusiasm to learn, and keep learning.  The ability to communicate that knowledge is another of those traits: you can’t really be an expert if you have no way to communicate that knowledge.  Though that doesn’t mean that you need to be a great speaker, or even a great writer: by “communicate” I’m thinking of something much broader.  In the field of security and IT, that communication may be by architecture diagram, by code writing, by firewall rule instantiation, or by GUI, database or kernel module design, to name just a few examples.  These are all ways by which expertise can be communicated, instantiated or realised: the key is that the knowledge that has been gained is not contained, but can be externalised.

There’s another trait that, for me, betrays a true expert, and that’s the ability to say “I don’t know”.  And it’s difficult.  We enjoy and cultivate our expert status and other’s recognition of it: it’s part of our career progression, and it hits the “esteem” block in Maslow’s Hierarchy of Needs[1].  We like people asking our opinion, and we like being able to enlighten them: we take pride in our expertise, and why wouldn’t we?  We’ve earned it, after all, with all that hard graft and studying.  What’s more, we’ve all seen what happens when people get asked a question to which they don’t know the answer to something – they can become flustered, embarrassed, and they can be labelled stupid.*  Why would we want that for ourselves?

The problem, and very particularly in the security field, is that you’ll always get found out if you fake it.  In my experience, you’ll go into a customer meeting, for instance, and there’s either the sandal-wearing grey-beard, the recently-graduated genius or just the subject matter expert who’s been there for fifteen years and knows this specific topic better than … well, possibly anybody else on the planet, but certainly better than you.  They may not be there in the first meeting, but you can bet your bottom dollar*** that they’ll be in the second meeting, or the third – and you’ll get busted.  And when that happens, everything else you’ve said is called into question.  That may not seem fair, but that’s the way it goes.  Your credibility is dented, possibly irreparably.

The alternative to faking it is to accept that awkward question and simply to say, “I don’t know”.  You may want to give the question a moment’s thought – there have been times when I’ve plunged into an response and then stopped myself to admit that I just can’t give a full or knowledgeable answer, and when I could have saved myself bother by just pausing and considering it for a few seconds.  And you may want to follow up that initial acknowledgement of ignorance by saying that you know somebody else who does (if that happens to be true), or “I can find out” (if you think you can) or even “do you have any experts who might be able to help with that?”

This may not impress people who think you should know, but they’re generally either asking because they don’t (in which case they need a real answer) or because they’re trying to trip you up (in which case you don’t want to oblige them).  But it will impress those who are experts, because they know that nobody knows everything, and it’s much better to have that level of self-awareness than to dig yourself an enormous hole from which it’s difficult to recover.  But they’ll also understand, from your follow-up, that you want to find out: you want to learn.  And that is how one expert recognises another.


* it’s always annoyed me when people mock Donald Rumsfeld for pointing out that there are “unknown unknowns”: it’s probably one of the wisest soundbites in recent history**, for my money.

** and for an equivalently wise soundbite in ancient history, how about “I am the wisest man alive, for I know one thing, and that is that I know nothing”, by Socrates
*** other currencies and systems of exchange are available

Systems security – why it matters

… to understand how things will work together, you have to consider them as a system…

“A system is a set of interacting or interdependent component parts forming a complex or intricate whole.  Every system is delineated by its spatial and temporal boundaries, surrounded and influenced by its environment, described by its structure and purpose and expressed in its functioning.” (Wikipedia: system)

I’ve been involved with various types of security over the years, from features within products to storage, network and other communications security, and including stand-alone application security, cryptographic protocol design and other weird and wonderful issues like why you shouldn’t lose too much weight on holiday.*  That’s a subject for another post.  But what I keep coming back to is systems security.

And that’s because you can design all the security into a particular component that you like, you take as much care in coding it as you like, you can ensure that you compile is safely, you can test it to within an inch of its life, and ensure that it is deployed where and how you like – but if it’s part of a system, and that system has other holes, than you might as well not bother.  We** often talk about “the weakest link in the chain” as a way of pointing out that if you have a single problem in a set of components, that’s what will break.  That’s too simplistic an analogy***, though, as different components interact in different ways with each other, dependent on a variety of factors.

In order to understand how things will work together, you have to consider them as a system, to define what their behaviour as a system will be, and to architect the system with an understanding of the risks, threats and likely attackers that it will have to deal with in its lifetime.

Much of the content this blog may discuss components, but I hope that I’ll manage to explain their place in systems, and how they work together.  Join me: it should be fun****.


*that’s a subject for another post – it’ll be fun

**by which I mean the nebulous “security community”

***don’t start me on analogies

****another disclaimer – I think that security is fun.  Not everybody agrees.  I’m presuming that the fact that you’ve made it this far means that you are at least open to the suggestion.