7 steps to security policy greatness

… we’ve got a good chance of doing the right thing, and keeping the auditors happy.

Security policies are something that everybody knows they should have, but which are surprisingly easy to get wrong.  A simple set of steps to check when you’re implementing them is something that I think it’s important to share.  You know when you come up with a set of steps, you suddenly realise that you’ve got a couple of vowels in there and you think, “wow, I can make an acronym!”?[1] This was one of those times.  The problem was that when I looked at the different steps, I decided that DDEAVMM doesn’t actually have much of a ring to it.  I’ve clearly still got work to do before I can name major public sector projects, for instance[2].  However, I still think it’s worth sharing, so let’s go through them in order.  Order, as for many sets of steps, is important here.

I’m going to give an example and walk through the steps for clarity.  Let’s say that our CISO, in their infinite wisdom, has decided that they don’t want anybody outside our network to be able to access our corporate website via port 80.  This is the policy that we need to implement.

1. Define

The first thing I need is a useful definition.  We nearly have that from the request[3] noted above, when our CISO said “I don’t want anybody outside our network to be able to access our corporate website via port 80”.  So, let’s make that into a slightly more useful definition.

“Access to the host mycorporate.network.com on port 80 must be blocked for all hosts outside the 10.0.x.x to 10.3.x.x network range.”

I’m assuming that we already know that our main internal network is within 10.0.x.x to 10.3.x.x.  It’s not exactly a machine-readable definition, but actually, that’s the point: we’re looking for a definition which is clear and human-understandable.  Let’s assume that we’re happy with this, at least for now.
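As an aside, that 10.0.x.x to 10.3.x.x range can be written as 10.0.0.0/14 in CIDR notation, and if you want to check the arithmetic, a few lines of Python will do it for you.  This is purely illustrative – the addresses are invented, and nothing here comes from our hypothetical CISO:

```python
#!/usr/bin/env python3
# Illustrative check that a source address falls within the "internal"
# range from the definition: 10.0.x.x to 10.3.x.x, i.e. 10.0.0.0/14.
import ipaddress

INTERNAL = ipaddress.ip_network("10.0.0.0/14")

def is_internal(source: str) -> bool:
    """Return True if the source address sits inside the internal range."""
    return ipaddress.ip_address(source) in INTERNAL

if __name__ == "__main__":
    for addr in ("10.2.17.4", "10.4.0.1", "203.0.113.9"):
        print(addr, "->", "internal" if is_internal(addr) else "external")
```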

2. Design

Next, we need to design a way to implement this.  There are lots of ways of doing this – from iptables[4] to a full-blown, commercially supported firewall[6] – and I’m not fluent in any of them these days, so I’m going to assume that somebody who is better equipped than I has created a rule or set of rules to implement the policy defined in step 1.
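To make this a little more concrete – and with the strong caveat that this is a sketch of my own invention, not something our better-equipped colleague would necessarily deploy – here’s the sort of thing that might come out of the design step, expressed as a candidate pair of iptables rules generated from the definition in step 1:

```python
#!/usr/bin/env python3
# Illustrative only: turn the step 1 definition into a candidate pair of
# iptables rules.  A real design would be produced (and reviewed) by
# somebody who actually administers the firewall in question.

INTERNAL_RANGE = "10.0.0.0/14"   # 10.0.x.x to 10.3.x.x
BLOCKED_PORT = 80

def candidate_rules(internal_range: str, port: int) -> list[str]:
    """Allow the internal range to reach the port; drop everybody else."""
    return [
        f"iptables -A INPUT -p tcp --dport {port} -s {internal_range} -j ACCEPT",
        f"iptables -A INPUT -p tcp --dport {port} -j DROP",
    ]

if __name__ == "__main__":
    for rule in candidate_rules(INTERNAL_RANGE, BLOCKED_PORT):
        print(rule)
```

Whether the real rules end up in iptables, nftables or a commercial product’s configuration, the important point is that they can be traced back to the definition in step 1.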

We also need to define some tests – we’ll come back to these in step 5.

3. Evaluate

But we need to check.  What if they’ve mis-implemented it?  What if they misunderstood the design requirement?  It’s good practice – in fact, it’s vital – to do some evaluation of the design to ensure it’s correct.  For this rather simple example, it should be pretty easy to check by eye, but we might want to set up a test environment to evaluate that it meets the policy definition, or take other steps to evaluate its correctness.  And don’t forget: we’re not checking that the design does what the person/team writing it thinks it should do: we’re checking that it meets the definition.  It’s quite possible that at this point we’ll realise that the definition was incorrect.  Maybe there’s another subnet – 10.5.x.x, for instance – that the security policy designer didn’t know about.  Or maybe the initial request wasn’t sufficiently robust, and our CISO actually wanted to block all access on any port other than 443 (for HTTPS), which should be allowed.  Now is a very good time to find that out.

4. Apply

We’ve ascertained that the design does what it should do – although we may have iterated a couple of times on exactly what “what it should do” means – so now we can implement it.  Whether that’s ssh-ing into a system, uploading a script, using some sort of GUI or physically installing a new box, it’s done.

Excellent: we’re finished, right?

No, we’re not.  The problem is that often, that’s exactly what people think.  Let’s move to our next step: arguably the most important, and the most often forgotten or ignored.

5. Validate

I really care about this one: it’s arguably the point of this post.  Once you’ve implemented a security policy, you need to validate that it actually does what it’s supposed to do.  I’ve written before about this, in my post If it isn’t tested, it doesn’t work, and it’s absolutely true of security policy implementations.  You need to check all of the different parts of the design.  This, you might hope, would be really easy, but even in the simple case that we’re working with, there are lots of tests you should be doing.  Let’s say that we took the two changes mentioned in step 3.  Here are some tests that I would want to be doing, with the expected result:

  • FAIL: access port 80 from an external IP address
  • FAIL: access port 8080 from an external IP address
  • FAIL: access port 80 from a 10.4.x.x address
  • PASS: access port 443 from an external IP address
  • UNDEFINED: access port 80 from a 10.5.x.x address
  • UNDEFINED: access port 80 from a 10.0.x.x address
  • UNDEFINED: access port 443 from a 10.0.x.x address
  • UNDEFINED: access port 80 from an external IP address but with a VPN connection into 10.0.x.x

Of course, we’d want to be performing these tests on a number of other ports, and from a variety of IP addresses, too.
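If you want to automate some of these checks, here’s a very rough sketch of the sort of thing I have in mind.  It makes plenty of assumptions – not least that you run it from hosts sitting in the right networks (external, 10.4.x.x and so on), and that the host from our step 1 definition is the right thing to test against – so treat it as illustrative rather than a finished test harness:

```python
#!/usr/bin/env python3
# Very rough sketch of automating the checks above.  Each check would be
# run from a host in the appropriate network (external, 10.4.x.x, etc.):
# this script only covers the "can I connect?" part, not the vantage point.
import socket

TARGET = "mycorporate.network.com"   # the host from the step 1 definition

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# (description, port, expected: True means the connection should succeed)
EXPECTATIONS = [
    ("port 80 from an external address", 80, False),
    ("port 8080 from an external address", 8080, False),
    ("port 443 from an external address", 443, True),
]

if __name__ == "__main__":
    for description, port, expected in EXPECTATIONS:
        actual = can_connect(TARGET, port)
        verdict = "OK" if actual == expected else "MISMATCH"
        print(f"{verdict}: {description} (expected {expected}, got {actual})")
```

Note that the checks whose expected results are “UNDEFINED” are deliberately missing: until we’ve decided what we expect, there’s nothing for a script to compare against.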

What’s really interesting about the list is the number of expected results that are “UNDEFINED”.  Unless we have a specific requirement, we just can’t be sure what’s expected.  We can guess, but we can’t be sure.  Maybe we don’t care?  I particularly like the last one, because the result we get may lead us much deeper into our IT deployment than we might expect.

The point, however, is that we need to check that the actual results meet our expectations, and maybe even define some new requirements if we want to remove some of the “UNDEFINED” results.  We may be fine to leave some of the expected results as “UNDEFINED”, particularly if they’re out of scope for our work or our role.  Obviously, if the results don’t meet our expectations, then we also need to make some changes, apply them and then re-validate.  We also need to record the final results.

When we’ve got a more complex security policy – multiple authentication checks, or complex SDN[7] routing rules – then our tests are likely to be much, much more complex.

6. Monitor

We’re still not done.  Remember those results we got in our previous tests?  Well, we need to monitor our system and see if there’s any change.  We should do this on a fairly frequent basis.  I’m not going to say “regular”, because regular practices can lead to sloppiness, and also leave windows of opportunity open to attackers.  We also need to perform checks whenever we make a change to a connected system.  Oh, if I had a dollar for every time I’ve heard “oh, this won’t affect system X at all.”…[8]

One interesting point is that we should also note when results whose expected value remains “UNDEFINED” change.  This may be a sign that something in our system has changed, it may be a sign that a legitimate change has been made in a connected system, or it may be a sign of some sort of attack.  It may not be quite as important as a change in one of our expected “PASS” or “FAIL” results, but it certainly merits further investigation.
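To make the “frequent, but not regular” point a little more concrete, here’s a sketch of the sort of re-checking loop I mean.  The run_check() function is a stand-in – in practice it would call something like the can_connect() check from step 5, run from the appropriate vantage point – and the baseline values are invented for the purposes of illustration:

```python
#!/usr/bin/env python3
# Sketch of a "frequent, but not regular" re-checking loop which flags any
# change from the recorded baseline, including results whose expected value
# was left as UNDEFINED.  Entirely illustrative.
import random
import time

# Baseline recorded at the end of step 5: check name -> last observed result.
baseline = {
    "port 80 from external": False,
    "port 443 from external": True,
    "port 80 from 10.5.x.x (expected UNDEFINED)": True,
}

def run_check(name: str) -> bool:
    """Stand-in: in practice this would call something like can_connect()
    from step 5, from the appropriate vantage point."""
    return baseline[name]   # pretend, for the sketch, that nothing changed

while True:
    for name, previous in baseline.items():
        current = run_check(name)
        if current != previous:
            print(f"ALERT: result changed for '{name}': {previous} -> {current}")
            baseline[name] = current   # record the new value once investigated
    # Sleep somewhere between one and six hours: frequent, but not regular.
    time.sleep(random.uniform(1, 6) * 3600)
```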

7. Mitigate

Things will go wrong.  Some of them will be our fault, some of them will be our colleagues’ fault[10], some of them will be accidental or due to hardware failure, and some will be due to attacks.  In all of these cases, we need to act to mitigate the failure.  We are in charge of this policy, so even if the failure is out of our control, we want to make sure that the mitigating mechanisms are within our control.  And once we’ve completed the mitigation, we’re going to have to go back at least to step 2 and redesign our implementation.  We might even need to go back to step 1 and reconsider what our definition should be.

Final steps

There are many other points to consider – among the most important are the question of responsibility, touched on in step 7 (and particularly important during holiday seasons), special circumstances and decommissioning – but if we can keep these steps in mind when we’re implementing – and running – security policies, we’ve got a good chance of doing the right thing, and keeping the auditors happy, which is always worthwhile.


1 – I’m really sure it’s not just me.  Ask your colleagues.

2 – although, back when Father Ted was on TV, a colleague of mine and I came up with a database result which we named a “Fully Evaluated Query”.  It made us happy, at least.

3 – if it’s from the CISO, and he/she is my boss, then it’s not a request, it’s an order, but you get the point.

4 – which would be my first port[5] of call, but might not be the appropriate approach in this context.

5 – sorry, that was unintentional.

6 – just because it’s commercially supported doesn’t mean it has to be proprietary: open source is your friend, boys and girls, ladies and gentlemen.

7 – Software-Defined Networking.

8 – I’m going to leave you hanging, other than to say that, at current exchange rates, and assuming it was US dollars I was collecting, then I’d have almost exactly 3/4 of that amount in British pounds[9].

9 – other currencies are available, but please note that I’m not currently accepting bitcoin.

10 – one of the best types.

Confessions of an auditor

Moving to a positive view of security auditing.

So, right up front, I need to admit that I’ve been an auditor.  A security auditor.  Professionally.  It’s time to step up and be proud, not ashamed.  I am, however, dialled into a two-day workshop on compliance today and tomorrow, so this is front and centre of my thinking right at the moment.  Hence this post.

Now, I know that not everybody has an entirely … positive view of auditors.  And particularly security auditors[1].  They are, for many people, a necessary evil[2].  The intention of this post is to try to convince you that auditing[3] is, or can and should be, a Force for Good[tm].

The first thing to address is that most people have auditors in because they’re told that they have to, and not because they want to.  This is often – but not always – associated with regulatory requirements.  If you’re in the telco space, the government space or the financial services space, for instance, there are typically very clear requirements for you to adhere to particular governance frameworks.  In order to be compliant with the regulations, you’re going to need to show that you meet the frameworks, and in order to do that, you’re going to need to be audited.  For pretty much any sensible auditing regime, that’s going to require an external auditor who’s not employed by or otherwise accountable to the organisation being audited.

The other reason that audit may happen is when a C-level person (e.g. the CISO) decides that they don’t have a good enough idea of exactly what the security posture of their organisation – or specific parts of it – is like, so they put in place an auditing regime.  If you’re lucky, they choose an industry-standard regime, and/or one which is actually relevant to your organisation and what it does.  If you’re unlucky, … well, let’s not go there.

I think that both of the reasons above – for compliance and to better understand a security posture – are fairly good ones.  But they’re just the first order reasons, and the second order reasons – or the outcomes – are where it gets interesting.

Sadly, one of the key outcomes of auditing seems to be blame.  It’s always nice to have somebody else to blame when things go wrong, and if you can point to an audited system which has been compromised, or a system which has failed an audit, then you can start pointing fingers.  I’d like to move away from this.

Some positive suggestions

What I’d like to see would be a change in attitude towards auditing.  I’d like more people and organisations to see security auditing as a net benefit to their organisations, their people, their systems and their processes[4].  This requires some changes to thinking – changes which many organisations have, of course, already made, but which more could make.  These are the second order reasons that we should be considering.

  1. Stop tick-box[5] auditing.  Too many audits – particularly security audits – seem to be solely about ticking boxes.  “Does this product or system have this feature?”  Yes or no.  That may help you pass your audit, but it doesn’t give you a chance to go further, and think about really improving what’s going on.  In order to do this, you’re going to need to find auditors who actually understand the systems they’re looking at, and who are trained to do something beyond tick-box auditing.  I was lucky enough to be encouraged to audit in this broader way, alongside a number of tick-boxes, and I know that the people I was auditing always preferred this approach, because they told me that they had to do the other type, too – and hated it.
  2. Employ internal auditors.  You may not be able to get approval for actual audit sign-off if you use internal auditors, so internal auditors may have to operate alongside – or before – your external auditors, but if you find and train people who can do this job internally, then you get to have people with deep knowledge (or deeper knowledge, at least, than the external auditors) of your systems, people and processes looking at what they all do, and how they can be improved.
  3. Look to be proactive, not just reactive.  Don’t just pick or develop products, applications and systems to meet audit, and don’t wait until the time of the audit to see whether they’re going to pass.  Auditing should be about measuring and improving your security posture, so think about posture, and how you can improve it instead.
  4. Use auditing for risk-based management.  Last, but not least – far from least – think about auditing as part of your risk-based management planning.  An audit shouldn’t be something you do, pass[6] and then put away in a drawer.  You should be pushing the results back into your governance and security policy model, monitoring, validation and enforcement mechanisms.  Auditing should be about building, rather than destroying – however often it feels like it.



1 – you may hear phrases like “a special circle of Hell is reserved for…”.

2 – in fact, many other people might say that they’re an unnecessary evil.

3 – if not auditors.

4 – I’ve been on holiday for a few days: I’ve maybe got a little over-optimistic while I’ve been away.

5 – British usage alert: for US readers, you’ll probably call this a “check-box”.

6 – You always pass every security audit first time, right?

What’s a State Actor, and should I care?

The bad thing about State Actors is that they rarely adhere to an ethical code.

How do you know what security measures to put in place for your organisation and the systems that you run?  As we’ve seen previously, There are no absolutes in security, so there’s no way that you’re ever going to make everything perfectly safe.  But how much effort should you put in?

There are a number of parameters to take into account.  One thing to remember is that there are always trade-offs with security.  My favourite one is a three-way: security, cost and usability.  You can, the saying goes, have any two of the three, but not all three: choose security, cost or usability.  If you want security and usability, it’s going to cost you.  If you want a cheaper solution with usability, security will be reduced.  And if you want security cheaply, it’s not going to be easily usable.  Of course, cost stands in for time spent as well: you might be able to make things more secure and usable by spending more time on the solution, but that time is costly.

But how do you know what’s an appropriate level of security?  Well, you need to think about what you’re protecting, the impact of it being:

  • exposed (confidentiality);
  • changed (integrity);
  • inaccessible (availability).

These are the three classic security properties, often recorded as C.I.A.[1].  Who would be impacted?  What mitigations might you put in place?  What vulnerabilities might exist in the system, and how might they be exploited?

One of the big questions to ask alongside all of these is “who exactly might be wanting to attack my systems?”  There’s a classic adage within security that “no system is secure against a sufficiently motivated and resourced attacker”.  Luckily for us, there are actually very few attackers who fall into this category.  Some examples of attackers might be:

  • insiders[3]
  • script-kiddies
  • competitors
  • trouble-makers
  • hacktivists[4]
  • … and more.

Most of these will either not be that motivated, or not particularly well-resourced.  There are two types of attackers for whom that is not the case: academics and State Actors.  The good thing about academics is that they have to adhere to an ethical code, and shouldn’t be trying anything against your systems without your permission.  The bad thing about State Actors is that they rarely adhere to an ethical code.

State Actors have the resources of a nation state behind them, and are therefore well-resourced.  They are also generally considered to be well-motivated – if only because they have many people available to perform attacks, and those people are well-protected from legal process.

One thing that State Actors may not be, however, is government departments or parts of the military.  While some nations may choose to attack their competitors or enemies (or even, sometimes, partners) with “official” parts of the state apparatus, others may choose a “softer” approach, providing support to third parties and directing attacks in a more hands-off manner.  This support may take a variety of forms: encouragement, logistical help, tools, money or even staff.  The intent, here, is to combine direction with plausible deniability: “it wasn’t us, it was this group of people/these individuals/this criminal gang working against you.  We’ll certainly stop them from performing any further attacks[5].”

Why should this matter to you, a private organisation or public company?  Why should a nation state have any interest in you if you’re not part of the military or government?

The answer is that there are many reasons why a State Actor may consider attacking you.  These may include:

  • providing a launch point for further attacks
  • compromising electoral processes
  • gaining Intellectual Property
  • gaining competitive information for local companies
  • gaining competitive information for government-level trade talks
  • compromising national infrastructure, e.g.
    • financial
    • power
    • water
    • transport
    • telecoms
  • compromising national supply chains
  • gaining customer information
  • seeking revenge for perceived slights against the nation by your company – or just by your government
  • and, I’m sure, many others.

All of these examples may be reasons to consider State Actors when you’re performing your attacker analysis, but I don’t want to alarm you.  Most organisations won’t be a target, and for those that are, there are few measures likely to protect you from a true State Actor beyond those you should be taking anyway: frequent patching, following industry practice on encryption, and so on.  Equally important is monitoring for possible compromise, which, again, you should be doing anyway.  The good news is that if you might be on the list of possible State Actor targets, most countries provide good advice and support, before and after the event, for organisations which are based or operate within their jurisdiction.


1 – I’d like to think that they tried to find a set of initials for G.C.H.Q. or M.I.5., but I suspect that they didn’t[2].

2 – who’s “they” in this context?  Don’t ask.  Just don’t.

3 – always more common than we might think: malicious, bored, incompetent, bankrupt – the reasons for insider-related security issues are many and varied.

4 – one of those portmanteau words that I want to dislike, but find rather useful.

5 – yuh-huh, right.

Why I should have cared more about lifecycle

Every deployment is messy.

I’ve always been on the development and architecture side of the house, rather than on the operations side. In the old days, this distinction was a useful and acceptable one, and wasn’t too difficult to maintain. From time to time, I’d get involved with discussions with people who were actually running the software that I had written, but on the whole, they were a fairly remote bunch.

This changed as I got into more senior architectural roles, and particularly as I moved through some pre-sales roles which involved more conversations with users.  These conversations started to throw up[1] an uncomfortable truth: not only were people running the software that I had helped to design and write[3], but they didn’t just set it up the way we did in our clean test install rig, run it with well-behaved, well-structured data input by well-meaning, generally accurate users in a clean deployment environment, and then turn it off when they were done with it.

This should all seem very obvious, and I had, of course, been on the receiving end of requests from support people which exposed that there were odd things that users did to my software, but that’s usually all it felt like: odd things.

The problem is that odd is normal.  There is no perfect deployment, no clean installation, no well-structured data, and certainly very few generally accurate users.  Every deployment is messy, and nobody just turns off the software when they’re done with it.  If it’s become useful, it will be upgraded, patched, left to run with no maintenance, ignored or a combination of all of those.  And at some point, it’s likely to become “legacy” software, and somebody’s going to need to work out how to transition to a new version or a completely different system.  This all has major implications for security.

I was involved in an effort a few years ago to describe the functionality and lifecycle of a proposed new project.  I was on the security team, which, for all the usual reasons[4], didn’t always interact very closely with some of the other groups.  When the group working on error and failure modes came up with their state machine model and presented it at a meeting, we all looked on with interest.  And then with horror.  All the modes were “natural” failures: not one reflected what might happen if somebody intentionally caused a failure.  “Ah,” they responded, when called on it by the first of the security team to be able to form a coherent sentence, “those aren’t errors, those are attacks.”  “But,” one of us blurted out, “don’t you need to recover from them?”  “Well, yes,” they conceded, “but you can’t plan for that.  It’ll need to be on a case-by-case basis.”

This is thinking that we need to stamp out.  We need to design our systems so that, wherever possible, we consider not only what attacks might be brought to bear on them, but also how users – real users – can recover from them.

One way of doing this is to consider security as part of your resilience planning, and bake it into your thinking about lifecycle[5].  Failure happens for lots of reasons, and some of those will be because of bad people doing bad things.  It’s likely, however, that as you analyse the sorts of conditions that these attacks can lead to, a number of them will be similar to “natural” errors.  Maybe you could lose network connectivity to your database because of a loose cable, or maybe because somebody is performing a denial of service attack on it.  In both these cases, you may well start off with similar mitigations, though the steps to fix it are likely to be very different.  But considering all of these side by side means that you can help the people who are actually going to be operating those systems plan and be ready to manage their deployments.

So the lesson from today is the same as it so often is: make sure that your security folks are involved from the beginning of a project, in all parts of it.  And an extra one: if you’re a security person, try to think not just about the attackers, but also about all those poor people who will be operating your software.  They’ll thank you for it[6].


1 – not literally, thankfully[2].

2 – though there was that memorable trip to Singapore with food poisoning… I’ll stop there.

3 – a fact of which I actually was aware.

4 – some due entirely to our own navel-gazing, I’m pretty sure.

5 – exactly what we singularly failed to do in the project I’ve just described.

6 – though probably not in person.  Or with an actual gift.  But at least they’ll complain less, and that’s got to be worth something.

If it isn’t tested, it doesn’t work

Testing isn’t just coming up with tests for desired use cases.

Huh.  Shouldn’t that title be “If it isn’t tested, it’s not going to work”?

No.

I’m asserting something slightly different here – in fact, two things.  The first can be stated thus:

“In order for a system to ‘work’ correctly, and to defined parameters, test cases for all plausible conditions must be documented, crafted – and passed – before the system is considered to ‘work’.”

The second is a slightly more philosophical take on the question of what a “working system” is:

“An instantiated system – including software, hardware, data and wetware[1] components – may be considered to be ‘working’ if both its current state, and all known plausible future states from the working state have been anticipated, documented and appropriately tested.”

Let’s deal with these one by one, starting with the first[3].

Case 1 – a complete test suite

I may have given away the basis for my thinking by the phrasing in the subtitle above.  What I think we need to be looking for, when we’re designing a system, is to ensure that we have a test case for every plausible condition.  I considered “possible” here, but I think that may be going too far: for most systems, for instance, you don’t need to worry too much about meteor strikes.  This is an extension of the Agile methodology dictum: “a feature is not ‘done’ until it has a test case, and that test case has been passed.”  Each feature should be based on a use case, and a feature is considered correctly implemented when the test cases that are designed to test that feature are all correctly passed.

It’s too easy, however, to leave it there.  Defining features is, well, not easy, but something we know how to do.  “When a user enters a valid username/password combination, the splash-screen should appear.”  “When a file has completed writing, a tick should appear on the relevant icon.”  “If a user cancels the transaction, no money should be transferred between accounts.”  The last is a good one, in that it deals with an error condition.  In fact, that’s the next step beyond considering test cases for features that implement functionality to support actions that are desired: considering test cases to manage conditions that arise from actions that are undesired.

The problem is that many people, when designing systems, only consider one particular type of undesired action: accidental, non-malicious action.  This is the reason that you need to get security folks[4] in when you’re designing your system, and the related test cases.  In order to ensure that you’re reaching all plausible conditions, you need to consider intentional, malicious actions.  A system which has not considered these, and been tested against them, cannot, in my opinion, truly be said to be “working”.
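To illustrate the difference – with an entirely made-up login_allowed() function that isn’t taken from any real system – here are the three flavours of test case side by side: the desired action, the accidental undesired action, and the intentional, malicious one:

```python
# Illustrative pytest-style cases for a hypothetical login_allowed(username,
# password) function: the desired action, an accidental undesired action and
# an intentional, malicious one.
def login_allowed(username: str, password: str) -> bool:
    """Stand-in implementation so the example runs: accept one known pair."""
    return (username, password) == ("alice", "correct horse battery staple")

def test_valid_credentials_accepted():
    # desired action
    assert login_allowed("alice", "correct horse battery staple")

def test_wrong_password_rejected():
    # accidental, non-malicious undesired action
    assert not login_allowed("alice", "correct horse battery stapler")

def test_injection_attempt_rejected():
    # intentional, malicious action: reject it, don't "interpret" it
    assert not login_allowed("alice'; --", "anything")

if __name__ == "__main__":
    test_valid_credentials_accepted()
    test_wrong_password_rejected()
    test_injection_attempt_rejected()
    print("all example cases passed")
```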

Case 2 – the bigger systems picture

I write fairly frequently[5] about the importance of systems and systems thinking, and one of the interesting things about a system, from my point of view, is that it’s arguably not really a system until it’s up and running: “instantiated”, in the language I used in my definition above.

Case 1 dealt, basically, with test cases and the development cycle.  That, by definition, is before you get to a fully instantiated system: one which is operating in the environment for which it was designed – you really, really hope – and is in situ.  Part of it may be quiescent, and that is hopefully as designed, but it is instantiated.

A system has a current state; it has a set of defined (if not known[7]) past states; and a set of possible future states that it can reach from there.  Again, I’m not going to insist that all possible states should be considered, for the same reasons I gave above, but I think that we do need to talk about all known plausible future states.

These types of conditions won’t all be security-related.  Many of them may be more appropriately thought of as to do with assurance or resilience.  But if you don’t get the security folks in, and early in the planning process, then you’re likely to miss some.

Here’s how it works.  If I am a business owner, and I am relying on a system to perform the tasks for which it was designed, then I’m likely to be annoyed if some IT person comes to me and says “the system isn’t working”.  However, if the answer to my question, “and did it fail due to something we had considered in our design and deployment of the system?”, is “yes”, then I’m quite likely to move beyond annoyed to a state which, if we’re honest, the IT person could easily have considered, nay predicted, and which is closer to “incandescent” than “contented”[8].

Because if we’d considered a particular problem – it was “known”, and “plausible” – then we should have put in place measures to deal with it.  Some of those will be preventative measures, to stop the bad thing happening in the first place, and others will be mitigations, to deal with the effects of the bad thing that happened.  And there may also be known, plausible states for which we may consciously decide not to prepare.  If I’m a small business owner in Weston-super-Mare[9], then I may be less worried about industrial espionage than if I’m a multi-national[10].  Some risks aren’t worth the bother, and that’s fine.

To be clear: the mitigations that we prepare won’t always be technical.  Let’s say that we come up with a scenario where an employee takes data from the system on a USB stick and gives it to a competitor.  It may be that we can’t restrict all employees from using USB sticks with the system, so we have to rely on legal recourse if that happens.  If, in that case, we call in the relevant law enforcement agency, then the system is working as designed if that was our plan to deal with this scenario.

Another point is that not all future conditions can be reached from the current working state, and if they can’t, then it’s fair to decide not to deal with them.  Once a TPM is initialised, for instance, taking it back to its factory state basically requires resetting it, so any system which is relying on it will have been reset as well.

What about the last bit of my definition?  “…[A]nticipated, documented and appropriately tested.”  Well, you can’t test everything fully.  Consider that the following scenarios are all known and plausible for your system:

  • a full power-down for your entire data centre;
  • all of your workers are incapacitated by a ’flu virus;
  • your main sysadmin is kidnapped;
  • an employee takes data from the system on a USB stick and gives it to a competitor.

You’re really not going to want to test all of these.  But you can at least perform paper exercises to consider what steps you should take, and also document them.  You might ensure that you know which law enforcement agency to call, and what the number is, for instance, instead of actually convincing an employee to leak information to a competitor and then having them arrested[11].

Conclusion

Testing isn’t just coming up with tests for desired use cases.  It’s not even good enough just to prepare for accidental undesired use cases on top of that.  We need to consider malicious use cases, too.   And testing in development isn’t good enough either: we need to test with live systems, in situ.  Because if we don’t, something, somewhere, is going to go wrong.

And you really don’t want to be the person telling your boss that, “well, we thought it might, but we never tested it.”



1 – “wetware” is usually defined as the human components of a system (as here), but you might have non-human inputs (from animals or aliens), or even from flora[2], I suppose.

2 – “woodware”?

3 – because I, for one, need a bit of a mental run-up to the second one.

4 – preferably the cynical, suspicious types.

5 – if not necessarily regularly: people often confuse the two words.  A regular customer may only visit once a year, but always does it on the same day, whereas a frequent customer may visit on average once a week, but may choose a different day each week.[6]

6 – how is this relevant?  It’s not.

7 – yes, I know: Schrödinger’s cat, quantum effects, blah, blah.

8 – Short version: if the IT person says “it broke, and it did it in a way we had thought of before”, then I’m going to be mighty angry.

9 – I grew up nearby.  Windy, muddy, donkeys.

10 – which might, plausibly, also be based in Weston-super-mare, though I’m not aware of any.

11 – this is, I think, probably at least bordering on the unethical, and might get you in some hot water with your legal department, and possibly some other interested parties[12].

12 – your competitor might be pleased, though, so there is that.

Meltdown and Spectre: thinking about embargoes and disclosures

The technical details behind Meltdown and Spectre are complex and fascinating – and the possible impacts wide-ranging and scary.  I’m not going to go into those here, though there are some excellent articles explaining them.  I’d point readers in particular at the following URLs (which both resolve to the same page, so there’s no need to follow both):

I’d also recommend this article on the Red Hat blog, written by my colleague Jon Masters: What are Meltdown and Spectre? Here’s what you need to know.  It includes a handy cut-out-and-keep video[1] explaining the basics.   Jon has been intimately involved in the process of creating and releasing mitigations and fixes to Meltdown and Spectre, and is very well-placed to talk about it.

All that is interesting and important, but I want to talk about the mechanics and process behind how a vulnerability like this moves from discovery through to fixing.  Although similar processes exist for proprietary software, I’m most interested in open source software, so that’s what I’m going to describe.


Step 1 – telling the right people

Let’s say that someone has discovered a vulnerability in a piece of open source software.  There are a number of things they can do.  The first is to ignore it.  This is no good, and isn’t interesting for our story, so we’ll ignore it in turn.  The second is to keep it to themselves or try to make money out of it.  Let’s assume that the discoverer is a good type of person[2], so isn’t going to do this[3].  A third is to announce it on the project mailing list.  This might seem like a sensible thing to do, but it can actually be very damaging to the project and the community at large.  The problem is that the discoverer has just let all the Bad Folks[tm] know about a vulnerability which hasn’t been fixed, and which they can now exploit.  Even if the discoverer submits a patch, it needs to be considered and tested, and it may be that the fix they’ve suggested isn’t the best approach anyway.  For a serious security issue, the project maintainers are going to want to have a really good look at it in order to decide what the best fix – or mitigation – should be.

It’s for this reason that many open source projects have a security disclosure process.  Typically, there’s a closed mailing list to which you can send suspected vulnerabilities, often something like “security@projectname.org”.  Once someone has emailed that list, that’s where the process kicks in.

Step 2 – finding a fix

Now, before that email got to the list, somebody had to decide that they needed a list in the first place, so let’s assume that you, the project leader, have already put a process of some kind in place.  There are a few key decisions to make, of which the most important are probably:

  • are you going to keep it secret[5]?  It’s easy to default to “yes” here, for the reasons that I mentioned above, but the number of times you actually put in place restrictions on telling people about the vulnerability – usually referred to as an embargo, given its similarity to a standard news embargo – should be very, very small.  This process is going to be painful and time-consuming: don’t overuse it.  Oh, and your project’s meant to be about open source software, yes?  Default to that.
  • who will you tell about the vulnerability?  We’re assuming that you ended up answering “yes” to the previous bullet, and that you don’t want everybody to know, and also that this is a closed list.  So, who’s the most important set of people?  Well, most obvious is the set of project maintainers, and key programmers and testers – of which more below.  These are the people who will get the fix completed, but there are two other constituencies you might want to consider: major distributors and consumers of the project.  Why these people, and who are they?  Well, distributors might be OEMs, Operating System Vendors or ISVs who bundle your project as part of their offering.  These may be important because you’re going to want to ensure that people who will need to consume your patch can do so quickly, and via their preferred method of updating.  What about major consumers?  Well, if an organisation has a deployed estate of hundreds or thousands of instances of a project[6], then the impact of having to patch all of those at short notice – let alone the time required to test the patch – may be very significant, so they may want to know ahead of time, and they may be quite upset if they don’t.  This privileging of major over rank-and-file users of your project is politically sensitive, and causes much hand-wringing in the community.  And, of course, the more people you tell, the more likely it is that a leak will occur, and for news of the vulnerability to get out before you’ve had a chance to fix it properly.
  • who should be on the fixing team?  Just because you tell people doesn’t mean that they need to be on the team.  By preference, you want people who are both security experts and also know your project software inside-out.  Good luck with this.  Many projects aren’t security projects, and will have no security experts attached – or a few at most.  You may need to call people in to help you, and you may not even know which people to call in until you get a disclosure in the first place.
  • how long are you going to give yourself?  Many discoverers of vulnerabilities want their discovery made public as soon as possible.  This could be for a variety of reasons: they have an academic deadline; they don’t want somebody else to beat them to disclosure; they believe that it’s important to the community that the project is protected as soon as possible; they’re concerned that other people have found – and are exploiting – the vulnerability; they don’t trust the project to take the vulnerability seriously and are concerned that it will just be ignored.  This last reason is sadly justifiable, as there are projects which don’t react quickly enough, or don’t take security vulnerabilities seriously, and there’s a sorry history of proprietary software vendors burying vulnerabilities and pretending they’ll just go away[7].  A standard period of time before disclosure is 2-3 weeks, but as projects get bigger and more complex, balancing that against issues such as pressure from those large-scale users to give them more time to test, holiday plans and all the rest becomes important.  Having an agreed time and sticking to it can be vital, however, if you want to avoid deadline slip after deadline slip.  There’s another interesting issue, as well, which is relevant to the Meltdown and Spectre vulnerabilities – you can’t just patch hardware.  Where hardware is involved, the fixes may involve multiple patches to multiple projects, and not just standard software but microcode as well: this may significantly increase the time needed.
  • what incentives are you going to provide?  Some large projects are in a position to offer bug bounties, but these are few and far between.  Most disclosers want – and should expect to be granted – public credit when the fix is finally provided and announced to the public at large.  This is not to say that disclosers should necessarily be involved in the wider process that we’re describing: this can, in fact, be counter-productive, as their priorities (around, for instance, timing) may be at odds with the rest of the team.

There’s another thing you might want to consider, which is “what are we going to do if this information leaks early?”  I don’t have many good answers for this one, as it will depend on numerous factors such as how close you are to a fix, how major the problem is, and whether anybody in the press picks up on it.  You should definitely consider it, though.

Step 3 – external disclosure

You’ve come up with a fix?  Well done.  Everyone’s happy?  Very, very well done[8].  Now you need to tell people.  But before you do that, you’re going to need to decide what to tell them.  There are at least three types of information that you may well want to prepare:

  • technical documentation – by this, I mean project technical documentation.  Code snippets, inline comments, test cases, wiki information, etc., so that when people who code on your project – or code against it – need to know what’s happened, they can look at this and understand.
  • documentation for techies – there will be people who aren’t part of your project, but who use it, or are interested in it, who will want to understand the problem you had, and how you fixed it.  This will help them to understand the risk to them, or to similar software or systems.  Being able to explain to these people is really important.
  • press, blogger and family documentation – this is a category that I’ve just come up with.  Certain members of the press will be quite happy with “documentation for techies”, but, particularly if your project is widely used, many non-technical members of the press – or technical members of the press writing for non-technical audiences – are going to need something that they can consume and which will explain in easy-to-digest snippets a) what went wrong; b) why it matters to their readers; and c) how you fixed it.  In fact, on the whole, they’re going to be less interested in c), and probably more interested in a hypothetical d) & e) (which are “whose fault was it and how much blame can we heap on them?” and “how much damage has this been shown to do, and can you point us at people who’ve suffered due to it?” – I’m not sure how helpful either of these is).  Bloggers may also be useful in spreading the message, so considering them is good.  And generally, I reckon that if I can explain a problem to my family, who aren’t particularly technical, then I can probably explain it to pretty much anybody I care about.

Of course, you’re now going to need to coordinate how you disseminate all of these types of information.  Depending on how big your project is, your channels may range from your project code repository through your project wiki through marketing groups, PR and even government agencies[9].


Conclusion

There really is no “one size fits all” approach to security vulnerability disclosures, but it’s an important consideration for any open source software project, and one of those that can be forgotten as a project grows, until suddenly you’re facing a real use case.  I hope this overview is interesting and useful, and I’d love to hear thoughts and stories about examples where security disclosure processes have worked – and when they haven’t.


2 – because only good people read this blog, right?

3 – some government security agencies have a policy of collecting – and not disclosing – security vulnerabilities for their own ends.  I’m not going to get into the rights or wrongs of this approach.[4]

4 – well, not in this post, anyway.

5 – or try, at least.

6 – or millions or more – consider vulnerabilities in smartphone software, for instance, or cloud providers’ install bases.

7 – and even, shockingly, of using legal mechanisms to silence – or try to silence – the discloser.

8 – for the record, I don’t believe you: there’s never been a project – let alone an embargo – where everyone was happy.  But we’ll pretend, shall we?

9 – I really don’t think I want to be involved in any of these types of disclosure, thank you.

Top 5 resolutions for security folks – 2018

Yesterday, I wrote some jokey resolutions for 2018 – today, as it’s a Tuesday, my regular day for posts, I decided to come up with some real ones.

1 – Embrace the open

I’m proud to have been using Linux[1] and other open source software for around twenty years now.  Since joining Red Hat in 2016, and particularly since I started writing for Opensource.com, I’ve become more aware of other areas of open-ness out there, from open data to open organisations.  There are still people out there who are convinced that open source is less secure than proprietary software.  You’ll be unsurprised to discover that I disagree.  I encourage everyone to explore how embracing the open can benefit them and their organisations.

2 – Talk about risk

I’m convinced that we talk too much about security for security’s sake, and not about risk, which is what most “normal people” think about.  There’s education needed here as well: of us, and of others.  If we don’t understand the organisations we’re part of, and how they work, we’re not going to be able to discuss risk sensibly.  In the other direction, we need to be able to talk about security a bit, in order to explain how it will mitigate risk, so we need to learn how to do this in a way that informs our colleagues, rather than alienating them.

3 – Think about systems

I don’t believe that we[2] talk enough about systems.  We spend a lot of our time thinking about functionality and features, or how “our bit” works, but not enough about how all the bits fit together. I don’t often link out to external sites or documents, but I’m going to make an exception for NIST special publication 800-160 “Systems Security Engineering: Considerations for a Multidisciplinary Approach in the Engineering of Trustworthy Secure Systems”, and I particularly encourage you to read Appendix E “Roles, responsibilities and skills: the characteristics and expectations of a systems security engineer”.  I reckon this is an excellent description of the core skills and expertise required for anyone looking to make a career in IT security.

4 – Examine the point of conferences

I go to a fair number of conferences, both as an attendee and as a speaker – and also do my share of submission grading.  I’ve written before about how annoyed I get (and I think others get) by product pitches at conferences.  There are many reasons to attend the conferences, but I think it’s important for organisers, speakers and attendees to consider what’s most important to them.  For myself, I’m going to try to ensure that what I speak about is what I think other people will be interested in, and not just what I’m focussed on.  I’d also highlight the importance of the “hallway track”: having conversations with other attendees which aren’t necessarily directly related to the specific papers or talks. We should try to consider what conferences we need to attend, and which ones to allow to fall by the wayside.

5 – Read outside the IT security discipline

We all need downtime.  One way to get that is to read – on an e-reader, online, on your phone, magazines, newspapers or good old-fashioned books.  Rather than just pick up something directly related to work, choose something which is at least a bit off the beaten track.  Whether it’s an article on a topic to do with your organisation’s business,  a non-security part of IT[3], something on current affairs, or a book on a completely unrelated topic[4], taking the time to gain a different perspective on the world is always[5] worth it.

What have I missed?

I had lots of candidates for this list, and I’m sure that I’ve missed something out that you think should be in there.  That’s what comments are for, so please share your thoughts.


1 – GNU/Linux.

2 – the mythical IT community.

3 – I know, it’s not going to be as sexy as security, but go with it.  At least once.

4 – I’m currently going through a big espionage fiction phase.  Which is neither here nor there, but hey.

5 – well, maybe almost always.