Security, cost and usability (pick 2)

If we cannot be explicit that there is a trade-off, it’s always security that loses.

Everybody wants security: why wouldn’t you? Let’s role-play: you’re a software engineer on a project to create a security product. There comes a time in the product life-cycle when it’s nearly due, and, as usual, time is tight. So you’re in the regular project meeting and the product manager’s there, so you ask them what they want you to do: should you prioritise security? The product manager is very clear[1]: they will tell you that they want the product as secure as possible – and they’re right, because that’s what customers want. I’ve never spoken to a customer (and I’ve spoken to lots of customers over the years) who said that they’d prefer a product which wasn’t as secure as possible. But there’s a problem, which is that all customers also want their products tomorrow – in fact, most customers want their products today, if not yesterday.

Luckily, products can generally be produced more quickly if more resources are applied (though Frederick Brooks’ The Mythical Man Month tells us that simple application of more engineers is actually likely to have a negative impact), so the requirement for speed of delivery can be translated to cost. There’s another thing that customers want, however, and that is for products to be easy to use: who wants to get a new product and then, when it arrives, for it to take months to integrate or for it to be almost impossible for their employees to run it as they expect?

So, to clarify, customers want a security product to be be the following:

  1. secure – security is a strong requirement for many enterprises and organisations[3], and although we shouldn’t ever use the word secure on its own, that’s still what customers want;
  2. cheap – nobody wants to pay more than the minimum they can;
  3. usable – everybody likes simple-to-use, easy-to-integrate applications.

There’s a problem, however, which is that out of the three properties above, you can only choose two for any application or project. You say this to your product manager (who’s always right, remember[1]), and they’ll say: “don’t be ridiculous! I want all three”.

But it just doesn’t work like that: why? Here’s my take on the reasons. Security, simply stated, is designed to stop people doing things. Stated from the point of view of a user, security’s view is to reduce usability. “Doing security” is generally around applying controls to actions in a system – whether by users or non-human entities – and the simplest way to apply it is “blanket security” – defaulting to blocking or denying actions. This is sometimes known as fail to safe or fail to closed.

Let’s take an example: you have a simple internal network in your office and you wish to implement a firewall between your network and the Internet, to stop malicious actors from probing your internal machines and to compromised systems on the internal network from communicating out to the Internet. “Easy,” you think, and set up a DENY ALL rule for connections originating outside the firewall, and a DENY ALL rule for connections originating inside the firewall, with the addition of a ALLOW all outgoing port 443 connections to ensure that people can use web browsers to make HTTPS connections. You set up the firewall, and get ready to head home, knowing that your work is done. But then the problems arise:

  • it turns out that some users would like to be able to send email, which requires a different outgoing port number;
  • sending email often goes hand in hand with receiving email, so you need to allow incoming connections to your mail server;
  • one of your printers has been compromised, and is making connections over port 443 to an external botnet;
  • in order to administer the pay system, your accountant – who is not a full-time employee, and works from home, needs to access your network via a VPN, which requires the ability to accept an incoming connection.

Your “easy” just became more difficult – and it’s going to get more difficult still as more users start encountering what they will see as your attempts to make their day-to-day revenue-generating lives more difficult.

This is a very simple scenario, but it’s clear that in order to allow people actually to use a system, you need to spend a lot more time understanding how security will interact with it, and how people’s experience of the measures you put in place will be impacted. Usability and user experience (“UX”) is a complex field on its own, but when you combine it with the extra requirements around security, things become even more tricky.

You need both to manage the requirements of users to whom the security measures should be transparent (“TLS encryption should be on by default”) and those who may need much more control (“developers need to be able to select the TLS cipher suite options when connecting to a vendor’s database”), so you need to understand the different personae[4] you are targeting for your application. You also need to understand the different failures modes, and what the correct behaviour should be: if authentication fails three times in a row, should the medical professional who is trying to get a rush blood test result be locked out of the system, or should the result be provided, and a message sent to an administrator, for example? There will be more decisions to make, based on what your application does, the security policies of your customers, their risk profiles, and more. All of these investigations and decisions take time, and time equates to money. What is more, they also require expertise – both in terms of security but also usability – and that is in itself expensive.

So, you have three options:

  1. choose usability and cost – you can prioritise usability and low cost, but you won’t be able to apply security as you might like;
  2. choose security and cost – in this case, you can apply more security to the system, but you need to be aware that usability – and therefore your customer’s acceptance of the system – will suffer;
  3. choose usability and security – I wish this was the one that we chose every time: you decide that you’re willing to wait longer or pay more for a more secure product, which people can use.

I’m not going to pretend that these are easy decisions, nor that they are always clear cut. And a product manager’s job is sometimes to make difficult choices – hopefully ones which can be re-balanced in a later release, but difficult choices nevertheless. It’s really important, however, that anyone involved in security – as an engineer, as a UX expert, as a product manager, as a customer – understands the trade-off here. If we cannot be explicit that there is a trade-off, then the trade-off will be made silently, and in my experience, it’s always security that loses.


1 – and right: product managers are always right[2].

2 – I know: I used to be a product manager.

3 – and the main subject of this blog, so it shouldn’t be a surprise that I’m writing about it.

4 – or personas if you really, really must. I got an “A” in Latin O level, and I’m not letting this one go.

In praise of triage

It’s all too easy to prioritise based on the “golfing test”.

Not all bugs are created equal.

Some bugs need fixing now, some bugs can wait. Some bugs are in your implementation, some are in the underlying design. Some bugs will annoy a few customers, some will destroy your business.

Bugs come in all shapes and sizes, and one of the tasks of a product owner, product manager, chief architect – whoever makes the call about where to assign resources – is to decide which ones to address in which order: to prioritise them. The problem is deciding how to prioritise them. It’s all too easy to prioritise based on the “golfing test”: your CEO meets someone on golf course who mentions that his or her company loves your product, except for one tiny issue. The CEO comes back, and makes it clear that fixing this “major bug” is now your one and only task until it’s done, and your world is turned upside down. You have to fix the bug as quickly as possible, with no thought to the impact it has on the rest of the project, or the immense pile technical debt that’s just been accrued. You don’t want to live in this world. What, then, is the alternative?

The answer – though it’s only the beginning of the answer – is triage. Triage (from the French for “separating out”) comes from the world of battlefield medicine. When deciding which wounded soldiers to treat, rapid (hopefully objective) assessments are carried out, allowing a quick sorting of each soldier, typically into categories such as “not urgent: wait”, “urgent: treat immediately” and “not saveable: do not treat”. We can apply the same to software bugs in order to decide what to treat (fix) and with what priority. The important thing is not so much the categories – which will vary based on your context – but the assessment criteria, and how they are applied. Here are a list of just some of the possible criteria:

  • likely monetary impact per customer
  • number of customers impacted
  • reputational impact on your organisation
  • ease to fix
  • impact on system security
  • impact on system performance
  • impact on system stability
  • annoyance of CEO not to be listened to.

We do not, of course, only need to apply one of these: a number of them can be combined with a weighting system, though the more you add, the less clear your priorities will be, and the more likely it is that someone will “put a finger on the scales” – tweak the numbers to give the outcome they want. Another important point about the categories that you decide to apply is that they should be as measurable as you can make them, to allow as objective scoring as possible. I wrote a review of the book Building Evolutionary Architectures a while ago: the methodology adopted there, where you measure and test in order to meet specific criteria, is exactly the sort of approach you should be choosing when designing your triage system.

This is (ostensibly) a blog about security, and so you might expect me to say that “security always wins”, but that should absolutely not be the approach you take. Security might be the most important category for you (that is, carry the most weight), but you need to understand why that is the case – at this particular time – and what exactly you mean by “security”. The “security of the system” is not an objective measure: in order to mean anything, such a phrase needs to reference measurements that can be made (“resistance to physical tampering”, “resistance to brute force attacks”, “number or PhD students likely to be needed to reverse engineer our ‘secure’ protocol”[1]). More importantly, it may be that at this point in your organisation’s life, the damage done by lack of stability or decreased performance outweighs the impact of a security bug. If that’s the case, then your measurements should encapsulate that information and lead you to prioritise bugs with impact in these categories over security issues[3].

There’s one proviso that I feel I need to put in at this point, and it’s about the power of what, in Agile Methodology terms, is called the Product Owner. This is the person who represents the users of the product/project, and should have final say about the direction of development in terms of features, functionality and, most relevant in this discussion, bug-fixing. As noted above, this may be an architect, product manager or someone enjoying another title, but their role should be clear: they get to call the shots. There are times when this person goes against the evidence provided by the triage, and makes a decision to prioritise a particular bug over others despite the outcome of the measurements. This is typically very painful for the technical team[4], but, when it comes down to it, as the product owner, they get to decide. The technical team – after appropriate warnings and discussion[5] – must be ready to step aside and accept the decision. Such decisions (and related discussions) should be recorded, and the product owner must be ready to stand or fall based on the outcome, but that is their job. Triage is a guide, and there are occasions when there are measurements which cannot be easily made objectively, and which sit outside the expertise or scope of knowledge of the technical team. If this sort of decision keeps being made, and you think you know better, you may have a future in technical product management, where people with a view of both the technical and the business side of technology are much in demand. In the end, though, the product owner will need to justify their decision to management, and if they get it wrong, then they must be ready to take the blame (this is one reason why you should make sure that you’ve recorded the process taken to get to this decision – you don’t want to take the blame for a poor decision which you advised against).

So: go out an design a triage process, be ready to follow it, and be ready to defend it. Oh, and one last point: you might want to buy a set of golf clubs.

—–

1 – this last one is a joke: don’t design your own protocol, or if you do, make it open and have it peer-reviewed[2].

2 – and then throw it away and use an open source implementation of better, more thoroughly-reviewed one.

3 – much as it pains me to say it.

4 – I’ve been on both sides of these decisions: I know.

5 -often rather heated, in my experience.

Are you positive?

What do pregnancy tests and the Ukrainian aircraft missile strike have in common?

Not everything in life is nicely binary, much as we[1] might like it to be. There are shades of grey[2] in many aspects of life, and though humans can often cope with uncertainty, computer systems are less good at it: they generally want a “yes” or “no” answer. This means that decisions sometimes need to be made on incomplete evidence, and, well, that means that the answers aren’t always correct. There’s a whole area of computer science related to this: fuzzy logic.

Let’s look into what the options are. Assuming that we’re looking two options: “yes” (a positive) and “no” (a negative). That means that there are two ways in which the answer can be incorrect:

  1. a “yes” answer was incorrectly chosen (false positive);
  2. a “no” answer was incorrectly chosen (false negative).

An example to allow us to explore this is pregnancy. It’s generally agreed that you can’t be a little bit pregnant: if you take a test, any result it gives you needs to be either positive or negative. If you are pregnant, and a test result comes back negative, then that’s a false negative. If you are not pregnant, and a test comes back positive, that’s a false positive. The implications of a false positive or a false negative can both be pretty major – as anybody who has received one will tell you. I spent a little time online trying to find expected false positive and false negatives for pregnancy tests, but it turns out that the rates are so dependent on a variety of factors that it was difficult to find a sensible answer[3].

A tragic recent example of a false positive took place on Wednesday, 8th January 2020, when a Ukrainian International Airlines flight was shot down by an Iranian missile, killing all 176 people on board. It appears that an air defence radar system misidentified the aircraft as a cruise missile. As the radar system was looking for a positive identification of a threat, this can be counted as a false positive.

What might have been the alternative in this case? If the aircraft actually had been a cruise missile, but was identified as a civilian aircraft, this would have been a false negative, and the impact might well have been significant damage to an Iranian military installation.

Which is the most damaging? Well, in the case of the aircraft, it would seem pretty clear to most observers that the false positive would be worse, but from a military point of view, that might not be the case. Maybe the impact of a missile strike on a major military installation might be considered worse than the civilian loss of life in the other case. In this case, as in many others, a decision needs to be made as to which is most important to reduce: the chance of a false negative or the chance of a false positive? In a perfect world, of course, there would be no false results, negative or positive. The problem with many systems that take analogue[4] inputs and turn them into digital outputs in this way is that avoiding false results is very costly, and sometimes impossible. Even worse news is that reducing probability of one of the two types of false result tends to increase the probability of the other.

A classic example of this is in the use of biometrics for user identification. Fingerprints, facial recognition, iris scanning and similar techniques have to balance the likelihood of a false positive with a false negative. Which is worse: the chance that the CEO will not be able to update the payroll details, or that a rogue employee will update her details to improve her salary package?[5]

One good piece of news is that AI/ML (Artificial Intelligence/Machine Learning) is improving the performance of biometric systems and, in fact, other areas of computing where “fuzzy logic” is required. In most cases, humans are still better at reducing messy sets of information to yes/no results, but that is changing, and where multiple automated decisions need to be made, then AI/ML is worth considering.

Whenever you are dealing with “messy” data[6] which needs to be reduced to a “yes/no” or “positive/negative” binary result, you need to consider the likelihood of false positives or negatives. Not only do you need to consider the likelihood of each, but also the impact of each. Once you have understood these, you can then decide which you want to try to minimise, and what techniques you should use to do so.

We may be stuck with false results, but we need to understand what our choices are, and how we can get the best outcomes available from messy data.


1 – in talking security, but I’m sure this goes for lots of other people, too.

2. “gray” for our non-Commonwealth readers.

3. good advice seems to be to test several times over several days.

4. “analog”, I suppose – see [2].

5. this is one of the reasons that authentication systems generally use two factors from the three “something you are”, “something you know”, “something you have”.

6. most real-world data, to be honest.

3 tips to avoid security distracti… Look: squirrel!

Executive fashions change – and not just whether shoulder-pads are “in”.

There are many security issues to worry about as an organisation or business[1].  Let’s list some of them:

  • insider threats
  • employee incompetence
  • hacktivists
  • unpatched systems
  • patched systems that you didn’t test properly
  • zero-day attacks
  • state actor attacks
  • code quality
  • test quality
  • operations quality
  • underlogging
  • overlogging
  • employee-owned devices
  • malware
  • advanced persistent threats
  • data leakage
  • official wifi points
  • unofficial wifi points
  • approved external access to internal systems via VPN
  • unapproved external access to internal systems via VPN
  • unapproved external access to internal systems via backdoors
  • junior employees not following IT-mandated rules
  • executives not following IT-mandated rules

I could go on: it’s very, very easy to find lots of things that should concern us.  And it’s particularly confusing if you just go around finding lots of unconnected things which are entirely unrelated to each other and aren’t even of the same type[2]. I mean: why list “code quality” in the same list as “executives not following IT-mandated rules”?  How are you supposed to address issues which are so diverse?

And here, of course, is the problem: this is what organisations and businesses do have to address.  All of these issues may present real risks to the functioning (or at least continued profitability) of the organisations[3].  What are you supposed to do?  How are you supposed to keep track of all these things?

The first answer that I want to give is “don’t get distracted”, but that’s actually the final piece of advice, because it doesn’t really work unless you’ve already done some work up front.  So what are my actual answers?

1 – Perform risk analysis

You’re never going to be able to give your entire attention to everything, all the time: that’s not how life works.  Nor are you likely to have sufficient resources to be happy that everything has been made as secure as you would like[4].  So where do you focus your attention and apply those precious, scarce resources?  The answer is that you need to consider what poses the most risk to your organisation.  The classic way to do this is to use the following formula:

Risk = Likelihood x Impact

This looks really simple, but sadly it’s not, and there are entire books and companies dedicated to the topic.  Impact may be to reputation, to physical infrastructure, system up-time, employee morale, or one of hundreds of other items.  The difficulty of assessing the likelihood may range from simple (“the failure rate on this component is once every 20 years, and we have 500 of them”[5]) to extremely difficult (“what’s the likelihood of our CFO clicking on a phishing attack?”[6]).  Once it’s complete, however, for all the various parts of the business you can think of – and get other people from different departments in to help, as they’ll think of different risks, I can 100% guarantee – then you have an idea of what needs the most attention.  (For now: because you need to repeat this exercise of a regular basis, considering changes to risk, your business and the threats themselves.)

2 – Identify and apply measures

You have a list of risks.  What to do?  Well, a group of people – and this is important, as one person won’t have a good enough view of everything – needs to sit[7] down and work out what measures to put in place to try to reduce or at least mitigate the various risks.  The amount of resources that the organisation should be willing to apply to this will vary from risk to risk, and should generally be proportional to the risk being addressed, but won’t always be of the same kind.  This is another reason why having different people involved is important.  For example, one risk that you might be able to mitigate by spending a £50,000 (that’s about the same amount of US dollars) on a software solution might be equally well addressed by a physical barrier and a sign for a few hundred pounds.  On the other hand, the business may decide that some risks should not be mitigated against directly, but rather insured against.  Other may require training regimes and investment in t-shirts.

Once you’ve identified what measures are appropriate, and how much they are going to cost, somebody’s going to need to find money to apply them.  Again, it may be that they are not all mitigated: it may just be too expensive.  But the person who makes that decision should be someone senior – someone senior enough to take the flak should the risk come home to roost.

Then you apply your measures, and, wherever possible, you automate them and their reporting.  If something is triggered, or logged, you then know:

  1. that you need to pay attention, and maybe apply some more measures;
  2. that the measure was at least partially effective;
  3. that you should report to the business how good a job you – and all those involved – have done.

3 – Don’t get distracted

My final point is where I’ve been trying to go with this article all along: don’t get distracted.  Distractions come in many flavours, but here are three of the most dangerous.

  1. A measure was triggered, and you start paying all of your attention to that measure, or the system(s) that it’s defending.  If you do this, you will miss all of the other attacks that are going on.  In fact, here’s your opportunity to look more broadly and work out whether there are risks that you’d not considered, and attacks that are coming in right now, masked by the one you have noticed.
  2. You assume that the most expensive measures are the ones that require the most attention, and ignore the others.  Remember: the amount of resources you should be ready to apply to each risk should be proportional to the risk, but the amount actually applied may not be.  Check that the barrier you installed still works and that the sign is still legible – and if not, then consider whether you need to spend that £50,000 on software after all.  Also remember that just because a risk is small, that doesn’t mean that it’s zero, or that the impact won’t be high if it does happen.
  3. Executive fashions change – and not just whether shoulder-pads are “in”, or the key to the boardroom bathroom is now electronic, but a realisation that executives (like everybody else) are bombarded with information.  The latest concern that your C-levels read about in the business section, or hears about from their buddies on the golf course[9] may require consideration, but you need to ensure that it’s considered in exactly the same way as all of the other risks that you addressed in the first step.  You need to be firm about this – both with the executive(s), but also yourself, because although I identified this as an executive risk, the same goes for the rest of us. Humans are generally better at keeping their focus on the new, shiny thing in front of them, rather than the familiar and the mundane.

Conclusion

You can’t know everything, and you probably won’t be able to cover everything, either, but having a good understanding of risk – and maintaining your focus in the event of distractions – means that at least you’ll be covering and managing what you can know, and can be ready to address new ones as they arrive.


1 – let’s be honest: there are lots if you’re a private individual, too, but that’s for another day.

2- I did this on purpose, annoying as it may be to some readers. Stick with it.

3 – not to mention the continued employment of those tasked with stopping these issues.

4 – note that I didn’t write “everything has been made secure”: there is no “secure”.

5 – and, to be picky, this isn’t as simple as it looks either: does that likelihood increase or decrease over time, for instance?

6 – did I say “extremely difficult”?  I meant to say…

7 – you can try standing, but you’re going to get tired: this is not a short process.

8 – now that, ladles and gentlespoons, is a nicely mixed metaphor, though I did stick with an aerial theme.

9 – this is a gross generalisation, I know: not all executives play golf.  Some of them play squash/racketball instead.

Don’t talk security: talk risk

We rush to implement the latest, greatest AI-enhanced, post-quantum container-based blockchain security solution.

We don’t do security because it’s fun. No: let me qualify that. Most of us don’t do security because it’s fun, but none of us get paid to do security because it’s fun[1]. Security isn’t a thing in itself, it’s a means to an end, and that end is to reduce risk.  This was a notable change in theme in and around the RSA Conference last week.  I’d love to say that it was reflected in the Expo, but although it got some lip service, selling point solutions still seemed to be the approach for most vendors.  We’re way overdue some industry consolidation, given the number of vendors advertising solutions which, to me, seemed almost indistinguishable.

In some of the sessions, however, and certainly in many of the conversations that I had in the “hallway track” or the more focused birds-of-a-feather type after show meetings, risk is beginning to feature large.  I ended up spending quite a lot of time with CISO folks and similar – CSO (Chief Security Officer) and CPSO (Chief Product Security Officer) were two other of the favoured titles – and risk is top of mind as we see the security landscape develop.  The reason this has happened, of course, is that we didn’t win.

What didn’t we win?  Well, any of it, really.  It’s become clear that the “it’s not if, it’s when” approach to security breaches is correct.  Given some of the huge, and long-term, breaches across some huge organisations from British Airways to the Marriott group to Citrix, and the continued experience of the industry after Sony and Equifax, nobody is confident that they can plug all of the breaches, and everybody is aware that it just takes one breach, in a part of the attack surface that you weren’t even thinking about, for you to be exposed, and to be exposed big time.

There are a variety of ways to try to manage this problem, all of which I heard expressed at the conference.  They include:

  • cultural approaches (making security everybody’s responsibility/problem, training more staff in different ways, more or less often);
  • process approaches (“shifting left” so that security is visible earlier in your projects);
  • technical approaches (too many to list, let alone understand or implement fully, and ranging from hardware to firmware to software, using Machine Learning, not using Machine Learning, relying on hardware, not relying on hardware, and pretty much everything in between);
  • design approaches (using serverless, selecting security-friendly languages, using smart contracts, not using smart contracts);
  • cryptographic approaches (trusting existing, tested, peer-reviewed primitives, combining established but underused techniques such as threshold signatures, embracing quantum-resistant algorithms, ensuring that you use “quantum-generated” entropy);
  • architectural approaches (placing all of your sensitive data in the cloud, placing none of your sensitive data in the cloud).

In the end, none of these is going to work.  Not singly, not in concert.  We must use as many of them as make sense in our environment, and ensure that we’re espousing a “defence in depth” philosophy such that no vulnerability will lay our entire estate or stack open if it is compromised.  But it’s not going to be enough.

Businesses and organisations exist to run, not to be weighed down by the encumbrance of security measure after security measure.  Hence the “as make sense in our environment” above, because there will always come a point where the balance of security measures outweighs the ability of the business to function effectively.

And that’s fine, actually.  Security people have always managed risk.  We may have forgotten this, as we rush to implement the latest, greatest AI-enhanced, post-quantum container-based blockchain security solution[2], but we’re always making a balance.  Too often that balance is “if we lose data, I’ll get fired”, though, rather than a different conversation entirely.

The people who pay our salaries are not our customers, despite what your manager and SVP of Sales may tell you.  They are the members of the Board.  Whether the relevant person on the Board is the CFO, the CISO, the CSO, the CTO or the CRO[3], they need to be able to talk to their colleagues about risk, because that’s the language that the rest of them will understand.  In fact, it’s what they talk about every day.  Whether it’s fraud risk, currency exchange risk, economic risk, terrorist risk, hostile take-over risk, reputational risk, competitive risk or one of the dozens of other types, risk is what they want to hear about.  And not security.  Security should be a way to measure, monitor and mitigate risk.  They know by now – and if they don’t, it’s the C[F|IS|S|T|R]O’s job to explain to them – that there’s always a likelihood that the security of your core product/network/sales system/whatever won’t be sufficient.  What they need to know is what risks that exposes.  Is it risk that:

  • the organisation’s intellectual property will be stolen;
  • customers’ private information will be exposed to the Internet;
  • merger and acquisition information will go to competitors;
  • payroll information will be leaked to the press – and employees;
  • sales won’t be able to take any orders for a week;
  • employees won’t be paid for a month;
  • or something completely different?

The answer (or, more likely, answers) will depend on the organisation and sector, but the risks will be there.  And the Board will be happy to hear about them.  Well, maybe that’s an overstatement, but they’ll be happier hearing about them in advance than after an attack has happened.  Because if they hear about them in advance, they can plan mitigations, whether that’s insurance, changes in systems, increased security or something else.

So we, as a security profession, need to get better a presenting the risk, and also at presenting options to the Board, so that they can make informed decisions.  We don’t always have all the information, and neither will anybody else, but the more understanding there is of what we do, and why we do it, the more we will be valued.  And there’s little risk in that.


1 – if I’m wrong about this, and you do get paid to do security because it’s fun, please contact me privately. I interested, but don’t think we should share the secret too widely.

2 – if this buzzphrase-compliant clickbait doesn’t get me page views, I don’t know what will.

3 – Chief [Financial|Information Security|Security|Technology|Risk] Officer.

Why security policies are worthless

A policy, to have any utility at all, needs to exist in a larger context.

“We need a policy on that.” This is a phrase that seems to act as a universal panacea to too many managers. When a problem is identified, and the blame game has been exhausted, the way to sort things out, they believe, is to create a policy. If you’ve got a policy, then everything will be fine, because everything will be clear, and everyone will obey the policy, and nothing can go wrong.

Right[1].

The problem is that policy, on its own, is worthless.

A policy, to have any utility at all, needs to exist in a larger context, or, to think of it in a different way, to sit in a chain of artefacts.  It is its place in this chain that actually gives it meaning.  Let me explain.

When that manager said that they wanted a policy, what did that actually mean?  That rather depends on how wise the manager is[2].  Hopefully, the reason that the manager identified the need for a policy was because:

  1. they noticed that something had gone wrong that shouldn’t have done and;
  2. they wanted to have a way to make sure it didn’t happen again.

The problem with policy on its own is that it doesn’t actually help with either of those points.  What use does it have, then?

Governance

Let’s look at those pieces separately.  When we say that “something had gone wrong that shouldn’t have done“, what we’re saying is that there is some sort of model for what the world should look like, giving us often general advice on our preferred state.  Sometimes this is a legal framework, sometimes it’s a regulatory framework, and sometimes it’s a looser governance model.  The sort of thing I’d expect to see at this level would be statements like:

  • patient data must be secured against theft;
  • details of unannounced mergers must not be made available to employees who are not directors of the company;
  • only authorised persons may access the military base[4].

These are high level requirements, and are statements of intent.  They don’t tell you what to do in order to make these things happen (or not happen), they just tell you that you have to do them (or not do them).  I’m going to call collections of these types of requirements “governance models”.

Processes

At the other end of the spectrum, you’ve got the actual processes (in the broader sense of the term) required to make the general intent happen.  For the examples above, these might include:

  • AES-256 encryption using OpenSSL version 1.1.1 FIPS[5], with key patient-sym-current for all data on database patients-20162019;
  • documents Indigo-1, Indigo-3, Indigo-4 to be collected after meeting and locked in cabinet Socrates by CEO or CFO;
  • guards on duty at post Alpha must report any expired passes to the base commander and refuse entry to all those producing them.

These are concrete processes that must be followed, which will hopefully allow the statements of intent that we noted above to be carried out.

Policies and audit

The problem with both the governance statements and the processes identified is that they’re both insufficient.  The governance statements don’t tell you how to do what needs to be done, and the processes are context-less, which means that you can’t tell what they relate to, or how they’re helping.  It’s also difficult to know what to do if they fail: what happens, for example, if the base commander turns up with an expired pass?

Policies are what sit in the middle.  They provide guidance as to how to implement the governance model, and provide context for the processes.  They should give you enough detail to cover all eventualities, even if that’s to say “consult the Legal Department when unsure[6]”.  What’s more, they should be auditable.  If you can’t audit your security measures, then how can you be sure that your governance model is being followed?  But governance models, as we’ve already discovered, are at the level of intent – not the sort of thing that can be audited.

Policies, then, should provide enough detail that they can be auditable, but they should also preferably be separated enough from implementation that rules can be written that are applicable in different circumstances.  Here are a few policies that we might create to follow the first governance model examples above:

  • patient data must be secured against theft;
    • Policy 1: all data at rest must be encrypted by a symmetric key of at least 256 bits or equivalent;
    • Policy 2: all storage keys must be rotated at least once a month;
    • Policy 3: in the event of a suspected key compromise, all keys at the same level in that department must be rotated.

You can argue that these policies are too detailed, or you can argue that they’re not detailed enough: both arguments are fair, and level of detail (or granularity) should depend on the context and the use to which they are being put.  However, though I’m sure that all of the example policies I’ve given could be improved, I hope that they are all:

  • auditable;
  • able to be implemented by one or more well-defined processes;
  • understandable by both those who concerned with the governance level and those involved at the process implementation and operations level.

The value of auditing

I’ve written about auditing before (Confessions of an auditor).  I think it’s really important.  Done well, it should allow you to discover whether:

  1. your processes are covering all of the eventualities of your policies;
  2. whether your policies are actually being implemented correctly.

Auditing may also address whether your policies fully meet your governance model.  Auditing well is a skill, but in order to help your auditor – whether they are good at it or bad at it – having a clearly defined set of policies is a must.  But, as I pointed out at the beginning of this article, policies for policies’ sake are worthless: put them together with governance and processes, however, and they provide technical and business value.


1 – this is sarcasm.

2 – yes, I know.  But let’s not be rude if we can avoid it.  We want to help managers, because the more clue we deliver to them, the easier our lives will be[3].

3 – and you never know: one day, even you might be a manager.

4 – which is probably not a US military base, given the spelling of “authorised”.

5 – example only…

6 – this, in my experience, is the correct answer to many questions.

7 steps to security policy greatness

… we’ve got a good chance of doing the right thing, and keeping the auditors happy.

Security policies are something that everybody knows they should have, but which are deceptively simple to get wrong.  A set of simple steps to check when you’re implementing them is something that I think it’s important to share.  You know when you come up a set of steps, you suddenly realise that you’ve got a couple of vowels in there and you think, “wow, I can make an acronym!”?[1] This was one of those times.  The problem was that when I looked at the different steps, I decided that DDEAVMM doesn’t actually have much of a ring to it.  I’ve clearly still got work to do before I can name major public sector projects, for instance[2].  However, I still think it’s worth sharing, so let’s go through them in order.  Order, as for many sets of steps, is important here.

I’m going to give an example and walk through the steps for clarity.  Let’s say that our CISO, in his or her infinite wisdom, has decided that they don’t want anybody outside our network to be able to access our corporate website via port 80.  This is the policy that we need to implement.

1. Define

The first thing I need is a useful definition.  We nearly have that from the request[3] noted above, when our CISO said “I don’t want anybody outside our network to be able to access our corporate website via port 80”.  So, let’s make that into a slightly more useful definition.

“Access to IP address mycorporate.network.com on port 80 must be blocked to all hosts outside the 10.0.x.x to 10.3.x.x network range.”

I’m assuming that we already know that our main internal network is within 10.0.x.x to 10.3.x.x.  It’s not exactly a machine readable definition, but actually, that’s the point: we’re looking for a definition which is clear and human understandable.  Let’s assume that we’re happy with this, at least for now.

2. Design

Next, we need to design a way to implement this.  There are lots of ways of doing this – from iptables[4] to a full-blown, commercially supported firewall[6] – and I’m not fluent in any of them these days, so I’m going to assume that somebody who is better equipped than I has created a rule or set of rules to implement the policy defined in step 1.

We also need to define some tests – we’ll come back to these in step 5.

3. Evaluate

But we need to check.  What if they’ve mis-implemented it?    What if they misunderstood the design requirement?  It’s good practice – in fact, it’s vital – to do some evaluation of the design to ensure it’s correct.  For this rather simple example, it should be pretty to check by eye, but we might want to set up a test environment to evaluate that it meets the policy definition or take other steps to evaluate its correctness.  And don’t forget: we’re not checking that the design does what the person/team writing it thinks it should do: we’re checking that it meets the definition.  It’s quite possible that at this point we’ll realise that the definition was incorrect.  Maybe there’s another subnet – 10.5.x.x, for instance – that the security policy designer didn’t know about.  Or maybe the initial request wasn’t sufficiently robust, and our CISO actually wanted to block all access on any port other than 443 (for HTTPS), which should be allowed.  Now is a very good time to find that out.

4. Apply

We’ve ascertained that the design does what it should do – although we may have iterated a couple of times on exactly “what it should do” means – so now we can implement it.  Whether that’s ssh-ing into a system, uploading a script, using some sort of GUI or physically installing a new box, it’s done.

Excellent: we’re finished, right?

No, we’re not.  The problem is that often, that’s exactly what people think.  Let’s move to our next step: arguably the most important, and the most often forgotten or ignored.

5. Validate

I really care about this one: it’s arguably the point of this post.  Once you’ve implemented a security policy, you need to validate that it actually does what it’s supposed to do.  I’ve written before about this, in my post If it isn’t tested, it doesn’t work, and it’s absolutely true of security policy implementations.  You need to check all of the different parts of the design.  This, you might hope, would be really easy, but even in the simple case that we’re working with, there are lots of tests you should be doing.  Let’s say that we took the two changes mentioned in step 3.  Here are some tests that I would want to be doing, with the expected result:

  • FAIL: access port 80 from an external IP address
  • FAIL: access port 8080 from an external IP address
  • FAIL: access port 80 from a 10.4.x.x address
  • PASS: access port 443 from an external IP address
  • UNDEFINED: access port 80 from a 10.5.x.x
  • UNDEFINED: access port 80 from a 10.0.x.x address
  • UNDEFINED: access port 443 from a 10.0.x.x address
  • UNDEFINED: access port 80 from an external IP address but with a VPN connection into 10.0.x.x

Of course, we’d want to be performing these tests on a number of other ports, and from a variety of IP addresses, too.

What’s really interesting about the list is the number of expected results that are “UNDEFINED”.  Unless we have a specific requirement, we just can’t be sure what’s expected.  We can guess, but we can’t be sure.  Maybe we don’t care?  I particularly like the last one, because the result we get may lead us much deeper into our IT deployment than we might expect.

The point, however, is that we need to check that the actual results meet our expectations, and maybe even define some new requirements if we want to remove some of the “UNDEFINED”.  We may be fine to leave some the expected results as “UNDEFINED”, particularly if they’re out of scope for our work or our role.  Obviously, if the results don’t meet our expectations, then we also need to make some changes and apply them and then re-validate.  We also need to record the final results.

When we’ve got more complex security policy – multiple authentication checks, or complex SDN[7] routing rules – then our tests are likely to be much, much more complex.

6. Monitor

We’re still not done.  Remember those results we got in our previous tests?  Well, we need to monitor our system and see if there’s any change.  We should do this on a fairly frequent basis.  I’m not going to say “regular”, because regular practices can lead to sloppiness, and also leave windows of opportunities open to attackers.  We also need to perform checks whenever we make a change to a connected system.  Oh, if I had a dollar for every time I’ve heard “oh, this won’t affect system X at all.”…[8]

One interesting point is that we should also note when results whose expected value remains “UNDEFINED” change.  This may be a sign that something in our system has changed, it may be a sign that a legitimate change has been made in a connected system, or it may be a sign of some sort of attack.  It may not be quite as important as a change in one of our expected “PASS” or “FAIL” results, but it certainly merits further investigation.

7. Mitigate

Things will go wrong.  Some of them will be our fault, some of them will be our colleagues’ fault[10], some of them will be accidental, or due to hardware failure, and some will be due to attacks.  In all of these cases, we need to act to mitigate the failure.  We are in charge of this policy, so even if the failure is out of our control, we want to make sure that mitigating mechanism are within our control.  And once we’ve completed the mitigation, we’re going to have to go back at least to step 2 and redesign our implementation.  We might even need to go back to step 1 and redefine what our definition should be.

Final steps

There are many other points to consider, and one of the most important is the question of responsibility, touched on in step 7 (and which is particularly important during holiday seasons), special circumstances and decommissioning, but if we can keep these steps in mind when we’re implementing – and running – security policies, we’ve got a good chance of doing the right thing, and keeping the auditors happy, which is always worthwhile.


1 – I’m really sure it’s not just me.  Ask your colleagues.

2 – although, back when Father Ted was on TV, a colleague of mine and I came up with a database result which we named a “Fully Evaluated Query”.  It made us happy, at least.

3 – if it’s from the CISO, and he/she is my boss, then it’s not a request, it’s an order, but you get the point.

4 – which would be my first port[5] of call, but might not be the appropriate approach in this context.

5 – sorry, that was unintentional.

6 – just because it’s commercially support doesn’t mean it has to be proprietary: open source is your friend, boys and girls, ladies and gentlemen.

7 – Software-Defined Networking.

8 – I’m going to leave you hanging, other than to say that, at current exchange rates, and assuming it was US dollars I was collecting, then I’d have almost exactly 3/4 of that amount in British pounds[9].

9 – other currencies are available, but please note that I’m not currently accepting bitcoin.

10 – one of the best types.

Confessions of an auditor

Moving to a postive view of security auditing.

So, right up front, I need to admit that I’ve been an auditor.  A security auditor.  Professionally.  It’s time to step up and be proud, not ashamed.  I am, however, dialled into a two-day workshop on compliance today and tomorrow, so this is front and centre of my thinking right at the moment.  Hence this post.

Now, I know that not everybody has an entirely … positive view of auditors.  And particularly security auditors[1].  They are, for many people, a necessary evil[2].  The intention of this post is to try to convince you that auditing[3] is, or can and should be, a Force for Good[tm].

The first thing to address is that most people have auditors in because they’re told that they have to, and not because they want to.  This is often – but not always – associated with regulatory requirements.  If you’re in the telco space, the government space or the financial services space, for instance, there are typically very clear requirements for you to adhere to particular governance frameworks.  In order to be compliant with the regulations, you’re going to need to show that you meet the frameworks, and in order to do that, you’re going to need to be audited.  For pretty much any sensible auditing regime, that’s going to require an external auditor who’s not employed by or otherwise accountable to the organisation being audited.

The other reason that audit may happen is when a C-level person (e.g. the CISO) decides that they don’t have a good enough idea of exactly what the security posture of their organisation – or specific parts of it – is like, so they put in place an auditing regime.  If you’re lucky, they choose an industry-standard regime, and/or one which is actually relevant to your organisation and what it does.  If you’re unlucky, … well, let’s not go there.

I think that both of the reasons above – for compliance and to better understand a security posture – are fairly good ones.  But they’re just the first order reasons, and the second order reasons – or the outcomes – are where it gets interesting.

Sadly, one of key outcomes of auditing seems to be blame.  It’s always nice to have somebody else to blame when things go wrong, and if you can point to an audited system which has been compromised, or a system which has failed audit, then you can start pointing fingers.  I’d like to move away from this.

Some positive suggestions

What I’d like to see would be a change in attitude towards auditing.  I’d like more people and organisations to see security auditing as a net benefit to their organisations, their people, their systems and their processes[4].  This requires some changes to thinking – changes which many organisations have, of course, already made, but which more could make.  These are the second order reasons that we should be considering.

  1. Stop tick-box[5] auditing.  Too many audits – particularly security audits – seem to be solely about ticking boxes.  “Does this product or system have this feature?”  Yes or no.  That may help you pass your audit, but it doesn’t give you a chance to go further, and think about really improving what’s going on.  In order to do this, you’re going to need to find auditors who actually understand the systems they’re looking at, and are trained to some something beyond tick-box auditing.  I was lucky enough to be encouraged to audit in this broader way, alongside a number of tick-boxes, and I know that the people I was auditing always preferred this approach, because they told me that they had to do the other type, too – and hated it.
  2. Employ internal auditors.  You may not be able to get approval for actual audit sign-off if you use internal auditors, so internal auditors may have to operate alongside – or before – your external auditors, but if you find and train people who can do this job internally, then you get to have people with deep knowledge (or deeper knowledge, at least, than the external auditors) of your systems, people and processes looking at what they all do, and how they can be improved.
  3. Look to be proactive, not just reactive.  Don’t just pick or develop products, applications and systems to meet audit, and don’t wait until the time of the audit to see whether they’re going to pass.  Auditing should be about measuring and improving your security posture, so think about posture, and how you can improve it instead.
  4. Use auditing for risk-based management.  Last, but not least – far from least – think about auditing as part of your risk-based management planning.  An audit shouldn’t be something you do, pass[6] and then put away in a drawer.  You should be pushing the results back into your governance and security policy model, monitoring, validation and enforcement mechanisms.  Auditing should be about building, rather than destroying – however often it feels like it.

 


1 – you may hear phrases like “a special circle of Hell is reserved for…”.

2 – in fact, many other people might say that they’re an unnecessary evil.

3 – if not auditors.

4 – I’ve been on holiday for a few days: I’ve maybe got a little over-optimistic while I’ve been away.

5 – British usage alert: for US readers, you’ll probably call this a “check-box”.

6 – You always pass every security audit first time, right?

What’s a State Actor, and should I care?

The bad thing about State Actors is that they rarely adhere to an ethical code.

How do you know what security measures to put in place for your organisation and the systems that you run?  As we’ve seen previously, There are no absolutes in security, so there’s no way that you’re ever going to make everything perfectly safe.  But how much effort should you put in?

There are a number of parameters to take into account.  One thing to remember is that there are always trade-offs with security.  My favourite one is a three-way: security, cost and usability.  You can, the saying goes, have two of the three, but not the third: choose security, cost or usability.  If you want security and usability, it’s going to cost you.  If you want a cheaper solution with usability, security will be reduced.  And if you want security cheaply, it’s not going to be easily usable.  Of course, cost stands in for time spent as well: you might be able to make things more secure and usable by spending more time on the solution, but that time is costly.

But how do you know what’s an appropriate level of security?  Well, you need to think about what you’re protecting, the impact of it being:

  • exposed (confidentiality);
  • changed (integrity);
  • inaccessible (availability).

These are three classic types of control, often recorded as C.I.A.[1].  Who would be impacted?  What mitigations might you put in place? What vulnerabilities might exist in the system, and how might they be exploited?

One of the big questions to ask alongside all of these is “who exactly might be wanting to attack my systems?”  There’s a classic adage within security that “no system is secure against a sufficiently motivated and resourced attacker”.  Luckily for us, there are actually very few attackers who fall into this category.  Some examples of attackers might be:

  • insiders[3]
  • script-kiddies
  • competitors
  • trouble-makers
  • hacktivists[4]
  • … and more.

Most of these will either not be that motivated, or not particularly well-resourced.  There are two types of attackers for whom that is not the case: academics and State Actors.  The good thing about academics is that they have adhere to an ethical code, and shouldn’t be trying anything against your systems without your permission.  The bad thing about State Actors is that they rarely adhere to an ethical code.

State Actors have the resources of a nation state behind them, and are therefore well-resourced.  They are also generally considered to be well-motivated – if only because they have many people available to perform attack, and those people are well-protected from legal process.

One thing that State Actors may not be, however, is government departments or parts of the military.  While some nations may choose to attack their competitors or enemies (or even, sometimes, partners) with “official” parts of the state apparatus, others may choose a “softer” approach, directing attacks in a more hands-off manner.  This help may take a variety of forms, from encouragement, logistical support, tools, money or even staff.  The intent, here, is to combine direction with plausible deniability: “it wasn’t us, it was this group of people/these individuals/this criminal gang working against you.  We’ll certainly stop them from performing any further attacks[5].”

Why should this matter to you, a private organisation or public company?  Why should a nation state have any interest in you if you’re not part of the military or government?

The answer is that there are many reasons why a State Actor may consider attacking you.  These may include:

  • providing a launch point for further attacks
  • to compromise electoral processes
  • to gain Intellectual Property information
  • to gain competitive information for local companies
  • to gain competitive information for government-level trade talks
  • to compromise national infrastructure, e.g.
    • financial
    • power
    • water
    • transport
    • telecoms
  • to compromise national supply chains
  • to gain customer information
  • as revenge for perceived slights against the nation by your company – or just by your government
  • and, I’m sure, many others.

All of these examples may be reasons to consider State Actors when you’re performing your attacker analysis, but I don’t want to alarm you.  Most organisations won’t be a target, and for those that are, there are few measures that are likely to protect you from a true State Actor beyond measures that you should be taking anyway: frequent patching, following industry practice on encryption, etc..  Equally important is monitoring for possible compromise, which, again, you should be doing anyway.  The good news is that if you might be on the list of possible State Actor targets, most countries provide good advice and support before and after the act for organisations which are based or operate within their jurisdiction.


1 – I’d like to think that they tried to find a set of initials for G.C.H.Q. or M.I.5., but I suspect that they didn’t[2].

2 – who’s “they” in this context?  Don’t ask.  Just don’t.

3 – always more common than we might think: malicious, bored, incompetent, bankrupt – the reasons for insider-related security issues are many and varied.

4 – one of those portmanteau words that I want to dislike, but find rather useful.

5 – yuh-huh, right.

Why I should have cared more about lifecycle

Every deployment is messy.

I’ve always been on the development and architecture side of the house, rather than on the operations side. In the old days, this distinction was a useful and acceptable one, and wasn’t too difficult to maintain. From time to time, I’d get involved with discussions with people who were actually running the software that I had written, but on the whole, they were a fairly remote bunch.

This changed as I got into more senior architectural roles, and particularly as I moved through some pre-sales roles which involved more conversations with users. These conversations started to throw up[1] an uncomfortable truth: not only were people running the software that I helped to design and write[3], but they didn’t just set it up the way we did in our clean test install rig, run it with well-behaved, well-structured data input by well-meaning, generally accurate users in a clean deployment environment, and then turn it off when they’re done with it.

This should all seem very obvious, and I had, of course, be on the receiving end of requests from support people who exposed that there were odd things that users did to my software, but that’s usually all it felt like: odd things.

The problem is that odd is normal.  There is no perfect deployment, no clean installation, no well-structured data, and certainly very few generally accurate users.  Every deployment is messy, and nobody just turns off the software when they’re done with it.  If it’s become useful, it will be upgraded, patched, left to run with no maintenance, ignored or a combination of all of those.  And at some point, it’s likely to become “legacy” software, and somebody’s going to need to work out how to transition to a new version or a completely different system.  This all has major implications for security.

I was involved in an effort a few years ago to describe the functionality, lifecycle for a proposed new project.  I was on the security team, which, for all the usual reasons[4] didn’t always interact very closely with some of the other groups.  When the group working on error and failure modes came up with their state machine model and presented it at a meeting, we all looked on with interest.  And then with horror.  All the modes were “natural” failures: not one reflected what might happen if somebody intentionally caused a failure.  “Ah,” they responded, when called on it by the first of the security to be able to form a coherent sentence, “those aren’t errors, those are attacks.”  “But,” one of us blurted out, “don’t you need to recover from them?”  “Well, yes,” they conceded, “but you can’t plan for that.  It’ll need to be on a case-by-case basis.”

This is thinking that we need to stamp out.  We need to design our systems so that, wherever possible, we consider not only what attacks might be brought to bear on them, but also how users – real users – can recover from them.

One way of doing this is to consider security as part of your resilience planning, and bake it into your thinking about lifecycle[5].  Failure happens for lots of reasons, and some of those will be because of bad people doing bad things.  It’s likely, however, that as you analyse the sorts of conditions that these attacks can lead to, a number of them will be similar to “natural” errors.  Maybe you could lose network connectivity to your database because of a loose cable, or maybe because somebody is performing a denial of service attack on it.  In both these cases, you may well start off with similar mitigations, though the steps to fix it are likely to be very different.  But considering all of these side by side means that you can help the people who are actually going to be operating those systems plan and be ready to manage their deployments.

So the lesson from today is the same as it so often is: make sure that your security folks are involved from the beginning of a project, in all parts of it.  And an extra one: if you’re a security person, try to think not just about the attackers, but also about all those poor people who will be operating your software.  They’ll thank you for it[6].


1 – not literally, thankfully[2].

2 – though there was that memorable trip to Singapore with food poisoning… I’ll stop there.

3 – a fact of which I actually was aware.

4 – some due entirely to our own navel-gazing, I’m pretty sure.

5 – exactly what we singularly failed to do in the project I’ve just described.

6 – though probably not in person.  Or with an actual gift.  But at least they’ll complain less, and that’s got to be worth something.