planning – Alice, Eve and Bob

10(+1) plans for 2022

I’m not a big fan of New Year’s resolutions, as I don’t like to set myself up to fail.

This week’s song: Bleed to Love Her by Fleetwood Mac.

I’m not a big fan of New Year’s resolutions, as I don’t like to set myself up to fail. Instead, here are a few things – professional and personal – that I hope or expect to be doing this year. Call them resolutions if you want, but words have power, and I’m avoiding the opportunity

Spend lots of time shepherding Enarx to greater maturity. At Profian, we see our future as closely ties to that of Enarx, and we’ll be growing the project’s capabilities and functionality significantly over this year. Keep an eye out for announcements!
Get fit(ter) again. Yeah, that.
Promote my book. I’m really proud of my book Trust in Computer Systems and the Cloud, which was published right at the end of the year. It aims to raise the standard of knowledge within the industry by proposing a framework for discussion, and I want to make that happen.
Start travelling again. I miss conferences, I miss seeing colleagues, I miss meeting new people. Hopefully it’s going to be easier and safer to travel this year.
Delegate better (and more). As the CEO of a startup, there’s lots I need to make happen. I’m not always the best person actually to be doing it all, and learning to help other people take some (more!) of it over is actually really important not just dot me, but for the business.
Drink lots of tea. No real change here.
Drjnk good whisky. In moderation.
Keep gaming. Possibly a weird one, but gaming is an important downtime activity for me, and helps me relax.
Make the most of music. I listen to lots of music whilst working, travelling, driving, relaxing, etc.. Watch out for a link to the playlist associated with my book – I also plan to list a song or track a week on my blog (see the top of this article for this week’s offering!).
Enjoy reading. One of the benefits of having completed the book is that I now have more time to read; more specifically, more time when I don’t feel guilty that I’m reading rather than doing book-work.
A bonus one: spend more time over at Opensource.com. I’m a Correspondent over there, and enjoy both writing for them and reading other people’s contributions. A great way to get into – or keep up-to-date with – the open source community.

So – not the most inspiring list, but if I can manage most of these this year, I’ll be happy.

Organisational suppleness

Growing the ability to react to the unexpected is a valuable skill.

“In preparing for battle I have always found that plans are useless but planning is indispensable.”
Dwight D. Eisenhower

Much of this blog is about security – cybersecurity – in one way or another, but on occasion I do try to take a broader view. Cybersecurity is often modelled or described in military terms, talking about “fighting battles”, “wars of attrition” and “arms races” with “attackers”. These can be useful metaphors (and it’s why I started this article with a quote from a general), but there is a broader set of responsibilities that many of us in the sector need to consider, which is the continued (and hopefully healthy) functioning of our businesses and organisations. In particular, I like to talk about risk and how it relates not just to security, but to how businesses work and plan. One theme that I’ve visited before is that known or planned degradation of a service is often significantly better than failure, or even planned closure (see Service degradation: actually a good thing). My argument is that there are many occasions where keeping a service or business function running, albeit at reduced capacity, or with reductions in known capabilities, allows for better continuity than just stopping it.

Keeping a service running requires work. You can’t just hope that everything is installed and will run as you expect: what happens when your administrator is ill, your fibre-optic cable gets severed by a back hoe, or a DDoS attack is directed at you? You need to plan and practice what to do in these situations. What I’d like to explore in this article goes somewhat beyond the expectation of that planning in three directions. Let’s call them scenario coverage, muscle memory and organisational suppleness.

Scenario coverage

The first, and most obvious of the three directions, is about understanding eventualities. The more scenarios that we model and practice, the more we reduce our risk, simply because we have reduced the number of unknown eventualities in the probability space. There is a actually a side benefit to modelling lost of scenarios, which is that the more situations you consider, the more will come to mind. Every situation involves sets of choices or probabilities – “after the door closes, will it lock or not?” or “if the coolant fails, will the system turn off or burst into flames?” – and the more scenarios you consider, the more questions will arise. This can be daunting – and it’s almost impossible to consider every eventuality – but the more options are covered, the better your opportunities to mitigate the various risks they present.

Muscle memory

Muscle memory is what comes with training and practice. Assuming that you are including your teams in the scenario planning

And I’m assuming here that the planning isn’t solely a paper exercise. Theoretical planning, while useful, only goes so far, for a couple of important reasons:

systems will always fails in unexpected ways
people will do unexpected things.

What the first of these means is that however much you assume that your back-up generator will kick in if there’s a power outage, until you test it, you can’t be sure that it will. The second of these relates to the fact that however much you tell people what to do, when it actually comes to the doing of it, they’re unlikely to as you expect. This is likely to be even worse if there’s been no training, and you’re just assuming that person X will know how to operate a fire extinguisher, or that team Y will, of course, exit the building in an orderly manner via exit Z (rather than find fourteen different exits, or not even leave the building at all).

For both of these reasons, getting people together to work through possible scenarios, and then, where possible, actually practising what to do, means that you have a higher assurance that when one of the situations you’ve considered does arrive, that they will know what to do, and act as you expect.

Organisational suppleness

While you cannot, as we’ve noted, plan for every eventuality or know exactly how an employee or team will react when things go wrong, there is another benefit to involving a broad group of people in your scenario planning and training. This is that their very involvement gives them practice in dealing with uncertainty, working out how they will react, and giving them experience in how those around them will act. While I may not know exactly what to do if the payroll system goes down an hour before it is due to run, if I have worked with colleagues on scenarios where the sales processing system fails, I’ve got a better chance of making some sensible choices about who to contact, initial steps to take and information to collect than if this is the first time I’ve ever seen anything like it. Likewise, we may not have modelled our response to a physical failure of one of our network links, but our shared experience of practising our response to a DDoS attack means that we have an idea of what to do.

And it is not just having an idea of what to do that is important, but also having gathered and practised the cognitive skills associated with investigating failures, collating data, sharing information and working with others to ameliorate the situation which allows a team or an organisation to respond better to new, maybe unexpected situations. We can think of this as suppleness, as it means that rather than just failing, or cracking, an organisation can react as a tree does to strong winds, or a gymnast does to a new exercise. Growing the ability to react to the unexpected is a valuable skill for an organisation, and knowing that it is supple allows its leaders to plan with more certainty and mitigate more risk.

Hanging up my Red Hat

It’s time to move on.

Friday (2021-06-11) was my last day at Red Hat. I’ve changed my LinkedIn, Facebook and Twitter profiles and updated the information on this blog, too. I’d been at Red Hat for just under 5 years, which is one of the longest stays I’ve had at any company in my career. When I started there, I realised that there was a huge amount about the company which really suited who I was, and my attitude to life, and, in particular, open source. That hasn’t changed, and although the company is very different to the one I joined in 2016 – it’s been acquired by IBM, got a new CEO and more than doubled in size – there’s still lots about it which feels very familiar and positive. Bright people, doing stuff they care about, and sharing what they’re doing with the rest of the company and the world: great values.

I’ve also made lots of friends, and got involved in lots of cool things and institutions. I’d particularly call out Opensource.com and the Confidential Computing Consortium. And, of course, the Enarx project.

But … it’s time to move on. I’ve developed an interest in something I care a whole lot about, and which, after lots of discussion and soul-searching, I’ve decided to move into that. I hope to be able to talk more about it in a few weeks, and until then, this blog may be a little quiet. In the meantime, have fun, keep safe and do all that good security stuff.

Security, cost and usability (pick 2)

If we cannot be explicit that there is a trade-off, it’s always security that loses.

Everybody wants security: why wouldn’t you? Let’s role-play: you’re a software engineer on a project to create a security product. There comes a time in the product life-cycle when it’s nearly due, and, as usual, time is tight. So you’re in the regular project meeting and the product manager’s there, so you ask them what they want you to do: should you prioritise security? The product manager is very clear[1]: they will tell you that they want the product as secure as possible – and they’re right, because that’s what customers want. I’ve never spoken to a customer (and I’ve spoken to lots of customers over the years) who said that they’d prefer a product which wasn’t as secure as possible. But there’s a problem, which is that all customers also want their products tomorrow – in fact, most customers want their products today, if not yesterday.

Luckily, products can generally be produced more quickly if more resources are applied (though Frederick Brooks’ The Mythical Man Month tells us that simple application of more engineers is actually likely to have a negative impact), so the requirement for speed of delivery can be translated to cost. There’s another thing that customers want, however, and that is for products to be easy to use: who wants to get a new product and then, when it arrives, for it to take months to integrate or for it to be almost impossible for their employees to run it as they expect?

So, to clarify, customers want a security product to be be the following:

secure – security is a strong requirement for many enterprises and organisations[3], and although we shouldn’t ever use the word secure on its own, that’s still what customers want;
cheap – nobody wants to pay more than the minimum they can;
usable – everybody likes simple-to-use, easy-to-integrate applications.

There’s a problem, however, which is that out of the three properties above, you can only choose two for any application or project. You say this to your product manager (who’s always right, remember[1]), and they’ll say: “don’t be ridiculous! I want all three”.

But it just doesn’t work like that: why? Here’s my take on the reasons. Security, simply stated, is designed to stop people doing things. Stated from the point of view of a user, security’s view is to reduce usability. “Doing security” is generally around applying controls to actions in a system – whether by users or non-human entities – and the simplest way to apply it is “blanket security” – defaulting to blocking or denying actions. This is sometimes known as fail to safe or fail to closed.

Let’s take an example: you have a simple internal network in your office and you wish to implement a firewall between your network and the Internet, to stop malicious actors from probing your internal machines and to compromised systems on the internal network from communicating out to the Internet. “Easy,” you think, and set up a DENY ALL rule for connections originating outside the firewall, and a DENY ALL rule for connections originating inside the firewall, with the addition of a ALLOW all outgoing port 443 connections to ensure that people can use web browsers to make HTTPS connections. You set up the firewall, and get ready to head home, knowing that your work is done. But then the problems arise:

it turns out that some users would like to be able to send email, which requires a different outgoing port number;
sending email often goes hand in hand with receiving email, so you need to allow incoming connections to your mail server;
one of your printers has been compromised, and is making connections over port 443 to an external botnet;
in order to administer the pay system, your accountant – who is not a full-time employee, and works from home, needs to access your network via a VPN, which requires the ability to accept an incoming connection.

Your “easy” just became more difficult – and it’s going to get more difficult still as more users start encountering what they will see as your attempts to make their day-to-day revenue-generating lives more difficult.

This is a very simple scenario, but it’s clear that in order to allow people actually to use a system, you need to spend a lot more time understanding how security will interact with it, and how people’s experience of the measures you put in place will be impacted. Usability and user experience (“UX”) is a complex field on its own, but when you combine it with the extra requirements around security, things become even more tricky.

You need both to manage the requirements of users to whom the security measures should be transparent (“TLS encryption should be on by default”) and those who may need much more control (“developers need to be able to select the TLS cipher suite options when connecting to a vendor’s database”), so you need to understand the different personae[4] you are targeting for your application. You also need to understand the different failures modes, and what the correct behaviour should be: if authentication fails three times in a row, should the medical professional who is trying to get a rush blood test result be locked out of the system, or should the result be provided, and a message sent to an administrator, for example? There will be more decisions to make, based on what your application does, the security policies of your customers, their risk profiles, and more. All of these investigations and decisions take time, and time equates to money. What is more, they also require expertise – both in terms of security but also usability – and that is in itself expensive.

So, you have three options:

choose usability and cost – you can prioritise usability and low cost, but you won’t be able to apply security as you might like;
choose security and cost – in this case, you can apply more security to the system, but you need to be aware that usability – and therefore your customer’s acceptance of the system – will suffer;
choose usability and security – I wish this was the one that we chose every time: you decide that you’re willing to wait longer or pay more for a more secure product, which people can use.

I’m not going to pretend that these are easy decisions, nor that they are always clear cut. And a product manager’s job is sometimes to make difficult choices – hopefully ones which can be re-balanced in a later release, but difficult choices nevertheless. It’s really important, however, that anyone involved in security – as an engineer, as a UX expert, as a product manager, as a customer – understands the trade-off here. If we cannot be explicit that there is a trade-off, then the trade-off will be made silently, and in my experience, it’s always security that loses.

1 – and right: product managers are always right[2].

2 – I know: I used to be a product manager.

3 – and the main subject of this blog, so it shouldn’t be a surprise that I’m writing about it.

4 – or personas if you really, really must. I got an “A” in Latin O level, and I’m not letting this one go.

In praise of triage

It’s all too easy to prioritise based on the “golfing test”.

Not all bugs are created equal.

Some bugs need fixing now, some bugs can wait. Some bugs are in your implementation, some are in the underlying design. Some bugs will annoy a few customers, some will destroy your business.

Bugs come in all shapes and sizes, and one of the tasks of a product owner, product manager, chief architect – whoever makes the call about where to assign resources – is to decide which ones to address in which order: to prioritise them. The problem is deciding how to prioritise them. It’s all too easy to prioritise based on the “golfing test”: your CEO meets someone on golf course who mentions that his or her company loves your product, except for one tiny issue. The CEO comes back, and makes it clear that fixing this “major bug” is now your one and only task until it’s done, and your world is turned upside down. You have to fix the bug as quickly as possible, with no thought to the impact it has on the rest of the project, or the immense pile technical debt that’s just been accrued. You don’t want to live in this world. What, then, is the alternative?

The answer – though it’s only the beginning of the answer – is triage. Triage (from the French for “separating out”) comes from the world of battlefield medicine. When deciding which wounded soldiers to treat, rapid (hopefully objective) assessments are carried out, allowing a quick sorting of each soldier, typically into categories such as “not urgent: wait”, “urgent: treat immediately” and “not saveable: do not treat”. We can apply the same to software bugs in order to decide what to treat (fix) and with what priority. The important thing is not so much the categories – which will vary based on your context – but the assessment criteria, and how they are applied. Here are a list of just some of the possible criteria:

likely monetary impact per customer
number of customers impacted
reputational impact on your organisation
ease to fix
impact on system security
impact on system performance
impact on system stability
annoyance of CEO not to be listened to.

We do not, of course, only need to apply one of these: a number of them can be combined with a weighting system, though the more you add, the less clear your priorities will be, and the more likely it is that someone will “put a finger on the scales” – tweak the numbers to give the outcome they want. Another important point about the categories that you decide to apply is that they should be as measurable as you can make them, to allow as objective scoring as possible. I wrote a review of the book Building Evolutionary Architectures a while ago: the methodology adopted there, where you measure and test in order to meet specific criteria, is exactly the sort of approach you should be choosing when designing your triage system.

This is (ostensibly) a blog about security, and so you might expect me to say that “security always wins”, but that should absolutely not be the approach you take. Security might be the most important category for you (that is, carry the most weight), but you need to understand why that is the case – at this particular time – and what exactly you mean by “security”. The “security of the system” is not an objective measure: in order to mean anything, such a phrase needs to reference measurements that can be made (“resistance to physical tampering”, “resistance to brute force attacks”, “number or PhD students likely to be needed to reverse engineer our ‘secure’ protocol”[1]). More importantly, it may be that at this point in your organisation’s life, the damage done by lack of stability or decreased performance outweighs the impact of a security bug. If that’s the case, then your measurements should encapsulate that information and lead you to prioritise bugs with impact in these categories over security issues[3].

There’s one proviso that I feel I need to put in at this point, and it’s about the power of what, in Agile Methodology terms, is called the Product Owner. This is the person who represents the users of the product/project, and should have final say about the direction of development in terms of features, functionality and, most relevant in this discussion, bug-fixing. As noted above, this may be an architect, product manager or someone enjoying another title, but their role should be clear: they get to call the shots. There are times when this person goes against the evidence provided by the triage, and makes a decision to prioritise a particular bug over others despite the outcome of the measurements. This is typically very painful for the technical team[4], but, when it comes down to it, as the product owner, they get to decide. The technical team – after appropriate warnings and discussion[5] – must be ready to step aside and accept the decision. Such decisions (and related discussions) should be recorded, and the product owner must be ready to stand or fall based on the outcome, but that is their job. Triage is a guide, and there are occasions when there are measurements which cannot be easily made objectively, and which sit outside the expertise or scope of knowledge of the technical team. If this sort of decision keeps being made, and you think you know better, you may have a future in technical product management, where people with a view of both the technical and the business side of technology are much in demand. In the end, though, the product owner will need to justify their decision to management, and if they get it wrong, then they must be ready to take the blame (this is one reason why you should make sure that you’ve recorded the process taken to get to this decision – you don’t want to take the blame for a poor decision which you advised against).

So: go out an design a triage process, be ready to follow it, and be ready to defend it. Oh, and one last point: you might want to buy a set of golf clubs.

—–

1 – this last one is a joke: don’t design your own protocol, or if you do, make it open and have it peer-reviewed[2].

2 – and then throw it away and use an open source implementation of better, more thoroughly-reviewed one.

3 – much as it pains me to say it.

4 – I’ve been on both sides of these decisions: I know.

5 -often rather heated, in my experience.

Are you positive?

What do pregnancy tests and the Ukrainian aircraft missile strike have in common?

Not everything in life is nicely binary, much as we[1] might like it to be. There are shades of grey[2] in many aspects of life, and though humans can often cope with uncertainty, computer systems are less good at it: they generally want a “yes” or “no” answer. This means that decisions sometimes need to be made on incomplete evidence, and, well, that means that the answers aren’t always correct. There’s a whole area of computer science related to this: fuzzy logic.

Let’s look into what the options are. Assuming that we’re looking two options: “yes” (a positive) and “no” (a negative). That means that there are two ways in which the answer can be incorrect:

a “yes” answer was incorrectly chosen (false positive);
a “no” answer was incorrectly chosen (false negative).

An example to allow us to explore this is pregnancy. It’s generally agreed that you can’t be a little bit pregnant: if you take a test, any result it gives you needs to be either positive or negative. If you are pregnant, and a test result comes back negative, then that’s a false negative. If you are not pregnant, and a test comes back positive, that’s a false positive. The implications of a false positive or a false negative can both be pretty major – as anybody who has received one will tell you. I spent a little time online trying to find expected false positive and false negatives for pregnancy tests, but it turns out that the rates are so dependent on a variety of factors that it was difficult to find a sensible answer[3].

A tragic recent example of a false positive took place on Wednesday, 8th January 2020, when a Ukrainian International Airlines flight was shot down by an Iranian missile, killing all 176 people on board. It appears that an air defence radar system misidentified the aircraft as a cruise missile. As the radar system was looking for a positive identification of a threat, this can be counted as a false positive.

What might have been the alternative in this case? If the aircraft actually had been a cruise missile, but was identified as a civilian aircraft, this would have been a false negative, and the impact might well have been significant damage to an Iranian military installation.

Which is the most damaging? Well, in the case of the aircraft, it would seem pretty clear to most observers that the false positive would be worse, but from a military point of view, that might not be the case. Maybe the impact of a missile strike on a major military installation might be considered worse than the civilian loss of life in the other case. In this case, as in many others, a decision needs to be made as to which is most important to reduce: the chance of a false negative or the chance of a false positive? In a perfect world, of course, there would be no false results, negative or positive. The problem with many systems that take analogue[4] inputs and turn them into digital outputs in this way is that avoiding false results is very costly, and sometimes impossible. Even worse news is that reducing probability of one of the two types of false result tends to increase the probability of the other.

A classic example of this is in the use of biometrics for user identification. Fingerprints, facial recognition, iris scanning and similar techniques have to balance the likelihood of a false positive with a false negative. Which is worse: the chance that the CEO will not be able to update the payroll details, or that a rogue employee will update her details to improve her salary package?[5]

One good piece of news is that AI/ML (Artificial Intelligence/Machine Learning) is improving the performance of biometric systems and, in fact, other areas of computing where “fuzzy logic” is required. In most cases, humans are still better at reducing messy sets of information to yes/no results, but that is changing, and where multiple automated decisions need to be made, then AI/ML is worth considering.

Whenever you are dealing with “messy” data[6] which needs to be reduced to a “yes/no” or “positive/negative” binary result, you need to consider the likelihood of false positives or negatives. Not only do you need to consider the likelihood of each, but also the impact of each. Once you have understood these, you can then decide which you want to try to minimise, and what techniques you should use to do so.

We may be stuck with false results, but we need to understand what our choices are, and how we can get the best outcomes available from messy data.

1 – in talking security, but I’m sure this goes for lots of other people, too.

2. “gray” for our non-Commonwealth readers.

3. good advice seems to be to test several times over several days.

4. “analog”, I suppose – see [2].

5. this is one of the reasons that authentication systems generally use two factors from the three “something you are”, “something you know”, “something you have”.

6. most real-world data, to be honest.

3 tips to avoid security distracti… Look: squirrel!

Executive fashions change – and not just whether shoulder-pads are “in”.

There are many security issues to worry about as an organisation or business[1]. Let’s list some of them:

insider threats
employee incompetence
hacktivists
unpatched systems
patched systems that you didn’t test properly
zero-day attacks
state actor attacks
code quality
test quality
operations quality
underlogging
overlogging
employee-owned devices
malware
advanced persistent threats
data leakage
official wifi points
unofficial wifi points
approved external access to internal systems via VPN
unapproved external access to internal systems via VPN
unapproved external access to internal systems via backdoors
junior employees not following IT-mandated rules
executives not following IT-mandated rules

I could go on: it’s very, very easy to find lots of things that should concern us. And it’s particularly confusing if you just go around finding lots of unconnected things which are entirely unrelated to each other and aren’t even of the same type[2]. I mean: why list “code quality” in the same list as “executives not following IT-mandated rules”? How are you supposed to address issues which are so diverse?

And here, of course, is the problem: this is what organisations and businesses do have to address. All of these issues may present real risks to the functioning (or at least continued profitability) of the organisations[3]. What are you supposed to do? How are you supposed to keep track of all these things?

The first answer that I want to give is “don’t get distracted”, but that’s actually the final piece of advice, because it doesn’t really work unless you’ve already done some work up front. So what are my actual answers?

1 – Perform risk analysis

You’re never going to be able to give your entire attention to everything, all the time: that’s not how life works. Nor are you likely to have sufficient resources to be happy that everything has been made as secure as you would like[4]. So where do you focus your attention and apply those precious, scarce resources? The answer is that you need to consider what poses the most risk to your organisation. The classic way to do this is to use the following formula:

Risk = Likelihood x Impact

This looks really simple, but sadly it’s not, and there are entire books and companies dedicated to the topic. Impact may be to reputation, to physical infrastructure, system up-time, employee morale, or one of hundreds of other items. The difficulty of assessing the likelihood may range from simple (“the failure rate on this component is once every 20 years, and we have 500 of them”[5]) to extremely difficult (“what’s the likelihood of our CFO clicking on a phishing attack?”[6]). Once it’s complete, however, for all the various parts of the business you can think of – and get other people from different departments in to help, as they’ll think of different risks, I can 100% guarantee – then you have an idea of what needs the most attention. (For now: because you need to repeat this exercise of a regular basis, considering changes to risk, your business and the threats themselves.)

2 – Identify and apply measures

You have a list of risks. What to do? Well, a group of people – and this is important, as one person won’t have a good enough view of everything – needs to sit[7] down and work out what measures to put in place to try to reduce or at least mitigate the various risks. The amount of resources that the organisation should be willing to apply to this will vary from risk to risk, and should generally be proportional to the risk being addressed, but won’t always be of the same kind. This is another reason why having different people involved is important. For example, one risk that you might be able to mitigate by spending a £50,000 (that’s about the same amount of US dollars) on a software solution might be equally well addressed by a physical barrier and a sign for a few hundred pounds. On the other hand, the business may decide that some risks should not be mitigated against directly, but rather insured against. Other may require training regimes and investment in t-shirts.

Once you’ve identified what measures are appropriate, and how much they are going to cost, somebody’s going to need to find money to apply them. Again, it may be that they are not all mitigated: it may just be too expensive. But the person who makes that decision should be someone senior – someone senior enough to take the flak should the risk come home to roost.

Then you apply your measures, and, wherever possible, you automate them and their reporting. If something is triggered, or logged, you then know:

that you need to pay attention, and maybe apply some more measures;
that the measure was at least partially effective;
that you should report to the business how good a job you – and all those involved – have done.

3 – Don’t get distracted

My final point is where I’ve been trying to go with this article all along: don’t get distracted. Distractions come in many flavours, but here are three of the most dangerous.

A measure was triggered, and you start paying all of your attention to that measure, or the system(s) that it’s defending. If you do this, you will miss all of the other attacks that are going on. In fact, here’s your opportunity to look more broadly and work out whether there are risks that you’d not considered, and attacks that are coming in right now, masked by the one you have noticed.
You assume that the most expensive measures are the ones that require the most attention, and ignore the others. Remember: the amount of resources you should be ready to apply to each risk should be proportional to the risk, but the amount actually applied may not be. Check that the barrier you installed still works and that the sign is still legible – and if not, then consider whether you need to spend that £50,000 on software after all. Also remember that just because a risk is small, that doesn’t mean that it’s zero, or that the impact won’t be high if it does happen.
Executive fashions change – and not just whether shoulder-pads are “in”, or the key to the boardroom bathroom is now electronic, but a realisation that executives (like everybody else) are bombarded with information. The latest concern that your C-levels read about in the business section, or hears about from their buddies on the golf course[9] may require consideration, but you need to ensure that it’s considered in exactly the same way as all of the other risks that you addressed in the first step. You need to be firm about this – both with the executive(s), but also yourself, because although I identified this as an executive risk, the same goes for the rest of us. Humans are generally better at keeping their focus on the new, shiny thing in front of them, rather than the familiar and the mundane.

Conclusion

You can’t know everything, and you probably won’t be able to cover everything, either, but having a good understanding of risk – and maintaining your focus in the event of distractions – means that at least you’ll be covering and managing what you can know, and can be ready to address new ones as they arrive.

1 – let’s be honest: there are lots if you’re a private individual, too, but that’s for another day.

2- I did this on purpose, annoying as it may be to some readers. Stick with it.

3 – not to mention the continued employment of those tasked with stopping these issues.

4 – note that I didn’t write “everything has been made secure”: there is no “secure”.

5 – and, to be picky, this isn’t as simple as it looks either: does that likelihood increase or decrease over time, for instance?

6 – did I say “extremely difficult”? I meant to say…

7 – you can try standing, but you’re going to get tired: this is not a short process.

8 – now that, ladles and gentlespoons, is a nicely mixed metaphor, though I did stick with an aerial theme.

9 – this is a gross generalisation, I know: not all executives play golf. Some of them play squash/racketball instead.

Don’t talk security: talk risk

We rush to implement the latest, greatest AI-enhanced, post-quantum container-based blockchain security solution.

We don’t do security because it’s fun. No: let me qualify that. Most of us don’t do security because it’s fun, but none of us get paid to do security because it’s fun[1]. Security isn’t a thing in itself, it’s a means to an end, and that end is to reduce risk. This was a notable change in theme in and around the RSA Conference last week. I’d love to say that it was reflected in the Expo, but although it got some lip service, selling point solutions still seemed to be the approach for most vendors. We’re way overdue some industry consolidation, given the number of vendors advertising solutions which, to me, seemed almost indistinguishable.

In some of the sessions, however, and certainly in many of the conversations that I had in the “hallway track” or the more focused birds-of-a-feather type after show meetings, risk is beginning to feature large. I ended up spending quite a lot of time with CISO folks and similar – CSO (Chief Security Officer) and CPSO (Chief Product Security Officer) were two other of the favoured titles – and risk is top of mind as we see the security landscape develop. The reason this has happened, of course, is that we didn’t win.

What didn’t we win? Well, any of it, really. It’s become clear that the “it’s not if, it’s when” approach to security breaches is correct. Given some of the huge, and long-term, breaches across some huge organisations from British Airways to the Marriott group to Citrix, and the continued experience of the industry after Sony and Equifax, nobody is confident that they can plug all of the breaches, and everybody is aware that it just takes one breach, in a part of the attack surface that you weren’t even thinking about, for you to be exposed, and to be exposed big time.

There are a variety of ways to try to manage this problem, all of which I heard expressed at the conference. They include:

cultural approaches (making security everybody’s responsibility/problem, training more staff in different ways, more or less often);
process approaches (“shifting left” so that security is visible earlier in your projects);
technical approaches (too many to list, let alone understand or implement fully, and ranging from hardware to firmware to software, using Machine Learning, not using Machine Learning, relying on hardware, not relying on hardware, and pretty much everything in between);
design approaches (using serverless, selecting security-friendly languages, using smart contracts, not using smart contracts);
cryptographic approaches (trusting existing, tested, peer-reviewed primitives, combining established but underused techniques such as threshold signatures, embracing quantum-resistant algorithms, ensuring that you use “quantum-generated” entropy);
architectural approaches (placing all of your sensitive data in the cloud, placing none of your sensitive data in the cloud).

In the end, none of these is going to work. Not singly, not in concert. We must use as many of them as make sense in our environment, and ensure that we’re espousing a “defence in depth” philosophy such that no vulnerability will lay our entire estate or stack open if it is compromised. But it’s not going to be enough.

Businesses and organisations exist to run, not to be weighed down by the encumbrance of security measure after security measure. Hence the “as make sense in our environment” above, because there will always come a point where the balance of security measures outweighs the ability of the business to function effectively.

And that’s fine, actually. Security people have always managed risk. We may have forgotten this, as we rush to implement the latest, greatest AI-enhanced, post-quantum container-based blockchain security solution[2], but we’re always making a balance. Too often that balance is “if we lose data, I’ll get fired”, though, rather than a different conversation entirely.

The people who pay our salaries are not our customers, despite what your manager and SVP of Sales may tell you. They are the members of the Board. Whether the relevant person on the Board is the CFO, the CISO, the CSO, the CTO or the CRO[3], they need to be able to talk to their colleagues about risk, because that’s the language that the rest of them will understand. In fact, it’s what they talk about every day. Whether it’s fraud risk, currency exchange risk, economic risk, terrorist risk, hostile take-over risk, reputational risk, competitive risk or one of the dozens of other types, risk is what they want to hear about. And not security. Security should be a way to measure, monitor and mitigate risk. They know by now – and if they don’t, it’s the C[F|IS|S|T|R]O’s job to explain to them – that there’s always a likelihood that the security of your core product/network/sales system/whatever won’t be sufficient. What they need to know is what risks that exposes. Is it risk that:

the organisation’s intellectual property will be stolen;
customers’ private information will be exposed to the Internet;
merger and acquisition information will go to competitors;
payroll information will be leaked to the press – and employees;
sales won’t be able to take any orders for a week;
employees won’t be paid for a month;
or something completely different?

The answer (or, more likely, answers) will depend on the organisation and sector, but the risks will be there. And the Board will be happy to hear about them. Well, maybe that’s an overstatement, but they’ll be happier hearing about them in advance than after an attack has happened. Because if they hear about them in advance, they can plan mitigations, whether that’s insurance, changes in systems, increased security or something else.

So we, as a security profession, need to get better a presenting the risk, and also at presenting options to the Board, so that they can make informed decisions. We don’t always have all the information, and neither will anybody else, but the more understanding there is of what we do, and why we do it, the more we will be valued. And there’s little risk in that.

1 – if I’m wrong about this, and you do get paid to do security because it’s fun, please contact me privately. I interested, but don’t think we should share the secret too widely.

2 – if this buzzphrase-compliant clickbait doesn’t get me page views, I don’t know what will.

3 – Chief [Financial|Information Security|Security|Technology|Risk] Officer.

Why security policies are worthless

A policy, to have any utility at all, needs to exist in a larger context.

“We need a policy on that.” This is a phrase that seems to act as a universal panacea to too many managers. When a problem is identified, and the blame game has been exhausted, the way to sort things out, they believe, is to create a policy. If you’ve got a policy, then everything will be fine, because everything will be clear, and everyone will obey the policy, and nothing can go wrong.

Right[1].

The problem is that policy, on its own, is worthless.

A policy, to have any utility at all, needs to exist in a larger context, or, to think of it in a different way, to sit in a chain of artefacts. It is its place in this chain that actually gives it meaning. Let me explain.

When that manager said that they wanted a policy, what did that actually mean? That rather depends on how wise the manager is[2]. Hopefully, the reason that the manager identified the need for a policy was because:

they noticed that something had gone wrong that shouldn’t have done and;
they wanted to have a way to make sure it didn’t happen again.

The problem with policy on its own is that it doesn’t actually help with either of those points. What use does it have, then?

Governance

Let’s look at those pieces separately. When we say that “something had gone wrong that shouldn’t have done“, what we’re saying is that there is some sort of model for what the world should look like, giving us often general advice on our preferred state. Sometimes this is a legal framework, sometimes it’s a regulatory framework, and sometimes it’s a looser governance model. The sort of thing I’d expect to see at this level would be statements like:

patient data must be secured against theft;
details of unannounced mergers must not be made available to employees who are not directors of the company;
only authorised persons may access the military base[4].

These are high level requirements, and are statements of intent. They don’t tell you what to do in order to make these things happen (or not happen), they just tell you that you have to do them (or not do them). I’m going to call collections of these types of requirements “governance models”.

Processes

At the other end of the spectrum, you’ve got the actual processes (in the broader sense of the term) required to make the general intent happen. For the examples above, these might include:

AES-256 encryption using OpenSSL version 1.1.1 FIPS[5], with key patient-sym-current for all data on database patients-20162019;
documents Indigo-1, Indigo-3, Indigo-4 to be collected after meeting and locked in cabinet Socrates by CEO or CFO;
guards on duty at post Alpha must report any expired passes to the base commander and refuse entry to all those producing them.

These are concrete processes that must be followed, which will hopefully allow the statements of intent that we noted above to be carried out.

Policies and audit

The problem with both the governance statements and the processes identified is that they’re both insufficient. The governance statements don’t tell you how to do what needs to be done, and the processes are context-less, which means that you can’t tell what they relate to, or how they’re helping. It’s also difficult to know what to do if they fail: what happens, for example, if the base commander turns up with an expired pass?

Policies are what sit in the middle. They provide guidance as to how to implement the governance model, and provide context for the processes. They should give you enough detail to cover all eventualities, even if that’s to say “consult the Legal Department when unsure[6]”. What’s more, they should be auditable. If you can’t audit your security measures, then how can you be sure that your governance model is being followed? But governance models, as we’ve already discovered, are at the level of intent – not the sort of thing that can be audited.

Policies, then, should provide enough detail that they can be auditable, but they should also preferably be separated enough from implementation that rules can be written that are applicable in different circumstances. Here are a few policies that we might create to follow the first governance model examples above:

patient data must be secured against theft;
- Policy 1: all data at rest must be encrypted by a symmetric key of at least 256 bits or equivalent;
- Policy 2: all storage keys must be rotated at least once a month;
- Policy 3: in the event of a suspected key compromise, all keys at the same level in that department must be rotated.

You can argue that these policies are too detailed, or you can argue that they’re not detailed enough: both arguments are fair, and level of detail (or granularity) should depend on the context and the use to which they are being put. However, though I’m sure that all of the example policies I’ve given could be improved, I hope that they are all:

auditable;
able to be implemented by one or more well-defined processes;
understandable by both those who concerned with the governance level and those involved at the process implementation and operations level.

The value of auditing

I’ve written about auditing before (Confessions of an auditor). I think it’s really important. Done well, it should allow you to discover whether:

your processes are covering all of the eventualities of your policies;
whether your policies are actually being implemented correctly.

Auditing may also address whether your policies fully meet your governance model. Auditing well is a skill, but in order to help your auditor – whether they are good at it or bad at it – having a clearly defined set of policies is a must. But, as I pointed out at the beginning of this article, policies for policies’ sake are worthless: put them together with governance and processes, however, and they provide technical and business value.

1 – this is sarcasm.

2 – yes, I know. But let’s not be rude if we can avoid it. We want to help managers, because the more clue we deliver to them, the easier our lives will be[3].

3 – and you never know: one day, even you might be a manager.

4 – which is probably not a US military base, given the spelling of “authorised”.

5 – example only…

6 – this, in my experience, is the correct answer to many questions.

7 steps to security policy greatness

… we’ve got a good chance of doing the right thing, and keeping the auditors happy.

Security policies are something that everybody knows they should have, but which are deceptively simple to get wrong. A set of simple steps to check when you’re implementing them is something that I think it’s important to share. You know when you come up a set of steps, you suddenly realise that you’ve got a couple of vowels in there and you think, “wow, I can make an acronym!”?[1] This was one of those times. The problem was that when I looked at the different steps, I decided that DDEAVMM doesn’t actually have much of a ring to it. I’ve clearly still got work to do before I can name major public sector projects, for instance[2]. However, I still think it’s worth sharing, so let’s go through them in order. Order, as for many sets of steps, is important here.

I’m going to give an example and walk through the steps for clarity. Let’s say that our CISO, in his or her infinite wisdom, has decided that they don’t want anybody outside our network to be able to access our corporate website via port 80. This is the policy that we need to implement.

1. Define

The first thing I need is a useful definition. We nearly have that from the request[3] noted above, when our CISO said “I don’t want anybody outside our network to be able to access our corporate website via port 80”. So, let’s make that into a slightly more useful definition.

“Access to IP address mycorporate.network.com on port 80 must be blocked to all hosts outside the 10.0.x.x to 10.3.x.x network range.”

I’m assuming that we already know that our main internal network is within 10.0.x.x to 10.3.x.x. It’s not exactly a machine readable definition, but actually, that’s the point: we’re looking for a definition which is clear and human understandable. Let’s assume that we’re happy with this, at least for now.

2. Design

Next, we need to design a way to implement this. There are lots of ways of doing this – from iptables[4] to a full-blown, commercially supported firewall[6] – and I’m not fluent in any of them these days, so I’m going to assume that somebody who is better equipped than I has created a rule or set of rules to implement the policy defined in step 1.

We also need to define some tests – we’ll come back to these in step 5.

3. Evaluate

But we need to check. What if they’ve mis-implemented it? What if they misunderstood the design requirement? It’s good practice – in fact, it’s vital – to do some evaluation of the design to ensure it’s correct. For this rather simple example, it should be pretty to check by eye, but we might want to set up a test environment to evaluate that it meets the policy definition or take other steps to evaluate its correctness. And don’t forget: we’re not checking that the design does what the person/team writing it thinks it should do: we’re checking that it meets the definition. It’s quite possible that at this point we’ll realise that the definition was incorrect. Maybe there’s another subnet – 10.5.x.x, for instance – that the security policy designer didn’t know about. Or maybe the initial request wasn’t sufficiently robust, and our CISO actually wanted to block all access on any port other than 443 (for HTTPS), which should be allowed. Now is a very good time to find that out.

4. Apply

We’ve ascertained that the design does what it should do – although we may have iterated a couple of times on exactly “what it should do” means – so now we can implement it. Whether that’s ssh-ing into a system, uploading a script, using some sort of GUI or physically installing a new box, it’s done.

Excellent: we’re finished, right?

No, we’re not. The problem is that often, that’s exactly what people think. Let’s move to our next step: arguably the most important, and the most often forgotten or ignored.

5. Validate

I really care about this one: it’s arguably the point of this post. Once you’ve implemented a security policy, you need to validate that it actually does what it’s supposed to do. I’ve written before about this, in my post If it isn’t tested, it doesn’t work, and it’s absolutely true of security policy implementations. You need to check all of the different parts of the design. This, you might hope, would be really easy, but even in the simple case that we’re working with, there are lots of tests you should be doing. Let’s say that we took the two changes mentioned in step 3. Here are some tests that I would want to be doing, with the expected result:

FAIL: access port 80 from an external IP address
FAIL: access port 8080 from an external IP address
FAIL: access port 80 from a 10.4.x.x address
PASS: access port 443 from an external IP address
UNDEFINED: access port 80 from a 10.5.x.x
UNDEFINED: access port 80 from a 10.0.x.x address
UNDEFINED: access port 443 from a 10.0.x.x address
UNDEFINED: access port 80 from an external IP address but with a VPN connection into 10.0.x.x

Of course, we’d want to be performing these tests on a number of other ports, and from a variety of IP addresses, too.

What’s really interesting about the list is the number of expected results that are “UNDEFINED”. Unless we have a specific requirement, we just can’t be sure what’s expected. We can guess, but we can’t be sure. Maybe we don’t care? I particularly like the last one, because the result we get may lead us much deeper into our IT deployment than we might expect.

The point, however, is that we need to check that the actual results meet our expectations, and maybe even define some new requirements if we want to remove some of the “UNDEFINED”. We may be fine to leave some the expected results as “UNDEFINED”, particularly if they’re out of scope for our work or our role. Obviously, if the results don’t meet our expectations, then we also need to make some changes and apply them and then re-validate. We also need to record the final results.

When we’ve got more complex security policy – multiple authentication checks, or complex SDN[7] routing rules – then our tests are likely to be much, much more complex.

6. Monitor

We’re still not done. Remember those results we got in our previous tests? Well, we need to monitor our system and see if there’s any change. We should do this on a fairly frequent basis. I’m not going to say “regular”, because regular practices can lead to sloppiness, and also leave windows of opportunities open to attackers. We also need to perform checks whenever we make a change to a connected system. Oh, if I had a dollar for every time I’ve heard “oh, this won’t affect system X at all.”…[8]

One interesting point is that we should also note when results whose expected value remains “UNDEFINED” change. This may be a sign that something in our system has changed, it may be a sign that a legitimate change has been made in a connected system, or it may be a sign of some sort of attack. It may not be quite as important as a change in one of our expected “PASS” or “FAIL” results, but it certainly merits further investigation.

7. Mitigate

Things will go wrong. Some of them will be our fault, some of them will be our colleagues’ fault[10], some of them will be accidental, or due to hardware failure, and some will be due to attacks. In all of these cases, we need to act to mitigate the failure. We are in charge of this policy, so even if the failure is out of our control, we want to make sure that mitigating mechanism are within our control. And once we’ve completed the mitigation, we’re going to have to go back at least to step 2 and redesign our implementation. We might even need to go back to step 1 and redefine what our definition should be.

Final steps

There are many other points to consider, and one of the most important is the question of responsibility, touched on in step 7 (and which is particularly important during holiday seasons), special circumstances and decommissioning, but if we can keep these steps in mind when we’re implementing – and running – security policies, we’ve got a good chance of doing the right thing, and keeping the auditors happy, which is always worthwhile.

1 – I’m really sure it’s not just me. Ask your colleagues.

2 – although, back when Father Ted was on TV, a colleague of mine and I came up with a database result which we named a “Fully Evaluated Query”. It made us happy, at least.

3 – if it’s from the CISO, and he/she is my boss, then it’s not a request, it’s an order, but you get the point.

4 – which would be my first port[5] of call, but might not be the appropriate approach in this context.

5 – sorry, that was unintentional.

6 – just because it’s commercially support doesn’t mean it has to be proprietary: open source is your friend, boys and girls, ladies and gentlemen.

7 – Software-Defined Networking.

8 – I’m going to leave you hanging, other than to say that, at current exchange rates, and assuming it was US dollars I was collecting, then I’d have almost exactly 3/4 of that amount in British pounds[9].

9 – other currencies are available, but please note that I’m not currently accepting bitcoin.

10 – one of the best types.