A new state of mind

I’m quite proud; though maybe slightly ashamed that I didn’t do it before.

Last year, I co-founded Profian with Nathaniel McCallum, a colleague from Red Hat. It’s a security start-up in the Confidential Computing space, based on the open source Enarx project. There’s an update on that on the Profian blog, in an article entitled Design to Roadmap to Product.

It’s an article on what we’ve been up to in the company, and it records the realisation that it’s time for me to step into yet another role as one of the founders: moving beyond “let’s make sure that we have a team and that the basic day-to-day running of the company is working” to “OK, let’s really map out our product roadmap and how we present it to customers.”

A new state of mind

Which leads me to the main point of this short article. This is not an easy transition – it’s yet another new thing to learn, discover which bits I’m good at, improve the bits I’m not, get internal or external help to scale with, etc. – but it’s a vital part of being the CEO of a start-up.

It’s also something which I had, to be honest, been resisting. Most of us prefer to stick to stuff which we know – whether we’re good at it or not, sometimes! – rather than “embracing change”. Sometimes that’s OK, but in the position I’m in at the moment, it’s not. I have a responsibility to the company and everyone involved in it to ensure that we can be successful. And that means doing something. So I’ve been listening to people say, “these are the things you need to do”, “here are the ways we can help you”, “this is what you should be looking for” and, while listening, just, well, putting it off, I suppose. Towards the end of last week, I ordered a book (The Founder Handbook) to try to get my head round it a bit more. There are loads of books of this type, but I did a little research, and this looked like it might be one of the better ones.

So, it arrived, and I started reading it. And, darn it, it made sense. It made me start seeing the world in a new way – a way which might not have been relevant to me (or the company) a few months ago, but really is, now. And I really need to embrace lots of the things the authors are discussing. I’m not saying that it’s a perfect book, or that no other book would have prompted this response, but at some point over the weekend, I thought: “right, it’s time to change and to move into this persona, thinking about these issues, being proactive and not putting it off anymore”.

I’m quite proud, to be honest; though maybe slightly ashamed that I didn’t do it before. I cemented the decision to jump into a new mindset by doing what I’ve done on a couple of occasions before (including when I decided to commit to writing my book): I told a few people what I was planning to do. This really works for me on several levels:

  1. I’ve made a public commitment (even if it’s to a few people[1]), so it’s difficult to roll it back;
  2. I’ve made a commitment to myself, so I can’t pretend that I haven’t and let myself drift back into the old mindset;
  3. it sets other people’s expectations of what I’m going to do;
  4. people are predisposed to being helpful when you struggle, or ask for help.

These are all big positives, and while telling people you’ve made a big decision may not work for everyone, it certainly helps me. This is going to be only one of many changes I need to make if we’re to build a successful company out of Profian and Enarx, but acknowledging that it needed to be made – and that I was the one who was going to have to effect that change – is important to me, the company, our investors and our employees. Now all I need to do is make a success of it! Wish me luck (and keep an eye out for more…).


1 – a few more people now, I suppose, that I’ve published this article!

Open source and cyberwar

If cyberattacks happen to the open source community, the impact may be greater than you expect.

There are some things that it’s more comfortable not thinking about, and one of them is war. For many of us, direct, physical violence is a long way from us, and that’s something for which we can be very thankful. As the threat of physical violence recedes, however, it’s clear that the spectre of cyberattacks as part of a response to aggression – physical or virtual – is looming ever larger.

It’s well attested that many countries have “cyber-response capabilities”, and those will include aggressive as well as protective measures. And some nation states have made it clear not only that they consider cyberwarfare part of any conflict, but that they would be entirely comfortable with initiating cyberwarfare with attacks.

What, you should probably be asking, has that to do with us? And by “us”, I mean the open source software community. I think that the answer, I’m afraid, is “a great deal”. I should make it clear that I’m not speaking from a place of privileged knowledge here, but rather from thoughtful and fairly informed opinion. But it occurs to me that the “old style” of cyberattacks, against standard “critical infrastructure” like military installations, power plants and the telephone service, was clearly obsolete when the Twin Towers collapsed (if not in 1992, when the film Sneakers hypothesised attacks against targets like civil aviation). This means that any type of infrastructure or economic system is a target, and I think that open source is up there. Let me explore two ways in which open source may be a target.

Active targets

If we had been able to pretend that open source wasn’t a core part of the infrastructure of nations all over the globe, that self-delusion was finally wiped away by the log4j vulnerabilities and attacks. Open source is everywhere now, and whether or not your applications are running any open source, the chances are that you deploy applications to public clouds running open source, that at least some of your employees use an open source operating system on their phones, and that the servers running your chat channels, email providers, Internet providers and beyond make use – extensive use – of open source software: think Apache, think BIND, think Kubernetes. At one level, this is great, because it means that it’s possible for bugs to be found and fixed before they can be turned into vulnerabilities, but that’s only true if enough attention is being paid to the code in the first place. We know that attackers will have been stockpiling exploits, and many of them will be against proprietary software, but given the amount of open source deployed out there, they’d be foolish not to be collecting exploits against that as well.

Passive targets

I hate to say it, but there are also what I’d call “passive targets”: those which aren’t necessarily first-tier targets, but whose operation is important to the safe, continued working of our societies and economies, and which are intimately related to open source and open source communities. Two of the more obvious ones are GitHub and GitLab, which hold huge amounts of our core commonwealth, but long-term attacks on foundations such as the Apache Software Foundation and the Linux Foundation, let alone kernel.org, could also have an impact on how we, as a community, work. Things are maybe slightly better in terms of infrastructure like chat services (as there’s a choice of more than one, and it’s easier to host your own instance), but there aren’t that many public servers, and a major attack on either them or the underlying cloud services on which many of them rely could be crippling.

Of course, the impact on your community, business or organisation will depend on your usage of different pieces of infrastructure, how reliant you are on them for your day-to-day operation, and what mitigations you have available to you. Let’s quickly touch on that.

What can I do?

The Internet was famously designed to route around issues – attacks, in fact – and that helps. But, particularly where there’s a pretty homogeneous software stack, attacks on infrastructure could still have a major impact. Start thinking now:

  • how would I support my customers if my main chat server went down?
  • could I continue to develop if my main git provider became unavailable?
  • would we be able to offer at least reduced services if a cloud provider lost connectivity for more than an hour or two?

By doing an analysis of what your business dependencies are, you have the opportunity to plan for at least some of the contingencies (although, as I note in my book, Trust in Computer Systems and the Cloud, the chances of your being able to analyse the entire stack, or discover all of the dependencies, are lower than you might think).
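One way to start that analysis is to write your dependencies down as a graph and walk it. The Python below is a minimal sketch of that idea. It assumes a hand-maintained map of which service relies directly on which, and every service name in it is hypothetical. The sketch can only find the dependencies you have actually recorded, which is exactly why analysing the full stack is harder than it looks.

```python
from collections import deque

# Hypothetical map: each service lists the services it relies on directly.
DEPENDENCIES = {
    "customer-support": ["chat-server", "ticketing"],
    "chat-server": ["cloud-provider-a", "dns"],
    "ticketing": ["git-host", "cloud-provider-a"],
    "development": ["git-host", "ci-runner"],
    "ci-runner": ["cloud-provider-b", "dns"],
    "git-host": ["dns"],
}

def transitive_dependencies(service: str) -> set[str]:
    """Return every service reachable from `service` (breadth-first search)."""
    seen: set[str] = set()
    queue = deque(DEPENDENCIES.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(DEPENDENCIES.get(dep, []))
    return seen

def impacted_by(failed: str) -> set[str]:
    """Return the mapped services that depend, directly or not, on `failed`."""
    return {s for s in DEPENDENCIES if failed in transitive_dependencies(s)}

print(sorted(impacted_by("git-host")))
# ['customer-support', 'development', 'ticketing']
```

Asking "what breaks if the git provider goes away?" then becomes a query, not a guess. The query is still only as good as the map behind it.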

What else can you do? Patch and upgrade – make sure that whatever you’re running is the highest (supported!) version. Make back-ups of anything which is business critical. This should include not just your code but issues and bug-tracking, documentation and sales information. Finally, consider having backup services available for time-critical services like a customer support chat line.

Cyberattacks may not happen to your business or organisation directly, but if they happen to the open source community, the impact may be greater than you expect. Analyse. Plan. Mitigate.

10(+1) plans for 2022

I’m not a big fan of New Year’s resolutions, as I don’t like to set myself up to fail.

This week’s song: Bleed to Love Her by Fleetwood Mac.

I’m not a big fan of New Year’s resolutions, as I don’t like to set myself up to fail. Instead, here are a few things – professional and personal – that I hope or expect to be doing this year. Call them resolutions if you want, but words have power, and I’m avoiding the opportunity to fail.

  1. Spend lots of time shepherding Enarx to greater maturity. At Profian, we see our future as closely tied to that of Enarx, and we’ll be growing the project’s capabilities and functionality significantly over this year. Keep an eye out for announcements!
  2. Get fit(ter) again. Yeah, that.
  3. Promote my book. I’m really proud of my book Trust in Computer Systems and the Cloud, which was published right at the end of the year. It aims to raise the standard of knowledge within the industry by proposing a framework for discussion, and I want to make that happen.
  4. Start travelling again. I miss conferences, I miss seeing colleagues, I miss meeting new people. Hopefully it’s going to be easier and safer to travel this year.
  5. Delegate better (and more). As the CEO of a startup, there’s lots I need to make happen. I’m not always the best person actually to be doing it all, and learning to help other people take some (more!) of it over is actually really important, not just to me, but for the business.
  6. Drink lots of tea. No real change here.
  7. Drink good whisky. In moderation.
  8. Keep gaming. Possibly a weird one, but gaming is an important downtime activity for me, and helps me relax.
  9. Make the most of music. I listen to lots of music whilst working, travelling, driving, relaxing, etc. Watch out for a link to the playlist associated with my book – I also plan to list a song or track a week on my blog (see the top of this article for this week’s offering!).
  10. Enjoy reading. One of the benefits of having completed the book is that I now have more time to read; more specifically, more time when I don’t feel guilty that I’m reading rather than doing book-work.
  11. A bonus one: spend more time over at Opensource.com. I’m a Correspondent over there, and enjoy both writing for them and reading other people’s contributions. A great way to get into – or keep up-to-date with – the open source community.

So – not the most inspiring list, but if I can manage most of these this year, I’ll be happy.

Organisational suppleness

Growing the ability to react to the unexpected is a valuable skill.

“In preparing for battle I have always found that plans are useless but planning is indispensable.”

Dwight D. Eisenhower

Much of this blog is about security – cybersecurity – in one way or another, but on occasion I do try to take a broader view. Cybersecurity is often modelled or described in military terms, talking about “fighting battles”, “wars of attrition” and “arms races” with “attackers”. These can be useful metaphors (and it’s why I started this article with a quote from a general), but there is a broader set of responsibilities that many of us in the sector need to consider, which is the continued (and hopefully healthy) functioning of our businesses and organisations. In particular, I like to talk about risk and how it relates not just to security, but to how businesses work and plan. One theme that I’ve visited before is that known or planned degradation of a service is often significantly better than failure, or even planned closure (see Service degradation: actually a good thing). My argument is that there are many occasions where keeping a service or business function running, albeit at reduced capacity, or with reductions in known capabilities, allows for better continuity than just stopping it.

Keeping a service running requires work. You can’t just hope that everything is installed and will run as you expect: what happens when your administrator is ill, your fibre-optic cable gets severed by a backhoe, or a DDoS attack is directed at you? You need to plan and practise what to do in these situations. What I’d like to explore in this article goes somewhat beyond the expectation of that planning in three directions. Let’s call them scenario coverage, muscle memory and organisational suppleness.

Scenario coverage

The first, and most obvious, of the three directions is about understanding eventualities. The more scenarios that we model and practise, the more we reduce our risk, simply because we have reduced the number of unknown eventualities in the probability space. There is actually a side benefit to modelling lots of scenarios, which is that the more situations you consider, the more will come to mind. Every situation involves sets of choices or probabilities – “after the door closes, will it lock or not?” or “if the coolant fails, will the system turn off or burst into flames?” – and the more scenarios you consider, the more questions will arise. This can be daunting – and it’s almost impossible to consider every eventuality – but the more options are covered, the better your opportunities to mitigate the various risks they present.
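It is easy to underestimate how quickly this grows. As a rough sketch, assume that each open question has just two outcomes and is independent of the others. Real questions may have more outcomes, and they may constrain each other. Under that assumption, every extra question doubles the number of scenarios. The Python below enumerates them. The questions are illustrative, loosely based on the examples above.

```python
from itertools import product

# Illustrative yes/no uncertainties; assumed independent for this sketch.
QUESTIONS = [
    "door locks after closing",
    "system shuts down safely if the coolant fails",
    "back-up generator starts",
    "on-call administrator is reachable",
]

def enumerate_scenarios(questions):
    """Yield one scenario (question -> outcome) per combination of answers."""
    for outcomes in product([True, False], repeat=len(questions)):
        yield dict(zip(questions, outcomes))

scenarios = list(enumerate_scenarios(QUESTIONS))
print(len(scenarios))  # 16, i.e. 2 ** 4

# Each additional independent question doubles the count.
for n in (10, 20):
    print(n, "questions ->", 2 ** n, "scenarios")
```

Twenty binary questions already give over a million combinations. This is why exhaustive coverage is out of reach, and why prioritising the likely or high-impact scenarios matters.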

Muscle memory

Muscle memory is what comes with training and practice. Assuming that you are including your teams in the scenario planning

And I’m assuming here that the planning isn’t solely a paper exercise. Theoretical planning, while useful, only goes so far, for a couple of important reasons:

  • systems will always fail in unexpected ways;
  • people will do unexpected things.

What the first of these means is that however much you assume that your back-up generator will kick in if there’s a power outage, until you test it, you can’t be sure that it will. The second of these relates to the fact that however much you tell people what to do, when it actually comes to the doing of it, they’re unlikely to do exactly as you expect. This is likely to be even worse if there’s been no training, and you’re just assuming that person X will know how to operate a fire extinguisher, or that team Y will, of course, exit the building in an orderly manner via exit Z (rather than find fourteen different exits, or not even leave the building at all).

For both of these reasons, getting people together to work through possible scenarios, and then, where possible, actually practising what to do, means that you have a higher assurance that when one of the situations you’ve considered does arise, they will know what to do, and act as you expect.

Organisational suppleness

While you cannot, as we’ve noted, plan for every eventuality or know exactly how an employee or team will react when things go wrong, there is another benefit to involving a broad group of people in your scenario planning and training. This is that their very involvement gives them practice in dealing with uncertainty, working out how they will react, and giving them experience in how those around them will act. While I may not know exactly what to do if the payroll system goes down an hour before it is due to run, if I have worked with colleagues on scenarios where the sales processing system fails, I’ve got a better chance of making some sensible choices about who to contact, initial steps to take and information to collect than if this is the first time I’ve ever seen anything like it. Likewise, we may not have modelled our response to a physical failure of one of our network links, but our shared experience of practising our response to a DDoS attack means that we have an idea of what to do.

And it is not just having an idea of what to do that is important, but also having gathered and practised the cognitive skills associated with investigating failures, collating data, sharing information and working with others to ameliorate the situation which allows a team or an organisation to respond better to new, maybe unexpected situations. We can think of this as suppleness, as it means that rather than just failing, or cracking, an organisation can react as a tree does to strong winds, or a gymnast does to a new exercise. Growing the ability to react to the unexpected is a valuable skill for an organisation, and knowing that it is supple allows its leaders to plan with more certainty and mitigate more risk.

Trust book – chapter index and summary

I thought it might be interesting to provide the chapter index and a brief summary of what each chapter addresses.

In a previous article, I presented the publisher’s blurb for my upcoming book with Wiley, Trust in Computer Systems and the Cloud. I thought it might be interesting, this time around, to provide the chapter index of the book and to give a brief summary of what each chapter addresses.

While it’s possible to read many of the chapters on their own, I have tried to maintain a logical progression of thought through the book, building on earlier concepts to provide a framework that can be used in the real world. It’s worth noting that the book is not about how humans trust – or don’t trust – computers (there’s a wealth of literature around this topic), but about how to consider the issue of trust between computing systems, or what we can say about assurances that computing systems can make, or can be made about them. This may sound complex, and it is – which is pretty much why I decided to write the book in the first place!

  • Introduction
    • Why I think this is important, and how I came to the subject.
  • Chapter 1 – Why Trust?
    • Trust as a concept, and why it’s important to security, organisations and risk management.
  • Chapter 2 – Humans and Trust
    • Though the book is really about computing and trust, and not humans and trust, we need a grounding in how trust is considered, defined and talked about within the human realm if we are to look at it in our context.
  • Chapter 3 – Trust Operations and Alternatives
    • What are the main things you might want to do around trust, how can we think about them, and what tools/operations are available to us?
  • Chapter 4 – Defining Trust in Computing
    • In this chapter, we delve into the factors which are specific to trust in computing, comparing and contrasting them with the concepts in chapter 2 and looking at what we can and can’t take from the human world of trust.
  • Chapter 5 – The Importance of Systems
    • Regular readers of this blog will be unsurprised that I’m interested in systems. This chapter examines why systems are important in computing and why we need to understand them before we can talk in detail about trust.
  • Chapter 6 – Blockchain and Trust
    • This was initially not a separate chapter, but is an important – and often misunderstood or misrepresented – topic. Blockchains don’t exist or operate in a logical or computational vacuum, and this chapter looks at how trust is important to understanding how blockchains work (or don’t) in the real world.
  • Chapter 7 – The Importance of Time
    • One of the important concepts introduced earlier in the book is the consideration of different contexts for trust, and none is more important to understand than time.
  • Chapter 8 – Systems and Trust
    • Having introduced the importance of systems in chapter 5, we move to considering what it means to establish a trust relationship from or to a system, and how the extent of what is considered part of the system is vital.
  • Chapter 9 – Open Source and Trust
    • Another topic whose inclusion is unlikely to surprise regular readers of this blog, this chapter looks at various aspects of open source and how it relates to trust.
  • Chapter 10 – Trust, the Cloud, and the Edge
    • Definitely a core chapter in the book, this addresses the complexities of trust in the modern computing environments of the public (and private) cloud and Edge networks.
  • Chapter 11 – Hardware, Trust, and Confidential Computing
    • Confidential Computing is a growing and important area within computing, but to understand its strengths and weaknesses, there needs to be a solid theoretical underpinning of how to talk about trust. This chapter also covers areas such as TPMs and HSMs.
  • Chapter 12 – Trust Domains
    • Trust domains are a concept that allows us to apply the lessons and frameworks we have discussed through the book to real-world situations at large scale. They also allow for modelling at the business level and for issues like risk management – introduced at the beginning of the book – to be considered more explicitly.
  • Chapter 13 – A World of Explicit Trust
    • Final musings on what a trust-centric (or at least trust-inclusive) view of the world enables, and hopes for future work in the field.
  • References
    • List of works cited within the book.

In praise of … the Community Manager

I am not – and could never be – a community manager

This is my first post in a while. Since hanging up my Red Hat, I’ve been busy doing … stuff. Stuff which I hope to be able to speak about soon. But in the meantime, I wanted to start blogging regularly again. Here’s my first post back, a celebration of an important role associated with open source projects: the community manager.

Open source communities don’t just happen. They require work. Sometimes the technical interest in an open source project is enough to attract a group of people to get involved, but after some time, things are going to get too big for those with a particular bent (documentation, coding, testing) to manage the interactions between the various participants, moderate awkward (or downright aggressive) communications, help encourage new members to contribute, raise the visibility of the project into new areas or market sectors and all the other pieces that go into keeping a project healthy.

Enter the Community Manager. The typical community manager is in that awkward position of having lots of responsibility, but no direct authority. Open source projects being what they are, few of them have empowered “officers”, and even when there are governance structures, they tend to operate by consent of those involved – by negotiated, rather than direct, authority. That said, by the point a community manager is appointed for a community, it’s likely that at least one commercial entity is sufficiently deep into the project to fund or part-fund the community manager position. This means that the community manager will hopefully have some support from at least one set of contributors, but will still need to build consensus across the rest of the community. There may also be tricky times when the community manager will need to decide whether their loyalties lie with their employer or with the community. A wise employer should set expectations about how to deal with such situations before they arise!

What does the community manager need to do, then? The answer to this will depend on a number of issues, and there is likely to be a balance between these tasks, but here’s a list of some that come to mind[1].

  • marketing/outreach – this is about raising visibility of the project, either in areas where it is already known, or in new markets/sectors, but there are lots of sub-tasks such as branding, swag ordering (and distribution!), analyst and press relations.
  • event management – setting up meetups, hackathons, booths at larger events or, for really big projects, organising conferences.
  • community growth – spotting areas where the project could use more help (docs, testing, outreach, coding, diverse and inclusive representation, etc.) and finding ways to recruit contributors to help improve the project.
  • community lubrication – this is about finding ways to keep community members talking to each other, celebrate successes, mourn losses and generally keep conversations civil at least and enthusiastically friendly at best.
  • project strategy – there are times in a project when new pastures may beckon (a new piece of functionality might make the project exciting to the healthcare or the academic astronomy community for instance), and the community manager needs to recognise such opportunities, present them to the community, and help the community steer a path.
  • product management – in conjunction with project strategy, situations are likely to occur when a set of features or functionality is presented to the community which requires decisions about its priority or about the ability of the community to resource it. These may even create tensions between various parts of the community, including involved commercial interests. The community manager needs to help the community reason about how to make choices, and may even be called upon to lead the decision-making process.
  • partner management – as a project grows, partners (open source projects, academic institutions, charities, industry consortia, government departments or commercial organisations) may wish to be associated with the project. Managing expectations, understanding the benefits (or dangers) and relative value can be a complex and time-consuming task, and the community manager is likely to be the first person involved.
  • documentation management – while documentation is only one part of a project, it can often be overlooked by the core code contributors. It is, however, a vital resource when considering many of the tasks associated with the points above. Managing strategy, working with partners, creating press releases: all of these need good documentation, and while it’s unlikely that the community manager will need to write it (well, hopefully not all of it!), making sure that it’s there is likely to be their responsibility.
  • developer enablement – this is providing resources (including, but not restricted to, documentation) to help developers (particularly those new to the project) to get involved in the project. It is often considered a good idea to separate this set of tasks out into a role distinct from that of the community manager, partly because it may require a deeper technical focus than is required for many of the other responsibilities associated with the role. This is probably sensible, but the community manager is likely to want to ensure that developer enablement is well-managed, as without new developers, almost any project will eventually calcify and die.
  • cat herding – programmers (who make up the core of any project) are notoriously difficult to manage. Working with them – particularly encouraging them to work to a specific set of goals – has been likened to herding cats. If you can’t herd cats, you’re likely to struggle as a community manager!

Nobody (well, almost nobody) is going to be an expert in all of these sets of tasks, and many projects won’t need all of them at the same time. Two of the attributes of a well-established community manager are an awareness of the gaps in their expertise and a network of contacts who they can call on for advice or services to fill out those gaps.

I am not – and could never be – a community manager. I don’t have the skills (or the patience), and one of the joys of gaining experience and expertise in the world is realising when others do have skills that you lack, and being able to recognise and celebrate what they can bring to your world that you can’t. So thank you, community managers!


1 – as always, I welcome comments and suggestions for how to improve or extend this list.

The importance of process (and people and rules)

If there is no process, you can throw technology at it as much as you want, but you are still likely to fail.

Those of us in Europe awoke to the news that the US electoral college have voted for Joe Biden as 46th President of the United States of America. Getting to this point has seemed (at least from the outside) to be a rather tortuous route, but from my understanding of how the US Constitution works[1], this is it: the process is complete and Joe Biden will be sworn in as President of the United States on a (probably very chilly) day next month, at the beginning of 2021. I have no intention of weighing the pros and cons of the candidates, nor even of examining the process (sometimes labelled “arcane” by journalists) by which US presidents are elected, but I do want to spend some time on the fact that there is a process, and thinking about how that works, and what supports it.

This is, first and foremost, a blog about IT security (though I have been known to post on a much wider range of issues from time to time), and so I unsurprisingly spend quite a lot of time discussing technology, but on this occasion I want to avoid doing that, as far as possible. If we look at the process for electing a US president, one of the most striking things about it, we might note, is the lack of technology. Yes, there are electronic voting machines to allow votes to be cast, yes, myriad computers are deployed by psephologists[2] to forecast the results, but the actual process is lacking in much that we would normally think of as technology.

We often fixate on technology, but if there is no process in place to get from point A to point B, then you can throw technology at it as much as you want, but you are still likely to fail. Those points may be getting from having no president-elect to having a new president, completing a transaction to buy a house or a paperclip, hiring a new CEO or sous-chef, moving from a set of requirements to a working software program, or literally getting from a point A on a map to point B – they all require a process.

What is a process? Google, courtesy of Oxford Languages, offers the definition: a series of actions or steps taken in order to achieve a particular end. This seems like a useful description, but in the contexts we’re describing, it is the fact that the actions or steps are defined which is important. In the world of computing, we might say that there is an algorithm to be followed to complete the process. This algorithm allows a variety of things, all of which are important:

  1. the writing down and codification of the process;
  2. the allocation of different people to different roles in the process;
  3. norms, rules, regulation and/or legislation to be created to ensure the correct following of the process;
  4. the application of technology to simplify, speed up or automate parts of the process.

I don’t want to talk about point 4 particularly – I spend far too much of my time on that in most of my life – and the ways of achieving point 1 are so diverse as to defy consideration in this context, so let’s briefly discuss points 2 and 3.

Allocating people

If you have a process that you can break into steps, you can assign roles and responsibilities to those steps. This is useful in a variety of ways, the first of which is that you can start to scale the process by having different people working on different steps – sometimes in parallel. Imagine having one person having to count all of the votes in the US presidential election, or even having multiple people doing it, but having to do so in series: it might work, but it’s going to take way too long. Another benefit is one on which the Industrial Revolution was built: specialisation. Some people will be good at some parts of the process, and others at other parts of the process. You can increase efficiency by putting those with expertise on the right pieces of the process. A third, unrelated to efficiency, is separation of responsibilities. Sometimes, it’s important that certain people, who are experts or certified to perform a particular role, are the ones who do that. Often, it’s even more important that certain people don’t perform those roles. An example of this would be if one of the candidates in an election was the one to perform the final tally of votes and hand the result to the person making the announcement, or if they made the announcement themselves. This is equally true for other types of process: your bank does not want you to be the person who provides the final approval for your loan, and a company does not want a spouse, partner or family member to be providing sign-off for a hiring decision.
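Separation of responsibilities is one part of a process that lends itself to a simple automated check. The Python below is a minimal sketch under two assumed rules. First, certain pairs of roles must be held by different people. Second, nobody with a stake in the outcome may hold a role at all. The role names and people are hypothetical, and real organisations will have richer rules than this.

```python
def separation_violations(
    assignments: dict[str, str],
    must_differ: list[tuple[str, str]],
    interested_parties: set[str],
) -> list[str]:
    """List every way the role assignments break the separation rules."""
    problems = []
    # Rule 1: these pairs of roles must not be held by the same person.
    for first, second in must_differ:
        holder = assignments.get(first)
        if holder is not None and holder == assignments.get(second):
            problems.append(f"{first} and {second} are both held by {holder}")
    # Rule 2: nobody with a stake in the outcome may hold any listed role.
    for role, person in assignments.items():
        if person in interested_parties:
            problems.append(f"{role} is held by interested party {person}")
    return problems

# A candidate performing the final tally is flagged.
election = {"final tally": "candidate A", "announcement": "returning officer"}
print(separation_violations(election, [("final tally", "announcement")],
                            {"candidate A", "candidate B"}))

# A loan applicant approving their own loan is flagged.
loan = {"apply": "Alice", "approve": "Alice"}
print(separation_violations(loan, [("apply", "approve")], set()))
```

The interested-party rule here is deliberately blanket, to keep the sketch short. In practice it would be scoped to particular roles, so that the applicant can still apply for their own loan.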

Norms, rules, regulation and legislation

In the UK, we have strong social norms around the process of queuing, and you will be subject to social (and sometimes stronger!) censure if you break them. Rules around other processes may be stronger, and sometimes regulation by an industry body or even legislation at the national level (or multi-national level, such as the EU or UN) is required to safeguard the appropriate execution of a process. The ability of courts to intervene where vote-rigging may have taken place is a good example in the US election process, but legislation and regulation around anything from wiring a house to which fertilisers are allowed on particular crops provide additional levels of checking and assurance that processes are being followed correctly (by including censure or punishment for those who have contravened them) or can be remedied when not (through other processes such as legal review or court cases).

Legislation and regulation can be annoying, but without them (or equivalent rules and norms for other types of process), we cannot be sure what we are getting into, or whether, if we get into it improperly, we will ever get out of it. People support, and are subject to, these checks and balances, and without the combination of all of them (not forgetting the technology as well), processes are next to useless.


1 – I am not a lawyer. Nor a constitutional expert. Nor even a US citizen. Basically, do not take my word for any of this.

2 – I love this word. We should use it more often.

Security, cost and usability (pick 2)

If we cannot be explicit that there is a trade-off, it’s always security that loses.

Everybody wants security: why wouldn’t you? Let’s role-play: you’re a software engineer on a project to create a security product. There comes a time in the product life-cycle when it’s nearly due, and, as usual, time is tight. So you’re in the regular project meeting and the product manager’s there, so you ask them what they want you to do: should you prioritise security? The product manager is very clear[1]: they will tell you that they want the product as secure as possible – and they’re right, because that’s what customers want. I’ve never spoken to a customer (and I’ve spoken to lots of customers over the years) who said that they’d prefer a product which wasn’t as secure as possible. But there’s a problem, which is that all customers also want their products tomorrow – in fact, most customers want their products today, if not yesterday.

Luckily, products can generally be produced more quickly if more resources are applied (though Frederick Brooks’ The Mythical Man-Month tells us that simply adding more engineers is actually likely to have a negative impact), so the requirement for speed of delivery can be translated into cost. There’s another thing that customers want, however, and that is for products to be easy to use: who wants to get a new product only to find that it takes months to integrate, or that it’s almost impossible for their employees to run it as they expect?

So, to clarify, customers want a security product to be the following:

  1. secure – security is a strong requirement for many enterprises and organisations[3], and although we shouldn’t ever use the word secure on its own, that’s still what customers want;
  2. cheap – nobody wants to pay more than the minimum they can;
  3. usable – everybody likes simple-to-use, easy-to-integrate applications.

There’s a problem, however, which is that out of the three properties above, you can only choose two for any application or project. You say this to your product manager (who’s always right, remember[1]), and they’ll say: “don’t be ridiculous! I want all three”.

But it just doesn’t work like that: why? Here’s my take on the reasons. Security, simply stated, is designed to stop people doing things. Seen from the point of view of a user, security’s effect is to reduce usability. “Doing security” generally involves applying controls to actions in a system – whether by users or non-human entities – and the simplest way to apply it is “blanket security”: defaulting to blocking or denying actions. This is sometimes known as “fail-safe” or “fail-closed” behaviour.

Let’s take an example: you have a simple internal network in your office and you wish to implement a firewall between your network and the Internet, to stop malicious actors from probing your internal machines and to stop compromised systems on the internal network from communicating out to the Internet. “Easy,” you think, and set up a DENY ALL rule for connections originating outside the firewall, and a DENY ALL rule for connections originating inside the firewall, with the addition of an ALLOW rule for all outgoing connections on port 443, so that people can use web browsers to make HTTPS connections. You set up the firewall and get ready to head home, knowing that your work is done. But then the problems arise:

  • it turns out that some users would like to be able to send email, which requires a different outgoing port number;
  • sending email often goes hand in hand with receiving email, so you need to allow incoming connections to your mail server;
  • one of your printers has been compromised, and is making connections over port 443 to an external botnet;
  • in order to administer the pay system, your accountant – who is not a full-time employee and works from home – needs to access your network via a VPN, which requires the ability to accept an incoming connection.

Your “easy” just became more difficult – and it’s going to get more difficult still as more users start encountering what they will see as your attempts to make their day-to-day revenue-generating lives more difficult.
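The “easy” policy can be modelled in a few lines of Python, which also shows why it runs into trouble. This is a sketch, not real firewall syntax: the `Rule` type and `evaluate()` function are hypothetical, rules are checked in order with the first match winning, and anything unmatched is denied (failing closed). The port numbers used for email (587 for outgoing submission, 25 for incoming mail) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "ALLOW" or "DENY"
    direction: str              # "out" = originating inside, "in" = originating outside
    port: Optional[int] = None  # None matches any port


def evaluate(rules, direction, port):
    """Return the action of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if rule.direction == direction and rule.port in (None, port):
            return rule.action
    return "DENY"  # default deny: fail closed


# The "easy" policy: allow outgoing HTTPS, deny everything else.
rules = [
    Rule("ALLOW", "out", 443),
    Rule("DENY", "out"),
    Rule("DENY", "in"),
]

print(evaluate(rules, "out", 443))  # user's web browser
print(evaluate(rules, "out", 443))  # compromised printer talking to a botnet
print(evaluate(rules, "out", 587))  # user sending email
print(evaluate(rules, "in", 25))    # incoming mail for your mail server
```

The compromised printer’s traffic is allowed by exactly the same rule as a user’s web browser: a rule that matches only on direction and port cannot tell them apart, so fixing it means adding information (such as the source host) and, with it, more complexity.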

This is a very simple scenario, but it’s clear that in order to allow people actually to use a system, you need to spend a lot more time understanding how security will interact with it, and how people’s experience of the measures you put in place will be impacted. Usability and user experience (“UX”) is a complex field on its own, but when you combine it with the extra requirements around security, things become even more tricky.

You need to manage both the requirements of users to whom the security measures should be transparent (“TLS encryption should be on by default”) and those of users who may need much more control (“developers need to be able to select the TLS cipher suite options when connecting to a vendor’s database”), which means that you need to understand the different personae[4] you are targeting for your application. You also need to understand the different failure modes, and what the correct behaviour should be: if authentication fails three times in a row, for example, should the medical professional who is trying to get an urgent blood test result be locked out of the system, or should the result be provided, and a message sent to an administrator? There will be more decisions to make, based on what your application does, the security policies of your customers, their risk profiles, and more. All of these investigations and decisions take time, and time equates to money. What is more, they also require expertise – in both security and usability – and that is in itself expensive.
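One way to see that failure modes are a design decision rather than an accident is to write the policy down. The Python sketch below encodes one possible answer to the blood-test question; the threshold of three, the `Outcome` values and the `after_failed_login()` function are hypothetical, and whether the “release and alert” branch should exist at all is precisely the trade-off being discussed.

```python
from enum import Enum

# Hypothetical threshold, taken from the "three times in a row" example.
MAX_FAILURES = 3


class Outcome(Enum):
    RETRY = "ask the user to try again"
    LOCK_OUT = "lock the account (fail closed)"
    RELEASE_AND_ALERT = "release the result and alert an administrator"


def after_failed_login(consecutive_failures: int, urgent_request: bool) -> Outcome:
    """Decide what happens after a failed authentication attempt.

    One possible policy: below the threshold the user may retry; at or above
    it, urgent requests are released with an alert ("break glass"), and
    everything else is locked out.
    """
    if consecutive_failures < MAX_FAILURES:
        return Outcome.RETRY
    if urgent_request:
        return Outcome.RELEASE_AND_ALERT
    return Outcome.LOCK_OUT


print(after_failed_login(2, urgent_request=False).value)
print(after_failed_login(3, urgent_request=False).value)
print(after_failed_login(3, urgent_request=True).value)
```

Deleting the `urgent_request` branch gives the pure fail-closed policy; keeping it trades some security for usability, and the alert is what keeps that trade visible.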

So, you have three options:

  1. choose usability and cost – you can prioritise usability and low cost, but you won’t be able to apply security as you might like;
  2. choose security and cost – in this case, you can apply more security to the system, but you need to be aware that usability – and therefore your customer’s acceptance of the system – will suffer;
  3. choose usability and security – I wish this was the one that we chose every time: you decide that you’re willing to wait longer or pay more for a more secure product, which people can use.

I’m not going to pretend that these are easy decisions, nor that they are always clear cut. And a product manager’s job is sometimes to make difficult choices – hopefully ones which can be re-balanced in a later release, but difficult choices nevertheless. It’s really important, however, that anyone involved in security – as an engineer, as a UX expert, as a product manager, as a customer – understands the trade-off here. If we cannot be explicit that there is a trade-off, then the trade-off will be made silently, and in my experience, it’s always security that loses.


1 – and right: product managers are always right[2].

2 – I know: I used to be a product manager.

3 – and the main subject of this blog, so it shouldn’t be a surprise that I’m writing about it.

4 – or personas if you really, really must. I got an “A” in Latin O level, and I’m not letting this one go.

In praise of triage

It’s all too easy to prioritise based on the “golfing test”.

Not all bugs are created equal.

Some bugs need fixing now, some bugs can wait. Some bugs are in your implementation, some are in the underlying design. Some bugs will annoy a few customers, some will destroy your business.

Bugs come in all shapes and sizes, and one of the tasks of a product owner, product manager, chief architect – whoever makes the call about where to assign resources – is to decide which ones to address in which order: to prioritise them. The problem is deciding how to prioritise them. It’s all too easy to prioritise based on the “golfing test”: your CEO meets someone on a golf course who mentions that his or her company loves your product, except for one tiny issue. The CEO comes back and makes it clear that fixing this “major bug” is now your one and only task until it’s done, and your world is turned upside down. You have to fix the bug as quickly as possible, with no thought to the impact it has on the rest of the project, or the immense pile of technical debt that’s just been accrued. You don’t want to live in this world. What, then, is the alternative?

The answer – though it’s only the beginning of the answer – is triage. Triage (from the French for “separating out”) comes from the world of battlefield medicine. When deciding which wounded soldiers to treat, medics carry out rapid (hopefully objective) assessments, allowing each soldier to be sorted quickly, typically into categories such as “not urgent: wait”, “urgent: treat immediately” and “not saveable: do not treat”. We can apply the same approach to software bugs in order to decide what to treat (fix) and with what priority. The important thing is not so much the categories – which will vary based on your context – but the assessment criteria, and how they are applied. Here is a list of just some of the possible criteria:

  • likely monetary impact per customer
  • number of customers impacted
  • reputational impact on your organisation
  • ease of fixing
  • impact on system security
  • impact on system performance
  • impact on system stability
  • annoyance of CEO at not being listened to.

We do not, of course, need to apply only one of these: a number of them can be combined with a weighting system, though the more you add, the less clear your priorities will be, and the more likely it is that someone will “put a finger on the scales” – tweak the numbers to give the outcome they want. Another important point about the criteria that you decide to apply is that they should be as measurable as you can make them, to allow scoring to be as objective as possible. I wrote a review of the book Building Evolutionary Architectures a while ago: the methodology adopted there, where you measure and test in order to meet specific criteria, is exactly the sort of approach you should be choosing when designing your triage system.
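A weighted scoring system of the kind described above might look something like this minimal Python sketch. The criteria names, the 0 to 10 scoring scale, the weights and the example bugs are all hypothetical; the point is that the weights are explicit, so changing the organisation’s priorities (for instance, giving security more weight) visibly changes the order.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each criterion matters to this organisation
# right now. They sum to 1.0 so scores stay on the same 0-10 scale.
WEIGHTS = {
    "customers_impacted": 0.30,
    "security_impact": 0.25,
    "stability_impact": 0.25,
    "ease_of_fix": 0.20,  # higher score = easier to fix
}


@dataclass
class Bug:
    name: str
    scores: dict  # criterion name -> measured score from 0 to 10


def priority(bug, weights=WEIGHTS):
    """Weighted sum of the bug's scores; missing criteria count as 0."""
    return sum(w * bug.scores.get(criterion, 0) for criterion, w in weights.items())


def triage(bugs, weights=WEIGHTS):
    """Return the bugs ordered from highest to lowest priority."""
    return sorted(bugs, key=lambda b: priority(b, weights), reverse=True)


bugs = [
    Bug("golf-course bug", {"customers_impacted": 1, "ease_of_fix": 3}),
    Bug("crash on upgrade", {"customers_impacted": 8, "stability_impact": 9, "ease_of_fix": 4}),
    Bug("weak session token", {"customers_impacted": 5, "security_impact": 9, "ease_of_fix": 6}),
]

for bug in triage(bugs):
    print(f"{priority(bug):.2f}  {bug.name}")
```

Keeping the weights somewhere visible and version-controlled makes it easier to spot when someone has put a finger on the scales, and leaves a record if a product owner later overrides the result.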

This is (ostensibly) a blog about security, and so you might expect me to say that “security always wins”, but that should absolutely not be the approach you take. Security might be the most important criterion for you (that is, it might carry the most weight), but you need to understand why that is the case – at this particular time – and what exactly you mean by “security”. The “security of the system” is not an objective measure: in order to mean anything, such a phrase needs to reference measurements that can be made (“resistance to physical tampering”, “resistance to brute force attacks”, “number of PhD students likely to be needed to reverse engineer our ‘secure’ protocol”[1]). More importantly, it may be that at this point in your organisation’s life, the damage done by lack of stability or decreased performance outweighs the impact of a security bug. If that’s the case, then your measurements should encapsulate that information and lead you to prioritise bugs with impact in these categories over security issues[3].

There’s one proviso that I feel I need to put in at this point, and it’s about the power of what, in Agile Methodology terms, is called the Product Owner. This is the person who represents the users of the product/project, and should have final say about the direction of development in terms of features, functionality and, most relevant in this discussion, bug-fixing. As noted above, this may be an architect, product manager or someone enjoying another title, but their role should be clear: they get to call the shots. There are times when this person goes against the evidence provided by the triage, and makes a decision to prioritise a particular bug over others despite the outcome of the measurements. This is typically very painful for the technical team[4], but, when it comes down to it, as the product owner, they get to decide. The technical team – after appropriate warnings and discussion[5] – must be ready to step aside and accept the decision. Such decisions (and related discussions) should be recorded, and the product owner must be ready to stand or fall based on the outcome, but that is their job. Triage is a guide, and there are occasions when there are measurements which cannot be easily made objectively, and which sit outside the expertise or scope of knowledge of the technical team. If this sort of decision keeps being made, and you think you know better, you may have a future in technical product management, where people with a view of both the technical and the business side of technology are much in demand. In the end, though, the product owner will need to justify their decision to management, and if they get it wrong, then they must be ready to take the blame (this is one reason why you should make sure that you’ve recorded the process taken to get to this decision – you don’t want to take the blame for a poor decision which you advised against).

So: go out and design a triage process, be ready to follow it, and be ready to defend it. Oh, and one last point: you might want to buy a set of golf clubs.

—–

1 – this last one is a joke: don’t design your own protocol, or if you do, make it open and have it peer-reviewed[2].

2 – and then throw it away and use an open source implementation of a better, more thoroughly-reviewed one.

3 – much as it pains me to say it.

4 – I’ve been on both sides of these decisions: I know.

5 – often rather heated, in my experience.

Are you positive?

What do pregnancy tests and the Ukrainian aircraft missile strike have in common?

Not everything in life is nicely binary, much as we[1] might like it to be. There are shades of grey[2] in many aspects of life, and though humans can often cope with uncertainty, computer systems are less good at it: they generally want a “yes” or “no” answer. This means that decisions sometimes need to be made on incomplete evidence, and, well, that means that the answers aren’t always correct. There’s a whole area of computer science related to this: fuzzy logic.

Let’s look into what the options are. Assume that we’re looking at two options: “yes” (a positive) and “no” (a negative). That means that there are two ways in which the answer can be incorrect:

  1. a “yes” answer was incorrectly chosen (false positive);
  2. a “no” answer was incorrectly chosen (false negative).

An example to allow us to explore this is pregnancy. It’s generally agreed that you can’t be a little bit pregnant: if you take a test, any result it gives you needs to be either positive or negative. If you are pregnant, and a test result comes back negative, then that’s a false negative. If you are not pregnant, and a test comes back positive, that’s a false positive. The implications of a false positive or a false negative can both be pretty major – as anybody who has received one will tell you. I spent a little time online trying to find expected false positive and false negative rates for pregnancy tests, but it turns out that the rates are so dependent on a variety of factors that it was difficult to find a sensible answer[3].
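The four possible outcomes of a yes/no decision (the two correct answers plus the two kinds of error) can be tallied with a few lines of Python. The `outcome()` helper is hypothetical, and the results below are made up purely for illustration: they say nothing about real pregnancy test accuracy.

```python
from collections import Counter


def outcome(actual: bool, predicted: bool) -> str:
    """Label one yes/no decision by comparing it with the truth."""
    if predicted:
        return "true positive" if actual else "false positive"
    return "false negative" if actual else "true negative"


# Made-up results: (actually pregnant?, test said positive?)
results = [
    (True, True),    # correct "yes"
    (False, True),   # "yes" when the answer should be "no"
    (True, False),   # "no" when the answer should be "yes"
    (False, False),  # correct "no"
    (False, False),  # correct "no"
]

tally = Counter(outcome(actual, predicted) for actual, predicted in results)
for label in ("true positive", "false positive", "false negative", "true negative"):
    print(f"{label}: {tally[label]}")
```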

A tragic recent example of a false positive took place on Wednesday, 8th January 2020, when a Ukraine International Airlines flight was shot down by an Iranian missile, killing all 176 people on board. It appears that an air defence radar system misidentified the aircraft as a cruise missile. As the radar system was looking for a positive identification of a threat, this can be counted as a false positive.

What might have been the alternative in this case? If the aircraft actually had been a cruise missile, but was identified as a civilian aircraft, this would have been a false negative, and the impact might well have been significant damage to an Iranian military installation.

Which is more damaging? Well, in the case of the aircraft, it would seem pretty clear to most observers that the false positive would be worse, but from a military point of view, that might not be the case. Maybe the impact of a missile strike on a major military installation might be considered worse than the civilian loss of life in the other case. In this case, as in many others, a decision needs to be made as to which is more important to reduce: the chance of a false negative or the chance of a false positive? In a perfect world, of course, there would be no false results, negative or positive. The problem with many systems that take analogue[4] inputs and turn them into digital outputs in this way is that avoiding false results is very costly, and sometimes impossible. Even worse news is that reducing the probability of one of the two types of false result tends to increase the probability of the other.
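The trade-off between the two kinds of error is easiest to see when a system turns a continuous score into a yes/no answer by comparing it with a threshold. In this Python sketch, the scores are invented and deliberately overlapping (some non-threats score higher than some threats), and `error_counts()` is a hypothetical helper: raising the threshold removes false positives only by adding false negatives.

```python
def error_counts(samples, threshold):
    """Count (false positives, false negatives) at a given decision threshold.

    Each sample is (score, is_threat); a score at or above the threshold
    is treated as a "yes, this is a threat".
    """
    fp = sum(1 for score, is_threat in samples if score >= threshold and not is_threat)
    fn = sum(1 for score, is_threat in samples if score < threshold and is_threat)
    return fp, fn


# Invented, overlapping detector scores: (score, really a threat?)
samples = [
    (0.10, False), (0.25, False), (0.40, False), (0.55, False), (0.70, False),
    (0.35, True), (0.60, True), (0.75, True), (0.85, True), (0.95, True),
]

for threshold in (0.3, 0.5, 0.7, 0.9):
    fp, fn = error_counts(samples, threshold)
    print(f"threshold {threshold:.1f}: {fp} false positives, {fn} false negatives")
```

With these made-up numbers, moving the threshold from 0.3 to 0.9 takes false positives from 3 down to 0, while false negatives rise from 0 to 4.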

A classic example of this is in the use of biometrics for user identification. Fingerprints, facial recognition, iris scanning and similar techniques have to balance the likelihood of a false positive with a false negative. Which is worse: the chance that the CEO will not be able to update the payroll details, or that a rogue employee will update her details to improve her salary package?[5]

One good piece of news is that AI/ML (Artificial Intelligence/Machine Learning) is improving the performance of biometric systems and, in fact, other areas of computing where “fuzzy logic” is required. In most cases, humans are still better at reducing messy sets of information to yes/no results, but that is changing, and where multiple automated decisions need to be made, AI/ML is worth considering.

Whenever you are dealing with “messy” data[6] which needs to be reduced to a “yes/no” or “positive/negative” binary result, you need to consider the likelihood of false positives or negatives. Not only do you need to consider the likelihood of each, but also the impact of each. Once you have understood these, you can then decide which you want to try to minimise, and what techniques you should use to do so.
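Once you have estimated the impact of each kind of error, one simple (and admittedly crude) way to decide where to draw the line is to pick the threshold with the lowest total cost. In this Python sketch, framed around the biometric example, a sample is a match score plus whether the person really is who they claim to be; the scores, the candidate thresholds, the costs and the helper names are all hypothetical.

```python
def total_cost(samples, threshold, cost_fp, cost_fn):
    """Weight each kind of error by its (estimated) impact and add them up.

    A false positive here is an impostor being accepted; a false negative
    is a genuine user (say, the CEO) being rejected.
    """
    fp = sum(1 for score, genuine in samples if score >= threshold and not genuine)
    fn = sum(1 for score, genuine in samples if score < threshold and genuine)
    return fp * cost_fp + fn * cost_fn


def best_threshold(samples, thresholds, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost."""
    return min(thresholds, key=lambda t: total_cost(samples, t, cost_fp, cost_fn))


# Invented match scores: (score, really the claimed person?)
samples = [
    (0.10, False), (0.25, False), (0.40, False), (0.55, False), (0.70, False),
    (0.35, True), (0.60, True), (0.75, True), (0.85, True), (0.95, True),
]
thresholds = (0.3, 0.5, 0.7, 0.9)

# Letting in a rogue employee judged 10x worse than locking out the CEO...
print(best_threshold(samples, thresholds, cost_fp=10, cost_fn=1))
# ...and the reverse judgement.
print(best_threshold(samples, thresholds, cost_fp=1, cost_fn=10))
```

If accepting an impostor is judged ten times worse than rejecting a genuine user, the sketch picks the strictest threshold (0.9); reverse the costs and it picks the most lenient (0.3). The data hasn’t changed, only the judgement about impact.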

We may be stuck with false results, but we need to understand what our choices are, and how we can get the best outcomes available from messy data.


1 – in security, but I’m sure this goes for lots of other people, too.

2 – “gray” for our non-Commonwealth readers.

3 – good advice seems to be to test several times over several days.

4 – “analog”, I suppose: see [2].

5 – this is one of the reasons that authentication systems generally use two factors from the three: “something you are”, “something you know”, “something you have”.

6 – most real-world data, to be honest.