Functional vs non-functional requirements: a dangerous dichotomy?

Non-functional requirements are at least as important as functional requirements.

Imagine you’re thinking about an application or a system: how would you describe it? How would you explain what you want it to do? Most of us, I think, would start with statements like:

  • it should read JPEGs and output SVG images;
  • it should buy the stocks I tell it to when they reach a particular price;
  • it should take a customer’s credit history and decide whether to approve a loan application;
  • it should ensure that the car maintains a specific speed unless the driver presses the brakes or disengages the system;
  • it should level me up when I hit 10,000 gold bars mined;
  • it should take a prompt and output several hundred words about a security topic that sound as if I wrote them;
  • it should strike out any text which would give away its non-human status.

These are all requirements on the system. Specifically, they are functional requirements: they are things that an application or a system should do based on the state of inputs and outputs to which it is exposed.

Now let’s look at another set of requirements: requirements which are important to the correct operation of the system, but which aren’t core to what it does. These are non-functional requirements, in that they don’t describe the functions it performs, but its broader operation. Here are some examples:

  • it should not leak cryptographic keys if someone performs a side-channel attack on it;
  • it should be able to be deployed on premises or in the Cloud;
  • it should be able to manage 30,000 transactions a second;
  • it should not slow stop a user’s phone from receiving a phone call when it is running;
  • it should not fail catastrophically, but degrade its performance gracefully under high load;
  • it should be allowed to empty the bank accounts of its human masters;
  • it should recover from unexpected failures, such as its operator switching off the power in a panic on seeing unexpected financial transactions.

You may notice that some of the non-functional requirements are expressed as negatives – “it should not” – this is fairly common, and though functional requirements are sometimes expressed in the negative, it is more rare.

So now we come to the important question, and the core of this article: which of the above lists is more important? Is it the list with the functional requirements or the non-functional requirements? I think that there’s a fair case to be made for the latter: the non-functional requirements. Even if that’s not always the case, my (far too) many years of requirements gathering (and requirements meeting) lead me to note that while there may be a core set of functional requirements that typically are very important, it’s very easy for a design, architecture or specification to collect more and more functional requirements which pale into insignificance against some of the non-functional requirements that accrue.

But the problem is that non-functional requirements are almost always second-class citizens when compared to functional requirements on an application or system. They are are often collected after the functional requirements – if at all – and are often the first to be discarded when things get complicated. They also typically require input from people with skill sets outside the context of the application or system: for instance, it may not be obvious to the designer of a back-end banking application that they need to consider data-in-use protection (such as Confidential Computing) when they are collecting requirements of an application which will initially be run in an internal data centre.

Agile and DevOps methodologies can be relevant in these contexts, as well. On the one hand, ensuring that the people who will be operating an application or system is likely to focus their minds on some of the non-functional requirements which might impact them if they are not considered early enough. On the other hand, however, a model of development where the the key performance indicator is having something that runs means that the functional requirements are fore-grounded (“yes, you can log in – though we’re not actually checking passwords yet…”).

What’s the take-away from this article? It’s to consider non-functional requirements as at least as important as functional requirements. Alongside that, it’s vital to be aware that the people in charge of designing, architecting and specifying an application or system may not be best placed to collect all of the broader requirements that are, in fact, core to its safe and continuing (business critical) operation.

What’s a secure channel?

Always beware of products and services which call themselves “secure”. Because they’re not.

A friend asked me what I considered a secure channel a couple of months ago, and it made me think. Many of us have information that we wish to communicate which we’d rather other people can’t look at, for all sorts of reasons. These might range from present ideas for our spouse or partner sent by a friend to my phone to diplomatic communications about espionage targets sent between embassies over the Internet, with lots in between: intellectual property discussions, bank transactions and much else. Sometimes, we want to ensure that people can’t change what’s in the messages we send: it might be OK for other people to know that I pay £300 in rent, but not for them to be able to change the amount (or the bank account into which it goes). These two properties are referred to as confidentiality (keeping information secret) and integrity (keeping information unchangeable), and often you want to combine them – in the case of our espionage plans, I’d prefer that my enemies don’t know what targets are at risk, but also that they don’t change the targets I’ve selected to something less bothersome for them.

Modern encryption systems generally provide both confidentiality and integrity for messages and data, so I’m going to treat these as standard properties for an encrypted channel. Which means that if I use encryption on a channel, it’s secure, right?

Hmm. Let’s step back a bit, because, unfortunately, there’s rather a lot more to unpack than that. Three of the questions we need to tackle should give us pause. They are: “secure from whom?”, “secure for how long?” and “secure where?”. The answers we give to these questions will be important, and though they are all somewhat intertwined, I’m going to deal with them in order, and I’m going to use the examples of the espionage message and the present ideas to discuss them. I’m also going to talk more about confidentiality than integrity – though we’ll assume that both properties are important to what we mean by “secure”.

Secure from whom?

In our examples, we have very different sets of people wanting to read our messages – a nation state and my spouse. Unless my spouse has access to skills and facilities of which I’m unaware (and I wouldn’t put it past her), the resources that she has at her disposal to try to break the security of my communication are both fewer and less powerful than those of the nation state. A nation state may be able to apply cryptologic attacks to messages, attack the software (and even firmware or hardware) implementations of the encryption system, mess with the amount of entropy available for key generation at either or both ends of the channel, perform interception (e.g. Person-In-The-Middle) attacks, coerce the sender or recipient of the message and more. I’m hoping that most of the above are not options for my wife (though coercion might be, I suppose!). The choice of encryption system, including entropy sources, cipher suite(s), hardware and software implementation are all vital in the diplomatic message case, as are vetting of staff and many other issues. In the case of gift ideas for my wife’s birthday, I’m assuming that a standard implementation of a commercial messaging system should be enough.

Secure for how long?

It’s only a few days till my wife’s birthday (yes, I have got her a present, though that does remind me; I need a card…), so I only have to keep the gift ideas secure for a little longer. It turns out that, in this case, the time sensitivity of the integrity of the message is different to that of the confidentiality: even if she managed to change what the gift idea in the message was, it wouldn’t make a difference to what I’ve got her at this point. However, I’d still prefer if she didn’t know what the gift ideas are.

In the case of the diplomatic espionage message, we can assume that confidentiality and the integrity are both important for a much longer time, but we’ll concentrate on the confidentiality. Obviously an attacking country would prefer it if the target were unaware of an attack before it happened, but if the enemy managed to prove an attack was performed by the message sender’s or recipient’s country, even a decade or more in the future, this could also lead to major (and negative) consequences. We want to ensure that whatever steps we take to protect the message are sufficient that access to a copy of the message taken when it was sent (via wire-tapping, for instance) or retrieved at a later date (via access to a message store in the future), is insufficient to allow it to be cracked. This is tricky, and the history of cryptologic attacks on encryption schemes, not to mention human failures (such as leaks) and advances in computation (such as quantum computing) should serve as a strong warning that we need to consider very carefully what mechanisms we should use to protect our messages.

Secure where?

Are the embassies secure? Are all the machines between the embassies secure? Is the message stored before delivery? If so, is it stored on a machine within the embassy or on a server elsewhere? Is it end-to-end encrypted, or is it decrypted before delivery and then re-encrypted (I really, really hope not). While this is unlikely in the case of diplomatic messages, a good number of commercially sensitive messages (including much email) is not end-to-end encrypted, leading to vulnerabilities if someone trying to break the security can get access to the system where they are stored, or intercept them between decryption and re-encryption.

Typically, we have better control over different parts of the infrastructure which carry or host our communications than we do over others. For most of the article above, I’ve generally assumed that the nation state trying to read embassy message is going to have more relevant resources to try to breach the security of the message than my wife does, but there’s a significant weakness in protecting my wife’s gift idea: she has easy access to my phone. I tend to keep it locked, and it has a PIN, but, if I’m honest, I don’t tend to go out of my way to keep her out: the PIN is to deter someone who might steal it. Equally, it’s entirely possible that I may be sharing some material (a video or news article) with her at exactly the time that the gift idea message arrives from our mutual friend, leading her to see the notification. In either case, there’s a good chance that the property of confidentiality is not that strong after all.

Conclusion

I’ve said it before, and I plan to say it again (and again, and again): there is no “secure”. When we talk about secure channels, we must be aware that what we mean should be “channels secured with appropriate measures to protect against the risks associated with the security being compromised”. This is a long way of saying “if I’m protecting diplomatic messages, I need to make greater efforts than if I’m trying to stop my wife finding out ahead of time what she’s getting for her birthday”, but it’s important to understand this. Part of the problem is that we’re bombarded with words like “secure”, which are unqualified, and may lead us to think that they’re absolute, when they’re absolutely not. Another part of the problem is that once we’ve put one type of security in place, particularly when it’s sold or marketed as “best in breed” or “best practice”, that it addresses all of the issues we might have. This is clearly not the case – using the strongest encryption possible for messages between my friend and me isn’t going to stop my wife from knowing I’ve bought her if knows the PIN for my phone. Please, please, consider what you need when you’re protecting your communications (and other data, of course), and always beware of products and services which call themselves “secure”. Because they’re not.

Open source and cyberwar

If cyberattacks happen to the open source community, the impact may be greater than you expect.

There are some things that it’s more comfortable not thinking about, and one of them is war. For many of us, direct, physical violence is a long way from us, and that’s something for which we can be very thankful. As the threat of physical violence recedes, however, it’s clear that the spectre of cyberattacks as part of a response to aggression – physical or virtual – is becoming more and more likely.

It’s well attested that many countries have “cyber-response capabilities”, and those will include aggressive as well as protective measures. And some nation states have made it clear not only that they consider cyberwarfare part of any conflict, but that they would be entirely comfortable with initiating cyberwarfare with attacks.

What, you should probably be asking, has that to do with us? And by “us”, I mean the open source software community. I think that the answer, I’m afraid, is “a great deal”. I should make it clear that I’m not speaking from a place of privileged knowledge here, but rather from thoughtful and fairly informed opinion. But it occurs to me that the “old style” of cyberattacks, against standard “critical infrastructure” like military installations, power plants and the telephone service, was clearly obsolete when the Two Towers collapsed (if not in 1992, when the film Sneakers hypothesised attacks against targets like civil aviation). Which means that any type of infrastructure or economic system is a target, and I think that open source is up there. Let me explore two ways in which open source may be a target.

Active targets

If we had been able to pretend that open source wasn’t a core part of the infrastructure of nations all over the globe, that self-delusion was finally wiped away by the log4j vulnerabilities and attacks. Open source is everywhere now, and whether or not your applications are running any open source, the chances are that you deploy applications to public clouds running open source, at least some of your employees use an open source operating system on their phones, and that the servers running your chat channels, email providers, Internet providers and beyond make use – extensive use – of open source software: think apache, think bind, think kubernetes. At one level, this is great, because it means that it’s possible for bugs to be found and fixed before they can be turned into vulnerabilities, but that’s only true if enough attention is being paid to the code in the first place. We know that attackers will have been stockpiling exploits, and many of them will be against proprietary software, but given the amount of open source deployed out there, they’d be foolish not to be collecting exploits against that as well.

Passive targets

I hate to say it, but there also are what I’d call “passive targets”, those which aren’t necessarily first tier targets, but whose operation is important to the safe, continued working of our societies and economies, and which are intimately related to open source and open source communities. Two of the more obvious ones are GitHub and GitLab, which hold huge amounts of our core commonwealth, but long-term attacks on foundations such as the Apache Foundation and the Linux Foundation, let alone kernel.org, could also have impact on how we, as a community, work. Things are maybe slightly better in terms of infrastructure like chat services (as there’s a choice of more than one, and it’s easier to host your own instance), but there aren’t that many public servers, and a major attack on either them or the underlying cloud services on which many of them rely could be crippling.

Of course, the impact on your community, business or organisation will depend on your usage of difference pieces of infrastructure, how reliant you are on them for your day-to-day operation, and what mitigations you have available to you. Let’s quickly touch on that.

What can I do?

The Internet was famously designed to route around issues – attacks, in fact – and that helps. But, particularly where there’s a pretty homogeneous software stack, attacks on infrastructure could still have very major impact. Start thinking now:

  • how would I support my customers if my main chat server went down?
  • could I continue to develop if my main git provider became unavailable?
  • would we be able to offer at least reduced services if a cloud provider lost connectivity for more than an hour or two?

By doing an analysis of what your business dependencies are, you have the opportunity to plan for at least some of the contingencies (although, as I note in my book, Trust in Computer Systems and the Cloud, the chances of your being able to analyse the entire stack, or discover all of the dependencies, is lower than you might think).

What else can you do? Patch and upgrade – make sure that whatever you’re running is the highest (supported!) version. Make back-ups of anything which is business critical. This should include not just your code but issues and bug-tracking, documentation and sales information. Finally, consider having backup services available for time-critical services like a customer support chat line.

Cyberattacks may not happen to your business or organisation directly, but if they happen to the open source community, the impact may be greater than you expect. Analyse. Plan. Mitigate.

Win a copy of my book!

What’s better than excerpts? That’s right: the entire book.

As regular readers of this blog will know, I’ve got a book coming out with Wiley soon. It’s called “Trust in Computer Systems and the Cloud”, and the publisher’s blurb is available here. We’ve now got to the stage where we’ve completed not only the proof-reading for the main text, but also the front matter (acknowledgements, dedication, stuff like that), cover and “praise page”. I’d not heard the term before, but it’s where endorsements of the book go, and I’m very, very excited by the extremely kind comments from a variety of industry leaders which you’ll find quoted there and, in some cases, on the cover. You can find a copy of the cover (without endorsement) below.

Trust book front cover (without endorsement)

I’ve spent a lot of time on this book, and I’ve written a few articles about it, including providing a chapter index and summary to let you get a good idea of what it’s about. More than that, some of the articles here actually contain edited excerpts from the book.

What’s better than excerpts, though? That’s right: the entire book. Instead of an article today, however, I’m offering the opportunity to win a copy of the book. All you need to do is follow this blog (with email updates, as otherwise I can’t contact you), and when it’s published (soon, we hope – the March date should be beaten), I’ll choose one lucky follower to receive a copy.

No Wiley employees, please, but other than that, go for it, and I’ll endeavour to get you a copy as soon as I have any available. I’ll try to get it to you pretty much anywhere in the world, as well. So far, it’s only available in English, so apologies if you were hoping for an immediate copy in another language (hint: let me know, and I’ll lobby my publisher for a translation!).

Organisational suppleness

Growing the ability to react to the unexpected is a valuable skill.

“In preparing for battle I have always found that plans are useless but planning is indispensable.”

Dwight D. Eisenhower

Much of this blog is about security – cybersecurity – in one way or another, but on occasion I do try to take a broader view. Cybersecurity is often modelled or described in military terms, talking about “fighting battles”, “wars of attrition” and “arms races” with “attackers”. These can be useful metaphors (and it’s why I started this article with a quote from a general), but there is a broader set of responsibilities that many of us in the sector need to consider, which is the continued (and hopefully healthy) functioning of our businesses and organisations. In particular, I like to talk about risk and how it relates not just to security, but to how businesses work and plan. One theme that I’ve visited before is that known or planned degradation of a service is often significantly better than failure, or even planned closure (see Service degradation: actually a good thing). My argument is that there are many occasions where keeping a service or business function running, albeit at reduced capacity, or with reductions in known capabilities, allows for better continuity than just stopping it.

Keeping a service running requires work. You can’t just hope that everything is installed and will run as you expect: what happens when your administrator is ill, your fibre-optic cable gets severed by a back hoe, or a DDoS attack is directed at you? You need to plan and practice what to do in these situations. What I’d like to explore in this article goes somewhat beyond the expectation of that planning in three directions. Let’s call them scenario coverage, muscle memory and organisational suppleness.

Scenario coverage

The first, and most obvious of the three directions, is about understanding eventualities. The more scenarios that we model and practice, the more we reduce our risk, simply because we have reduced the number of unknown eventualities in the probability space. There is a actually a side benefit to modelling lost of scenarios, which is that the more situations you consider, the more will come to mind. Every situation involves sets of choices or probabilities – “after the door closes, will it lock or not?” or “if the coolant fails, will the system turn off or burst into flames?” – and the more scenarios you consider, the more questions will arise. This can be daunting – and it’s almost impossible to consider every eventuality – but the more options are covered, the better your opportunities to mitigate the various risks they present.

Muscle memory

Muscle memory is what comes with training and practice. Assuming that you are including your teams in the scenario planning

And I’m assuming here that the planning isn’t solely a paper exercise. Theoretical planning, while useful, only goes so far, for a couple of important reasons:

  • systems will always fails in unexpected ways
  • people will do unexpected things.

What the first of these means is that however much you assume that your back-up generator will kick in if there’s a power outage, until you test it, you can’t be sure that it will. The second of these relates to the fact that however much you tell people what to do, when it actually comes to the doing of it, they’re unlikely to as you expect. This is likely to be even worse if there’s been no training, and you’re just assuming that person X will know how to operate a fire extinguisher, or that team Y will, of course, exit the building in an orderly manner via exit Z (rather than find fourteen different exits, or not even leave the building at all).

For both of these reasons, getting people together to work through possible scenarios, and then, where possible, actually practising what to do, means that you have a higher assurance that when one of the situations you’ve considered does arrive, that they will know what to do, and act as you expect.

Organisational suppleness

While you cannot, as we’ve noted, plan for every eventuality or know exactly how an employee or team will react when things go wrong, there is another benefit to involving a broad group of people in your scenario planning and training. This is that their very involvement gives them practice in dealing with uncertainty, working out how they will react, and giving them experience in how those around them will act. While I may not know exactly what to do if the payroll system goes down an hour before it is due to run, if I have worked with colleagues on scenarios where the sales processing system fails, I’ve got a better chance of making some sensible choices about who to contact, initial steps to take and information to collect than if this is the first time I’ve ever seen anything like it. Likewise, we may not have modelled our response to a physical failure of one of our network links, but our shared experience of practising our response to a DDoS attack means that we have an idea of what to do.

And it is not just having an idea of what to do that is important, but also having gathered and practised the cognitive skills associated with investigating failures, collating data, sharing information and working with others to ameliorate the situation which allows a team or an organisation to respond better to new, maybe unexpected situations. We can think of this as suppleness, as it means that rather than just failing, or cracking, an organisation can react as a tree does to strong winds, or a gymnast does to a new exercise. Growing the ability to react to the unexpected is a valuable skill for an organisation, and knowing that it is supple allows its leaders to plan with more certainty and mitigate more risk.

Trust book – chapter index and summary

I thought it might be interesting to provide the chapter index and a brief summary of each chapter addresses.

In a previous article, I presented the publisher’s blurb for my upcoming book with Wiley, Trust in Computer Systems and the Cloud. I thought it might be interesting, this time around, to provide the chapter index of the book and to give a brief summary of what each chapter addresses.

While it’s possible to read many of the chapters on their own, I haved tried to maintain a logical progression of thought through the book, building on earlier concepts to provide a framework that can be used in the real world. It’s worth noting that the book is not about how humans trust – or don’t trust – computers (there’s a wealth of literature around this topic), but about how to consider the issue of trust between computing systems, or what we can say about assurances that computing systems can make, or can be made about them. This may sound complex, and it is – which is pretty much why I decided to write the book in the first place!

  • Introduction
    • Why I think this is important, and how I came to the subject.
  • Chapter 1 – Why Trust?
    • Trust as a concept, and why it’s important to security, organisations and risk management.
  • Chapter 2 – Humans and Trust
    • Though the book is really about computing and trust, and not humans and trust, we need a grounding in how trust is considered, defined and talked about within the human realm if we are to look at it in our context.
  • Chapter 3 – Trust Operations and Alternatives
    • What are the main things you might want to do around trust, how can we think about them, and what tools/operations are available to us?
  • Chapter 4 – Defining Trust in Computing
    • In this chapter, we delve into the factors which are specific to trust in computing, comparing and contrasting them with the concepts in chapter 2 and looking at what we can and can’t take from the human world of trust.
  • Chapter 5 – The Importance of Systems
    • Regular readers of this blog will be unsurprised that I’m interested in systems. This chapter examines why systems are important in computing and why we need to understand them before we can talk in detail about trust.
  • Chapter 6 – Blockchain and Trust
    • This was initially not a separate chapter, but is an important – and often misunderstood or misrepresented – topic. Blockchains don’t exist or operate in a logical or computational vacuum, and this chapter looks at how trust is important to understanding how blockchains work (or don’t) in the real world.
  • Chapter 7 – The Importance of Time
    • One of the important concepts introduced earlier in the book is the consideration of different contexts for trust, and none is more important to understand than time.
  • Chapter 8 – Systems and Trust
    • Having introduced the importance of systems in chapter 5, we move to considering what it means to have establish a trust relationship from or to a system, and how the extent of what is considered part of the system is vital.
  • Chapter 9 – Open Source and Trust
    • Another topc whose inclusion is unlikely to surprise regular readers of this blog, this chapter looks at various aspects of open source and how it relates to trust.
  • Chapter 10 – Trust, the Cloud, and the Edge
    • Definitely a core chapter in the book, this addresses the complexities of trust in the modern computing environments of the public (and private) cloud and Edge networks.
  • Chapter 11 – Hardware, Trust, and Confidential Computing
    • Confidential Computing is a growing and important area within computing, but to understand its strengths and weaknesses, there needs to be a solid theoretical underpinning of how to talk about trust. This chapter also covers areas such as TPMs and HSMs.
  • Chapter 12 – Trust Domains
    • Trust domains are a concept that allow us to apply the lessons and frameworks we have discussed through the book to real-world situations at large scale. They also allow for modelling at the business level and for issues like risk management – introduced at the beginning of the book – to be considered more explicitly.
  • Chapter 13 – A World of Explicit Trust
    • Final musings on what a trust-centric (or at least trust-inclusive) view of the world enables and hopes for future work in the field.
  • References
    • List of works cited within the book.

Trust book preview

What it means to trust in the context of computer and network security

Just over two years ago, I agreed a contract with Wiley to write a book about trust in computing. It was a long road to get there, starting over twenty years ago, but what pushed me to commit to writing something was a conference I’d been to earlier in 2019 where there was quite a lot of discussion around “trust”, but no obvious underlying agreement about what was actually meant by the term. “Zero trust”, “trusted systems”, “trusted boot”, “trusted compute base” – all terms referencing trust, but with varying levels of definition, and differing understanding if what was being expected, by what components, and to what end.

I’ve spent a lot of time thinking about trust over my career and also have a major professional interest in security and cloud computing, specifically around Confidential Computing (see Confidential computing – the new HTTPS? and Enarx for everyone (a quest) for some starting points), and although the idea of a book wasn’t a simple one, I decided to go for it. This week, we should have the copy-editing stage complete (technical editing already done), with the final stage being proof-reading. This means that the book is close to down. I can’t share a definitive publication date yet, but things are getting there, and I’ve just discovered that the publisher’s blurb has made it onto Amazon. Here, then, is what you can expect.


Learn to analyze and measure risk by exploring the nature of trust and its application to cybersecurity 

Trust in Computer Systems and the Cloud delivers an insightful and practical new take on what it means to trust in the context of computer and network security and the impact on the emerging field of Confidential Computing. Author Mike Bursell’s experience, ranging from Chief Security Architect at Red Hat to CEO at a Confidential Computing start-up grounds the reader in fundamental concepts of trust and related ideas before discussing the more sophisticated applications of these concepts to various areas in computing. 

The book demonstrates in the importance of understanding and quantifying risk and draws on the social and computer sciences to explain hardware and software security, complex systems, and open source communities. It takes a detailed look at the impact of Confidential Computing on security, trust and risk and also describes the emerging concept of trust domains, which provide an alternative to standard layered security. 

  • Foundational definitions of trust from sociology and other social sciences, how they evolved, and what modern concepts of trust mean to computer professionals 
  • A comprehensive examination of the importance of systems, from open-source communities to HSMs, TPMs, and Confidential Computing with TEEs. 
  • A thorough exploration of trust domains, including explorations of communities of practice, the centralization of control and policies, and monitoring 

Perfect for security architects at the CISSP level or higher, Trust in Computer Systems and the Cloud is also an indispensable addition to the libraries of system architects, security system engineers, and master’s students in software architecture and security. 

Track and trace failure: a systems issue

The problem was not Excel, but that somebody used the wrong tools in a system.

Like many other IT professionals in the UK – and across the world, having spoken to some other colleagues in other countries – I was first surprised and then horrified as I found out more about the failures of the UK testing and track and trace systems. What was most shocking about the failure is not that it was caused by some alleged problem with Microsoft Excel, but that anyone thought this was a problem due to Excel. The problem was not Excel, but that somebody used the wrong tools in a system which was not designed properly, tested properly, or run properly. I have written many words about systems, and one article in particular seems relevant: If it isn’t tested, it doesn’t work. In it, I assert that a system cannot be said to work properly if it has not been tested, as a fully working system requires testing in order to be “working”.

In many software and hardware projects, in order to complete a piece of work, it has to meet one or more of a set of tests which allow it to be described as “done”. These tests may be actual software tests, or documentation, or just checks done by members of the team (other than the person who did the piece of work!), but the list needs to be considered and made part of the work definition. This “done” definition is as much part of the issue being addressed, functionality added or documentation being written as the actual work done itself.

I find it difficult to believe that there was any such definition for the track and trace system. If there was, then it was not, I’m afraid, defined by someone who is an expert in distributed or large-scale systems. This may not have been the fault of the person who chose Excel for the task of recording information, but it is the fault of the person who was in charge of the system, because Excel is not, and never was, a fit application for what it was being used for. It does not have the scalability characteristics, the integrity characteristics or the versioning characteristics required. This is not the fault of Microsoft, any more than it would be fault of Porsche if a 911T broke down because its owner filled with diesel fuel, rather than petrol[1]. Any competent systems architect or software engineer, qualified to be creating such a system, would have known this: the only thing that seems possible is that whoever put together the system was unqualified to do so.

There seem to be several options here:

  1. the person putting together the system did not know they were unqualified;
  2. the person putting together the system realised that they were unqualified, but did not feel able to tell anyone;
  3. the person putting together the systems realised that they were unqualified, but lied.

In any of the above, where was the oversight? Where was the testing? Where were the requirements? This was a system intended to safeguard the health of millions – millions – of people.

Who can we blame for this? In the end, the government needs to take some large measure of responsibility: they commissioned the system, which means that they should have come up with realistic and appropriate requirements. Requirements of this type may change over the life-cycle of a project, and there are ways to manage this: I recommend a useful book in another article, Building Evolutionary Architectures – for security and for open source. These are not new problems, and they are not unsolved problems: we know how to do this as a profession, as a society.

And then again, should we blame someone? Normally, I’d consider this a question out of scope for this blog, but people may die because of this decision – the decision not to define, design, test and run a system which was fit for purpose. At the very least, there are people who are anxious and worried about whether they have Covid-19, whether they need to self-isolate, whether they may have infected vulnerable friends or family. Blame is a nasty thing, but if it’s about holding people to account, then that’s what should happen.

IT systems are important. Particularly when they involve people’s health, but in many other areas, too: banking, critical infrastructure, defence, energy, even retail and entertainment, where people’s jobs will be at stake. It is appropriate for those of us with a voice to speak out, to remind the IT community that we have a responsibility to society, and to hold those who commission IT systems to account.


1 – or “gasoline” in some geographies.

What’s a Trusted Compute Base?

Tamper-evidence, auditability and measurability are three important properties.

A few months ago, in an article called “Turtles – and chains of trust“, I briefly mentioned Trusted Compute Bases, or TCBs, but then didn’t go any deeper.  I had a bit of a search across the articles on this blog, and realised that I’ve never gone into this topic in much detail, which feels like a mistake, so I’m going to do it now.

First of all, let’s think about computer systems.  When I talk about systems, I’m being both quite specific (see Systems security – why it matters) and quite broad (I don’t just mean computer that sits on your desk or in a data centre, but include phones, routers, aircraft navigation devices – pretty much anything that has a set of chips inside it).  There are surely some systems that you don’t rely on too much to do important things, but in most cases, you’re going to care if they go wrong, or, more relevant to this discussion, if they get compromised.  Even the most benign of systems – a smart light-bulb, for instance – can become a nightmare if compromised.  Even if you don’t particularly care whether you can continue to use it in the way it was intended, there are still worries about its misuse in the case of compromise:

  1. it may become a “jumping off point” for malicious attacks into your network or other systems;
  2. it may be used as part of a botnet, piggybacking on your network to attack other systems (leading to sanctions against your legitimate systems from outside);
  3. it may be used as part of a botnet, using up resources such as network bandwidth, storage or electricity (leading to resource constraints or increased charges).

For any systems dealing with sensitive data – anything from your messages to loved ones on your phone through intellectual property secrets for a manufacturing organisation through to National Security data for government department – these issues are compounded.  In order to protect your system, you can’t just say “this system is secure” (lovely as that would be).  What can you do to start making statement about the general security of a system?

The stack

Systems consist of multiple components, and modern computing systems are typically composed from multiple layers (one of my favourite xkcd comics, Stack, shows some of them).  What’s relevant from the point of view of this article is that, on the whole, the different layers of the stack start up – boot up – from the bottom upwards.  This means, following the “bottom turtle” rule (see the Turtles article referenced above), that we need to ensure that the bottom layer is as secure as possible.  In fact, in order to build a system in which we can have assurance that it will behave as expected and designed (in other words, a system in which we can have a trust relationship), we need to build a Trusted Compute Base.  This should have at least the following set of properties: tamper-evidence, auditability and measurability, all of which are related to each other.

Tamper-evidence

We want to know if the TCB – on which we are building everything else – has a problem.  Specifically, we need a set of layers or components that we are pretty sure have not been compromised, or which, if compromised, will be tamper-evident:

  • fail in expected ways,
  • refuse to start, or
  • flag that they have been compromised.

It turns out that this is not easy, and typically becomes more difficult as you move up the stack – partly because you’re adding more layers, and partly because those layers tend to get more complex.

Our TCB should have the properties listed above (around failure, refusing to start or compromise-flagging), and be as small as possible.  This seems the wrong way around: surely you would want to ensure that as much of your system was trusted as possible?  In fact, what you want is a small, easily measurable and easily auditable TCB on which you can build the rest of your system – from which you can build a “chain of trust” to the other parts of your system about which you care.  Auditability and measurability are the other two properties that you want in a TCB, and these two properties are why open source is a very useful tool in your toolkit when building a TCB.

Auditability (and open source)

Auditability means that you – or someone else who you trust to do the job – can look into the various components of the TCB and assure yourself that they have been written, compiled and  are executing properly.  As I explained in Of projects, products and (security) community, the person may not always be you, or even someone in your organisation, but if you’re using widely deployed open source software, the rest of the community can be doing that auditing for you, which is a win for you and – if you contribute your knowledge back into the community – for everybody else as well.

Auditability typically gets harder the further you go down the stack – partly because you’re getting closer and closer to bits – ones and zeros – and to actual electrons, and partly because there is very little truly open source hardware out there at the moment.  However, the more that we can see and audit of the TCB, the more confidence we can have in it as a building block for the rest of our system.

Measurability (and open source)

The other thing you want to be able to do is ensure that your TCB really is your TCB.  Tamper-evidence is related to this, but that’s a run-time property only (for software components, at least).  Being able to measure when you provision your system and then to check that what you originally loaded is still what you think it should be when you boot it is a very important property of a TCB.  If what you’re running is open source, you can check it yourself, against your own measurements and those of the community, and if changes are made – by you or others – those changes can be checked (as part of auditing) and then propagated through measurement checking to the rest of the community.  Equally important – and much more difficult – is run-time measurability.  This turns out to be very difficult to do, although there are some techniques emerging which are beginning to get traction – for now, we tend to rely on tamper-evidence, which is easier in hardware than software.

Summary

Trusted Compute Bases (TCBs) are a key concept in building systems that we hope will behave in ways we expect – or allow us to find out when they are not.  Tamper-evidence, auditability and measurability are three important properties that they should display, and it turns out that open source is an important factor in helping us ensure two of those.

 

 

 

How to be a no-shame generalist

There is no shame in being a generalist, and knowing when you need to consult a specialist.

There comes a time in any person’s life[1] when they realise that they’re not going to be able to do all the things they might like to do to a high level of expertise.  I used to kid myself that I could do anything if I tried hard enough and practised enough, but then I tried juggling.  It turns out that I’m never going to be able to juggle.  Not just juggle expertly.  I mean juggle at all.  My trying to juggle – with only one ball, let alone more than one – is so amusing that my family realised years ago that it was a great party trick.  “Daddy,” they’ll say, “show everyone your juggling.  It’s really funny.”  “But I can’t juggle,” I retort.  “Yes,” they respond, “that’s what’s funny[2].”

I’m also never going to be able to draw or do any art with any competence.

Or play any racquet sport with any level of skill.

Or do any gardening, painting or DIY-based household jobs with any degree of expertise[3].

Some people will retort that any old fool can be taught to do x activity (usually, it’s juggling, actually), but not only do I not believe this, but also, to be honest, there just isn’t enough time in the day to learn all the things I’d kind of like to try.

What has all this to do with security?

Specialism and education

Well, I’ve posted before that I’m a systems person, and the core of thinking about systems is that you need to look at the big picture.  In order to do that, you need to be a generalist.  There’s a phrase[5] in English: “Jack of all trades, master of none”, which is often used to condemn those who know a little about many things and are seen to dabble in them without a full understanding of any of them.  Interestingly, this version may be an abbreviation of the original, more positive:

Jack of all trades, master of none,
though oftentimes better than master of one.

The core inference, though, is that generalists aren’t as useful as specialists.  I don’t believe this.

In many educational systems, there’s a tendency to push students towards narrower and narrower fields of study.  For some, this is just what is needed, but for others – “systems people”, “synthesists” and “generalists” – this isn’t the best way to harness their talents, at least in the long term.  We need people who can see the big picture, who can take a wider view, and look beyond a single blocking issue to realise that the answer to a problem may not be a better implementation of an authentication library, but a change in the authorisation mechanism being used at the component level, for instance.

There are dangers to following this approach too far, however:

  1. it can lead to disparagement of specialists and their skills, even to a distrust of experts;
  2. it can lead to arrogance on the part of generalists.

We see the first in desperately concerning trends such as politicians thinking they know more than economists or climate scientists, anti-vaxxers ignoring the benefits of vaccination, and idiocy around chem-trails, flat-earth beliefs and moon landing conspiracies.  It happens in the world of work, as well, I’m sad to say.  There is a particular type of MBA recipient, for instance, who believes that the completion of the course and award of the degree confers on them some sort of superhuman ability to know what is is best for all organisations in all circumstances[6].

Specialise first

To come back to the world of security, my recommendation is that even if you know that your skills and interests are leading you to a career as a generalist, then you need to become a specialist first, in at least area.  You may not become an expert in that field, but you need to know it well.  Better still, strive for at least a level of competence in several fields – an ability to converse knowledgeably with true experts and to understand at least why they are making the choices and recommendations that they are.

And that leads us to the key point here: if you become a generalist, you need to acknowledge lack of expertise: it must become your modus operandi, your métier, your way of working.  You need to recognise that your strength is not in your knowing many things, but in knowing what you don’t know, and when it is time to call in the specialists.

I’m not a cryptographer, but I know enough about cryptography to realise when it’s time to call in an expert.  I’m not an expert on legal issues around cryptography, either, but know when to call on a lawyer.  Nor am I an expert on block storage, blockchain consensus, quantum key exchange protocols, CPU scheduling or compression algorithms.  The same will go for many areas which I may be called on to touch as part of my job.  I hope to have enough training and expertise within related fields – or the ability to gain it – to be able to ask sensible questions, but sometimes even that won’t be true, and the best (and most productive) interaction will be to say “I don’t know about this: please explain it to me, or at least tell me what the options are.”  This seems to me to be particularly important for security folks: there are so many overlapping disciplines, and getting one piece wrong means that your defence in depth strategy just got a whole lot shallower.

Being too lazy to look things up, too arrogant to listen to others or too short-sighted to realise that there are areas in which we are not expert are things of which we should be ashamed.

But there is no shame in being a generalist, and knowing when you need to consult a specialist.


1 – I’m extrapolating horribly here, but it’s true for me so I’m assuming it’s a universal truth.

2 – apparently the look on my face, and the things I do with my tongue, are a sight to behold.

3 – I’m constantly trying to convince my wife of these, and although she’s sceptical about some, we’re now agreed that I shouldn’t be allowed access to any power tools again if we want avoid further trips to the Accident and Emergency department at the hospital[4].

4 – it’s not only power tools.  I once nearly removed my foot with a wallpaper stripper.  I still have the scar nearly 25 years later.

5 – somewhat gendered, for which I apologise.

6 – disclaimer – I have an MBA, and met many talented and humble people on my course (and have met many since) who don’t suffer from this predicament.