systems – Page 4 – Alice, Eve and Bob

Moving to DevOps, what’s most important?

Technology, process or culture? (Clue: it’s not the first two)

You’ve been appointed the DevOps champion in your organisation: congratulations. So, what’s the most important issue that you need to address?

It’s the technology – tools and the toolchain – right? Everybody knows that unless you get the right tools for the job, you’re never going to make things work. You need integration with your existing stack – though whether you go with tight or loose integration will be an interesting question – a support plan (vendor, 3rd party or internal), and a bug-tracking system to go with your source code management system. And that’s just the start.

No! Don’t be ridiculous: it’s clearly the process that’s most important. If the team doesn’t agree on how stand-ups are run, who participates, the frequency and length of the meetings, and how many people are required for a quorum, then you’ll never be able institute a consistent, repeatable working pattern.

In fact, although both the technology and the process are important, there’s a third component which is equally important, but typically even harder to get right: culture. Yup, it’s that touch-feely thing that we techies tend to struggle with[1].

Culture

I was visiting a medium-sized government institution a few months ago (not in the UK, as it happens), and we arrived a little early to meet the CEO and CTO. We were ushered into the CEO’s office and waited for a while as the two of them finished participating in the daily stand-up. They apologised for being a minute or two late, but far from being offended, I was impressed. Here was an organisation where the culture of participation was clearly infused all the way up to the top.

Not that culture can be imposed from the top – nor can you rely on it percolating up from the bottom[3] – but these two C-level execs were not only modelling the behaviour they expected from the rest of their team, but also seemed, from the brief discussion we had about the process afterwards, to be truly invested in it. If you can get management to buy into the process – and to be seen to buy in – you are at least likely to have problems with other groups finding plausible excuses to keep their distance and get away with it.

So let’s say that management believes that you should give DevOps a go. Where do you start?

Developers, tick?[5]

Developers may well be your easiest target group. Developers are often keen to try new things, and to find ways to move things along faster, so they are often the group that can be expected to adopt new technologies and methodologies. DevOps has arguably been mainly driven by the development community. But you shouldn’t assume that all developers will be keen to embrace this change. For some, the way things have always been done – your Rick Parfitts of dev, if you will[7] – is fine. Finding ways to help them work efficiently in the new world is part of your job, not just theirs. If you have superstar developers who aren’t happy with change, you risk alienating them and losing them if you try to force them into your brave new world. What’s worse, if they dig their heels in, you risk the adoption of your DevSecOps vision being compromised when they explain to their managers that things aren’t going to change if it makes their lives more difficult and reduces their productivity.

Maybe you’re not going to be able to move all the systems and people to DevOps immediately. Maybe you’re going to need to choose which apps start with, and who will be your first DevOps champions. Maybe it’s time to move slowly.

Not maybe: definitely

No – I lied. You’re definitely going to need to move slowly. Trying to change everything at once is a recipe for disaster.

This goes for all elements of the change – which people to choose, which technologies to choose, which applications to choose, which user base to choose, which use cases to choose – bar one. For all of those elements, if you try to move everything in one go, you will fail. You’ll fail for a number of reasons. You’ll fail for reasons I can’t imagine, and, more importantly, for reasons you can’t imagine, but some of the reasons will include:

people – most people – don’t like change;
technologies don’t like change (you can’t just switch and expect everything to work still);
applications don’t like change (things worked before, or at least failed in known ways: you want to change everything in one go? Well, they’ll all fail in new and exciting[9] ways;
users don’t like change;
use cases don’t like change.

The one exception

You noticed that, above, I wrote “bar one”, when discussing which elements you shouldn’t choose to change all in one go? Well done.

What’s that exception? It’s the initial team. When you choose your initial application to change, and you’re thinking about choosing the team to make that change, select the members carefully, and select a complete set. This is important. If you choose just developers, just test folks, or just security folks, or just ops folks, or just management, then you won’t actually have proved anything at all. If you leave out one functional group from your list, you won’t actually have proved anything at all. Well, you might have proved to a small section of your community that it kind of works, but you’ll have missed out on a trick. And that trick is that if you choose keen people from across your functional groups, it’s much harder to fail.

Say that your first attempt goes brilliantly. How are you going to convince other people to replicate your success and adopt DevOps? Well, the company newsletter, of course. And that will convince how many people, exactly? Yes, that number[12]. If, on the other hand, you have team members from across the functional parts or the organisation, then when you succeed, they’ll tell their colleagues, and you’ll get more buy-in next time.

If, conversely, it fails, well, if you’ve chosen your team wisely, and they’re all enthusiastic, and know that “fail often, fail fast” is good, then they’ll be ready to go again.

So you need to choose enthusiasts from across your functional groups. They can work on the technologies and the process, and once that’s working, it’s the people who will create that cultural change. You can just sit back and enjoy. Until the next crisis, of course.

1 – OK, you’re right. It should be “with which we techies tend to struggle”[2]

2 – you thought I was going to qualify that bit about techies struggling with touchy-feely stuff, didn’t you? Read it again: I put “tend to”. That’s the best you’re getting.

3 – is percolating a bottom-up process? I don’t drink coffee[4], so I wouldn’t know.

4 – do people even use percolators to make coffee anymore? Feel free to let me know in the comments. I may pretend interest if you’re lucky.

5 – for US readers (and some other countries, maybe?), please substitute “tick” for “check” here[6].

6 – for US techie readers, feel free to perform “s/tick/check/;”.

7 – this is a Status Quo[8] reference for which I’m extremely sorry.

8 – for Millennial readers, please consult your favourite online reference engine or just roll your eyes and move on.

9 – for people who say, “but I love excitement”, trying being on call at 2am on a Sunday morning at end of quarter when your Chief Financial Officer calls you up to ask why all of last month’s sales figures have been corrupted with the letters “DEADBEEF”[10].

10 – for people not in the know, this is a string often used by techies as test data because a) it’s non-numerical; b) it’s numerical (in hexadecimal); c) it’s easy to search for in debug files and d) it’s funny[11].

11 – though see [9].

12 – it’s a low number, is all I’m saying.

If it isn’t tested, it doesn’t work

Testing isn’t just coming up with tests for desired use cases.

Huh. Shouldn’t that title be “If it isn’t tested, it’s not going to work”?

No.

I’m asserting something slightly different here – in fact, two things. The first can be stated thus:

“In order for a system to ‘work’ correctly, and to defined parameters, test cases for all plausible conditions must be documented, crafted – and passed – before the system is considered to ‘work’.”

The second is a slightly more philosophical take on the question of what a “working system” is:

“An instantiated system – including software, hardware, data and wetware[1] components – may be considered to be ‘working’ if both its current state, and all known plausible future states from the working state have been anticipated, documented and appropriately tested.”

Let’s deal with these one by one, starting with the first[3].

Case 1 – a complete test suite

I may have given away the basis for my thinking by the phrasing in the subtitle above. What I think we need to be looking for, when we’re designing a system, is what we should be doing ensuring that we have a test case for every plausible condition. I considered “possible” here, but I think that may be going too far: for most systems, for instance, you don’t need to worry too much about meteor strikes. This is an extension of the Agile methodology dictum: “a feature is not ‘done’ until it has a test case, and that test case has been passed.” Each feature should be based on a use case, and a feature is considered correctly implemented when the test cases that are designed to test that feature are all correctly passed.

It’s too easy, however, to leave it there. Defining features is, well not easy, but something we know how to do. “When a user enters enters a valid username/password combination, the splash-screen should appear.” “When a file has completed writing, a tick should appear on the relevant icon.” “If a user cancels the transaction, no money should be transferred between accounts.” The last is a good one, in that it deals with an error condition. In fact, that’s the next step beyond considering test cases for features that implement functionality to support actions that are desired: considering test cases to manage conditions that arise from actions that are undesired.

The problem is that many people, when designing systems, only consider one particular type of undesired action: accidental, non-malicious action. This is the reason that you need to get security folks[4] in when you’re designing your system, and the related test cases. In order to ensure that you’re reaching all plausible conditions, you need to consider intentional, malicious actions. A system which has not considered these and test for these cannot, in my opinion, be said truly to be “working”.

Case 2 – the bigger systems picture

I write fairly frequently[5] about the importance of systems and systems thinking, and one of the interesting things about a system, from my point of view, is that it’s arguably not really a system until it’s up and running: “instantiated”, in the language I used in my definition above.

Case 2 dealt, basically, with test cases and the development cycle. That, by definition, is before you get to a fully instantiated system: one which is operating in the environment for which it was designed – you really, really hope – and is in situ. Part of it may be quiescent, and that is hopefully as designed, but it is instantiated.

A system has a current state; it has a set of defined (if not known[7]) past states; and a set of possible future states that it can reach from there. Again, I’m not going to insist that all possible states should be considered, for the same reasons I gave above, but I think that we do need to talk about all known plausible future states.

These types of conditions won’t all be security-related. Many of them may be more appropriately thought of as to do with assurance or resilience. But if you don’t get the security folks in, and early in the planning process, then you’re likely to miss some.

Here’s how it works. If I am a business owner, and I am relying on a system to perform the tasks for which it was designed, then I’m likely to be annoyed if some IT person comes to me and says “the system isn’t working”. However, if, in response to my question, “and did it fail due to something we had considered in our design and deployment of the system” is “yes”, then I’m quite lightly to move beyond annoyed to a state which, if we’re honest, the IT person could easily have considered, nay predicted, and which is closer to “incandescent” than “contented”[8].

Because if we’d considered a particular problem – it was “known”, and “plausible” – then we should have put in place measures to deal with it. Some of those will be preventative measures, to stop the bad thing happening in the first place, and others will be mitigations, to deal with the effects of the bad thing that happened. And there may also be known, plausible states for which we may consciously decide not to prepare. If I’m a small business owner in Weston-super-mare[9], then I may be less worried about industrial espionage than if I’m a multi-national[10]. Some risks aren’t worth the bother, and that’s fine.

To be clear: the mitigations that we prepare won’t always be technical. Let’s say that we come up with a scenario where an employee takes data from the system on a USB stick and gives it to a competitor. It may be that we can’t restrict all employees from using USB sticks with the system, so we have to rely on legal recourse if that happens. If, in that case, we call in the relevant law enforcement agency, then the system is working as designed if that was our plan to deal with this scenario.

Another point is that not all future conditions can be reached from the current working state, and if they can’t, then it’s fair to decide not to deal with them. Once a TPM is initialised, for instance, taking it back to its factory state basically requires to reset it, so any system which is relying on it has also been reset.

What about the last bit of my definition? “…[A]nticipated, documented and appropriately tested.” Well, you can’t test everything fully. Consider that the following scenarios are all known and plausible for your system:

a full power-down for your entire data centre;
all of your workers are incapacitate by a ‘flu virus;
your main sysadmin is kidnapped;
an employee takes data from the system on a USB stick and gives it to a competitor.

You’re really not going to want to test all of these. But you can at least perform paper exercises to consider what steps you should take, and also document them. You might ensure that you know which law enforcement agency to call, and what the number is, for instance, instead of actually convincing an employee to leak information to a competitor and then having them arrested[11].

Conclusion

Testing isn’t just coming up with tests for desired use cases. It’s not even good enough just to prepare for accidental undesired use cases on top of that. We need to consider malicious use cases, too. And testing in development isn’t good enough either: we need to test with live systems, in situ. Because if we don’t, something, somewhere, is going to go wrong.

And you really don’t want to be the person telling your boss that, “well, we thought it might, but we never tested it.”

1 – “wetware” is usually defined as human components of a system (as here), but you might have non-human inputs (from animals or aliens), or even from fauna[2], I suppose.

2 – “woodware”?

3 – because I, for one, need a bit of a mental run-up to the second one.

4 – preferably the cynical, suspicious types.

5 – if not necessarily regularly: people often confuse the two words. A regular customer may only visit once a year, but always does it on the same day, whereas a frequent customer may visit on average once a week, but may choose a different day each week.[6]

6 – how is this relevant? It’s not.

7 – yes, I know: Schrödinger’s cat, quantum effects, blah, blah.

8 – Short version: if the IT person says “it broke, and it did it in a way we had thought of before”, then I’m going to be mighty angry.

9 – I grew up nearby. Windy, muddy, donkeys.

10 – which might, plausibly, also be based in Weston-super-mare, though I’m not aware of any.

11 – this is, I think, probably at least bordering on the unethical, and might get you in some hot water with your legal department, and possibly some other interested parties[12].

12 – your competitor might be pleased, though, so there is that.

Security patching and vaccinations: a surprising link

Learning from medicine, but recognising differences.

I’ve written a couple of times before about patching, and in one article (“The Curious Incident of the Patch in the Night-Time“), I said that I’d return to the question of how patches and vaccinations are similar. Given the recent flurry of patching news since Meltdown and Spectre, I thought that now would be a good time to do that.

Now, one difference that I should point out up front is that nobody believes that applying security patches to your systems will give them autism[1]. Let’s counter that with the first obvious similarity, though: patching your systems makes them resistant to attacks based on particular vulnerabilities. Equally, a particular patch may provide resistance to multiple types of attack of the same family, as do some vaccinations. Also similarly, as new attacks emerge – or bacteria or viruses change and evolve – new patches are likely to be required to deal with the problem.

We shouldn’t overplay the similarities, of course. Just because some types of malware are referred to as “viruses” doesn’t mean that their method of attack, or the mechanisms by which computer systems defend against them, are even vaguely alike[2]. Computer systems don’t have complex immune systems which adapt and learn how to deal with malware[3]. On the other hand, there are also lots of different types of vulnerability for which patches are efficacious which are very different to bacterial or virus attacks: a buffer overflow attack or SQL injection, for instance. So, it’s clearly possible to over-egg this pudding[4]. But there is another similarity that I do think is worth drawing, though it’s not perfect.

There are some systems which, for whatever reason, it is actually quite risky to patch. This is because of the business risk associated with patching them, and might be down to a number of factors, including:

projected downtime as the patch is applied and system rebooted is unacceptable;
side effects of the patch (e.g. performance impact) are too severe;
risk of the system not rebooting after patch application is too high;
other components of the system (e.g. hardware or other software) may be incompatible with the patch.

In these cases, a decision may be made that the business risk of patching the system outweighs the business risk of leaving it unpatched. Alternatively, it may be that you are running some systems which are old and outdated, and for which there is no patch available.

Here’s where there’s another surprising similarity with vaccinations. There are, in any human population, individuals for whom the dangers of receiving a vaccination may outweigh the benefits. The reasons for this are different from the computer case, and are generally down to weakened immune systems and/or poor health. However, it turns out that as the percentage of a human population[6] that is vaccinated rises, the threat to the unvaccinated individuals reduces, as there are fewer infection vectors from whom those individuals can receive the infection.

We need to be careful with how closely we draw the analogy here, because we’re on shaky ground if we go too far, but there are types of system vulnerability – particularly malware – for which this is true for computer systems. If you patch all the systems that you can, then the number of possible “jump-off” points for malware will reduce, meaning that the unpatched systems are less likely to be affected. To a lesser degree, it’s probably true that as unsophisticated attackers notice that a particular attack vector is diminishing, they’ll ignore it and move to something else. Over-stretching this thread, however, is particularly dangerous: a standard approach for any motivated attacker is to attempt attack vectors which are “old”, but to which unpatched systems are likely to be vulnerable.

Another difference is that in the computing world, attacks never die off. Though there are stockpiles of viruses and bacteria which are extinct in the general population which are maintained for various reasons[7], some will die out over time. In the world of IT, pretty much every vulnerability ever discovered will have been documented somewhere, whether there still exists an “infected” system or not, and so is still available for re-use or re-purposing.

What is the moral of this article? Well, it’s this: even if you are unable to patch all of your systems, it’s still worth patching as many of them as you can. It’s also worth considering whether there are some low-risk systems that you can patch immediately, and which require less business analysis before deciding whether they can be patched in a second or third round of patching. It’s probably worth keeping a list of these somewhere. Even better, you can maintain lists of high-, medium- and low-risk systems – both in terms of business risk and infection vulnerability – and use this to inform your patching, both automatic and manual. But, dear reader: do patch.

1 – if you believe that – or, in fact, if you believe that vaccinations give children autism – then you’re reading the wrong blog. I seriously suggest that you go elsewhere (and read some proper science on the subject).

2 – pace the attempts of Hollywood CGI departments to make us believe that they’re exactly the same.

3 – though this is obviously an interesting research area.

4 – “overextend this analogy”. The pudding metaphor is a good one though, right?[5]

5 – and I like puddings, as my wife (and my waistline) will testify.

6 – or, come to think of it, animal (I’m unclear on flora).

7 – generally, one hopes, philanthropic.

Top 5 resolutions for security folks – 2018

Yesterday, I wrote some jokey resolutions for 2018 – today, as it’s a Tuesday, my regular day for posts, I decided to come up with some real ones.

1 – Embrace the open

I’m proud to have been using Linux[1] and other open source software for around twenty years now. Since joining Red Hat in 2016, and particularly since I started writing for Opensource.com, I’ve become more aware of other areas of open-ness out there, from open data to open organisations. There are still people out there who are convinced that open source is less secure than proprietary software. You’ll be unsurprised to discover that I disagree. I encourage everyone to explore how embracing the open can benefit them and their organisations.

2 – Talk about risk

I’m convinced that we talk too much about security for security’s sake, and not about risk, which is what most “normal people” think about. There’s education needed here as well: of us, and of others. If we don’t understand the organisations we’re part of, and how they work, we’re not going to be able to discuss risk sensibly. In the other direction, we need to be able to talk about security a bit, in order to explain how it will mitigate risk, so we need to learn how to do this in a way that informs our colleagues, rather than alienating them.

3 – Think about systems

I don’t believe that we[2] talk enough about systems. We spend a lot of our time thinking about functionality and features, or how “our bit” works, but not enough about how all the bits fit together. I don’t often link out to external sites or documents, but I’m going to make an exception for NIST special publication 800-160 “Systems Security Engineering: Considerations for a Multidisciplinary Approach in the Engineering of Trustworthy Secure Systems”, and I particularly encourage you to read Appendix E “Roles, responsibilities and skills: the characteristics and expectations of a systems security engineer”. I reckon this is an excellent description of the core skills and expertise required for anyone looking to make a career in IT security.

4 – Examine the point of conferences

I go to a fair number of conferences, both as an attendee and as a speaker – and also do my share of submission grading. I’ve written before about how annoyed I get (and I think others get) by product pitches at conferences. There are many reasons to attend the conferences, but I think it’s important for organisers, speakers and attendees to consider what’s most important to them. For myself, I’m going to try to ensure that what I speak about is what I think other people will be interested in, and not just what I’m focussed on. I’d also highlight the importance of the “hallway track”: having conversations with other attendees which aren’t necessarily directly related to the specific papers or talks. We should try to consider what conferences we need to attend, and which ones to allow to fall by the wayside.

5 – Read outside the IT security discipline

We all need downtime. One way to get that is to read – on an e-reader, online, on your phone, magazines, newspapers or good old-fashioned books. Rather than just pick up something directly related to work, choose something which is at least a bit off the beaten track. Whether it’s an article on a topic to do with your organisation’s business, a non-security part of IT[3], something on current affairs, or a book on a completely unrelated topic[4], taking the time to gain a different perspective on the world is always[5] worth it.

What have I missed?

I had lots of candidates for this list, and I’m sure that I’ve missed something out that you think should be in there. That’s what comments are for, so please share your thoughts.

1 GNU Linux.

2 the mythical IT community

3 – I know, it’s not going to be as sexy as security, but go with it. At least once.

4 – I’m currently going through a big espionage fiction phase. Which is neither here nor there, but hey.

5 – well, maybe almost always.

Who’s saying “hello”? – agency, intent and AI

Who is saying “hello world?”: you, or the computer?

I don’t yet have one of those Google or Amazon talking speaker thingies in my house or office. A large part of this is that I’m just not happy about the security side: I know that the respective companies swear that they’re only “listening” when you say the device’s trigger word, but even if that’s the case, I like to pretend[1] that I have at least some semblance of privacy in my life. Another reason, however, is that I’m not sure that I like what happens to people when they pretend that there’s a person listening to them, but it’s really just a machine.

It’s not just Alexa and the OK, Google persona, however. When I connect to an automated phone-answering service, I worry when I hear “I’ll direct your call” from a non-human. Who is “I”? “We’ll direct your call” is better – “we” could be the organisation with whom I’m interacting. But “I”? “I” is the pronoun that people use. When I hear “I”, I’m expecting sentience: if it’s a machine I’m expecting AI – preferably fully Turing-compliant.

There’s a more important point here, though. I’m entirely aware that there’s no sentience behind that “I”[2], but there’s an important issue about agency that we should unpack.

What, then, is “agency”? I’m talking about the ability of an entity to act on its or another’s behalf, and I touched on this this in a previous post, “Wow: autonomous agents!“. When somebody writes some code, what they’re doing is giving ability to the system that will run that code to do something – that’s the first part. But the agency doesn’t really occur, I’d say, until that code is run/instantiated/executed. At this point, I would argue, the software instance has agency.

But whose agency, exactly? For whom is this software acting?

Here are some answers. I honestly don’t think that any of them is right.

the person who owns the hardware (you own the Alexa hardware, right? You paid Amazon for it… Or what about running applications on the cloud?).
the person who started the software (you turned on the Alexa hardware, which started the software… And don’t forget software which is automatically executed in response to triggers or on a time schedule.)
the person who gave the software the instructions (what do you mean, “gave it the instructions”? Wrote its config file? Spoke to it? Set up initial settings? Typed in commands? And even if you gave it instructions, do you think that your OK Google hardware is implementing your wishes, or Google’s? For whom is it actually acting? And what side effects (like recording your search history and deciding what to suggest in your feed) are you happy to believe are “yours”?)
the person who installed the software (your phone comes with all sorts of software installed, but surely you are the one who imbues it with agency? If not, whom are you blaming: Google (for the Android apps) or Samsung (which actually put them on the phone)?)
the person who wrote the software (I think we’ve already dealt with this, but even then, is it a single person, or an organisation? What about open source software, which is typically written, compiled and documented by many different people? Ascribing “ownership” or “authorship” is a distinctly tricky (and intentionally tricky) issue when you discuss open source)

Another way to think of this problem is to ask: when you write and execute a program, who is saying “hello world?”: you, or the computer?

There are some really interesting questions that come out of this. Here are a couple that come to mind, which seem to me to be closely connected.

In the film Wargames[3], is the automatic dialling that Matthew Broderick’s character’s computer carries out an act with agency? Or is it when it connects to another machine? Or when it records the details of that machine? I don’t think anyone would argue that the computer is acting with agency once David Lightman actually gets it to complete a connection and interact with it, but what about before?
Google used to run automated programs against messages received as part of the Gmail service looking for information and phrases which it could use to serve ads. They were absolutely adamant that they, Google, weren’t doing the reading: it was just a computer program. I’m not sure how clear or safe a distinction that is.

Why does this all matter? Well, one of the more pressing reasons is because of self-driving cars. Whose fault is it when one goes wrong and injures or kills someone? What about autonomous defence systems?

And here’s the question that really interests – and vexes – me: is this different when the program which is executing can learn. I don’t even mean strong AI: just that it can change what it does based on the behaviour it “sees”, “hears” or otherwise senses. It feels to me that there’s a substantive difference between:

a) actions carried out at the explicit (asynchronous) request of a human operator, or according to sets of rules coded into a program

AND

b) actions carried out in response to rules that have been formed by the operation of the program itself. There is what I’d called synchronous intent within the program.

You can argue that b) has pretty much always been around, in basic forms, but it seems to me to be different when programs are being created with the expectation that humans will not necessarily be able to decode the rules, and where the intent of the human designers is to allow rulesets to be created in this way.

There is some discussion about at the moment as to how and/or whether rulesets generated by open source projects should be shared. I think the general feeling is that there’s no requirement for them to be – in the same way that material I write using an open source text editor shouldn’t automatically be considered open source – but open data is valuable, and finding ways to share it is a good idea, IMHO.

In Wargames, that is the key difference between the system as originally planned, and what it ends up doing: Joshua has synchronous intent.

I really don’t think this is all bad: we need these systems, and they’re going to improve our lives significantly. But I do feel that it’s important that you and I start thinking hard about what is acting for whom, and how.

Now, if you wouldn’t mind opening the Pod bay doors, HAL…[5]

1. and yes, I know it’s a pretense.

2. yet…

3. go on – re-watch it: you know you want to[4].

4. and if you’ve never watched it, then stop reading this article and go and watch it NOW.

5. I think you know the problem just as well as I do, Dave.

Explained: five misused security words

Untangling responsibility, authority, authorisation, authentication and identification.

I took them out of the title, because otherwise it was going to be huge, with lots of polysyllabic words. You might, therefore, expect a complicated post – but that’s not my intention*. What I’d like to do it try to explain these five important concepts in security, as they’re often confused or bound up with one another. They are, however, separate concepts, and it’s important to be able to disentangle what each means, and how they might be applied in a system. Today’s words are:

responsibility
authority
authorisation
authentication
identification.

Let’s start with responsibility.

Responsibility

Confused with: function; authority.

If you’re responsible for something, it means that you need to do it, or if something goes wrong. You can be responsible for a product launching on time, or for the smooth functioning of a team. If we’re going to ensure we’re really clear about it, I’d suggest using it only for people. It’s not usually a formal description of a role in a system, though it’s sometimes used as short-hand for describing what a role does. This short-hand can be confusing. “The storage module is responsible for ensuring that writes complete transactionally” or “the crypto here is responsible for encrypting this set of bytes” is just a description of the function of the component, and doesn’t truly denote responsibility.

Also, just because you’re responsible for something doesn’t mean that you can make it happen. One of the most frequent confusions, then, is with authority. If you can’t ensure that something happens, but it’s your responsibility to make it happen, you have responsibility without authority***.

Authority

Confused with: responsibility, authorisation.

If you have authority over something, then you can make it happen****. This is another word which is best restricted to use about people. As noted above, it is possible to have authority but no responsibility*****.

Once we start talking about systems, phrases like “this component has the authority to kill these processes” really means “has sufficient privilege within the system”, and should best be avoided. What we may need to check, however, is whether a component should be given authorisation to hold a particular level of privilege, or to perform certain tasks.

Authorisation

Confused with: authority; authentication.

If a component has authorisation to perform a certain task or set of tasks, then it has been granted power within the system to do those things. It can be useful to think of roles and personae in this case. If you are modelling a system on personae, then you will wish to grant a particular role authorisation to perform tasks that, in real life, the person modelled by that role has the authority to do. Authorisation is an instantiation or realisation of that authority. A component is granted the authorisation appropriate to the person it represents. Not all authorisations can be so easily mapped, however, and may be more granular. You may have a file manager which has authorisation to change a read-only permission to read-write: something you might struggle to map to a specific role or persona.

If authorisation is the granting of power or capability to a component representing a person, the question that precedes it is “how do I know that I should grant that power or capability to this person or component?”. That process is authentication – authorisation should be the result of a successful authentication.

Authentication

Confused with: authorisation; identification.

If I’ve checked that you’re allowed to perform and action, then I’ve authenticated you: this process is authentication. A system, then, before granting authorisation to a person or component, must check that they should be allowed the power or capability that comes with that authorisation – that are appropriate to that role. Successful authentication leads to authorisation. Unsuccessful authentication leads to blocking of authorisation******.

With the exception of anonymous roles, the core of an authentication process is checking that the person or component is who he, she or it says they are, or claims to be (although anonymous roles can be appropriate for some capabilities within some systems). This checking of who or what a person or component is authentication, whereas the identification is the claim and the mapping of an identity to a role.

Identification

Confused with: authentication.

I can identify that a particular person exists without being sure that the specific person in front of me is that person. They may identify themselves to me – this is identification – and the checking that they are who they profess to be is the authentication step. In systems, we need to map a known identity to the appropriate capabilities, and the presentation of a component with identity allows us to apply the appropriate checks to instantiate that mapping.

Bringing it all together

Just because you know whom I am doesn’t mean that you’re going to let me do something. I can identify my children over the telephone*******, but that doesn’t mean that I’m going to authorise them to use my credit card********. Let’s say, however, that I might give my wife my online video account password over the phone, but not my children. How might the steps in this play out?

First of all, I have responsibility to ensure that my account isn’t abused. I also have authority to use it, as granted by the Terms and Conditions of the providing company (I’ve decided not to mention a particular service here, mainly in case I misrepresent their Ts&Cs).

“Hi, darling, it’s me, your darling wife*********. I need the video account password.” Identification – she has told me who she claims to be, and I know that such a person exists.

“Is it really you, and not one of the kids? You’ve got a cold, and sound a bit odd.” This is my trying to do authentication.

“Don’t be an idiot, of course it’s me. Give it to me or I’ll pour your best whisky down the drain.” It’s her. Definitely her.

“OK, darling, here’s the password: it’s il0v3myw1fe.” By giving her the password, I’ve performed authorisation.

It’s important to understand these different concepts, as they’re often conflated or confused, but if you can’t separate them, it’s difficult not only to design systems to function correctly, but also to log and audit the different processes as they occur.

*we’ll have to see how well I manage, however. I know that I’m prone to long-windedness**

**ask my wife. Or don’t.

***and a significant problem.

****in a perfect world. Sometimes people don’t do what they ought to.

*****this is much, much nicer than responsibility without authority.

******and logging. In both cases. Lots of logging. And possibly flashing lights, security guards and sirens on failure, if you’re into that sort of thing.

*******most of the time: sometimes they sound like my wife. This is confusing.

********neither should you assume that I’m going to let my wife use it, either.*********

*********not to suggest that she can’t use a credit card: it’s just that we have separate ones, mainly for logging purposes.

**********we don’t usually talk like this on the phone.

Why microservices are a security issue

Should you go about decomposing all of your legacy applications into microservices? Probably not. But given all of the benefits you can accrue, you might consider starting with your security functions.

I struggled with writing the title for this post, and I worry that it comes over as clickbait. If you’ve come to read this because it looked like clickbait, then sorry*. I hope you’ll stay anyway: there are lots of fascinating** posts and many*** footnotes. What I didn’t mean to suggest is that microservices cause security problems – though like any component, of course, they can – but that microservices are appropriate objects of interest to those involved with security. I’d go further than that: I think they are an excellent architectural construct for those concerned with security.

And why is that? Well, for those of us with a systems security bent, the world is an interesting place at the moment. We’re seeing a growth in distributed systems as bandwidth is cheap and latency low. Add to this the ease of deploying to the cloud, and more architects are beginning to realise that they can break up applications not just into multiple layers but also into multiple components within the layer. Load-balancers, of course, help with this when the various components in a layer are performing the same job, but the ability to expose different services as small components has led to a growth in the design, implementation and deployment of microservices.

So, what exactly is a microservice? I quite like the definition provided by Wikipedia, though it’s interesting that security isn’t mentioned there****. One of the points that I like about microservices is that, when well-designed, they conform to the first two points of Peter H. Salus’ description of the Unix philosophy:

Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.

The last of the three is slightly less relevant, because the Unix philosophy is generally used to refer to standalone applications, which often have a command instantiation. It does, however, encapsulate one of the basic requirements of microservices: that they must have well-defined interfaces.

By “well-defined”, I don’t just mean a description of any externally-accessible APIs’ methods, but also of the normal operation of the microservice: inputs and outputs – and, if there are any, side-effects. As I’ve described in a previous post, Thinking like a (systems) architect, data and entity descriptions are crucial if you’re going to be able to design a system. Here, in our description of microservices, we get to see why these are so important, because for me the key defining feature of a microservices architecture is decomposability. And if you’re going to decompose***** your architecture, you need to be very, very clear which “bits” (components) are going to do what.

And here’s where security starts to come in. A clear description of what a particular component should be doing allows you to:

check your design;
ensure that your implementation meets the description;
come up with reusable unit tests to check functionality;
track mistakes in implementation and correct them;
test for unexpected outcomes;
monitor for misbehaviour;
audit actual behaviour for future scrutiny.

Now, are all these things possible in a larger architecture? Yes, they are. But they becoming increasingly difficult where entities are chained together – or combined in more complex configurations. Ensuring correct implementation and behaviour is much, much easier when you’ve got smaller pieces to work together. And deriving complex systems behaviours – and misbehaviours – is much more difficult if you can’t be sure that the individual components are doing what they ought to be.

It doesn’t stop here, however. As I’ve mentioned on many previous occasions in this blog, writing good security code is difficult*******. Proving that it does what it should do is even more so. There is every reason, therefore, to restrict code which has particular security requirements – password checking, encryption, cryptographic key management, authorisation, to offer a few examples – to small, well-defined blocks. You can then do all the things that I’ve mentioned above to try to make sure that it’s done correctly.

And yet there’s more. We all know that not everybody is great at writing security-related code. By decomposing your architecture such that all security-sensitive code is restricted to well-defined components, you get the chance to put your best security people on that, and restricting the danger of J. Random Coder******** putting something in which bypasses or downgrades a key security control.

It can also act as an opportunity for learning: it’s always good to be able to point to a design/implementation/test/monitoring tuple and say: “that’s how it should be done. Hear, read, mark, learn and inwardly digest*********.”

*well, a little bit – it’s always nice to have readers.

**I know they are: I wrote them.

***probably less fascinating.

****at the time of writing this article. It’s entirely possible that I – or one of you – may edit the article to change that.

*****this sounds like a gardening term, which is interesting. Not that I really like gardening, but still******.

******amusingly, I first wrote “…if you’re going to decompose your architect…”, which sounds like the strap-line for an IT-themed murder film.

*******regular readers may remember a reference to the excellent film “The Thick of It”.

********other generic personae exist: please take your pick.

*********not a cryptographic digest: I don’t think that’s what the original writers had in mind.

Isolationism

… what’s the fun in having an Internet if you can’t, well, “net” on it?

Sometimes – and I hope this doesn’t come as too much of a surprise to my readers – sometimes, there are bad people, and they do bad things with computers. These bad things are often about stopping the good things that computers are supposed to be doing* from happening properly. This is generally considered not to be what you want to happen**.

For this reason, when we architect and design systems, we often try to enforce isolation between components. I’ve had a couple of very interesting discussions over the past week about how to isolate various processes from each other, using different types of isolation, so I thought it might be interesting to go through some of the different types of isolation that we see out there. For the record, I’m not an expert on all different types of system, so I’m going to talk some history****, and then I’m going to concentrate on Linux*****, because that’s what I know best.

In the beginning

In the beginning, computers didn’t talk to one another. It was relatively difficult, therefore, for the bad people to do their bad things unless they physically had access to the computers themselves, and even if they did the bad things, the repercussions weren’t very widespread because there was no easy way for them to spread to other computers. This was good.

Much of the conversation below will focus on how individual computers act as hosts for a variety of different processes, so I’m going to refer to individual computers as “hosts” for the purposes of this post. Isolation at this level – host isolation – is still arguably the strongest type available to us. We typically talk about “air-gapping”, where there is literally an air gap – no physical network connection – between one host and another, but we also mean no wireless connection either. You might think that this is irrelevant in the modern networking world, but there are classes of usage where it is still very useful, the most obvious being for Certificate Authorities, where the root certificate is so rarely accessed – and so sensitive – that there is good reason not to connect the host on which it is stored to be connected to any other computer, and to use other means, such as smart-cards, a printer, or good old pen and paper to transfer information from it.

And then…

And then came networks. These allow hosts to talk to each other. In fact, by dint of the Internet, pretty much any host can talk to any other host, given a gateway or two. So along came network isolation to try to stop tha. Network isolation is basically trying to re-apply host isolation, after people messed it up by allowing hosts to talk to each other******.

Later, some smart alec came up with the idea of allowing multiple processes to be on the same host at the same time. The OS and kernel were trusted to keep these separate, but sometimes that wasn’t enough, so then virtualisation came along, to try to convince these different processes that they weren’t actually executing alongside anything else, but had their own environment to do their old thing. Sadly, the bad processes realised this wasn’t always true and found ways to get around this, so hardware virtualisation came along, where the actual chips running the hosts were recruited to try to convince the bad processes that they were all alone in the world. This should work, only a) people don’t always program the chips – or the software running on them – properly, and b) people decided that despite wanting to let these processes run as if they were on separate hosts, they also wanted them to be able to talk to processes which really were on other hosts. This meant that networking isolation needed to be applied not just at the host level, but at the virtual host level, as well******.

A step backwards?

Now, in a move which may seem retrograde, it occurred to some people that although hardware virtualisation seemed like a great plan, it was also somewhat of a pain to administer, and introduced inefficiencies that they didn’t like: e.g. using up lots of RAM and lots of compute cycles. These were often the same people who were of the opinion that processes ought to be able to talk to each other – what’s the fun in having an Internet if you can’t, well, “net” on it? Now we, as security folks, realise how foolish this sounds – allowing processes to talk to each other just encourages the bad people, right? – but they won the day, and containers came along. Containers allow lots of processes to be run on a host in a lightweight way, and rely on kernel controls – mainly namespaces – to ensure isolation********. In fact, there’s more you can do: you can use techniques like system call trapping to intercept the things that processes are attempting and stop them if they look like the sort of things they shouldn’t be attempting*********.

And, of course, you can write frameworks at the application layer to try to control what the different components of an application system can do – that’s basically the highest layer, and you’re just layering applications on applications at this point.

Systems thinking

So here’s where I get to the chance to mention one of my favourite topics: systems. As I’ve said before, by “system” here I don’t mean an individual computer (hence my definition of host, above), but a set of components that work together. The thing about isolation is that it works best when applied to a system.

Let me explain. A system, at least as I’d define it for the purposes of this post, is a set of components that work together but don’t have knowledge of external pieces. Most important, they don’t have knowledge of different layers below them. Systems may impose isolation on applications at higher layers, because they provide abstractions which allow higher systems to be able to ignore them, but by virtue of that, systems aren’t – or shouldn’t be – aware of the layers below them.

A simple description of the layers – and it doesn’t always hold, partly because networks are tricky things, and partly because there are various ways to assemble the stack – may look like this.

Application (top layer)
Container
System trapping
Kernel
Hardware virtualisation
Networking
Host (bottom layer)

As I intimated above, this is a (gross) simplification, but the point holds that the basic rule is that you can enforce isolation upwards in the layers of the stack, but you can’t enforce it downwards. Lower layer isolation is therefore generally stronger than higher layer isolation. This shouldn’t come as a huge surprise to anyone who’s used to considering network stacks – the principle is the same – but it’s helpful to lay out and explain the principles from time to time, and the implications for when you’re designing and architecting.

Because if you are considering trust models and are defining trust domains, you need to be very, very careful about defining whether – and how – these domains spread across the layer boundaries. If you miss a boundary out when considering trust domains, you’ve almost certainly messed up, and need to start again. Trust domains are important in this sort of conversation because the boundaries between trust domains are typically where you want to be able to enforce and police isolation.

The conversations I’ve had recently basically ran into problems because what people really wanted to do was apply lower layer isolation from layers above which had no knowledge of the bottom layers, and no way to reach into the control plane for those layers. We had to remodel, and I think that we came up with some sensible approaches. It was as I was discussing these approaches that it occurred to me that it would have been a whole lot easier to discuss them if we’d started out with a discussion of layers: hence this blog post. I hope it’s useful.

*although they may well not be, because, as I’m pretty sure I’ve mentioned before on this blog, the people trying to make the computers do the good things quite often get it wrong.

**unless you’re one of the bad people. But I’m pretty sure they don’t read this blog, so we’re OK***.

***if you are a bad person, and you read this blog, would you please mind pretending, just for now, that you’re a good person? Thank you. It’ll help us all sleep much better in our beds.

****which I’m absolutely going to present in an order that suits me, and generally neglect to check properly. Tough.

*****s/Linux/GNU Linux/g; Natch.

******for some reason, this seemed like a good idea at the time.

*******for those of you who are paying attention, we’ve got to techniques like VXLAN and SR-IOV.

********kernel purists will try to convince you that there’s no mention of containers in the Linux kernel, and that they “don’t really exist” as a concept. Try downloading the kernel source and doing a search for “container” if you want some ammunition to counter such arguments.

*********this is how SELinux works, for instance.

Thinking like a (systems) architect

“…I know it when I see it, and the motion picture involved in this case is not that.” – Mr Justice Stewart.

My very first post on this blog, some six months ago*, was entitled “Systems security – why it matters“, and it turns out that posts where I talk** about architecture are popular, so I thought I’d go a bit further into what I think being an architect is about. First of all, I’d like to reference two books which helped me come to some sort of understanding about the art of being an architect. I read them a long time ago****, but I still dip into them from time to time. I’m going to link to the publisher’s***** website, rather than to any particular bookseller:

97 Things Every Software Architect Should Know, by Richard Monson-Haefel
Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design, by Diomidis Spinellis , Georgios Gousios

What’s interesting about them is that they both have multiple points of view expressed in them: some contradictory – even within each book. And this rather reflects the fact that I believe that being a systems architect is an art, or a discipline. Different practitioners will have different views about it. You can talk about Computer Science being a hard science, and there are parts of it which are, but much of software engineering (lower case intentional) goes beyond that. The same, I think, is even more true for systems architecture: you may be able to grok what it is once you know it, but it’s very difficult to point to something – even a set of principles – and say, “that is systems architecture”. Sometimes, the easiest way to define something is by defining what it’s not: search for “I know it when I see it, and the motion picture involved in this case is not that.”******

Let me, however, try to give some examples of the sort of things you should expect to see when someone (or a group of people) is doing good systems architecture:

pictures: if you can’t show the different components of a system in a picture, I don’t believe that you can fully describe what each does, or how they interact. If you can’t separate them out, you don’t have a properly described system, so you have no architecture. I know that I’m heavily visually oriented, but for me this feels like a sine qua non********.
a data description: if you don’t know what data is in your system, you don’t know what it does.
an entity description: components, users, printers, whatever: you need to know what’s doing what so that you can describe what the what is that’s being done to it, and what for*********.
an awareness of time: this may sound like a weird one, but all systems (of any use) process data through time. If you don’t think about what will change, you won’t understand what will do the changing, and you won’t be able to consider what might go wrong if things get changed in ways you don’t expect, or by components that shouldn’t be doing the changing in the first place.
some thinking on failure modes: I’ve said it before, and I’ll say it again: “things will go wrong.” You can’t be expected to imagine all the things that might go wrong, but you have a responsibility to consider what might happen to different components and data – and therefore to the operation of the system of the whole – if********** they fall over.

There are, of course, some very useful tools and methodologies (the use of UML views is a great example) which can help you with all of these. But you don’t need to be an expert in all of them – or even any one of them – to be a good systems architect.

One last thing I’d add, though, and I’m going to call it the “bus and amnesiac dictum”***********:

In six months’ time, you’ll have forgotten the details or been hit by a bus: document it. All of it.

You know it makes sense.

*this note is for nothing other than to catch those people who go straight to this section, hoping for a summary of the main article. You know who you are, and you’re busted.

**well, write, I guess, but it feels like I’m chatting at people, so that’s how I think of it***

***yes, I’m going out of my way to make the notes even less info-filled than usual. Deal with it.

****seven years is a long time, right?

*****almost “of course”, though I believe they’re getting out of the paper book publishing biz, sadly.

******this is a famous comment from a case called from the U.S. called “Jacobellis v. Ohio” which I’m absolutely not going to quote in full here, because although it might generate quite a lot of traffic, it’s not the sort of traffic I want on this blog*******.

*******I did some searching found the word: it’s apophasis. I love that word. Discovered it during some study once, forgot it. Glad to have re-found it.

********I know: Greek and Latin in one post. Sehr gut, ja?

********.*I realise that this is a complete mess of a sentence. But it does have a charm to it, yes? And you know what it means, you really do.

**********when. I meant “when”.

***********more Latin.

The simple things, sometimes…

I (re-)learned an important lesson this week: if you’re an attacker, start at the front door.

This week I’ve had an interesting conversation with an organisation with which I’m involved*. My involvement is as a volunteer, and has nothing to do with my day job – in other words, I have nothing to do with the organisation’s security. However, I got an email from them telling me that in order to perform a particular action, I should now fill in an online form, which would then record the information that they needed.

So this week’s blog entry will be about entering information on an online form. One of the simplest tasks that you might want to design – and secure – for any website. I wish I could say that it’s going to be a happy tale.

I had look at this form, and then I looked at the URL they’d given me. It wasn’t a fully qualified URL, in that it had no protocol component, so I copied and pasted it into a browser to find out what would happen. I had a hope that it might automatically redirect to an https-served page. It didn’t. It was an http-served page.

Well, not necessarily so bad, except that … it wanted some personal information. Ah.

So, I cheated: I changed the http:// … to an https:// and tried again**. And got an error. The certificate was invalid. So even if they changed the URL, it wasn’t going to help.

So what did I do? I got in touch with my contact at the organisation, advising them that there was a possibility that they might be in breach of their obligations under Data Protection legislation.

I got a phone call a little later. Not from a technical person – though there was a techie in the background. They said that they’d spoken with the IT and security departments, and that there wasn’t a problem. I disagreed, and tried to explain.

The first argument was whether there was any confidential information being entered. They said that there was no linkage between the information being entered and the confidential information held in a separate system (I’m assuming database). So I stepped back, and asked about the first piece of information requested on the form: my name. I tried a question: “Could the fact that I’m a member of this organisation be considered confidential in any situation?”

“Yes, it could.”

So, that’s one issue out of the way.

But it turns out that the information is stored encrypted on the organisation’s systems. “Great,” I said, “but while it’s in transit, while it’s being transmitted to those systems, then somebody could read it.”

And this is where communication stopped. I tried to explain that unless the information from the form is transmitted over https, then people could read it. I tried to explain that if I sent it over my phone, then people at my mobile provider could read it. I tried a simple example: I tried to explain that if I transmitted it from a laptop in a Starbucks, then people who run the Starbucks systems – or even possibly other Starbucks customers – could see it. But I couldn’t get through.

In the end, I gave up. It turns out that I can avoid using the form if I want to. And the organisation is of the firm opinion that it’s not at risk: that all the data that is collected is safe. It was quite clear that I wasn’t going to have an opportunity to argue this with their IT or security people: although I did try to explain that this is an area in which I have some expertise, they’re not going to let any Tom, Dick or Harry*** bother their IT people****.

There’s no real end to this story, other than to say that sometimes it’s the small stuff we need to worry about. The issues that, as security professionals, we feel are cut and dried, are sometimes the places where there’s still lots of work to be done. I wish it weren’t the case, because frankly, I’d like to spend my time educating people on the really tricky things, and explaining complex concepts around cryptographic protocols, trust domains and identity, but I (re-)learned an important lesson this week: if you’re an attacker, start at the front door. It’s probably not even closed: let alone locked.

*I’m not going to identify the organisation: it wouldn’t be fair or appropriate. Suffice to say that they should know about this sort of issue.

**I know: all the skillz!

***Or “J. Random User”. Insert your preferred non-specific identifier here.

****I have some sympathy with this point of view: you don’t want to have all of their time taken up by random “experts”. The problem is when there really _are_ problems. And the people calling them maybe do know their thing.