Explained: five misused security words

Untangling responsibility, authority, authorisation, authentication and identification.

I took them out of the title, because otherwise it was going to be huge, with lots of polysyllabic words.  You might, therefore, expect a complicated post – but that’s not my intention*.  What I’d like to do is try to explain these five important concepts in security, as they’re often confused or bound up with one another.  They are, however, separate concepts, and it’s important to be able to disentangle what each means, and how they might be applied in a system.  Today’s words are:

  • responsibility
  • authority
  • authorisation
  • authentication
  • identification.

Let’s start with responsibility.

Responsibility

Confused with: function; authority.

If you’re responsible for something, it means that you need to make sure it happens, and that you’re the one who answers for it if something goes wrong.  You can be responsible for a product launching on time, or for the smooth functioning of a team.  If we’re going to be really clear about it, I’d suggest using the word only for people.  It’s not usually a formal description of a role in a system, though it’s sometimes used as short-hand for describing what a role does.  This short-hand can be confusing.  “The storage module is responsible for ensuring that writes complete transactionally” or “the crypto here is responsible for encrypting this set of bytes” is just a description of the function of the component, and doesn’t truly denote responsibility.

Also, just because you’re responsible for something doesn’t mean that you can make it happen.  One of the most frequent confusions, then, is with authority.  If you can’t ensure that something happens, but it’s your responsibility to make it happen, you have responsibility without authority***.

Authority

Confused with: responsibility, authorisation.

If you have authority over something, then you can make it happen****.  This is another word which is best restricted to use about people.  As noted above, it is possible to have authority but no responsibility*****.

Once we start talking about systems, phrases like “this component has the authority to kill these processes” really mean “has sufficient privilege within the system”, and are best avoided. What we may need to check, however, is whether a component should be given authorisation to hold a particular level of privilege, or to perform certain tasks.

Authorisation

Confused with: authority; authentication.

If a component has authorisation to perform a certain task or set of tasks, then it has been granted power within the system to do those things.  It can be useful to think of roles and personae in this case.  If you are modelling a system on personae, then you will wish to grant a particular role authorisation to perform tasks that, in real life, the person modelled by that role has the authority to do.  Authorisation is an instantiation or realisation of that authority.  A component is granted the authorisation appropriate to the person it represents.  Not all authorisations can be so easily mapped, however: some may be more granular.  You may have a file manager which has authorisation to change a read-only permission to read-write: something you might struggle to map to a specific role or persona.
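
To make that mapping a little more concrete, here’s a minimal sketch in Python; the roles and task names are entirely invented, and a real system would more likely use a proper policy engine than a hard-coded dictionary.

    # A minimal sketch of mapping roles (personae) to authorisations.
    # The role names and task names are invented for illustration only.
    ROLE_AUTHORISATIONS = {
        "finance_clerk": {"read_invoice", "create_invoice"},
        "finance_manager": {"read_invoice", "create_invoice", "approve_payment"},
        "file_manager": {"set_read_only", "set_read_write"},
    }

    def is_authorised(role, task):
        """True if the role has been granted authorisation to perform the task."""
        return task in ROLE_AUTHORISATIONS.get(role, set())

    print(is_authorised("finance_clerk", "approve_payment"))   # False
    print(is_authorised("file_manager", "set_read_write"))     # True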

If authorisation is the granting of power or capability to a component representing a person, the question that precedes it is “how do I know that I should grant that power or capability to this person or component?”.  That process is authentication – authorisation should be the result of a successful authentication.

Authentication

Confused with: authorisation; identification.

If I’ve checked that you’re allowed to perform an action, then I’ve authenticated you: this process is authentication.  A system, then, before granting authorisation to a person or component, must check that they should be allowed the power or capability that comes with that authorisation – that it is appropriate to that role.  Successful authentication leads to authorisation.  Unsuccessful authentication leads to blocking of authorisation******.
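
To see that ordering in code, here’s a hedged sketch of authentication gating authorisation, with logging on both paths (as the footnote suggests); the identity, credential and task names are invented, and no real system should hold plaintext secrets like this.

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("auth")

    # Invented credential store, for illustration only: a real system would use
    # salted password hashes, tokens or certificates, never plaintext secrets.
    KNOWN_CREDENTIALS = {"alice": "correct-horse-battery-staple"}
    AUTHORISED_TASKS = {"alice": {"read_report", "approve_payment"}}

    def authenticate(identity, credential):
        """Check that the entity is who it claims to be, and log the outcome."""
        ok = KNOWN_CREDENTIALS.get(identity) == credential
        log.info("authentication %s for %s", "succeeded" if ok else "failed", identity)
        return ok

    def authorise(identity, credential, task):
        """Grant authorisation only after successful authentication."""
        if not authenticate(identity, credential):
            log.warning("blocking authorisation of %s for %s", task, identity)
            return False
        granted = task in AUTHORISED_TASKS.get(identity, set())
        log.info("authorisation of %s for %s: %s", task, identity, granted)
        return granted

    authorise("alice", "correct-horse-battery-staple", "read_report")   # granted, and logged
    authorise("alice", "wrong-password", "read_report")                 # blocked, and logged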

With the exception of anonymous roles, the core of an authentication process is checking that the person or component is who he, she or it claims to be (although anonymous roles can be appropriate for some capabilities within some systems).  This checking of who or what a person or component is constitutes authentication, whereas identification is the claim itself, and the mapping of an identity to a role.

Identification

Confused with: authentication.

I can identify that a particular person exists without being sure that the specific person in front of me is that person.  They may identify themselves to me – this is identification – and the checking that they are who they profess to be is the authentication step.  In systems, we need to map a known identity to the appropriate capabilities, and the presentation of a component with identity allows us to apply the appropriate checks to instantiate that mapping.

Bringing it all together

Just because you know who I am doesn’t mean that you’re going to let me do something.  I can identify my children over the telephone*******, but that doesn’t mean that I’m going to authorise them to use my credit card********.  Let’s say, however, that I might give my wife my online video account password over the phone, but not my children.  How might the steps in this play out?

First of all, I have responsibility to ensure that my account isn’t abused.  I also have authority to use it, as granted by the Terms and Conditions of the providing company (I’ve decided not to mention a particular service here, mainly in case I misrepresent their Ts&Cs).

“Hi, darling, it’s me, your darling wife**********. I need the video account password.” Identification – she has told me who she claims to be, and I know that such a person exists.

“Is it really you, and not one of the kids?  You’ve got a cold, and sound a bit odd.”  This is my trying to do authentication.

“Don’t be an idiot, of course it’s me.  Give it to me or I’ll pour your best whisky down the drain.”  It’s her.  Definitely her.

“OK, darling, here’s the password: it’s il0v3myw1fe.”  By giving her the password, I’ve performed authorisation.
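
Mapped (very loosely) onto code, the exchange might look like the sketch below; the “evidence” here is the whisky threat from the story, which is emphatically not a recommended authentication factor.

    # The phone call above as three distinct steps.
    # The names, the "evidence" and the password are all stand-ins from the story.
    def identification(claim):
        """The caller claims an identity; I know such a person exists."""
        return claim in {"wife", "child"}

    def authentication(claim, evidence):
        """Check that the caller really is who they claim to be."""
        return claim == "wife" and evidence == "threatens the good whisky"

    def authorisation(claim):
        """Only an authenticated wife gets the video account password."""
        return "il0v3myw1fe" if claim == "wife" else None

    claim, evidence = "wife", "threatens the good whisky"
    if identification(claim) and authentication(claim, evidence):
        print("password:", authorisation(claim))
    else:
        print("authentication failed: no password")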

It’s important to understand these different concepts, as they’re often conflated or confused, and if you can’t separate them, it’s difficult not only to design systems to function correctly, but also to log and audit the different processes as they occur.


*we’ll have to see how well I manage, however.  I know that I’m prone to long-windedness**

**ask my wife.  Or don’t.

***and a significant problem.

****in a perfect world.  Sometimes people don’t do what they ought to.

*****this is much, much nicer than responsibility without authority.

******and logging.  In both cases.  Lots of logging.  And possibly flashing lights, security guards and sirens on failure, if you’re into that sort of thing.

*******most of the time: sometimes they sound like my wife.  This is confusing.

********neither should you assume that I’m going to let my wife use it, either.*********

*********not to suggest that she can’t use a credit card: it’s just that we have separate ones, mainly for logging purposes.

**********we don’t usually talk like this on the phone.

The Curious Incident of the Patch in the Night-Time

Gregory: “The patch did nothing in the night-time.”
Holmes: “That was the curious incident.”

To misquote Sir Arthur Conan Doyle:

Gregory (cyber-security auditor): “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the patch in the night-time.”
Gregory: “The patch did nothing in the night-time.”
Holmes: “That was the curious incident.”

I considered a variety of (munged) literary titles to head up this blog, and settled on the one above or “We Need to Talk about Patching”.  Either way round, there’s something rotten in the state of patching*.

Let me start with what I hope is a fairly uncontroversial statement: “we all know that patches are important for security and stability, and that we should really take them as soon as they’re available and patch all of our systems”.

I don’t know about you, but I suspect you’re the same as me: I run ‘sudo dnf --refresh upgrade’** on my home machines and work laptop at least once every day that I turn them on.  I nearly wrote that when an update comes out to patch my phone, I take it pretty much immediately, but actually, I’ve been burned before with dodgy patches, and I’ll often have a check of the patch number to see if anyone has spotted any problems with it before downloading it. This feels like basic due diligence, particularly as I don’t have a “staging phone” which I could use to test pre-production and see if my “production phone” is likely to be impacted***.

But the overwhelming evidence from the industry is that people really don’t apply patches – including security patches – even though they understand that they ought to.  I plan to post another blog entry at some point about similarities – and differences – between patching and vaccinations, but let’s take as read, for now, the assumption that organisations know they should patch, and look at the reasons they don’t, and what we might do to improve that.

Why people don’t patch

Here are the legitimate reasons that I can think of for organisations not patching****.

  1. they don’t know about patches
    a. not all patches are advertised well enough
    b. organisations don’t check for patches
  2. they don’t know about their systems
    a. incomplete knowledge of their IT estate
  3. legacy hardware
    a. patches not compatible with legacy hardware
  4. legacy software
    a. patches not compatible with legacy software
  5. known impact with up-to-date hardware & software
  6. possible impact with up-to-date hardware & software

Some of these are down to the organisations, or their operating environment, clearly: 1b, 2, 3 and 4.  The others, however, are down to us as an industry.  What it comes down to is a balance of risk: the IT operations department doesn’t dare to update software with patches because they know that if the systems that they maintain go down, they’re in real trouble.  Sometimes it’s because they know there will be a problem (typically because they test patches in a staging environment of some type), and sometimes it’s because they just don’t dare risk it.  This may be because they are in the middle of their own software update process, and the combination of Operating System, middleware or integrated software updates with their ongoing changes just can’t be trusted.

What we can do

Here are some thoughts about what we as an industry can do to try to address this problem – or set of problems.

Staging

Staging – what is a staging environment for?  It’s for testing changes before they go into production, of course.  But what changes?  Changes to your software, or your suppliers’ software?  The answer has to be “both”, I think.  You may need separate estates so that you can look at changes to these two sets of software separately before seeing what combining them does, but in the end, it is the combination of the two that matters.  You may consider using the same estate at different times to test the different options, but that’s not an option for all organisations.
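
One way to picture “both” is as a small test matrix: every combination of your own pending builds and your suppliers’ patch levels gets at least a smoke test in staging before anything reaches production. The version names and the test hook in this sketch are purely illustrative.

    from itertools import product

    # Illustrative names only - substitute whatever your estate actually runs.
    our_builds = ["app-2.3.0", "app-2.4.0-rc1"]
    supplier_levels = ["base-os", "base-os+latest-patchset"]

    def smoke_test(build, platform):
        """Stand-in for deploying the combination to staging and running checks."""
        print(f"testing {build} on {platform}")
        return True   # a real test would report a genuine pass/fail

    failures = [
        (build, platform)
        for build, platform in product(our_builds, supplier_levels)
        if not smoke_test(build, platform)
    ]
    print("combinations needing attention:", failures or "none")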

DevOps

DevOps shouldn’t just be about allowing agile development practices to become part of the software lifecycle: it should also be about allowing agile operational practices to become part of the software lifecycle.  DevOps can really help with patching strategy if you think of it this way.  Remember, in DevOps, everybody has responsibility.  So your DevOps pipeline is the perfect way to test how changes in your software are affected by changes in the underlying estate.  And because you’re updating regularly, and have unit tests to check all the key functionality*****, any changes can be spotted and addressed quickly.
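
As a sketch of what that pipeline stage might look like – rebuild on the freshly patched base, then run the unit tests – consider the following; the registry, image names and commands are assumptions, not a recommendation for any particular tooling.

    import subprocess

    # Illustrative pipeline stage: rebuild on the latest patched base image,
    # then run the unit test suite against the result.
    steps = [
        ["docker", "pull", "registry.example.com/base:latest"],          # pick up new patches
        ["docker", "build", "-t", "myapp:candidate", "."],               # rebuild our software on top
        ["docker", "run", "--rm", "myapp:candidate", "pytest", "-q"],    # run the unit tests
    ]

    for step in steps:
        print("running:", " ".join(step))
        subprocess.run(step, check=True)   # fail the stage fast if anything breaks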

Dependencies

Patches sometimes have dependencies.  We should be clear when a patch requires other changes, resulting in a large patchset, and when a large patchset just happens to be released because multiple patches are available.  Some dependencies may be outside the control of the vendor.  This is easier to test when your patch has dependencies on an underlying Operating System, for instance, but more difficult if the dependency is in the opposite direction.  If you’re the one providing the underlying update and the customer is using software that you don’t explicitly test, then it’s incumbent on you, I’d argue, to use some of the other techniques that I’ve outlined to help your customers understand likely impact.
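
One hedged way to make this explicit is to ship machine-readable dependency metadata with each patch and check it before applying; the format and field names below are invented for illustration, and the exact-version check is deliberately naive.

    # Invented metadata format: each patch declares the versions it expects underneath.
    patches = {
        "app-patch-17": {"requires": {"os-kernel": "5.14", "libcrypto": "3.0"}},
    }
    installed = {"os-kernel": "5.14", "libcrypto": "1.1"}

    def unmet_dependencies(patch_id):
        """Naive exact-version check; real packaging tools do far cleverer comparisons."""
        required = patches[patch_id]["requires"]
        return {pkg: ver for pkg, ver in required.items() if installed.get(pkg) != ver}

    print(unmet_dependencies("app-patch-17"))   # {'libcrypto': '3.0'}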

Visibility of likely impact

One obvious option available to those providing patches is a good description of areas of impact.  You’d hope that everyone did this already, of course, but a brief line such as “this update is for the storage subsystem, and should affect only those systems using EXT3” is a great help in deciding the likely impact of a patch.  You can’t always get it right – there may always be unexpected consequences, and vendors can’t test for all configurations.  But they should at least test all supported configurations…
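
That “brief line” is easy to make machine-readable, too. Here’s a hedged sketch of what an advisory record might contain – the fields and identifiers are invented, not any existing standard.

    from dataclasses import dataclass

    @dataclass
    class PatchAdvisory:
        """Illustrative advisory record - the field names are invented, not a standard."""
        patch_id: str
        subsystem: str
        affected_when: str            # human-readable condition
        tested_configurations: list

    advisory = PatchAdvisory(
        patch_id="storage-2024-031",
        subsystem="storage",
        affected_when="only systems using EXT3",
        tested_configurations=["ext3", "ext4", "xfs"],
    )
    print(f"{advisory.patch_id}: {advisory.subsystem} - affects {advisory.affected_when}")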

Risk statements

This is tricky, and maybe political, but is it time that we started giving those customers who need it a little more detail about the likely impact of the changes within a patch?  It’s difficult to quantify, of course: a one-character change may affect 95% of the flows through a module, whereas what may seem like a simple functional addition to a customer may actually require thousands of lines of code.  But as vendors, we should have an idea of the impact of a change, and we ought to be considering how we expose that to customers.

Combinations

Beyond that, however, I think there are opportunities for customers to understand what the impact of not having accepted a previous patch is.  Maybe the risk of accepting patch A is low, but the risk of not accepting patch A and patch B is much higher.  Maybe it’s safer to accept patch A and patch C, but wait for a successor to patch B.  I’m not sure quite how to quantify this, or how it might work, but I think there are grounds for research******.
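
I’m certainly not claiming that this is how to quantify it, but even a toy model makes the point that the risk of a combination isn’t just the sum of its parts; all the numbers below are made up.

    # Toy numbers only: rough risk scores for applying or skipping each patch.
    apply_risk = {"A": 0.05, "B": 0.40, "C": 0.05}   # B is the dodgy one
    skip_risk = {"A": 0.20, "B": 0.15, "C": 0.10}

    def plan_risk(applied):
        """Crude combined risk: individual risks, plus a penalty for skipping both A and B."""
        risk = sum(apply_risk[p] for p in applied)
        risk += sum(skip_risk[p] for p in apply_risk if p not in applied)
        if "A" not in applied and "B" not in applied:
            risk += 0.40   # the gap left by missing both A and B is much worse
        return round(risk, 2)

    for plan in [set(), {"A"}, {"A", "C"}, {"A", "B", "C"}]:
        print(sorted(plan) or "nothing", "->", plan_risk(plan))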

Conclusion

Businesses have every right not to patch.  There are business reasons to balance the risk of patching against not patching.  But the balance is currently often tipped too far in the direction of not patching.  Much too far.  And if we’re going to improve the state of IT security, we, the industry, need to do something about it: by helping organisations with better information, by encouraging them to adopt better practices, by training them in how to assess risk, and by adopting better practices ourselves.

 


*see what I did there?

**your commands may vary.

***this almost sounds like a very good excuse for a second phone, though I’m not sure that my wife would agree.

****I’d certainly be interested to hear of others: please let me know via comments.

*****you do have these two things, right?  Because if you don’t, you’re really not doing DevOps.  Sorry.

******as soon as I wrote this, I realised that somebody’s bound to have done research on this issue.  Please let me know if you have: or know somebody who has.

 

Embracing fallibility

History repeats itself because no one was listening the first time. (Anonymous)

We’re all fallible.  You’re fallible, he’s fallible, she’s fallible, I’m fallible*.  We all get things wrong from time to time, and the generally accepted “modern” management approach is that it’s OK to fail – “fail early, fail often” – as long as you learn from your mistakes.  In fact, there’s a growing view that if you don’t fail, you can’t learn – or that your learning will be slower, and restricted.

The problem with some fields – and IT security is one of them – is that failing can be a very bad thing, with lots of very unexpected consequences.  This is particularly true for operational security, but the same can be the case for application, infrastructure or feature security.  In fact, one of the few expected consequences is that call to visit your boss once things are over, so that you can find out how many days*** you still have left with your organisation.  But if we are to be able to make mistakes**** and learn from them, we need to find ways to allow failure to happen without catastrophic consequences to our organisations (and our careers).

The first thing to be aware of is that we can learn from other people’s mistakes.  There’s a famous aphorism, supposedly first said by George Santayana and often quoted as “Those who cannot learn from history are doomed to repeat it.”  I quite like the alternative:  “History repeats itself because no one was listening the first time.”  So, let’s listen, and let’s consider how to learn from other people’s mistakes (and our own).  The classic way of thinking about this is by following “best practices”, but I have a couple of problems with this phrase.  The first is that very rarely can you be certain that the context in which you’re operating is exactly the same as that of those who framed these practices.  The other – possibly more important – is that “best” suggests the summit of possibilities: you can’t do better than best.  But we all know that many practices can indeed be improved on.  For that reason, I rather like the alternative, much-used at Intel Corporation, which is “BKMs”: Best Known Methods.  This suggests that there may well be better approaches waiting to be discovered.  It also talks about methods, which suggests to me more conscious activities than practices, which may become unconscious or uncritical following of others.

What other opportunities are open to us to fail?  Well, to return to a theme which is dear to my heart, we can – and must – discuss with those within our organisations who run the business what levels of risk are appropriate, and explain that we know that mistakes can occur, so that we can consider how to mitigate them and work around them.  And there’s the word “mitigate” – another approach is to consider managed degradation as one way to protect our organisations***** from the full impact of failure.

Another is to embrace methodologies which have failure as a key part of their philosophy.  The most obvious is Agile Programming, which can be extended to other disciplines, and, when combined with DevOps, allows not only for fast failure but fast correction of failures.  I plan to discuss DevOps – and DevSecOps, the practice of rolling security into DevOps – in more detail in a future post.

One last approach that springs to mind, and which should always be part of our arsenal, is defence in depth.  We should ensure that if one element of a system fails, that’s not the end of the whole kit and caboodle******.  That only works if we’ve thought about single points of failure, of course.

The approaches above are all well and good, but I’m not entirely convinced that any one of them – or a combination of them – gives us a complete enough picture that we can fully embrace “fail fast, fail often”.  There are other pieces, too, including testing, monitoring, and organisational cultural change – an important and often overlooked element – that need to be considered, but it feels to me that we have some way to go, still.  I’d be very interested to hear your thoughts and comments.

 


*my family is very clear on this point**.

**I’m trying to keep it from my manager.

***or if you’re very unlucky, minutes.

****amusingly, I first typed this word as “misteaks”.  You’ve got to love those Freudian slips.

*****and hence ourselves.

******no excuse – I just love the phrase.

 

 

Service degradation: actually a good thing

…here’s the interesting distinction between the classic IT security mindset and that of “the business”: the business generally want things to keep running.

Well, not all the time, obviously*.  But bear with me: we spend most of our time ensuring that all of our systems are up and secure and working as expected, because that’s what we hope for, but there’s a real argument for not only finding out what happens when they don’t, and not just planning for when they don’t, but also planning for how they shouldn’t.  Let’s start by examining some techniques for how we might do that.

Part 1 – planning

There’s a story** that the oil company Shell, in the 1970s, did some scenario planning that examined what were considered, at the time, very unlikely events, and which allowed it to react when OPEC’s strategy surprised most of the rest of the industry a few years later.  Sensitivity modelling is another technique that organisations use at the financial level to understand what impact various changes – in order fulfilment, currency exchange or interest rates, for instance – make to the various parts of their business.  Yet another is war gaming, which the military use to try to understand what will happen when failures occur: putting real people and their associated systems into situations and watching them react.  And Netflix are famous for taking this a step further in the context of the IT world and having a virtual Chaos Monkey (a set of processes and scripts) which they use to bring down parts of their systems in real time to allow them to understand how resilient the wider system is.

So that gives us four approaches that are applicable, with various options for automation:

  1. scenario planning – trying to understand what impact large scale events might have on your systems;
  2. sensitivity planning – modelling the impact on your systems of specific changes to the operating environment;
  3. wargaming – putting your people and systems through simulated events to see what happens;
  4. real outages – testing your people and systems with actual events and failures.

Going out of your way to sabotage your own systems might seem like insane behaviour, but it’s actually a work of genius.  If you don’t plan for failure, what are you going to do when it happens?
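
A much more modest cousin of the Chaos Monkey might be as simple as the sketch below: in a staging environment, pick one of your own services at random and stop it, then watch how everything else copes. The service names and the stop command are placeholders – please don’t point anything like this at production until you’ve planned for it.

    import random
    import subprocess

    # Placeholder service names - staging instances only, never production.
    candidate_services = ["staging-orders", "staging-payments", "staging-reports"]

    def break_something(dry_run=True):
        """Pick a victim at random and (optionally) stop it, then go and watch."""
        victim = random.choice(candidate_services)
        cmd = ["systemctl", "stop", victim]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)   # needs appropriate privileges
        return victim

    victim = break_something(dry_run=True)
    print(f"now go and see how the rest of the estate copes without {victim}")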

So let’s say that you’ve adopted all of these practices****: what are you going to do with the information?  Well, there are some obvious things you can do, such as:

  • removing discovered weaknesses;
  • improving resilience;
  • getting rid of single points of failure;
  • ensuring that you have adequately trained staff;
  • making sure that your backups are protected, but available to authorised entities.

I won’t try to compile an exhaustive list, because there are loads of books and articles and training courses about this sort of thing, but there’s another, maybe less obvious, course of action which I believe we must take, and that’s to plan for managed degradation.

Part 2 – managed degradation

What do I mean by that?  Well, it’s simple.  We***** are trained and indoctrinated to take the view that if something fails, it must always “fail to safe” or “fail to secure”.  If something stops working right, it should stop working altogether.

There’s value in this approach, of course there is, and we’re paid****** to ensure everything is secure, right?  Wrong.  We’re actually paid to help keep the business running, and here’s the interesting distinction between the classic IT security mindset and that of “the business”: the business generally want things to keep running.  Crazy, right?  “The business” want to keep making money and servicing customers even if things aren’t perfectly secure!  Don’t they know the risks?

And the answer to that question is “no”.  They don’t know the risks.  And that’s our real job: we need to explain the risks and the mitigations, and allow a balancing act to take place.  In fact, we’re always making those trade-offs and managing that balance – after all, the only truly secure computer is one with no network connection, no keyboard, no mouse and no power connection*******.  But most of the time, we don’t need to explain the decisions we make around risk: we just take them, following best industry practice, regulatory requirements and the rest.  Nor are the trade-offs usually so stark – but when failure strikes, whether through an attack, accident or misfortune, it can come down to a pretty simple choice between maintaining a particular security posture and keeping the lights on.  So we need to think about and plan for some degradation, and realise that on occasion, we may need to adopt a different security posture to the perfect (or at least preferred) one in which we normally operate.

How would we do that?  Well, the approach I’m advocating is best described as “managed degradation”.  We allow our systems – including, where necessary, our security systems – to degrade to a managed (and preferably planned) state, where we know that they’re not operating at peak efficiency, but where they are operating.  Key, however, is that we know the conditions under which they’re working, so we understand their operational parameters, and can explain and manage the risks associated with this new posture.  That posture may change in response to ongoing events, to the state of our systems, and to our responses to those events, so we need to plan ahead (using the techniques I discussed above) so that we can be flexible enough to provide real resiliency.
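
As a sketch of what a “managed (and preferably planned) state” might mean in practice, you could define the degraded postures ahead of time, together with the conditions that trigger them and the extra controls that apply while you’re in them; everything below – the postures, operations and trigger – is illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class Posture:
        """An illustrative, pre-planned operating posture."""
        name: str
        allowed_operations: set
        extra_controls: list = field(default_factory=list)

    NORMAL = Posture("normal", {"take_orders", "collect_payments", "run_reports"})
    DEGRADED = Posture(
        "degraded",
        {"take_orders"},   # keep the key business flow alive
        ["queue payments for later", "increase audit logging", "notify the risk owner"],
    )

    def select_posture(payment_service_up):
        # The trigger is a stand-in: real conditions would come from monitoring.
        return NORMAL if payment_service_up else DEGRADED

    posture = select_posture(payment_service_up=False)
    print(posture.name, "->", sorted(posture.allowed_operations), posture.extra_controls)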

We need to find modes of operation which don’t expose the crown jewels******** of the business, but do allow key business operations to take place.  And those key business operations may not be the ones we expect – maybe it’s more important to be able to create new orders than to collect payments for them, for instance, at least in the short term.  So we need to discuss the options with the business, and respond to their needs.  This planning is not just security resiliency planning: it’s business resiliency planning.  We won’t be able to consider all the possible failures – though the techniques I outlined above will help us to identify many of them – but the more we plan for, the better we will be at reacting to the surprises.  And, possibly best of all, we’ll be talking to the business, informing them, learning from them, and even, maybe just a bit, helping them understand that the job we do does have some value after all.


*I’m assuming that we’re the Good Guys/Gals**.

**Maybe less story than MBA*** case study.

***There’s no shame in it.

****Well done, by the way.

*****The mythical security community again – see past posts.

******Hopefully…

*******Preferably at the bottom of a well, encased in concrete, with all storage already removed and destroyed.

********Probably not the actual Crown Jewels, unless you work at the Tower of London.