I’ve visited a family member this week (see why in last week’s article), in a way which is allowed under the UK Covid-19 lockdown rules as an “exceptional circumstance”. When governments and civil authorities are managing what their citizens are allowed to do, many jurisdictions (including the UK, where I live) follow the general principle of “Everything which is not forbidden is allowed”. This becomes complicated when you’re putting in (hopefully short-term) restrictions on civil liberties such as disallowing general movement to visit family and friends, but in the general case, it makes a lot of sense. It allows for the principle of “management by exception”: rather than checking that every journey is allowed, you look out for disallowed journeys (taking an unnecessary trip to a castle in the north of England, for instance) and (hopefully) punish those who have undertaken them.
What astonishes me about the world of IT – and security in particular – is how often we take the opposite approach. We record every single log result and transfer it across the network just in case there’s a problem. We have humans in the chain for every software build, checking that the correct versions of containers have been used. What we should be doing instead is involving humans – the expensive parts of the chain – only when they’re needed, and sending lots of results across the network – the expensive part of that chain – only when the system which is generating the logs is under attack, or has registered a fault.
That’s not to say that we shouldn’t be recording information, but that we should be intelligent about how we use it: which means that we should be automating. Automation allows us to manage – that is, apply the expensive operations in a chain – only when it is relevant. Having a list of allowed container images, and then asking the developer why she has chosen a non-standard option, is so, so much cheaper for the organisation, not to mention more interesting for the container expert, than monitoring every single build. Allowing the system generating logs to increase the amount of information it sends when it realises it’s under attack, or sending it a command to increase what it sends when a possible problem is noticed remotely, is more efficient than the alternative.
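To make the container example concrete, here is a minimal sketch in Python of what an allow-list check might look like. Everything in it is an assumption for illustration: the image names, the `ALLOWED_IMAGES` set and the `review_build` function are hypothetical, and a real pipeline would hook this into whatever build tooling it already uses.

```python
# Hypothetical allow-list: these image names are illustrative only.
ALLOWED_IMAGES = {
    "registry.example.com/base/python:3.12",
    "registry.example.com/base/nginx:1.27",
}


def find_exceptions(build_images):
    """Return the images used in a build that are not on the allow-list."""
    return sorted(set(build_images) - ALLOWED_IMAGES)


def review_build(build_id, build_images):
    """Let compliant builds through silently; report only the exceptions.

    Returns None in the nominal case (no human needed), or a message
    that can be routed to the container expert for follow-up.
    """
    exceptions = find_exceptions(build_images)
    if not exceptions:
        return None
    return f"Build {build_id} uses non-standard images: {', '.join(exceptions)}"
```

The point of the design is that the function returns nothing at all for the common case: a human only hears about a build when it steps outside the list.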
The other thing I’m not saying is that we should just ignore information that’s generated in normal cases, where operation is “nominal”. Applying the growing range of AI/ML techniques to this information, so that we can recognise what is outside normal operation and become more sensitive to when we need to apply those expensive components in a system, makes a lot of sense. Sometimes statistical sampling is required, where we can’t expect all of the data to be provided to those systems (in the remote logging case, for instance), or distributed systems need to be designed with remote agents.
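As a rough illustration of sampling “nominal” data and then escalating only on exceptions, here is a small Python sketch. It uses a simple z-score against a sampled baseline, which is a stand-in for the AI/ML techniques mentioned above rather than an example of them; the function names, the 10% sampling fraction and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
import logging
import random
import statistics


def build_baseline(readings, fraction=0.1, seed=0):
    """Estimate mean and standard deviation from a random sample of readings.

    Needs at least two readings; only a fraction of the data is used,
    as a remote agent might do to avoid shipping everything.
    """
    rng = random.Random(seed)
    k = min(len(readings), max(2, int(len(readings) * fraction)))
    subset = rng.sample(readings, k)
    return statistics.mean(subset), statistics.stdev(subset)


def is_exception(value, mean, stdev, threshold=3.0):
    """True if value lies more than `threshold` deviations from the mean."""
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


def log_level_for(value, mean, stdev):
    """Send detailed logs only when something looks abnormal."""
    if is_exception(value, mean, stdev):
        return logging.DEBUG  # exception: send everything
    return logging.WARNING  # nominal: send only what matters
```

An agent using something like this would stay quiet during normal operation and raise its own verbosity when a reading falls outside the baseline, which is the point at which the expensive parts of the chain become worth involving.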
What I want, as a human, is interesting opportunities to apply my expertise, where I can make a difference, rather than routine problems (if you have routine problems, you have broader, more concerning issues) which don’t test me, and which don’t make a broader difference to how the systems and processes I’m involved with run. That won’t happen unless I can be part of an organisation where management by exception is the norm.
One final thing that I should be clear about is that I’m also not talking about an approach where “everything which isn’t explicitly allowed is disallowed” – that doesn’t sound like a great approach for security (I may not be a huge fan of the term zero-trust, but I’m not that opposed to it). It’s the results of the decisions that we care about, on the whole, and where we can manage it, we just have to automate, given the amount of information that’s becoming available. Even worse than not managing by exception is doing nothing with the data at all!
It doesn’t happen often, but let’s realise that, on this occasion, we have something to learn from our governments, and manage by exception.