To misquote Sir Arthur Conan Doyle:
- Gregory (cyber-security auditor): “Is there any other point to which you would wish to draw my attention?”
- Holmes: “To the curious incident of the patch in the night-time.”
- Gregory: “The patch did nothing in the night-time.”
- Holmes: “That was the curious incident.”
I considered a variety of (munged) literary titles to head up this blog, and settled on the one above or “We Need to Talk about Patching”. Either way round, there’s something rotten in the state of patching*.
Let me start with what I hope is a fairly uncontroversial statement: “we all know that patches are important for security and stability, and that we should really take them as soon as they’re available and patch all of our systems”.
I don’t know about you, but I suspect you’re the same as me: I run ‘sudo dnf --refresh upgrade’** on my home machines and work laptop at least once every day that I turn them on. I nearly wrote that when an update comes out to patch my phone, I take it pretty much immediately, but actually, I’ve been burned before with dodgy patches, and I’ll often have a check of the patch number to see if anyone has spotted any problems with it before downloading it. This feels like basic due diligence, particularly as I don’t have a “staging phone” which I could use to test pre-production and see if my “production phone” is likely to be impacted***.
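That pre-flight check works from the command line too. A minimal sketch, assuming a Fedora/RHEL-family system: both subcommands below are standard dnf and change nothing on disk, so you can see what a patch run would do before committing to it.

```shell
# Review pending updates without applying anything.
# check-update exits 100 when updates are pending, so mask that for scripting.
if command -v dnf >/dev/null 2>&1; then
    dnf --refresh check-update || true    # list pending updates, apply nothing
    dnf updateinfo list security || true  # narrow the list to security advisories
    status="checked"
else
    status="dnf not present on this machine"
fi
echo "Update check: ${status}"
```

If an advisory looks worrying, ‘dnf updateinfo info’ on its id shows the vendor’s own description before you decide.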
But the overwhelming evidence from the industry is that people really don’t apply patches – including security patches – even though they understand that they ought to. I plan to post another blog entry at some point about similarities – and differences – between patching and vaccinations, but let’s take as read, for now, the assumption that organisations know they should patch, and look at the reasons they don’t, and what we might do to improve that.
Why people don’t patch
Here are the legitimate reasons that I can think of for organisations not patching****.
1. they don’t know about patches
    a. not all patches are advertised well enough
    b. organisations don’t check for patches
2. they don’t know about their systems
    a. incomplete knowledge of their IT estate
3. legacy hardware
    a. patches not compatible with legacy hardware
4. legacy software
    a. patches not compatible with legacy software
5. known impact with up-to-date hardware & software
6. possible impact with up-to-date hardware & software
Some of these are down to the organisations, or their operating environment, clearly: 1b, 2, 3 and 4. The others, however, are down to us as an industry. What it comes down to is a balance of risk: the IT operations department doesn’t dare to update software with patches because they know that if the systems they maintain go down, they’re in real trouble. Sometimes they know there will be a problem (typically because they have tested the patch in a staging environment of some type), and sometimes they simply don’t dare risk it. This may be because they are in the middle of their own software update process, and the combination of Operating System, middleware or integrated software updates with their ongoing changes just can’t be trusted.
What we can do
Here are some thoughts about what we as an industry can do to try to address this problem – or set of problems.
Staging – what is a staging environment for? It’s for testing changes before they go into production, of course. But which changes? Changes to your software, or to your suppliers’ software? The answer has to be “both”, I think. You may need separate estates so that you can look at changes to these two sets of software separately before seeing what combining them does, but in the end, it is the combination of the two that matters. You might use the same estate at different times to test each in turn, but that’s not an option for all organisations.
DevOps shouldn’t just be about allowing agile development practices to become part of the software lifecycle: it should also be about allowing agile operational practices to become part of the software lifecycle. DevOps can really help with patching strategy if you think of it this way. Remember, in DevOps, everybody has responsibility. So your DevOps pipeline is the perfect way to test how changes in your software are affected by changes in the underlying estate. And because you’re updating regularly, and have unit tests to check all the key functionality*****, any changes can be spotted and addressed quickly.
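As a sketch of that idea – every name here is a placeholder, and the patching step is stubbed so the shape of the stage is visible without real infrastructure – a pipeline stage might rebuild a throwaway staging environment with the latest vendor patches and then run the same unit tests that gate your own application changes:

```shell
# Hypothetical pipeline stage: patch the staging image, then re-run the
# application's own unit tests against it. In a real pipeline, apply_patches
# might be a container build from a base image that pulls in current OS updates.
apply_patches() {
    echo "staging image rebuilt with current vendor patches"  # stubbed for illustration
}
run_unit_tests() {
    echo "unit tests passed against patched staging image"    # stand-in for ./run-tests.sh
}
apply_patches && run_unit_tests
```

The point of the stage is that a vendor patch which interacts badly with your own changes fails the pipeline in staging, not in production.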
Patches sometimes have dependencies. We should be clear when a patch requires other changes, resulting in a large patchset, and when a large patchset just happens to be released because multiple patches are available. Some dependencies may be outside the control of the vendor. This is easier to test when your patch depends on an underlying Operating System, for instance, but more difficult if the dependency runs in the opposite direction. If you’re the one providing the underlying update and the customer is using software that you don’t explicitly test, then it’s incumbent on you, I’d argue, to use some of the other techniques that I’ve outlined to help your customers understand the likely impact.
Visibility of likely impact
One obvious option available to those providing patches is a good description of areas of impact. You’d hope that everyone did this already, of course, but a brief line something like “this update is for the storage subsystem, and should affect only those systems using EXT3”, for instance, is a great help in deciding the likely impact of a patch. You can’t always get it right – there may always be unexpected consequences, and vendors can’t test for all configurations. But they should at least test all supported configurations…
This is tricky, and maybe political, but is it time that we started giving those customers who need it a little more detail about the likely impact of the changes within a patch? It’s difficult to quantify, of course: a one-character change may affect 95% of the flows through a module, whereas what may seem like a simple functional addition to a customer may actually require thousands of lines of code. But as vendors, we should have an idea of the impact of a change, and we ought to be considering how we expose that to customers.
Beyond that, however, I think there are opportunities for customers to understand what the impact of not having accepted a previous patch is. Maybe the risk of accepting patch A is low, but the risk of not accepting patch A and patch B is much higher. Maybe it’s safer to accept patch A and patch C, but wait for a successor to patch B. I’m not sure quite how to quantify this, or how it might work, but I think there are grounds for research******.
Businesses have every right not to patch. There are business reasons to balance the risk of patching against not patching. But the balance is currently often tipped too far in the direction of not patching. Much too far. And if we’re going to improve the state of IT security, we, the industry, need to do something about it. By helping organisations with better information, by encouraging them to adopt better practices, by training them in how to assess risk, and by adopting better practices ourselves.
*see what I did there?
**your commands may vary.
***this almost sounds like a very good excuse for a second phone, though I’m not sure that my wife would agree.
****I’d certainly be interested to hear of others: please let me know via comments.
*****you do have these two things, right? Because if you don’t, you’re really not doing DevOps. Sorry.
******as soon as I wrote this, I realised that somebody’s bound to have done research on this issue. Please let me know if you have, or know somebody who has.