I’ve always been on the development and architecture side of the house, rather than on the operations side. In the old days, this distinction was a useful and acceptable one, and wasn’t too difficult to maintain. From time to time, I’d get involved in discussions with people who were actually running the software that I had written, but on the whole, they were a fairly remote bunch.
This changed as I got into more senior architectural roles, and particularly as I moved through some pre-sales roles which involved more conversations with users. These conversations started to throw up an uncomfortable truth: not only were people running the software that I had helped to design and write, but they didn’t just set it up the way we did in our clean test install rig, run it with well-behaved, well-structured data input by well-meaning, generally accurate users in a clean deployment environment, and then turn it off when they were done with it.
This should all seem very obvious, and I had, of course, been on the receiving end of requests from support people which exposed the odd things that users did to my software, but that’s usually all it felt like: odd things.
The problem is that odd is normal. There is no perfect deployment, no clean installation, no well-structured data, and certainly very few generally accurate users. Every deployment is messy, and nobody just turns off the software when they’re done with it. If it’s become useful, it will be upgraded, patched, left to run with no maintenance, ignored, or a combination of all of those. And at some point, it’s likely to become “legacy” software, and somebody’s going to need to work out how to transition to a new version or a completely different system. This all has major implications for security.
I was involved in an effort a few years ago to describe the functionality and lifecycle of a proposed new project. I was on the security team, which, for all the usual reasons, didn’t always interact very closely with some of the other groups. When the group working on error and failure modes came up with their state machine model and presented it at a meeting, we all looked on with interest. And then with horror. All the modes were “natural” failures: not one reflected what might happen if somebody intentionally caused a failure. “Ah,” they responded, when called on it by the first of the security team able to form a coherent sentence, “those aren’t errors, those are attacks.” “But,” one of us blurted out, “don’t you need to recover from them?” “Well, yes,” they conceded, “but you can’t plan for that. It’ll need to be on a case-by-case basis.”
This is thinking that we need to stamp out. We need to design our systems so that, wherever possible, we consider not only what attacks might be brought to bear on them, but also how users – real users – can recover from them.
One way of doing this is to consider security as part of your resilience planning, and bake it into your thinking about lifecycle. Failure happens for lots of reasons, and some of those will be because of bad people doing bad things. It’s likely, however, that as you analyse the sorts of conditions that these attacks can lead to, a number of them will be similar to “natural” errors. Maybe you could lose network connectivity to your database because of a loose cable, or maybe because somebody is performing a denial of service attack on it. In both of these cases, you may well start off with similar mitigations, though the steps to fix the problem are likely to be very different. But considering all of these side by side means that you can help the people who will actually be operating those systems to plan for and manage their deployments.
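To make that side-by-side analysis concrete, here’s a minimal sketch of what such a failure-condition model might look like. Everything here is hypothetical and invented for illustration (the names, the database example’s mitigation steps, the helper function) — the only thing taken from the text above is the idea that one observable condition, like losing connectivity to your database, can have both natural and adversarial causes, a shared initial mitigation, and cause-specific recovery steps.

```python
from dataclasses import dataclass, field

@dataclass
class FailureCondition:
    """One observable failure condition, with natural and attack-induced
    causes recorded side by side rather than in separate documents."""
    name: str
    natural_causes: list
    attack_causes: list
    initial_mitigation: str                       # often shared across causes
    recovery: dict = field(default_factory=dict)  # cause -> recovery steps

# The database-connectivity example: the same observable condition may
# stem from a loose cable or from a denial of service attack.
db_unreachable = FailureCondition(
    name="database unreachable",
    natural_causes=["loose network cable", "switch failure"],
    attack_causes=["denial of service against the database"],
    initial_mitigation="fail over to read replica; queue writes",
    recovery={
        "loose network cable":
            "reseat and verify the link, then drain the write queue",
        "denial of service against the database":
            "rate-limit at the edge and block offending sources, "
            "then drain the write queue",
    },
)

def unmodelled_attacks(conditions):
    """Flag any condition whose model only considers 'natural' failures --
    exactly the gap the state machine in the story above had."""
    return [c.name for c in conditions if not c.attack_causes]
```

A planning review could then run `unmodelled_attacks` over the whole model and refuse to sign off while the list is non-empty, which is one way of stamping out the “those aren’t errors, those are attacks” thinking at design time.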
So the lesson from today is the same as it so often is: make sure that your security folks are involved from the beginning of a project, in all parts of it. And an extra one: if you’re a security person, try to think not just about the attackers, but also about all those poor people who will be operating your software. They’ll thank you for it.
1 – not literally, thankfully.
2 – though there was that memorable trip to Singapore with food poisoning… I’ll stop there.
3 – a fact of which I actually was aware.
4 – some due entirely to our own navel-gazing, I’m pretty sure.
5 – exactly what we singularly failed to do in the project I’ve just described.
6 – though probably not in person. Or with an actual gift. But at least they’ll complain less, and that’s got to be worth something.