I spend a lot of my time on this blog talking about systems, because unless you understand how your systems work as a set of components, you’re never going to be able to protect and manage them. Today, however, I want to talk about security of data – the data in the systems. Any type of even vaguely useful system will hold, manipulate or use data in some way or another[1], and as I’m interested[2] in security, I think it’s useful to talk about data and data security. I’ve touched on this question in previous articles, but one recent one, What’s a State Actor, and should I care? had a number of people asking me for more detail on some of the points I raised, and as one of them was the classic “C.I.A.” model around data security, I thought I’d start there.
The first point I should make is that the “CIA triad” is sometimes over-used. You really can’t reduce all of information security to confidentiality, integrity and availability – there are a number of other important principles to consider. These three arguably don’t even cover all the issues you’d want to consider around data security – what, for instance, about data correctness and consistency, for example? – but I’ve always found them to be a useful starting point, so as long as we don’t kid ourselves into believing that they’re all we need, they are useful to hold in mind. They are, to use a helpful phrase, “necessary but not sufficient”.
We should also bear in mind that for any particular system, you’re likely to have various types and sets of data, and these types and sets may have different requirements. For instance, a database may store not only key data about, for instance, museum exhibits, but will also store data about who can update the key data, and even metadata about that – this might[3] include information about a set of role-based access controls (RBAC), and the security requirements for this will be different to the security requirements for thee key data. So, when we’re considering the data security requirements of a system, don’t assume that they will be uniform across all data sets.
Confidentiality
Confidentiality is quite an easy one to explain. If you don’t want everybody to be able to see a set of data, then you wish it to be confidential with regards to at least some entities – whether they be people or systems entities, internal or external. There are a number of ways to implement confidentiality, the most obvious being encryption of data, but there are other approaches, of which the easiest is just denying access to data through physical, geographical or other authorisation controls.
When you might care that data is confidentiality-protected: health records, legal documents, credit card details, financial information, firewall rules, system administrator rights, passwords.
When you might not care that data is confidentiality-protected: sports records, examination results, open source code, museum exhibit information, published company financial results.
Integrity
Integrity, when used as a term in this context, is slightly different to its standard usage. The property we’re looking for is not the same integrity that we expect[4] from our politicians, but is that data has not been changed from what it should be. Data is often useless unless it can be changed – we want to update information about our museum exhibits from time to time, for instance – but who can change it, and when, are the sort of things we want to control. Equally important may be the type of changes that can be made to it: if I have a careful classification scheme for my Tudor music manuscripts, I don’t want somebody putting in binary data which means nothing to me or our visitors.
I struggled to think of any examples when you wouldn’t want to protect the integrity of your data from at least some entities, as if data can be changed willy-nilly[5], it seems be worthless. It did occur to me, however, that as long as you have integrity-protected records of what has been changed, you’re probably OK. That’s the model for some open source projects or collaborative writing endeavours, for example.
[Discursion – Open source projects don’t generally allow you to scribble directly onto the main “approved” store – whose integrity is actually very important. That’s why software projects – proprietary or open source – have for decades used source control systems or versioning systems. One of the success criteria for scaling an open source project is a consensus on integrity management.]
Availability
Availability is the easiest of the triad to ignore when you’re designing a system. When you create data, it’s generally useless unless the entities that need it can get to it when they need it. That doesn’t mean that all systems need to have 100% up-time, or even that particular data sets need to be available for 100% of the up-time of the system, but when you’re designing a system, you need to decide what’s going to be appropriate, and how to manage with degradation. Why degradation? Because one of the easiest ways to affect the availability of data is to slow down access to it – as described in another recent post What’s your availability? DoS attacks and more. If I’m using a mobile app to view information about museum exhibits in real-time, and it takes five minutes for me to access the description, then things aren’t any good. On the other hand, if there’s some degradation of the service, and I can only access the first paragraph of the description, with an apology for the inconvenience and a link to other information, that might be acceptable. From a different point of view, if I notice that somebody is attacking my museum system, but I can’t get into it with administrative access to lock it down or remove their access, that’s definitely bad.
As with integrity protection, it’s difficult to think of examples when availability protection isn’t important, but availability isn’t necessarily a binary condition: it may vary from time to time.
Conclusion
Although they’re not perfect descriptions of all the properties you need to consider when designing in data security, confidentiality, integrity and availability should at least cause you to start thinking about what your data is for, how it should be accessed, how it should be changed, and by whom.
1 – I just know that somebody’s going to come up with a counter-example.
2 – And therefore assume that you, the reader, are interested.
3 – as a nested example, which is quite nice, as we’re talking about metadata.
4 – And far too rarely get, it seems.
5 – Not a rude phrase, even if it sounds like it should be. Look it up if you don’t believe me.