Imagine that you’re about to play a boardgame which involves using dice. I don’t know: Monopoly, Yahtzee, Cluedo, Dungeons & Dragons*. In most cases, at least where you’re interested in playing a fair game, you want to be pretty sure that there’s a random distribution of the dice roll results. In other words, for a 6-sided dice, you’d hope that, for each roll, there’s an equal chance that any of the numbers 1 through 6 will appear. This seems like a fairly simple thing to want to define, and, like many things which seem to be simple when you first look at them, mathematicians have managed to conjure an entire field of study around it, making it vastly complicated in the process****.
Let’s move to computers. As opposed to boardgames, you generally want computers to do the same thing every time you ask them to do it, assuming that give them the same inputs: you want their behaviour to be deterministic when presented with the same initial conditions. Random behaviour is generally not a good thing for computers. There are, of course, exceptions to this rule, and the first is when you want to use computers to play games, as things get very boring very quickly if there’s no variation in gameplay.
There’s another big exception: cryptography. In fact, it’s not all of cryptography: you definitely want a single plaintext to be encrypted to a single ciphertext under the same key in almost all cases. But there is one area where randomness is important: and that’s in the creation of the cryptographic key(s) you’re going to be using to perform those operations. It turns out that you need to have quite a lot of randomness available to create a key which is truly unique – and keys really need to be truly unique – and that if you don’t have enough randomness, then not only will you possible generate the same key (or set of them) repeatedly, but other people may do so as well, allowing them to guess what keys you’re using, and thereby be able do things like read your messages or pretend to be you.
Given that these are exactly the sorts of things that cryptography tries to stop, it is clearly very important that you do have lots of randomness.
Luckily, mathematicians and physicists have come to our rescue. Their word for randomness is “entropy”. In fact, what mathematicians and physicists mean when they talk about entropy is – as far as my understanding goes – to be a much deeper and complex issue than just randomness. But if we can find a good source of entropy, and convert it into something that computers can use, then we should have enough randomness to do all things that we want to do with cryptographic key generation*****. The problem in the last sentence is the “if” and the “should”.
First, we need to find a good source of entropy, and prove that it is good. The good thing about this is that there are, in fact, lots of natural sources of entropy. Airflow is often random enough around computers that temperature variances can be measured that will provide good enough entropy. Human interactions with peripherals such as mouse movements or keyboard strokes can provide more entropy. In the past, variances between network packets receive times were used, but there’s been some concern that these are actually less random than previously thought, and may be measurable by outside parties******. There are algorithms that allow us to measure quite how random entropy sources are – though they can’t make predictions about future randomness, of course.
Let’s assume, though, that we have a good source of entropy. Or let’s not: let’s assume that we’ve got several pretty good sources of entropy, and that we believe that when we combine them, they’ll be good enough as a group.
And this is what computers – and Operating Systems such – generally do. They gather data from various entropy sources, and then convert it to a stream of bits – your computer’s favourite language of 1s and 0s – that can then be used to provide random numbers. The problem arises when they don’t do it well enough.
This can occur for a variety of reasons, the main two being bad sampling and bad combination. Even if your sources of entropy are good, if you don’t sample them in an appropriate manner, then what you actually get won’t reflect the “goodness” of that entropy source: that’s a sampling problem. This is bad enough, but the combination algorithms are supposed to smooth out this sort of issue, assuming it’s not too bad and you have enough sources of entropy. However, when you have an algorithm which isn’t actually doing that, or isn’t combining even well-sampled, good sources, then you have a real issue. And algorithms, we know, are not always correctly implemented – and there have even been allegations that some government security services have managed to introduce weakened algorithms – with weaknesses that only they know about, and can exploit – into systems around the world. There have been some very high profile examples of poor implementation in both the proprietary and open source worlds, which have led to real problems in actual deployments. At least, when you have an open source implementation, you have the chance to fix it.
That problem is compounded when – as is often the case – these algorithms are embedded in hardware such as a chip on a motherboard. In this case, it’s very difficult to fix, as you generally can’t just replace all the affected chips, and may also be difficult to trace. Whether you are operating in hardware or software, however, the impact of a bad algorithm which isn’t spotted – at least by the Good Guys and Gals[tm] – for quite a while is that you may have many millions of weak keys out there, which are doing a very bad job of protecting identities or private data. Even if you manage to replace these keys, what about all of the historical encryptions which, if recorded, can now be read? What if I could forge the identity of the person who signed a transaction buying a house several years ago, to make it look like I now owned it, for instance?
Entropy, then, can be difficult to manage, and when we have a problem, the impact of that problem can be much larger than we might immediately imagine.
*I’m sure that there are trademarks associated with these games**
**I’m also aware that Dungeons & Dragons*** isn’t really a boardgame
***I used to be a Dungeon Master!
****for an example, try reading just the first paragraph of the entry for stochastic process on Wikipedia.
*****and gaming.
******another good source of entropy is gained by measuring radioactive decay, but you generally don’t want to be insisting that computers – or there human operators – require a radioactive source near enough to them to be useful.