What’s a hash function?

It should be computationally implausible to work backwards from the output hash to the input.

Note – many thanks to a couple of colleagues who provided excellent suggestions for improvements to this text, which has been updated to reflect them.

Sometimes I like to write articles about basics in security, and this is one of those times. I’m currently a little over a third of the way through writing a book on trust in computing and the the cloud, and ended up creating a section about cryptographic hashes. I thought that it might be a useful (fairly non-technical) article for readers of this blog, so I’ve edited it a little bit and present it here for your delectation, dears readers.

There is a tool in the security practitioner’s repertoire that it is helpful for everyone to understand: cryptographic hash functions. A cryptographic hash function, such as SHA-256 or MD5 (now superseded for cryptographic uses as it’s considered “broken”) takes as input a set of binary data (typically as bytes) and gives as output which is hopefully unique for each set of possible inputs. The length of the output – “the hash” – for any particularly hash function is typically the same for any pattern of inputs (for SHA-256, it is 32 bytes, or 256 bits – the clue’s in the name). It should be computationally implausible (cryptographers hate the word “impossible”) to work backwards from the output hash to the input: this is why they are sometimes referred to as “one-way hash functions”. The phrase “hopefully unique” when describing the output is extremely important: if two inputs are discovered that yield the same output, the hash is said to have “collisions”. The reason that MD5 has become deprecated is that it is now trivially possible to find collisions with commercially-available hardware and software systems. Another important property is that even a tiny change in the message (e.g. changing a single bit) should generate a large change to the output (the “avalanche effect”).

What are hash functions used for, and why is the property of being lacking in collisions so important? The simplest answer to the first question is that hash functions are typically used to ensure that when someone hands you a piece of binary data (and all data in the world of computing can be described in binary format, whether it is text, an executable, a video, an image or complete database of data), it is what you expect. Comparing binary data directly is slow and arduous computationally, but hash functions are designed to be very quick. Given two files of several Megabytes or Gigabytes of data, you can produce hashes of them ahead of time, and defer the comparisons to when you need them[1].

Indeed, given the fact that it is easy to produce hashes of data, there is often no need to have both sets of data. Let us say that you want to run a file, but before you do, you want to check that it really is the file you think you have, and that no malicious actor has tampered with it. You can hash that file very quickly and easily, and as long as you have a copy of what the hash should look like, then you can be fairly certain that you have the file you wanted. This is where the “lack of collisions” (or at least “difficulty in computing collisions”) property of hash functions is important. If the malicious actor can craft a replacement file which shares the same hash as the real file, then the process is essentially useless.

In fact, there are more technical names for the various properties, and what I’ve described above mashes three of the important ones together. More accurately, they are:

  1. pre-image resistance – this says that if you have a hash, it should be difficult to find the message from which it was created, even if you know the hash function used;
  2. second pre-image resistance – this says that if you have a message, it should be difficult to find another message which, when hashed, generates the same hash;
  3. collision resistance – this says that it should be difficult to find any two messages which generate the same hash.

Collision resistance and second pre-image resistance sound like the same property, at first glances, but are subtly (but importantly) different. Pre-image resistance says that if you already have a message, finding another with a matching hash, whereas collision resistance should make it hard for you to find any two messages which will generate the same hash, and is a much harder property to fulfil in a hash function.

Let us go back to our scenario of a malicious actor trying to exchange a file (with a hash which we can check) with another one. Now, to use cryptographic hashes “in the wild” – out there in the real world beyond the perfectly secure, bug-free implementations populated by unicorns and overflowing with fat-free doughnuts – there are some important and difficult provisos that need to be met. More paranoid readers may already have spotted some of them, in particular:

  1. you need to have assurances that the copy of the hash you have has also not been subject to tampering;
  2. you need to have assurances that the entity performing the hash performs and reports it correctly;
  3. you need to have assurances that the entity comparing the two hashes reports the result of that comparison correctly.

Ensuring that you can meet such assurances it not necessarily an easy task, and is one of the reasons that Trusted Platform Modules (TPMs) are part of many computing systems: they act as a hardware root of trust with capabilities to provide such assurances. TPMs are a useful an important tool for real-world systems, and I plan to write an article on them (similar to What’s an HSM?) in the future.


1 – It’s also generally easier to sign hashes of data, rather than large sets of data themselves – this happens to be important as one of the most common uses of hashes is for cryptographic (“digital”) signatures.

Not quantum-safe, not tamper-proof, not secure

Let’s make security “marketing-proof”. Or … maybe not.

If there’s one difference that you can use to spot someone who takes security seriously, it’s this: they don’t make absolute statements about security. I’m going to be a bit contentious here, and I’m sorry if it upsets some people who do take security seriously, but I’m of the very strong opinion that we should never, ever say that something is “completely secure”, “hack-proof” or even just “secured”. I wrote a few weeks ago about lazy journalism, but it pains me even more to see or hear people who really should know better using such absolutes. There is no “secure”, and I’d love to think that one day I can stop having to say this, but it comes up again and again.

We, as a community, need to be careful about the words and phrases that we use, because it’s difficult enough to educate the rest of the world about what we do without allowing non-practitioners to believe that we (or they) can take a system or component and make it so safe that it cannot be compromised or go wrong. There are two particular bug-bears that are getting to me at the moment – and that’s before I even start on the one which rules them all, “zero-trust”, which makes my skin crawl and my hackles rise whenever I hear it used[1] – and they are (as you may have already guessed from the title of this article):

  • quantum-proof
  • tamper-proof

I’ll start with the latter, because it’s more clear cut (and easier to explain). Some systems – typically hardware systems – are deployed in environments where bad people might mess with them. This, in the trade, is called “tampering”, and it has a slightly different usage from the normal meaning, in that it tends to imply that the damage done to a system or component was done with the intention that the damage didn’t necessarily stop its normal operation, but did alter it in such a way that the attacker could gain some advantage (often, but not always, snooping on activities being performed). This may have been the intention, but it may be that the damage did actually stop or at least effect normal operation, whether or not the attacker gained the advantage they were attempting. The problem with saying that any system is tamper-proof is that it clearly isn’t, particularly if you accept the second part of the definition, but even, possibly if you don’t. And it’s pretty much impossible to be sure, for the same reason that the adage that “any fool can create a cryptographic protocol that he/she can’t break” is true: you can’t assess the skills and abilities of all future attackers of your system. The best you can do is make it tamper-evident: put such controls in place that it should be clear if someone tries to tamper with the system[3].

“Quantum-safe” is another such phrase. It refers to cryptographic protocols or primitives which are designed to be resistant to attacks by quantum computers. The phrase “quantum-proof” is also used, and the problem with both of these terms is that, since nobody has yet completed a quantum computer of sufficient complexity even to be try, we can’t be sure. Even once they do, we probably won’t be sure, as people will probably come up with new and improved ways of using them to attack the protocols and primitives we’ve been describing. And what’s annoying is that the key to what we should be saying is actually in the description I gave: they are meant to be resistant to such attacks. “Quantum-resistant” is a much more descriptive and accurate phrase[5], so why not use it?

The simple answer to that question, and to the question of why people use phrases like “tamper-proof” and “secure” is that it makes better marketing copy. Ill-informed customers are more likely to buy something which is “safe” or which is “proof” against something, rather than evidencing it, or being resistant to it. Well, our part of our jobs as security professionals is to try to educate those customers, and make them less ill-informed[6]. Let’s make security “marketing-proof”. Or … maybe not.


1 – so much so that I’m actually writing a book at it[2].

2 – not just the concept of “zero-trust”, but about trust in general.

3 – sometimes, the tamper-evidence is actually intentionally destroying the capabilities a system so that you can be pretty sure that the attacker wasn’t able to make it do things it wasn’t supposed to[4].

4 – which is pretty cool, though it does mean that you can’t make it do the things it was supposed to either, of course.

5 – well, I’m assuming that most of such mechanisms are resistant, of course…

6 – I fully accept that “better-informed” would be better choice of phrase here.

Enarx goes multi-platform

Now with added SGX!

Yesterday, Nathaniel McCallum and I presented a session “Confidential Computing and Enarx” at Open Source Summit Europe. As well as some new information on the architectural components for an Enarx deployment, we had a new demo. What’s exciting about this demo was that it shows off attestation and encryption on Intel’s SGX. Our initial work focussed on AMD’s SEV, so this is our first working multi-platform work flow. We’re very excited, and particularly as this week a number of the team will be attending the first face to face meetings of the Confidential Computing Consortium, at which we’ll be submitting Enarx as a project for contribution to the Consortium.

The demo had been the work of several people, but I’d like to call out Lily Sturmann in particular, who got things working late at night her time, with little time to spare.

What’s particularly important about this news is that SGX has a very different approach to providing a TEE compared with the other technology on which Enarx was previously concentrating, SEV. Whereas SEV provides a VM-based model for a TEE, SGX works at the process level. Each approach has different advantages and offers different challenges, and the very different models that they espouse mean that developers wishing to target TEEs have some tricky decisions to make about which to choose: the run-time models are so different that developing for both isn’t really an option. Add to that the significant differences in attestation models, and there’s no easy way to address more than one silicon platform at a time.

Which is where Enarx comes in. Enarx will provide platform independence both for attestation and run-time, on process-based TEEs (like SGX) and VM-based TEEs (like SEV). Our work on SEV and SGX is far from done, but also we plan to support more silicon platforms as they become available. On the attestation side (which we demoed yesterday), we’ll provide software to abstract away the different approaches. On the run-time side, we’ll provide a W3C standardised WebAssembly environment to allow you to choose at deployment time what host you want to execute your application on, rather than having to choose at development time where you’ll be running your code.

This article has sounded a little like a marketing pitch, for which I apologise. As one of the founders of the project, alongside Nathaniel, I’m passionate about Enarx, and would love you, the reader, to become passionate about it, too. Please visit enarx.io for more information – we’d love to tell you more about our passion.

HSMって何?

セキュリティ強化には重要なHSM。ただ、どのプロジェクトにも当てはまるわけじゃありません。

今週も3文字略語です。(訳注:毎週分まだ訳せてません、頑張ります)

HSM(Hardware Security Module)のお話です。

HSMって何だ?何に使うんだっけ?どうして検討する必要があるの?

その話をする前に、「鍵」特に、暗号鍵について考えてみましょう。

 

最近のほとんどの暗号は、実装されているアルゴリズムは特定の簡単なもの(ブロック暗号)で公開されていますし、一般的にも受け入れられています。

アルゴリズムを知っているかとかどのように動いているかは問題ではないんです。というのは問題になるのは鍵の安全性だからです。

 

例として、AESアルゴリズムでデータを暗号化したいとします。これで特定のタイプの(対称)暗号化ができます。(この例では1つのAESタイプだけ使うこととします。実際はいくつも微妙な違いがあってここでは省きますが、ポイントは変わりません)

 

このアルゴリズムには二つのデータを与えられます:

 

  1. 暗号化したい、平文のデータ
  2. 暗号化するための鍵

 

結果としてでたデータは一つです。

 

  1. 暗号化されたデータ

 

この暗号化されたデータを復号するには、AESアルゴリズムに鍵を入れ込見ます。すると元の平文データが出力されます。

この仕組みは非常によく出来ています。鍵が盗み出さなければ、です。

 

ここでHSMが出てきます。鍵はとても大切です。以下の場合、とても攻撃を受けやすいのです:

 

鍵の作成時:もし、暗号鍵を作成した時にヒントとなるビットを埋め込めたら、そのデータは悪意を持って複合される可能性が高くなります。

 

鍵の使用時:データを暗号化したり複合化している間、鍵はメモリ上にあります。つまり、そのメモリを覗き見ることができれば、データを盗み見ることができます。(下記の「サイドチャネルアタック」参照)

 

鍵の保存時:鍵の保存時にしっかりと保護していない限り、鍵が盗まれる可能性があります。

 

鍵の転送時:鍵を使用する場所と違うところに保存している場合、そこに転送する時に盗みとられる可能性があります。

 

HSMは上記の全ての場合に役立ちます。

これが必要となる理由としては、鍵の作成、使用、保存、転送時に、システムの安全性が不確実な場合があるからです。

 

 もし鍵がメールの暗号化に使われるとして、もしそこに侵入されてしまったらとてもみっともない事態に陥ります。もし、これがあなたが持っている全てのクレジットカードのチップに関するものだったら、もっと大変なことになります。

 

もし、そのコンピュートシステムで十分な権限を持っていれば、その権限者はメモリを見て、鍵を得ることもできます。TEE(Trusted Esxecution Environment

)環境でなければ、の話ですけれどね。

 

もっとタチの悪いことに、メモリを見ることができなくても、暗号鍵(もしくは、暗号化データ、平文データ)に関する情報を引き出して、攻撃を仕掛けることができます。このタイプの攻撃は通常「サイドチャネルアタック」と呼ばれます。

 

これは車のエンジンのシリンダーやバルブと同じようなもので、ボンネットを通してエンジンに耳を澄ますのと同じようなことです。エンジン構造はそのつもりではなかったとしても、エンジン部品からエンジンについての情報を盗み見ることができる、ということです。

HSMはそのような攻撃を防ぐように作られているのです。

 

ではHSMの定義をお話ししましょう。

 

HSMとはハードウェアの一つで、ネットワークやPCIのようなものを介して、システムに付随された暗号化作業を行うことができる保護ストレージを持っています。そしてサイドアタック、物理的にこじ開けようとしたり、コンポーネントに物理ケーブルを差し込んで電気信号を読み取ろうとする、などの色々な攻撃から保護する物理防御機能を持っています。

 

数々のHSMは、色々なタイプの攻撃を耐えられることを証明するため

FIPS140などの標準化の認可を取得しようと検査を受けています。

 

以下にHSMの主な使用方法を挙げます。

 

鍵の作成

鍵の作成は上で述べたように、とても大切な作業です。ただサイドアタックが非常に効果的に行われる部分でもあります。HSMは(比較的)安全な鍵の生成をし、鍵に求められる適度なランダム性があります。

 

鍵の保管

HSMは何者かが侵入しようとした場合に保管されている鍵を破棄するようにできているので、鍵の保管には適しています。

 

暗号化処理

 

鍵をHSMという安全な場所から別のシステムに転送して危険に晒すより、暗号化前の平文をHSMに置いてしまってはどうでしょう(できれば転送する場合には転送用の鍵を使ってです)。そしてHSMにすでにある鍵で暗号化させ、暗号化したデータを送り返せば?(ここでも転送中は転送用の鍵を使います)こうすることで転送中と使用中の攻撃の機会を減らします。これがHSMの鍵の使い方です。

 

通常のコンピューティング処理

 

全てのHSMがこの使い方をサポートするわけではなく(他のほとんどの方法はサポートされますが)、鍵とアルゴリズムたくさん使って機密作業をするのであれば、アプリケーションをHSMで動くように書くことができます。

これは例えばAIやMLのような、前に書いたような古いやり方とは違って、非常に機密性のある場合です。

 

簡単に保証できるものではありませんが、実行環境は往往にして非常に制限があります。「正しい」ことをするのは難しく、間違いを犯すのは簡単です。すると思っていたよりも大変安全性の低いことになります。

 

結論 HSMを使うべき?

 

HSMはPKI(Public Key Infrastructure)プロジェクトなどにはルートオブトラスト(信頼性の基点)としてとてもいいものです。

 

使うのは難しいでしょうが、PKCS#11インターフェース(Public Key Cryptography Standard )を提供しているはずなので、共通化した作業は簡易化されています。機密鍵や暗号化の要件がある場合、HSMをシステムで使うのは賢明な選択ですが、どうやって静的化して使うのはアーキテクチャと設計の段階で必要で、構築の十分前段階でする必要があります。

 

日々のプロビジョニングからプロビジョニングの解除の時まで、HSMの作業は非常に注意して行う必要があることを十分に考慮してください。HSMの使用はとても意味があることですがとても高価で拡張性は多くの場合あまりありません。

 

HSMはとても機密性の高いデータとその作業を行うというユースケースには特に最適ですが、軍用や政府、ファイナンスに使われることが多いのです。

HSMは全てのプロジェクトに合うものではないのですが、機密システムの設計と運用の武装化に大切なものなのです。

 

元の記事:https://aliceevebob.com/2019/06/11/whats-an-hsm/

2019年6月11日 Mike Bursell

 

タグ:セキュリティ

What’s an HSM?

HSMs are not right for every project, but form an important part of our armoury.

HSMって何?

Another week, another TLA[1].  This time round, it’s Hardware Security Module: an HSM.  What, then, is an HSM, what is it used for, and why should I care?  Before we go there, let’s think a bit about keys: specifically, cryptographic keys.

The way that most cryptography works these days is that the algorithms to implement a particular primitive[3] are public, and it’s generally accepted that it doesn’t matter whether you know what the algorithm is, or how it works, as it’s the security of the keys that matters.  To give an example: I plan to encrypt a piece of data under the AES algorithm[4], which allows for a particular type of (symmetric) encryption.  There are two pieces of data which are fed into the algorithm:

  1. the data you want to encrypt (the cleartext);
  2. a key that you’ve chosen to encrypt it.

Out comes one piece of data:

  1. the encrypted text (the ciphertext).

In order to decrypt the ciphertext, you feed that and the key into the AES algorithm, and the original cleartext comes out.  Everything’s great – until somebody gets hold of the key.

This is where HSMs come in.  Keys are vital, and they are vulnerable:

  • at creation time – if I can trick you into creating a key some of whose bits I can guess, I increase my chances of being able to decrypt your ciphertext;
  • during use – while you’re doing the encryption or decryption of your data, your key will be in memory, which means that if I can snoop into that memory, I can get it (see also below for information on “side channel attacks”;
  • while stored – unless you protect your key while it’s “at rest”, and waiting to be used, I may have opportunities to get it.
  • while being transferred – if you store your keys somewhere different to the place in which you’re using it, I may have an opportunity to intercept it as it moves to the place it will be used.

HSMs can help in one way or another with all of these pieces, but why do we need them?  The key reason is that there are times when you can’t be certain that the system(s) you are using for creating, using, storing and transferring keys are as secure as you’d like.  If the keys we’re talking about are for encrypting a few emails between you and your spouse, well, you might find it embarrassing if they were compromised, but if these keys are ones from which, say, you derive all of the credit cards chip keys for an entire bank, then you have a rather larger problem.  When it comes down to it, somebody with sufficient privilege on a standard computing system can look at any part of memory – unless there’s a TEE[5] (Oh, how I love my TEE (or do I?)) – and if they can look at the memory, they can see the key.

Worse than this, there are occasions when even if you can’t see into memory, you might be able to derive enough information about a key – or the ciphertext or cleartext – to be able to mount an attack on it.  Attacks of this type are generally called “side channel attacks”, and you can think of them as a little akin to being able to work out the number of cylinders and valves a car[6] engine has by listening to it through the bonnet[7].  The engine leaks information about itself, even though it’s not designed with that in mind.  HSMs are (generally) good at preventing both types of attacks: it’s what they’re designed to do.

Here, then, is a definition:

An HSM is piece of hardware with protected storage which can perform cryptographic operations attached to a system – via a network connection or other connection such as PCI – and which has physical protection from various attacks, from side attacks to somebody physically levering open the case and attaching wires to important components so that they can read the electrical signals.

Many HSMs undergo testing to get certification against certain standards such as “FIPS 140” to show their ability to withstand various types of attack.

Here are the main uses for HSMs.

Key creation

Creation of keys is, as alluded to above, a very important operation, and one where side attacks have proved very effective in the past.  HSMs can provide safe(r) key generation, and ensure appropriate levels of randomness (entropy) for the required strength of key.

Key storage

HSMs are typically designed so that if somebody tries to break into them, they will delete any keys which are stored within them, so they’re a good place to store your keys.

Cryptographic operations

Rather than putting your keys at risk by transferring them to another system, and away from the safety of the HSM, why not move the cleartext to the HSM (encrypted under a transport key, preferably), get the HSM to do the encryption with the keys that it already holds, and then send the ciphertext back (encrypted under a transport key[8])?  This reduces opportunities for attacks during transport and during use, and is a key use for HSMs.

General computing operations

Not all HSMs support this use (almost all will support the others), but if you have sensitive operations with lots of keys and algorithms – which, in the case of AI/ML, for instance, may be sensitive (unlike the cryptographic primitives we were talking about before), then it is possible to write applications specifically to run on an HSM.  This is not a simple undertaking, however, as the execution environment provided is likely to be constrained.  It is difficult to do “right”, and easy to make mistakes which may leave you with a significantly less secure environment than you had thought.

Conclusion – should I use HSMs?

HSMs are excellent as roots of trust for PKI [9] projects and similar.  Using them can be difficult, but most these days should provide a PKCS#11 interface which simplifies the most common operations.  If you have sensitive key or cryptographic requirements, designing HSM use into your system can be a sensible step, but knowing how best to use them must be part of the architecture and design stages, well before implementation.  You should also take into account that operation of HSMs must be managed very carefully, from provisioning through everyday use to de-provisioning.  Use of an HSM in the cloud may make sense, but they are expensive and do not scale particularly well.

HSMs, then, are suited to very particular use cases of highly sensitive data and operations – it is no surprise that their deployment is most common within military, government and financial settings. HSMs are not right for every project, by any means, but form an important part of our armoury for the design and operation of sensitive systems.


1 – Three Letter Acronym[2]

2 – keep up, or we’ll be here for some time.

3 – cryptographic building block.

4 – let’s pretend there’s only one type of AES for the purposes of this example.  In fact, there are a number of nuances around this example which I’m going to gloss over, but which shouldn’t be important for the point I’m making.

5 – Trusted Execution Environment.

6 – automobile, for our North American friends.

7 – hood.  Really, do we have to do this every time?

8 – why do you need to encrypt something that’s already encrypted?  Because you shouldn’t use the same key for two different operations.

9 – Public Key Infrastructure.

Cryptographers arise!

Cryptography is a strange field, in that it’s both concerned with keeping secrets, but also has a long history of being kept secret, as well.  There are famous names from the early days, from Caesar (Julius, that is) to Vigenère, to more recent names like Diffie, Hellman[1], Rivest, Shamir and Adleman.  The trend even more recently has been away from naming cryptographic protocols after their creators, and more to snappy names like Blowfish or less snappy descriptions such as “ECC”.  Although I’m not generally a fan of glorifying individual talent over collective work, this feels like a bit of a pity in some ways.

In fact, over the past 80 years or so, more effort has been probably put into keeping the work of teams in cryptanalysis – the study of breaking cryptography – secret, though there are some famous names from the past like Al-Kindi, Phelippes (or “Phillips), Rejewski, Turing, Tiltman, Knox and Briggs[2].

Cryptography is difficult.  Actually, let me rephrase that: cryptography is easy to do badly, and difficult to do well.  “Anybody can design a cipher that they can’t break”, goes an old dictum, with the second half of the sentence, “and somebody else can easily break”, being generally left unsaid.  Creation of cryptographic primitives requires significant of knowledge of mathematics – some branches of which are well within the grasp of an average high-school student, and some of which are considerably more arcane.  Putting those primitives together in ways that allow you to create interesting protocols for use in the real world doesn’t necessarily require that you understand the full depth of the mathematics of the primitives that you’re using[3], but does require a good grounding in how they should be used, and how they should not be used.  Even then, a wise protocol designer, like a wise cryptographer[4], always gets colleagues and others to review his or her work.  This is one of the reasons that it’s so important that cryptography should be in the public domain, and preferably fully open source.

Why am I writing about this?  Well, partly because I think that, on the whole, the work of cryptographers is undervalued.  The work they do is not only very tricky, but also vital.  We need cryptographers and cryptanalysts to be working in the public realm, designing new algorithms and breaking old (and, I suppose) new ones.  We should be recognising and celebrating their work.  Mathematics is not standing still, and, as I wrote recently, quantum computing is threatening to chip away at our privacy and secrecy.  The other reasons that I’m writing about this is because I think we should be proud of our history and heritage, inspired to work on important problems, and to inspire those around us to work on them, too.

Oh, and if you’re interested in the t-shirt, drop me a line or put something in the comments.


1 – I’m good at spelling, really I am, but I need to check the number of ells and ens in his name every single time.

2 – I know that is heavily Bletchley-centric: it’s an area of history in which I’m particularly interested.  Bletchley was also an important training ground for some very important women in security – something of which we have maybe lost sight.

3 – good thing, too, as I’m not a mathematician, but I have designed the odd protocol here and there.

4 – that is, any cryptographer who recognises the truth of the dictum I quote above.

Entropy

… algorithms, we know, are not always correctly implemented …

Imagine that you’re about to play a boardgame which involves using dice.  I don’t know: Monopoly, Yahtzee, Cluedo, Dungeons & Dragons*.  In most cases, at least where you’re interested in playing a fair game, you want to be pretty sure that there’s a random distribution of the dice roll results.  In other words, for a 6-sided dice, you’d hope that, for each roll, there’s an equal chance that any of the numbers 1 through 6 will appear.  This seems like a fairly simple thing to want to define, and, like many things which seem to be simple when you first look at them, mathematicians have managed to conjure an entire field of study around it, making it vastly complicated in the process****.

Let’s move to computers.  As opposed to boardgames, you generally want computers to do the same thing every time you ask them to do it, assuming that give them the same inputs: you want their behaviour to be deterministic when presented with the same initial conditions.  Random behaviour is generally not a good thing for computers.  There are, of course, exceptions to this rule, and the first is when you want to use computers to play games, as things get very boring very quickly if there’s no variation in gameplay.

There’s another big exception: cryptography.  In fact, it’s not all of cryptography: you definitely want a single plaintext to be encrypted to a single ciphertext under the same key in almost all cases.  But there is one area where randomness is important: and that’s in the creation of the cryptographic key(s) you’re going to be using to perform those operations.  It turns out that you need to have quite a lot of randomness available to create a key which is truly unique – and keys really need to be truly unique – and that if you don’t have enough randomness, then not only will you possible generate the same key (or set of them) repeatedly, but other people may do so as well, allowing them to guess what keys you’re using, and thereby be able do things like read your messages or pretend to be you.

Given that these are exactly the sorts of things that cryptography tries to stop, it is clearly very important that you do have lots of randomness.

Luckily, mathematicians and physicists have come to our rescue.  Their word for randomness is “entropy”.  In fact, what mathematicians and physicists mean when they talk about entropy is – as far as my understanding goes – to be a much deeper and complex issue than just randomness.  But if we can find a good source of entropy, and convert it into something that computers can use, then we should have enough randomness to do all things that we want to do with cryptographic key generation*****.  The problem in the last sentence is the “if” and the “should”.

First, we need to find a good source of entropy, and prove that it is good.  The good thing about this is that there are, in fact, lots of natural sources of entropy.  Airflow is often random enough around computers that temperature variances can be measured that will provide good enough entropy.  Human interactions with peripherals such as mouse movements or keyboard strokes can provide more entropy.  In the past, variances between network packets receive times were used, but there’s been some concern that these are actually less random than previously thought, and may be measurable by outside parties******.  There are algorithms that allow us to measure quite how random entropy sources are – though they can’t make predictions about future randomness, of course.

Let’s assume, though, that we have a good source of entropy.  Or let’s not: let’s assume that we’ve got several pretty good sources of entropy, and that we believe that when we combine them, they’ll be good enough as a group.

And this is what computers – and Operating Systems such –  generally do.  They gather data from various entropy sources, and then convert it to a stream of bits – your computer’s favourite language of 1s and 0s – that can then be used to provide random numbers. The problem arises when they don’t do it well enough.

This can occur for a variety of reasons, the main two being bad sampling and bad combination.  Even if your sources of entropy are good, if you don’t sample them in an appropriate manner, then what you actually get won’t reflect the “goodness” of that entropy source: that’s a sampling problem.  This is bad enough, but the combination algorithms are supposed to smooth out this sort of issue, assuming it’s not too bad and you have enough sources of entropy.  However, when you have an algorithm which isn’t actually doing that, or isn’t combining even well-sampled, good sources, then you have a real issue.  And algorithms, we know, are not always correctly implemented – and there have even been allegations that some government security services have managed to introduce weakened algorithms – with weaknesses that only they know about, and can exploit – into systems around the world.  There have been some very high profile examples of poor implementation in both the proprietary and open source worlds, which have led to real problems in actual deployments.  At least, when you have an open source implementation, you have the chance to fix it.

That problem is compounded when – as is often the case – these algorithms are embedded in hardware such as a chip on a motherboard.   In this case, it’s very difficult to fix, as you generally can’t just replace all the affected chips, and may also be difficult to trace.  Whether you are operating in hardware or software, however, the impact of a bad algorithm which isn’t spotted – at least by the Good Guys and Gals[tm] – for quite a while is that you may have many millions of weak keys out there, which are doing a very bad job of protecting identities or private data.   Even if you manage to replace these keys, what about all of the historical encryptions which, if recorded, can now be read?  What if I could forge the identity of the person who signed a transaction buying a house several years ago, to make it look like I now owned it, for instance?

Entropy, then, can be difficult to manage, and when we have a problem, the impact of that problem can be much larger than we might immediately imagine.


*I’m sure that there are trademarks associated with these games**

**I’m also aware that Dungeons & Dragons*** isn’t really a boardgame

***I used to be a Dungeon Master!

****for an example, try reading just the first paragraph of the entry for  stochastic process on Wikipedia.

*****and gaming.

******another good source of entropy is gained by measuring radioactive decay, but you generally don’t want to be insisting that computers – or there human operators – require a radioactive source near enough to them to be useful.

Stop reading, start patching

… in order to correct this problem, BOTH the client AND the router must be patched

This is an emergency post: normal* service will resume next week**.

So, over the past 48 hours or so, news of the KRACK vulnerability for Wifi has started spreading.  This vulnerability makes is pretty trivially easy to snoop on information sent between a device (mobile phone, laptop, etc.) and a wifi router, in some cases allowing changes to that information.  This is not a bug in code, but a mis-design in the crypto algorithm that’s used by the vast majority of Wifi connections: WPA2.

Some key facts:

  • WPA2 personal and WPA2 enterprise are vulnerable
  • the vulnerability is in the design of the code, not the implementation
    • however, Linux and Android 6.0+ implementations (which use wpa_supplicant 2.4 or higher, are even more easily attacked)
  • in order to correct this problem, BOTH the client AND the router must be patched.
    • this means that it’s not good enough just to update your laptop, but also the router in your house, business, etc.
  • Android phones typically take a long time to get patches (if at all)
    • unless you have evidence to the contrary, assume that your phone is vulnerable
  • many hotels, businesses, etc., rarely update or patch their routers
    • assume that any wifi connection that you use from now on is vulnerable unless you know that it’s been patched
  • you can continue to rely on VPNs
  • you can continue to rely on website encryption***
    • but remember that you may be betraying lots of metadata, including, of course, the address of the website that you’re visiting, to any snoopers
  • the security of IoT devices is going to continue to be a problem
    • unless their firmware can easily be patched, it’s difficult to believe that they will be safe

For my money, it’s worth investing in that VPN solution you were always promising yourself, and starting to accept the latency hit that it may cost you.

For more information in an easily readable form, I suggest heading over to The Register, which is pretty much always a good place to start.

For information in the initial publisher of this attack, visit https://www.krackattacks.com/


*well, as normal as it ever was

**hopefully

***as much as you ever could before…

That Backdoor Fallacy revisited – delving a bit deeper

…if it breaks just once that becomes always,..

A few weeks ago, I wrote a post called The Backdoor Fallacy: explaining it slowly for governments.  I wish that it hadn’t been so popular.  Not that I don’t like the page views – I do – but because it seems that it was very timely, and this issue isn’t going away.  The German government is making the same sort of noises that the British government* was making when I wrote that post**.  In other words, they’re talking about forcing backdoors in encryption.  There was also an amusing/worrying story from slashdot which alleges that “US intelligence agencies” attempted to bribe the developers of Telegram to weaken the encryption in their app.

Given some of the recent press on this, and some conversations I’ve had with colleagues, I thought it was worth delving a little deeper***.  There seem to be three sets of use cases that it’s worth addressing, and I’m going to call them TSPs, CSPs and Other.  I’d also like to make it clear here that I’m talking about “above the board” access to encrypted messages: access that has been condoned by the relevant local legal system.  Not, in other words, the case of the “spooks”.  What they get up to is for another blog post entirely****.  So, let’s look at our three cases.

TSPs – telecommunications service providers

In order to get permission to run a telecommunications service(wired or wireless) in most (all?) jurisdictions, you need to get approval from the local regulator: a licence.  This licence is likely to include lots of requirements: a typical one is that you, the telco (telecoms company) must provide access at all times to emergency numbers (999, 911, 112, etc.).  And another is likely to be that, when local law enforcement come knocking with a legal warrant, you must give them access to data and call information so that they can basically do wire-taps.  There are well-established ways to do this, and fairly standard legal frameworks within which it happens: basically, if a call or data stream is happening on a telco’s network, they must provide access to it to legal authorities.  I don’t see an enormous change to this provision in what we’re talking about.

CSPs – cloud service providers

Things get a little more tricky where cloud service providers are concerned.  Now, I’m being rather broad with my definition, and I’m going to lump your Amazons, Googles, Rackspaces and such in with folks like Facebook, Microsoft and other providers who could be said to be providing “OTT” (Over-The-Top – in that they provide services over the top of infrastructure that they don’t own) services.  Here things are a little greyer*****.  As many of these companies (some of who are telcos, how also have a business operating cloud services, just to muddy the waters further) are running messaging, email services and the like, governments are very keen to apply similar rules to them as those regulating the telcos. The CSPs aren’t keen, and the legal issues around jurisdiction, geography and what the services are complicate matter.  And companies have a duty to their shareholders, many of whom are of the opinion that keeping data private from government view is to be encouraged.  I’m not sure how this is going to pan out, to be honest, but I watch it with interest.  It’s a legal battle that these folks need to fight, and I think it’s generally more about cryptographic key management – who controls the keys to decrypt customer information – than about backdoors in protocols or applications.

Other

And so we come to other.  This bucket includes everything else.  And sadly, our friends the governments want their hands on all of that everything else.    Here’s a little list of some of that everything else.  Just a subset.  See if you can see anything on the list that you don’t think there should be unfettered access to (and remember my previous post about how once access is granted, it’s basically game over, as I don’t believe that backdoors end up staying secret only to “approved” parties…):

  • the messages you send via apps on your phone, or tablet, or laptop or PC;
  • what you buy on Amazon;
  • your banking records – whether on your phone or at the bank;
  • your emails via your company VPN;
  • the stored texts on your phone when you enquired about the woman’s shelter
  • your emails to your doctor;
  • your health records – whether stored at your insurers, your hospital or your doctor’s surgery;
  • your browser records about emergency contraception services;
  • access to your video doorbell;
  • access to your home wifi network;
  • your neighbour’s child’s chat message to the ChildLine (a charity for abused children in the UK – similar exist elsewhere)
  • the woman’s shelter’s records;
  • the rape crisis charity’s records;
  • your mortgage details.

This is a short list.  I’ve chosen emotive issues, of course I have, but they’re all legal.  They don’t even include issues like extra-marital affairs or access to legal pornography or organising dissent against oppressive regimes, all of which might well edge into any list that many people might copmile.  But remember – if a backdoor is put into encryption, or applications, then these sorts of information will start leaking.  And they will leak to people you don’t want to have them.

Our lives revolve around the Internet and the services that run on top of it.  We have expectations of privacy.  Governments have an expectation that they can breach that privacy when occasion demands.  And I don’t dispute that such an expectation is valid.  The problem that this is not the way to do it, because of that phrase “when occasion demands”.  If the occasion breaks just once, then that becomes always, and not just to “friendly” governments.  To unfriendly governments, to criminals, to abusive partners and abusive adults and bad, bad people.  This is not a fight for us to lose.


*I’m giving the UK the benefit of the doubt here: as I write, it’s unclear whether we really have a government, and if we do, for how long it’ll last, but let’s just with it for now.

**to be fair, we did have a government then.

***and not just because I like the word “delving”.  Del-ving.  Lovely.

****one which I probably won’t be writing if I know what’s good for me.

*****I’m a Brit, so I use British spelling: get over it.

Disbelieving the many eyes hypothesis

There is a view that because Open Source Software is subject to review by many eyes, all the bugs will be ironed out of it. This is a myth.

大勢がレビューしていても信用しないという仮説

Writing code is hard.  Writing secure code is harder: much harder.  And before you get there, you need to think about design and architecture.  When you’re writing code to implement security functionality, it’s often based on architectures and designs which have been pored over and examined in detail.  They may even reflect standards which have gone through worldwide review processes and are generally considered perfect and unbreakable*.

However good those designs and architectures are, though, there’s something about putting things into actual software that’s, well, special.  With the exception of software proven to be mathematically correct**, being able to write software which accurately implements the functionality you’re trying to realise is somewhere between a science and an art.  This is no surprise to anyone who’s actually written any software, tried to debug software or divine software’s correctness by stepping through it.  It’s not the key point of this post either, however.

Nobody*** actually believes that the software that comes out of this process is going to be perfect, but everybody agrees that software should be made as close to perfect and bug-free as possible.  It is for this reason that code review is a core principle of software development.  And luckily – in my view, at least – much of the code that we use these days in our day-to-day lives is Open Source, which means that anybody can look at it, and it’s available for tens or hundreds of thousands of eyes to review.

And herein lies the problem.  There is a view that because Open Source Software is subject to review by many eyes, all the bugs will be ironed out of it.  This is a myth.  A dangerous myth.  The problems with this view are at least twofold.  The first is the “if you build it, they will come” fallacy.  I remember when there was a list of all the websites in the world, and if you added your website to that list, people would visit it****.  In the same way, the number of Open Source projects was (maybe) once so small that there was a good chance that people might look at and review your code.  Those days are past – long past.  Second, for many areas of security functionality – crypto primitives implementation is a good example – the number of suitably qualified eyes is low.

Don’t think that I am in any way suggesting that the problem is any lesser in proprietary code: quite the opposite.  Not only are the designs and architectures in proprietary software often hidden from review, but you have fewer eyes available to look at the code, and the dangers of hierarchical pressure and groupthink are dramatically increased.  “Proprietary code is more secure” is less myth, more fake news.  I completely understand why companies like to keep their security software secret – and I’m afraid that the “it’s to protect our intellectual property” line is too often a platitude they tell themselves, when really, it’s just unsafe to release it.  So for me, it’s Open Source all the way when we’re looking at security software.

So, what can we do?  Well, companies and other organisations that care about security functionality can – and have, I believe a responsibility to – expend resources on checking and reviewing the code that implements that functionality.  That is part of what Red Hat, the organisation for whom I work, is committed to doing.  Alongside that, we, the Open Source community, can – and are – finding ways to support critical projects and improve the amount of review that goes into that code*****.  And we should encourage academic organisations to train students in the black art of security software writing and review, not to mention highlighting the importance of Open Source Software.

We can do better – and we are doing better.  Because what we need to realise is that the reason the “many eyes hypothesis” is a myth is not that many eyes won’t improve code – they will – but that we don’t have enough expert eyes looking.  Yet.


* Yeah, really: “perfect and unbreakable”.  Let’s just pretend that’s true for the purposes of this discussion.

** …and which still relies on the design and architecture actually to do what you want – or think you want – of course, so good luck.

*** nobody who’s actually written more than about 5 lines of code (or more than 6 characters of Perl)

**** I added one.  They came.  It was like some sort of magic.

***** see, for instance, the Linux Foundation‘s Core Infrastructure Initiative