Thunderspy – should I care?

Thunderspy is a nasty attack, but easily prevented.

There’s a new attack out there which is getting quite a lot of attention this week. It’s called Thunderspy, and it uses the Thunderbolt port found on many modern laptops and other computers to suck data from your machine. I thought it would be a good issue to cover this week because, although it’s a nasty attack, there are easy ways to defend yourself, some of which I’ve already covered in previous articles, since they’re generally good security practice to follow anyway.

What is Thunderspy?

Thunderspy is an attack on your computer which allows an attacker with moderate resources to get at your data under certain circumstances. The attacker needs:

  • physical access to your machine – not for long (maybe five minutes), but they do need it. This type of attack is sometimes called an “evil maid” attack, as it can be carried out by hotel staff with access to your room;
  • the ability to take your computer apart (a bit) – all we’re talking here is a screwdriver;
  • a little bit of hardware – around $400 worth, according to one source;
  • access to some freely available software;
  • access to another computer at the same time.

There’s one more thing that the attacker needs, and that’s for you to leave your computer on, or in suspend mode. I’ve discussed different power modes before (in 3 laptop power mode options), and mentioned, as well, that leaving your machine in suspend mode is generally a bad idea (in 7 security tips for travelling with your laptop). It turns out I was right.

What’s the bad news?

Well, there’s quite a lot of bad news:

  • lots of machines have Thunderbolt ports (you can find pictures of both the port and connectors on Wikipedia’s Thunderbolt page, in case you’re not sure whether your machine is affected);
  • machines are vulnerable even if you have full disk encryption;
  • Windows machines are vulnerable;
  • Linux machines are vulnerable;
  • Macintosh machines are vulnerable;
  • most machines with a Thunderbolt port from 2011 onwards are vulnerable;
  • although protection is available on some newer machines (from around 2019)
    • the extent of its efficacy is unclear;
    • lots of manufacturers don’t implement it;
  • some protections that you can turn on break USB and other functionality;
  • one variant of the attack breaks Thunderbolt security permanently, meaning that the attacker won’t need to take your computer apart at all for subsequent attacks: they just need physical access to the port whilst your machine is turned on (or in suspend mode).

The worst thing to note is that full disk encryption does not help you if your computer is turned on or in suspend mode.

Note – I’ve been unable to find out whether any Chromebooks have Thunderbolt support. Please check your model’s specifications or datasheet to be certain.

What’s the good news?

The good news is short and sweet: if you turn your computer completely off, or ensure that it’s in Hibernate mode, then it’s not vulnerable. Thunderspy is a nasty attack, but it’s easily prevented.

What should I do?

  1. Turn your computer off when you leave it unattended, even for short amounts of time.

That was easy, wasn’t it? This is best practice anyway, and it turns out that hibernate mode is also OK. What the attacker is looking for is a powered-up, logged-on computer with Thunderbolt. If you can stop them finding a computer that meets those criteria, then you’re fine.
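
If you’re on Linux and not sure whether your machine even has Thunderbolt, a quick check is possible. The sketch below assumes the kernel’s thunderbolt driver is loaded and that it exposes domains under /sys/bus/thunderbolt/devices with a “security” attribute, as described in the kernel admin guide; if those paths aren’t present, or you’re on another operating system, check your model’s specifications instead. Whatever it reports, the advice above still stands: power off or hibernate when the machine is out of your sight.

#!/usr/bin/env python3
"""Rough check for Thunderbolt on Linux: is it present, and at what security level?

A sketch only: it proves nothing about whether your firmware has been
patched against Thunderspy, it just tells you whether the port is there.
"""
from pathlib import Path

SYSFS = Path("/sys/bus/thunderbolt/devices")

def main() -> None:
    if not SYSFS.exists():
        print("No Thunderbolt domains found: either no Thunderbolt hardware, "
              "or the driver isn't loaded.")
        return
    for entry in sorted(SYSFS.iterdir()):
        security = entry / "security"       # present on domain entries
        if security.exists():
            print(f"{entry.name}: security level = {security.read_text().strip()}")

if __name__ == "__main__":
    main()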

3 open/closed Covid-19 contact tracing questions

All projects are not created equal.

One of the cheering things about the pandemic crisis in which we find ourselves is the vast up-swell of volunteering that we are seeing across the world. We are seeing this equally across the IT sector, and one of the areas where work is being done is in apps to help track Covid-19. Specifically, there is an interest in Covid-19 contact tracing, or tracking, apps for our mobile[0] phones. These aren’t apps which keep an eye on whether you’ve observed lock-down procedures, but apps which attempt to work out who has been in contact with whom and, from that, once we know that one person is infected with Covid-19, what the likely spread of the virus will be.

There are lots of contact tracing initiatives out there, from the European Union’s PEPP-PT to Singapore’s TraceTogether, from the University of Washington’s PACT to MIT’s PACT[1]. Google and Apple are – unprecedentedly – working on an app together. There are lots of ways of comparing these apps and projects, but in today’s article, I want to suggest three measures which can help you consider them from the point of view of “openness”. As regular readers of this blog will know, I’m a big fan of open source – not just for software, but for data, management and the rest – and I believe that there’s also a strong correlation here with civil or human rights. These three measures are not too technical, and can help us get a grip on the likelihood that some of the apps (and associated projects) may impinge on privacy and other issues about which we care. I don’t want the data generated from apps that I download onto my phone to be used, now or in the future, to curtail my or other people’s civil or human rights, for blackmail, or even for unapproved commercial gain.

1. Open source

Our first question must be: “is the app open source?” If the answer is “no”, then we have no way to know what is being captured, and therefore how it is being used. If the app is closed source, it could be collecting any data from pretty much any measuring device on our phones, including photo, video, audio, Bluetooth, wifi, temperature, GPS or accelerometer. We can try restricting access to these measurements, but such controls have not always been effective, understanding the impact of turning them off is rarely simple, and people frankly rarely bother to check them anyway. Equally bad is the fact that with closed source, you can’t have any idea of how good the security is, nor any chance to criticise and improve it. This is something about which I’ve written many times, including in my articles Disbelieving the many eyes hypothesis and Trust & choosing open source. Luckily, it seems that the majority of contact tracing apps are open source, but please be careful, and reject any which are not.

2. Centralised or distributed

In order to make sense of all the data that these apps collect, there needs to be a centralised[2] store where it can be processed, right? It’s common sense.

Actually, no. Although managing and processing data in one place can be much easier, there are ways to store data in a distributed manner, and allow the sorts of processing needed for contact tracing to take place. It may be more complex, but it also makes it much, much more difficult for governments, corporations or malicious actors to misuse this information. And we should be clear that this will be what happens if the data is made available. Maybe the best governments and the best corporations will be well-behaved by their standards, but a) those are not necessarily the standards that I or others will endorse and b) what about malicious actors and governments and corporations which are not “the best”?

3. Location or proximity tracking

This might seem like another obvious choice: if you want to find out who was in contact with whom, then the way to do it is to see who was where, and when. GPS tracking – and associated technologies like wifi access point location tracking – combined with easily available time data, would give the ability to work out who was in a particular place at the same time as other people. This is true, but it also provides enormous opportunities for misuse, particularly when the data is held centrally (see above). An alternative is to use sensors like Bluetooth or NFC[3], to allow phones to collect information about other phones (or devices) with which they have been in contact and when. This is more easily anonymised – or pseudonymised – allowing information to be passed to the owners of those phones, while at the same time being more difficult for governments, corporations and malicious actors to misuse.
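
To make the proximity approach concrete, here is a deliberately simplified sketch of how rotating pseudonymous identifiers might work. It is not the protocol of PEPP-PT, TraceTogether or either PACT – the key sizes, intervals and upload format below are invented for illustration – but it shows the key property: phones only ever exchange short-lived pseudonyms, and matching happens on the device rather than in a central location log.

import hmac, hashlib, secrets

# Simplified illustration of proximity tracing with rotating pseudonyms.
# Not any real app's protocol: parameters are invented for the sketch.

def new_daily_key() -> bytes:
    # Each phone generates a fresh random key per day and keeps it locally.
    return secrets.token_bytes(16)

def ephemeral_id(daily_key: bytes, interval: int) -> bytes:
    # A short-lived identifier for each (say) 15-minute interval; this is
    # what gets broadcast over Bluetooth, never the key itself.
    msg = f"interval-{interval}".encode()
    return hmac.new(daily_key, msg, hashlib.sha256).digest()[:16]

# Alice's phone broadcasts ephemeral IDs; Bob's phone records what it hears.
alice_key = new_daily_key()
heard_by_bob = {ephemeral_id(alice_key, i) for i in (10, 11, 12)}

# If Alice later tests positive, she publishes only her daily key. Bob's
# phone re-derives that day's ephemeral IDs locally and checks for matches:
# there is no central register of who was where, or with whom.
was_exposed = any(ephemeral_id(alice_key, i) in heard_by_bob
                  for i in range(96))       # 96 x 15 minutes = 24 hours
print("Bob was near a reported case:", was_exposed)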

There are other issues to consider, one of which is that these sensors were not designed for this type of use, and we may be sacrificing accuracy if we choose this option. On the other hand, many interactions between people occur indoors, where GPS is much less effective anyway, and these types of technologies may help.

You could argue that this measurement is not about “openness” in itself, but it is a key indicator of whether the information collected can be used in ways which are far from open.

Conclusion

There are many other questions we can ask about Covid-19 contact tracing apps, some of which are related to openness, and some of which are not. These include:

  • Coverage
    • not all demographics have – or use – phones as much as the rest of the population, including the poor, the elderly, and certain religious groups. How effective will such projects be if they have reduced access to these groups?
    • older devices may have less accurate sensors, or not have some of the capabilities required by the apps. What is more, there may be a correlation between use of these older devices and some of the demographics noted above.
    • some people rarely update the apps on their phones, so even if they load an initial version of an app, newer versions, with functionality or security improvements, are likely to be unequally distributed across the set of devices.
  • Removal – how easy will it be to remove the application fully, what are the consequences of not doing so, and how likely are people to do so anyway[4]?
  • Will use of these apps be mandatory or voluntary? If the former, there are serious concerns about civil or human rights, not to mention the problems noted above about coverage.

All of these questions are important, but not directly related to the question of the “openness” of the apps and projects. However, we have, right now, some great opportunities to work with and influence some really important projects for public health and well-being, and I believe that it is important that we consider the questions I’ve raised about openness before endorsing, installing or using any of the apps that are being created.


0 – or “cell”, if you’re in North America.

1 – yes, they chose the same acronym. Yes, it is confusing.

2 – or, I suppose, “centralized”, depending on your geography.

3 – “Near Field Communication” – the same capability used when you do contactless payment with your phone or credit/debit card.

4 – how many apps do you still have on your phone that you’ve not even opened for 3 months? Yup, me too.

Post-Covid, post-open?

We are inventive, we are used to turning technologies to good.

The world of lockdown to which we’re becoming habituated at the moment has produced some amazing upsides. The number of people volunteering, the resurgence of local community initiatives, the selfless dedication of key workers across the world and the recognition of their sacrifice by the general public are among the most visible. As many regular readers of this blog are likely to be aware, there has also been an outpouring of interest and engagement in software- and hardware-related projects to help, from infection-tracking apps to 3D-printing of PPE[0]. Companies have made training and educational materials available for free, and there are attempts around the world to engage and contribute to the public commonwealth.

Sadly, not all of the news is good. There has been a rise in phishing attacks, and the lack of appropriate or sufficient security in commonly-used apps such as Zoom has become frighteningly evident[1]. There’s an article to write here about the balance between security, usability and cost, but I’m going to save that for another day.

Somewhere in the middle, between the obvious positives and obvious negatives, there are some developments which most of us probably accept as necessary, but which aren’t things that we’d normally welcome. Beyond the obvious restrictions on movement and public gatherings, there are a number of actions which governments, in particular, are taking which have generally negative impacts on human rights and civil liberties, as outlined in this piece by The Guardian. The article lists numerous examples of governments imposing, or considering the imposition of, measures which would normally be quickly attacked by human rights groups, and resisted by most citizens. Despite the headline, which suggests that the article will deal with how difficult these measures will be to remove after the end of the crisis, there is actually little discussion, beyond a note that “[w]hether that surveillance is eventually rolled back will depend on public oversight.”

I think that we need to go beyond just “oversight” and start planning now for public action. In the communities in which I live and work, there is a general expectation that the world – software, management, government, data – is becoming more, not less open. We are in grave danger of losing that openness even once the need for these government measures diminishes. Governments – who will see the wider intelligence-gathering and control opportunities of these changes – will espouse the view that “we need these measures in place in order to be able to react quickly if the same thing happens again”, and, if we’re not careful, public sentiment, bruised and bloodied by the pandemic, will quietly acquiesce, and we will see improvements in human and civil rights rolled back decades, and damaged further by the availability of cheap, mobile, networked technology.

If we believe that openness is a public good, then we need to think how to counter the arguments which we will hear from governments, and be ready to be vocal – not just with counter-arguments, but with counter-proposals. This pandemic is unlike either of the World Wars of the 20th Century, when a clear ending was marked, and there was the opportunity (sadly denied to many citizens of the former USSR) to regain civil liberties and roll back the restrictions of the war years. Nor is it even like the aftermath of 9/11, that event which has impacted the intelligence and security landscape of the past two decades, where there is (was?) at least a set of (posited) human foes to target. In the case of the Covid-19 pandemic, the “enemy” is amorphous and will be around for decades to come. The measures to combat it – and its successors – will only be slowly reduced, and some will not be.

We need to fight against those measures which are unnecessary, and we need to find alternatives – transparent, public alternatives – to measures which may have some positive effects, but whose overall impact on society and human rights is clearly negative. In an era where big data is becoming pervasive, and the tools to mine it tractable, we need to provide international mechanisms to share and use that data in ways which do not benefit any single government, bloc, or section of society. We are inventive, we are used to turning technologies to good. This is the time we need to do it, and do it quickly. We can make a difference by being open, but we need to start now.


0 – Personal Protective Equipment.

1 – although note that the company is reported to be making improvements to at least one area of concern to some – routing of traffic through China.

No security without an architecture

Your diagrams don’t need to be perfect. But they do need to be there.

I attended a virtual demo this week. It didn’t work, but none of us was stressed by that: it was an internal demo, and these things happen. Luckily, the members of the team presenting the demo had lots of information about what it would have shown us, and a particularly good architectural diagram to discuss. We’ve all been in the place where the demo doesn’t work, and all felt for the colleague who was presenting the slidedeck, and on whose screen a message popped up a few slides in, saying “Demo NO GO!” from one of her team members.

After apologies, she asked if we wanted to bail completely, or to discuss the information they had to hand. We opted for the latter – after all, most demos which aren’t foregrounding user experience components don’t show much beyond terminal windows that most of us could fake up in half an hour or so anyway. She answered a couple of questions, and then I piped up with one about security.

This article could have been about the failures in security in a project which was showing an early demo: another example of security being left till late (often too late) in the process, at which point it’s difficult and expensive to integrate. However, it’s not. It was clear that thought had been given to specific aspects of security, both on the network (in transit) and in storage (at rest), and though there was probably room for improvement (and when isn’t there?), a team member messaged me more documentation during the call which allowed me to understand the choices the team had made.

What this article is about is the fact that we were able to have a discussion at all. The slidedeck included an architecture diagram showing all of the main components, with arrows showing the direction of data flows. It was clear, colour-coded to show the provenance of the different components, which were sourced from external projects, which from internal, and which were new to this demo. The people on the call – all technical – were able to see at a glance what was going on, and the team lead, who was providing the description, had a clear explanation for the various flows. Her team members chipped in to answer specific questions or to provide more detail on particular points. This is how technical discussions should work, and there was one thing in particular which pleased me (beyond the fact that the project had thought about security at all!): that there was an architectural diagram to discuss.

There are not enough security experts in the world to go around, which means that not every project will have the opportunity to get every stage of their design pored over by a member of the security community. But when it’s time to share, a diagram is invaluable. I hate to think about the number of times I’ve been asked to look at a project in order to give my thoughts about security aspects, only to find that all that’s available is a mix of code and component documentation, with no explanation of how it all fits together and, worse, no architecture diagram.

When you’re building a project, you and your team are often so into the nuts and bolts that you know how it all fits together, and can hold it in your head, or describe the key points to a colleague. The problem comes when someone needs to ask questions of a different type, or review the architecture and design from a different slant. A picture – an architectural diagram – is a great way to educate external parties (or new members of the project) in what’s going on at a technical level. It also has a number of extra benefits:

  • it forces you to think about whether everything can be described in this way;
  • it forces you to consider levels of abstraction, and what should be shown at what levels;
  • it can reveal assumptions about dependencies that weren’t previously clear;
  • it is helpful to show data flows between the various components;
  • it allows for simpler conversations with people whose first language is not that of your main documentation.

To be clear, this isn’t just a security problem – the same can go for other non-functional requirements such as high availability, data consistency, performance or resilience – but I’m a security guy, and this is how I experience the issue. I’m also aware that I have a very visual mind, and this is how I like to get my head around something new, but even for those who aren’t visually inclined, a diagram at least offers the opportunity to orient yourself and work out where you need to dive deeper into code or execution. I also believe that it’s next to impossible for anybody to consider all the security implications (or any of the higher-order emergent characteristics and qualities) of a system of any significant complexity without architectural diagrams. And that includes the people who designed the system, because no system exists on its own (or there’s no point to it), so you can’t hold all of those pieces in your head for any length of time.

I’ve written before about the book Building Evolutionary Architectures, which does a great job in helping projects think about managing requirements which can morph or change their priority, and which, unsurprisingly, makes much use of architectural diagrams. Enarx, a project with which I’m closely involved, has always had lots of diagrams, and I’m aware that there’s an overhead involved here, both in updating diagrams as designs change and in considering which abstractions to provide for different consumers of our documentation, but I truly believe that it’s worth it. Whenever we introduce new people to the project or give a demo, we ensure that we include at least one diagram – often more – and when we get questions at the end of a presentation, they are almost always preceded with a phrase such as, “could you please go back to the diagram on slide x?”.

I nearly published this article without adding another point: this is part of being “open”. I’m a strong open source advocate, but source code isn’t enough to make a successful project, or even, I would add, to be a truly open source project: your documentation should not just be available to everybody, but accessible to everyone. If you want to get people involved, then providing a way in is vital. But beyond that, I think we have a responsibility (and opportunity!) towards diversity within open source. Providing diagrams helps address four types of diversity (at least!):

  • people whose first language is not the same as that of your main documentation (noted above);
  • people who have problems reading lots of text (e.g. those with dyslexia);
  • people who think more visually than textually (like me!);
  • people who want to understand your project from different points of view (e.g. security, management, legal).

If you’ve ever visited a project on github (for instance), with the intention of understanding how it fits into a larger system, you’ll recognise the sigh of relief you experience when you find a diagram or two on (or easily reached from) the initial landing page.

And so I urge you to create diagrams, both for your benefit, and also for anyone who’s going to be looking at your project in the future. They will appreciate it (and so should you). Your diagrams don’t need to be perfect. But they do need to be there.

Not quantum-safe, not tamper-proof, not secure

Let’s make security “marketing-proof”. Or … maybe not.

If there’s one difference that you can use to spot someone who takes security seriously, it’s this: they don’t make absolute statements about security. I’m going to be a bit contentious here, and I’m sorry if it upsets some people who do take security seriously, but I’m of the very strong opinion that we should never, ever say that something is “completely secure”, “hack-proof” or even just “secured”. I wrote a few weeks ago about lazy journalism, but it pains me even more to see or hear people who really should know better using such absolutes. There is no “secure”, and I’d love to think that one day I can stop having to say this, but it comes up again and again.

We, as a community, need to be careful about the words and phrases that we use, because it’s difficult enough to educate the rest of the world about what we do without allowing non-practitioners to believe that we (or they) can take a system or component and make it so safe that it cannot be compromised or go wrong. There are two particular bug-bears that are getting to me at the moment – and that’s before I even start on the one which rules them all, “zero-trust”, which makes my skin crawl and my hackles rise whenever I hear it used[1] – and they are (as you may have already guessed from the title of this article):

  • quantum-proof
  • tamper-proof

I’ll start with the latter, because it’s more clear cut (and easier to explain). Some systems – typically hardware systems – are deployed in environments where bad people might mess with them. This, in the trade, is called “tampering”, and it has a slightly different usage from the normal meaning, in that it tends to imply that the damage done to a system or component was done with the intention not necessarily of stopping its normal operation, but of altering it in such a way that the attacker could gain some advantage (often, but not always, snooping on activities being performed). That may have been the intention, but it may be that the damage did actually stop or at least affect normal operation, whether or not the attacker gained the advantage they were attempting. The problem with saying that any system is tamper-proof is that it clearly isn’t, particularly if you accept the second part of the definition, but possibly even if you don’t. And it’s pretty much impossible to be sure, for the same reason that the adage that “any fool can create a cryptographic protocol that he/she can’t break” is true: you can’t assess the skills and abilities of all future attackers of your system. The best you can do is make it tamper-evident: put such controls in place that it should be clear if someone tries to tamper with the system[3].

“Quantum-safe” is another such phrase. It refers to cryptographic protocols or primitives which are designed to be resistant to attacks by quantum computers. The phrase “quantum-proof” is also used, and the problem with both of these terms is that, since nobody has yet completed a quantum computer of sufficient complexity even to try such attacks, we can’t be sure. Even once they do, we still won’t be sure, as people will probably come up with new and improved ways of using them to attack the protocols and primitives we’ve been describing. And what’s annoying is that the key to what we should be saying is actually in the description I gave: they are meant to be resistant to such attacks. “Quantum-resistant” is a much more descriptive and accurate phrase[5], so why not use it?

The simple answer to that question, and to the question of why people use phrases like “tamper-proof” and “secure”, is that it makes better marketing copy. Ill-informed customers are more likely to buy something which is “safe” or which is “proof” against something, rather than something which is merely tamper-evident or attack-resistant. Well, part of our job as security professionals is to try to educate those customers, and make them less ill-informed[6]. Let’s make security “marketing-proof”. Or … maybe not.


1 – so much so that I’m actually writing a book about it[2].

2 – not just the concept of “zero-trust”, but about trust in general.

3 – sometimes, the tamper-evidence is actually intentionally destroying the capabilities of a system so that you can be pretty sure that the attacker wasn’t able to make it do things it wasn’t supposed to[4].

4 – which is pretty cool, though it does mean that you can’t make it do the things it was supposed to either, of course.

5 – well, I’m assuming that most of such mechanisms are resistant, of course…

6 – I fully accept that “better-informed” would be a better choice of phrase here.

Isolationism – not a 4 letter word (in the cloud)

Things are looking up if you’re interested in protecting your workloads.

In the world of international relations, economics and fiscal policy, isolationism doesn’t have a great reputation. I could go on, I suppose, if I did some research, but this is a security blog[1], and international relations, fascinating area of study though it is, isn’t my area of expertise: what I’d like to do is borrow the word and apply it to a different field: computing, and specifically cloud computing.

In computing, isolation is a set of techniques to protect a process, application or component from another (or a set of the former from a set of the latter). This is pretty much always a good thing – you don’t want another process interfering with the correct workings of your one, whether that’s by design (it’s malicious) or in error (because it’s badly designed or implemented). Isolationism, therefore, however unpopular it may be on the world stage, is a policy that you generally want to adopt for your applications, wherever they’re running.

This is particularly important in the “cloud”. Cloud computing is where you run your applications or processes on shared infrastructure. If you own that infrastructure, then you might call that a “private cloud”, and infrastructure owned by other people a “public cloud”, but when people say “cloud” on its own, they generally mean public clouds, such as those operated by Amazon, Microsoft, IBM, Alibaba or others.

There’s a useful adage around cloud computing: “Remember that the cloud is just somebody else’s computer”. In other words, it’s still just hardware and software running somewhere, it’s just not being run by you. Another important thing to remember about cloud computing is that when you run your applications – let’s call them “workloads” from here on in – on somebody else’s cloud (computer), they’re unlikely to be running on their own. They’re likely to be running on the same physical hardware as workloads from other users (or “tenants”) of that provider’s services. These two realisations – that your workload is on somebody else’s computer, and that it’s sharing that computer with workloads from other people – are where isolation comes into the picture.

Workload from workload isolation

Let’s start with the sharing problem. You want to ensure that your workloads run as you expect them to do, which means that you don’t want other workloads impacting on how yours run. You want them to be protected from interference, and that’s where isolation comes in. A workload running in a Linux container or a Virtual Machine (VM) is isolated from other workloads by hardware and/or software controls, which try to ensure (generally very successfully!) that your workload receives the amount of computing time it should have, that it can send and receive network packets, write to storage and the rest without interruption from another workload. Equally important, the confidentiality and integrity of its resources should be protected, so that another workload can’t look into its memory and/or change it.

The means to do this are well known and fairly mature, and the building blocks of containers and VMs, for instance, are augmented by software like KVM or Xen (both open source hypervisors) or like SELinux (an open source mandatory access control framework). The cloud service providers are definitely keen to ensure that you get a fair allocation of resources and that they are protected from the workloads of other tenants, so providing workload from workload isolation is in their best interests.

Host from workload isolation

Next is isolating the host from the workload. Cloud service providers absolutely do not want workloads “breaking out” of their isolation and doing bad things – again, whether by accident or design. If one of a cloud service provider’s host machines is compromised by a workload, not only can that workload possibly impact other workloads on that host, but also the host itself, other hosts and the more general infrastructure that allows the cloud service provider to run workloads for their tenants and, in the final analysis, make money.

Luckily, again, there are well-known and mature ways to provide host from workload isolation using many of the same tools noted above. As with workload from workload isolation, cloud service providers absolutely do not want their own infrastructure compromised, so they are, of course, going to make sure that this is well implemented.

Workload from host isolation

Workload from host isolation is more tricky. A lot more tricky. This is protecting your workload from the cloud service provider, who controls the computer – the host – on which your workload is running. The way that workloads run – execute – is such that such isolation is almost impossible with standard techniques (containers, VMs, etc.) on their own, so providing ways to ensure and prove that the cloud service provider – or their sysadmins, or any compromised hosts on their network – cannot interfere with your workload is difficult.

You might expect me to say that providing this sort of isolation is something that cloud service providers don’t care about, as they feel that their tenants should trust them to run their workloads and just get on with it. Until sometime last year, that might have been my view, but it turns out to be wrong. Cloud service providers care about protecting your workloads from the host because it allows them to make more money. Currently, there are lots of workloads which are considered too sensitive to be run on public clouds – think financial, health, government, legal, … – often due to industry regulation. If cloud service providers could provide sufficient isolation of workloads from the host to convince tenants – and industry regulators – that such workloads can be safely run in the public cloud, then they get more business. And they can probably charge more for these protections as well! That doesn’t mean that isolating your workloads from their hosts is easy, though.

There is good news, however, for both cloud service providers and their tenants, which is that there’s a new set of hardware techniques called TEEs – Trusted Execution Environments – which can provide exactly this sort of protection[2]. This is rapidly maturing technology, and TEEs are not easy to use – in that it can not only be difficult to run your workload in a TEE, but also to ensure that it’s running in a TEE – but when done right, they do provide the sorts of isolation from the host that a workload wants in order to maintain its integrity and confidentiality[3].
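
Part of “ensuring that it’s running in a TEE” comes down to remote attestation: the hardware produces a signed measurement of what was loaded into the TEE, and a party the tenant does trust verifies that measurement before releasing any secrets to the workload. The sketch below is purely illustrative – the report fields and the shared-key “signature” are invented for the example – and real schemes (SGX, SEV, and what projects like Enarx build on them) use hardware-rooted certificate chains instead.

import hmac, hashlib
from dataclasses import dataclass

# Illustrative only: field names and the shared-key check are invented for
# this sketch; real TEE attestation uses hardware-rooted certificate chains.

@dataclass
class AttestationReport:
    measurement: bytes   # hash of the code/data loaded into the TEE
    report_body: bytes   # the raw report produced inside the TEE
    signature: bytes     # platform's signature over the report body

def verify(report: AttestationReport,
           expected_measurement: bytes,
           platform_key: bytes) -> bool:
    # 1. Does the report really come from the platform we think it does?
    expected_sig = hmac.new(platform_key, report.report_body,
                            hashlib.sha256).digest()
    if not hmac.compare_digest(expected_sig, report.signature):
        return False
    # 2. Is the workload inside the TEE the one we intended to run?
    return hmac.compare_digest(report.measurement, expected_measurement)

# Only if verify(...) returns True would the tenant release keys or data
# to the workload running on the otherwise untrusted host.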

There are a number of projects looking to make using TEEs easier – I’d point to Enarx in particular – and even an industry consortium to promote open TEE adoption, the Confidential Computing Consortium. Things are looking up if you’re interested in protecting your workloads, and the cloud service providers are on board, too.


1 – sorry if you came here expecting something different, but do stick around and have a read: hopefully there’s something of interest.

2 – the best known are Intel’s SGX and AMD’s SEV.

3 – availability – ensuring that it runs fairly – is more difficult, but as this is a property that is also generally in the cloud service provider’s best interest, and something that they can control, it’s not generally too much of a concern[4].

4 – yes, there are definitely times when it is, but that’s a story for another article.

Coming to you in Japanese

We are now multi-lingual.

I have an exciting announcement, which is that starting this week, some of the articles on this blog will also be in Japanese.  My very talented Red Hat colleague Yuki Kubota showed an interest in translating some which she thought might be of interest to Japanese readers, and I jumped at the chance.  I’m very thrilled and humbled.

We’re still ironing out the process, but hopefully (if you already read Japanese), you’ll be able to read the following articles.

We’ll try to add the tag “Japanese” to each of these, as well.

So, a huge thank you to Yuki: we’d love comments – in English or Japanese!

Timely risk or risky times?

Being aware of “the long game”.

On Friday, 29th November 2019, Jack Merritt and Saskia Jones were killed in a terrorist attack.  A number of members of the public (some with improvised weapons) and of the emergency services acted with great heroism.  I wanted to mention the names of the victims and to praise those involved in stopping the attacker before mentioning his name: Usman Khan.  The victims and the attacker were taking part in an offender rehabilitation conference to help offenders released from prison to reintegrate into society: Khan had been sentenced to 16 years in prison for terrorist offences.

There’s an important formula that everyone involved in risk – and given that IT security is all about mitigating risk, that’s anyone involved in security – should know. It’s usually expressed thus:

Risk = likelihood x impact

Likelihood is sometimes expressed as “probability”, impact as “consequence” or “loss”, and I’ve seen some other variants as well, but the version above is generally sufficient for most purposes.

Using the formula

How should you use the formula? Well, it’s most useful for comparing risks and deciding how to mitigate them. Humans are terrible at calculating risk, and any tools that help them[1] are good.  In order to use this formula correctly, you want to compare risks over the same time period.  You could say that almost any eventuality may come to pass over the lifetime of the universe, but comparing the risk of losing broadband access to the risk of your lead developer quitting for another company between the Big Bang and the eventual heat death of the universe is probably not going to give you much actionable information.

Let’s look at the two variables that we need to have in order to calculate risk.  We’ll start with the impact, because I want to devote most of this article to the other part: likelihood.

Impact is what the damage will be if the risk happens.  In a business context, let’s look at the risk of your order system being brought down for a week by malicious attackers.  You might calculate that you would lose £15,000 in orders.  On top of that, there might be a loss of reputation which you might calculate at £30,000.  Fixing the problem might add £10,000.  Add these together, and the impact is £55,000.

What’s the likelihood?  Well, remember that we need to consider a particular time period.  What you choose will depend on what you’re interested in, but a classic use is for budgeting, and so the length of time considered is often a year.  “What is the likelihood of my order system being brought down for a week by malicious attackers over the next twelve months?” is the question you want to ask.  If you decide that it’s 0.005 (or 0.5%), then your risk is calculated thus:

Risk = 0.005 x 55,000

Risk = 275

The units don’t really matter, because what you want to do is compare risks.  If the risk of your order system being brought down through hardware failure is higher (say 500), then you should probably balance the amount of resources you assign to mitigate these risks accordingly.
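
If it helps to see that comparison written out, here is a tiny sketch using the figures above; the 0.009 likelihood for hardware failure is invented simply to reproduce the “say 500” figure.

def risk(likelihood: float, impact: float) -> float:
    # Risk = likelihood x impact, for risks compared over the same period.
    return likelihood * impact

# Figures from the example above (impact in pounds, likelihood per year).
attack   = risk(0.005, 55_000)   # 275
hardware = risk(0.009, 55_000)   # roughly the "say 500" above (495)

print(f"malicious attack: {attack:.0f}")
print(f"hardware failure: {hardware:.0f}")
# The absolute numbers matter less than the ratio, which suggests where
# to weight your mitigation budget.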

Time, reputation, trust and risk

What I’m interested in is a set of rather more complicated risks, however: those associated with human behaviour.  I’m very interested in trust, and one of the interesting things about trust is how we decide to trust people.  One way is by their reputation: if someone keeps behaving well over a long period, then we tend to trust them more – or if badly, then to trust them less[2].  If we trust someone more, our calculation of risk is likely to be strongly based on that trust, as our view of the likelihood of a behaviour at odds with the reputation that person holds will be informed by that.

This makes sense: in the absence of perfect information about humans, their motivations and intentions, our view of risk must be based on something, and reputation is actually a fairly good measure for that.  We might say that the likelihood of a customer defaulting on payment terms reduces year by year as we start to think of them as a “trusted customer”.  As the likelihood reduces, we may decide to increase the amount we lend to them – and thereby the impact of defaulting – to keep the risk about the same, year on year.
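
As a toy illustration of keeping the risk “about the same” as trust grows (all the figures here are invented): if the estimated likelihood of default halves each year, the credit limit can roughly double before the risk figure moves.

# Invented figures: estimated likelihood of default falling year on year,
# with the credit limit adjusted to hold the risk figure constant.
target_risk = 500.0
for year, likelihood in enumerate([0.05, 0.025, 0.0125], start=1):
    credit_limit = target_risk / likelihood   # the impact we can tolerate
    print(f"year {year}: likelihood {likelihood:.4f} "
          f"-> credit limit {credit_limit:,.0f}")
# year 1: 10,000; year 2: 20,000; year 3: 40,000 - the risk stays at 500.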

The risk here is what is sometimes called “playing the long game”.  Humans sometimes manipulate their reputation, or build up a reputation, in order to perform an action once they have gained trust.  Online sellers may make lots of “good” sales in order to get a 5 star rating over time, only to wait and then make a set of “bad” sales, where they don’t ship goods at all, and then just pocket the money.  Or, they may make many small sales in order to build up a good reputation, and then use that reputation to make one big sale which they have no intention of fulfilling.  Online selling sites are wise to some of these tricks, and have algorithms to try to protect buyers (in fact, the same behaviour can be used by sellers in some cases), but these are not perfect.

I’d like to come back to the London Bridge attack.  In this case, it seems likely that the attacker bided his time over many years, behaving well, and raising his “reputation” among those who knew him – the prison staff, parole board, rehabilitation conference organisers, etc. – so that he had the opportunity to perform one major action at odds with that reputation.  The heroism of those around him stopped him being as successful as he may have hoped, but still at the cost of two innocent lives and several serious injuries.

There is no easy way to deal with such issues.  We need reputation, and we need to allow people to show that they have changed and can be integrated into society, but when we make risk calculations based on reputation in any sphere, we should take care to consider whether actors are playing a long game, and what the possible ramifications would be if they were to act at odds with that reputation.

I noted above that humans are bad at calculating risk, and to follow our example of the non-defaulting customer, one mistake might be to increase the credit we give to that customer beyond what the increase in their reputation justifies: actually accepting higher risk than we would have done previously, because we consider them trustworthy.  If we do this, we’ve ceased to use the risk formula, and have started to act irrationally.  Don’t do that.

 


1 – OK, then: “us”.

2 – I’m writing this in the lead up to a UK General Election, and it occurs to me that we actually don’t apply this to most of our politicians.

コンフィデンシャルコンピューティング ー新しいHTTPSとは?

デフォルトで付いてくるセキュリティなんてありません。

この記事は
https://aliceevebob.com/2019/12/03/confidential-computing-the-new-https/ を翻訳したものです。
ここ数年、「http://…」のようなウェブサイトはなくなってきました。これはやっと業界がウェブサイトにセキュリティが「ある」ことに気付いたからです。と同時にサーバーとクライアントどちらともHTTPS通信の設定をすることが容易になったからです。

同じような動きがクラウド、エッジ、IoT、ブロックチェーン、AI/MLなどのコンピューティングにも現れることでしょう。

ストレージ内に保存するデータやネットワークで転送されるデータは暗号化すべきである、とは認識されていました。けれどプロセスしている間使用されているデータを暗号化するのは難しく、高価でした。

Trusted Execution Environment (TEE)などのハードウェアを使って、使用中のデータやアルゴリズムを保護します。コンフィデンシャルコンピューティングは、ホストシステムや攻撃されやすい環境のデータを保護するのです。

TEE とEnarx Project(Nathaniel McCallumと共同創立しているプロジェクトです、参考: Enarx for everyone (a quest) and Enarx goes multi-platform )に付いては何度かブログに投稿しています。
EnarxはTEEを使っていて、Enarxでプラットフォームや使用言語に依存せず、機密性が必要なアプリケーションやマイクロサービスなどのコンポーネントを安全に信頼できないホストにデプロイすることができます。

Enarxはもちろん完全にオープンソース(Apache 2.0のライセンスを使用)です。
ワークロードを信頼できないホストで稼働させるのはコンフィデンシャルコンピューティングが保証するところです。これからは下記のような場合の機密性があるデータにコンフィデンシャルコンピューティングが普通に使われるようになるでしょう。:

ストレージ:ストレージインフラを完全に信用できないので、保存したデータは暗号化したい
ネットワーク:ネットワークインフラを完全に信用できないので、転送中のデータを暗号化したい
コンピューティング:コンピューティングインフラを信用できないので、使用中のデータを暗号化したい

信頼・信用に関してはもっと言いたいことはあるのですが「完全に」という言葉が大切です。(これは推敲の最中に書き足しました。)
パケットを送ったりブロックを保存したりするかどうか、上記のどのケースでもCPUやファームウェアなど、インフラをある程度信頼しなくてはいけません。というのも、それらを信頼できなければコンピューティングなんてできません。
(準同型暗号という技術があり提供されつつありますが、まだ限定的で技術も未完成です)

CPU周りで見つかる脆弱性があると、CPUを完全に信頼するかどうか、また乗っているホストへの物理攻撃に対して完全に安全かどうか、というのは何度も出てくる疑問です。
どちらの疑問にも、「いいえ」と答えられますね。しかし拡張性とデプロイの費用の問題から現状ではベストな技術でしょう。

二番目の疑問については、誰も(もしくは他の技術)完全に安全だと偽装できないということです。私たちがすべきなのはthreat model を考慮し、この場合ではTEEが特定の要件に対して十分なセキュリティを提供できるかどうか決定する、ということです。

一つ目の疑問に関してはEnarxの当てはまるモデルは、特定のCPUセットを信頼するかどうかデプロイメントの際に全て決め打ちする、ということでしょう。
例えばQというベンダのR世代のチップに脆弱性が見つかったとしましょう。「ワークロードをQから出ているR世代のCPUにはデプロイさせず、Q社のSタイプ、Tタイプ、Uタイプのチップと、P社、M社、N社のCPUにはデプロイOKとする」と宣言できれば簡単ですね。

コンフィデンシャルコンピューティングが注目されていますが、そこに適応させるには3つの変化のステージがあると考えています。

1 ハードウェアの稼働性:
TEEがサポートされているハードウェアが手に入るようになったのはここ半年から一年の間です。IntelのSGXやAMDのSEVなど市場で鍵となる製品が出てきたことからもわかります。
これからもTEEが使えるハードウェアの製品が出てくると予想されます。

2 業界の受け入れ状態:
アプリケーションのデプロイメントとしてクラウドが急激に受け入れられているのに合わせて法規制や整備は扱うデータを保護するよう、組織や団体に対して要求を増やしてきています。
組織や団体は、信頼性のないホストでの機密性の高いアプリケーション(もしくは機密データを扱うアプリ)の稼働方法にざわざわしてきています。正確には、彼らが完全に信用できないホスト上でのアプリに関してですね。

これは別に驚くことではないのです。もしマーケットが投資に値するものではなければ、チップベンダーはこの技術に投資しないでしょう。
Linux FoundationのConfidential Computing Consortium (CCC)の体制は、どれくらい業界がコンフィデンシャルコンピューティングの共通使用モデルを見つけようとしているか、オープンソースプロジェクトにこのような技術採用を勧めているか、の別のよい例ですね。

その一つが、Red Hatが始めたEnarxで、CCCのプロジェクトです。

3 オープンソース:
ブロックチェーンのように、コンフィデンシャルコンピューティングはオープンソースを使うことがとても簡単な技術の一つです。

機密性の高いアプリケーションを動かす場合、動いているもの自体を信用しなくてはいけません。CPUやファームウェアのようなものではなく、TEEの中でワークロードの実際の実行を手伝うフレームワークのことです。

良い言い回しがあります。
「私はホストマシーンとソフトウェアスタックが信用できないからTEEを使うんだ」

しかしTEEのソフトウェア環境に可視性がなければ、ただソフトウェアを別の不可視性の高い環境に移しただけです。
TEEのオープンソースによって、あなたやコミュニティはプロプライエタリのベンダー仕様ソフトウェアにはできないチェックと監査ができるようになるのです。

このようにCCCはオープンな開発モデルであるLinux Foundationに属しているのであり、TEEに関するソフトウェアプロジェクトにCCCに参加するよう、またオープンソースにするように推進しているのです。

このハードウェアの稼働性、業界の受け入れとオープンソースの三つがここ15から20年の技術の変革を促進するものだと考えます。
ブロックチェーン、AI、クラウドコンピューティング、ウェブスケールコンピューティング、ビッグデータ、インターネット販売は全てこの三つが合わさって、今までになかった変革を業界にもたらしたのです。

デフォルトのセキュリティはここ何十年か必要だと訴えられているものですが、まだ達成されていません。正直なところ、それが本当に実現するかはわかりません。

しかし新しい技術が実現することで、業界で、特定のユースケースにセキュリティが浸透することがもっと実用的になり、そこに期待も集まるでしょう。

コンフィデンシャルコンピューティングは次の新しい変革を迎えようとしています。
そして読者の皆さんがその革命に参加する日が来るでしょう。オープンソースなのですから。
元の記事:https://aliceevebob.com/2019/12/03/confidential-computing-the-new-https/
2019年12月3日 Mike Bursell

 

Confidential computing – the new HTTPS?

Security by default hasn’t arrived yet.

Over the past few years, it’s become difficult to find a website which is just “http://…”.  This is because the industry has finally realised that security on the web is “a thing”, and also because it has become easy for both servers and clients to set up and use HTTPS connections.  A similar shift may be on its way in computing across cloud, edge, IoT, blockchain, AI/ML and beyond.  We’ve known for a long time that we should encrypt data at rest (in storage) and in transit (on the network), but encrypting it in use (while processing) has been difficult and expensive.  Confidential computing – providing this type of protection for data and algorithms in use, using hardware capabilities such as Trusted Execution Environments (TEEs) – protects data on hosted systems or in vulnerable environments.

I’ve written several times about TEEs and, of course, the Enarx project of which I’m a co-founder with Nathaniel McCallum (see Enarx for everyone (a quest) and Enarx goes multi-platform for examples).  Enarx uses TEEs, and provides a platform- and language-independent deployment platform to allow you safely to deploy sensitive applications or components (such as micro-services) onto hosts that you don’t trust.  Enarx is, of course, completely open source (we’re using the Apache 2.0 licence, for those with an interest).  Being able to run workloads on hosts that you don’t trust is the promise of confidential computing, which extends normal practice for sensitive data at rest and in transit to data in use:

  • storage: you encrypt your data at rest because you don’t fully trust the underlying storage infrastructure;
  • networking: you encrypt your data in transit because you don’t fully trust the underlying network infrastructure;
  • compute: you encrypt your data in use because you don’t fully trust the underlying compute infrastructure.

I’ve got a lot to say about trust, and the word “fully” in the statements above is important (I actually added it on re-reading what I’d written).  In each case, you have to trust the underlying infrastructure to some degree, whether it’s to deliver your packets or store your blocks, for instance.  In the case of the compute infrastructure, you’re going to have to trust the CPU and associated firmware, just because you can’t really do computing without trusting them (there are techniques such as homomorphic encryption which are beginning to offer some opportunities here, but they’re limited, and the technology is still immature).

Questions sometimes come up about whether you should fully trust CPUs, given some of the security problems that have been found with them and also whether they are fully secure against physical attacks on the host in which they reside.

The answer to both questions is “no”, but this is the best technology we currently have available at scale and at a price point to make it generally deployable.  To address the second question, nobody is pretending that this (or any other technology) is fully secure: what we need to do is consider our threat model and decide whether TEEs (in this case) provide sufficient security for our specific requirements.  In terms of the first question, the model that Enarx adopts is to allow decisions to be made at deployment time as to whether you trust a particular set of CPUs.  So, for example, if vendor Q’s generation R chips are found to contain a vulnerability, it will be easy to say “refuse to deploy my workloads to R-type CPUs from Q, but continue to deploy to S-type, T-type and U-type chips from Q and any CPUs from vendors P, M and N.”
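
As a sketch of that idea (this is not Enarx’s actual policy format or API, just an illustration of a deployment-time trust decision over CPU vendors and generations):

# Illustration only: not Enarx's real configuration or API, just the
# policy idea described above.

DENYLIST = {("Q", "R")}                     # vendor Q's generation R chips
TRUSTED_VENDORS = {"Q", "P", "M", "N"}

def may_deploy(vendor: str, generation: str) -> bool:
    if (vendor, generation) in DENYLIST:
        return False                        # known-vulnerable: refuse
    return vendor in TRUSTED_VENDORS        # otherwise, trust listed vendors

assert may_deploy("Q", "S")                 # S-type chips from Q are fine
assert not may_deploy("Q", "R")             # R-type chips from Q are refused
assert may_deploy("P", "any")               # other trusted vendors are fine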