Defining the Edge

How might we differentiate Edge computing from Cloud computing?

This is an edited excerpt from my forthcoming book on Trust in Computing and the Cloud for Wiley.

There’s been a lot of talk about the Edge, and almost as many definitions as there are articles out there. As usual on this blog, my main interest is around trust and security, so this brief look at the Edge concentrates on those aspects, and particularly on how we might differentiate Edge computing from Cloud computing.

The first difference we might identify is that Edge computing addresses use cases where consolidating compute resource in a centralised location (the typical Cloud computing case) is not necessarily appropriate, and pushes some or all of the computing power out to the edges of the network, where computing resources can process data which is generated at the fringes, rather than having to transfer all the data over what may be low-bandwidth networks for processing. There is no generally accepted single industry definition of Edge computing, but examples might include:

  • placing video processing systems in or near a sports stadium for pre-processing to reduce the amount of raw footage that needs to be transmitted to a centralised data centre or studio
  • providing analysis and safety control systems on an ocean-based oil rig to reduce reliance and contention on an unreliable and potentially low-bandwidth network connection
  • creating an Internet of Things (IoT) gateway to process and analyse data from environmental sensor units (IoT devices)
  • mobile edge computing, or multi-access edge computing (both abbreviated to MEC), where telecommunications services such as location and augmented reality (AR) applications are run on cellular base stations, rather than in the telecommunication provider’s centralised network location.

Unlike Cloud computing, where the hosting model is generally that computing resources are consumed by tenants – customers of a public cloud, for instance – in the Edge case, the consumer of computing resources is often the owner of the systems providing them (though this is not always the case). Another difference is the size of the host providing the computing resources, which may range from very large to very small (in the case of an IoT gateway, for example). One important factor about most modern Edge computing environments is that they employ the same virtualisation and orchestration techniques as the cloud, allowing more flexibility in deployment and lifecycle management than bare-metal deployments.

A table comparing the various properties typically associated with Cloud and Edge computing shows us a number of differences.


| Property | Public cloud computing | Private cloud computing | Edge computing |
| --- | --- | --- | --- |
| Location | Centralised | Centralised | Distributed |
| Hosting model | Tenants | Owner | Owner or tenant(s) |
| Application type | Generalised | Generalised | May be specialised |
| Host system size | Large | Large | Large to very small |
| Network bandwidth | High | High | Medium to low |
| Network availability | High | High | High to low |
| Host physical security | High | High | Low |

Differences between Edge computing and public/private cloud computing

In the table, I’ve described two different types of cloud computing: public and private. The latter is sometimes characterised as on premises or on-prem computing, but the point here is that rather than deploying applications to dedicated hosts, workloads are deployed using the same virtualisation and orchestration techniques employed in the public cloud, the key difference being that the hosts and software are owned and managed by the owner of the applications. Sometimes these services are actually managed by an external party, but in this case there is a close commercial (and concomitant trust) relationship to this managed services provider, and, equally important, single tenancy is assured (assuming that security is maintained), as only applications from the owner of the service are hosted[1]. Many organisations will mix and match workloads to different cloud deployments, employing both public and private clouds (a deployment model known as hybrid cloud) and/or different public clouds (a deployment model known as multi-cloud). All these models – public computing, private computing and Edge computing – share an approach in common: in most cases, workloads are not deployed to bare-metal servers, but to virtualisation platforms.

Deployment model differences

What is special about each of the models and their offerings if we are looking at trust and security?

One characteristic that the three approaches share is scale: they all assume hosts with multiple workloads per host – though the number of hosts, and the actual size of the host systems, is likely to be highest in the public cloud case, and lowest in the Edge case. It is this high workload density that makes public cloud computing in particular economically viable, and one of the reasons that it makes sense for organisations to deploy at least some of their workloads to public clouds, as Cloud Service Providers can employ economies of scale which allow them to schedule workloads onto their servers from multiple tenants, balancing load and bringing sets of servers in and out of commission (a computation- and time-costly exercise) infrequently. Owners and operators of private clouds, in contrast, need to ensure that they have sufficient resources available for possible maximum load at all times, and do not have the opportunities to balance loads from other tenants unless they open up their on premises deployment to other organisations, transforming themselves into Cloud Service Providers and putting them into direct competition with existing CSPs.

It is this push for high workload density which is one of the reasons for the need for strong workload-from-workload (type 1) isolation: in order to maintain high density, cloud owners need to be able to mix workloads from multiple tenants on the same host. Tenants are mutually untrusting; they are in fact likely to be completely unaware of each other, and, if the host is doing its job well, unaware of the presence of other workloads on the same host as them. More important than this property, however, is a strong assurance that their workloads will not be negatively impacted by workloads from other tenants. Although negative impact on computation can occur in other contexts – such as storage contention or network congestion – the focus here is mainly on the isolation that hosts can provide.

The likelihood of malicious workloads increases with the number of tenants, but reduces significantly when the tenant is the same as the host owner – the case for private cloud deployments and some Edge deployments. Thus, the need for host-from-workload (type 2) isolation is higher for the public cloud – though the possibility of poorly written or compromised workloads means that it should not be neglected for the other types of deployment.

One final difference between the models is that for both public and private cloud deployments the physical vulnerability of hosts is generally considered to be low[2], whereas the opportunities for unauthorised physical access to Edge computing hosts are considered to be much higher. You can read a little more about the importance of hardware as part of the Trusted Compute Base in my article Turtles – and chains of trust, and it is a fundamental principle of computer security that if an attacker has physical access to a system, then the system must be considered compromised, as it is, in almost all cases, possible to compromise the confidentiality, integrity and availability of workloads executing on it.

All of the above are good reasons to apply Confidential Computing techniques not only to cloud computing, but to Edge computing as well: that’s a topic for another article.


1 – this is something of a simplification, but is a useful generalisation.

2 – Though this assumes that people with authorised access to physical machines are not malicious, a proposition which cannot be guaranteed, but for which monitoring can at least be put in place.

Masks, vaccinations, social distancing – and cybersecurity

The mapping between the realms of cybersecurity and epidemiology is not perfect, but metaphors can be useful.

Waaaaay back in 2018 (which seems a couple of decades ago now), I wrote an article called Security patching and vaccinations: a surprising link. In those days, Spectre and Meltdown were still at the front of our minds, and the impact of something like Covid-19 was, if not unimaginable, then far from what most of us considered in our day-to-day lives. In the article, I argued that patching of software and vaccination (of humans or, I guess, animals[1]) had some interesting parallels (the clue’s in the title, if I’m honest). As we explore the impact of new variants of the Covid-19 virus, the note that “a particular patch may provide resistance to multiple types of attack of the same family, as do some vaccinations” seems particularly relevant. I also pointed out that although there are typically some individuals in a human population for whom vaccination is too risky, a broad effort to vaccinate the rest of the population has positive impact for them, in a similar way to how patching most systems in a deployment can restrict the number of “jumping off points” for attack.

I thought it might be interesting to explore other similarities between disease management in the human sphere with how we do things in cybersecurity, not because they are exact matches, but because they can be useful metaphors to explain to colleagues, family and friends what we do.

Vaccinations

We’ve looked at vaccinations a bit above: the key point here is that once a vulnerability is discovered, software vendors can release patches which, if applied correctly, protect the system from those attacks. This is not dissimilar in effect to the protection provided by vaccinations in human settings, though the mechanism is very different. Computer systems don’t really have an equivalent to antibodies or immune systems, but patches may – like vaccines – provide protection for more than one specific attack (think virus strain) if others exploit the same type of weakness.

Infection testing

As we have discovered since the rise of Covid-19, testing of the population is a vital measure to understand what other mechanisms need to be put in place to control infection. The same goes for cybersecurity. Testing the “health” of a set of systems, monitoring their behaviour and understanding which may be compromised, by which attacks, leveraging which vulnerabilities, is a key part of any cybersecurity strategy, and easily overlooked when everything seems to be OK.

Masks

I think of masks as acting a little like firewalls, or mechanisms like SELinux which act to prevent malicious programs from accessing parts of the system which you want to protect. Like masks, these mechanisms reduce the attack surface available to bad actors by stopping up certain “holes” in the system, making it more difficult to get in. In the case of firewalls, it’s network ports; in the case of SELinux, it’s blocking activities like unauthorised system calls (syscalls). We know that masks are not wholly effective in preventing transmission of Covid-19 – they don’t give 100% protection from oral transmission, and if someone sneezes into your eye, for instance, that could lead to infection – but we also know that if two people are meeting, and they both wear masks, the chance of transmission from an infected to an uninfected person is reduced. Most cybersecurity controls aim mainly to protect the systems on which they reside, but a well thought-out deployment may also aim to put in controls to prevent attackers from jumping from one system to another.

Social distancing

This last point leads us to our final metaphor: social distancing. Here, we put in place controls to try to make it difficult for an attacker (or virus) to jump from one system to another (or human to another). While the rise of zero trust architectures has led to something of a down-playing of some of these techniques within cybersecurity, mechanisms such as DMZs, policies such as no USB drives and, at the extreme end, air-gapping of systems (where there is no direct network connection between them) all aim to create physical or logical barriers to attacks or transmission.

Conclusion

The mapping between controls in the realms of cybersecurity and epidemiology is not perfect, but metaphors can be useful in explaining the mechanisms we use and also in considering differences (is there an equivalent of “virus load” in computer systems, for instance?). If there are lessons we can learn from the world of disease management, then we should be keen to do so.


1 – it turns out that you can actually vaccinate plants, too: neat.

Dependencies and supply chains

A dependency is a component which an application or another component needs in order to work

Supply chain security is really, really hot right now. It’s something which folks in the “real world” of manufactured things have worried about for years – you might be surprised (and worried) how much attention aircraft operators need to pay to “counterfeit parts”, and I wish I hadn’t just searched online for the words “counterfeit pharmaceutical” – but the world of software had a rude wake-up call recently with the Solarwinds hack (or crack, if you prefer). This isn’t the place to go over that: you’ll be able to find many, many articles if you search on that. In fact, many companies have been worrying about this for a while, but the change is that it’s an issue which is now out in the open, giving more leverage to those who want more transparency around what software they consume in their products or services.

When we in computing (particularly software) think about supply chains, we generally talk about “dependencies”, and I thought it might be useful to write a short article explaining the main dependency types.

What is a dependency?

A dependency is a component which an application or another component needs in order to work, and dependencies are generally considered to come in two types:

  • build-time dependencies
  • run-time dependencies.

Let’s talk about those two types in turn.

Build-time dependencies

These are components which are required in order to build (typically compile and package) your application or library. For example, if I’m writing a program in Rust, I have a dependency on the compiler if I want to create an application. I’m actually likely to have many more build-time dependencies than that, however. How those dependencies are made visible to me will depend on the programming language and the environment that I’m building in.

Some languages, for instance, may have filesystem support built in, but others will require you to “import” one or more libraries in order to read and write files. Importing a library basically tells your build-time environment to look somewhere (local disk, online repository, etc.) for a library, and then bring it into the application, allowing its capabilities to be used. In some cases, you will be taking pre-built libraries, and in others, your build environment may insist on building them itself. Languages like Rust have clever environments which can look for new versions of a library, download it and compile it without your having to worry about it yourself (though you can if you want!).
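As a concrete sketch of how this looks in practice, here is roughly what declaring a build-time library dependency looks like in a Rust project’s Cargo.toml manifest (the package name and the `serde` crate are just illustrative examples, not anything from a real project):

```toml
[package]
name = "example-app"   # hypothetical application name
version = "0.1.0"
edition = "2021"

[dependencies]
# A build-time dependency: Cargo will download this library (and any
# libraries it depends on in turn) and compile it into the application.
serde = "1.0"
```

Running `cargo build` will then fetch and compile the declared libraries before compiling the application itself – exactly the “clever environment” behaviour described above.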

To get back to our filesystem example, even if the language does come with built-in filesystem support, you may decide to import a different library – maybe you need some fancy distributed, sharded file system, for instance – from a different supplier. Other capabilities may not be provided by the language, or may be higher-level capabilities: JSON serialisation or HTTPS support, for instance. Whether that library is available in open source may have a large impact on your decision as to whether or not to use it.

Build-time dependencies, then, require you to have the pieces you need – either pre-built or in source code form – at the time that you’re building your application or library.

Run-time dependencies

Run-time dependencies, as the name implies, only come into play when you actually want to run your application. We can think of there being two types of run-time dependency:

  1. service dependency – this may not be the official term, but think of an application which needs to write some data to a window on a monitor screen: in most cases, there’s already a display manager and a window manager running on the machine, so all the application needs to do is contact them, and communicate the right data over an API. Sometimes, the underlying operating system may need to start these managers first, but it’s not the application itself which is having to do that. These are local services, but remote services – accessing a database, for instance – operate in the same sort of way. As long as the application can contact and communicate with the database, it doesn’t need to do much itself. There’s still a dependency, and things can definitely go wrong, but it’s a weak coupling to an external application or service.
  2. dynamic linking – this is where an application needs access to a library at run-time, but rather than having added it at build-time (“static linking”), it relies on the underlying operating system to provide a link to the library when it starts executing. This means that the application doesn’t need to be as large (it’s not “carrying” the functionality with it when it’s installed), but it does require that the version that the operating system provides is compatible with what’s expected, and does the right thing.
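On a Linux system you can see dynamic-linking dependencies directly: the `ldd` tool lists the shared libraries that the operating system will need to provide when a binary starts executing. This is just a sketch – `/bin/ls` is an arbitrary example, and the exact output will vary from system to system:

```shell
# List the shared libraries (.so files) that a dynamically-linked binary
# expects the operating system to supply at run-time.
ldd /bin/ls
```

Each line of output names a library the binary depends on, and the path the loader has resolved it to; a missing or incompatible library here is exactly the kind of run-time dependency failure described above.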

Conclusion

I’ve resisted the temptation to go into the impacts of these different types of dependency in terms of their impact on security. That’s a much longer article – or set of articles – but it’s worth considering where in the supply chain we consider these dependencies to live, and who controls them. Do I expect application developers to check every single language dependency, or just imported libraries? To what extent should application developers design in protections from malicious (or just poorly-written) dynamically-linked libraries? Where does the responsibility lie for service dependencies – particularly remote ones?

These are all complex questions: the supply chain is not a simple topic (partly because there is not just one supply chain, but many of them), and organisations need to think hard about how they work.

3 types of application security

There are security applications and there are applications which have security.

I’m indebted to my friend and colleague, Matt Smith, for putting me on the road to this article: he came up with a couple of the underlying principles that led me to what you see below. Thanks, Matt: we’ll have a beer (and maybe some very expensive wine) – one day.

There are security applications and there are applications which have security: specifically, security features or functionality. I’m not saying that there’s not a cross-over, and you’d hope that security applications have security features, but they’re used for different things. I’m going to use open source examples here (though many of them may have commercial implementations), so let’s say: OpenVPN is a security product, whose reason for existence is to provide a security solution, whereas Kubernetes is not a security application, though it does include security features. That gives us two types of application security, but I want to add another distinction. It may sound a little arbitrary, at least from the point of view of the person designing or implementing the application, but I think it’s really important for those who are consuming – that is, buying, deploying or otherwise using – an application. Here are the three:

  1. security application – an application whose main purpose is to solve a security-related problem;
  2. compliance-centric security – security within an application which aims to meet certain defined security requirements;
  3. risk-centric security – security within an application which aims to allow management or mitigation of particular risk.

Types 2 and 3 are subsets of the non-security application (though security applications may well include them!), and may not seem that different, until you look at them from an application user’s point of view. Those differences are what I want to concentrate on in this article.

Compliance-centric

You need a webserver – what do you buy (or deploy)? Well, let’s assume that you work in a regulated industry, or even just that you’ve spent a decent amount of time working on your risk profile, and you decide that you need a webserver which supports TLS 1.3 encryption. This is basically a “tick box” (or “check box” for North American readers): when you consider different products, any which do not meet this requirement are not acceptable options for your needs. They must be compliant – not necessarily to a regulatory regime, but to your specific requirements. There may also be more complex requirements such as FIPS compliance, which can be tested and certified by a third party – this is a good example of a compliance feature which has moved from a regulatory requirement in certain industries and sectors to a well-regarded standard which is accepted in others.
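As a sketch of how such a tick-box requirement might actually be verified, the `openssl s_client` tool can probe a server for TLS 1.3 support – assuming a reasonably recent OpenSSL build, and with `www.example.com` as a placeholder for the server under evaluation:

```shell
# First confirm that the local OpenSSL build itself supports TLS 1.3
# (the -tls1_3 option only exists in OpenSSL 1.1.1 and later):
openssl s_client -help 2>&1 | grep tls1_3

# Then probe a server, forcing TLS 1.3 only; the handshake fails if the
# server does not support it (substitute the real host being assessed):
# openssl s_client -tls1_3 -connect www.example.com:443 </dev/null
```

If the forced-TLS 1.3 handshake fails, the product does not meet the requirement, and the answer is “no”.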

I think of compliance-centric security features as “no” features. If you don’t have them, the response to a sales call is “no”.

Risk-centric

You’re still trying to decide which webserver to buy, and you’ve come down to a few options, all of which meet your compliance requirements: which to choose? Assuming that security is going to be the deciding factor, what security features or functionality do they provide which differentiate them? Well, security is generally about managing risk (I’ve written a lot about this before, see Don’t talk security: talk risk, for example), so you look at features which allow you to manage risks that are relevant to you: this is the list of security-related capabilities which aren’t just compliance. Maybe one product provides HSM integration for cryptographic keys, another One Time Password (OTP) integration, another integrity-protected logging. Each of these allows you to address different types of risk:

  • HSM integration – protect against compromise of private keys
  • OTP integration – protect against compromise of user passwords, re-use of user passwords across sites
  • integrity-protected logging – protect against compromise of logs.

The importance of these features to you will depend on your view of these different risks, and the possible mitigations that you have in place, but they are ways that you can differentiate between the various options. Also, popular risk-centric security features are likely to morph into compliance-centric features as they are commoditised and more products support them: TLS is a good example of this, as is password shadowing in Operating Systems. In a similar way, regulatory regimes (in, for instance, the telecommunications, government, healthcare or banking sectors) help establish minimum risk profiles for applications where customers or consumers of services are rarely in a position to choose their provider based on security capabilities (typically because they’re invisible to them, or the consumers do not have the expertise, or both).

I think of risk-centric security features as “help me” features: if you have them, the response to a sales call is “how will they help me?”.

Why is this important to me?

If you are already a buyer of services, and care about security – you’re a CISO, or part of a procurement department, for instance – you probably consider these differences already. You buy security products to address security problems or meet specific risk profiles (well, assuming they work as advertised…), but you have other applications/products for which you have compliance-related security checks. Other security features are part of how you decide which product to buy.

If you are a developer, architect, product manager or service owner, though, think: what am I providing here? Am I providing a security application, or an application with security features? If the latter, how do I balance where I put my effort: in providing and advertising compliance-centric or risk-centric features? In order to get acceptance in multiple markets, I am going to need to address all of their compliance requirements, but that may not leave me enough resources to provide differentiating features against other products (which may be specific to that industry). On the other hand, if I focus too much on differentiation, I may miss important compliance features, and not get in the door at all. If you want to get to the senior decision makers in a company or organisation and to be seen as a supplier of a key product – one which is not commoditised, but is differentiated from your competitors and really helps organisations manage risk – then you need to think about moving to risk-centric security features. But you really, really need to know what compliance-centric features are expected, as otherwise you’re just not going to get in the door.

The importance of hardware End of Life

Security considerations are important when considering End of Life.

Linus Torvalds’ announcement this week that Itanium support is “orphaned” in the Linux kernel means that we shouldn’t expect further support for it, and that it may well be dropped entirely at some point. In 2019, floppy disk support was dropped from the Linux kernel. In this article, I want to make the case that security considerations are important when considering End of Life for hardware platforms and components.

Dropping support for hardware which customers aren’t using is understandable if you’re a proprietary company and can decide what platforms and components to concentrate on, but why do so in open source software? Open source enthusiasts are likely to be running old hardware for years – sometimes decades – after anybody is still producing it. There’s a vibrant community, in fact, of enthusiasts who enjoy resurrecting old hardware and getting it running (and I mean really old: EDSAC (1947) old), some of whom enjoy getting Linux running on it, and some of whom enjoy running it on Linux – by which I mean emulating the old hardware on Linux systems. It’s a fascinating set of communities, and if it’s your sort of thing, I encourage you to have a look.

But what about dropping open source software support (which tends to centre around Linux kernel support) for hardware which isn’t ancient, but is no longer manufactured and/or has a small or dwindling user base? One reason you might give would be that the size of the kernel for “normal” users (users of more recent hardware) is impacted by support for old hardware. This would be true if you had to compile the kernel with all options in it, but Linux distributions like Fedora, Ubuntu, Debian and RHEL already pare down the number of supported systems to something which they deem sensible, and it’s not that difficult to compile a kernel which cuts that down even further – my main home system is an AMD box (with AMD graphics card) running a kernel which I’ve compiled without most Intel-specific drivers, for instance.

There are other reasons, though, for dropping support for old hardware, and considering that it has met its End of Life. Here are three of the most important.

Resources

My first point isn’t specifically security related, but is an important consideration: while there are many volunteers (and paid folks!) working on the Linux kernel, we (the community) don’t have an unlimited number of skilled engineers. Many older hardware components and architectures are maintained by teams of dedicated people, and the option exists for communities who rely on older hardware to fund resources to ensure that they keep running, are patched against security holes, etc. Once there ceases to be sufficient funding to keep these types of resources available, however, hardware is likely to become “orphaned”, as in the case of Itanium.

There is also a secondary impact, in that however modularised the kernel is, there is likely to be some requirement for resources and time to coordinate testing, patching, documentation and other tasks associated with kernel modules, which needs to be performed by people who aren’t associated with that particular hardware. The community is generally very generous with its time and understanding around such issues, but once the resources and time required to keep such components “current” reaches a certain level in relation to the amount of use being made of the hardware, it may not make sense to continue.

Security risk to named hardware

People expect the software they run to maintain certain levels of security, and the Linux kernel is no exception. Over the past 5-10 years or so, there’s been a surge in work to improve security for all hardware and platforms which Linux supports. A good example of a feature which is applicable across multiple platforms is Address Space Layout Randomisation (ASLR), for instance. The problem here is not only that there may be some such changes which are not applicable to older hardware platforms – meaning that Linux is less secure when running on older hardware – but also that, even when it is possible, the resources required to port the changes, or just to test that they work, may be unavailable. This relates to the point about resources above: even when there’s a core team dedicated to the hardware, they may not include security experts able to port and verify security features.

The problem goes beyond this, however, in that it is not just new security features which are an issue. Over the past week, issues were discovered in the popular sudo tool which ships with most Linux systems, and libgcrypt, a cryptographic library used by some Linux components. The sudo problem was years old, and the libgcrypt so new that few distributions had taken the updated version, and neither of them is directly related to the Linux kernel, but we know that bugs – security bugs – exist in the Linux kernel for many years before being discovered and patched. The ability to create and test these patches across the range of supported hardware depends, yet again, not just on availability of the hardware to test it on, or enthusiastic volunteers with general expertise in the platform, but on security experts willing, able and with the time to do the work.

Security risks to other hardware – and beyond

There is a final – and possibly surprising – point, which is that there may sometimes be occasions when continuing support for old hardware has a negative impact on security for other hardware, even if resources are available to test and implement changes. In order to be able to make improvements to certain features and functionality of the kernel, sometimes there is a need for significant architectural changes. The best-known example (though not necessarily directly security-related) is the Big Kernel Lock, or BKL, an architectural feature of the Linux kernel until 2.6.39 in 2011, which had been introduced to aid concurrency management, but ended up having significant negative impacts on performance.

In some cases, older hardware may be unable to accept such changes, or, even worse, maintaining support for older hardware may impose such constraints on architectural changes – or require such baroque and complex work-arounds – that it is in the best interests of the broader security of the kernel to drop support. Luckily, the Linux kernel’s modular design means that such cases should be few and far between, but they do need to be taken into consideration.

Conclusion

Some of the arguments I’ve made above apply not only to hardware, but to software as well: people often keep wanting to run software well past its expected support life. The difference with software is that it is often possible to emulate the hardware or software environment on which it is expected to run, often via virtual machines (VMs). Maintaining these environments is a challenge in itself, but may actually offer a viable alternative to trying to keep old hardware running.

End of Life is an important consideration for hardware and software, and, much as we may enjoy nursing old hardware along, it doesn’t make sense to delay the inevitable – End of Life – beyond a certain point. When that point is will depend on many things, but security considerations should be included.

Trust in Computing and the Cloud

I wrote a book.

I usually write and post articles first thing in the morning, before starting work, but today is different. For a start, I’m officially on holiday (so definitely not planning to write any code for Enarx, oh no), and second, I decided that today would be the day that I should finish my book, if I could.

Towards the end of 2019, I signed a contract with Wiley to write a book (which, to be honest, I’d already started) on trust. There’s lots of literature out there on human trust, organisational trust and how humans trust each other, but despite a growing interest in concepts such as zero trust, precious little on how computer systems establish and manage trust relationships to each other. I decided it was time to write a book on this, and also on how trust works (or maybe doesn’t) in the Cloud. I gave myself a target of 125,000 words, simply by looking at a couple of books at the same sort of level and then doing some simple arithmetic around words per page and number of pages. I found out later that I’d got it a bit wrong, and this will be quite a long book – but it turns out that the book that needed writing (or that I needed to write, which isn’t quite the same thing) was almost exactly this long, as when I finished around 1430 GMT today, I found that I was at 124,939 words. I have been tracking it, but not writing to the target, given that my editor told me that I had some latitude, but I’m quite amused by how close it was.

Anyway, I emailed my editor on completion, who replied (despite being on holiday), and given that I’m 5 months or so ahead of schedule, he seems happy (I think they prefer early to late).

I don’t have many more words in me today, so I’m going to wrap up here, but do encourage you to read articles on this blog labelled with “trust”, several of which are edited excerpts from the book. I promise I’ll keep you informed as I get information about publication dates, etc.

Keep safe and have a Merry Christmas and Happy New Year, or whatever you celebrate.

Why physical compromise is game over

Systems have physical embodiments and are actually things we can touch.

This is an edited excerpt from my forthcoming book on Trust in Computing and the Cloud for Wiley.

We spend a lot of our time – certainly I do – worrying about how malicious attackers might break into systems that I’m running, architecting, designing, implementing or testing. And most of that time is spent thinking about logical – that is software-based – attacks. But these systems don’t exist solely in the logical realm: they have physical embodiments, and are actually things we can touch. And we need to worry about that more than we typically do – or, more accurately, more of us need to worry about that.

If a system is compromised in software, it’s possible that you may be able to recover it to a clean state, but there is a general principle in IT security that if an attacker has physical access to a system, then that system should be considered compromised. In trust terms (something I care about a lot, given my professional interests), “if an attacker has physical access to a system, then any assurances around expected actions should be considered reduced or void, thereby requiring a re-evaluation of any related trust relationships”. Tampering (see my previous article on this, Not quantum-safe, not tamper-proof, not secure) is the typical concern when considering physical access, but exactly what an attacker will be able to achieve given physical access will depend on a number of factors, not least the skill of the attacker, the resources available to them and the amount of time they have physical access. Scenarios range from an unskilled person attaching a USB drive to a system in a short-duration Evil Maid[1] attack to long-term access by national intelligence services. But it’s not just running (or provisioned, but not currently running) systems that we need to be worried about: we should extend our scope to those which have yet to be provisioned, or even necessarily assembled, and to those which have been decommissioned.

Many discussions in the IT security realm around the supply chain concentrate mainly on software, though there are some very high profile concerns that some governments (and organisations with nationally sensitive functions) have around hardware sourced from countries with whom they do not share entirely friendly diplomatic, military or commercial relations. Even this scope is too narrow: there are many opportunities for other types of attackers to attack systems at various points in their life-cycles. Dumpster diving, where attackers look for old computers and hardware which has been thrown out by organisations but not sufficiently cleansed of data, is an old and well-established technique. At the other end of the scale, an attacker who was able to get a job at a debit or credit card personalisation company and was then able to gain information about the cryptographic keys inserted in the bank card magnetic strips or, better yet, chips, might be able to commit fraud which was both extensive and very difficult to track down. None of these attacks require damage to systems, but they do require physical access to systems or to the manufacturing systems and processes which are part of the systems’ supply chain.

An exhaustive list and description of physical attacks on systems is beyond the scope of this article (readers are recommended to refer to Ross Anderson’s excellent Security Engineering: A Guide to Building Dependable Distributed Systems for more information on this and many other topics relevant to this blog), but some examples across the range of threats may serve to give an idea of the sort of issues that may be of concern.

| Attack | Level of sophistication | Time required | Defences |
|---|---|---|---|
| USB drive to retrieve data | Low | Seconds | Disable USB ports / use software controls |
| USB drive to add malware to operating system | Low | Seconds | Disable USB ports / use software controls |
| USB drive to change boot loader | Medium | Minutes | Change BIOS settings |
| Attacks on Thunderbolt ports[2] | Medium | Minutes | Firmware updates; turn off machine when unattended |
| Probes on buses and RAM | High | Hours | Physical protection of machine |
| Cold boot attack[3] | High | Minutes | Physical protection of machine / TPM integration |
| Chip scraping attacks | High | Days | Physical protection of machine |
| Electron microscope probes | High | Days | Physical protection of machine |

The extent to which systems are vulnerable to these attacks varies enormously, and it is particularly notable that systems which are deployed at the Edge are particularly vulnerable to some of them, compared to systems in an on-premises data centre or run by a Cloud Service Provider in one of theirs. This is typically either because it is difficult to apply sufficient physical protections to such systems, or because attackers may be able to achieve long-term physical access with little likelihood that their attacks will be discovered, or, if they are, with little danger of attribution to the attackers.

Another interesting point about the majority of the attacks noted above is that they do not involve physical damage to the system, and are therefore unlikely to show tampering unless specific measures are in place to betray them. Providing as much physical protection as possible against some of the more sophisticated and long-term attacks, alongside visual checks for tampering, is the best defence for techniques which can lead to major, low-level compromise of the Trusted Computing Base.


1 – An Evil Maid attack assumes that an attacker has fairly brief access to a hotel room where computer equipment such as a laptop is stored: whilst there, they have unfettered access, but are expected to leave the system looking and behaving in the same way it was before they arrived. This places some bounds on the sorts of attacks available to them, but such attacks are notoriously difficult to defend against.

2 – I wrote an article on one of these: Thunderspy – should I care?

3 – A cold boot attack allows an attacker with access to RAM to access data recently held in memory.

Uninterrupted power for the (CI)A

I’d really prefer not to have to restart every time we have a minor power cut.

Regular readers of my blog will know that I love the CIA triad – confidentiality, integrity and availability – as applied to security. Much of the time, I tend to talk about the first two, and spend little time on the third, but today I thought I’d redress the balance a little by talking about UPSes – Uninterruptible Power Supplies. For anyone who was hoping for a trolling piece on giving extra powers to US Government agencies, it’s time to move along to another blog. And anyone who thinks I’d stoop to a deliberately misleading article title just to draw people into reading the blog … well, you’re here now, so you might as well read on (and you’re welcome to visit others of my articles such as Do I trust this package?, A cybersecurity tip from Hazzard County, and, of course, Defending our homes).

Years ago, when I was young and had more time on my hands, I ran an email server for my own interest (and accounts). This was fairly soon after we moved to ADSL, so I had an “always-on” connection to the Internet for the first time. I kept it on a pretty basic box behind a pretty basic firewall, and it served email pretty well. Except for when it went *thud*. And the reason it went *thud* was usually because of power fluctuations. We live in a village in the East Anglian (UK) countryside where the electricity supply, though usually OK, does go through periods where it just stops from time to time. Usually for under a minute, admittedly, but from the point of view of most computer systems, even a second of interruption is enough to turn them off. Usually, I could reboot the machine and, after thinking to itself for a while, it would come back up – but sometimes it wouldn’t. It was around this time, if I remember correctly, that I started getting into journalling file systems to reduce the chance of unrecoverable file system errors. Today, such file systems are commonplace: in those days, they weren’t.

Even when the box did come back up, if I was out of the house, or in the office for the day, on holiday, or travelling for business, I had a problem, which was that the machine was now down, and somebody needed to go and physically turn it on if I wanted to be able to access my email. What I needed was a way to provide uninterruptible power to the system if the electricity went off, and so I bought a UPS: an Uninterruptible Power Supply. A UPS is basically a box that sits between your power socket and your computer, and has a big battery in it which will keep your system going for a while in the event of a (short) power failure, and the appropriate electronics to provide AC power out from the battery. Most will also have some sort of way to communicate with your system such as a USB port, and software which you can install to talk to it, so that your system can decide whether or not to shut itself down – when, for instance, the power has been off for long enough that the battery is about to give out. If you’re running a data centre, you’ll typically have lots of UPS boxes keeping your most important servers up while you wait for your back-up generators to kick in, but for my purposes, knowing that my email server would stay up for long enough that it would ride out short power drops, and be able to shut down gracefully if the power was out for longer, was enough: I have no interest in running my own generator.

That old UPS died probably 15 years ago, and I didn’t replace it, as I’d come to my senses and transferred my email accounts to a commercial provider, but over the weekend I bought a new one. I’m running more systems now, some of them are fairly expensive and really don’t like power fluctuations, and there are some which I’d really prefer not to have to restart every time we have a minor power cut. Here’s what I decided I wanted:

  • a product with good open source software support;
  • something which came in under £150;
  • something with enough “juice” (battery power) to tide 2-3 systems over a sub-minute power cut;
  • something with enough juice to keep one low-powered box running for a little longer than that, to allow it to coordinate shutting down the other boxes first, and then take itself down if required;
  • something with enough ports to keep a couple of network switches up while the previous point happened (I thought ahead!);
  • Lithium Ion rather than Lead battery if possible.

I ended up buying an APC BX950UI, which meets all of my requirements apart from the last one: it turns out that only high-end UPS systems currently seem to have moved to Lithium Ion battery technology. There are two apparently well-maintained open source software suites that support APC UPS systems: apcupsd and nut, both of which are available for my Linux distribution of choice (Fedora). As it happens, they both also have support for Windows and Mac, so you can mix and match if needs be. I chose nut, which doesn’t explicitly list my model of UPS, but which supports most of the lower-priced product line with its usbhid-ups driver – I knew that I could move to apcupsd if this didn’t work out, but nut worked fine.

Set-up wasn’t difficult, but required a little research (and borrowing a particular cable/lead from a kind techie friend…), and I plan to provide details of the steps I took in a separate article or articles to make things easier for people wishing to replicate something close to my set-up. However, I’m currently only using it on one system, so haven’t set up the coordinated shutdown configuration. I’ll wait till I’ve got it more set up before trying that.
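To give a flavour of what the basic set-up looks like, here’s a minimal sketch of a nut configuration of the sort I’m describing – the UPS name, user and password are placeholder assumptions of mine, and you should check the nut documentation for the right values for your own hardware:

```
# /etc/nut/ups.conf – tell the driver how to find the UPS.
# "apc" is an arbitrary local name; usbhid-ups covers many APC models.
[apc]
    driver = usbhid-ups
    port = auto
    desc = "APC BX950UI"

# /etc/nut/upsmon.conf – monitor the UPS so the system can shut down
# gracefully when the battery runs low ("primary" was called "master"
# in older nut releases).
MONITOR apc@localhost 1 upsmon_user example_password primary
```

With this in place, upsmon watches the battery state and can trigger an orderly shutdown, which is exactly the “shut down gracefully if the power is out for longer” behaviour described above.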

What has this got to do with security? Well, I’m planning to allow VPN access to at least one of the boxes, and I don’t want it suddenly to disappear, leaving a “hole in the network”. I may well move to a central authentication mechanism for this and other home systems (if you’re interested, check out projects such as yubico-pam): and I want the box that provides that service to stay up. I’m also planning some home automation projects where access to systems from outside the network (to view cameras, for instance) will be a pain if things just go down: IoT devices may well just come back up in the event of a power failure, but the machines which coordinate them are less likely to do so.

Security, cost and usability (pick 2)

If we cannot be explicit that there is a trade-off, it’s always security that loses.

Everybody wants security: why wouldn’t you? Let’s role-play: you’re a software engineer on a project to create a security product. There comes a time in the product life-cycle when it’s nearly due, and, as usual, time is tight. So you’re in the regular project meeting and the product manager’s there, so you ask them what they want you to do: should you prioritise security? The product manager is very clear[1]: they will tell you that they want the product as secure as possible – and they’re right, because that’s what customers want. I’ve never spoken to a customer (and I’ve spoken to lots of customers over the years) who said that they’d prefer a product which wasn’t as secure as possible. But there’s a problem, which is that all customers also want their products tomorrow – in fact, most customers want their products today, if not yesterday.

Luckily, products can generally be produced more quickly if more resources are applied (though Frederick Brooks’ The Mythical Man Month tells us that simple application of more engineers is actually likely to have a negative impact), so the requirement for speed of delivery can be translated to cost. There’s another thing that customers want, however, and that is for products to be easy to use: who wants to get a new product and then, when it arrives, for it to take months to integrate or for it to be almost impossible for their employees to run it as they expect?

So, to clarify, customers want a security product to be the following:

  1. secure – security is a strong requirement for many enterprises and organisations[3], and although we shouldn’t ever use the word secure on its own, that’s still what customers want;
  2. cheap – nobody wants to pay more than the minimum they can;
  3. usable – everybody likes simple-to-use, easy-to-integrate applications.

There’s a problem, however, which is that out of the three properties above, you can only choose two for any application or project. You say this to your product manager (who’s always right, remember[1]), and they’ll say: “don’t be ridiculous! I want all three”.

But it just doesn’t work like that: why? Here’s my take on the reasons. Security, simply stated, is designed to stop people doing things. Put from the point of view of a user, security exists to reduce usability. “Doing security” is generally about applying controls to actions in a system – whether by users or non-human entities – and the simplest way to apply it is “blanket security” – defaulting to blocking or denying actions. This is sometimes known as failing safe or failing closed.

Let’s take an example: you have a simple internal network in your office and you wish to implement a firewall between your network and the Internet, to stop malicious actors from probing your internal machines and to stop compromised systems on the internal network from communicating out to the Internet. “Easy,” you think, and set up a DENY ALL rule for connections originating outside the firewall, and a DENY ALL rule for connections originating inside the firewall, with the addition of an ALLOW rule for all outgoing port 443 connections to ensure that people can use web browsers to make HTTPS connections. You set up the firewall, and get ready to head home, knowing that your work is done. But then the problems arise:

  • it turns out that some users would like to be able to send email, which requires a different outgoing port number;
  • sending email often goes hand in hand with receiving email, so you need to allow incoming connections to your mail server;
  • one of your printers has been compromised, and is making connections over port 443 to an external botnet;
  • in order to administer the pay system, your accountant – who is not a full-time employee, and works from home, needs to access your network via a VPN, which requires the ability to accept an incoming connection.

Your “easy” just became more difficult – and it’s going to get more difficult still as more users start encountering what they will see as your attempts to make their day-to-day revenue-generating lives more difficult.
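For concreteness, the naive rule set described above might look something like this in nftables syntax – a sketch only, with interface details, logging and so on omitted, and which you would need to tailor to your own network:

```
# Naive "deny all, allow outgoing HTTPS" policy, as described above.
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        # Replies to our own outgoing connections must still get in,
        # or even the permitted HTTPS traffic will not work.
        ct state established,related accept
    }
    chain output {
        type filter hook output priority 0; policy drop;
        ct state established,related accept
        tcp dport 443 accept   # let web browsers make HTTPS connections
    }
}
```

Every one of the complications in the list above – email ports, the mail server, the compromised printer, the accountant’s VPN – turns into another rule (or exception) here, which is how the “easy” policy grows into something that needs real design work.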

This is a very simple scenario, but it’s clear that in order to allow people actually to use a system, you need to spend a lot more time understanding how security will interact with it, and how people’s experience of the measures you put in place will be impacted. Usability and user experience (“UX”) is a complex field on its own, but when you combine it with the extra requirements around security, things become even more tricky.

You need both to manage the requirements of users to whom the security measures should be transparent (“TLS encryption should be on by default”) and those who may need much more control (“developers need to be able to select the TLS cipher suite options when connecting to a vendor’s database”), so you need to understand the different personae[4] you are targeting for your application. You also need to understand the different failure modes, and what the correct behaviour should be: if authentication fails three times in a row, should the medical professional who is trying to get a rush blood test result be locked out of the system, or should the result be provided, and a message sent to an administrator, for example? There will be more decisions to make, based on what your application does, the security policies of your customers, their risk profiles, and more. All of these investigations and decisions take time, and time equates to money. What is more, they also require expertise – both in terms of security but also usability – and that is in itself expensive.
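To make the blood-test trade-off concrete, here’s a toy sketch – my own illustration, not a recommendation or anyone’s real product behaviour; the roles, actions and threshold are all assumptions – of a “break-glass” policy in which ordinary users fail closed, but an emergency role gets the result while an alert is raised:

```python
# Toy policy sketch: what happens after repeated authentication
# failures? The threshold and role names are illustrative assumptions.

MAX_FAILURES = 3

def auth_failure_policy(failures: int, emergency_role: bool) -> str:
    """Return the action to take after `failures` consecutive failures."""
    if failures < MAX_FAILURES:
        return "retry"            # normal case: let the user try again
    if emergency_role:
        return "allow-and-alert"  # release the result, notify an admin
    return "lock-account"         # fail closed for everyone else
```

Even this toy version shows where the cost comes from: somebody has to decide who qualifies for the emergency role, who receives the alert, and how the decision is audited – all usability and security work that takes time and expertise.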

So, you have three options:

  1. choose usability and cost – you can prioritise usability and low cost, but you won’t be able to apply security as you might like;
  2. choose security and cost – in this case, you can apply more security to the system, but you need to be aware that usability – and therefore your customer’s acceptance of the system – will suffer;
  3. choose usability and security – I wish this was the one that we chose every time: you decide that you’re willing to wait longer or pay more for a more secure product, which people can use.

I’m not going to pretend that these are easy decisions, nor that they are always clear cut. And a product manager’s job is sometimes to make difficult choices – hopefully ones which can be re-balanced in a later release, but difficult choices nevertheless. It’s really important, however, that anyone involved in security – as an engineer, as a UX expert, as a product manager, as a customer – understands the trade-off here. If we cannot be explicit that there is a trade-off, then the trade-off will be made silently, and in my experience, it’s always security that loses.


1 – and right: product managers are always right[2].

2 – I know: I used to be a product manager.

3 – and the main subject of this blog, so it shouldn’t be a surprise that I’m writing about it.

4 – or personas if you really, really must. I got an “A” in Latin O level, and I’m not letting this one go.

Do I trust this package?

The area of software supply chain management is growing in importance.

This isn’t one of those police dramas where a suspect parcel arrives at the precinct and someone realises just in time that it may be a bomb – what we’re talking about here is open source software packages (though the impact on your application may be similar if you’re not sufficiently suspicious). Open source software is everywhere these days – which is great – but how can you be sure that you should trust the software you’ve downloaded to do what you want? The area of software supply chain management – of which this discussion forms a part – is fairly newly visible in the industry, but is growing in importance. We’re going to consider a particular example.

There’s a huge conversation to be had here about what trust means (see my article “What is trust?” as a starting point, and I have a forthcoming book on Trust in Computing and the Cloud for Wiley), but let’s assume that you have a need for a library which provides some cryptographic protocol implementation. What do you need to know, and what are your choices? We’ll assume, for now, that you’ve already made what is almost certainly the right choice, and gone with an open source implementation (see many of my articles on this blog for why open source is just best for security), and that you don’t want to be building everything from source all the time: you need something stable and maintained. What should be your source of a new package?

Option 1 – use a vendor

There are many vendors out there now who provide open source software through a variety of mechanisms – typically subscription. Red Hat, my employer (see standard disclosure) is one of them. In this case, the vendor will typically stand behind the fitness for use of a particular package, provide patches, etc. This is your easiest and best choice in many cases. There may be times, however, when you want to use a package which is not provided by a vendor, or not packaged by your vendor of choice: what do you do then? Equally, what decisions do vendors need to make about how to trust a package?

Option 2 – delve deeper

This is where things get complex. So complex, in fact, that I’m going to be examining them at some length in my book. For the purposes of this article, though, I’ll try to be brief. We’ll start with the assumption that there is a single maintainer of the package, and multiple contributors. The contributors provide code (and tests and documentation, etc.) to the project, and the maintainer provides builds – binaries/libraries – for you to consume, rather than your taking the source code and compiling it yourself (which is actually what a vendor is likely to do, though they still need to consider most of the points below). This is a library to provide cryptographic capabilities, so it’s fairly safe to assume that we care about its security. There are at least five specific areas which we need to consider in detail, all of them relying on the maintainer to a large degree (I’ve used the example of security here, though very similar considerations exist for almost any package): let’s look at the issues.

  1. build – how is the package that you are consuming created? Is the build process performed on a “clean” (that is, non-compromised) machine, with the appropriate compilers and libraries (there’s a turtles problem here!)? If the binary is created with untrusted tools, then how can we trust it at all? What measures does the maintainer take to ensure the “cleanness” of the build environment? It would be great if the build process were documented as a repeatable build, so that those who want to check it can do so.
  2. integrity – this is related to build, in that we want to be sure that the source code inputs to the build process – the code coming, for instance, from a git repository – are what we expect. If, somehow, compromised code is injected into the build process, then we are in a very bad position. We want to know exactly which version of the source code is being used as the basis for the package we are consuming so that we can track features – and bugs. As above, having a repeatable build is a great bonus here.
  3. responsiveness – this is a measure of how responsive – or not – the maintainer is to changes. Generally, we want stable features, tied to known versions, but a quick response to bug and (in particular) security patches. If the maintainer doesn’t accept patches in a timely manner, then we need to worry about the security of our package. We should also be asking questions like, “is there a well-defined security disclosure or vulnerability management process?” (See my article Security disclosure or vulnerability management?), and, if so, “is it followed”?
  4. provenance – all code is not created equal, and one of the things of which a maintainer should be keeping track is the provenance of contributors. If a large amount of code in a part of the package which provides particularly sensitive features is suddenly submitted by an unknown contributor with a pseudonymous email address and no history of contributions of security functionality, this should raise alarm bells. On the other hand, if there is a group of contributors employed by a company with a history of open source contributions and well-reviewed code who submit a large patch, this is probably less troublesome. This is a difficult issue to manage, and there are typically no definite “OK” or “no-go” signs, but the maintainer’s awareness and management of contributors and their contributions is an important point to consider.
  5. expertise – this is the most tricky. You may have a maintainer who is excellent at managing all of the points above, but is just not an expert in certain aspects of the functionality of the contributed code. As a consumer of the package, however, I need to be sure that it is fit for purpose, and that may include, in the case of the security-related package we’re considering, being assured that the correct cryptographic primitives are used, that bounds-checking is enforced on byte streams, that proper key lengths are used or that constant time implementations are provided for particular primitives. This is very, very hard, and the job of maintainer can easily become a full-time one if they are acting as the expert for a large and/or complex project. Indeed, best practice in such cases is to have a team of trusted, experienced experts who work either as co-maintainers or as a senior advisory group for the project. Alternatively, having external people or organisations (such as industry bodies) perform audits of the project at critical junctures – when a major release is due, or when an important vulnerability is patched, for instance – allows the maintainer to share this responsibility. It’s important to note that the project does not become magically “secure” just because it’s open source (see Disbelieving the many eyes hypothesis), but that the community, when it comes together, can significantly improve the assurance that consumers of a project can have in the packages which it produces.
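One small, concrete piece of the integrity point above – checking that the package you downloaded is the one the maintainer published – can be sketched as a digest comparison. This is only an illustration of the idea, with the function name and chunk size my own; a real project would publish (and ideally sign) its own digests:

```python
import hashlib

def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file at `path` matches the published SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large packages need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Note that this only detects corruption or tampering after publication: it does nothing about a compromised build environment or malicious contributions, which is why the build, provenance and expertise points above still matter.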

Once we consider these areas, we then need to work out how we measure and track each of them. Who is in a position to judge the extent to which any particular maintainer is fulfilling each of the areas? How much can we trust them? These are complex issues, about which much more needs to be written, but I am passionate about exposing the importance of explicit trust in computing, particularly in open source. There is work going on around open source supply chain management, for instance the new (at the time of writing) Project Rekor – but there is lots of work still to be done.

Remember, though: when you take a package – whether library or executable – please consider what you’re consuming, what about it you can trust, and on what assurances that trust is founded.