Bug #16720

Update kernel to mitigate new MDS attacks

Added by cypherpunks 2019-05-15 03:48:17. Updated 2019-11-01 22:45:06.

Status: Resolved
Priority: High
Assignee:
Category:
Target version:
Start date:
Due date:
% Done: 100%
Feature Branch: bugfix/16720-linux-4.19.37-nosmt+force-all-tests
Type of work: Code
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

A very severe collection of Spectre-class hardware security vulnerabilities that allow reading arbitrary memory has been discovered. Existing Spectre defenses do not mitigate them. The only mitigation is to install new microcode updates (which add new behavior to a CPU instruction) and kernel updates (which call that instruction at each context switch). Unfortunately, it is also necessary to disable SMT (Hyper-Threading). On updated kernels, this can be done with mds=full,nosmt on the kernel command line. Until this is done, arbitrary memory reads are possible in Tails, potentially even from the Browser.
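
As a rough sketch of how one could check whether a given kernel command line requests this mitigation, the Python snippet below parses a command line string (such as the contents of /proc/cmdline) and extracts the mds= values. The helper names mds_options and has_full_mds_mitigation are hypothetical and only used for illustration, and the "last occurrence wins" behaviour is an assumption, not something stated in this ticket.

```python
from typing import List


def mds_options(cmdline: str) -> List[str]:
    """Return the comma-separated values of the mds= parameter, or []."""
    options: List[str] = []
    for token in cmdline.split():
        if token.startswith("mds="):
            # Assumption: if mds= appears more than once, the last one wins.
            options = token[len("mds="):].split(",")
    return options


def has_full_mds_mitigation(cmdline: str) -> bool:
    """True if the command line asks for both 'full' and 'nosmt'."""
    opts = mds_options(cmdline)
    return "full" in opts and "nosmt" in opts


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a running Linux system, has_full_mds_mitigation(read_cmdline()) would report what was requested at boot; it does not prove that the mitigation is actually active.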

A proof-of-concept was also shown specifically for Tails.

See https://cpu.fail/ and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more information.


Related issues

Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed
Blocked by Tails - Bug #16708: Upgrade Linux to 4.19.37 Resolved

History

#1 Updated by cypherpunks 2019-05-15 07:01:09

Also note that, while disabling SMT can reduce performance (by up to 20% in worst-case scenarios), it really is necessary to prevent trivial cross-process memory reads. Luckily, this only reduces the maximum performance when all CPU cores are maxed out, not the average performance, so unless a Tails user is running a heavy compute job that maxes out all cores, the impact won’t be major. Google’s Chromebooks went with this option and disabled SMT.

#2 Updated by mercedes508 2019-05-15 12:29:49

  • Assignee set to anonym
  • Priority changed from High to Normal

#3 Updated by mercedes508 2019-05-15 14:46:23

#4 Updated by mercedes508 2019-05-15 14:46:51

  • Assignee deleted (anonym)

#5 Updated by anonym 2019-05-15 15:13:04

  • Status changed from New to Confirmed
  • Priority changed from Normal to High
  • Target version set to Tails_3.14

cypherpunks wrote:
> Until [mitigated], arbitrary memory reads are possible in Tails, potentially even from the Browser.

Worst case this should be fixed in Tails 3.14 in six days, but perhaps this could warrant an emergency release if we have RM availability? FWIW I am available until tomorrow ~14:00 UTC but after that I’m away until Monday.

> A proof-of-concept was also shown specifically for Tails.

Found it: https://www.youtube.com/watch?v=wQvgyChrk_g

#6 Updated by CyrilBrulebois 2019-05-15 18:10:35

The idea of releasing a new version in a hurry, while it features a brand new kernel (just uploaded to Debian unstable hours ago) makes me cringe a little.

#7 Updated by cypherpunks 2019-05-15 23:26:02

CyrilBrulebois wrote:
> The idea of releasing a new version in a hurry, while it features a brand new kernel (just uploaded to Debian unstable hours ago) makes me cringe a little.

The idea of remaining vulnerable to a significant arbitrary memory read vulnerability that is potentially also exploitable from JavaScript (which is enabled by default) is not cringy to you?

#8 Updated by anonym 2019-05-16 07:39:37

  • Status changed from Confirmed to In Progress

Applied in changeset commit:tails|581176c472e48ef1a6a2405fd417cbaf13a6df56.

#9 Updated by anonym 2019-05-16 08:54:16

CyrilBrulebois wrote:
> The idea of releasing a new version in a hurry, while it features a brand new kernel (just uploaded to Debian unstable hours ago) makes me cringe a little.

Let’s at least see how a full test suite run looks (which of course says little about hardware support regressions): https://jenkins.tails.boum.org/job/test_Tails_ISO_bugfix-16708-linux-4.19.37-force-all-tests/9/

#10 Updated by anonym 2019-05-16 09:15:36

  • % Done changed from 0 to 10
  • Feature Branch set to bugfix/16708-linux-4.19.37-nosmt+force-all-tests

Pushed a branch based on Bug #16708 that also enables full mitigations (so hyperthreading is disabled).

#11 Updated by anonym 2019-05-16 09:17:04

  • Feature Branch changed from bugfix/16708-linux-4.19.37-nosmt+force-all-tests to bugfix/16720-linux-4.19.37-nosmt+force-all-tests

#12 Updated by anonym 2019-05-16 10:14:32

cypherpunks wrote:
> potentially also exploitable from JavaScript

This is the scary part, which would warrant an emergency release. Sadly I doubt we have the resources (human time) to do it, so a fix in Tails 3.14 on Tuesday is probably the best we can do. :/

How “potential” are the attacks over JavaScript? I’ve repeatedly seen statements like “no action is recommended for Firefox users on Windows and Linux” (source); I’ve seen some wild speculation that the Spectre mitigations from early 2018 might still be somewhat effective against the new attacks. But a more mundane explanation could be that those platforms now have OS-level fixes, so no action is needed for Firefox. However, for Chrome it is said: “If you use Google’s Chrome web browser, than Google suggests you make sure the operating system it runs on (be it Windows, Linux or macOS) is updated with the latest mitigations”. So I’m a bit confused about how affected Firefox is.

#13 Updated by cypherpunks 2019-05-17 20:21:49

anonym wrote:
> cypherpunks wrote:
> > potentially also exploitable from JavaScript
>
> This is the scary part, which would warrant an emergency release. Sadly I doubt we have the resources (human time) to do it, so a fix in Tails 3.14 on Tuesday is probably the best we can do. :/
>
> How “potential” are the attacks over JavaScript? I’ve repeatedly seen statements like “no action is recommended for Firefox users on Windows and Linux” (source); I’ve seen some wild speculation that the Spectre mitigations from early 2018 might still be somewhat effective against the new attacks. But a more mundane explanation could be that those platforms now have OS-level fixes, so no action is needed for Firefox. However, for Chrome it is said: “If you use Google’s Chrome web browser, than Google suggests you make sure the operating system it runs on (be it Windows, Linux or macOS) is updated with the latest mitigations”. So I’m a bit confused about how affected Firefox is.

The attacks over JavaScript are thought to be difficult to do, but possible. I imagine they are a lot more difficult if JIT is disabled. Note that ZombieLoad is not the only one of the MDS attacks that may be exploitable through JavaScript.

Operating system mitigations are sufficient to prevent system memory from being read via JavaScript. It’s still possible for other Spectre-type attacks to read browser memory in the same address space as the attacker, and it’s going to be a very long time before we are able to prevent that completely, but OS protections are enough to protect system memory from either a compromised browser process or malicious JavaScript.

I wouldn’t bet my life on it, but it seems likely in my experience that it would be easier to find a browser 0day in the media player, JS engine, etc. than it would be to adapt the MDS attacks to work in pure JavaScript. I personally think that, while an emergency release is probably not a bad idea, the risks of MDS via JavaScript are low enough that that should not be the deciding factor.

#14 Updated by cypherpunks 2019-05-17 20:45:57

anonym wrote:
> (which of course says little about hardware support regressions)

The mitigations (mds=full,nosmt) are designed to have no impact on hardware compatibility. Furthermore, on the newest CPUs with hardware mitigations (bit 5, MDS_NO, set in the IA32_ARCH_CAPABILITIES MSR) and on other unaffected hardware, the software mitigations are disabled automatically, so there is no need to worry about turning this off in the future.
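
To illustrate the MDS_NO check, here is a minimal Python sketch that tests bit 5 of an IA32_ARCH_CAPABILITIES value. The function names are hypothetical. Reading the MSR from userspace is shown through the Linux msr driver (/dev/cpu/N/msr), which assumes the msr module is loaded and the caller is root; this is not how the kernel itself performs the check.

```python
import os
import struct

IA32_ARCH_CAPABILITIES = 0x10A  # MSR address
MDS_NO_BIT = 5                  # bit position of MDS_NO


def has_mds_no(arch_capabilities: int) -> bool:
    """Return True if the MDS_NO bit is set in the given MSR value."""
    return bool((arch_capabilities >> MDS_NO_BIT) & 1)


def read_arch_capabilities(cpu: int = 0) -> int:
    """Read IA32_ARCH_CAPABILITIES via the Linux msr driver.

    Assumption: the msr kernel module is loaded and we run as root.
    On CPUs that do not implement this MSR the read fails with OSError.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver returns 8 bytes at the offset equal to the MSR number.
        raw = os.pread(fd, 8, IA32_ARCH_CAPABILITIES)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]
```

Calling has_mds_no(read_arch_capabilities()) as root would then tell whether the CPU itself claims to be unaffected.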

Please make sure the microcode (intel-ucode) is also updated! The software fix is nearly useless if the microcode is not also updated. A lot of old Intel CPUs are getting the fix, but not every single one. This means that some older systems will be perpetually vulnerable to high-bandwidth arbitrary memory reads. It would also be nice if the bootloader or OS would warn the user if that is the case (which should be easy enough by checking the microcode revision).

The microcode update is vital because it adds a side-effect to an otherwise-obsolete instruction, VERW, causing it to flush the affected buffers. The software update simply causes the kernel to execute that instruction at every context switch (see https://www.kernel.org/doc/html/latest/x86/mds.html#mitigation-strategy). If the microcode is not updated, then the instruction does nothing. It’s only if the microcode is up to date that the instruction flushes the buffers.
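
A warning of the kind suggested above could start from the microcode revision that Linux reports in /proc/cpuinfo on x86. The Python sketch below only extracts and compares revisions; the helper names are hypothetical, and min_fixed_revision is a placeholder, because the revision that first carries the MDS fix differs per CPU model and is not given in this ticket.

```python
from typing import Set


def microcode_revisions(cpuinfo_text: str) -> Set[int]:
    """Collect the distinct 'microcode' revisions listed in /proc/cpuinfo text."""
    revisions: Set[int] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # The field is printed in hexadecimal, e.g. "0xb4".
            revisions.add(int(value.strip(), 16))
    return revisions


def microcode_is_outdated(cpuinfo_text: str, min_fixed_revision: int) -> bool:
    """True if any CPU reports a revision below the (placeholder) threshold."""
    return any(rev < min_fixed_revision for rev in microcode_revisions(cpuinfo_text))
```

A real check would need a per-model table of fixed revisions, which has to come from the CPU vendor's microcode guidance rather than from this sketch.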

Another thing to know is that MFBDS (CVE-2018-12130) can be used to leak physical page mappings from the MMU, and that cannot be fixed using microcode and software updates. The result is that the mapping to physical pages can be leaked, giving the same result as reading the now-privileged /proc/<pid>/pagemap. This makes rowhammer significantly easier to exploit. There’s nothing that can be done about this short of using a CPU with MDS_NO.

#15 Updated by cypherpunks 2019-05-17 20:47:35

anonym wrote:
> So I’m a bit confused about how affected Firefox is.

There is no difference at all between the two browsers in terms of exploitability.

#16 Updated by intrigeri 2019-05-18 08:17:32

  • Assignee set to intrigeri

I’ll take this ticket (and Bug #16708 which is implied) for now and will coordinate with segfault once he shows up.

Linux 4.19.37-3 and the intel-microcode update were both fast-tracked into Debian testing; DSAs were released for Stretch already (linux, intel-microcode), which tells quite a bit about the urgency of the matter and the risk/benefit assessment made by the relevant package maintainers, who I fully trust on this one.

We have some CI test results for 4.19.37 already, thanks to the original Bug #16708 branch.

I’ve read the referenced documents (BTW I’m impressed by the quality of the corresponding kernel doc) and based on this plus what I wrote above, I agree we should update the kernel, intel-microcode, and set mds=full,nosmt.

#17 Updated by intrigeri 2019-05-18 08:36:02

FWIW, disabling smt has no obvious performance impact on the test suite runs on Jenkins, but I don’t know if nosmt is honored when the host hasn’t got the microcode update.

#18 Updated by intrigeri 2019-05-18 08:36:54

Automated test suite results on Jenkins are not worse than expected so from this PoV it’s a go.

#19 Updated by intrigeri 2019-05-18 08:39:40

anonym wrote:
> cypherpunks wrote:
> > A proof-of-concept was also shown specifically for Tails.
>
> Found it: https://www.youtube.com/watch?v=wQvgyChrk_g

My understanding is that this PoC is not about processes spying on each other inside Tails: it’s about the QEMU process that Tails is running in being spied on by another process that runs on the host.

#20 Updated by intrigeri 2019-05-18 08:43:31

Meh, I find no branch that actually has the full mitigation (with nosmt) enabled: both bugfix/16720-linux-4.19.37-nosmt+force-all-tests and bugfix/16708-linux-4.19.37+force-all-tests currently point to commit:581176c472e48ef1a6a2405fd417cbaf13a6df56, which has no such thing. I see no bugfix/16708-linux-4.19.37-nosmt+force-all-tests branch. So I guess that @anonym got confused and did not push the code he wanted to.

#21 Updated by intrigeri 2019-05-18 09:11:15

  • blocked by Bug #16708: Upgrade Linux to 4.19.37 added

#22 Updated by segfault 2019-05-18 13:29:09

Reviewed up to fbc4e94f9e7a993c3447d07dd3eee501b144a937, LGTM

#23 Updated by intrigeri 2019-05-18 16:40:27

  • Status changed from In Progress to Fix committed
  • % Done changed from 10 to 100

Applied in changeset commit:tails|4c54166bd5a468c2e9e521aad61ade635322c9f1.

#24 Updated by intrigeri 2019-05-18 16:41:54

  • Assignee deleted (intrigeri)
  • QA Check set to Pass

I’ve seen all tests pass locally, except the OpenPGP applet and Electrum ones, as expected.

#25 Updated by cypherpunks 2019-05-19 06:51:29

intrigeri wrote:
> anonym wrote:
> > cypherpunks wrote:
> > > A proof-of-concept was also shown specifically for Tails.
> >
> > Found it: https://www.youtube.com/watch?v=wQvgyChrk_g
>
> My understanding is that this PoC is not about processes spying on each other inside Tails: it’s about the QEMU process that Tails is running in being spied on by another process that runs on the host.

The PoC uses QEMU to demonstrate both cross-process and VM spying at the same time. It would also be possible for one process to spy on another within a single operating system. It’s not like a VM has to be present for the attacks to work. The attack allows any process to perform arbitrary memory reads from:

1. Guest VMs

2. All processes

3. Kernel memory

4. SGX enclaves

5. MMU pagemap

And more.

#26 Updated by cypherpunks 2019-05-19 07:01:40

intrigeri wrote:
> FWIW, disabling smt has no obvious performance impact on the test suite runs on Jenkins, but I don’t know if nosmt is honored when the host hasn’t got the microcode update.

I’m guessing your VM is not using SMT. It’s very rare for a hypervisor to expose the logical vs physical topology to the guest because most schedulers don’t take that information into account. If your host hardware has 4 physical cores with SMT (so 8 logical cores), then a hypervisor will likely tell the guest that it is running on hardware with 8 physical cores. Hell, your hypervisor might not even pin the threads so it would be entirely up to the host thread scheduler to deal with that.

The nosmt is honored if the kernel considers the hardware to be vulnerable and is disabled automatically if it is not found to be vulnerable. I haven’t looked into exactly how it checks if it is vulnerable or not, but you should be able to verify whether or not SMT is disabled by checking some file in I think /proc? First check if your VM even uses SMT…
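
One way to check whether the guest sees SMT at all is to look at the sibling lists Linux exposes under /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The Python sketch below parses that CPU-list format (for example "0,4" or "0-1"); the function names are hypothetical. Treat the result as a hint only: a single-CPU list may mean either that SMT is not exposed or that the sibling is currently offline.

```python
import glob
from typing import Iterable, List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a sysfs CPU list such as '0,4' or '0-1' into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def sees_smt(thread_siblings_lists: Iterable[str]) -> bool:
    """True if any CPU reports more than one hardware thread on its core."""
    return any(len(parse_cpu_list(t)) > 1 for t in thread_siblings_lists)


def read_sibling_lists() -> List[str]:
    """Read the sibling list of every CPU that exposes topology data (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists
```

Inside the VM, sees_smt(read_sibling_lists()) returning False would be consistent with the hypervisor presenting every vCPU as a separate core.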

#27 Updated by segfault 2019-05-19 13:50:51

cypherpunks wrote:
> intrigeri wrote:
> > FWIW, disabling smt has no obvious performance impact on the test suite runs on Jenkins, but I don’t know if nosmt is honored when the host hasn’t got the microcode update.
>
> I’m guessing your VM is not using SMT. It’s very rare for a hypervisor to expose the logical vs physical topology to the guest because most schedulers don’t take that information into account. If your host hardware has 4 physical cores with SMT (so 8 logical cores), then a hypervisor will likely tell the guest that it is running on hardware with 8 physical cores. Hell, your hypervisor might not even pin the threads so it would be entirely up to the host thread scheduler to deal with that.
>
> The nosmt is honored if the kernel considers the hardware to be vulnerable and is disabled automatically if it is not found to be vulnerable. I haven’t looked into exactly how it checks if it is vulnerable or not, but you should be able to verify whether or not SMT is disabled by checking some file in I think /proc? First check if your VM even uses SMT…

The SMT state can be checked via

cat /sys/devices/system/cpu/vulnerabilities/mds

The output includes one of the following SMT states:

SMT vulnerable
SMT mitigated
SMT disabled
SMT Host state unknown

At least on my system, my VMs have SMT Host state unknown. I don’t know if there are also cases where the VM knows about the SMT host state.
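
For scripted checks, the line printed by that file can be split into its mitigation part and its SMT part. The Python sketch below assumes the "<mitigation>; <SMT state>" layout described in the kernel documentation linked in the description (for example "Mitigation: Clear CPU buffers; SMT disabled"); the function names are hypothetical.

```python
from typing import Optional, Tuple

MDS_STATUS_FILE = "/sys/devices/system/cpu/vulnerabilities/mds"


def parse_mds_status(line: str) -> Tuple[str, Optional[str]]:
    """Split an mds status line into (mitigation, smt_state or None)."""
    mitigation, _, smt = line.strip().partition("; ")
    return mitigation, (smt or None)


def lacks_microcode(line: str) -> bool:
    """True if the kernel reports that buffer clearing has no microcode support."""
    mitigation, _ = parse_mds_status(line)
    return "no microcode" in mitigation


def read_mds_status(path: str = MDS_STATUS_FILE) -> Tuple[str, Optional[str]]:
    """Read and parse the status file on a running Linux system."""
    with open(path) as f:
        return parse_mds_status(f.read())
```

A status such as "Vulnerable: Clear CPU buffers attempted, no microcode" would make lacks_microcode return True, which matches the earlier point that the kernel fix alone is not enough.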

#28 Updated by CyrilBrulebois 2019-05-23 21:21:28

  • Status changed from Fix committed to Resolved

#29 Updated by intrigeri 2019-11-01 11:03:13

We’ve got a report by a user with a Skylake CPU who says that with Tails 3.15, this breaks the boot. Replacing mds=full,nosmt with mds=full fixed the problem for them.

I’ll ask our help desk if they have heard more such reports, and then we should decide whether we document the problem and workaround (exact observable behavior is unclear at the moment), or reconsider the nosmt part.

#30 Updated by cypherpunks 2019-11-01 22:45:06

intrigeri wrote:
> We’ve got a report by a user with a Skylake CPU who says that with Tails 3.15, this breaks the boot. Replacing mds=full,nosmt with mds=full fixed the problem for them.
>
> I’ll ask our help desk if they have heard more such reports, and then we should decide whether we document the problem and workaround (exact observable behavior is unclear at the moment), or reconsider the nosmt part.

That sounds like a kernel bug that should be fixed. Disabling SMT should never, ever break boot. Of course, it would be possible to remove the nosmt part and then disable SMT after boot by toggling the correct /sys files (basically disabling the cores that are virtual cores, which won’t break anything).
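
A possible shape for the post-boot approach, sketched in Python: recent kernels expose a global switch at /sys/devices/system/cpu/smt/control, and writing "off" to it takes the sibling threads offline. The function names are hypothetical, writing requires root, and this assumes the running kernel provides that file. Note that SMT stays enabled between boot and the moment this runs.

```python
from pathlib import Path
from typing import Optional

SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")


def next_smt_control_value(current: str) -> Optional[str]:
    """Decide what to write to the control file, or None if nothing to do.

    Only the 'on' state is changed; states such as 'off', 'forceoff' or
    'notsupported' are left alone.
    """
    return "off" if current.strip() == "on" else None


def disable_smt(control: Path = SMT_CONTROL) -> str:
    """Disable SMT at runtime if it is currently on; return the resulting state."""
    current = control.read_text().strip()
    new_value = next_smt_control_value(current)
    if new_value is None:
        return current
    control.write_text(new_value)  # requires root
    return control.read_text().strip()
```

Whether this avoids the boot failure reported above is untested here; it only shows the mechanism.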