Bug #17154: Improve entropy gathering

Added by segfault 2019-10-15 05:48:07. Updated 2020-01-28 11:32:10.

Status: Confirmed

Priority: Normal

Type of work: Code

Description

The Linux kernel uses multiple sources of randomness to initialize its cryptographically secure pseudo-random number generator (CSPRNG). This includes various sources with dubious quality wrt. randomness: the kernel command-line, serial numbers, MAC addresses, timing information…

This is totally fine, because most of these sources are not credited as good/reliable entropy: their values are mixed into the entropy pool, but they do not increase the entropy counter. (By default, the kernel currently only credits inter-interrupt timings and inter-keyboard timings.)

When the entropy counter reaches a certain threshold (currently 512 bits, although reducing that to 256 bits is being discussed on the kernel mailing list), the entropy pool is marked as initialized.
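The difference between mixing and crediting can be sketched with a toy model. This is only an illustration under stated assumptions: the class `ToyEntropyPool`, its hash-based state, and the one-bit-per-sample credit are hypothetical and are not the kernel's actual algorithm. Only the 512-bit threshold is taken from the description above.

```python
import hashlib

INIT_THRESHOLD_BITS = 512  # threshold quoted in the description above


class ToyEntropyPool:
    """Toy model only: NOT the kernel's actual mixing algorithm."""

    def __init__(self) -> None:
        self._state = hashlib.sha256()
        self.credited_bits = 0

    def mix(self, data: bytes, credit_bits: int = 0) -> None:
        # Every input changes the pool state...
        self._state.update(data)
        # ...but only sources considered trustworthy pass credit_bits > 0.
        self.credited_bits += credit_bits

    @property
    def initialized(self) -> bool:
        return self.credited_bits >= INIT_THRESHOLD_BITS


pool = ToyEntropyPool()
pool.mix(b"00:11:22:33:44:55")          # MAC address: mixed, not credited
pool.mix(b"BOOT_IMAGE=/vmlinuz quiet")  # command line: mixed, not credited
print(pool.initialized)                 # False

for _ in range(512):
    # Hypothetical credit of 1 bit per timing sample, for illustration only.
    pool.mix(b"interrupt timing sample", credit_bits=1)
print(pool.initialized)                 # True
```

In this model, uncredited inputs can never mark the pool as initialized on their own, which is the property the description relies on.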

Until the entropy pool is marked as initialized, reads from /dev/random and calls to the getrandom syscall block, and reads from /dev/urandom return random numbers that are not cryptographically secure.

If the entropy pool is seeded with predictable inputs, all of /dev/random, /dev/urandom, and getrandom return random numbers that are not cryptographically secure.
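For reference, the kernel's current entropy estimate can be read from procfs. The following is a minimal sketch, assuming a Linux system that exposes `/proc/sys/kernel/random/entropy_avail`; the helper names are hypothetical. Note that this value is the running estimate, not the "initialized" flag itself.

```python
from pathlib import Path
from typing import Optional

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def parse_entropy_estimate(text: str) -> Optional[int]:
    """Parse the content of entropy_avail; return None if it is not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_entropy_estimate(path: str = ENTROPY_AVAIL) -> Optional[int]:
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        return parse_entropy_estimate(Path(path).read_text())
    except OSError:
        return None


print(read_entropy_estimate())
```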

Both Debian and Tails currently add additional sources which do increase the entropy counter. I would like to re-evaluate the use of those sources.

Related issues

Related to Tails - Feature #7102: Evaluate how safe haveged is in a virtualized environment (Confirmed, 2014-04-17)
Related to Tails - ~~Feature #5650~~: rngd (Resolved)
Related to Tails - ~~Bug #17124~~: Install Linux 5.3 from sid (Resolved)
Related to Tails - ~~Feature #17443~~: Upgrade Linux to 5.4.8+ (Resolved)

History

#1 Updated by segfault 2019-10-15 06:20:11

Description updated

#2 Updated by segfault 2019-10-15 07:32:01

In Tails we currently add two services which fill the entropy pool: haveged and rngd.

haveged implements the HAVEGE algorithm to gather randomness from CPU timings. It runs as a userspace service, fills the entropy pool as soon as it is started, and keeps refilling it whenever reads from /dev/random cause the kernel’s entropy count to fall low¹.

¹ It doesn’t really make sense that reading /dev/random reduces the entropy count. Once the CSPRNG is initialized with a good random seed, it can produce a lot of cryptographically secure random numbers. That is why kernel devs now deeply regret this behavior of /dev/random:
https://lore.kernel.org/linux-ext4/20190916170028.GA15263@mit.edu/

There are multiple issues with haveged:

* It tries to use timing information from CPU instructions while running in userspace, which subjects it to the kernel’s scheduler; this could affect the randomness of the timings [1].
* The CPU instruction it uses (RDTSC) returns predictable results in some virtualized environments [2].
* No one seems to know whether haveged actually provides any good randomness. AFAIK, it was never thoroughly analyzed by experts. The haveged tests which are supposed to evaluate the produced randomness also pass if haveged is fed with a constant input instead of the CPU timings [3].

[1] https://twitter.com/mjg59/status/1181426468519383041
[2] https://tls.mbed.org/tech-updates/security-advisories/polarssl-security-advisory-2011-02
[3] http://jakob.engbloms.se/archives/1374
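To illustrate why passing statistical tests says little about actual entropy, here is a small sketch. It is not haveged's test suite: it applies a simple bit-frequency check to a stream derived from a plain counter through SHA-256, which is fully predictable yet looks statistically random. The function names are hypothetical.

```python
import hashlib


def predictable_stream(n_bytes: int) -> bytes:
    """SHA-256 of a plain counter: fully predictable, contains no entropy."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])


def ones_fraction(data: bytes) -> float:
    """Fraction of 1 bits; close to 0.5 for statistically random-looking data."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones / (8 * len(data))


stream = predictable_stream(4096)
print(f"fraction of 1 bits: {ones_fraction(stream):.4f}")
```

Anyone who knows the construction can reproduce the whole stream, so a test of this kind cannot distinguish it from output that is genuinely unpredictable.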

rngd uses the output from a hardware random number generator (hwrng), if any, to fill the entropy pool. There are also issues with rngd:

* First, it’s pretty much obsolete. By far the most common hwrngs are the ones built into modern x86 processors, which can be accessed via the RDRAND instruction. Since 4.19, the Linux kernel already supports seeding the entropy pool via that instruction, either by compiling it with CONFIG_RANDOM_TRUST_CPU=y or by starting it with the random.trust_cpu=on command-line option [4]. Debian has compiled its kernel with CONFIG_RANDOM_TRUST_CPU=y since Stretch [5], so the kernel already credits entropy from RDRAND in Tails. Granted, it’s still possible that Tails is run on a system with a different hwrng, which is supported by rngd but not by the kernel.
* Second, we probably don’t want to use RDRAND to seed the CSPRNG. It can’t be independently audited, which means that you have to trust that neither Intel nor a three-letter agency installed a backdoor [6][7]. From a security point of view, the best option would therefore be to remove rngd and add random.trust_cpu=off to the kernel command line, to prevent the output of RDRAND from being credited to the entropy pool. Note that the kernel still mixes output from RDRAND into the entropy pool in that case; it just no longer credits it, so we don’t weaken our entropy with random.trust_cpu=off.

[4] https://outflux.net/blog/archives/2018/10/22/security-things-in-linux-v4-19/
[5] https://lists.debian.org/debian-devel/2019/02/msg00170.html
[6] https://lkml.org/lkml/2018/7/17/1279
[7] https://gist.github.com/mimoo/5957603f5aa5f0cded33e55f930644cb
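Whether a running system overrides the compile-time default can be checked by looking for random.trust_cpu on the kernel command line. The sketch below is a hypothetical helper, not part of Tails; the set of accepted value spellings and the "last occurrence wins" rule are assumptions about how the kernel parses the option.

```python
from typing import Optional

# Assumed spellings; the kernel's boolean parser may accept others.
_TRUE = {"on", "1", "y", "yes"}
_FALSE = {"off", "0", "n", "no"}


def trust_cpu_override(cmdline: str) -> Optional[bool]:
    """Return True/False if random.trust_cpu is set on the command line.

    None means the option is absent, so the compile-time default
    (CONFIG_RANDOM_TRUST_CPU) applies.
    """
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "random.trust_cpu" and sep:
            value = value.lower()
            if value in _TRUE:
                result = True   # assumption: a later occurrence overrides
            elif value in _FALSE:
                result = False
    return result


try:
    with open("/proc/cmdline") as f:
        print(trust_cpu_override(f.read()))
except OSError:
    print("kernel command line not readable on this system")
```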

#3 Updated by segfault 2019-10-15 07:53:12

(I started drafting a discussion of the UX impact of removing both rngd and haveged but won’t finish that now)

#4 Updated by intrigeri 2019-10-15 12:26:11

related to Feature #7102: Evaluate how safe haveged is in a virtualized environment added

#5 Updated by intrigeri 2019-10-15 12:26:16

related to ~~Feature #5650~~: rngd added

#6 Updated by segfault 2019-10-15 14:19:38

Description updated

#7 Updated by segfault 2019-10-15 14:51:01

In Linux 5.4, the kernel will try to gather entropy itself via CPU timing noise (jitter), similar to what haveged is doing in the userspace [1]. The quality of the randomness produced by that is still debated (although the concerns are more about simpler CPU architectures, not x86) [2], and to me it seems like a pretty rushed decision, made by Linus himself. Anyway, if that patch gets released, it won’t be our decision to make anymore whether to use jitter entropy or not.

[1] https://github.com/torvalds/linux/commit/50ee7529ec4500c88f8664560770a7a1b65db72b
[2] https://lore.kernel.org/lkml/20190930033706.GD4994@mit.edu/

I expect that on systems supported by Tails (64-bit x86), the jitter entropy generator will work quite well, so that even if we remove haveged and rngd, applications won’t have to wait for a long time for the RNG to be initialized. We should test that once we can upgrade to Linux 5.4.
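The general idea behind jitter-based collection can be sketched by timing a fixed workload repeatedly and looking at how the measured durations vary. This is only a userspace illustration, not the kernel's implementation from [1]: it runs in an interpreter and is subject to the scheduler, and visible variation does not by itself show that the values are unpredictable. The function name and workload are hypothetical.

```python
import time
from typing import List


def timing_deltas(samples: int, work: int = 1000) -> List[int]:
    """Time a small fixed workload repeatedly; return the durations in ns."""
    deltas = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        acc = 0
        for i in range(work):  # fixed workload; only its duration varies
            acc ^= i
        deltas.append(time.perf_counter_ns() - start)
    return deltas


deltas = timing_deltas(256)
low_bytes = [d & 0xFF for d in deltas]
print(f"distinct low-byte values in 256 samples: {len(set(low_bytes))}")
```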

#8 Updated by cypherpunks 2019-10-27 23:41:26

Oh dear, that patch for Linux looks horrible. I mean, more horrible than usual. There’s already jitterentropy which can be used to inject randomness and is based on a detailed study about certain nondeterministic aspects of CPU behavior (although even it, like HAVEGE, is not great). Why would they not use that? Why would they try to create their own naive jitter entropy collector?

Anyway, I’ve mentioned this in past tickets, but I dislike the use of haveged the way it is used now. A better solution would be to have it in a cron job to periodically write to /dev/urandom, so that it doesn’t issue the IOCTL that increases the entropy estimate with potentially dubious entropy (this only matters during early boot or after a state compromise, of course). I recall that the main reason why that idea was shot down was that GnuPG (foolishly) uses /dev/random instead of /dev/urandom and eats many kilobytes of data, which makes generating keys take a very long time. The cryptographic necessity of this can be trivially disproven with a simple look at the complexity of GNFS. I think libotr in Pidgin does that too?
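The distinction made here can be sketched as follows: a plain write to the random device mixes the data into the pool without raising the entropy estimate, whereas the RNDADDENTROPY ioctl (which requires privileges) also credits it. The helper below is a minimal sketch with a hypothetical name; where the data comes from (for example, haveged output in a cron job) is left open.

```python
import os


def mix_without_credit(data: bytes, device: str = "/dev/urandom") -> int:
    """Write bytes to the random device and return the number written.

    A plain write mixes the data into the pool; unlike the RNDADDENTROPY
    ioctl, it does not increase the kernel's entropy estimate.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


# Usage (not executed here); `collected` stands for bytes from any source:
#   mix_without_credit(collected)
```

With this approach, dubious input can at worst fail to help; it cannot cause the pool to be considered initialized earlier than it should be.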

Oh, and it’s all moot anyway with kernel.random.read_wakeup_threshold=64 by default, which breaks catastrophic reseeding after state compromise. Perhaps I should open a new ticket here to change that default to 128? I digress.

#9 Updated by cypherpunks 2019-10-27 23:49:21

segfault wrote:
> The Linux kernel uses multiple sources of randomness to initialize its cryptographically secure pseudo-random number generator (CSPRNG). This includes various sources with dubious quality wrt. randomness: the kernel command-line, serial numbers, MAC addresses, timing information…

This is incorrect. The kernel uses add_device_randomness() for data on the kernel command line, serial numbers, MAC addresses, etc. This does not credit entropy. In fact, its purpose is not even to be unpredictable, merely to ensure that a worst-case scenario where there is no natural entropy will not result in a hundred embedded devices choosing the same UUIDs. The entropy pool is initialized only after sufficient interrupts occur (see source code for details). The predictable device randomness does not credit entropy bits at all.

As for timing information being of dubious quality, that’s untrue as well. It is a major part of the BCP 106 recommendation for entropy collection. Timing information is taken for interrupts with add_interrupt_randomness(), and for other unpredictable events with add_timer_randomness() which itself is called in e.g. add_input_randomness(). These are completely unpredictable as long as input to the system is unpredictable, as with keystrokes and non-deterministic (due to air turbulence) behavior wrt hard drive actuator movements. The BSI paper on the Linux RNG gives more rationale.

tl;dr Those dubious sources you list aren’t a problem because they aren’t used to initialize the RNG state, and timing information is not a bad source of entropy as used by the Linux kernel.

#10 Updated by segfault 2019-10-28 21:53:34

cypherpunks wrote:
> segfault wrote:
> > The Linux kernel uses multiple sources of randomness to initialize its cryptographically secure pseudo-random number generator (CSPRNG). This includes various sources with dubious quality wrt. randomness: the kernel command-line, serial numbers, MAC addresses, timing information…
>
> This is incorrect. The kernel uses add_device_randomness() for data on the kernel command line, serial numbers, MAC addresses, etc. This does not credit entropy. In fact, its purpose is not even to be unpredictable, merely to ensure that a worst-case scenario where there is no natural entropy will not result in a hundred embedded devices choosing the same UUIDs.

Did you read the sentence after the one you quoted? I said there that these sources don’t get credited, and explained later that the entropy pool is only marked as initialized after enough entropy was credited.

> The entropy pool is initialized only after sufficient interrupts occur (see source code for details). The predictable device randomness does not credit entropy bits at all.

That’s exactly what I wrote in the description.

> As for timing information being of dubious quality, that’s untrue as well. It is a major part of the BCP 106 recommendation for entropy collection. Timing information is taken for interrupts with add_interrupt_randomness(), and for other unpredictable events with add_timer_randomness() which itself is called in e.g. add_input_randomness(). These are completely unpredictable as long as input to the system is unpredictable, as with keystrokes and non-deterministic (due to air turbulence) behavior wrt hard drive actuator movements. The BSI paper on the Linux RNG gives more rationale.

The concern is that it’s not clear whether there was enough (or even any) unpredictable input at the time when the timing information is used. And I don’t consider myself enough of an expert in this area to raise this concern myself; I’m just citing people I trust to have more expertise and whose arguments I find convincing.

#11 Updated by intrigeri 2019-11-11 07:18:10

related to ~~Bug #17124~~: Install Linux 5.3 from sid added

#12 Updated by segfault 2020-01-17 19:33:06

segfault wrote:
> In Linux 5.4, the kernel will try to gather entropy itself via CPU timing noise (jitter), similar to what haveged is doing in the userspace [1]. The quality of the randomness produced by that is still debated (although the concerns are more about simpler CPU architectures, not x86) [2], and to me it seems like a pretty rushed decision, made by Linus himself. Anyway, if that patch gets released, it won’t be our decision to make anymore whether to use jitter entropy or not.
>
> [1] https://github.com/torvalds/linux/commit/50ee7529ec4500c88f8664560770a7a1b65db72b
> [2] https://lore.kernel.org/lkml/20190930033706.GD4994@mit.edu/
>
> I expect that on systems supported by Tails (64-bit x86), the jitter entropy generator will work quite well, so that even if we remove haveged and rngd, applications won’t have to wait for a long time for the RNG to be initialized. We should test that once we can upgrade to Linux 5.4.

The switch to the 5.4 kernel is approaching, so we should discuss whether we want to remove haveged and rngd. I think the points I raised in Bug #17154#note-2 are still valid, and I’m in favor of removing rngd and setting random.trust_cpu=off. But I see potential problems with removing haveged:

Currently, haveged generates jitter entropy during boot. The 5.4 kernel will have a built-in jitter entropy generator (which is potentially better than haveged, because it’s not subject to the kernel’s scheduler, see Bug #17154#note-2). But AFAICS, it is only used if getrandom is called, not if /dev/random or /dev/urandom is read. Given that, removing haveged could cause the following problems:

* Applications reading from /dev/urandom without first checking whether enough entropy is available will get lower-quality random numbers.
* Applications which wait until enough entropy is available and then read /dev/urandom could block for a long time.

If all applications in Tails used getrandom, things would be fine, but that’s probably not the case (and users might install additional software which doesn’t use getrandom).
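For illustration, this is roughly what "using getrandom" looks like from an application's point of view. It is a minimal sketch using Python's `os.getrandom` wrapper (available on Linux); the helper names are hypothetical and no claim is made about what any particular application in Tails does.

```python
import os
from typing import Optional


def rng_initialized() -> Optional[bool]:
    """True if getrandom() would not block, False if it would, None if unknown."""
    if not hasattr(os, "getrandom"):
        return None  # platform or Python build without getrandom support
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False  # the pool is not marked as initialized yet
    except OSError:
        return None  # e.g. syscall not supported by the running kernel


def random_bytes(n: int) -> bytes:
    """Return n random bytes, waiting for initialization where possible."""
    if not hasattr(os, "getrandom"):
        return os.urandom(n)
    out = b""
    while len(out) < n:
        # getrandom() may return fewer bytes than requested, so loop.
        out += os.getrandom(n - len(out))
    return out


print(rng_initialized())
```

An application that reads /dev/urandom directly gets neither the blocking behavior nor a way to ask whether the pool is ready, which is the gap described above.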

#13 Updated by intrigeri 2020-01-27 08:35:45

related to ~~Feature #17443~~: Upgrade Linux to 5.4.8+ added

#14 Updated by intrigeri 2020-01-28 09:38:27

FWIW, just in case there’s anything in there that you did not have in mind yet, various sources of info I’m monitoring yielded this recently:

* Removing the Linux /dev/random blocking pool (LWN article)
* https://twitter.com/mjg59/status/1181423056268349441
* https://twitter.com/LucaFilipozzi/status/1181426253636755457
* CONFIG_RANDOM_TRUST_BOOTLOADER and EFI_RNG options

#15 Updated by segfault 2020-01-28 11:32:10

intrigeri wrote:
> FWIW, just in case there’s anything in there that you did not have in mind yet, various sources of info I’m monitoring yielded this recently:

Thanks!

> * Removing the Linux /dev/random blocking pool (LWN article)

I didn’t know about this upcoming patch yet. It would be great if reads from /dev/random worked the same as calls to getentropy. I will redo the analysis regarding the removal of haveged once/if this patch is merged and released (unfortunately, it did not make it into 5.5).

> * https://twitter.com/mjg59/status/1181423056268349441
> * https://twitter.com/LucaFilipozzi/status/1181426253636755457

I already cited that Twitter thread above :)

> * CONFIG_RANDOM_TRUST_BOOTLOADER and EFI_RNG options

Same as with CONFIG_RANDOM_TRUST_CPU, the seed passed by the EFI will be mixed into the entropy pool in any case, but it won’t be credited if CONFIG_RANDOM_TRUST_BOOTLOADER is not enabled. So I think we should not activate that option, or even disable it if it’s enabled by default in Debian.
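Whether these options are enabled in a given kernel build can be checked in its config file. The following is a minimal sketch with a hypothetical helper; the sample text is made up for illustration and says nothing about Debian's actual configuration. On Debian, the config of an installed kernel is typically found under /boot/config-<release>, which is an assumption about the system layout.

```python
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of a kernel config option, or None if it is not set."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Made-up sample, not taken from any real kernel package.
sample = """\
CONFIG_RANDOM_TRUST_CPU=y
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set
"""
print(config_value(sample, "CONFIG_RANDOM_TRUST_CPU"))         # y
print(config_value(sample, "CONFIG_RANDOM_TRUST_BOOTLOADER"))  # None
```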

I can’t find anything about EFI_RNG kernel options.