Feature #7102: Evaluate how safe haveged is in a virtualized environment

Added by intrigeri 2014-04-17 11:08:07. Updated 2019-10-03 04:35:51.

Status: Confirmed
Priority: Normal
Assignee: dkg
Category:
Target version: Hole in the Roof
Start date: 2014-04-17
Due date:
% Done:
Feature Branch:
Type of work: Security Audit
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

haveged relies on the RDTSC instruction, which is apparently useless in “some” virtualized environments.

We should research this further. A good question would be: would we be better off if we did not ship haveged at all, and instead relied only on the standard Linux entropy gathering method (which also likely has flaws when used in a VM)?

Subtasks

Related issues

Related to Tails - ~~Feature #5650~~: rngd	Resolved
Related to Tails - Feature #6116: Audit random seed	Confirmed
Related to Tails - ~~Feature #10779~~: Start haveged earlier in the boot process	Resolved	2015-12-20
Related to Tails - ~~Feature #11898~~: Have a readable blueprint about randomness in Tails	Resolved	2016-11-04
Related to Tails - Bug #17154: Improve entropy gathering	Confirmed

History

#1 Updated by intrigeri 2014-04-17 11:08:22

related to ~~Feature #5650~~: rngd added

#2 Updated by intrigeri 2014-04-17 11:32:44

Assignee set to geb

#3 Updated by intrigeri 2014-06-21 13:32:23

related to Feature #6116: Audit random seed added

#4 Updated by BitingBird 2015-01-08 03:59:20

geb, do you still plan to audit that? If yes, that’s great; if not, please remove yourself as assignee :)

#5 Updated by intrigeri 2015-02-10 15:03:25

Description updated

#6 Updated by intrigeri 2015-02-10 15:34:59

David Goulet wrote on ~~Feature #5650#note-16~~:

> There is a fallback usually to rdtsc. In haveged case, the generic fallback is:
>
> clock_gettime(CLOCK_MONOTONIC, &ts);
>
> The monotonic clock is used. It can NOT go back in time but might be subject to incremental adjustment by any NTP correction. Still much better than using “date +%s”.
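
To make the fallback a bit more concrete, here is a minimal sketch in Python (not haveged's C code, and Unix-only) that reads the same clock through `time.clock_gettime_ns(time.CLOCK_MONOTONIC)` and collects the differences between consecutive readings. The helper name `monotonic_deltas` is made up for this example. It only illustrates the property described above, namely that readings never go backwards, and says nothing about how much entropy the deltas carry.

```python
import time


def monotonic_deltas(samples=1000):
    """Return the differences (in ns) between consecutive CLOCK_MONOTONIC reads.

    Hypothetical helper for illustration; not part of haveged.
    """
    previous = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    deltas = []
    for _ in range(samples):
        current = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
        deltas.append(current - previous)  # never negative on a monotonic clock
        previous = current
    return deltas


if __name__ == "__main__":
    d = monotonic_deltas()
    print("smallest delta (ns):", min(d))
    print("distinct delta values:", len(set(d)))
```

The number of distinct delta values gives a rough feel for how coarse the clock is on a given machine or VM.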

#7 Updated by intrigeri 2015-02-10 20:15:37

Assignee changed from geb to intrigeri
Target version set to Tails_1.4

I’ll try to take care of it for 1.4.

#8 Updated by intrigeri 2015-04-22 01:02:43

Target version changed from Tails_1.4 to Hole in the Roof

#9 Updated by intrigeri 2015-07-12 03:02:43

Assignee deleted (~~intrigeri~~)

It’s unlikely that I’ll have+take time to take care of this. Anyone else, feel free to take it.

#10 Updated by intrigeri 2015-12-20 03:33:28

related to ~~Feature #10779~~: Start haveged earlier in the boot process added

#11 Updated by intrigeri 2016-02-19 00:44:57

Type of work changed from Audit to Security Audit

#12 Updated by intrigeri 2016-02-21 14:25:03

Description updated

#13 Updated by cypherpunks 2016-03-02 02:58:46

One solution would be to have haveged XOR its contents with the pool without issuing the ioctls to raise the pool’s entropy estimate. This can be done by writing constantly to the pool through the randomness device, such as by running the following in the background at start up:

haveged -n 0 -f - 2>/dev/null | pv -q -L 1024 >/dev/urandom

This will write to /dev/urandom at a rate of 1024 bytes per second (I think it actually writes it in bursts of 4096 or 512 bytes but I really don’t remember and I don’t have strace on hand right now). The overhead is extremely small even on very low-power netbooks, and should provide far superior randomness to using regular haveged, while additionally not increasing the entropy estimate by itself.

An alternative would be to write to /dev/urandom in chunks every 10 minutes or so, by adding something like this to a cron job:

haveged -n 1M -f /dev/urandom 2>/dev/null

That won’t provide the same recovery speed against PRNG state compromises, but it will skip the admittedly low overhead of pv.

I would personally recommend using the former. I’ve used something akin to that for many years. The only downside I can see for either of these options is with Pidgin’s OTR key generation, which by default reads from /dev/random, and with GPG key generation, which does the same.
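
For readers who want to see what the first pipeline amounts to, here is a rough Python sketch, for illustration only. The function name `feed_pool` and its parameters are invented for this example. On a real system the source would be haveged's stdout and the sink would be `/dev/urandom` opened for writing. Only plain writes are issued: these mix the data into the pool but do not credit the kernel's entropy estimate, since that requires a separate privileged ioctl that this sketch never calls.

```python
import io
import time


def feed_pool(source, sink, rate=1024, chunk=512, max_bytes=None, sleep=time.sleep):
    """Copy bytes from source to sink at roughly `rate` bytes per second.

    Hypothetical helper: uses plain write() calls only, so no entropy is
    credited when the sink is /dev/urandom.
    """
    written = 0
    while max_bytes is None or written < max_bytes:
        want = chunk if max_bytes is None else min(chunk, max_bytes - written)
        data = source.read(want)
        if not data:
            break  # source exhausted
        sink.write(data)
        sink.flush()
        written += len(data)
        sleep(len(data) / rate)  # crude pacing, similar in spirit to `pv -L`
    return written


if __name__ == "__main__":
    # Demo with in-memory streams instead of haveged and /dev/urandom.
    src = io.BytesIO(bytes(range(256)) * 8)  # placeholder data, not random
    dst = io.BytesIO()
    print(feed_pool(src, dst, max_bytes=1024, sleep=lambda s: None))
```

Passing `sleep` as a parameter is just a convenience so the pacing can be skipped when trying the function out.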

#14 Updated by cypherpunks 2017-01-15 03:34:34

intrigeri wrote:
> David Goulet wrote on ~~Feature #5650#note-16~~:
>
> > There is a fallback usually to rdtsc. In haveged case, the generic fallback is:
> >
> > clock_gettime(CLOCK_MONOTONIC, &ts);
> >
> > The monotonic clock is used. It can NOT go back in time but might be subject to incremental adjustment by any NTP correction. Still much better than using “date +%s”.

That is only a fallback if the CPU reports TSC support as missing. Some hypervisors do not report it as missing; instead they trap the instruction and return zero in eax and edx. This means haveged falsely thinks it has a working TSC.

I don’t think Tails will ever be in a situation where the clock_gettime() fallback is used, because no hypervisor should report a CPU with a missing TSC, and Tails does not support any CPU architectures that genuinely lack a TSC (modern x86 and ARM support it). The only problems that can occur, as far as I know, are Tails being provided with a bogus TSC.
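
A simple sanity check for the bogus-TSC case could look like the following Python sketch. It is an assumption-laden illustration, not something haveged does: `counter_looks_bogus` is a made-up name, and because Python cannot execute RDTSC directly, the counter is passed in as a callable (here a lambda simulating a trapped TSC, and `time.perf_counter_ns` as a stand-in for a working one).

```python
import time


def counter_looks_bogus(read_counter, samples=64):
    """Heuristic: True if every reading is identical (e.g. always zero).

    Hypothetical check; a counter that passes is not thereby proven
    to be a good entropy source.
    """
    readings = [read_counter() for _ in range(samples)]
    return len(set(readings)) <= 1


if __name__ == "__main__":
    # Simulated trapped RDTSC that always yields zero.
    print("stuck at zero:", counter_looks_bogus(lambda: 0))
    # Stand-in for a counter that actually advances.
    print("perf_counter_ns:", counter_looks_bogus(time.perf_counter_ns))
```

This only catches the crudest failure, a counter that never moves. A hypervisor returning predictable but changing values would pass it.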

I don’t know if this is a problem for the kernel’s entropy generator, since it calculates the deltas between three RDTSC invocations before incrementing the entropy estimate in add_timer_entropy(). Someone should test this in bochs.
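
To illustrate why a delta-based estimate would at least not over-credit a stuck counter, here is a simplified Python model. It is loosely based on my understanding of the kernel's timer entropy estimate (first-, second- and third-order deltas, smallest magnitude wins, capped credit per event) and is not the kernel's actual code. Details differ between kernel versions, and the name `credited_bits` is invented for this sketch.

```python
def credited_bits(timestamps):
    """Per-event entropy credit from first-, second- and third-order deltas.

    Simplified model for illustration; not the kernel implementation.
    """
    last_time = last_delta = last_delta2 = 0
    credits = []
    for t in timestamps:
        delta = t - last_time
        last_time = t
        delta2 = delta - last_delta
        last_delta = delta
        delta3 = delta2 - last_delta2
        last_delta2 = delta2
        smallest = min(abs(delta), abs(delta2), abs(delta3))
        # Bit length of (smallest >> 1), capped at 11 bits per event.
        credits.append(min((smallest >> 1).bit_length(), 11))
    return credits


if __name__ == "__main__":
    print(credited_bits([0, 0, 0, 0]))          # counter stuck at zero
    print(credited_bits([100, 200, 300, 400]))  # perfectly regular ticks
```

In this model a counter stuck at zero, or one ticking at a perfectly regular interval, quickly ends up with zero credit. That says nothing about values that merely look irregular while being predictable to the hypervisor.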

#15 Updated by emmapeel 2018-04-11 11:05:08

Description updated

Picking some brains for this ticket, I gathered:

Some random hacker said something like: I like using the kernel approach after reading the ticket.

Schleuder people are thinking of dropping haveged from their recommended packages list in Debian, see: https://0xacab.org/schleuder/schleuder/issues/194 (added to description).

LEAP people evaluated the use of haveged in VMs and decided in favor of keeping it: https://0xacab.org/leap/platform/issues/6664

#16 Updated by emmapeel 2018-04-11 12:24:34

Description updated

#17 Updated by dkg 2018-04-11 18:02:39

Can I zoom out a bit and ask why specifically you’re thinking about haveged? Concretely, what task do you need it for? Which underlying systems use blocking /dev/random? When during the life cycle of the VM do they use it?

One thing I can think of is GnuPG secret key generation. I think that’s a bug in GnuPG and should be worked around there (see https://dev.gnupg.org/T3894). Are there other use cases driving the need for haveged?

#18 Updated by Anonymous 2018-07-03 17:09:43

Assignee set to intrigeri
QA Check set to Info Needed

#19 Updated by Anonymous 2018-08-18 13:57:31

related to ~~Feature #11898~~: Have a readable blueprint about randomness in Tails added

#20 Updated by sycamoreone 2018-10-02 11:01:22

dkg wrote:
> Can I zoom out a bit and ask why specifically you’re thinking about haveged? Concretely, what task do you need it for? Which underlying systems use blocking /dev/random? When during the life cycle of the VM do they use it?

One program/library that uses /dev/random is pidgin-otr/libotr. Without a system such as haveged, Pidgin will freeze during key generation without any notification about what is happening: https://bugs.otr.im/plugins/pidgin-otr/issues/63#note_412

#21 Updated by intrigeri 2018-11-17 12:17:43

Assignee changed from intrigeri to dkg
QA Check deleted (~~Info Needed~~)

sycamoreone kindly answered dkg’s question, so assigning back. I should add that this ticket is not about “shall we start shipping haveged?”: we’ve been shipping haveged since 2010, and then were told it might be unsafe in some kinds of VMs, hence this ticket.

Also, I’m adding kurono & segfault to the loop: they’re the main people who have worked on RNG matters in Tails recently.

#22 Updated by intrigeri 2019-10-03 04:35:51

Linux 5.4 will likely include a “jitter entropy” mechanism, similar in essence to what haveged is doing (detailed source will become public in a few days). So it might be that at some point we can simply drop haveged and close this ticket :)

Also, quoting from that article:

> Matthew Garrett pointed out that the Zircon kernel for the Fuchsia operating system initializes its CRNG using jitter entropy, which may lend some credibility to the technique.

#23 Updated by intrigeri 2019-10-15 12:26:11

related to Bug #17154: Improve entropy gathering added