Feature #5650

rngd

Added by Tails 2013-07-18 07:44:02. Updated 2016-09-20 16:47:12.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Target version:
Start date:
Due date:
% Done:

100%

Feature Branch:
feature/5650-rngd
Type of work:
Code
Blueprint:

Starter:
0
Affected tool:
Deliverable for:

Description

In his talk at LinuxCon Europe 2012 about random number generation on Linux, H. Peter Anvin strongly advises running rngd (from rng-tools).

rngd acts as a bridge between a Hardware TRNG (true random number generator) such as the ones in some Intel/AMD/VIA chipsets, and the kernel’s PRNG.

About haveged: "So, while I can’t really recommend it, I can’t not recommend it either." If you are going to run HAVEGE, Peter strongly recommended running it together with rngd, rather than as a replacement for it.

Roadmap

How do we convince haveged and rngd to play together nicely? Can we just install both and be done with it?

According to H. Peter Anvin’s slides, haveged "can be run in parallel with rngd".

Let’s try that.

The Debian package needs some care: there is a call for a co-maintainer on Debian bug #542599. The package is actually a bit behind the Ubuntu one and doesn’t include support for TPM hardware, which is the only kind I could try. In a Tails VM, once installed, the rngd daemon fails to start because there’s no hardware RNG available.


Subtasks


Related issues

Related to Tails - Feature #7102: Evaluate how safe haveged is in a virtualized environment Confirmed 2014-04-17
Related to Tails - Feature #6116: Audit random seed Confirmed
Related to Tails - Feature #7675: Persist entropy pool seeds Duplicate 2016-11-04
Related to Tails - Feature #7687: Remove ekeyd Resolved 2014-07-29
Related to Tails - Feature #11758: Analyze early boot entropy gathering Resolved 2016-09-02
Related to Tails - Bug #17154: Improve entropy gathering Confirmed

History

#1 Updated by BitingBird 2014-04-05 18:09:03

  • Description updated
  • Starter set to No

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692450 is the bug asking the maintainer to package version 4, which Ubuntu has had for nearly a year.

#2 Updated by intrigeri 2014-04-17 11:08:22

  • related to Feature #7102: Evaluate how safe haveged is in a virtualized environment added

#3 Updated by intrigeri 2014-06-21 13:32:37

#4 Updated by BitingBird 2014-07-19 17:48:03

Pinged the Debian maintainer and the Ubuntu maintainers.

#5 Updated by ioerror 2014-07-27 23:32:44

I think we should ship rng-tools and haveged with Tails as a bare minimum. We may also want to ship randomsound and leave it disabled. It would be useful on an airgapped machine but it will interfere with audio. It was invasive when I last looked at it.

#6 Updated by BitingBird 2014-07-28 00:00:48

Debian maintainer answered but I don’t understand the answer :) https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692450

#7 Updated by intrigeri 2014-07-28 12:47:56

  • Feature Branch set to feature/5650-rngd

Tested in an ISO built from feature/5650-rngd, in a libvirt/QEMU guest VM with a virtio rng device. The virtio-rng kernel module is automatically loaded, and rngd is automatically started. If I got the rng-tools (in Wheezy) initscript right, this should work just as well with various other hardware RNGs.

#8 Updated by intrigeri 2014-07-28 13:14:52

Now, wrt. haveged/rngd working together: reading their manpages, it seems to me that:

  • rngd will by default feed entropy until fill-watermark is available. fill-watermark defaults to 50% of the size of the entropy pool, which is itself 4096 bits, so it defaults to 2048 bits;
  • haveged (by default on Wheezy) starts feeding entropy when the pool falls below 1024 bits.

So, in some situations, when an entropy source for rngd is available, and the pool has between 1024 and 2048 bits of available entropy, rngd could dominate haveged completely. I’m not sure what to do about it: maybe having haveged start feeding entropy when the pool falls below 2048 bits would help, but then we’re simply creating a race, and I’ve no idea which one of the two competitors would win.
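A minimal sketch of that window, assuming each daemon feeds the pool exactly when the available entropy is below its own threshold (this is how the manpages read, not something checked against the code). The function name `feeders` and the constant names are made up for illustration.

```python
POOL_SIZE_BITS = 4096                      # default entropy pool size
RNGD_FILL_WATERMARK = POOL_SIZE_BITS // 2  # rngd default: 50% of the pool
HAVEGED_WRITE_THRESHOLD = 1024             # haveged default on Wheezy


def feeders(entropy_avail_bits):
    """Return the daemons that would feed the pool at this entropy level.

    Assumption: a daemon feeds whenever the level is below its threshold.
    """
    active = []
    if entropy_avail_bits < RNGD_FILL_WATERMARK:
        active.append("rngd")
    if entropy_avail_bits < HAVEGED_WRITE_THRESHOLD:
        active.append("haveged")
    return active


if __name__ == "__main__":
    for level in (512, 1024, 1500, 2048, 3000):
        print(level, feeders(level))
```

Under this model, any level from 1024 up to (but not including) 2048 bits is served by rngd alone.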

#9 Updated by intrigeri 2014-07-28 14:46:51

I’m told that:

  1. if the pool is persisted, “whether it’s haveged or rngd feeding it, it doesn’t matter”;
  2. else, “you have bigger problems… and picking in between evils is a fools’ game”.

Regarding (1), we now have Feature #7675. So far, so good.

Regarding (2), I still would like to understand better whether introducing rngd in Tails could possibly entirely override haveged if a hardware RNG is available (and even then, maybe shipping both is the right thing to do), or be entirely overridden by haveged in most cases (and then, it’s useless to ship rngd at all).

#10 Updated by intrigeri 2014-07-28 14:50:17

#11 Updated by intrigeri 2014-07-28 16:47:18

  • Assignee set to dgoulet
  • Target version changed from Hardening_M1 to Tails_1.2
  • % Done changed from 0 to 20
  • QA Check set to Ready for QA
  • Type of work changed from Test to Code

Formally speaking, that’s still a release goal for the 3.0 milestone. It seems to be ready, though, and I’d like more pairs of eyes to look at it, so I’m flagging it for 1.2, and tentatively assigning to dgoulet for a review of the design doc.

#12 Updated by intrigeri 2014-07-28 19:55:12

  • Status changed from Confirmed to In Progress

#13 Updated by sajolida 2014-07-29 17:47:36

#14 Updated by intrigeri 2014-10-06 05:35:11

  • Target version changed from Tails_1.2 to Tails_1.3

This wasn’t reviewed during the 1.2 dev cycle => tentatively flagging for 1.3. David, do you think you’ll have time to review the design doc on our feature/5650-rngd branch, or should I find someone else?

#15 Updated by dgoulet 2014-10-06 06:19:44

Yes, my apologies, I simply forgot about that bug and didn’t flag it in my email :S…

Reviewing the hell out of it today, guaranteed! I’ll post my comments here once I’m done. If there are blind spots for me as well, I’ll make sure to let you all know so we can complement with someone else.

#16 Updated by dgoulet 2014-10-08 08:10:22

I’ll go in a hopefully “human readable order” using the document sections.

rngd

I don’t think this is accurate:

The fill-watermark defaults to 50% of the size of the entropy pool, which itself defaults to 4096 bits on
Linux 3.14, so basically rngd feeds the entropy pool unless there are already 2048 bits in it..

The pool size is by default 4096 on Linux and the watermark is set to 896 on Debian. So, rngd will fill it if the pool is below that watermark. However, this is quite untrue, rngd does not poll/select on /dev/random in any way thus the write threshold is just not used at all by rngd… From what I see in the code, it just writes as much as it can in a main loop…

Also, the following note is not true:

Note: rngd (2-unofficial-mt.14-1) does not modify any parameter in /proc/sys/kernel/random/.

In fact, looking at the rngd code, --fill-watermark does write to /proc/sys/kernel/random/write_wakeup_threshold.

HAVEGE reliability

There is a fallback usually to rdtsc. In haveged’s case, the generic fallback is:

clock_gettime(CLOCK_MONOTONIC, &ts);

The monotonic clock is used. It can NOT go back in time but might be subject to incremental adjustment by any NTP correction. Still much better than using “date +%s”.
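For illustration only, the same clock can be read from Python. This is not haveged’s code; the wrapper name `monotonic_ns` is made up here, and `time.clock_gettime_ns` is only available on Unix-like systems.

```python
import time


def monotonic_ns():
    """Read CLOCK_MONOTONIC in nanoseconds.

    Mirrors the C call clock_gettime(CLOCK_MONOTONIC, &ts).
    """
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC)


if __name__ == "__main__":
    first = monotonic_ns()
    second = monotonic_ns()
    # Successive readings never decrease, unlike wall-clock time.
    print(second - first)
```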

Interaction between haveged and rngd

I took a look at the rngd code and the default behaviour is to simply stop if no hwrng is found thus leaving haveged the only one feeding the pool. If one is found, rngd will fill up the pool using an ioctl on /dev/random to add entropy. From what I understand of the rngd code, it fills up a buffer of a fixed size (see below) from the FIPS standard (FIPS 140-1/140-2) using a random step (-s, default: 64 bytes).

#define FIPS_RNG_BUFFER_SIZE 2500

Once that buffer is full (from reading on the hwrng or TPM or drng), it feeds it to the kernel. The write_wakeup_threshold is set by default on Debian to 896 bytes, so this means that if rngd wins the race the kernel will wake up rngd at that limit first, and not at the one set by haveged. Either way, if a hw rng is available, it’s a good thing to feed the kernel when it needs it, even with the default value being the distro’s.

Now, if haveged wins the race, it will fill up the full pool size (4096 bytes), so rngd is useless, but both can race and here is why.

Haveged fills the kernel pool in one single write, but rngd has random_step which basically means it will make an ioctl to the kernel to feed $random_step at a time, thus it could be racing with haveged writing to it as well. I’ll try to demonstrate:

[empty pool: 0 bytes]
rngd --> writes 64 bytes [pool: 64 bytes]
rngd --> writes 64 bytes [pool: 128 bytes]
haveged --> writes pool_size (default: 4096) - current (128 bytes) [pool: 4096 bytes]
[everyone stops since the pool size can’t be changed via /proc]
rngd --> blocks on the write because it still has to write (2500 - 128) bytes
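The same interleaving can be replayed as a toy model. This is only a sketch of the scenario as described here, not of what either daemon really does: the `simulate` function is hypothetical, and it counts everything in bytes as this comment does, whereas the pool size is given in bits elsewhere in this ticket.

```python
POOL_SIZE = 4096     # pool capacity, in the units used by the illustration
RNGD_BUFFER = 2500   # FIPS_RNG_BUFFER_SIZE
RNGD_STEP = 64       # rngd's default random step (-s)


def simulate(rngd_steps_before_haveged):
    """Let rngd write some steps, then let haveged fill the pool in one write.

    Returns (pool level, amount rngd still has to write).
    """
    pool = 0
    rngd_left = RNGD_BUFFER
    for _ in range(rngd_steps_before_haveged):
        # rngd writes at most one step, and never more than it has left
        # or than the pool can still hold.
        chunk = min(RNGD_STEP, rngd_left, POOL_SIZE - pool)
        pool += chunk
        rngd_left -= chunk
    # haveged tops the pool up to its full size in a single write.
    pool += POOL_SIZE - pool
    return pool, rngd_left


if __name__ == "__main__":
    print(simulate(2))
```

With two rngd steps before haveged kicks in, the model ends with a full pool and 2500 - 128 = 2372 left for rngd to write.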

So, after all these technical details, I think that having both is still relevant. Now, the question is which one has the better entropy source. I would argue that if rngd can find a hw rng, it’s worth using it before haveged fills up the pool. I didn’t find a way to run rngd for a single run, after which Tails could launch haveged… Could be done with some hackerish sleep() but I would strongly advise against that :). However, I think Tails could look for /dev/hwrng and, if it’s there, launch rngd and not use haveged at all. Else, launch haveged. The rationale behind that is that, considering the behaviour of both daemons, I think it’s much better to have rngd run by itself using “hopefully” a good hw rng than to have haveged fill the full pool and race with rngd with possibly weaker entropy.
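A minimal sketch of that either/or choice, assuming the presence of the /dev/hwrng device node is a good enough signal (it may not be: the node might exist without a usable device behind it). The function name `pick_entropy_daemon` and its parameters are hypothetical.

```python
import os


def pick_entropy_daemon(hwrng_path="/dev/hwrng", exists=os.path.exists):
    """Return the daemon to launch: rngd if the device node exists, else haveged.

    The `exists` parameter is only there so the check can be replaced in tests.
    """
    return "rngd" if exists(hwrng_path) else "haveged"


if __name__ == "__main__":
    print(pick_entropy_daemon())
```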

Side note: a really good contribution to rngd would be a way to tell it to do a single loop on the hwrng, thus filling 2500 bytes into the pool and exiting. That way, Tails could easily test whether a hwrng is available and mix the entropy pool from the two daemons. Furthermore, it should also be able to write only X bytes to the pool, so with a feature like that Tails could do:

if /dev/hwrng, launch rngd with one pass of pool_size / 2:
launch haveged to fill the rest

You mix with 50% of both sources which would be ideal…

Random pool seeding

Yes, using the date is a bad idea and I would like to refer to a comment I made on the Tails list in that discussion. If there is an NTP correction before that, an attacker could see the time correction and thus know the seed.

Please feel free to ask as many questions as you like and ask me to double check stuff. I read both the haveged and rngd code for that, and the kernel as well, to understand how two daemons can interact at the same time on the entropy pool. I might have made a mistake, so more eyes and questions are always good!

#17 Updated by intrigeri 2014-10-31 14:07:56

  • Assignee changed from dgoulet to intrigeri
  • QA Check changed from Ready for QA to Dev Needed

#18 Updated by intrigeri 2015-02-10 15:28:53

  • Assignee changed from intrigeri to dgoulet
  • Target version changed from Tails_1.3 to Tails_1.4
  • QA Check changed from Dev Needed to Info Needed

Hi David,

sorry for the delay… and thanks a lot for looking into this! :)

Note that I’m looking at the rng-tools 2-unofficial-mt.14-1 source package, as currently found in Debian Wheezy, Jessie and sid.

dgoulet wrote:
> rngd
>
> I don’t think this is accurate:
>
> The fill-watermark defaults to 50% of the size of the entropy pool, which itself defaults to 4096 bits on
> Linux 3.14, so basically rngd feeds the entropy pool unless there are already 2048 bits in it..
>
> The pool size is by default 4096 on Linux and the watermark is set to 896 on Debian. So, rngd will fill it if the pool is below that watermark.

On Debian Wheezy, rngd(8) says the default is 50%, and indeed, rngd_linux.c sets random_pool_fill_watermark to 2048. I can’t find any occurrence of “896” in the source package. Where exactly did you find this value?

> However, this is quite untrue, rngd does not poll/select on /dev/random in any way thus the write threshold is just not used at all by rngd… From what I see in the code, it just writes as much as it can in a main loop…

Maybe we’re not looking at the same code, maybe my C is just too bad (it is), but I see a call to random_sleep in do_rng_data_sink_loop. random_sleep seems to compare the entropy available in random_fd with random_pool_fill_watermark, and my understanding is that random_fd defaults to opening /dev/random.

> Also, the following note is not true:
>
> Note: rngd (2-unofficial-mt.14-1) does not modify any parameter in /proc/sys/kernel/random/.
>
> In fact, looking at the rngd code, --fill-watermark does write to /proc/sys/kernel/random/write_wakeup_threshold.

Where exactly do you see that? The only occurrence of write_wakeup_threshold I can find in the source package is in the rngd(8) manpage.

> HAVEGE reliability […]

Thanks! I’ll add this info on the ticket we have about this topic (Feature #7102).

> Interaction between haveged and rngd
>
> I took a look at the rngd code and the default behaviour is to simply stop if no hwrng is found thus leaving haveged the only one feeding the pool. If one is found, rngd will fill up the pool using an ioctl on /dev/random to add entropy. From what I understand of the rngd code, it fills up a buffer of a fixed size (see below) from the FIPS standard (FIPS 140-1/140-2) using a random step (-s, default: 64 bytes).
>
> #define FIPS_RNG_BUFFER_SIZE 2500
>
> Once that buffer is full (from reading on the hwrng or TPM or drng), it feeds it to the kernel.

Hmm, again your C is of course better than mine, but that’s not how I understand this code:

                        r = FIPS_RNG_BUFFER_SIZE;

                        while (r > 0) {
                                if (gotsigterm) pthread_exit(NULL);

                                if ((s = arguments->random_step) > r) s = r;
                                random_add_entropy(p, s);
                                r -= s;
                                p += s;
                                random_sleep();
                        }

It seems to me that random_add_entropy (and thus the ioctl that sends entropy to /dev/random) is called at every step, without waiting for the p buffer to be full. Did I miss something? And indeed, your explanation below (“rngd has random_step which basically means it will make an ioctl to the kernel to feed $random_step at a time”) seems to contradict what you wrote above.
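To make that reading of the loop concrete, here is a small Python sketch that mirrors only its arithmetic: it lists the size passed to each random_add_entropy call for one buffer. The function name `step_sizes` is made up, and nothing here touches /dev/random.

```python
FIPS_RNG_BUFFER_SIZE = 2500  # bytes, as in the rngd source quoted above


def step_sizes(random_step=64, buffer_size=FIPS_RNG_BUFFER_SIZE):
    """Return the size of each random_add_entropy() call for one buffer."""
    if random_step <= 0:
        raise ValueError("random_step must be positive")
    sizes = []
    r = buffer_size
    while r > 0:
        s = min(random_step, r)  # same clamp as "if (s > r) s = r"
        sizes.append(s)          # stands in for random_add_entropy(p, s)
        r -= s
    return sizes


if __name__ == "__main__":
    sizes = step_sizes()
    print(len(sizes), sizes[0], sizes[-1])
```

With the default step of 64 bytes, one 2500-byte buffer is fed in 40 calls: 39 of 64 bytes and a last one of 4 bytes.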

Given these bits seem to be pretty important, and might invalidate the rest of the reasoning, I’ll wait for your confirmation before I work on this again.

Cheers!

#19 Updated by intrigeri 2015-04-22 01:02:23

  • Target version changed from Tails_1.4 to Hole in the Roof

#20 Updated by intrigeri 2015-12-20 03:33:40

  • related to Feature #10779: Start haveged earlier in the boot process added

#21 Updated by sycamoreone 2015-12-28 09:14:29

I looked into this ticket with dgoulet at 32c3. I hope he will correct me if I misrepresent anything he said here. A few first points:

  • Intrigeri’s analysis above is correct. dgoulet was partially looking at the wrong code.
  • The problem with different thresholds mentioned in Feature #5650#note-8 exists, but we can get around this using the `--write=nnn` command line option to set haveged’s wakeup-threshold to 2048 bits as well. Synchronization shouldn’t be a problem, as both processes use the RNDADDENTROPY ioctl to write to /dev/random.
  • Because both systems use the ioctl we can query the amount of entropy added to the pool. It would be possible to start both services early in the boot process and then block until enough entropy is available. We then wouldn’t need the var/…/random-seed files.
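A rough sketch of the "block until enough entropy is available" idea from the last point. It polls /proc/sys/kernel/random/entropy_avail rather than using an ioctl, which is a simplification chosen here, not what was proposed; the function names, the polling interval and the timeout are all assumptions.

```python
import time

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read().strip())


def wait_for_entropy(min_bits, read=read_entropy_avail,
                     poll_interval=0.1, timeout=30.0, sleep=time.sleep):
    """Block until read() reports at least min_bits.

    Returns True once the level is reached, False if the timeout expires.
    `read` and `sleep` are parameters so they can be replaced in tests.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read() >= min_bits:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)


if __name__ == "__main__":
    print(wait_for_entropy(2048, timeout=5.0))
```

A boot-time hook could call `wait_for_entropy` after starting both daemons and only proceed once it returns True.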

#22 Updated by intrigeri 2016-02-21 14:28:36

  • Assignee changed from dgoulet to intrigeri
  • QA Check deleted (Info Needed)

Thanks!

#23 Updated by intrigeri 2016-02-21 15:15:53

Note that haveged from Debian testing/sid changes the way write_wakeup_threshold is handled: https://sources.debian.net/src/haveged/1.9.1-3/debian/patches/0003-Don-t-set-a-watermark-higher-than-pool-size.patch/, and we’re going to ship it soon (to address Feature #10779) => I’ll try to look if this changes anything in our reasoning here, but please feel free to be faster than me :)

#24 Updated by intrigeri 2016-02-29 17:23:17

This branch fails tests since the rng-tools service fails to start (/etc/init.d/rng-tools: Cannot find a hardware RNG device to use). I think we should:

  • Add <rng model='virtio'> to the VM used for the test suite, so that we exercise the startup of rngd, and don’t have to add ugly workarounds to deal with the fact it would not start.
  • Patch the initscript so that it exits 0 when no hardware RNG device is found => most Tails systems in the real world would be considered fully started, even if they have no such device.

#25 Updated by intrigeri 2016-06-09 12:18:17

intrigeri wrote:
> This branch fails tests since the rng-tools service fails to start (/etc/init.d/rng-tools: Cannot find a hardware RNG device to use). I think we should: […]

Done on the topic branch a few months ago.

#26 Updated by intrigeri 2016-06-09 12:27:40

intrigeri wrote:
> Note that haveged from Debian testing/sid changes the way write_wakeup_threshold is handled: https://sources.debian.net/src/haveged/1.9.1-3/debian/patches/0003-Don-t-set-a-watermark-higher-than-pool-size.patch/, and we’re going to ship it soon (to address Feature #10779) => I’ll try to look if this changes anything in our reasoning here, but please feel free to be faster than me :)

It won’t change our reasoning: that change only matters if one passes -w N to haveged with N close to, or bigger than, the size of the entropy pool (4096 bits by default). And the Debian package passes -w 1024.

#27 Updated by intrigeri 2016-06-09 13:54:59

sycamoreone wrote:
> * The problem with different thresholds mentioned in Feature #5650#note-8 exists, but we can get around this using the `--write=nnn` command line option to set haveged’s wakeup-threshold to 2048 bits as well.

OK. I’m doing this on the topic branch.

> Synchronization shouldn’t be a problem, as both processes use the RNDADDENTROPY ioctl to write to /dev/random.

I’m sorry I don’t understand this part. If rngd and haveged both start feeding the entropy pool at the same time, what does “Synchronization shouldn’t be a problem” mean? Doesn’t one of them dominate the pool?

> * Because both systems use the ioctl we can query the amount of entropy added to the pool. It would be possible to start both services early in the boot process and then block until enough entropy is available. We then wouldn’t need the var/…/random-seed files.

Cool idea, added to Feature #7675.

#28 Updated by intrigeri 2016-06-09 15:07:09

  • related to deleted (Feature #10779: Start haveged earlier in the boot process)

#29 Updated by intrigeri 2016-06-09 15:07:25

  • blocked by Feature #10779: Start haveged earlier in the boot process added

#30 Updated by intrigeri 2016-06-09 15:07:55

(Merged the branch for Feature #10779, since they’re touching similar areas and it makes my job easier.)

#31 Updated by intrigeri 2016-06-09 15:56:41

Updated the branch and design doc. It would be good to have sycamoreone & dgoulet answer the remaining question I’ve asked above, but according to the topic branch’s design doc, any one of the 3 possible results of the race condition is acceptable, so that’s not a blocker.

#32 Updated by intrigeri 2016-06-10 05:28:25

intrigeri wrote:
> (Merged the branch for Feature #10779, since they’re touching similar areas and it makes my job easier.)

Reverted that, I don’t want to block on the boot failure caused by the changes introduced for Feature #10779.

#33 Updated by intrigeri 2016-06-10 05:28:36

  • blocks deleted (Feature #10779: Start haveged earlier in the boot process)

#34 Updated by intrigeri 2016-06-10 13:42:31

  • blocks Bug #11522: Adjust haveged arguments customization for Stretch added

#35 Updated by intrigeri 2016-06-10 13:48:11

Status: running the full test suite on it.

#36 Updated by intrigeri 2016-07-18 09:51:53

  • % Done changed from 20 to 30

Seen the full test suite pass.

#37 Updated by intrigeri 2016-07-20 00:46:28

  • Assignee changed from intrigeri to anonym
  • Target version changed from Hole in the Roof to Tails_2.6
  • QA Check set to Ready for QA

#38 Updated by intrigeri 2016-08-01 07:35:49

I’d like to ease reviewing for the 2.6 RM, and to get automated tests running about the combination of all these changes ASAP in the 2.6 dev cycle. So, I’ve merged this work, along with the other major branches I’m proposing for 2.6, into the feature/from-intrigeri-for-2.6 integration branch (Jenkins builds and tests).

#39 Updated by anonym 2016-08-23 08:28:15

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 30 to 100
  • QA Check changed from Ready for QA to Pass

#40 Updated by intrigeri 2016-08-25 02:27:46

  • blocked by deleted (Bug #11522: Adjust haveged arguments customization for Stretch)

#41 Updated by sycamoreone 2016-09-03 02:17:23

  • related to Feature #11758: Analyze early boot entropy gathering added

#42 Updated by anonym 2016-09-20 16:47:12

  • Status changed from Fix committed to Resolved

#43 Updated by intrigeri 2019-10-15 12:26:17

  • related to Bug #17154: Improve entropy gathering added