Bug #15915

Drop background readahead on boot

Added by intrigeri 2018-09-05 12:34:34. Updated 2019-01-30 11:52:00.

Status: Resolved
Priority: Normal
Assignee:
Category: Hardware support
Target version:
Start date: 2018-09-05
Due date:
% Done: 100%
Feature Branch: feature/15915-remove-readahead
Type of work: Code
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

Starting Tails 3.9 from DVD takes almost twice as long as 3.8 (3:55 vs. 2:12 on my hardware), presumably because my “common sense” (sic) did not make me filter out enough stuff from the SquashFS sort file. I did not measure things nor look at the bootchart, but it seems as if the X.Org startup was blocked for 2 minutes, waiting for the background readahead to complete, which is kinda ironic.

I think we should:

  1. keep the SquashFS sort file for now, hoping it is still useful
  2. try removing the part about BG_FILES from config/chroot_local-includes/lib/live/config/0000-readahead and measure boot time from DVD and USB, both in a problematic case (3.9) and optimal case (like the squashfs.sort used for 3.8)
  3. if removing background readahead helps, try removing the foreground readahead as well and measure things again
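
For context on item 1: a SquashFS sort file, as passed to `mksquashfs -sort`, lists one path per line followed by an integer priority (-32768 to 32767), and, as far as I understand, higher-priority files are placed earlier in the image. Below is a minimal sketch of how such a file could be generated from a list of files in the order they were accessed during boot. The helper name `build_sort_file` and the example paths are hypothetical; this is not the Tails build code.

```python
def build_sort_file(accessed_paths, top_priority=32767):
    """Return sort-file text where earlier-accessed files get higher priority.

    accessed_paths: paths in the order they were first read during boot
    (hypothetical input; how that list is collected is out of scope here).
    """
    lines = []
    seen = set()
    priority = top_priority
    for path in accessed_paths:
        if path in seen:
            # Keep only the first access of each file.
            continue
        seen.add(path)
        lines.append(f"{path} {priority}")
        # Never go below the lowest priority mksquashfs accepts.
        priority = max(priority - 1, -32768)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative paths only.
    print(build_sort_file(["usr/bin/Xorg", "usr/bin/gnome-shell", "usr/bin/Xorg"]), end="")
```

Listing too many files dilutes the benefit of the ordering, which is one way a sort file that was not filtered enough could end up hurting rather than helping.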

Files


Subtasks


Related issues

Blocked by Tails - Bug #16134: devel branch FTBFS since torbrowser-launcher 0.3.1-2 was uploaded to sid Resolved 2018-11-17

History

#1 Updated by intrigeri 2018-11-17 18:44:31

Our work on Feature #15292 will deprecate booting from DVD except for VMs, where readahead does not matter much (or at all): most Linux distros have no such mechanism when booting from the hard disk, which is effectively the case when booting a VM from an ISO stored on a hard disk. So I’ll skip the measurement part and directly remove all readahead.

#2 Updated by intrigeri 2018-11-17 18:52:09

And I’ll also remove the SquashFS sort system: it’s not particularly useful when booting from a USB stick, increases the cost of releasing Tails, and is useless for VMs.

#3 Updated by intrigeri 2018-11-17 19:02:23

  • Status changed from Confirmed to In Progress
  • Feature Branch set to feature/15915-remove-readahead

#4 Updated by intrigeri 2018-11-17 20:06:37

  • blocked by Bug #16134: devel branch FTBFS since torbrowser-launcher 0.3.1-2 was uploaded to sid added

#5 Updated by intrigeri 2018-11-17 21:05:40

Note to myself: pay attention to reproducibility (who knows, maybe the sort file is actually a key factor of building the SquashFS reproducibly?)
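
A rough way to check this is to build the same commit twice and compare checksums of the resulting images. Here is a minimal sketch, assuming two build artifacts on disk; the function names are hypothetical and this is not how the Tails CI does it.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(path_a, path_b):
    """True if both build artifacts are bit-for-bit identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If the two images differ only once the sort file is removed, that would suggest the sort file is what was pinning down the file order in the SquashFS.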

#6 Updated by intrigeri 2018-11-18 18:46:40

  • % Done changed from 0 to 10

Interestingly, my hunch seems to have been wrong: in VMs on my system (ISO backed by pretty fast NVMe), removing both the readahead and the SquashFS sort file makes the boot process slower and (perhaps more importantly) there’s ~5 seconds of delay, with the hourglass cursor, between the time when the GDM desktop appears and the time when the Greeter interface is displayed, while everything pops up at about the same time without these changes. It also makes the GNOME login process slower after clicking through the Greeter.

I’m running the test suite on another machine to check if the difference is measurable there (relevant to the throughput of our CI) and I’ll test on bare metal USB: if things are also slower when booting from USB, I’ll test with only one change at a time, among the 2 I’ve made.

#7 Updated by intrigeri 2018-11-18 19:58:38

ThinkPad X200, USB:

  • baseline (readahead + SquashFS sort file): 96s to the Greeter + 25s to the GNOME desktop
  • no readahead, no SquashFS sort file: 128s to the Greeter + 42s to the GNOME desktop

=> one of these 2 things is definitely useful. I’ll find out which one.

Speculating: if it’s the readahead, then let’s keep it for now and reconsider once we ship a pre-compiled AppArmor policy (aka. binary cache): most of the benefit of the readahead is gained while the boot is blocked by CPU-bound tasks that don’t use much I/O; at the moment, on not-super-fast machines, I suspect this is dominated by apparmor.service compiling the policy; once we remove that big CPU-bound blocker from the boot process, it may be that the readahead won’t be of much use anymore.
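
To illustrate the idea being discussed: a background readahead reads files that will be needed later so that they are already in the page cache by the time they are opened for real, which only pays off if the I/O overlaps with something else (such as CPU-bound work). Here is a minimal sketch of that technique. It is not the actual 0000-readahead script, and the function names are hypothetical.

```python
import threading


def read_into_cache(paths, chunk_size=1 << 20):
    """Read each file fully and discard the data; return total bytes read.

    The only purpose is the side effect of pulling the file contents
    into the kernel page cache.
    """
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    total += len(chunk)
        except OSError:
            # Missing or unreadable files are skipped.
            continue
    return total


def start_background_readahead(paths):
    """Warm the cache in a daemon thread so the caller is not blocked."""
    thread = threading.Thread(target=read_into_cache, args=(paths,), daemon=True)
    thread.start()
    return thread
```

If whatever comes next needs the same storage device, the background reads compete with it for I/O instead of overlapping with CPU work, which would be consistent with the slowdown described in the ticket description.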

#8 Updated by intrigeri 2018-11-19 08:28:08

ThinkPad X200, USB:

  • baseline (readahead + SquashFS sort file): 96s to the Greeter + 25s to the GNOME desktop
  • no readahead, no SquashFS sort file: 128s to the Greeter + 42s to the GNOME desktop
  • no readahead, SquashFS sort file: 85s to the Greeter + 25s to the GNOME desktop

The results I see in VMs on my system (ISO backed by pretty fast NVMe) and in the test suite (another machine, pretty fast NVMe as well) are mostly consistent with those: the only exception is that there’s no statistically significant improvement when removing only the readahead.

Conclusions:

  • sorting the SquashFS is very useful even when booting from bare metal USB or from an ISO backed by fast storage => let’s keep it, even though it has a cost (takes time during the release process which increases our time to remediation, decreases the chances we put out emergency releases, and costs us money)
  • the readahead is detrimental when booting from bare metal USB and useless when booting from an ISO backed by fast storage => let’s drop it, save 10s of boot time on a typical laptop, and make our code simpler :)
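
For reference, the totals behind these conclusions can be recomputed from the ThinkPad X200 numbers above. This is a small sketch; the dictionary keys are just labels chosen for this example.

```python
# (seconds to the Greeter, seconds from there to the GNOME desktop)
MEASUREMENTS = {
    "baseline": (96, 25),                   # readahead + SquashFS sort file
    "no readahead, no sort file": (128, 42),
    "no readahead, sort file": (85, 25),
}


def totals(data):
    """Total boot time in seconds for each configuration."""
    return {name: greeter + desktop for name, (greeter, desktop) in data.items()}


def delta_vs_baseline(data, baseline="baseline"):
    """Difference in total boot time relative to the baseline (negative = faster)."""
    t = totals(data)
    return {name: total - t[baseline] for name, total in t.items()}


if __name__ == "__main__":
    for name, delta in delta_vs_baseline(MEASUREMENTS).items():
        print(f"{name}: {totals(MEASUREMENTS)[name]}s ({delta:+d}s)")
```

That gives 121s for the baseline, 170s with neither mechanism (+49s), and 110s with the sort file but no readahead (-11s), i.e. the roughly 10s saving mentioned above.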

(Oh, and to whoever might be wondering: as indicated by the fact this does not block any FT tracking ticket, I’m doing this as a volunteer, for fun, without clocking, purely as pleasurable procrastination, even though it might result in minor improvements to our “Make it easier to switch between a Tails contextual identity and another identity outside of Tails” strategic planning goal :)

#9 Updated by intrigeri 2018-11-30 15:48:24

I’ve not checked in depth but it seems that this branch has failures on Jenkins that I’m not used to seeing there. I don’t know if that’s a bug in our test suite which would usually not be triggered, or a “real” regression caused by this branch.

#10 Updated by intrigeri 2018-12-03 13:46:10

  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA
  • Type of work changed from Test to Code

intrigeri wrote:
> I’ve not checked in depth but it seems that this branch has failures on Jenkins that I’m not used to seeing there. I don’t know if that’s a bug in our test suite which would usually not be triggered, or a “real” regression caused by this branch.

The devel branch is affected as well and I’ve tracked it down to a regression in systemd v239 (Bug #16184). Recent builds from this branch don’t expose the problem more than the devel branch in a statistically meaningful way.

Tentatively assigning the review to anonym but let’s discuss this at the FT meeting later today :)

#11 Updated by intrigeri 2018-12-03 13:49:58

  • Subject changed from Consider dropping background readahead on boot to Drop background readahead on boot

#12 Updated by intrigeri 2019-01-04 15:20:56

  • Assignee deleted (anonym)

#13 Updated by lamby 2019-01-06 11:38:07

  • Assignee set to lamby

(QA)

#14 Updated by lamby 2019-01-06 13:14:39

Builds fine (see attached build log) and boots fine too (see dmesg.txt). Let me know if I should test anything in particular here. Nothing seems amiss…

#15 Updated by intrigeri 2019-01-06 15:52:53

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

Applied in changeset commit:tails|01833d854b1372c214c87739211e50dfe385458e.

#16 Updated by intrigeri 2019-01-06 15:53:29

  • Assignee deleted (intrigeri)

#17 Updated by anonym 2019-01-30 11:52:00

  • Status changed from Fix committed to Resolved