Bug #10757

tails-iuk 1.27 is extremely slow at applying upgrades

Added by anonym 2015-12-15 06:53:13 . Updated 2016-01-27 13:31:54 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Target version:
Start date:
2015-12-15
Due date:
% Done:

100%

Feature Branch:
iuk:bugfix/10757-slow-install
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Upgrader
Deliverable for:

Description

While testing Tails 1.8 it was discovered that the syncing introduced by the fix for Bug #10239 makes write operations extremely slow on the live mediam, and consequently applying upgrades takes much longer than before. In the test I did it took 55 minutes to apply the upgrade (the time to download exluded!).


Files


Subtasks


History

#1 Updated by anonym 2015-12-15 06:57:52

Slightly trimmed log discussing the issue:

(14:59:53) anonym: I'm afraid that my 1.7 -> 1.8 incremental upgrade has hanged...
(15:00:19) intrigeri: anonym: tell me the *download* hanged.
(15:01:02) anonym: intrigeri: nope it worked fine. it's been stuck on "your Tails device is being upgraded..." for what feels like unusually long now
(15:02:45) anonym: no errors reported anywhere (.xsession-errors, dmesg, syslog) (yet)
(15:05:49) anonym: ok, it seems to work *but* be extremely slow
(15:06:11) anonym: so is this caused by the fsync:ing stuff?
(15:07:13) intrigeri: possibly
(15:07:17) intrigeri: which would be good news, no?
(15:07:37) intrigeri: as in: increases the chances the device still boots post-upgrade.
(15:08:14) anonym: intrigeri: sure, but I'm afraid that this takes so long that users will think something is wrong
(15:08:42) anonym: it seems the IO operations have a 400 KB/s throughput...
(15:08:58) intrigeri: anonym: aah, OK, so that's because of the mount -o sync
(15:09:02) intrigeri: anonym: got it.
(15:09:12) intrigeri: anonym: wow.
(15:09:17) intrigeri: and "oops"
(15:09:17) anonym: sometimes less
(15:09:32) intrigeri: hmm
(15:09:35) bertagaz: need a break, bbl
(15:10:42) intrigeri: anonym: the "good" news is that it'll only affect 1.7->1.8 and we have 6 weeks to fix that in time for 2.0->2.2.
(15:11:26) anonym: if we are going to provide incremental upgrades from the beta/RCs it will affect that too
(15:11:33) intrigeri: right.
(15:11:35) anonym: but I agree it's a better situation than ususal
(15:13:10) anonym: NM was shut down half an hour ago... so at least that long
(15:13:10) anonym: yay
(15:13:49) intrigeri: OK, so the release notes need a known issue about it.
(15:13:57) anonym: definitely
(15:21:34) intrigeri: essentially all write ops are done with tar, so if we can tell it to write data in bigger chunks, we're good.
(15:27:26) anonym: I'm trying to make sense of some weird things I can see in the incremental upgrade
(15:27:41) anonym: now when it's happening in slow motion...
(15:30:19) anonym: like, *after* it has extracted 1.8.squashfs into live/, it seems it's copied into tmp/tmp.XXXXXXX/system.tar_tempdir/tar/live/ as well
(15:30:45) anonym: wow, this is absurdly slow
(15:32:01) intrigeri: :(
(15:32:37) intrigeri: OK, I won't fix it today, I'll relax, look elsewhere for now, and schedule time to read your bug report and act on it *later*.
(15:37:16) intrigeri: hmm
(15:37:16) intrigeri: 
        $self->record_installed_squashfs($self->squashfs_in_archive(
            $self->locate($archive_filename)
        ));

(15:37:38) intrigeri: it might be that squashfs_in_archive is stupid enough to *extract* stuff on disk.
(15:39:26) intrigeri: which would explain what you see.
(15:42:40) intrigeri: but that's only possible if Tails::IUK::Archive::Tar's constructor or Archive::Tar::Wrapper's list_all method extracts stuff.
(15:43:11) intrigeri: or Tails::IUK::Archive::Tar::Wrapper->new
(15:45:03) intrigeri: no, not the latter.
(15:47:27) intrigeri: anonym: OK, got it. record_installed_squashfs($self->squashfs_in_archive calls Tails::IUK::Archive::Tar->new which calls read_from which calls tar_wrapper->read which indeed extracts to a tempdir.
(15:48:49) anonym: intrigeri: I just find it extra odd that the file first appeared in live/ and *then* in tmp/
(15:49:04) anonym: so the upgrade took 55 minutes *excluding* the download time
(15:49:47) anonym: hmm. I wonder if that has had anything to do with the broken upgrades
(15:50:05) anonym: that the disk space runs out despite our conservative requirements on how much free space must be available
(15:50:08) intrigeri: I doubt it.
(15:50:48) intrigeri: Well, I think our conservative requirements are that conservative exactly because of this sub-optimal handling of disk I/O.

#2 Updated by intrigeri 2015-12-15 07:04:48

  • Category deleted (Installation)
  • Affected tool changed from Installer to Upgrader

#3 Updated by anonym 2015-12-15 12:43:36

Here’s an inotifywait monitor log of the installation. Strangely, at

I was hoping to catch the extra copy of 1.8.squashfs. It happens between:

18:16:53: /lib/live/mount/medium/tmp/tmp.kZWwiNf42g/system.tar_tempdir/tar/live CREATE,ISDIR
18:30:35: /lib/live/mount/medium/tmp/tmp.kZWwiNf42g/system.tar_tempdir/tar/live/initrd.img CREATE


and is created at

/lib/live/mount/medium/tmp/tmp.kZWwiNf42g/system.tar_tempdir/tar/live/1.8.squashfs


even though it was not logged, which is strange. I ran a bit with more events and got this:

19:03:09: /lib/live/mount/medium/live/1.8.squashfs CREATE
19:03:09: /lib/live/mount/medium/live/1.8.squashfs OPEN
19:17:32: /lib/live/mount/medium/live/1.8.squashfs CLOSE_WRITE,CLOSE
19:19:10: /lib/live/mount/medium/tmp/tmp.1lMqXNhgtg/system.tar_tempdir/tar/live/1.8.squashfs CREATE
19:19:10: /lib/live/mount/medium/tmp/tmp.1lMqXNhgtg/system.tar_tempdir/tar/live/1.8.squashfs OPEN
19:23:10: /lib/live/mount/medium/live/1.8.squashfs OPEN
19:23:10: /lib/live/mount/medium/live/1.8.squashfs CLOSE_NOWRITE,CLOSE
19:23:32: /lib/live/mount/medium/tmp/tmp.1lMqXNhgtg/system.tar_tempdir/tar/live/1.8.squashfs OPEN
19:23:32: /lib/live/mount/medium/tmp/tmp.1lMqXNhgtg/system.tar_tempdir/tar/live/1.8.squashfs CLOSE_NOWRITE,CLOSE
19:31:42: /lib/live/mount/medium/tmp/tmp.1lMqXNhgtg/system.tar_tempdir/tar/live/1.8.squashfs CLOSE_WRITE,CLOSE
19:32:54: /lib/live/mount/medium/tmp/tmp.1lMqXNhgtg/system.tar_tempdir/tar/live/1.8.squashfs DELETE


So, yeah, there clearly is some irrelevant stuff going on in that temp dir.

#4 Updated by aboules 2015-12-17 10:48:06

adding on user request in #tails:
(11:32:15)Hi, how long is the upgrade from 1.7 to 1.8 supposed to run on a C2Duo? I’m at 45 min CPU time already!
(19:01:50)my upgrade took 50min on a C2Duo
(19:02:25)CPU time, actual time some longer but sure less than 1:15h
(19:08:41)[…] 1 kernel was almost constantly at 100% by the upgrader process.
(19:09:03)unfortunate example (screenshot util took away CPU) https://filetea.me/t1sr9DQS8bNRXxmv3MAOov0GA <— this file is attached to this note
(19:11:55)also USB keys light wasn’t flashing too often during upgrade (i.e. CPU was the bottleneck on my system, not IO)

HTH, aboules

#5 Updated by intrigeri 2015-12-20 23:09:00

  • Assignee changed from intrigeri to anonym
  • QA Check set to Ready for QA
  • Feature Branch set to iuk:bugfix/10757-slow-install

OK, so I went with the simplest possible fix. It does not do any clever optimization, does not attack the “more I/O than needed” problem, but here it does make applying the upgrade a whole lot faster. I tested this way:

  1. install Tails 1.8 to a USB drive
  2. apply commit 2e496b882617186f4c0511bd59218af753472e2f (from iuk.git) by hand
  3. run tails-upgrade-frontend-wrapper

Last time I tried that without step 2, it took a while after downloading (like, dozens of minutes). This time it took max very few minutes (I wasn’t monitoring it constantly).

What do you think?

#6 Updated by intrigeri 2015-12-20 23:09:16

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

#7 Updated by anonym 2016-01-03 15:14:10

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 10 to 100
  • QA Check changed from Ready for QA to Pass

intrigeri wrote:
> OK, so I went with the simplest possible fix. It does not do any clever optimization, does not attack the “more I/O than needed” problem, but here it does make applying the upgrade a whole lot faster.

To not risk further mess in this area for the 2.0 release (so the next upgrade to >2.0 will work fine), this is definitely the preferred strategy. I’m not even sure we care about the extra IO; feel free to open a ticket with my investigation copy-pasted if you feel like you will ever have the time to prioritize doing this.

> I tested this way:
>
> # install Tails 1.8 to a USB drive
> # apply commit 2e496b882617186f4c0511bd59218af753472e2f (from iuk.git) by hand
> # run tails-upgrade-frontend-wrapper
>
> Last time I tried that without step 2, it took a while after downloading (like, dozens of minutes). This time it took max very few minutes (I wasn’t monitoring it constantly).

I reproduced your test successfully. Applying the iuk took about 5 seconds in my VM with an SSD backed virtual USB disk, so that’s definitely an improvement. :)

> What do you think?

“Merged” is what I think! :) I won’t bother releasing a new talis-iuk until 2.0~rc1, though.

#8 Updated by intrigeri 2016-01-03 16:04:10

redmine@labs.riseup.net wrote (03 Jan 2016 15:14:10 GMT) :
> I’m not even sure we care about the extra IO; feel free to open a ticket with my investigation copy-pasted if you feel like you will ever have the time to prioritize doing this.

Seems unlikely.

> “Merged” is what I think! :) I won’t bother releasing a new talis-iuk until 2.0~rc1, though.

Woo!

#9 Updated by emmapeel 2016-01-17 16:03:20

Many users complain about it being slow (from 1.8.1 to 1.8.2 for example) and it was very slow for me too.

#10 Updated by intrigeri 2016-01-17 16:55:46

> Many users complain about it being slow (from 1.8.1 to 1.8.2 for example) and it was very slow for me too.

Yes, as documented in the “Known issues” section of the release announce.

#11 Updated by anonym 2016-01-27 13:31:54

  • Status changed from Fix committed to Resolved