Bug #17422

Broken upgrade path from 4.2 to 4.2.1: GDM doesn't start

Added by CyrilBrulebois 2020-01-10 20:56:35. Updated 2020-01-11 09:03:57.

Status: Resolved
Priority: Urgent
Assignee: intrigeri
Category:
Target version:
Start date:
Due date:
% Done: 100%
Feature Branch: bugfix/17422-iuk-v2-ownership-in-squashfs-diff
Type of work: Code
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

While testing the 4.2.1 release, it appears the IUKv2 thing isn’t ready yet. Starting from a 4.2 IMG, booted either on bare metal (X230, Intel, 8086:0166) or in a VM (QEMU, Cirrus, 1013:00b8), switching it to the test channel and deploying the upgrade to 4.2.1 seems to work fine. But upon reboot, GDM doesn’t start: the error screen lists the graphics card and instructs users to troubleshoot using the https://tails.boum.org/gdm page.

At this stage, I’m considering it a blocker for 4.2.1 (which would ship Tor Browser 9.0.4).


History

#1 Updated by CyrilBrulebois 2020-01-11 00:28:20

Warning: You can safely pretend I don’t know anything about IUKs, upgrades, stacked filesystems, and the like.

That being said, looking around in the devices, I’m seeing this after an upgrade:

  • 4.2.1.squashfs
  • filesystem.squashfs

while a pristine 4.2 image only has:

  • filesystem.squashfs

Using unsquashfs as root on my host system, I’m getting similar contents in the resulting squashfs-root-filesystem.{original,upgraded}/var/lib/gdm3:

drwxr-xr-x   4 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.original/var/lib/gdm3/
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.original/var/lib/gdm3/.local
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.original/var/lib/gdm3/.local/share
drwxr-xr-x   2 112     200          4096 Feb  9  2019 squashfs-root-filesystem.original/var/lib/gdm3/.local/share/applications
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.original/var/lib/gdm3/.config
drwxr-xr-x   2 112     200          4096 Feb  9  2019 squashfs-root-filesystem.original/var/lib/gdm3/.config/dconf
drwxr-xr-x   4 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.upgraded/var/lib/gdm3/
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.upgraded/var/lib/gdm3/.local
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.upgraded/var/lib/gdm3/.local/share
drwxr-xr-x   2 112     200          4096 Feb  9  2019 squashfs-root-filesystem.upgraded/var/lib/gdm3/.local/share/applications
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.upgraded/var/lib/gdm3/.config
drwxr-xr-x   2 112     200          4096 Feb  9  2019 squashfs-root-filesystem.upgraded/var/lib/gdm3/.config/dconf

with 112:200 matching Debian-gdm:Debian-gdm in the Tails environment.

But the 4.2.1.squashfs, which I guess ends up being stacked on top of the probably-untouched “main” filesystem.squashfs, has:

drwxr-xr-x   4 root     root         4096 Jan  9 17:09 squashfs-root-4.2.1/var/lib/gdm3/
drwxr-xr-x   3 root     root         4096 Jan  9 17:09 squashfs-root-4.2.1/var/lib/gdm3/.local
drwxr-xr-x   2 root     root         4096 Jan  9 17:09 squashfs-root-4.2.1/var/lib/gdm3/.local/share
drwxr-xr-x   2 root     root         4096 Jan  9 17:09 squashfs-root-4.2.1/var/lib/gdm3/.config

I guess this shadows the ownership set in the original filesystem, which in turn explains why GDM cannot open the X log file: it has to create an xorg subdirectory under /var/lib/gdm3/.local/share, which is now owned by root. (As seen by setting a root password and inspecting why the gdm.service unit failed.)
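
To illustrate the suspected shadowing, here is a minimal Python sketch. It assumes a deliberately simplified model of a stacked filesystem, in which the topmost layer that contains a path decides its ownership; real union filesystems have more rules than that. The function names are hypothetical, and the two listings are shortened copies of the ones quoted in this note.

```python
from typing import Dict, List, Tuple

Owner = Tuple[str, str]  # (owner, group) exactly as printed in the listing


def parse_listing(text: str, root: str) -> Dict[str, Owner]:
    """Map each listed path, made relative to `root`, to its (owner, group)."""
    owners: Dict[str, Owner] = {}
    for line in text.strip().splitlines():
        # Fields: mode, links, owner, group, size, month, day, time-or-year, path
        fields = line.split(None, 8)
        owner, group, path = fields[2], fields[3], fields[8]
        relative = "/" + path[len(root):].strip("/")
        owners[relative] = (owner, group)
    return owners


def effective_owner(path: str, layers: List[Dict[str, Owner]]) -> Owner:
    """Return the owner from the topmost layer (last in `layers`) that has `path`."""
    for layer in reversed(layers):
        if path in layer:
            return layer[path]
    raise KeyError(path)


# Shortened copies of the listings above (assumed representative).
LOWER = """
drwxr-xr-x   3 112     200          4096 Jan  6 17:25 squashfs-root-filesystem.original/var/lib/gdm3/.local/share
drwxr-xr-x   2 112     200          4096 Feb  9  2019 squashfs-root-filesystem.original/var/lib/gdm3/.local/share/applications
"""
UPPER = """
drwxr-xr-x   2 root     root         4096 Jan  9 17:09 squashfs-root-4.2.1/var/lib/gdm3/.local/share
"""

lower = parse_listing(LOWER, "squashfs-root-filesystem.original")
upper = parse_listing(UPPER, "squashfs-root-4.2.1")
layers = [lower, upper]  # bottom layer first, top layer last

for path in sorted(lower):
    print(path, effective_owner(path, layers))
```

In this model, /var/lib/gdm3/.local/share resolves to root:root because the upper layer lists it, while .local/share/applications, which only exists in the lower layer, keeps 112:200.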

→ Fix the ownership issues, save the day?

#2 Updated by CyrilBrulebois 2020-01-11 00:36:06

(Same warning applies.)

And it seems Tails_amd64_4.2_to_4.2.1.iuk ships the file I was just looking at (seen under squashfs-root/overlay/live/4.2.1.squashfs after unsquashfs-ing the IUK). This suggests that fixing the IUK generation might be sufficient to unblock us, without having to tell users to fiddle with a 4.2 release that cannot be upgraded automatically.

#3 Updated by CyrilBrulebois 2020-01-11 01:41:17

(Same warning applies.)

Toying with config/chroot_local-includes/usr/src/iuk/lib/Tails/IUK.pm a little, particularly the create_squashfs_diff sub, it seems permissions are all fine with the mounting of the old and new root filesystems ($old_squashfs_mount and $new_squashfs_mount), also with the union mount ($union_mount).

But then, I’m seeing mksquashfs called with these options:

method _build_mksquashfs_options () { [
    qw{-no-progress -noappend},
    qw{-all-root},
    qw{-comp xz -Xbcj x86 -b 1024K -Xdict-size 1024K},
]}

Looks to me like -all-root is not our friend:

       -all-root
           make all files owned by root.

Looking at the history of iuk.git (since that was introduced before the merge into the main Tails repository), this dates back to:

commit b578872ba2167da7dbc106a9d20c88e523ab73b3
Author: intrigeri <intrigeri@boum.org>
Date:   Sun Nov 24 21:44:50 2019 +0000

    Start implementing IUK format v2 (Feature #6876)

which seems consistent with a seemingly IUKv2-specific problem, which didn’t exist with IUKv1.

#4 Updated by CyrilBrulebois 2020-01-11 02:09:29

To verify, I’ve implemented these steps:

  • dropped -all-root from mksquashfs options;
  • dropped the last line, with compression options, so that I would get the default and quicker gzip compression (I was being a little impatient to know whether that would work);
  • generated a new IUK from 4.2 to 4.2.1;
  • extracted its /live/4.2.1.squashfs;
  • tweaked the filesystem of the virtual machine that was already upgraded from 4.2 to 4.2.1, removing the broken /live/4.2.1.squashfs and replacing it with the brand new one extracted from the hacked IUK.

I’m seeing GDM start up just fine, Tails pretend it’s running 4.2.1 (looking at /etc/os-release), Tor Browser 9.0.4 running properly, etc.
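
The manual check above could be complemented by an automated ownership comparison between two extracted trees, for example the regenerated diff and the full new filesystem. The following Python sketch is only an illustration under assumptions: both trees are assumed to have been extracted as root so that ownership is preserved on disk, and all function names are hypothetical.

```python
import os
from typing import Callable, Iterator, List, Tuple

Owner = Tuple[int, int]  # (uid, gid)


def lstat_owner(path: str) -> Owner:
    """Return (uid, gid) of `path` without following symlinks."""
    st = os.lstat(path)
    return (st.st_uid, st.st_gid)


def relative_paths(root: str) -> Iterator[str]:
    """Yield every directory and file under `root`, relative to `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def ownership_mismatches(
    reference: str,
    candidate: str,
    owner_of: Callable[[str], Owner] = lstat_owner,
) -> List[Tuple[str, Owner, Owner]]:
    """List paths present in both trees whose ownership differs."""
    mismatches = []
    for rel in sorted(relative_paths(candidate)):
        ref_path = os.path.join(reference, rel)
        if not os.path.lexists(ref_path):
            continue  # only in the candidate tree: nothing to compare against
        ref_owner = owner_of(ref_path)
        cand_owner = owner_of(os.path.join(candidate, rel))
        if ref_owner != cand_owner:
            mismatches.append((rel, ref_owner, cand_owner))
    return mismatches
```

With the broken 4.2.1.squashfs extracted as the candidate and the full 4.2.1 filesystem as the reference, one would expect var/lib/gdm3 and its subdirectories to be reported, given the listings in note #1. The owner_of parameter exists only so that the lookup can be replaced in tests that do not run as root.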

#5 Updated by intrigeri 2020-01-11 07:34:47

  • Target version set to Tails_4.2.2

This matches the analysis I did last night (couldn’t sleep…).
Thanks a lot for testing the fix, this will save me some time!
I’m now preparing a branch with a fix based on this (we need -all-root in one place, just not everywhere).

#6 Updated by intrigeri 2020-01-11 07:45:12

  • Status changed from Confirmed to In Progress
  • Target version deleted (Tails_4.2.2)
  • Feature Branch set to bugfix/17422-iuk-v2-ownership-in-squashfs-diff

#7 Updated by intrigeri 2020-01-11 07:53:06

  • Status changed from In Progress to Needs Validation
  • Assignee changed from intrigeri to CyrilBrulebois
  • Target version set to Tails_4.2.2

I have pushed a tentative (really simple and tiny) fix for this (really stupid) bug, along with the test cases I wrote last night.

#8 Updated by CyrilBrulebois 2020-01-11 08:48:21

  • Description updated
  • Assignee changed from CyrilBrulebois to intrigeri
  • Target version deleted (Tails_4.2.2)
  • Private changed from Yes to No

#9 Updated by CyrilBrulebois 2020-01-11 08:54:05

  • Target version set to Tails_4.2.2

(If only Redmine wouldn’t change metadata when one is just updating description + unmarking as private…)

The proposed change looks good to me: unlike what I tried, it keeps -all-root where you said it is needed, that is, for one of the two mksquashfs calls (when generating the IUK itself, but not when generating the SquashFS diff).

Feel free to merge, and to redo the IUKv2 dance.
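
The per-call distinction discussed in this note can be sketched in Python as follows. This is not the actual fix, which lives in the Perl code on the feature branch and is not reproduced here; the function names and the all_root parameter are hypothetical, and only the mksquashfs options quoted in note #3 are used.

```python
from typing import List


def build_mksquashfs_options(all_root: bool) -> List[str]:
    """Build the option list, adding -all-root only when requested."""
    options = ["-no-progress", "-noappend"]
    if all_root:
        # Forces every file in the resulting image to be owned by root.
        options.append("-all-root")
    options += ["-comp", "xz", "-Xbcj", "x86", "-b", "1024K", "-Xdict-size", "1024K"]
    return options


def mksquashfs_command(source: str, dest: str, all_root: bool) -> List[str]:
    """Return the argument vector for one mksquashfs call (it is not executed here)."""
    return ["mksquashfs", source, dest, *build_mksquashfs_options(all_root)]


# SquashFS diff: ownership from the union mount must be preserved.
diff_cmd = mksquashfs_command("union", "4.2.1.squashfs", all_root=False)
# IUK itself: per this thread, -all-root is still wanted for this call.
iuk_cmd = mksquashfs_command("iuk-tree", "Tails_amd64_4.2_to_4.2.1.iuk", all_root=True)

print(" ".join(diff_cmd))
print(" ".join(iuk_cmd))
```

The source directory names ("union", "iuk-tree") are placeholders chosen for the example. Making the flag an explicit argument keeps the ownership decision visible at each call site instead of hiding it in a shared default.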

#10 Updated by intrigeri 2020-01-11 09:02:23

  • Status changed from Needs Validation to In Progress

Applied in changeset commit:tails|90d6dc1edb60de251a1e6f694b6925aaf01520da.

#11 Updated by intrigeri 2020-01-11 09:03:11

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset commit:tails|28c7de61354aa3ea33da0e95c10dc52c14851dc2.

#12 Updated by intrigeri 2020-01-11 09:03:57

Thank you for reviewing.

> Feel free to merge

Done.

> and to redo the IUKv2 dance.

Let’s coordinate this on XMPP.