Bug #15695

Avoid breaking automatic upgrades to Tails 3.9

Added by intrigeri 2018-06-30 11:33:08 . Updated 2018-09-05 16:14:07 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Build system
Target version:
Start date:
2018-06-30
Due date:
% Done:

100%

Feature Branch:
bugfix/15695-avoid-breaking-automatic-upgrades-to-tails-3-9
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Upgrader
Deliverable for:

Description

As noticed thanks to Bug #15419, our devel branch changes the GID of the debian-tor user:

<code class="diff">
-vboxsf:x:112:
-monkeysphere:x:113:
-debian-tor:x:114:tor-launcher
-lpadmin:x:115:
+monkeysphere:x:112:
+debian-tor:x:113:tor-launcher
+lpadmin:x:114:
+vboxsf:x:115:
</code>

Files


Subtasks


Related issues

Related to Tails - Bug #15407: Prevent system user uid:s and gid:s from changing between releases Resolved 2018-06-28
Related to Tails - Bug #15424: Use fixed UID and GID for debian-tor Rejected 2018-03-16
Related to Tails - Bug #13426: Tor does not start on Tails 3.0.1 automatically upgraded from 3.0 Resolved 2017-07-05
Blocks Tails - Feature #15334: Core work 2018Q3: Foundations Team Resolved 2018-02-20
Blocks Tails - Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades Resolved 2018-06-28

History

#1 Updated by intrigeri 2018-06-30 11:34:29

As explained in commit:a774ba2aba7012f3f5bc630b0f3dab8ac696dbd7, “Since the virtualbox 5.2.8-dfsg-7 upload, the virtualbox-guest-utils package depends on virtualbox-guest-dkms”. I think this new dependency explains why virtualbox-guest-utils, that creates the vboxsf group, is installed later than before.

#2 Updated by intrigeri 2018-06-30 11:34:50

  • related to Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades added

#3 Updated by intrigeri 2018-06-30 11:36:58

#4 Updated by intrigeri 2018-06-30 12:46:24

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

Brainstorming our options:

  1. ship a patched virtualbox-guest-utils package that removes the added dependency and hope that’s enough to fix this instance of the more general problem (while we’re at it, we can as well backport the latest version from sid, that adds compatibility with Linux 4.17; too bad, virtualbox is now in stretch-backports which could have given us the opportunity to stop shipping it via our custom APT repo); that’s relatively cheap and easy, and surely an acceptable way to postpone the problem, hoping we have more time to address its root cause next time another instance of this problem pops up.
  2. go back to the hacks from Bug #15424: change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent; so while my concern about killing processes outside of the chroot remains, limiting the kill spree to gpg-agent processes might be safe enough; in our current Vagrant VM, I see no user that has obvious reasons to have gpg-agent running, and indeed I see no such process running during a build. That’s also relatively cheap and easy, and most of the work has been done already. It’l be wasted work at some point if overlayfs is not affected by the bug, but well, at least we’re covered until Feature #8415 is done.
  3. find some way to stabilize all UIDs/GIDs (Bug #15407): looks like more work than the two first options, and that work will be wasted if overlayfs is not affected by the bug, which makes this option very unappealing to me.
  4. complete Feature #8415 (overlayfs) in time for Tails 3.9… assuming overlayfs is not affected by the bug that makes us worry about changing UIDs/GIDs (to be verified on Bug #15689): there’s not that much work left to do and we have to do it soon anyway, so clearly it would be the best place to invest our time in; but it’s not clear whether it’s realistic to do that in the next 1.5 months

I’m currently leaning towards the 2nd option (stabilize just the UIDs/GIDs we want), with the 1st one as a fallback. But depending on how our first shots at Feature #15091 and Feature #15023 go, and on how we distribute the work on these two big tasks, who knows, Feature #8415 might be doable. I’ll come back here in a few weeks and depending on our progress on these other fronts, I’ll pick the best approach among the options that will be realistic at this point.

#5 Updated by intrigeri 2018-06-30 12:48:24

  • related to Bug #15407: Prevent system user uid:s and gid:s from changing between releases added

#6 Updated by intrigeri 2018-06-30 12:48:38

  • related to Bug #15424: Use fixed UID and GID for debian-tor added

#7 Updated by intrigeri 2018-06-30 12:51:27

  • related to deleted (Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades)

#8 Updated by intrigeri 2018-06-30 12:51:33

  • blocks Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades added

#9 Updated by intrigeri 2018-06-30 12:53:38

  • related to Bug #13426: Tor does not start on Tails 3.0.1 automatically upgraded from 3.0 added

#10 Updated by intrigeri 2018-07-27 08:55:48

intrigeri wrote:
> I’m currently leaning towards the 2nd option (stabilize just the UIDs/GIDs we want), with the 1st one as a fallback. But depending on how our first shots at Feature #15091 and Feature #15023 go, and on how we distribute the work on these two big tasks, who knows, Feature #8415 might be doable.

If I magically have time to do Feature #8415 in time for the freeze, great, but let’s not count on it => I think we should go for the 2nd option, aiming to have something merged by August 10. I’ll try to find a FT member who can handle this.

#11 Updated by intrigeri 2018-07-27 09:09:13

lamby will take a look and tell me by Aug 11 whether he wants it (tentative budget: 2h).

#12 Updated by intrigeri 2018-07-27 09:12:41

  • Assignee changed from intrigeri to lamby

#13 Updated by intrigeri 2018-08-12 09:55:13

intrigeri wrote:
> lamby will take a look and tell me by Aug 11 whether he wants it (tentative budget: 2h).

Ping? The freeze is on the 15th so if you don’t want it, I’ll work on it tomorrow or Tuesday; and if you want it, great :)

#14 Updated by lamby 2018-08-12 12:09:38

  • Assignee changed from lamby to intrigeri

Thanks for the ping. I’ve spent about 1h looking at this issue and trying to understand it.

In my attempt to properly understand this issue, I dived into the code, resulting in the following entirely-untested commits: https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9. These might be worth merging anyway as it’s essentially “better commenting”

However, I got somewhat stuck on:

> change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent

Three questions arise for me here: why would anything be running at all? Why would that be gpg-agent and not, well, something running as debian-tor? Why do we not do (or need?) the kill-process dance for the tails-persistent-setup chroot hack right now?

(Some quick personal notes for this bug: See Bug #13426 for the background on why this breaks upgrades, the root cause being probably in aufs which will eventually be replaced with overlayfs in Feature #8415.)

#15 Updated by intrigeri 2018-08-13 07:21:43

  • Assignee changed from intrigeri to lamby

> In my attempt to properly understand this issue, I dived into the code, resulting in the following entirely-untested commits: https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9. These might be worth merging anyway as it’s essentially “better commenting”

Good idea! Two potential issues that prevent me from merging it though:

  • In Change_gid I think you meant -exec chgrp instead of -exec chown.
  • Change_gid 112 150 feels strange because we had TPS_GROUP_STEALER=$(getent group 122 | awk -F ':' '{print $1}') (note 112 vs. 122)

By the way, basing your work on top of my branch for Bug #15419 will help you validate it :) I’ve updated bugfix/15419-detect-uid-and-gid-changes so our CI builds it and I’ll share the FTBFS message with you as soon as a build has failed, which will demonstrate better what the practical problem we need to solve now is.

> However, I got somewhat stuck on:

>> change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent

> Three questions arise for me here: why would anything be running at all?
> Why would that be gpg-agent and not, well, something running as debian-tor?

On Bug #15424 we noticed that the monkeysphere user had a running process. I’ve assumed this was gpg-agent because on our ISO build CI jobs we have to kill those on exit in order to be able to tear down a mountpoint. IIRC that’s because the monkeysphere postinst performs GnuPG operations, that trigger gpg-agent startup (in the background, that’s how GnuPG 2 works). The tor package does no such thing which is why there’s no running process left behind running as the debian-tor user. Now, my hunch might be wrong (I don’t recall whether I reality-checked it) but we’ll know as soon as the workaround this ticket is now about is implemented :)

> Why do we not do (or need?) the kill-process dance for the tails-persistent-setup chroot hack right now?

Do you mean: why don’t we do it yet (in current devel branch)? Or why shouldn’t we do it for 3.9 as part of this ticket? I guess because we’ve never needed it so far :)

#16 Updated by lamby 2018-08-13 08:52:35

  • Assignee changed from lamby to intrigeri

> In Change_gid I think you meant -exec chgrp instead of -exec chown.

Fixed.

> Change_gid 112 150 feels strange because we had TPS_GROUP_STEALER=$(getent group 122 | awk -F ‘:’ ‘{print $1}’) (note 112 vs. 122)

Fixed. That’ll teach me to work in the sun…

> By the way, basing your work on top of my branch for Bug #15419 will help you validate it

Rebased.

#17 Updated by intrigeri 2018-08-13 08:58:55

  • Assignee changed from intrigeri to lamby

I’ll review the (side-)changes you’ve done already (and will likely merge them!) but let’s keep this assigned to you for tracking the main problem this ticket is about.

#18 Updated by intrigeri 2018-08-13 10:51:36

I’ve merged your changes into bugfix/15419-detect-uid-and-gid-changes, now building.

And here’s the current problem we’re trying to “solve” here:

Checking UIDs and GIDs stability
/usr/share/tails/build/passwd /etc/passwd differ: char 1310, line 25
/etc/passwd and/or /etc/group differs from expected:
--- /usr/share/tails/build/passwd   2018-06-09 15:22:28.000000000 +0000
+++ /etc/passwd 2018-08-13 09:48:10.437902256 +0000
@@ -22,8 +22,8 @@
 _apt:x:104:65534::/nonexistent:/bin/false
 messagebus:x:103:105::/var/run/dbus:/bin/false
 memlockd:x:105:110:memlockd system account,,,:/usr/lib/memlockd:/bin/false
-monkeysphere:x:106:113:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash
-debian-tor:x:107:114::/var/lib/tor:/bin/false
+monkeysphere:x:106:112:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash
+debian-tor:x:107:113::/var/lib/tor:/bin/false
 speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false
 colord:x:109:117:colord colour management daemon,,,:/var/lib/colord:/bin/false
 saned:x:110:118::/var/lib/saned:/bin/false

Content of '/etc/passwd':
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
_apt:x:104:65534::/nonexistent:/bin/false
messagebus:x:103:105::/var/run/dbus:/bin/false
memlockd:x:105:110:memlockd system account,,,:/usr/lib/memlockd:/bin/false
monkeysphere:x:106:112:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash
debian-tor:x:107:113::/var/lib/tor:/bin/false
speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false
colord:x:109:117:colord colour management daemon,,,:/var/lib/colord:/bin/false
saned:x:110:118::/var/lib/saned:/bin/false
pulse:x:111:119:PulseAudio daemon,,,:/var/run/pulse:/bin/false
hplip:x:112:7:HPLIP system user,,,:/var/run/hplip:/bin/false
Debian-gdm:x:113:121:Gnome Display Manager:/var/lib/gdm3:/bin/false
tails-persistence-setup:x:115:122::/home/tails-persistence-setup:/bin/false
clearnet:x:114:123::/home/clearnet:/bin/false
htp:x:116:124::/home/htp:/bin/false
tails-iuk-get-target-file:x:117:125::/home/tails-iuk-get-target-file:/bin/false
tails-upgrade-frontend:x:118:126::/home/tails-upgrade-frontend:/bin/false
tor-launcher:x:119:127::/home/tor-launcher:/bin/false
tails-install-iuk:x:120:128::/home/tails-install-iuk:/bin/false

--- /usr/share/tails/build/group    2018-06-09 15:22:28.000000000 +0000
+++ /etc/group  2018-08-13 09:48:10.505901231 +0000
@@ -48,10 +48,10 @@
 ssh:x:109:
 memlockd:x:110:
 ssl-cert:x:111:
-vboxsf:x:112:
-monkeysphere:x:113:
-debian-tor:x:114:tor-launcher
-lpadmin:x:115:
+monkeysphere:x:112:
+debian-tor:x:113:tor-launcher
+lpadmin:x:114:
+vboxsf:x:115:
 scanner:x:116:saned
 colord:x:117:
 saned:x:118:

Content of '/etc/group':
root:x:0:
daemon:x:1:
bin:x:2:
sys:x:3:
adm:x:4:
tty:x:5:
disk:x:6:
lp:x:7:
mail:x:8:
news:x:9:
uucp:x:10:
man:x:12:
proxy:x:13:
kmem:x:15:
dialout:x:20:
fax:x:21:
voice:x:22:
cdrom:x:24:
floppy:x:25:
tape:x:26:
sudo:x:27:
audio:x:29:pulse
dip:x:30:
www-data:x:33:
backup:x:34:
operator:x:37:
list:x:38:
irc:x:39:
src:x:40:
gnats:x:41:
shadow:x:42:
utmp:x:43:
video:x:44:
sasl:x:45:
plugdev:x:46:
staff:x:50:
games:x:60:
users:x:100:
nogroup:x:65534:
systemd-journal:x:101:
systemd-timesync:x:102:
systemd-network:x:103:
systemd-resolve:x:104:
input:x:106:
crontab:x:107:
netdev:x:108:
messagebus:x:105:
ssh:x:109:
memlockd:x:110:
ssl-cert:x:111:
monkeysphere:x:112:
debian-tor:x:113:tor-launcher
lpadmin:x:114:
vboxsf:x:115:
scanner:x:116:saned
colord:x:117:
saned:x:118:
pulse:x:119:
pulse-access:x:120:
Debian-gdm:x:121:
tails-persistence-setup:x:122:
clearnet:x:123:
htp:x:124:
tails-iuk-get-target-file:x:125:tails-install-iuk
tails-upgrade-frontend:x:126:
tor-launcher:x:127:
tails-install-iuk:x:128:

If these changes are innocuous, update these files in config/chroot_local-includes/usr/share/tails/build/.
See <del><a class='issue tracker-1 status-3 priority-4 priority-default closed child' href='/code/issues/15407' title='Prevent system user uid:s and gid:s from changing between releases'>Bug #15407</a></del> and <del><a class='issue tracker-1 status-3 priority-6 priority-default closed child' href='/code/issues/13426' title='Tor does not start on Tails 3.0.1 automatically upgraded from 3.0'>Bug #13426</a></del> for more context.
E: config/chroot_local-hooks/99-zzz_check_uids_and_gids failed (exit non-zero). You should check for errors.

#19 Updated by lamby 2018-08-13 14:08:50

  • Assignee changed from lamby to intrigeri

> let’s keep this assigned to you for tracking the main problem this ticket is about.

Thanks, good idea — as mentioned, I only really wrote and sent over these patches so I ensured I was understanding the issue, kinda like how making your own notes in a lecture is not the same as just reading someone else’s! :) Do appreciate they are somewhat distracting however.

> Here’s the current problem we’re trying to “solve” here:

Thanks for sending that over.

So, my reading of that diff of that we need to move the uid of vboxdrv from 112 → 151 and its gid from 112 → 151. That your quick assessment too? :)

#20 Updated by intrigeri 2018-08-13 15:13:59

  • Assignee changed from intrigeri to lamby

> So, my reading of that diff of that we need to move the uid of vboxdrv from 112 → 151 and its gid from 112 → 151. That your quick assessment too? :)

I don’t see vboxdrv involved anywhere in the error message I’ve pasted.

My quick assessment is that UIDs are all right (unchanged in the diff), only GIDs are problematic. Wrt. GIDs, 4 groups are involved and they’re fighting for a stable set of 4 GIDs (112-115). What we need to do is to restore the state from Tails 3.8 (captured in /usr/share/tails/build/group) i.e.:

  • vboxsf: GID 115 → 112
  • monkeysphere: GID 112 → 113
  • debian-tor: GID 113 → 114
  • lpadmin: GID 114 → 115

The “fun” part is that I think we can’t do this in place with atomic groupmod operations. So likely we need to temporarily give one of these groups another, free GID that’s not in the 112-115 range, and then change the GID for the 3 other groups in the right order, and finally give its correct GID to the group that was temporarily moved out of the way.

Needless to say, all this is ugly and I don’t expect it to get any better in the future… until we fix the root cause of the problem or devise a nicer solution to set all these IDs in stone.

#21 Updated by lamby 2018-08-13 15:46:12

> I don’t see vboxdrv involved anywhere in the error message I’ve pasted.

Ah, copy-paste fail. vboxsf.. :)

> […]

How about the attached? :)

#22 Updated by intrigeri 2018-08-13 18:09:28

  • Assignee changed from intrigeri to lamby

> How about the attached? :)

I think it will work but:

  • Can you confirm you tested it and the new check for changed UIDs/GIDs does not abort the build?
  • The first part of the patch hard-codes the current (as in, on our devel branch today) GID allocated to these 4 groups. As the mere existence of this very ticket shows, it’s a risky assumption. So next time these GIDs start dancing around, IMO this patch will make it harder to fix the problem than it could, because if we apply it as-is we may be changing the GID of unrelated groups in the future. So I’d rather see us do things like Change_gid $(gid_of monkeysphere) 1120. Makes sense?

#23 Updated by lamby 2018-08-14 07:21:27

> I’d rather see us do thingws like Change_gid $(gid_of monkeysphere) 1120

Oh, good idea. This is not only a more stable solution, it is also much cleaner in the code (self-documenting!). Please see the attached 2 patches which I have also pushed to https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9.

> Can you confirm you tested it and the new check for changed UIDs/GIDs does not abort the build?

Unfortunately I have not tested it (except in a local chroot actually, but that doesn’t count). Unfortunately, I am semi-VAC today and tomorrow (first time off since 12th July…) so I won’t be able to commit to testing this before the 15th unless it happens to rain this afternoon (!) which would mean that I would not need to be entertaining my dear mother :)

#24 Updated by intrigeri 2018-08-14 08:11:54

>> I’d rather see us do thingws like Change_gid $(gid_of monkeysphere) 1120

> Oh, good idea. This is not only a more stable solution, it is also much cleaner in the code (self-documenting!). Please see the attached 2 patches which I have also pushed to https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9.

I like it! I’ll take it over from here. I had a quick look and the only major problem I’ve noticed is that you’re giving these groups different GIDs than in Tails 3.8, which is precisely what we want to avoid. But the machinery is in place on your branch so I should be able to easily use it to fix the problem we’re tackling here :)

> […] unless it happens to rain this afternoon (!) which would mean that I would not need to be entertaining my dear mother :)

Indeed, rain is entertaining enough in itself :)

#25 Updated by intrigeri 2018-08-14 14:56:54

  • Assignee changed from intrigeri to segfault
  • % Done changed from 10 to 50
  • Estimated time set to 0 h
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/15695-avoid-breaking-automatic-upgrades-to-tails-3-9

This branch includes the check brought for Bug #15419 so if it builds (and it did build at least until I’ve merged current devel into it), that means it does its job :) And indeed, until my last fixes that check did abort the build, which is confidence-inspiring.

#26 Updated by intrigeri 2018-08-14 15:48:51

  • Assignee changed from segfault to CyrilBrulebois

#27 Updated by lamby 2018-08-14 15:56:45

Damn, I just came back to this issue after I’ve been (somewhat failing) to build this on my local machine for the past hour due to reasons unrelated to the changes. I should build Tails more often, basically.

> you’re giving these groups different GIDs than in Tails 3.8

What’s the problem there?

#28 Updated by intrigeri 2018-08-14 16:05:04

>> you’re giving these groups different GIDs than in Tails 3.8

> What’s the problem there?

This would trigger another instance of the problem we had with the upgrade to 3.0.1 (Bug #13426) and again with the upgrade to 3.6 (which made us create Bug #15407 and Bug #15419, which in turn made me notice we would have the same problem again for 3.9, which finally made me file this very ticket): due to what looks like an aufs bug, on devices upgraded via our incremental (automatic) update mechanism, the tor daemon won’t be allowed to write to its /var/lib/tor because the GID of the group owning that directory changed between the different levels of the SquashFS stack we build with aufs.

#29 Updated by intrigeri 2018-08-14 16:54:52

  • Status changed from In Progress to Fix committed
  • Assignee deleted (CyrilBrulebois)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

As per Bug #15407#note-31.

#30 Updated by lamby 2018-08-16 09:14:31

  • Status changed from Fix committed to In Progress

Applied in changeset commit:1d3d7854b310ae77761fbef22af59458450801ba.

#31 Updated by lamby 2018-08-16 09:14:31

  • Status changed from In Progress to Resolved

Applied in changeset commit:3682fde35080381eb97d649966b584fc9c777998.

#32 Updated by intrigeri 2018-08-16 09:55:58

  • Status changed from Resolved to Fix committed

#33 Updated by intrigeri 2018-09-05 16:14:07

  • Status changed from Fix committed to Resolved