Bug #13426

Tor does not start on Tails 3.0.1 automatically upgraded from 3.0

Added by conorsch 2017-07-05 23:13:35 . Updated 2018-06-28 20:32:12 .

Status:
Resolved
Priority:
High
Assignee:
Category:
Tor configuration
Target version:
Start date:
2017-07-05
Due date:
% Done:

100%

Feature Branch:
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Upgrading Tails to 3.0.1 via the Tails upgrade frontend causes the “tor” service to break on subsequent boots. Likely this is related to the tor upgrade to version 0.3.0.9-1~d90.stretch+1, described in Bug #13253.

I’ve confirmed breakage on multiple machines, both hardware and libvirt VMs, when upgrading from Tails 3.0 -> 3.0.1. I have not confirmed that the breakage exists in a fresh install of 3.0.1.

Due to the Tails update flow happening over tor, I expect the resolution here to involve manual changes. Happy to test any solutions suggested, and will work on resolving the problem locally in the meantime.


Subtasks


Related issues

Related to Tails - Bug #15407: Prevent system user uid:s and gid:s from changing between releases Resolved 2018-06-28
Related to Tails - Bug #15689: Test if overlayfs is affected by the DAC bug wrt. incremental upgrades changing UID/GID Resolved 2018-06-28
Related to Tails - Bug #15690: Stop installing all "Priority: standard" packages only to remove some of them later Resolved 2018-06-29
Related to Tails - Bug #15695: Avoid breaking automatic upgrades to Tails 3.9 Resolved 2018-06-30

History

#1 Updated by conorsch 2017-07-05 23:22:58

Relevant log lines:

$ sudo journalctl -af
[sudo] password for amnesia: 
-- Logs begin at Wed 2017-07-05 22:52:18 UTC. --
Jul 05 23:04:33 amnesia onion-grater[5333]: /usr/bin/onioncircuits (pid: 18913, user: amnesia, port: 55098, filter: onioncircuits) connected: loaded filter: onioncircuits
Jul 05 23:04:33 amnesia onion-grater[5333]: /usr/bin/onioncircuits (pid: 18913, user: amnesia, port: 55098, filter: onioncircuits) disconnected: client quit
Jul 05 23:04:33 amnesia onion-grater[5333]: ----------------------------------------
Jul 05 23:04:33 amnesia onion-grater[5333]: Exception happened during processing of request from ('127.0.0.1', 55098)
Jul 05 23:04:33 amnesia onion-grater[5333]: Traceback (most recent call last):
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.finish_request(request, client_address)
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.RequestHandlerClass(request, client_address, self)
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.handle()
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/local/lib/onion-grater", line 630, in handle
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.controller = self.connect_to_real_control_port()
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/local/lib/onion-grater", line 566, in connect_to_real_control_port
Jul 05 23:04:33 amnesia onion-grater[5333]:     with open(global_args.control_cookie_path, "rb") as f:
Jul 05 23:04:33 amnesia onion-grater[5333]: FileNotFoundError: [Errno 2] No such file or directory: '/run/tor/control.authcookie'

Tailing journalctl while bouncing the failing tor service:

Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Service hold-off time over, scheduling restart.
Jul 05 23:21:25 amnesia systemd[1]: systemd-timesyncd.service: Cannot add dependency job, ignoring: Unit systemd-timesyncd.service is masked.
Jul 05 23:21:25 amnesia systemd[1]: Stopped Anonymizing overlay network for TCP.
Jul 05 23:21:25 amnesia systemd[1]: Starting Anonymizing overlay network for TCP...
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.932 [notice] Tor 0.3.0.9 (git-100816d92ab5664d) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0f and Zlib 1.2.8.
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.938 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.943 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.943 [notice] Read configuration file "/etc/tor/torrc".
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.945 [warn] The ControlListenAddress option is deprecated, and will most likely be removed in a future version of Tor. Use ControlPort instead. (If you think this is a mistake, please let us know!)
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.946 [warn] The TransListenAddress option is deprecated, and will most likely be removed in a future version of Tor. Use TransPort instead. (If you think this is a mistake, please let us know!)
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.946 [warn] The WarnUnsafeSocks option is deprecated, and will most likely be removed in a future version of Tor. Changing this option makes it easier for you to accidentally lose your anonymity by leaking DNS information (If you think this is a mistake, please let us know!)
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.951 [warn] Directory /var/lib/tor cannot be read: Permission denied
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.951 [warn] Failed to parse/validate config: Couldn't access/create private data directory "/var/lib/tor"
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.951 [err] Reading config failed--see warnings above.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Control process exited, code=exited status=1
Jul 05 23:21:25 amnesia systemd[1]: Failed to start Anonymizing overlay network for TCP.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Unit entered failed state.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.
$ sudo systemctl status tor@default.service
[sudo] password for amnesia: 
● tor@default.service - Anonymizing overlay network for TCP
   Loaded: loaded (/lib/systemd/system/tor@default.service; static; vendor preset: enabled)
  Drop-In: /lib/systemd/system/tor@default.service.d
           └─fix-obfs4proxy.conf, writable-etc-tor.conf
   Active: failed (Result: exit-code) since Wed 2017-07-05 22:45:12 UTC; 15min ago
  Process: 10399 ExecStartPre=/usr/bin/tor --defaults-torrc /usr/share/tor/tor-service-defaults-torrc -f /etc/tor/torrc --RunAsDaemon 0 --verify-config (code=exited, status=1/FAILURE)
  Process: 10397 ExecStartPre=/usr/bin/install -Z -m 02755 -o debian-tor -g debian-tor -d /var/run/tor (code=exited, status=0/SUCCESS)

Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Control process exited, code=exited status=1
Jul 05 22:45:12 amnesia systemd[1]: Failed to start Anonymizing overlay network for TCP.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Unit entered failed state.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Service hold-off time over, scheduling restart.
Jul 05 22:45:12 amnesia systemd[1]: Stopped Anonymizing overlay network for TCP.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Start request repeated too quickly.
Jul 05 22:45:12 amnesia systemd[1]: Failed to start Anonymizing overlay network for TCP.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Unit entered failed state.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.

#4 Updated by secretservice 2017-07-06 01:41:36

[Snipped, see https://tails.boum.org/contribute/working_together/code_of_conduct/ again. — intrigeri]

#5 Updated by goupille 2017-07-06 10:44:17

as it seems to happen only with the automatic upgrade, affected users should try to upgrade manually :

https://tails.boum.org/upgrade

#6 Updated by goupille 2017-07-06 10:45:31

  • Assignee set to intrigeri
  • Type of work changed from Sysadmin to Research
  • Affected tool set to Tor Launcher

#7 Updated by goupille 2017-07-06 10:45:49

  • Status changed from New to Confirmed

#8 Updated by intrigeri 2017-07-06 13:20:33

  • Subject changed from Tails 3.0.1 upgrade breaks tor to Tor does not start on Tails 3.0.1 automatically upgraded from 3.0
  • Target version set to Tails_3.1

#9 Updated by intrigeri 2017-07-06 13:21:21

  • Affected tool deleted (Tor Launcher)

(This has nothing to do about Tor Launcher, has it?)

#10 Updated by intrigeri 2017-07-06 17:17:44

@help desk: I was hoping I could document a simple workaround, but we’re not there yet.

On 3.0 started from USB:

drwx--S--- 2 108 115 3 Jun 10 14:39 /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/

On 3.0.1 started from DVD:

drwx--S--- 2 109 116 3 Jul  4 13:59 /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/

So the UID/GID for the debian-tor user/group have changed (most likely due to Bug #13427).

And inside 3.0.1.squashfs that’s in the IUK, as expected:

drwx--S--- 2 109 116 3 Jul  4 13:59 tmp/var/lib/tor/

On a USB drive automatically upgraded from 3.0 to 3.0.1:

drwx--S--- 2 109 116 3 Jul  4 13:59 /var/lib/tor/
drwx--S--- 2 109 116 3 Jul  4 13:59 /lib/live/mount/rootfs/3.0.1.squashfs/var/lib/tor/
drwx--S--- 2 108 115 3 Jun 10 14:39 /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/

# grep debian-tor /etc/passwd
debian-tor:x:109:116::/var/lib/tor:/bin/false

… so far, so good.

But:

# sudo -u debian-tor ls -l /var/lib/tor
ls: cannot open directory '/var/lib/tor': Permission denied

# sudo -u '#109' ls -l /var/lib/tor
ls: cannot open directory '/var/lib/tor': Permission denied

# sudo -u '#108' ls -l /var/lib/tor
ls: cannot open directory '/var/lib/tor': Permission denied

… while interestingly:

# sudo -u debian-tor ls -l /lib/live/mount/rootfs/3.0.1.squashfs/var/lib/tor/
total 0

… and as expected:

# sudo -u debian-tor ls -l /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/
ls: cannot open directory '/lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/': Permission denied

FWIW, manually running chown debian-tor:debian-tor /var/lib/tor doesn’t change that.

Initially I suspected the changes we did in the SquashFS delta generation for Feature #11974, but 1. these changes should affect only hardlinked files; 2. they went in 3.0~rc1 and nobody complained about this while we provided an incremental upgrade path from 3.0~rc1 to 3.0. OTOH perhaps that IUK simply did not do the kind of things that trigger this bug.

So it very much looks like we’ve hit a bug in aufs here, where UID/GID overriden by a top-most branch are taken into account when looking at files’ metadata, but not when actually accessing these files. Wow!

#11 Updated by conorsch 2017-07-06 17:39:01

Thanks for sharing the analysis, intrigeri! Confirmed that upgrading the tor package under Tails 3.0 does not cause similar breakage. I had to manually bounce the tor service after the apt upgrade, but thereafter tor continued to work well. So the breakage does seem to have come in through the Tails 3.0.1 upgrade specifically, as your comment seems to indicate.

#12 Updated by intrigeri 2017-07-06 17:41:22

FWIW sudo -u debian-tor strace ls /var/lib/tor says: open("/var/lib/tor", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 EACCES (Permission denied).

#13 Updated by intrigeri 2017-07-06 17:44:08

And sudo -u debian-tor ls -l /lib/live/mount/overlay/var/lib/tor/ works just fine. So the debian-tor user has access to all relevant components of our aufs stack (read-write tmpfs > read-only 3.0.1.squashfs > read-only filesystem.squashfs), just not to the lowest component (filesystem.squashfs), but it doesn’t have access to the aufs “filesystem” resulting from that stack. “Interesting.”

#14 Updated by intrigeri 2017-07-06 17:47:42

At this point I’m lost, and it seems the next steps for investigating this problem are low-level, way outside of my comfort/skill-zone :(

So I’m open to all suggestions wrt. what I should try next. If anyone here knows people who are knowledgeable in low-level stuff, kernel debugging and this kind of stuff, please ask them to take a look. In 3.0.1 we ship linux-image-4.9.0-3-amd64 (4.9.30-2+deb9u2) and aufs-dkms (4.9+20161219-1), both from Stretch (+ security).

#15 Updated by intrigeri 2017-07-06 18:02:52

Disabled the incremental upgrade path from 3.0 to 3.0.1, so that nobody who’s not been affected yet will be (this time / until we fix the root cause or do Feature #8415).

#16 Updated by intrigeri 2017-07-06 18:21:44

  • Status changed from Confirmed to In Progress

Applied in changeset commit:a4d24aab982a2faefeb9fb3a848333032991be37.

#17 Updated by conorsch 2017-07-06 18:43:07

Blocking the upgrade path is a solid plan, thanks for doing that. For folks who already updated, is there a manual workaround for resolution? Seems like no, and the surest path forward for users who did upgrade to 3.0.1 is to recreate a 3.0 Tails stick and migrate.

#18 Updated by msheiny 2017-07-06 18:49:34

conorsch wrote:
> Blocking the upgrade path is a solid plan, thanks for doing that. For folks who already updated, is there a manual workaround for resolution? Seems like no, and the surest path forward for users who did upgrade to 3.0.1 is to recreate a 3.0 Tails stick and migrate.

Looks like running the tails installer from a 3.0.1 stick to upgrade a 3.0 stick works. I just did this on a stick that broke tor during the online upgrade path (3.0->3.0.1) and things look like they are working a-okay.

#19 Updated by msheiny 2017-07-06 18:57:08

msheiny wrote:
> conorsch wrote:
> > Blocking the upgrade path is a solid plan, thanks for doing that. For folks who already updated, is there a manual workaround for resolution? Seems like no, and the surest path forward for users who did upgrade to 3.0.1 is to recreate a 3.0 Tails stick and migrate.
>
> Looks like running the tails installer from a 3.0.1 stick to upgrade a 3.0 stick works. I just did this on a stick that broke tor during the online upgrade path (3.0->3.0.1) and things look like they are working a-okay.

To clarify, I was able to fix my broken 3.0.1 install (that was the product of doing an online upgrade from 3.0 —> 3.0.1) by doing the install from clone on a new 3.0.1 ISO.

  • Boot from new 3.0.1 ISO
  • Insert broken 3.0.1 USB installation, and run tails installer through cloning
  • Remove ISO, boot back to “broken” 3.0.1 — which now has tor repaired.

#20 Updated by franps 2017-07-10 18:13:17

I have upgraded tails in a manual way and the problem I had related to automatically upgrading (tor didn’t boot) has dissapeared.
Thanks

#21 Updated by intrigeri 2017-07-23 08:10:13

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 0 to 100

Two weeks later, I really don’t see what I could do about this problem. The short-term problem was addressed by disabling this incremental upgrade and documenting how to repair a broken system. I took two actions for the longer term:

  • increase the chances that we detect such issues before releasing: commit:eca3d1001236570cc6a26fd2a961710a0e151ca2
  • add Feature #8415 to the list of candidates for our next roadmap update; the root cause of this bug is probably in aufs, and hopefully overlayfs would not have exposed it.

And I’m now closing ths ticket as it’s not actionable.

I’m still open to suggestions if anyone wants to help me debug this problem.

One option would be to report a bug somewhere, but our current reproducer is pretty hard to set up so I guess one would have to share a disk image (USB stick with 3.0 + the broken 3.0.1 upgrade) and point developers to it; I suspect they won’t like it much. So I think I’d rather spend time on Feature #8415 instead.

#22 Updated by bertagaz 2017-08-06 15:11:11

As a note somewhere, during 3.1 IUK tests, spriver noticed that the 3.0.1->3.1 upgrade path did not work. It was exposing the same uid mess preventing Tor to start correctly. So I’ve disabled this upgrade path.

Hopefully we won’t get hit again, as long as the debian-tor uid/gid isn’t changed between releases because of something like exim to be installed and deinstalled at some point of the build. That’s maybe something to check with a regression test, depending on the advancement of Feature #8415?

#23 Updated by intrigeri 2017-09-01 12:51:28

> Hopefully we won’t get hit again, as long as the debian-tor uid/gid isn’t changed between releases because of something like exim to be installed and deinstalled at some point of the build. That’s maybe something to check with a regression test, depending on the advancement of Feature #8415?

Please file a dedicated ticket if you think commit:eca3d1001236570cc6a26fd2a961710a0e151ca2 is not enough.

#24 Updated by anonym 2018-03-13 15:01:16

  • related to Bug #15407: Prevent system user uid:s and gid:s from changing between releases added

#25 Updated by intrigeri 2018-06-28 20:32:12

intrigeri wrote:
> bertagaz wrote:
>> Hopefully we won’t get hit again, as long as the debian-tor uid/gid isn’t changed between releases because of something like exim to be installed and deinstalled at some point of the build.

We were, hit again, during 3.5..3.6 upgrade.

> That’s maybe something to check with a regression test, depending on the advancement of Feature #8415?
>
> Please file a dedicated ticket if you think commit:eca3d1001236570cc6a26fd2a961710a0e151ca2 is not enough.

That was not enough: during the 3.6 release process, the result of this failing test was deemed not worth fixing and respinning an ISO. See you on Bug #15419 and friends.

#26 Updated by intrigeri 2018-06-28 20:48:45

  • related to Bug #15689: Test if overlayfs is affected by the DAC bug wrt. incremental upgrades changing UID/GID added

#27 Updated by intrigeri 2018-06-29 12:37:25

  • related to Bug #15690: Stop installing all "Priority: standard" packages only to remove some of them later added

#28 Updated by intrigeri 2018-06-30 12:53:40

  • related to Bug #15695: Avoid breaking automatic upgrades to Tails 3.9 added