Bug #9012

Network is sometimes not unblocked post-Greeter in Jessie

Added by intrigeri 2015-03-04 12:12:40 . Updated 2016-05-09 01:43:57 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Spoof MAC
Target version:
Start date:
2015-03-04
Due date:
% Done:

50%

Feature Branch:
bugfix/9012-unblock-network
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

On an ISO built today from feature/jessie at commit:a48119d41f0083b22210b59c62ebace02ab9c7f0, after logging in the only network adapter I can see in ip a is lo. If I manually run /usr/local/sbin/tails-unblock-network in a root terminal in the GNOME session, then the network comes up.

I cannot reproduce this with an ISO from a week ago, built from commit:6e49989443c85c3cfb69fefac6b4534f601f2e96. The Git diff doesn’t ring a bell, so likely the culprit is one of:

  • systemd/udev upgrade from 215-11 to 215-12
  • Linux upgrade from 3.16.7-ckt4-3 to 3.16.7-ckt7-1

Files


Subtasks


Related issues

Related to Tails - Bug #16805: Time between Greeter and desktop multiplied by 2 Resolved
Blocked by Tails - Feature #10583: Release a first Jessie-based beta Resolved 2015-11-19

History

#1 Updated by intrigeri 2015-03-04 12:17:15

  • Description updated

#2 Updated by intrigeri 2015-03-04 12:19:20

  • Description updated

#3 Updated by intrigeri 2015-03-04 12:24:10

  • Subject changed from Network is not unblocked post-Greeter in Jessie to Network is sometimes not unblocked post-Greeter in Jessie
  • Assignee set to intrigeri

Well, now I can’t reproduce this failure anymore. I suspect there’s a race condition somewhere. I should look into the hacks I had to do to adapt MAC spoofing / network unblocking to Jessie and systemd: there were quite a few bugfixes in this area in Jessie’s systemd since then, and perhaps our hacks can now be simplified, and are causing this bug in their current state.

#4 Updated by anonym 2015-06-12 18:27:31

I’ve seen this now in a build from commit: 6ff01bd. I’ve attached the output of journalctl -al, as requested over IRC.

#5 Updated by intrigeri 2015-07-06 09:04:55

I think I’ve just reproduced it with the automated test suite: no NM icon appeared.

#6 Updated by intrigeri 2015-07-06 09:27:11

anonym wrote:
> I’ve seen this now in a build from commit: 6ff01bd. I’ve attached the output of journalctl -al, as requested over IRC.

That log shows that NM has successfully acquired a DHCP lease, and everything seems to be working, so I don’t understand how it is related to this bug => anonym, next time you see it, please send the same log again, and explain a bit more what makes you think it’s this bug. Thanks!

#7 Updated by intrigeri 2015-07-07 16:59:35

I’ve just seen it. No time / too lazy to debug, but running udevadm trigger was enough to unblock the network. Redmine won’t let me upload the Journal, so I’ll keep it around locally.

#8 Updated by intrigeri 2015-07-08 06:26:02

intrigeri wrote:
> I’ve just seen it. No time / too lazy to debug, but running udevadm trigger was enough to unblock the network.

/bin/udevadm trigger --type=devices --action=add is actually enough to unblock the network. Note that tails-unblock-network exits successfully when the unblocking fails.

Potentially relevant pieces of log (during the time when tails-unblock-network is running, but doesn’t actually unblock the network:

  • sh[2222]: Failed to get realtime timestamp: Cannot assign requested address — that’s in tails-restricted-network-detector.service, which shouldn’t affect the next operations
  • systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument, but this seems to be neglictible

#9 Updated by intrigeri 2015-07-08 07:45:30

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100

Applied in changeset commit:271e43fa9355f4a7820a7580a907a332ac57d669.

#10 Updated by intrigeri 2015-07-08 07:47:20

  • Assignee deleted (intrigeri)

#11 Updated by intrigeri 2015-10-01 09:08:15

  • Status changed from Resolved to In Progress
  • % Done changed from 100 to 50

I’ve seen it again :/

#12 Updated by intrigeri 2015-11-12 01:24:02

  • Assignee set to intrigeri

#13 Updated by intrigeri 2015-11-17 08:59:39

I’ve not seen it since my Jessie sprint has started. I could not reproduce it today when booting from USB on: ThinkPad X61, ThinkPad X200, MacBook Pro 8,1 13-inch. At this point I think it’ll have to wait for the first beta release of Tails 2.0, or until we see it again in the test suite.

#14 Updated by intrigeri 2015-12-07 18:01:18

#15 Updated by anonym 2015-12-21 08:01:48

I saw it today on an image built from the same commit as 2.0-beta1.

#16 Updated by anonym 2015-12-21 10:32:04

anonym wrote:
> I saw it today on an image built from the same commit as 2.0-beta1.

See attached file for the output of journalctl -al (just noticed that -l is the default nowadays). Random things to notice:

  • at Dec 21 15:55:30 the GNOME desktop has started and I run sudo su in gnome-terminal
  • at Dec 21 15:57:38 (or slightly before) I run tails-unblock-network manually
  • it seems I got a default GNOME Shell desktop (there’s no panel at the bottom, the Activities button instead of Applications) so GNOME Classic failed some how
  • this was on a Tails 1.8 USB installation upgraded (Upgrade by cloning) to Tails/Jessie 2.0~beta1
  • I had enabled read-write persistence (migrated from Tails 1.8)

#17 Updated by anonym 2015-12-21 10:34:40

More debugging info attached.

#18 Updated by anonym 2015-12-22 02:09:45

After a lot of rebooting, roughly half with persistence enabled and half with it disabled (obviously) I have seen a few cases of the failure in the former case, but none whatsoever in the latter. Coincidence, or could enabling persistence makes the unblocking fail more often? OTOH, I’m pretty sure I did not use persistence when I had the issue six months ago (see command 4).

Just a thought: in the Greeter’s PostLogin script we start tails-unblock-network in the background => in parallel with the GNOME login process. Perhaps the udev triggering (which seem to cause devices to be removed and then re-added) is not such a great idea at this critical time for GNOME? Perhaps we should remove the “&”?

#19 Updated by intrigeri 2016-01-01 13:43:22

anonym wrote:
> * it seems I got a default GNOME Shell desktop (there’s no panel at the bottom, the Activities button instead of Applications) so GNOME Classic failed some how

I’ve seen that occasionally too (but much later post-login, like hours). I wonder if our custom value for enabled-extensions (that lacks the extensions enabled by Classic mode) is at fault.

#20 Updated by intrigeri 2016-01-10 13:55:45

I looked at journalctl-al-2.log in details.

The network adapter is: 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device.

Relevant bits of logs:

Dec 21 15:51:51 localhost.localdomain kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
Dec 21 15:51:51 localhost.localdomain kernel: ACPI: Power Button [PWRF]

Dec 21 15:52:22 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)

Dec 21 15:52:24 amnesia gdm-Xorg-:0[1926]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
Dec 21 15:52:24 amnesia gdm-Xorg-:0[1926]: (**) Power Button: Applying InputClass "evdev keyboard catchall"

Dec 21 15:53:39 amnesia gdm-session-worker[3502]: Running /usr/local/sbin/tails-unblock-network...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + systemctl start tails-unblock-network.service

[at this point, the blacklist file is supposed to have been deleted]

Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + echo Sleeping...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: Sleeping...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + sleep 2

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: systemd-udev-trigger.service restarted.
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: Restarting systemd-udev-settle.service...
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl restart systemd-udev-settle.service

Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) config/udev: removing device Power Button
Dec 21 15:53:42 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) evdev: Power Button: Close
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) UnloadModule: "evdev"
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) config/udev: Adding input device Power Button (/dev/input/event1)

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: systemd-udev-settle.service restarted.

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl enable NetworkManager.service NetworkManager-wait-online.service

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl start NetworkManager.service NetworkManager-dispatcher.service
Dec 21 15:53:42 amnesia systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument

Dec 21 15:53:42 amnesia kernel: input: spice vdagent tablet as /devices/virtual/input/input6

Dec 21 15:55:30 amnesia sudo[4819]: amnesia : TTY=pts/0 ; PWD=/home/amnesia ; USER=root ; COMMAND=/bin/su
Dec 21 15:55:30 amnesia sudo[4819]: pam_unix(sudo:session): session opened for user root by amnesia(uid=0)

[user runs tails-unblock-network manually]

[nothing in the Journal for two minutes, strange?]

[Dec 21 15:57:38 (or slightly before), user runs tails-unblock-network manually]

Dec 21 15:57:38 amnesia gdm-Xorg-:0[1926]: (II) config/udev: removing device Power Button
Dec 21 15:57:38 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)

[and here comes the network adapter, finally:]

Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 45 for MSI/MSI-X
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 46 for MSI/MSI-X
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 47 for MSI/MSI-X

Dec 21 15:57:38 amnesia spoof-mac[5361]: Trying to spoof MAC address of NIC eth0...

So in short, deleting the blacklist file + sync in tails-unblock-network.service, then sleeping 2 seconds and triggering udev, was not enough for udev and the kernel to pick up the network adapter.

From here, here’s what I would like to do for 2.0~rc1:

  • I’m preparing a branch that increases logging in tails-unblock-network, and makes udevd pass modprobe the --verbose option. I’d like to see it merged into devel, so that next time someone sees this bug I get more information. In particular, I’m curious to know if modprobe virtio_net is run at all, the first time we (unsuccessfully) trigger udev.
  • I think it’s not too risky to bet that there’s a race condition between the time we delete the blacklist file, and the time we trigger udev. Raising the sleep time (urgl) might help, and should not slow down the login process as long as we run tails-unblock-network in the background from PostLogin. I’ll try to do that for 2.0~rc1.

Longer-term:

  • I suspect that an aufs bug prevents udev from picking up the change in /etc/modprobe.d; if it’s the case, then switching to overlayfs (Feature #8415) will fix that. I’ve looked into the aufs documentation quite a bit, searched for information wrt. whether unlink/rm is atomic. I haven’t found anything conclusive. I’m concerned about our usage of noxino: “If you do not want to use xino, use noxino mount option. Use this option with care, since the inode number may be changed silently and unexpectedly anytime. For example, rmdir failure, recursive chmod/chown/etc to a large and deep directory or anything else. And some applications will not work correctly.”
  • We’re using a somewhat old kernel (3.16), I wonder if there might be a bug somewhere, that would be fixed by upgrading (Feature #10298).

#21 Updated by intrigeri 2016-01-10 18:29:05

  • Assignee changed from intrigeri to anonym
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/9012-unblock-network

Hi,

I’ve done what I said (above) I would do for 2.0~rc1. Please review’n’merge, but I doubt it fixes the problem for real, so don’t mark as fix committed, just reassign to me for 2.2.

#22 Updated by anonym 2016-01-11 11:43:44

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

Applied in changeset commit:b0a2d99d0372f60a7e7a96675fa16713762f7fb5.

#23 Updated by anonym 2016-01-11 11:48:54

  • Assignee deleted (anonym)
  • QA Check changed from Ready for QA to Pass

Commits make sense. Let’s hope they make a difference!

#24 Updated by intrigeri 2016-01-11 13:03:24

  • Status changed from Fix committed to In Progress
  • Assignee set to intrigeri
  • Target version changed from Tails_2.0 to Tails_2.2
  • % Done changed from 100 to 50
  • QA Check deleted (Pass)

Thanks! I said earlier:

> I doubt it fixes the problem for real, so don’t mark as fix committed, just reassign to me for 2.2.

… so I just did that.

#25 Updated by intrigeri 2016-03-08 13:47:17

  • Target version changed from Tails_2.2 to Tails_2.3

#26 Updated by intrigeri 2016-04-16 15:41:00

  • Target version changed from Tails_2.3 to Tails_2.4

#27 Updated by intrigeri 2016-04-29 13:59:10

anonym and frontdesk people: did you see this happen or reported since Tails 2.0 was released?

#28 Updated by intrigeri 2016-04-29 13:59:33

  • Assignee changed from intrigeri to anonym
  • QA Check set to Info Needed

#29 Updated by emmapeel 2016-04-29 14:16:37

Not that I remember….

#30 Updated by anonym 2016-05-08 04:12:59

  • Assignee changed from anonym to intrigeri

I cannot recall seeing it. I propose we close this ticket.

#31 Updated by intrigeri 2016-05-09 01:43:57

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)

OK, tentatively closing then. Frontdesk, please keep an eye on it :)

#32 Updated by intrigeri 2019-08-22 20:00:19

  • related to Bug #16805: Time between Greeter and desktop multiplied by 2 added