Bug #9012
Network is sometimes not unblocked post-Greeter in Jessie
50%
Description
On an ISO built today from feature/jessie at commit:a48119d41f0083b22210b59c62ebace02ab9c7f0, after logging in the only network adapter I can see in ip a
is lo
. If I manually run /usr/local/sbin/tails-unblock-network
in a root terminal in the GNOME session, then the network comes up.
I cannot reproduce this with an ISO from a week ago, built from commit:6e49989443c85c3cfb69fefac6b4534f601f2e96. The Git diff doesn’t ring a bell, so likely the culprit is one of:
- systemd/udev upgrade from 215-11 to 215-12
- Linux upgrade from 3.16.7-ckt4-3 to 3.16.7-ckt7-1
Files
Subtasks
Related issues
Related to Tails - |
Resolved | ||
Blocked by Tails - |
Resolved | 2015-11-19 |
History
#1 Updated by intrigeri 2015-03-04 12:17:15
- Description updated
#2 Updated by intrigeri 2015-03-04 12:19:20
- Description updated
#3 Updated by intrigeri 2015-03-04 12:24:10
- Subject changed from Network is not unblocked post-Greeter in Jessie to Network is sometimes not unblocked post-Greeter in Jessie
- Assignee set to intrigeri
Well, now I can’t reproduce this failure anymore. I suspect there’s a race condition somewhere. I should look into the hacks I had to do to adapt MAC spoofing / network unblocking to Jessie and systemd: there were quite a few bugfixes in this area in Jessie’s systemd since then, and perhaps our hacks can now be simplified, and are causing this bug in their current state.
#4 Updated by anonym 2015-06-12 18:27:31
- File journalctl-al.log added
I’ve seen this now in a build from commit: 6ff01bd. I’ve attached the output of journalctl -al
, as requested over IRC.
#5 Updated by intrigeri 2015-07-06 09:04:55
I think I’ve just reproduced it with the automated test suite: no NM icon appeared.
#6 Updated by intrigeri 2015-07-06 09:27:11
anonym wrote:
> I’ve seen this now in a build from commit: 6ff01bd. I’ve attached the output of journalctl -al
, as requested over IRC.
That log shows that NM has successfully acquired a DHCP lease, and everything seems to be working, so I don’t understand how it is related to this bug => anonym, next time you see it, please send the same log again, and explain a bit more what makes you think it’s this bug. Thanks!
#7 Updated by intrigeri 2015-07-07 16:59:35
I’ve just seen it. No time / too lazy to debug, but running udevadm trigger
was enough to unblock the network. Redmine won’t let me upload the Journal, so I’ll keep it around locally.
#8 Updated by intrigeri 2015-07-08 06:26:02
intrigeri wrote:
> I’ve just seen it. No time / too lazy to debug, but running udevadm trigger
was enough to unblock the network.
/bin/udevadm trigger --type=devices --action=add
is actually enough to unblock the network. Note that tails-unblock-network
exits successfully when the unblocking fails.
Potentially relevant pieces of log (during the time when tails-unblock-network
is running, but doesn’t actually unblock the network:
sh[2222]: Failed to get realtime timestamp: Cannot assign requested address
— that’s intails-restricted-network-detector.service
, which shouldn’t affect the next operationssystemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
, but this seems to be neglictible
#9 Updated by intrigeri 2015-07-08 07:45:30
- Status changed from Confirmed to Resolved
- % Done changed from 0 to 100
Applied in changeset commit:271e43fa9355f4a7820a7580a907a332ac57d669.
#10 Updated by intrigeri 2015-07-08 07:47:20
- Assignee deleted (
intrigeri)
#11 Updated by intrigeri 2015-10-01 09:08:15
- Status changed from Resolved to In Progress
- % Done changed from 100 to 50
I’ve seen it again :/
#12 Updated by intrigeri 2015-11-12 01:24:02
- Assignee set to intrigeri
#13 Updated by intrigeri 2015-11-17 08:59:39
I’ve not seen it since my Jessie sprint has started. I could not reproduce it today when booting from USB on: ThinkPad X61, ThinkPad X200, MacBook Pro 8,1 13-inch. At this point I think it’ll have to wait for the first beta release of Tails 2.0, or until we see it again in the test suite.
#14 Updated by intrigeri 2015-12-07 18:01:18
- blocked by
Feature #10583: Release a first Jessie-based beta added
#15 Updated by anonym 2015-12-21 08:01:48
I saw it today on an image built from the same commit as 2.0-beta1
.
#16 Updated by anonym 2015-12-21 10:32:04
- File journalctl-al-2.log added
anonym wrote:
> I saw it today on an image built from the same commit as 2.0-beta1
.
See attached file for the output of journalctl -al
(just noticed that -l
is the default nowadays). Random things to notice:
- at
Dec 21 15:55:30
the GNOME desktop has started and I runsudo su
ingnome-terminal
- at
Dec 21 15:57:38
(or slightly before) I runtails-unblock-network
manually - it seems I got a default GNOME Shell desktop (there’s no panel at the bottom, the Activities button instead of Applications) so GNOME Classic failed some how
- this was on a Tails 1.8 USB installation upgraded (Upgrade by cloning) to Tails/Jessie 2.0~beta1
- I had enabled read-write persistence (migrated from Tails 1.8)
#17 Updated by anonym 2015-12-21 10:34:40
- File dconf-dump added
- File tails-debugging-info added
More debugging info attached.
#18 Updated by anonym 2015-12-22 02:09:45
After a lot of rebooting, roughly half with persistence enabled and half with it disabled (obviously) I have seen a few cases of the failure in the former case, but none whatsoever in the latter. Coincidence, or could enabling persistence makes the unblocking fail more often? OTOH, I’m pretty sure I did not use persistence when I had the issue six months ago (see command 4).
Just a thought: in the Greeter’s PostLogin script we start tails-unblock-network
in the background => in parallel with the GNOME login process. Perhaps the udev triggering (which seem to cause devices to be removed and then re-added) is not such a great idea at this critical time for GNOME? Perhaps we should remove the “&
”?
#19 Updated by intrigeri 2016-01-01 13:43:22
anonym wrote:
> * it seems I got a default GNOME Shell desktop (there’s no panel at the bottom, the Activities button instead of Applications) so GNOME Classic failed some how
I’ve seen that occasionally too (but much later post-login, like hours). I wonder if our custom value for enabled-extensions
(that lacks the extensions enabled by Classic mode) is at fault.
#20 Updated by intrigeri 2016-01-10 13:55:45
I looked at journalctl-al-2.log
in details.
The network adapter is: 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
.
Relevant bits of logs:
Dec 21 15:51:51 localhost.localdomain kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
Dec 21 15:51:51 localhost.localdomain kernel: ACPI: Power Button [PWRF]
Dec 21 15:52:22 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)
Dec 21 15:52:24 amnesia gdm-Xorg-:0[1926]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
Dec 21 15:52:24 amnesia gdm-Xorg-:0[1926]: (**) Power Button: Applying InputClass "evdev keyboard catchall"
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: Running /usr/local/sbin/tails-unblock-network...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + systemctl start tails-unblock-network.service
[at this point, the blacklist file is supposed to have been deleted]
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + echo Sleeping...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: Sleeping...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + sleep 2
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: systemd-udev-trigger.service restarted.
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: Restarting systemd-udev-settle.service...
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl restart systemd-udev-settle.service
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) config/udev: removing device Power Button
Dec 21 15:53:42 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) evdev: Power Button: Close
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) UnloadModule: "evdev"
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: systemd-udev-settle.service restarted.
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl enable NetworkManager.service NetworkManager-wait-online.service
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl start NetworkManager.service NetworkManager-dispatcher.service
Dec 21 15:53:42 amnesia systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 21 15:53:42 amnesia kernel: input: spice vdagent tablet as /devices/virtual/input/input6
Dec 21 15:55:30 amnesia sudo[4819]: amnesia : TTY=pts/0 ; PWD=/home/amnesia ; USER=root ; COMMAND=/bin/su
Dec 21 15:55:30 amnesia sudo[4819]: pam_unix(sudo:session): session opened for user root by amnesia(uid=0)
[user runs tails-unblock-network manually]
[nothing in the Journal for two minutes, strange?]
[Dec 21 15:57:38 (or slightly before), user runs tails-unblock-network manually]
Dec 21 15:57:38 amnesia gdm-Xorg-:0[1926]: (II) config/udev: removing device Power Button
Dec 21 15:57:38 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)
[and here comes the network adapter, finally:]
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 45 for MSI/MSI-X
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 46 for MSI/MSI-X
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 47 for MSI/MSI-X
Dec 21 15:57:38 amnesia spoof-mac[5361]: Trying to spoof MAC address of NIC eth0...
So in short, deleting the blacklist file + sync in tails-unblock-network.service
, then sleeping 2 seconds and triggering udev, was not enough for udev and the kernel to pick up the network adapter.
From here, here’s what I would like to do for 2.0~rc1:
- I’m preparing a branch that increases logging in
tails-unblock-network
, and makes udevd pass modprobe the--verbose
option. I’d like to see it merged into devel, so that next time someone sees this bug I get more information. In particular, I’m curious to know ifmodprobe virtio_net
is run at all, the first time we (unsuccessfully) trigger udev. - I think it’s not too risky to bet that there’s a race condition between the time we delete the blacklist file, and the time we trigger udev. Raising the sleep time (urgl) might help, and should not slow down the login process as long as we run
tails-unblock-network
in the background fromPostLogin
. I’ll try to do that for 2.0~rc1.
Longer-term:
- I suspect that an aufs bug prevents udev from picking up the change in
/etc/modprobe.d
; if it’s the case, then switching to overlayfs (Feature #8415) will fix that. I’ve looked into the aufs documentation quite a bit, searched for information wrt. whether unlink/rm is atomic. I haven’t found anything conclusive. I’m concerned about our usage ofnoxino
: “If you do not want to use xino, use noxino mount option. Use this option with care, since the inode number may be changed silently and unexpectedly anytime. For example, rmdir failure, recursive chmod/chown/etc to a large and deep directory or anything else. And some applications will not work correctly.” - We’re using a somewhat old kernel (3.16), I wonder if there might be a bug somewhere, that would be fixed by upgrading (
Feature #10298).
#21 Updated by intrigeri 2016-01-10 18:29:05
- Assignee changed from intrigeri to anonym
- QA Check set to Ready for QA
- Feature Branch set to bugfix/9012-unblock-network
Hi,
I’ve done what I said (above) I would do for 2.0~rc1. Please review’n’merge, but I doubt it fixes the problem for real, so don’t mark as fix committed, just reassign to me for 2.2.
#22 Updated by anonym 2016-01-11 11:43:44
- Status changed from In Progress to Fix committed
- % Done changed from 50 to 100
Applied in changeset commit:b0a2d99d0372f60a7e7a96675fa16713762f7fb5.
#23 Updated by anonym 2016-01-11 11:48:54
- Assignee deleted (
anonym) - QA Check changed from Ready for QA to Pass
Commits make sense. Let’s hope they make a difference!
#24 Updated by intrigeri 2016-01-11 13:03:24
- Status changed from Fix committed to In Progress
- Assignee set to intrigeri
- Target version changed from Tails_2.0 to Tails_2.2
- % Done changed from 100 to 50
- QA Check deleted (
Pass)
Thanks! I said earlier:
> I doubt it fixes the problem for real, so don’t mark as fix committed, just reassign to me for 2.2.
… so I just did that.
#25 Updated by intrigeri 2016-03-08 13:47:17
- Target version changed from Tails_2.2 to Tails_2.3
#26 Updated by intrigeri 2016-04-16 15:41:00
- Target version changed from Tails_2.3 to Tails_2.4
#27 Updated by intrigeri 2016-04-29 13:59:10
anonym and frontdesk people: did you see this happen or reported since Tails 2.0 was released?
#28 Updated by intrigeri 2016-04-29 13:59:33
- Assignee changed from intrigeri to anonym
- QA Check set to Info Needed
#29 Updated by emmapeel 2016-04-29 14:16:37
Not that I remember….
#30 Updated by anonym 2016-05-08 04:12:59
- Assignee changed from anonym to intrigeri
I cannot recall seeing it. I propose we close this ticket.
#31 Updated by intrigeri 2016-05-09 01:43:57
- Status changed from In Progress to Resolved
- Assignee deleted (
intrigeri)
OK, tentatively closing then. Frontdesk, please keep an eye on it :)
#32 Updated by intrigeri 2019-08-22 20:00:19
- related to
Bug #16805: Time between Greeter and desktop multiplied by 2 added