Bug #17115

MAC spoofing can fail and disable a network adapter when waking up from suspend

Added by sajolida 2019-10-02 14:31:12 . Updated 2019-10-11 06:48:09 .

Status:
Confirmed
Priority:
Normal
Assignee:
Category:
Spoof MAC
Target version:
Start date:
Due date:
% Done:

0%

Feature Branch:
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Today my Edimax USB Wi-Fi adapter was disabled by a MAC spoofing failure when waking up my computer from suspending overnight. I sent the journal to tails-foundations@boum.org.

I’ve used this adapter for days without problems.

I might have experienced this bug sometimes in the recent past (I’ve been using it as my main interface for 3 weeks) but definitely not every time I wake up from suspend.

I’m running Tails 4.0~beta2 and I don’t remember experiencing this in Tails 3.x but I also haven’t used this adapter as my main interface that much in Tails 3.x.


Subtasks


Related issues

Related to Tails - Bug #6550: No MAC spoofing fail-safe for hotplugged devices after logging in Resolved 2013-12-29
Related to Tails - Feature #11293: Check if/how we should use NetworkManager's new MAC address spoofing capabilities Confirmed 2016-03-31
Related to Tails - Bug #17128: Investigate whether internal network adapters consistently use their previously spoofed MAC address after resuming from suspend Confirmed

History

#1 Updated by sajolida 2019-10-02 22:13:17

I got the same failure again today when coming back from lunch.

#2 Updated by intrigeri 2019-10-07 07:19:57

  • related to Bug #6550: No MAC spoofing fail-safe for hotplugged devices after logging in added

#3 Updated by intrigeri 2019-10-07 07:36:36

  • related to Feature #11293: Check if/how we should use NetworkManager's new MAC address spoofing capabilities added

#4 Updated by intrigeri 2019-10-07 07:36:42

  • related to Bug #17128: Investigate whether internal network adapters consistently use their previously spoofed MAC address after resuming from suspend added

#5 Updated by intrigeri 2019-10-07 08:02:54

  • Status changed from New to Confirmed

Ouch!

> I’m running Tails 4.0~beta2 and I don’t remember experiencing this in Tails 3.x but I also haven’t used this adapter as my main interface that much in Tails 3.x.

I don’t think we did anything on our side between 3.x and 4.0~beta2 that impacts the race condition that’s causing this bug (described below). But of course, there are so many pieces involved, it could be that some upstream change made us more likely to lose the race on Buster. For example, NetworkManager could have been improved to pick up new network adapters faster. Go figure :)

FTR, here’s the relevant subset of the log. I did my best to redact all private information from it:

kernel: PM: suspend entry (deep)
kernel: Filesystems sync: 0.001 seconds
kernel: (NULL device *): firmware: direct-loading firmware iwlwifi-6000-4.ucode
kernel: (NULL device *): firmware: direct-loading firmware rtlwifi/rtl8192cufw_TMSC.bin
kernel: rtl8192cu: Board Type 0
kernel: Freezing user space processes ... (elapsed 0.012 seconds) done.
kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
kernel: wlan1: deauthenticating from [REDACTED] by local choice (Reason: 3=DEAUTH_LEAVING)
kernel: Disabling non-boot CPUs ...
kernel: Enabling non-boot CPUs ...
kernel: ACPI: Waking up from system sleep state S3
kernel: usb 1-1.6: reset high-speed USB device number 4 using ehci-pci
kernel: usb 2-1.1: reset high-speed USB device number 3 using ehci-pci
kernel: Restarting tasks ... done.
kernel: PM: suspend exit
systemd-logind[4341]: Lid opened.
NetworkManager[13362]: <info>  [1570025476.7702] device (wlan1): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')
NetworkManager[13362]: <info>  [1570025476.8219] dhcp4 (wlan1): canceled DHCP transaction, DHCP client pid 13377
NetworkManager[13362]: <info>  [1570025476.8219] dhcp4 (wlan1): state changed bound -> done
systemd-sleep[19576]: System resumed.
systemd[1]: systemd-suspend.service: Succeeded.
systemd[1]: Started Suspend.
systemd[1]: Stopped target Sleep.
systemd[1]: Reached target Suspend.
systemd[1]: Stopped target Suspend.
wpa_supplicant[4348]: wlan1: CTRL-EVENT-DISCONNECTED bssid=[REDACTED] reason=3 locally_generated=1
kernel: rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
kernel: rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
kernel: ieee80211 phy3: Selected rate control algorithm 'rtl_rc'
systemd-udevd[19668]: Using default interface naming scheme 'v240'.
dbus-daemon[4337]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher'
op.nm-dispatcher.service' requested by ':1.73' (uid=0 pid=13362 comm="/usr/sbin/NetworkManager --no-daemon ")
NetworkManager[13362]: <info>  [1570025477.0016] wifi-nl80211: (wlan1): using nl80211 for WiFi device control
NetworkManager[13362]: <info>  [1570025477.0080] device (wlan1): driver supports Access Point (AP) mode
kernel: rtl8192cu: MAC auto ON okay!
NetworkManager[13362]: <info>  [1570025477.0170] manager: (wlan1): new 802.11 WiFi device (/org/freedesktop/NetworkManager/Devices/6)
systemd[1]: Starting Network Manager Script Dispatcher Service...
NetworkManager[13362]: <info>  [1570025477.0472] rfkill3: found WiFi radio killswitch (at
0000:00:1a.0/usb1/1-1/1-1.1/1-1.1:1.0/ieee80211/phy3/rfkill3) (driver rtl8192cu)
systemd-logind[4341]: Operation 'sleep' finished.
kernel: rtl8192cu: Tx queue select: 0x05
NetworkManager[13362]: <info>  [1570025477.0730] manager: sleep: wake requested (sleeping: yes  enabled: yes)
spoof-mac[19715]: Trying to spoof MAC address of NIC wlan1...
gnome-shell[13628]: An active wireless connection, in infrastructure mode, involves no access point?
dbus-daemon[4337]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
systemd[1]: Started Network Manager Script Dispatcher Service.
nm-dispatcher[19711]: req:1 'down' [wlan1]: new request (8 scripts)
nm-dispatcher[19711]: req:1 'down' [wlan1]: start running ordered scripts...
nm-dispatcher[19711]: req:2 'connectivity-change': new request (8 scripts)
[…]
spoof-mac[19756]: macchanger failed for NIC wlan1, returned 1 and said: [ERROR] Could not change MAC: interface up or insufficient permissions:
Device or resource busy
spoof-mac[19757]: Failed to spoof MAC address of NIC wlan1. Going into panic mode.
wpa_supplicant[4348]: nl80211: deinit ifname=wlan1 disabled_11b_rates=0
systemd[1]: Stopped target Tor has bootstrapped.
systemd[1]: Stopping Manage the flag file that indicates whether Tor has bootstrapped...
nm-dispatcher[19711]: req:2 'connectivity-change': start running ordered scripts...
kernel: iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1
systemd[1]: tails-tor-has-bootstrapped-flag-file.service: Succeeded.
systemd[1]: Stopped Manage the flag file that indicates whether Tor has bootstrapped.
kernel: usbcore: deregistering interface driver rtl8192cu

My understanding is that when the system resumes from suspect, several things race against each other:

  • wlan1 (the USB Wi-Fi adapter that’s later blocked) reappears on the USB bus ⇒ udev triggers our MAC spoofing procedure for this device.
  • NetworkManager notices that wlan1 reappeared and starts managing it (which I expect implies it’ll start scanning for APs, and since a while it will itself randomize the MAC address for scanning)
  • NetworkManager notices that wlan1 disappeared during suspend and runs its “down” scripts; I think this one is not a problem here so I’ll ignore it for now.

Once NetworkManager has started managing the device, and even more so while it is itself trying to randomize the MAC address for scanning, our udev-triggered MAC spoofing procedure can’t possibly work: that’s why at boot time, we start NetworkManager only after we’ve spoofed MAC addresses.

I’m not surprised this happens: lots of our codebase was written before we started supporting suspend-to-RAM and was never adapted to support such situations. I would even go further: our current MAC spoofing implementation is essentially designed for systems with a fixed set of hardware; I don’t recall if we took hotplugging new devices into account in the design phase, but for example Bug #6550 shows that the initial implementation was not fully ready for this.

And indeed, I’m pretty sure that the exact same race condition happens when one plugs in a new network adapter, regardless of suspend. It could be that for some reason, our own MAC spoofing procedure is more likely to win the race (good) in that case, but that’s pure luck ⇒ I’ll make the scope of this ticket broader.

It’s interesting that the same problem does not happen for the internal network adapter. I guess the kernel does not see it as a “new” device on resume, so no udev event is triggered, so we don’t try to spoof their MAC address again. Which makes me wonder if some devices/firmware, in this case, will forget their earlier state and go back to their permanent MAC address. Uh oh ⇒ filed Bug #17128.

Brainstorming possible solutions:

  • Letting NetworkManager handle the whole MAC spoofing dance itself (Feature #11293) would almost certainly fix that: it’s not going to race against itself.
  • Is there a way to ask NetworkManager to wait until we’ve spoofed the MAC address? I doubt it: even its pre-up phase runs too late. We would essentially need a pre-manage hook.
  • We could stop NetworkManager on resume and then start it again once we’re sure that all network adapters have been put back online and their MAC address spoofed. But:
    • It’s not easy to be reliably confident that we’ve waited enough. Possibly we could record somewhere the list of network adapters present before suspend, and then wait for all of them to come back. We would still some kind of timeout though, because the user might have unplugged their USB Wi-Fi adapter and put it in their bag while the system was suspended.
    • This solution does not address the same race condition for hotplugged network adapters, for example: I close the lid, unplug my USB Wi-Fi adapter, put both in my bag, come back one hour later, open the lid, type my lock screen password, possibly spend a few minutes brewing coffee, and I finally plug back my USB Wi-Fi adapter.
    • This would make a subsystem of Tails, that is already complex, even harder to reason about, debug, and improve.

Personally, I’m getting more and more convinced that we should look into Feature #11293 again: there’s a strong trend towards OS vendors enabling MAC randomization by default in more and more cases, NetworkManager is following this trend, and now could be a good time to benefit from it and get rid of our own, in-house implementation.

anonym, any thoughts on this topic? segfault, what about you?

#6 Updated by intrigeri 2019-10-07 08:08:34

  • Subject changed from MAC spoofing false negative when waking up from suspend to MAC spoofing can fail and disable a network adapter when waking up from suspend or when it's hot-plugged
  • Type of work changed from Code to Research

#7 Updated by anonym 2019-10-07 09:05:52

intrigeri wrote:
> I’m not surprised this happens: lots of our codebase was written before we started supporting suspend-to-RAM and was never adapted to support such situations.

And it was certainly the case with this feature.

> I would even go further: our current MAC spoofing implementation is essentially designed for systems with a fixed set of hardware; I don’t recall if we took hotplugging new devices into account in the design phase, but for example Bug #6550 shows that the initial implementation was not fully ready for this.

Hotplugging was part of the design, but the initial implementation was indeed buggy since that case got much less testing and attention (see e.g. commit:d4fd3edab513d5777aec435f25ae57404c020ce9).

However, then there is the udev vs NM race made apparent in tails-unblock-network — clearly that workaround (starting NM last) has never been applied to hotplugged devices. So, indeed, this feature probably only worked properly for hardware present before logging in. :/

> Brainstorming possible solutions:
>
> * Letting NetworkManager handle the whole MAC spoofing dance itself (Feature #11293) would almost certainly fix that: it’s not going to race against itself.

This would be so nice!

> * We could stop NetworkManager on resume and then start it again once we’re sure that all network adapters have been put back online and their MAC address spoofed.

Wow, eek, this is getting pretty hairy!

> Personally, I’m getting more and more convinced that we should look into Feature #11293 again: there’s a strong trend towards OS vendors enabling MAC randomization by default in more and more cases, NetworkManager is following this trend, and now could be a good time to benefit from it and get rid of our own, in-house implementation.

100% agreed!

#8 Updated by segfault 2019-10-07 09:38:39

> Personally, I’m getting more and more convinced that we should look into Feature #11293 again: there’s a strong trend towards OS vendors enabling MAC randomization by default in more and more cases, NetworkManager is following this trend, and now could be a good time to benefit from it and get rid of our own, in-house implementation.

I also agree, this sounds like the best solution.

#9 Updated by anonym 2019-10-07 10:16:23

anonym wrote:
> However, then there is the udev vs NM race made apparent in tails-unblock-network — clearly that workaround (starting NM last) has never been applied to hotplugged devices. So, indeed, this feature probably only worked properly for hardware present before logging in. :/

This made me uneasy and contradicted my memories of how udev vs NM interacts, so I made an experiment by leveraging our “MAC address spoofing is successful” scenario (our automated test suite is damn practical for setting up manual testing scenarios :D ):

  1. I injected the “I pause” step as breakpoints before and after the “I hotplug a network device” step
  2. I started the test and waited until the first “I pause”
  3. I took control of the VM added a long sleep to tails-spoof-mac
  4. I unpaused the test and waited to the next “I pause”
  5. I checked the journal in the VM

According to the journal, NM has only been made aware of the existence of the new device (“new Ethernet device”) at this point, and has not changed its state. That happens as soon as I kill tails-soof-mac though. So my conclusion is that there is no race between udev hooks and NM here: NM waits until udev hooks finish.

So, regarding the comments and order of things in tails-unblock-network, perhaps we were hit by some bug that confused us to think this race was the problem? Or perhaps the problem was actually having NM running before we unblock the network? Any way, in conclusion it actually does look like hotplugging works exactly as it should.

So it seems sajolida’s bug is caused by something else. I wanted to run udevadm monitor during a suspend-resume cycle to see what is reported for the device but it doesn’t seem supported by libvirt (or my configuration) so I gave up, and I lack any such hardware to try for real. intrigeri or sajolida, can you please record what udevadm monitor reports during a full suspend-resume cycle when the spoof failure occurs? Our hook only triggers on the add action but perhaps something else is emitted when resuming from sleep? I suspect it could be change, and in that case this ticket is fixed by triggering on ACTION=="add|change" instead.

#10 Updated by intrigeri 2019-10-09 15:18:07

  • Subject changed from MAC spoofing can fail and disable a network adapter when waking up from suspend or when it's hot-plugged to MAC spoofing can fail and disable a network adapter when waking up from suspend

#11 Updated by intrigeri 2019-10-09 15:19:39

Hi,

anonym wrote: > According to the journal, NM has only been made aware of the existence of the new device ("new Ethernet device") at this point, and has *not* changed its state. That happens as soon as I kill tails-soof-mac@ though. So my conclusion is that there is no race between udev hooks and NM here: NM waits until udev hooks finish.

I did not fully review the test procedure (should I?) but I’m inclined to blindly trust your conclusions.

> So, regarding the comments and order of things in tails-unblock-network, perhaps we were hit by some bug that confused us to think this race was the problem? Or perhaps the problem was actually having NM running before we unblock the network?

Or, when we implemented this initially, NM did not wait for udev hooks to complete. That was years ago :)

> Any way, in conclusion it actually does look like hotplugging works exactly as it should.

Great! I’m very happy I was wrong. I’ve optimistically fixed the ticket title accordingly, hoping that you merely forgot.

> intrigeri or sajolida, can you please record what udevadm monitor@ reports during a full suspend-resume cycle when the spoof failure occurs?

I’ve dug through my hardware storage and found a USB Wi-Fi adapter I had totally forgotten about ⇒ I’ll try to give you this info.

#12 Updated by intrigeri 2019-10-09 15:49:21

Hi again @anonym,

> anonym wrote:
>> intrigeri or sajolida, can you please record what udevadm monitor reports during a full suspend-resume cycle when the spoof failure occurs?

I can’t reproduce the spoof failure with the hardware I have handy: I’ve tried 5 times (sometimes with the sleep hardware button, sometimes by closing the lid, sometimes by pressing the suspend button in GNOME Shell’s system menu); Wi-Fi came back up every time, each time with a shiny new MAC address. Which, I guess, at least confirms that external network adapters are subject to the MAC spoofing dance upon resume, while in my experience (Bug #17128#note-3) at least some internal network adapters are immune to it.

> I’ve dug through my hardware storage and found a USB Wi-Fi adapter I had totally forgotten about

FTR this USB Wi-Fi adapter is a super old Belkin “Wireless G USB Network Adapter”, “Model no. F5D7050eaE”.

> ⇒ I’ll try to give you this info.

It’s full of private info so I’ll send it to you privately… FWIW given I could not reproduce.

⇒ I think it would be useful if sajolida sent you this info (+ possibly the output of journalctl) when the spoof failure occurs.

#13 Updated by anonym 2019-10-10 08:56:20

intrigeri wrote:
> Hi,
>
> anonym wrote: > > According to the journal, NM has only been made aware of the existence of the new device ("new Ethernet device") at this point, and has *not* changed its state. That happens as soon as I kill tails-soof-mac@ though. So my conclusion is that there is no race between udev hooks and NM here: NM waits until udev hooks finish.
>
> I did not fully review the test procedure (should I?) but I’m inclined to blindly trust your conclusions.

No, I’m pretty confident in my procedure. I documented my steps “just in case”.

> > So, regarding the comments and order of things in tails-unblock-network, perhaps we were hit by some bug that confused us to think this race was the problem? Or perhaps the problem was actually having NM running before we unblock the network?
>
> Or, when we implemented this initially, NM did not wait for udev hooks to complete. That was years ago :)

I guess, but above I said that that hypothesis “contradicted my memories of how udev vs NM interacts”: I have a distinct memory of doing essentially the test I did above years ago when developing this feature.

> > intrigeri or sajolida, can you please record what udevadm monitor@ reports during a full suspend-resume cycle when the spoof failure occurs?
>
> I’ve dug through my hardware storage and found a USB Wi-Fi adapter I had totally forgotten about ⇒ I’ll try to give you this info.

So hotplugged devices are removed and then added, there’s no change, so my hunch was wrong, damn.

Comparing your log vs the excerpt from sajolida’s we can see that NM has done the same state changes (activated -> unmanaged and bound -> done) and the device should be unmanaged. But then for sajolida macchanger fails with “Could not change MAC: interface up or insufficient permissions: Device or resource busy”.

I’m trying to figure out why it is busy. In sajolida’s case we can see nm-dispatcher running scripts for “wlan0 down” between spoof-mac’s “Trying to spoof” and the failure, which is suspect, because in intri’s case that happens way earlier. I looked through the NM dispatcher scripts, and nothing in them uses the device on down, but I guess NM keeps some sort of handle on the device until all scripts are done, explaining why it’s “busy”.

So my best approximation of a potential solution would be to make spoof-mac wait until the device is done with it’s NM disatcher scripts (e.g. in first NM dispatch script we write a state file, and we remove it on the last NM dispatch script, then spoof-mac will wait until this file does not exist), or we just make spoof-mac re-try macchanger on failure for up to 20 seconds or whatever.

Thoughts?

#14 Updated by intrigeri 2019-10-11 06:48:09

> I’m trying to figure out why it is busy. In sajolida’s case we can see nm-dispatcher running scripts for “wlan0 down” between spoof-mac’s “Trying to spoof” and the failure, which is suspect, because in intri’s case that happens way earlier. I looked through the NM dispatcher scripts, and nothing in them uses the device on down, but I guess NM keeps some sort of handle on the device until all scripts are done, explaining why it’s “busy”.

Yeah, it could be, for example, that NM is asking wpasupplicant to disconnect cleanly from the Wi-Fi AP (assuming there’s such a thing as a clean disconnection); and for some reason, on my test system this happens (and completes) earlier than on sajolida’s.

So my hunch was probably totally wrong when I dropped this part of the process from the list of potential culprits :)

> So my best approximation of a potential solution would be to make spoof-mac wait until the device is done with it’s NM disatcher scripts (e.g. in first NM dispatch script we write a state file, and we remove it on the last NM dispatch script, then spoof-mac will wait until this file does not exist),

(Here I’m in hunch/feelings land, and did not bother checking if there’s any rationality attached to my current thoughts.) I’m really not enthusiastic at the idea of having to carry & maintain state in this subsystem. I can see that it would work in most cases (starting with: on the developer’s machine) but could be subtly racy in ways that are too hard to reproduce for us having much hope of debugging issues and gaining confidence in tentative fixes.

> or we just make spoof-mac re-try macchanger on failure for up to 20 seconds or whatever.

Yeah! While perhaps less “clean” conceptually, to me, retrying a few times like this seems much simpler to implement — and more importantly, to reason about — than the state file idea.
IMO it makes sense to try this cheap workaround and see if it fixes the problem. And if it does not, then I think I’d rather investigate the NM MAC randomization way before putting much more efforts into our current implementation.