Bug #16791

Buster: torstatus@tails.boum.org makes GNOME Shell crashes when unlocking the screen or restoring from suspend

Added by intrigeri 2019-06-08 18:22:21 . Updated 2019-06-18 17:38:01 .

Status:
Resolved
Priority:
Elevated
Assignee:
intrigeri
Category:
Target version:
Start date:
Due date:
% Done:

100%

Feature Branch:
https://salsa.debian.org/tails-team/tails/merge_requests/28
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Quoting alienpup’s report on tails-testers today:

> I’ve just retested all three systems using this .img file:
> tails-amd64-feature_buster-3.17-20190606T0917Z-835f745c59+devel@2f96ace13e.img

> I found one new regressions relative to 3.14:

> All three systems fail to fully restore from suspend. More specifically, they do, but X graphics appears correctly for only for 1-2 seconds, after which the screen blanks again before a only portion of the screen returns. At this point X no longer responds. On the two Intel based laptops, the mouse is inoperative. On the AMD based desktop, the mouse remains operative, but nothing on-screen responds (to mouse or keyboard). All three systems are still running at this point, and I can CTRL+ALT to a vterm and log in (password set at boot). Running top reveals Xorg is sleeping, consuming “0” cpu and minimal RAM.


Subtasks


Related issues

Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed
Blocks Tails - Bug #16755: Call for testing: feature/buster (June 2019 edition) Resolved 2019-06-18

History

#1 Updated by intrigeri 2019-06-08 18:22:32

#2 Updated by intrigeri 2019-06-11 09:27:56

sajolida says resuming from suspend works fine on a X201.

#3 Updated by intrigeri 2019-06-17 08:27:01

  • Assignee set to intrigeri

intrigeri wrote:
> sajolida says resuming from suspend works fine on a X201.

Not 100% fine. I should look into “Subject: almost black screen when waking up in Tails Buster”.

#4 Updated by intrigeri 2019-06-17 14:10:38

On resume (both in 190608-alienpup-hp-pavilion-dm4-buster-sudo-journalctl.out and 190608-alienpup-lenovo-tp-t410-buster-sudo-journalctl.out), I see that our torstatus GNOME Shell extension makes GNOME Shell segfault:

Jun 09 00:15:26 amnesia systemd[1]: Stopped target Tor has bootstrapped.
Jun 09 00:15:26 amnesia systemd[1]: Stopping Manage the flag file that indicates whether Tor has bootstrapped...
Jun 09 00:15:26 amnesia nm-dispatcher[13798]: req:2 'connectivity-change': start running ordered scripts...
Jun 09 00:15:26 amnesia kernel: iwlwifi 0000:01:00.0: Radio type=0x1-0x2-0x0
Jun 09 00:15:26 amnesia gnome-shell[10955]: Object St.Icon (0x5bf5986404c0), has been already deallocated — impossible to access it. This might be caused by the object having been destroyed from C code using something such as destroy(), dispose(), or remove() vfuncs.
Jun 09 00:15:26 amnesia org.gnome.Shell.desktop[10955]: == Stack trace for context 0x5bf596804240 ==
Jun 09 00:15:26 amnesia org.gnome.Shell.desktop[10955]: #0   5bf5991e30e0 i   /usr/share/gnome-shell/extensions/torstatus@tails.boum.org/extension.js:64 (7d2a40d74160 @ 67)
Jun 09 00:15:26 amnesia org.gnome.Shell.desktop[10955]: #1   7ffdd6d01a80 b   resource:///org/gnome/gjs/modules/_legacy.js:82 (7d2a41eb0b80 @ 71)
Jun 09 00:15:26 amnesia org.gnome.Shell.desktop[10955]: #2   5bf5991e3058 i   /usr/share/gnome-shell/extensions/torstatus@tails.boum.org/extension.js:78 (7d2a40d74280 @ 92)
Jun 09 00:15:26 amnesia org.gnome.Shell.desktop[10955]: #3   7ffdd6d02a60 b   resource:///org/gnome/gjs/modules/_legacy.js:82 (7d2a41eb0b80 @ 71)
Jun 09 00:15:26 amnesia org.gnome.Shell.desktop[10955]: #4   7ffdd6d036d0 b   self-hosted:983 (7d2a41ef01f0 @ 515)
Jun 09 00:15:26 amnesia kernel: gnome-shell[10955]: segfault at 28 ip 00007d2a5ac68bea sp 00007ffdd6d009c0 error 4 in libst-1.0.so[7d2a5ac5b000+2c000]
Jun 09 00:15:26 amnesia kernel: Code: 48 89 d8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 31 db 48 83 c4 08 48 89 d8 5b 5d c3 0f 1f 40 00 41 54 55 48 89 f5 53 48 89 fb <4c> 8b 67 28 e8 7d 3b ff ff 48 8b 13 48 85 d2 74 05 48 3b 02 74 13

Then GNOME Shell is restarted and crashes again, apparently when the flag file torstatus monitors appears and the extension should update the icon:

Jun 09 00:15:32 amnesia systemd[1]: Starting Manage the flag file that indicates whether Tor has bootstrapped...
Jun 09 00:15:32 amnesia gnome-shell[13857]: Object St.Icon (0x63e9b035a750), has been already deallocated — impossible to access it. This might be caused by the object having been destroyed from C code using something such as destroy(), dispose(), or remove() vfuncs.
Jun 09 00:15:32 amnesia kernel: gnome-shell[13857]: segfault at 28 ip 0000778770e1dbea sp 00007ffef01b2ad0 error 4 in libst-1.0.so[778770e10000+2c000]
Jun 09 00:15:32 amnesia kernel: Code: 48 89 d8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 31 db 48 83 c4 08 48 89 d8 5b 5d c3 0f 1f 40 00 41 54 55 48 89 f5 53 48 89 fb <4c> 8b 67 28 e8 7d 3b ff ff 48 8b 13 48 85 d2 74 05 48 3b 02 74 13
Jun 09 00:15:32 amnesia org.gnome.Shell.desktop[13857]: == Stack trace for context 0x63e9ae56e2c0 ==
Jun 09 00:15:32 amnesia org.gnome.Shell.desktop[13857]: #0   63e9ae923e10 i   /usr/share/gnome-shell/extensions/torstatus@tails.boum.org/extension.js:62 (77873dd74160 @ 36)
Jun 09 00:15:32 amnesia org.gnome.Shell.desktop[13857]: #1   7ffef01b3b10 I   resource:///org/gnome/gjs/modules/_legacy.js:82 (77873edb0b80 @ 71)
Jun 09 00:15:32 amnesia org.gnome.Shell.desktop[13857]: #2   63e9ae923d88 i   /usr/share/gnome-shell/extensions/torstatus@tails.boum.org/extension.js:75 (77873dd74280 @ 70)
Jun 09 00:15:32 amnesia org.gnome.Shell.desktop[13857]: #3   7ffef01b4a80 I   resource:///org/gnome/gjs/modules/_legacy.js:82 (77873edb0b80 @ 71)
Jun 09 00:15:32 amnesia org.gnome.Shell.desktop[13857]: #4   7ffef01b56f0 b   self-hosted:983 (77873edf01f0 @ 515)
Jun 09 00:15:32 amnesia systemd[1]: Started Manage the flag file that indicates whether Tor has bootstrapped.

Finally, gnome-session gives up restarting GNOME Shell, which explains what alienpup sees (apart of the focused application, the desktop is unresponsive):

Jun 09 00:15:32 amnesia gnome-session-binary[10811]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Jun 09 00:15:32 amnesia gnome-session-binary[10811]: Unrecoverable failure in required component org.gnome.Shell.desktop
Jun 09 00:15:32 amnesia gnome-session-binary[10811]: WARNING: App 'org.gnome.Shell.desktop' respawning too quickly
Jun 09 00:15:32 amnesia gnome-session[10811]: Unable to init server: Could not connect: Connection refused
Jun 09 00:15:32 amnesia gnome-session-f[14115]: Cannot open display: 

I’ve asked alienpup and sajolida to test again:

  • with xorg-driver=intel (just in case the icon deallocation is hardware specific, which I strongly doubt)
  • after disabling our torstatus GNOME Shell extension

#5 Updated by intrigeri 2019-06-17 14:18:23

  • Subject changed from Buster fails to restore from suspend to Buster: torstatus@tails.boum.org makes GNOME Shell crashes when restoring from suspend

sajolida’s video looks very much like GNOME Shell crashed, too.

#6 Updated by intrigeri 2019-06-17 14:25:12

  • Assignee deleted (intrigeri)
  • Priority changed from Normal to Elevated
  • Type of work changed from Research to Code

I think that the _updateIcon method in torstatus should not assume that the this._icon object still exists. Instead, it should check if it does, and create it if needed. So most of the _init method needs to be moved to another method, so both _init and _updateIcon can use it.

I’ll handle the follow-ups with alienpup and sajolida but segfault, lamby or alant would surely be more efficient than me to fix this JavaScript => deassigning myself for now. The relevant code lives in "$":https://salsa.debian.org/tails-team/tails/blob/master/config/chroot_local-includes/usr/share/gnome-shell/extensions/torstatustails.boum.org/extension.js.

Raising priority as IMO this blocks putting out a first official beta that we’ll ask Tails contributors and enthusiasts to use in production.

#7 Updated by lamby 2019-06-17 20:29:02

@intrigeri

> I think that the _updateIcon method in torstatus should not assume that the this._icon object still exists

I worry this is papering over the problem. What we in-effect have is a double-free here, no? :) Also note that we are in a strange non-typical Javascript land here - usually if this was the web we would just do if (typeof this._icon === 'undefined') { return; } but I think what is happening is that this._icon does indeed /exist/ in terms of being a Javascript object at the time but the underling C/GNOME/blah binding it refers to does not, so doing anything with it breaks as above. I do not know how to test whether the ref exists from GNOME (or Gio?) scripts and my search fu is failing me right now…

#8 Updated by intrigeri 2019-06-18 08:08:01

intrigeri wrote:
> I’ve asked alienpup and sajolida to test again:
>
> * with xorg-driver=intel (just in case the icon deallocation is hardware specific, which I strongly doubt)
> * after disabling our torstatus GNOME Shell extension

sajolida did 3-5 suspend+resume cycles, with all for combinations of torstatus enabled/disabled and xorg-driver=intel or not, and could not reproduce the GNOME Shell crash anymore. So no new useful data point here except sajolida says GNOME Shell sometimes crashes when waking up the screen (that the screen locker puts to sleep), while the system has not been fully suspended.

#9 Updated by intrigeri 2019-06-18 09:09:58

Dear @lamby,

>> I think that the _updateIcon method in torstatus should not assume that the this._icon object still exists

> I worry this is papering over the problem.

This is fair.

> What we in-effect have is a double-free here, no? :)

More like a use-after-free, no?

> I think what is happening is that this._icon does indeed /exist/ in terms of being a Javascript object at the time but the underling C/GNOME/blah binding it refers to does not

Full ACK. There are many similar bug reports about plenty of other GNOME bits and extensions on the web. One thing I’ve noticed while browsing them is that apparently, an extension should basically assume it’s disabled/enabled when locking/unlocking the screen (and thus, when suspending/resuming); at least, that’s how other projects have fixed extremely similar issues.

To confirm this assumption, I guess we could log something when disable() and _destroy() are called.

Random guess: this._monitor_changed_signal_id is triggered on a partially destroyed instance of TorStatusIndicator. I’m wondering I see that we call a destroy() method:

function disable() {
tor_status_indicator.destroy();
}

… while in the TorStatusIndicator class there’s a _destroy() method?

In other words, I wonder if our _destroy() method, that disconnects the file monitor, is ever called.

#10 Updated by lamby 2019-06-18 09:12:48

> More like a use-after-free, no?

Ye, you got me. :)

> In other words, I wonder if our _destroy() method, that disconnects the file monitor, is ever called.

Indeed, that was my roughly brief reading of it too but (further) was unlikely to be solved abstractly by “just” thinking or analysing the code itself - I bet there are a myriad of timing issues when unsuspending as well as whether it has been called at all… :/

#11 Updated by intrigeri 2019-06-18 09:47:52

  • Status changed from Confirmed to In Progress
  • Assignee set to intrigeri

I can reproduce part of the bug (GNOME Shell crashes and restarts) in a VM like this:

nmcli connection down 'Wired connection' ; gnome-shell-extension-tool --disable-extension=torstatus@tails.boum.org

I’ve done some global.log() debugging and I think I have a fix.

#12 Updated by intrigeri 2019-06-18 10:08:14

  • Feature Branch set to bugfix/16791-torstatus-crashes-gnome-shell-on-unlock-and-resume

#13 Updated by intrigeri 2019-06-18 11:00:51

  • Status changed from In Progress to Needs Validation
  • Assignee deleted (intrigeri)

I can’t reproduce the bug anymore on this branch when locking/unlocking, nor with the simple reproducer I’ve documented above, so I think my fix is OK. I hope this was the only bug that triggered the behavior experienced by alienpup and sajolida. I’ll ask them to test a new feature/buster once this is reviewed & merged.

#14 Updated by segfault 2019-06-18 13:37:47

  • Assignee set to segfault

#15 Updated by segfault 2019-06-18 14:16:23

  • Status changed from Needs Validation to In Progress
  • Assignee changed from segfault to intrigeri
  • Feature Branch changed from bugfix/16791-torstatus-crashes-gnome-shell-on-unlock-and-resume to bugfix/16791-torstatus-crashes-gnome-shell-on-unlock-and-resume,https://salsa.debian.org/tails-team/tails/merge_requests/28

I created a MR on GitLab and did the review there: https://salsa.debian.org/tails-team/tails/merge_requests/28

#16 Updated by segfault 2019-06-18 15:36:35

  • Status changed from In Progress to Needs Validation

I pushed a commit. I tested that it works by changing the extension.js in a running feature/buster VM and reloading gnome-shell.

#17 Updated by segfault 2019-06-18 15:37:02

  • Assignee deleted (intrigeri)

#18 Updated by intrigeri 2019-06-18 16:50:37

  • Feature Branch changed from bugfix/16791-torstatus-crashes-gnome-shell-on-unlock-and-resume,https://salsa.debian.org/tails-team/tails/merge_requests/28 to https://salsa.debian.org/tails-team/tails/merge_requests/28

#19 Updated by intrigeri 2019-06-18 16:51:10

  • Subject changed from Buster: torstatus@tails.boum.org makes GNOME Shell crashes when restoring from suspend to Buster: torstatus@tails.boum.org makes GNOME Shell crashes when unlocking the screen or restoring from suspend
  • Assignee set to intrigeri

#20 Updated by intrigeri 2019-06-18 17:11:40

  • blocks Bug #16755: Call for testing: feature/buster (June 2019 edition) added

#21 Updated by intrigeri 2019-06-18 17:38:01

  • Status changed from Needs Validation to Resolved
  • % Done changed from 0 to 100

Applied in changeset commit:tails|6913b65085c2603726421009170e0c757ce5b709.