Bug #7323

Wheezy's GNOME crashes randomly after Greeter login

Added by anonym 2014-05-28 13:28:59 . Updated 2017-06-29 10:06:25 .

Status: Rejected
Priority: Normal
Assignee:
Category:
Target version:
Start date: 2014-06-12
Due date:
% Done: 100%
Feature Branch:
Type of work: Research
Blueprint:
Starter: 0
Affected tool:
Deliverable for:

Description

That is, before the desktop is shown, a picture with a sad computer screen appears with the message: “Oh no! Something has gone wrong. A problem has occurred and the system can’t recover.” and there’s a “Logout” button. Sometimes you can still see the usual start-up notifications (e.g. for time syncing, available wireless network, etc.).

anonym has seen it on the following occasions:

  • once in a VM that I had artificially starved the CPU on. Persistence was not enabled (running from DVD actually). I couldn’t reproduce it in 4-5 reboots afterwards, though.
  • twice when having the experimental Windows 8 camouflage (Feature #6342) activated (in a VM)
  • another case I’ll discuss below since it’s more interesting.

In all of the above occasions (including the “interesting” case I omitted to describe here), clicking “Logout” and redoing the Greeter login results in a functional Tails desktop.

intrigeri:

  • once when running the test suite, in scenario “Booting Tails from a USB drive upgraded from DVD with persistence enabled”.

Now, the “interesting” case mentioned above: it’s on a quite powerful (4-core 4th gen intel i7) bare metal system, when running Tails from a USB drive, with a 60 GB persistent partition.

  • Persistence enabled: 7/7 trial boots result in the GNOME crash.
  • Persistence disabled: 5/5 trial boots do not result in the GNOME crash.

There’s a noticeable extra delay after the Greeter in the crash case compared to when not crashing, and possibly something in GNOME times out. Indeed, in .xsession-errors I see this only when GNOME crashes:

gnome-session[7848]: WARNING: Application 'gnome-settings-daemon.desktop' failed to register before timeout

Searching for this string will probably help us solve it:
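Separately from a web search, the same warning can be looked for in a saved copy of .xsession-errors. The following is a minimal sketch, assuming the warning always has the shape of the single line quoted above; the function name `find_registration_timeouts` is made up for illustration and is not part of Tails or GNOME.

```python
import re

# Pattern modelled on the one gnome-session warning quoted in this ticket.
# Assumption: other occurrences follow the same format.
TIMEOUT_RE = re.compile(
    r"gnome-session\[(?P<pid>\d+)\]: WARNING: Application "
    r"'(?P<app>[^']+)' failed to register before timeout"
)


def find_registration_timeouts(log_text):
    """Return the .desktop names of applications that failed to register in time."""
    return [match.group("app") for match in TIMEOUT_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "gnome-session[7848]: WARNING: Application "
        "'gnome-settings-daemon.desktop' failed to register before timeout\n"
    )
    print(find_registration_timeouts(sample))
```

An empty result on a boot that did crash would suggest a different failure than the timeout described here.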


Subtasks

Bug #7408: Try to reproduce GNOME crashing when persistence is enabled Resolved intrigeri

0


Related issues

Related to Tails - Bug #8778: "Oh no!" / Xorg crash after logging in at the Greeter on Jessie Resolved 2016-02-09
Related to Tails - Bug #11392: Tails does not start: document workarounds to reach debug information Rejected 2016-04-29
Has duplicate Tails - Bug #7748: System cannot be recovered. Resolved 2014-08-05

History

#2 Updated by intrigeri 2014-05-29 09:11:44

Just random ideas — I would try:

  • dropping the gconf-related tweaks in /usr/local/sbin/live-persist
  • bisecting the enabled persistence settings

#3 Updated by intrigeri 2014-06-10 13:13:18

See also bug report 9ca4699b0b48d7b0321243747dcb8f90, which seems very similar, but… in 1.0.1. Ooops, could it be Linux 3.14? Perhaps our automated test suite should test things a bit more when persistence is enabled.

#4 Updated by anonym 2014-06-11 05:43:40

intrigeri wrote:
> See also bug report 9ca4699b0b48d7b0321243747dcb8f90, which seems very similar, but… in 1.0.1.

Perhaps, but the bug description doesn’t make it clear when the infinite loading occurs: while the persistent volume is being opened, or after it, e.g. when GNOME is starting, which I’ve confirmed is the case for this bug. To know which, we’d need the bug reporter to enable “More options” (if the “More options” screen is not shown, then it’s not this bug), but sadly there’s no email address given.

> Ooops, could it be Linux 3.14? Perhaps our automated test suite should test things a bit more when persistence is enabled.

Sure, but what exactly are you proposing? Something like a scenario where we boot + enable persistence 10 or so times? Perhaps with an increasingly severe execution cap on the virtual CPU in each iteration? :)
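To make the idea concrete, here is a rough sketch of such a loop. It is not test suite code: `boot_and_login` is a hypothetical callable that would boot Tails with persistence enabled under the given CPU execution cap (in percent) and return True if GNOME started, and the cap values are arbitrary.

```python
def cpu_cap_schedule(iterations=10, start=100, floor=10):
    """Return one CPU execution cap (percent) per iteration, decreasing linearly."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    if iterations == 1:
        return [start]
    step = (start - floor) / (iterations - 1)
    return [round(start - i * step) for i in range(iterations)]


def run_stress_scenario(boot_and_login, iterations=10):
    """Boot once per cap value and return the caps at which GNOME failed to start.

    boot_and_login is a hypothetical hook supplied by the caller.
    """
    failures = []
    for cap in cpu_cap_schedule(iterations):
        if not boot_and_login(cap):
            failures.append(cap)
    return failures


if __name__ == "__main__":
    # Stand-in for a real boot: pretend GNOME only starts with a cap of 50% or more.
    print(run_stress_scenario(lambda cap: cap >= 50))
```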

#5 Updated by intrigeri 2014-06-11 06:06:20

> To know which we’d need the bug reporter to enable “More option” (if
> “More options” screen is not shown, then it’s not this bug) […].

Do you mean that going to the Greeter’s “More options” screen is needed to reproduce this bug?

> Sure, but what exactly are you proposing?

I’m not sure. I’ve just looked, and the scenarios with persistence enabled do more than what I thought, e.g. they run the “GNOME has started” step, which should be enough to catch this bug, so my proposal is actually invalid.

#6 Updated by anonym 2014-06-12 06:12:13

intrigeri wrote:
> > To know which we’d need the bug reporter to enable “More option” (if
> > “More options” screen is not shown, then it’s not this bug) […].
>
> Do you mean that going to the Greeter’s “More options” screen is needed to reproduce this bug?

No, it just makes it easier to determine when the error occurs. When both persistence and “More options” are enabled, the persistent volume is opened and mounted before the “More options” screen is shown, and hence before Tails Greeter logs in and starts the GNOME session. Issues with opening and mounting the persistent volume can therefore be distinguished from issues that occur after Tails Greeter logs in, such as problems with GNOME components, which seems to be the case in this bug.

#7 Updated by anonym 2014-06-12 06:29:27

anonym wrote:
> Now, the “interesting” case mentioned above: it’s on a quite powerful (4-core 4th gen intel i7) bare metal system, when running Tails from a USB drive, with a 60 GB persistent partition.
>
> * Persistence enabled: 7/7 trial boots results in the GNOME crash.
>
> * Persistence disabled: 5/5 trial boots do not result in the GNOME crash.
>
> There’s a noticeable extra delay after the Greeter in the crash case compared to when not crashing, and possibly something in GNOME times out. Indeed, in .xsession-errors I see this only when GNOME crashes:

I cannot reproduce this any more :/. I upgraded to a more recent build from testing, and then GNOME started just fine. I then downgraded to the Tails version from the old known-bad image (which I still have saved; it’s built from commit 5a7404406b1a95570d03be1dbfc0e0cda2f4aa6f, with whatever was in the APT suite on 2014-05-19) and I still cannot reproduce it. I also tried installing the bad image on another USB drive, and it’s not reproducible that way either.

Without a way to reproduce this bug, we’re essentially back at square one.

#8 Updated by intrigeri 2014-06-12 11:09:00

I’ll try to reproduce this (Bug #7408).

Since 1.1-beta1 was built, Linux was updated to 3.14.5 in sid, so it might be that we were hit by a kernel regression that has been fixed since then.

#9 Updated by anonym 2014-06-12 19:18:54

intrigeri wrote:
> I’ll try to reproduce this (Bug #7408).

Excellent!

> Since 1.1-beta1 was built, Linux
> was updated to 3.14.5 in sid, so it might be that we were hit by
> a kernel regression, that was fixed since then.

If so, I should still be able to reproduce it using my old, bad image, right?

I also have some interesting preliminary test results on the same hardware: when using a recent devel/testing build with a persistence partition present, there’s an extra ~1 minute delay before Xorg starts. During the wait you only have the good ol’ console login, and everything else seems fine. Nuking the persistence partition solves it. Urgh.

It could be that my hardware has some serious issues with something related to persistence that is more or less unrelated to this bug. Since this crash was observed by both you and me, on different “hardware” (well, it was in VMs), I’m assuming there’s a more general, hardware-agnostic (except w.r.t. races) problem. An alternative assumption is that GNOME Flashback simply is very unstable, and there are multiple ways in Tails that can lead to a crash during the initialization of the GNOME session.

#10 Updated by intrigeri 2014-06-15 05:31:34

anonym wrote:
> If so, I should still be able to reproduce it using my old, bad image, right?

Right, my lead was a wrong one.

> I also have some interesting preliminary test results on the same hardware: when using a recent devel/testing build with a persistence partition present, there’s an extra ~1 minute delay before Xorg starts. During the wait you only have the good ol’ console login, and everything else seems fine. Nuking the persistence partition solves it. Urgh.

I can’t reproduce that with a build from yesterday’s testing branch. More testing results will follow.

> I’m assuming there’s a more general, hardware-agnostic (except w.r.t. races) problem. An alternative assumption is that GNOME Flashback simply is very unstable, and there are multiple ways in Tails that can lead to a crash during the initialization of the GNOME session.

I’ve never seen GNOME Flashback crash in such a way during session startup in a non-Tails environment. We’re doing tons of crazy stuff during session initialization, e.g. some of it touching dconf/gsettings, some of it touching gconf, some of it indirectly playing with dbus, and a lot of these things may very well race with other (standard) GNOME components’ initialization.

Some of the links you’ve found (and in particular https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/gnome-settings-daemon&id=352f2cfa1f3de3b874aed0e67c7f23a023385ed2) seem related to region/locales initialization, which is an area where we’re doing things in tails-configure-keyboard that may be particularly racy.

Hopefully systemd-based user session activation (likely will be available, on an opt-in basis, in Jessie) will help us clean up this mess.

#11 Updated by intrigeri 2014-06-15 05:36:49

I’ve failed to reproduce this with a build from yesterday’s testing branch, on a 4GB USB stick passed-through to a QEMU/libvirt VM:

  • no persistence partition
  • persistent volume present but not enabled
  • persistent volume enabled
  • persistent volume enabled, non-US keyboard and language
  • persistent volume enabled, non-US keyboard and language, admin password set

In the cases with a persistent volume, I had enabled all presets found in t-p-s.

If someone tries harder to reproduce this bug, I guess it might be key to select a non-US keyboard, just in case the problem is related to us setting /org/gnome/libgnomekbd/keyboard/layouts twice in a row in tails-configure-keyboard.

#12 Updated by intrigeri 2014-06-15 05:59:06

anonym, did you have a live-additional-software.conf file, and/or APT lists/packages presets enabled, when you reproduced that initially? (I’m suspecting a timeout due to the long time it may take to fully download e.g. the Wheezy package lists, starting from persistent APT stuff originating from Squeeze; or, a long time needed to download persistent additional packages and their dependencies).

#13 Updated by anonym 2014-06-16 10:06:27

intrigeri wrote:
> anonym, did you have a live-additional-software.conf file,

Currently it’s empty, but I’m pretty sure I actually had some simple package (cowsay I think :)) in it before, when I could reproduce this bug. I had to clear it and persistence.conf before I did the test for comment #7. This may be important, because if I now (re-)add cowsay to it, it takes much longer (almost two minutes) to get from the Greeter (“More options” enabled) to when GNOME is fully started.

Any ideas?

> and/or APT lists/packages presets enabled, when you reproduced that initially?

Yes, and an apt-get update had been run during an earlier session.

> (I’m suspecting a timeout due to the long time it may take to fully download e.g. the Wheezy package lists, starting from persistent APT stuff originating from Squeeze; or, a long time needed to download persistent additional packages and their dependencies).

I thought it was only the installation of these things that happened at login time, and that all network related stuff happens via a NM hook, i.e. when the GNOME session is fully up.

#14 Updated by anonym 2014-06-19 15:19:02

intrigeri wrote:
> I’ve failed to reproduce this with a build from yesterday’s testing branch, on a 4GB USB stick passed-through to a QEMU/libvirt VM:
>
> * no persistence partition
> * persistent volume present but not enabled
> * persistent volume enabled
> * persistent volume enabled, non-US keyboard and language
> * persistent volume enabled, non-US keyboard and language, admin password set
>
> In the cases with a persistent volume, I had enabled all presets found in t-p-s.
>
> If someone tries harder to reproduce this bug, I guess it might be key to select a non-US keyboard, just in case the problem is related to us setting /org/gnome/libgnomekbd/keyboard/layouts twice in a row in tails-configure-keyboard.

In total I’ve spent more than 6 hours this week trying to reproduce this bug on several different machines (including the one that could always reproduce it) and VMs, playing with similar options and the additional software feature. Except for the (different) issue reported in Bug #7323#note-13, I haven’t seen anything, and in particular no crashes.

#15 Updated by intrigeri 2014-06-19 15:46:43

> In total I’ve spent more than 6 hours trying to reproduce this bug this week (on the
> several different machines (including the one that always could reproduce it) and
> VMs), playing with similar options, and the additional software feature, and except
> the (different) issue reported in Bug #7323#note-13 I haven’t seen anything, in
> particular no crashes.

Shall we just close this ticket as unreproducible, then?

#16 Updated by intrigeri 2014-06-20 09:00:32

  • Assignee set to anonym
  • QA Check set to Info Needed

Reassigning to anonym who should make the final decision.

#17 Updated by anonym 2014-06-22 10:31:44

  • Status changed from Confirmed to Rejected
  • Assignee deleted (anonym)
  • QA Check deleted (Info Needed)

Closing since we cannot reproduce this any more.

#18 Updated by sajolida 2014-08-06 14:10:14

  • Status changed from Rejected to Confirmed
  • Target version deleted (Tails_1.1)

This has been reported twice recently:

- <W8eEWMsAnIMHTB9F5x7WEaaOK7h2gNOO@tails-bugs.boum.org-schleuder>
- WhisperBack f415c46147f16b418b7e22796a4b27de

We’ll ask for more info from the reporters.

#19 Updated by sajolida 2014-08-06 14:49:10

  • related to Bug #7748: System cannot be recovered. added

#20 Updated by emmapeel 2014-08-06 15:24:24

Also reported just now in tails-bugs
Subject: Bug report: 8cf334ee3874249ee7e0ea4dd21eda93
Date: Wed, 6 Aug 2014 17:13:11 +0200

Message-Id: rcCtkTJn9yZx5WRq434O6dxa14FRDZ4m@tails-bugs.boum.org-schleuder

#21 Updated by emmapeel 2014-08-06 15:25:53

  • related to deleted (Bug #7748: System cannot be recovered.)

#22 Updated by emmapeel 2014-08-06 15:26:09

  • has duplicate Bug #7748: System cannot be recovered. added

#23 Updated by emmapeel 2014-08-08 08:45:42

A user previously reported this crash happening only with an installation on a MicroSD card in a USB adapter, which used to work in 1.0.
Other installs (DVD and USB stick) work fine for him. Here is the info:

Tails-Version: 1.1 - 20140722
Bug report: 5a3c0366e5d3ca35fc45cab1d34881d98e4dfb11
live-build: 2.0.12-2
live-boot: 3.0.1-1
live-config: 3.0.23-1
Dell Inspiron 6400 computer
I boot Tails from a micro SD attached to the computer via a USB adapter. It works fine with earlier versions of Tails. However, with version 1.1 I get the error message below.

#24 Updated by emmapeel 2014-08-11 10:08:43

And more occurrences of the bug!

This one shows the “Oh no! Something has gone wrong” error when clicking for more options in the Tails Greeter. Here is more info from the user:

When I insert the USB stick with Tails 1.1, which works well in an Acer Aspire notebook, the PC, a Packard Bell iMedia S2110, reboots into Win8, ignoring the USB stick entirely.

When I insert the DVD with Tails 1.1, it starts with the Tails BOOT MENU appearing on the display. I then do not choose any boot option.

After a while, a black screen appears with the image of a sad computer saying: Oh! Something has gone wrong. A problem has occurred and the system can’t recover. Please contact a system administrator.

After a little time, TAILS GREETER still appears. At this point, if I click the login button, the GNOME DESKTOP appears, but I can do only a little without a password for the root terminal.

If, instead, I click YES in TAILS GREETER to have more options, TAILS GREETER disappears and I then see the image of a sad computer in the middle of the display, saying what is written above.

= output of command /usr/sbin/dmidecode -s system-manufacturer =
Packard Bell

= output of command /usr/sbin/dmidecode -s system-product-name =
imedia S2110

Message-Id: jQAgaA8fjStvRNapDuIyrdh2eETLjNdA@tails-bugs.boum.org-schleuder

#25 Updated by intrigeri 2014-08-12 15:48:04

Possibly related: I’m seeing this kind of crash today in the Greeter itself. The “Oh no” message is displayed before the Greeter’s controls are. Still, the test suite robot manages to log in.

#26 Updated by sajolida 2014-09-13 03:37:57

  • related to Bug #7890: Tails Jessie displays "Oh no! Something had gone wrong" instead of the Greeter added

#27 Updated by BitingBird 2014-09-13 04:36:49

  • related to deleted (Bug #7890: Tails Jessie displays "Oh no! Something had gone wrong" instead of the Greeter)

#28 Updated by BitingBird 2014-09-13 04:36:54

  • has duplicate Bug #7890: Tails Jessie displays "Oh no! Something had gone wrong" instead of the Greeter added

#29 Updated by intrigeri 2014-09-13 08:18:03

  • is duplicate of deleted (Bug #7890: Tails Jessie displays "Oh no! Something had gone wrong" instead of the Greeter)

#30 Updated by intrigeri 2015-01-22 20:06:32

  • related to Bug #8778: "Oh no!" / Xorg crash after logging in at the Greeter on Jessie added

#31 Updated by intrigeri 2015-01-23 14:28:26

Anyone who can reproduce this bug, please try the following:

  • set a root password with rootpw=XXX on the kernel command-line
  • boot Tails
  • once Tails Greeter appears, switch to a text console and login as root
  • append the --debug option to the Exec and TryExec lines in /usr/share/xsessions/gnome-fallback.desktop
  • switch back to the graphical TTY
  • log in
  • once the “oh no!” screen appears, go back to the text console and report back the content of /home/amnesia/.xsession-errors
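
The edit to the Exec and TryExec lines in the steps above could be scripted roughly as follows. This is only a sketch: the sample file content is made up for illustration and is not the real gnome-fallback.desktop, and `add_debug_flag` is a hypothetical helper, not an existing Tails tool.

```python
def add_debug_flag(desktop_text, flag="--debug"):
    """Append flag to the Exec= and TryExec= lines of a .desktop file's text.

    Lines that already carry the flag are left alone, so running this twice
    gives the same result as running it once.
    """
    out = []
    for line in desktop_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("Exec", "TryExec"):
            body = value.rstrip("\r\n")
            newline = value[len(body):]  # preserve the original line ending
            if flag not in body.split():
                line = f"{key}={body.rstrip()} {flag}{newline}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = (
        "[Desktop Entry]\n"
        "Name=GNOME Fallback\n"
        "Exec=gnome-session --session=gnome-fallback\n"
        "TryExec=gnome-session\n"
    )
    print(add_debug_flag(sample), end="")
```

On a running system the result would have to be written back to /usr/share/xsessions/gnome-fallback.desktop as root, from the text console, before logging in.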

#32 Updated by intrigeri 2015-02-12 17:38:39

Just seen it in the automated test suite (And I start Tails from USB drive “old” with network unplugged and I login with persistence password “asdf”).

#33 Updated by intrigeri 2015-02-12 19:01:29

Seen one more time (in Scenario: The persistent Tor Browser directory is usable — And I start Tails from USB drive “current” and I login with persistence password “asdf”).

#34 Updated by kytv 2015-02-20 21:34:01

I just hit it during Persistent browser bookmarks. In my case the “Oh No” screen was displayed at the greeter, and once more after login. I was not using --capture, however.

/var/log/gdm/:0.log:

Errors from xkbcomp are not fatal to the X server
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Compat map for group 2 redefined
>                   Using new definition
> Warning:          Compat map for group 3 redefined
>                   Using new definition
> Warning:          Compat map for group 4 redefined
>                   Using new definition
Errors from xkbcomp are not fatal to the X server
   Zero width or height
- 0th attempt
- OOM at 1024 768 24
Cache contents:  null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 1019 1021     total: 2
Out of video memory: Could not allocate 3149824 bytes
   Bad bpp: 1 (1)
   Bad bpp: 1 (1)
- 0th attempt
- OOM at 1024 768 24
Cache contents:  null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 1019 1021 null     total: 2
Out of video memory: Could not allocate 3149824 bytes
- 0th attempt
- OOM at 1024 768 24

etc.

/var/log/Xorg.0.log

[   210.043] (II) evdev: QEMU QEMU USB Tablet: Configuring as touchscreen
[   210.043] (II) evdev: QEMU QEMU USB Tablet: Adding scrollwheel support
[   210.043] (**) evdev: QEMU QEMU USB Tablet: YAxisMapping: buttons 4 and 5
[   210.043] (**) evdev: QEMU QEMU USB Tablet: EmulateWheelButton: 4, EmulateWheelInertia: 10, EmulateWheelTimeout: 200
[   210.043] (**) Option "config_info" "udev:/sys/devices/pci0000:00/0000:00:05.0/usb1/1-2/1-2:1.0/0003:0627:0001.0001/input/input2/event1"
[   210.043] (II) XINPUT: Adding extended input device "QEMU QEMU USB Tablet" (type: TOUCHSCREEN, id 7)
[   210.043] (WW) evdev: QEMU QEMU USB Tablet: touchpads, tablets and touchscreens ignore relative axes.
[   210.043] (II) evdev: QEMU QEMU USB Tablet: initialized for absolute axes.
[   210.043] (**) QEMU QEMU USB Tablet: (accel) keeping acceleration scheme 1
[   210.043] (**) QEMU QEMU USB Tablet: (accel) acceleration profile 0
[   210.043] (**) QEMU QEMU USB Tablet: (accel) acceleration factor: 2.000
[   210.043] (**) QEMU QEMU USB Tablet: (accel) acceleration threshold: 4
[   251.041]    Bad bpp: 1 (1)
[   251.041]    Bad bpp: 1 (1)
[   254.818] - 0th attempt
[   254.818] - OOM at 1026 804 24
[   254.818] Cache contents:  null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 1019 1021 null     total: 2
[   254.818] Out of video memory: Could not allocate 3303720 bytes
[   255.176] - 0th attempt
[   255.176] - OOM at 1024 768 24
[   255.176] Cache contents:  null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 1019 1021 null     total: 2
[   255.176] Out of video memory: Could not allocate 3149824 bytes
[   255.473] - 0th attempt
[   255.473] - OOM at 1024 768 24
[   255.473] Cache contents:  null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 1019 1021     total: 2
[   255.473] Out of video memory: Could not allocate 3149824 bytes
[   256.260]    Bad bpp: 1 (1)
[   256.260]    Bad bpp: 1 (1)
[   256.415]    Bad bpp: 1 (1)
[   260.397]    Zero width or height
[   260.630] - 0th attempt

etc.
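
To get a quick overview of such a log without reading every line, the failed allocations can be counted. A minimal sketch, assuming the messages always look like the “Out of video memory” lines pasted above; `summarize_video_oom` is a name made up for this example.

```python
import re

# Matches the allocation failures seen in the logs above, with or without
# the leading Xorg timestamp.
OOM_RE = re.compile(r"Out of video memory: Could not allocate (\d+) bytes")


def summarize_video_oom(log_text):
    """Count failed video memory allocations and report the largest request."""
    sizes = [int(size) for size in OOM_RE.findall(log_text)]
    return {"count": len(sizes), "largest_bytes": max(sizes, default=0)}


if __name__ == "__main__":
    sample = (
        "[   254.818] Out of video memory: Could not allocate 3303720 bytes\n"
        "[   255.176] Out of video memory: Could not allocate 3149824 bytes\n"
    )
    print(summarize_video_oom(sample))
```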

#35 Updated by intrigeri 2015-02-27 19:04:39

I see lots of OOM errors in these X.Org logs. The test suite uses <model type='qxl' vram='9216' heads='1'/> to configure the virtual video adapter provided to the system under testing. Shall we try to raise the vram value?

#36 Updated by kytv 2015-08-02 03:37:41

I just saw this failure again during the test suite.

intrigeri wrote:
> I see lots of OOM errors in these X.Org logs. The test suite uses <model type='qxl' vram='9216' heads='1'/> to configure the virtual video adapter provided to the system under testing. Shall we try to raise the vram value?

I’m bumping this value locally to give the video device 64MB of RAM. I’ll run with this for a while to see if it makes anything worse (or, optimistically, better).
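
For anyone wanting to try the same change, here is a sketch of how the vram attribute could be rewritten programmatically. It assumes that libvirt interprets vram in KiB, so that 9216 is 9 MiB and 64 MiB is 65536; `set_video_vram` is a hypothetical helper and not part of the test suite.

```python
import xml.etree.ElementTree as ET


def set_video_vram(video_xml, vram_kib):
    """Return video_xml with the vram attribute of every <model> set to vram_kib.

    Assumption: the value is in KiB, as in the libvirt domain XML format.
    """
    root = ET.fromstring(video_xml)
    # iter() also yields the root element itself when it is a <model>.
    for model in root.iter("model"):
        if "vram" in model.attrib:
            model.set("vram", str(vram_kib))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = "<model type='qxl' vram='9216' heads='1'/>"
    print(set_video_vram(original, 64 * 1024))
```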

#37 Updated by BitingBird 2016-06-26 10:57:56

  • Status changed from Confirmed to In Progress

#38 Updated by BitingBird 2016-06-29 06:47:17

  • related to Bug #11392: Tails does not start: document workarounds to reach debug information added

#39 Updated by Anonymous 2017-06-29 10:06:25

  • Status changed from In Progress to Rejected

We’re already on Stretch now, and I think we haven’t seen this happen in a long time. Tickets related to the new Greeter are tracked elsewhere. Closing.