Bug #8778

"Oh no!" / Xorg crash after logging in at the Greeter on Jessie

Added by kytv 2015-01-22 19:05:23. Updated 2016-02-09 14:49:27.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Hardware support
Target version:
Start date:
2016-02-09
Due date:
% Done:

0%

Feature Branch:
Type of work:
Code
Blueprint:

Starter:
0
Affected tool:
Deliverable for:

Description

This is the problem referred to in Bug #8710.

So far, I have been able to reproduce this with every Jessie Tails ISO I’ve tried. This time I used tails-i386-feature_jessie-1.3-20150122T0511Z-0e61703.iso.

The steps to reproduce this are few:
1. Get a Tails Jessie ISO.
2. Boot it with kvm -m 4096 -cdrom tails-i386-feature_jessie-1.3-20150122T0511Z-0e61703.iso
3. Log in at the greeter.

The end result is the dreaded “Oh no!” GNOME 3 screen. If you switch to another VT and then go back to X you’ll be back at the Tails greeter.

Attached to this ticket is the result of journalctl -a > journal.txt. /var/log/gdm only contained the file tails-greeter.errors. Its content:

 
day 022 of 2015 [18:45:28] Password variable not found.

Files


Subtasks


Related issues

Related to Tails - Bug #7323: Wheezy's GNOME crashes randomly after Greeter login Rejected 2014-06-12

History

#1 Updated by kytv 2015-01-22 19:13:50

kytv wrote:
> This is the problem referred to in Bug #8710.

I meant “that I referred to at Bug #8710#note-1”.

#2 Updated by intrigeri 2015-01-22 19:20:20

  • Subject changed from [feature/jessie] "Oh no!" / Xorg crash after logging in at the greeter to "Oh no!" / Xorg crash after logging in at the Greeter on Jessie
  • Assignee set to kytv
  • Target version set to Tails_2.0
  • QA Check set to Info Needed

I’ll need the output of journalctl -ax -o verbose to better differentiate between the various GDM, gnome-session and X.Org instances.

#3 Updated by kytv 2015-01-22 19:24:28

  • File deleted (journal.txt)

#4 Updated by kytv 2015-01-22 19:25:44

  • File added (missing: journal.txt)

Attachment updated.

#5 Updated by kytv 2015-01-22 19:32:03

I’m re-adding the file in case the first copy was captured too soon after the failure and lacked information that may be useful for diagnosing this problem.

#6 Updated by kytv 2015-01-22 19:32:16

  • File deleted (journal.txt)

#7 Updated by kytv 2015-01-22 19:32:41

  • Assignee changed from kytv to intrigeri
  • QA Check deleted (Info Needed)

#8 Updated by intrigeri 2015-01-22 20:02:21

Here are bits of the log I find useful to understand the timing of events:

Thu 2015-01-22 19:22:30.832007 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=31b;b=a38adcd9c0164ccfaa853ad70505b6e6;m=3193e97;t=50d4297e39cc0;x=83e8427bee77a3c6]
    MESSAGE=pam_unix(gdm-launch-environment:session): session opened for user Debian-gdm by (uid=0)

Thu 2015-01-22 19:23:03.506610 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=59a;b=a38adcd9c0164ccfaa853ad70505b6e6;m=50bd314;t=50d4299d6313c;x=5e7e8b9172b96802]
    SYSLOG_IDENTIFIER=nm-dispatcher
    MESSAGE=Dispatching action 'up' for eth0
    _PID=2546

Thu 2015-01-22 19:23:07.972295 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5a4;b=a38adcd9c0164ccfaa853ad70505b6e6;m=54ff49f;t=50d429a1a52c7;x=3cfa3c931d8799be]
    MESSAGE=/etc/gdm3/PostLogin/Default: line 162: /var/lib/gdm3/tails.password: No such file or directory

Thu 2015-01-22 19:23:08.317312 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5a8;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5553a3f;t=50d429a1f9867;x=fc333b5726d95dfe]
    PRIORITY=6
    _UID=0
    _SYSTEMD_SLICE=system.slice
    _BOOT_ID=a38adcd9c0164ccfaa853ad70505b6e6
    _MACHINE_ID=fe471aac2ec3c2730d0d14a2069ad59c
    _CAP_EFFECTIVE=3fffffffff
    _TRANSPORT=syslog
    SYSLOG_FACILITY=10
    _HOSTNAME=amnesia
    _SYSTEMD_CGROUP=/system.slice/gdm.service
    _SYSTEMD_UNIT=gdm.service
    _COMM=gdm-session-wor
    _EXE=/usr/lib/gdm3/gdm-session-worker
    _GID=1000
    _PID=1956
    _CMDLINE=gdm-session-worker [pam/gdm-autologin]
    SYSLOG_IDENTIFIER=gdm-autologin]
    MESSAGE=pam_unix(gdm-autologin:session): session opened for user amnesia by (unknown)(uid=0)

Thu 2015-01-22 19:23:08.386992 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5b6;b=a38adcd9c0164ccfaa853ad70505b6e6;m=556896c;t=50d429a20e794;x=3bf5168f2ea1d0e2]
    MESSAGE=pam_unix(gdm-launch-environment:session): session closed for user Debian-gdm

Thu 2015-01-22 19:23:08.395664 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5b7;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5568eea;t=50d429a20ed12;x=2c04f18f671beefa]
    MESSAGE=pam_systemd(gdm-launch-environment:session): Failed to release session: Interrupted system call

Thu 2015-01-22 19:23:08.349504 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5aa;b=a38adcd9c0164ccfaa853ad70505b6e6;m=555b719;t=50d429a201541;x=7c87077f925fd853]
    MESSAGE=pam_unix(systemd-user:session): session opened for user amnesia by (uid=0)

Thu 2015-01-22 19:23:08.508258 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5c3;b=a38adcd9c0164ccfaa853ad70505b6e6;m=558223a;t=50d429a228062;x=bfe1ecbf1b0d6e56]
    MESSAGE=/etc/gdm3/Xsession: Beginning session setup...

Thu 2015-01-22 19:23:08.896561 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5c4;b=a38adcd9c0164ccfaa853ad70505b6e6;m=55e13a5;t=50d429a2871cd;x=c1c859bd7bfb89d8]

Thu 2015-01-22 19:23:12.916602 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5d0;b=a38adcd9c0164ccfaa853ad70505b6e6;m=59b6652;t=50d429a65c47a;x=43dc49f7ffaec0dd]
    _CMDLINE=/usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
    MESSAGE=Successfully activated service 'org.a11y.atspi.Registry'

19:23:13: starting pulseaudio and gnome-keyring-daemon

Thu 2015-01-22 19:23:14.164419 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5e3;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5ae719f;t=50d429a78cfc7;x=40be85e06733cf1b]
    SYSLOG_IDENTIFIER=x-session-manager
    MESSAGE=WARNING: App 'pulseaudio.desktop' exited with code 1

    MESSAGE=Failure: Module initialization failed

Thu 2015-01-22 19:23:16.302979 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5e6;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5cf13ac;t=50d429a9971d4;x=d481e1d3487ae1b6]
    MESSAGE=[system] Activating via systemd: service name='org.freedesktop.UDisks2' unit='udisks2.service'

Thu 2015-01-22 19:23:16.887663 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5ec;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5d7fe47;t=50d429aa25c6f;x=98978d11be15321e]
    SYSLOG_IDENTIFIER=org.gtk.Private.AfcVolumeMonitor
    MESSAGE=Volume monitor alive

Thu 2015-01-22 19:23:16.910460 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5ed;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5d858f3;t=50d429aa2b71b;x=65294fbf64c52f8d]
    MESSAGE=[system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'

Thu 2015-01-22 19:23:17.740102 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5ef;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5e50440;t=50d429aaf6268;x=4a5c261a35efbd6d]
    MESSAGE=[system] Successfully activated service 'org.freedesktop.hostname1'

Thu 2015-01-22 19:23:17.870697 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5f0;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5e6fe41;t=50d429ab15c69;x=b17ae10265a9cae4]
starting to kill GDM's xorg

Thu 2015-01-22 19:23:18.034150 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=5f1;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5ea3050;t=50d429ab48e78;x=2190e574b6bd985a]
    SYSLOG_IDENTIFIER=cupsd
    MESSAGE=Unable to change ownership of "/var/log/cups" - Permission denied

... and then more.

Thu 2015-01-22 19:23:18.996487 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=613;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5f82d51;t=50d429ac28b79;x=d19faeab5aaa01c6]
    MESSAGE=[system] Activating via systemd: service name='org.freedesktop.locale1' unit='dbus-org.freedesktop.locale1.service'

Thu 2015-01-22 19:23:19.062520 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=617;b=a38adcd9c0164ccfaa853ad70505b6e6;m=5f92f76;t=50d429ac38d9e;x=d4875f7734ab7838]
    MESSAGE=[system] Successfully activated service 'org.freedesktop.locale1'

Thu 2015-01-22 19:23:24.402946 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=61a;b=a38adcd9c0164ccfaa853ad70505b6e6;m=64aac12;t=50d429b150a3a;x=bd0b18c2a0f88a64]
    MESSAGE=Registered Authentication Agent for unix-session:1 (system bus name :1.43 [/usr/bin/gnome-shell], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)

Thu 2015-01-22 19:23:25.559911 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=61d;b=a38adcd9c0164ccfaa853ad70505b6e6;m=65c536c;t=50d429b26b194;x=2cf00269694cb5b9]
    MESSAGE=pam_unix(login:session): session opened for user root by LOGIN(uid=0)
    _SYSTEMD_CGROUP=/system.slice/system-getty.slice/getty@tty2.service

Thu 2015-01-22 19:23:26.810898 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=62e;b=a38adcd9c0164ccfaa853ad70505b6e6;m=66f69c4;t=50d429b39c7ec;x=7825eb86ce6c8c31]
    _CMDLINE=sudo -n -u debian-tor /usr/local/sbin/tor-has-bootstrapped
    MESSAGE=pam_unix(sudo:session): session opened for user debian-tor by (uid=0)
    _PID=2902

Thu 2015-01-22 19:23:36.866314 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=64c;b=a38adcd9c0164ccfaa853ad70505b6e6;m=708d7e2;t=50d429bd3360a;x=9c1d954af0f09984]
    MESSAGE=amnesia : TTY=unknown ; PWD=/home/amnesia ; USER=debian-tor ; COMMAND=/usr/local/sbin/tor-has-bootstrapped
    _CMDLINE=sudo -n -u debian-tor /usr/local/sbin/tor-has-bootstrapped

Thu 2015-01-22 19:23:37.317343 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=651;b=a38adcd9c0164ccfaa853ad70505b6e6;m=70fbaaf;t=50d429bda18d7;x=fcdac88b95cafec7]
    _CMDLINE=sudo -n -u debian-tor /usr/local/sbin/tor-has-bootstrapped
    MESSAGE=pam_unix(sudo:session): session opened for user debian-tor by (uid=0)

Thu 2015-01-22 19:23:43.044585 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=656;b=a38adcd9c0164ccfaa853ad70505b6e6;m=7671dc1;t=50d429c317be9;x=45ed00ab91852c9b]
    MESSAGE=Gjs-Message: JS WARNING: [/usr/share/gnome-shell/extensions/launch-new-instance@gnome-shell-extensions.gcampax.github.com/extension.js 9]: assignment to undeclared variable _activateOriginal

Thu 2015-01-22 19:23:45.301564 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=659;b=a38adcd9c0164ccfaa853ad70505b6e6;m=7898e14;t=50d429c53ec3c;x=9e75ffa87f63b789]
    MESSAGE=Gjs-Message: JS WARNING: [/usr/share/gnome-shell/extensions/shutdown-helper@tails.boum.org/extension.js 137]: assignment to undeclared variable extension

Thu 2015-01-22 19:23:45.301564 UTC [s=ac50e06708e0463b84ea82d7e2f781cb;i=659;b=a38adcd9c0164ccfaa853ad70505b6e6;m=7898e14;t=50d429c53ec3c;x=9e75ffa87f63b789]
    MESSAGE=(gnome-shell:2819): mutter-WARNING **: STACK_OP_RAISE_ABOVE: window 0x4f00c00016 not in stack
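
For reference, an excerpt like the one above can be approximated mechanically instead of trimming the verbose journal by hand. This is only a rough sketch: journal_summary is a made-up helper name, and it assumes the journalctl -o verbose layout seen in this ticket (an unindented header line per entry, followed by indented FIELD=value lines).

```shell
#!/bin/sh
# journal_summary (hypothetical helper): reduce `journalctl -o verbose`
# output read from stdin to the header line plus the MESSAGE field of
# each entry. Entries without a MESSAGE field are dropped.
journal_summary() {
    awk '
        /^[^ \t]/         { header = $0; next }              # entry header (timestamp and cursor)
        /^[ \t]+MESSAGE=/ { print header; print; print "" }  # keep only the message line
    '
}
```

Possible usage: journalctl -ax -o verbose | journal_summary. Fields such as _PID or _CMDLINE, which were kept by hand in some entries above, are not preserved by this sketch.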

kytv:

  • Had GNOME already crashed when you opened the root session on tty2?
  • Could you please retry after removing the apparmor-related parameters from the kernel command line, just to be sure?

#9 Updated by intrigeri 2015-01-22 20:02:46

  • Assignee changed from intrigeri to kytv
  • QA Check set to Info Needed
  • Affected tool deleted (Greeter)

#10 Updated by intrigeri 2015-01-22 20:06:32

  • related to Bug #7323: Wheezy's GNOME crashes randomly after Greeter login added

#11 Updated by intrigeri 2015-01-22 22:11:30

  • It would be good if you could note the exact second when the “oh no!” message appears, to correlate it with other events (most likely, a timeout expiring).
  • Could I have a log of the processes running as uid 1000 (e.g. sampled every second) and another journal with all the options used above? Ideally both come from the same boot as for the previous bullet point.
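
One way to gather the per-second process log requested above, as a minimal sketch: log_user_processes is a hypothetical helper, and it assumes a ps that supports the POSIX -o and -u options.

```shell
#!/bin/sh
# log_user_processes UID COUNT (hypothetical helper): print a UTC timestamp
# followed by the processes of UID, once per second, COUNT times.
log_user_processes() {
    uid=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        date -u '+=== %Y-%m-%d %H:%M:%S UTC ==='
        # PID and full command line, without a header row
        ps -o pid= -o args= -u "$uid" || echo "(no processes found for uid $uid)"
        i=$((i + 1))
        sleep 1
    done
    return 0
}
```

For example, from a root shell on another VT, started just before logging in at the Greeter: log_user_processes 1000 120 > processes.log. The timestamps can then be lined up with the journal from the same boot.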

#12 Updated by intrigeri 2015-01-22 23:13:18

I’m both sad and happy to have reproduced this here:

  • with the same kvm command-line
  • with -cpu qemu32 (“QEMU Virtual CPU version 2.1.2”)
  • with -cpu qemu64 (“QEMU Virtual CPU version 2.1.2”)
  • with -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu qemu32 -smp 2,sockets=2,cores=1,threads=1
  • with -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu qemu64 -smp 2,sockets=2,cores=1,threads=1

This means that I can probably debug this myself. OTOH, I’d be happy to have the info I’ve asked Kill Your TV to gather :)

However, I could not reproduce this with -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,+invtsc,+invpcid,+erms,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+movbe,+pcid,+pdcm,+xtpr,+fma,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme (creds go to libvirt for generating these bits for me).

In all cases, I was running the test suite in parallel in another KVM guest; I have not tried without it yet. On failure, in most cases I briefly saw some of the desktop (e.g. desktop icons) appear with a black background before the “oh no!” screen.

#13 Updated by intrigeri 2015-01-23 11:10:52

Also, it would be useful to figure out where (in what .desktop file?) we can pass --debug to gnome-session. I guess that this flag would help us understand what exactly is going on.

#14 Updated by intrigeri 2015-01-23 12:15:12

intrigeri wrote:
> Also, it would be useful to figure out where (in what .desktop file?) we can pass --debug to gnome-session. I guess that this flag would help us understand what exactly is going on.

Appending --debug to the Exec and TryExec lines in /usr/share/xsessions/gnome-classic.desktop seems to produce the intended effect. Then, I see that the command run by gnome-shell-classic.desktop (that is, /usr/bin/gnome-shell) is exiting with code 1.
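
The edit described above could be scripted roughly as follows. This is a sketch only: add_debug_flag is a made-up name, and it prints the modified file to stdout instead of editing it in place, so the result can be reviewed before it replaces the original.

```shell
#!/bin/sh
# add_debug_flag FILE (hypothetical helper): print FILE with " --debug"
# appended to its Exec= and TryExec= lines; all other lines are unchanged.
add_debug_flag() {
    sed -e '/^Exec=/ s/$/ --debug/' \
        -e '/^TryExec=/ s/$/ --debug/' "$1"
}
```

For example: add_debug_flag /usr/share/xsessions/gnome-classic.desktop > /tmp/gnome-classic.desktop, then compare the two files and copy the new one over the original as root.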

#15 Updated by intrigeri 2015-01-23 12:52:05

An error message I see just after the one about GNOME Shell dying is LLVM ERROR: Do not know how to split the result of this operator. I don’t see it with the slightly different CPU configuration where I cannot reproduce the bug. This looks like https://freedesktop.org/patch/34445/, http://llvm.org/bugs/show_bug.cgi?id=15929 and https://bugs.launchpad.net/ubuntu/+source/llvm-toolchain-3.5/+bug/1360241.

#16 Updated by intrigeri 2015-01-23 13:38:36

Works:

  • -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge
  • -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu pentium3
  • -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu pentium2
  • -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu pentium

Buggy:

  • -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu qemu64,+invtsc,+invpcid,+erms,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+movbe,+pcid,+pdcm,+xtpr,+fma,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
  • -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu qemu64,+invtsc,+invpcid,+erms,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+movbe,+pcid,+pdcm,+xtpr,+fma,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,+sse2,+sse

The difference between -cpu pentium2 and -cpu qemu32 is:

  • pentium2 adds:
    • mca, mtrr, pse36 (which pentium lacks too, so probably irrelevant)
    • vme
  • qemu32 adds:
    • EXT_POPCNT
    • EXT_SSE3
    • SSE
    • SSE2

And then:

  • qemu32,+vme doesn’t work
  • qemu32,+vme,+mca,+mtrr,+pse36 doesn’t work either
  • qemu32,-sse,-sse2,-sse3 works fine

=> it looks like we’re hitting a bug in how llvmpipe behaves with some very specific combination of CPU features. On the one hand, it’s a shame that we fail with the default QEMU virtual CPU. On the other hand, it seems that no real hardware is affected, and when running in QEMU, one (starting with our own test suite) should specify a CPU that’s closer to common bare metal ones and/or to the host CPU.
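
A comparison like the pentium2 vs. qemu32 one above can also be done on the flags each guest actually reports. The following is a rough sketch, not the method used for the lists above: only_in_first is a hypothetical helper, and it assumes the flags are given as space-separated names, for instance copied from the flags line of /proc/cpuinfo in each guest (those names may differ from QEMU's own feature names).

```shell
#!/bin/sh
# only_in_first "FLAGS_A" "FLAGS_B" (hypothetical helper): print the
# space-separated flags present in FLAGS_A but not in FLAGS_B, one per line.
only_in_first() {
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || { rm -f "$list_a"; return 1; }
    # One flag per line, sorted in the C locale so that comm sees the
    # ordering it expects.
    printf '%s\n' "$1" | tr -s ' ' '\n' | LC_ALL=C sort -u > "$list_a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | LC_ALL=C sort -u > "$list_b"
    LC_ALL=C comm -23 "$list_a" "$list_b"   # lines unique to the first list
    rm -f "$list_a" "$list_b"
}
```

Calling it twice with the arguments swapped gives both halves of the difference, which narrows down the candidate flags to toggle with -cpu MODEL,+flag or -cpu MODEL,-flag.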

#17 Updated by Tails 2015-01-23 14:12:40

  • Status changed from Confirmed to In Progress

Applied in changeset commit:2983b2ec0702642af5f2e538727e73de5abcd004.

#18 Updated by intrigeri 2015-01-23 14:23:33

  • Priority changed from Normal to Elevated
  • % Done changed from 0 to 20
  • QA Check changed from Info Needed to Ready for QA

The aforementioned commit should fix this problem in our automated test suite. And commit:d0cdbce1a289919f4056ca32a60425bbd16c0ff5 documents it as a known issue, along with a workaround.

Kill Your TV, could you please confirm that this fixes the problem for you?

#19 Updated by kytv 2015-01-23 19:46:11

  • Assignee changed from kytv to intrigeri
  • QA Check changed from Ready for QA to Dev Needed

Unfortunately I still have this problem when running the test suite in a nested VM.

I reset the Level 1 guest (the one that I run the test suite in) to use the host CPU in its configs. The libvirt generated qemu command line:

qemu-system-x86_64 -enable-kvm -name TestSuite -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu Opteron_G4,+invtsc,+perfctr_nb,+perfctr_core,+topoext,+nodeid_msr,+lwp,+wdt,+skinit,+ibs,+osvw,+cr8legacy,+extapic,+cmp_legacy,+fxsr_opt,+mmxext,+osxsave,+monitor,+ht,+vme -m 10240 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid d7fb4a44-3cf7-4c4b-baa4-7c82172aa77b -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/TestSuite.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -drive file=/VMs/libvirt/images/testsuite.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=unsafe -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-1,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:5d:da:73,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on

The libvirt-generated command line for Level 2 (TailsToaster) looks like:

qemu-system-x86_64 -name TailsToaster -S -machine pc-0.15,accel=kvm,usb=off -cpu qemu64,+fma4,+xop,+3dnowprefetch,+misalignsse,+sse4a,+abm,+lahf_lm,+pdpe1gb,+hypervisor,+avx,+osxsave,+xsave,+aes,+popcnt,+x2apic,+sse4.2,+sse4.1,+ssse3,+pclmuldq -m 1280 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid f6c0e2d6-4260-43ef-b66a-9956124e2a23 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/TailsToaster.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -drive file=/home/kytv/tails/tails-i386-feature_jessie-1.3-20150120.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ac:dd:ee,bus=pci.0,addr=0x3 -chardev socket,id=charserial0,host=127.0.0.1,port=1337,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:1 -device qxl-vga,id=video0,ram_size=67108864,vram_size=9437184,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on

/proc/cpuinfo on the host:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD FX(tm)-6100 Six-Core Processor
stepping        : 2
microcode       : 0x600063d
cpu MHz         : 3300.000
cache size      : 2048 KB
physical id     : 0
siblings        : 6
core id         : 0
cpu cores       : 3
apicid          : 16
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips        : 6630.28
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

I can work on getting the debug info requested earlier if it’d still be of use.

#20 Updated by intrigeri 2015-01-23 21:42:48

> Unfortunately I still have this problem when running the test suite in a nested VM.

Too bad :(

I’m curious if you can still replicate this bug with the -machine and -cpu combinations that work for me => can you please try giving e.g. a pentium, pentium2 or pentium3 CPU to the level 2 VM (hopefully that can work on an AMD host), or any combination that works for me?

> I can work on getting the debug info requested earlier if it’d still be of use.

It would be good to know (using gnome-session --debug, as explained above) what part of the GNOME session fails to load, and especially whether you see the same LLVM error as me. If that’s the case, then it would be good to try and reproduce this bug using:

  • the same virtualization setup (same level 1 guest, and same domain configuration for the level 2 guest)
  • regular Debian Jessie
  • GNOME Shell in Classic mode

… and if it works fine, then retry with the same set of GNOME Shell extensions that we enable in feature/jessie.

#21 Updated by intrigeri 2015-01-23 22:16:05

  • Assignee changed from intrigeri to kytv
  • QA Check changed from Dev Needed to Info Needed

#22 Updated by kytv 2015-01-24 00:09:23

  • QA Check changed from Info Needed to Dev Needed

Some results on the level 2 VM:

  • -cpu SandyBridge yields a crash with the same LLVM message referenced above.
  • -cpu pentium3 loads the desktop normally with no crashes

#23 Updated by kytv 2015-01-24 00:09:52

  • QA Check changed from Dev Needed to Info Needed

#24 Updated by intrigeri 2015-01-24 08:29:18

It seems that Ubuntu had this llvmpipe bug, reverted to building mesa against llvm-3.4 for a while, and now ships a newer mesa built against llvm 3.5, just like Debian. So perhaps only the newer version of mesa works fine with llvm 3.5. We should therefore retry with:

  • an ISO that has that set of packages rebuilt with llvm-3.4;
  • an ISO that has the sid version of all binary packages we install that are built from the mesa source package.

And then we’ll have enough info to tell the Debian or upstream mesa folks that something’s wrong.

#25 Updated by intrigeri 2015-01-24 08:47:31

kytv wrote:
> Some results on the level 2 VM:
>
> * -cpu SandyBridge yields a crash with the same LLVM message referenced above.
> * -cpu pentium3 loads the desktop normally with no crashes

Can you please bisect that a bit and find a 64-bit Intel -cpu that works for you? (See kvm -cpu help for the full list.) Using it for our automated test suite would be a good enough stopgap measure until we have the llvmpipe bug fixed.

#26 Updated by kytv 2015-01-25 18:42:49

intrigeri wrote:

> Can you please bisect that a bit and find a 64-bit Intel -cpu that works for you?

Certainly! (It was already on my personal TODO list to satisfy my curiosity)

#27 Updated by kytv 2015-01-25 21:00:51

The command run for each of these: kvm -cpu $CPUTYPE -m 4096 -cdrom tails-i386-feature_jessie-1.3-20150120.iso

Working means “I log in at the Greeter and get to the desktop.”
Not working means “I log in at the Greeter and see the dreaded ‘Oh no!’ screen.”

None of these are working in the level 2 VM:

  • SandyBridge
  • core2duo
  • coreduo
  • Broadwell
  • Haswell
  • Westmere
  • Nehalem
  • Penryn
  • Conroe
  • n270

These are working:

  • host
  • kvm64
  • Opteron_G1
  • Opteron_G2
  • Opteron_G3
  • Opteron_G4
  • Opteron_G5
  • phenom

Interestingly, host as set in changeset 2983b2ec0702642af5f2e538727e73de5abcd004 did not work for me in the test suite (as I noted above at Bug #8778#note-19).
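
For anyone repeating the per-model test from this comment, the command lines can be generated rather than typed one by one. A minimal sketch follows; the helper names are made up, and the commands are only printed, since judging "working" vs. "not working" still means logging in at the Greeter by hand.

```shell
#!/bin/sh
# cpu_test_cmd CPUTYPE ISO (hypothetical helper): print the kvm command
# line used in this comment for one CPU model.
cpu_test_cmd() {
    printf 'kvm -cpu %s -m 4096 -cdrom %s\n' "$1" "$2"
}

# print_cpu_matrix ISO CPUTYPE... (hypothetical helper): print one command
# line per CPU model given after the ISO path.
print_cpu_matrix() {
    iso=$1
    shift
    for cputype in "$@"; do
        cpu_test_cmd "$cputype" "$iso"
    done
}
```

For example: print_cpu_matrix tails-i386-feature_jessie-1.3-20150120.iso host kvm64 SandyBridge core2duo. The list of model names accepted by the local QEMU comes from kvm -cpu help.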

#28 Updated by kytv 2015-01-26 02:24:11

  • Assignee deleted (kytv)
  • QA Check deleted (Info Needed)

FWIW, I tried SandyBridge and core2duo in a level 1 VM and they also crashed, so this doesn’t appear to be a nested VM problem.

#29 Updated by intrigeri 2015-01-26 10:45:22

  • Assignee set to kytv
  • QA Check set to Info Needed

kytv wrote:
> FWIW, I tried SandyBridge and core2duo in a level 1 VM and they also crashed, so this doesn’t appear to be a nested VM problem.

Would be good to know if these crashes are the same llvmpipe bug as I’ve seen (see Bug #8778#note-20).

#30 Updated by kytv 2015-01-26 15:37:10

intrigeri wrote:
> kytv wrote:
> > FWIW, I tried SandyBridge and core2duo in a level 1 VM and they also crashed, so this doesn’t appear to be a nested VM problem.
>
> Would be good to know if these crashes are the same llvmpipe bug as I’ve seen (see Bug #8778#note-20).

Seems to be the same. (This was with -cpu SandyBridge in a Level 1 VM on an AMD host).

Mon 2015-01-26 15:31:32.305172 UTC [s=8436e2e3d3f14492bea1592751a01c6a;i=687;b=09aad8b5a40a44bf9dfbc6dcf436d7b1;m=905294c;t=50d8fd538c514;x=4c87b186a0cdf245]
    PRIORITY=6
    _BOOT_ID=09aad8b5a40a44bf9dfbc6dcf436d7b1
    _MACHINE_ID=417a22869e1544f2ba02cf3fad947857
    _TRANSPORT=stdout
    _HOSTNAME=amnesia
    _CAP_EFFECTIVE=0
    _EXE=/usr/bin/gnome-session
    SYSLOG_IDENTIFIER=gnome-session
    _GID=1000
    _AUDIT_SESSION=2
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_SLICE=user-1000.slice
    _UID=1000
    _PID=2749
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-2.scope
    _SYSTEMD_SESSION=2
    _SYSTEMD_UNIT=session-2.scope
    _COMM=x-session-manag
    _CMDLINE=x-session-manager
    MESSAGE=LLVM ERROR: Do not know how to split the result of this operator!
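
To check quickly whether another failing boot hits the same error, the journal can be searched for the message above. A small sketch, where has_llvm_split_error is a made-up helper name that reads journal text from stdin:

```shell
#!/bin/sh
# has_llvm_split_error (hypothetical helper): succeed if the text on stdin
# contains the LLVM error message seen in this ticket, fail otherwise.
has_llvm_split_error() {
    grep -F -q 'LLVM ERROR: Do not know how to split the result of this operator'
}
```

For example: journalctl -a | has_llvm_split_error && echo "same llvmpipe failure". A non-match does not rule out a different crash in the GNOME session; it only says this particular message was not logged.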

#31 Updated by intrigeri 2015-01-26 19:42:43

>> Would be good to know if these crashes are the same llvmpipe bug as I’ve seen (see Bug #8778#note-20).

> Seems to be the same.

Thanks! Next best steps are then the ones described in Bug #8778#note-24, I think.
Unless someone has a better idea? (I’d love it :)

#32 Updated by Tails 2015-02-25 13:33:27

Applied in changeset commit:8544f14485289653b8dde9a29dc496f79e768866.

#33 Updated by BitingBird 2015-02-25 20:54:45

  • QA Check deleted (Info Needed)

#34 Updated by intrigeri 2015-02-26 08:59:15

  • Assignee deleted (kytv)

#35 Updated by intrigeri 2015-03-08 16:21:17

Also see https://bugs.debian.org/770130 and merged bug reports.

#36 Updated by intrigeri 2015-03-08 17:50:40

  • Category set to Hardware support
  • Assignee set to kytv
  • % Done changed from 20 to 30
  • QA Check set to Ready for QA

I’ve rebuilt the mesa source package with the patch from https://freedesktop.org/patch/34445/, and uploaded the result to our feature-jessie package. Going to build an ISO and test locally. Kill Your TV, could you please do the same and report back whether it fixes the bug for you?

#37 Updated by intrigeri 2015-03-08 19:56:26

  • % Done changed from 30 to 40

intrigeri wrote:
> Going to build an ISO and test locally.

It fixes the bug for me with qemu-system-x86_64 -enable-kvm -machine pc-i440fx-2.0,accel=kvm,usb=off -cdrom foo.iso -m 2048 -cpu qemu32 (and I could reproduce the bug again with the same command-line, and an older ISO built from feature/jessie).

#38 Updated by kytv 2015-03-08 22:41:39

  • % Done changed from 40 to 50

intrigeri wrote:
> I’ve rebuilt the mesa source package with the patch from https://freedesktop.org/patch/34445/

GREAT find!

So far, so good with the default kvm command line that I included in my original report. I’ll try with my nested VM set-up later—maybe I’ll finally be able to do the test suite stuff for feature/jessie. ;)

#39 Updated by kytv 2015-03-09 18:49:51

Initial findings:

These did not work before and still do not work:

  • SandyBridge
  • Broadwell
  • Haswell
  • Westmere
  • Nehalem
  • Penryn

These worked before and still work:

  • host
  • kvm64
  • Opteron_G1
  • Opteron_G2
  • Opteron_G3
  • Opteron_G4
  • Opteron_G5
  • phenom

These did not work before but work now:

  • core2duo
  • coreduo
  • Conroe
  • n270
  • qemu32

Logs will be forthcoming.

#40 Updated by intrigeri 2015-03-09 20:51:53

> Initial findings:

Thanks! Could you please send this information to the corresponding Debian bug report, by replying to the email I’ve Cc’ed you? Otherwise, I can do it, just tell me.

#41 Updated by kytv 2015-03-09 21:17:17

intrigeri wrote:
> > Initial findings:
>
> Thanks! Could you please send this information to the corresponding Debian bug report, by replying to the email I’ve Cc’ed you? Otherwise, I can do it, just tell me.

I did that ~5 minutes after posting it here :)

#42 Updated by kytv 2015-03-09 22:30:42

I’m assuming that the error logged will be the same for all of the configs that fail. With SandyBridge:

Mon 2015-03-09 22:02:18.006526 UTC [s=eb33466202884b30aef3339fee8ed7a2;i=748;b=298ef4f4f5c3475ebacdd57abc1aa2fa;m=da900d1;t=510e2300787fe;x=497f9a9f60f18c15]
    PRIORITY=6
    _BOOT_ID=298ef4f4f5c3475ebacdd57abc1aa2fa
    _MACHINE_ID=c559c5e56b134ab5a5acf6c74eba069f
    _TRANSPORT=stdout
    _HOSTNAME=amnesia
    _CAP_EFFECTIVE=0
    _EXE=/usr/bin/gnome-session
    SYSLOG_IDENTIFIER=gnome-session
    _GID=1000
    _AUDIT_SESSION=2
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_SLICE=user-1000.slice
    _UID=1000
    _PID=2728
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-2.scope
    _SYSTEMD_SESSION=2
    _SYSTEMD_UNIT=session-2.scope
    _COMM=x-session-manag
    _CMDLINE=x-session-manager
    MESSAGE=LLVM ERROR: Do not know how to split the result of this operator!

As mentioned on the Debian bug, my CPU doesn’t have support for all of the Sandy Bridge (and probably other CPU) features, but the failure should be more graceful, falling back to something that’s almost certainly going to work.

#43 Updated by intrigeri 2015-03-10 02:21:16

> As mentioned on the Debian bug, my CPU doesn’t have support for all of the Sandy
> Bridge (and probably other CPU) features, but the failure should be more graceful,
> falling back to something that’s almost certainly going to work.

I see what you mean. Arguably that’s a QEMU design problem, which may be hard to change now without breaking tons of stuff. Anyway: as far as the mesa bug is concerned, we should probably test only vcpus that can actually be fully emulated.

#44 Updated by kytv 2015-03-10 03:22:56

I meant that gnome-shell/mesa should just work. :) Failing to find a non-essential CPU flag (if that’s what’s happening) shouldn’t be able to halt the display of the desktop. I see this as gnome-shell giving up (ungracefully) when it doesn’t have to, as opposed to it being a qemu problem, especially since this same crash was seen on real, non-broken hardware.

(So many of the changes in recent GNOME versions bother me but that’s a discussion for another time—if ever).

I’m still unable to run the test suite in Jessie. To be investigated…

#45 Updated by kytv 2015-03-10 03:59:01

  • Assignee changed from kytv to intrigeri
  • % Done changed from 50 to 80
  • QA Check changed from Ready for QA to Pass

kytv wrote:

>
> I’m still unable to run the test suite in Jessie. To be investigated…

It looks like I finally can. When I first ran into this problem, first seen in the test suite, I set my lvl1 vm to “Host CPU”. That didn’t fix it but I kept that setting.

After removing it (in virt-manager, selecting Hypervisor Default) my lvl1 libvirt xml file changed thusly:


-  <cpu mode='custom' match='exact'>
-    <model fallback='allow'>Opteron_G4</model>
-    <vendor>AMD</vendor>
-    <feature policy='require' name='perfctr_core'/>
-    <feature policy='require' name='monitor'/>
-    <feature policy='require' name='skinit'/>
-    <feature policy='require' name='ibs'/>
-    <feature policy='require' name='mmxext'/>
-    <feature policy='require' name='osxsave'/>
-    <feature policy='require' name='vme'/>
-    <feature policy='require' name='topoext'/>
-    <feature policy='require' name='fxsr_opt'/>
-    <feature policy='require' name='cr8legacy'/>
-    <feature policy='require' name='ht'/>
-    <feature policy='require' name='wdt'/>
-    <feature policy='require' name='extapic'/>
-    <feature policy='require' name='osvw'/>
-    <feature policy='require' name='nodeid_msr'/>
-    <feature policy='require' name='perfctr_nb'/>
-    <feature policy='require' name='cmp_legacy'/>
-    <feature policy='require' name='lwp'/>
-    <feature policy='require' name='invtsc'/>
-  </cpu>

Now I can see the desktop in the test suite within my nested VM set-up.

After all that, I can say that the patched Mesa packages fixed my problem.
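
To see whether a libvirt domain still carries an explicit CPU definition like the one removed above, the cpu element can be pulled out of the domain XML. This is a rough, line-based sketch (cpu_block is a hypothetical helper, not an XML parser); it assumes the formatting libvirt normally produces, with the opening and closing cpu tags on their own lines.

```shell
#!/bin/sh
# cpu_block (hypothetical helper): print the <cpu> element of a libvirt
# domain XML document read from stdin; prints nothing if there is none.
cpu_block() {
    awk '
        /<cpu[ >]/             { inside = 1 }   # opening tag, with or without attributes
        inside                 { print }
        /<\/cpu>|<cpu[^>]*\/>/ { inside = 0 }   # closing tag or self-closing element
    '
}
```

For example: virsh dumpxml TestSuite | cpu_block. Empty output would correspond to the "Hypervisor Default" state described above.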

#46 Updated by intrigeri 2015-03-10 09:54:24

> I meant that gnome-shell/mesa should just work. :) Failing to find a non-essential CPU flag (if that’s what’s happening) shouldn’t be able to halt the display of the desktop.

Indeed, you’re right (note that there’s little that GNOME Shell can do about it if the underlying hardware drivers fail).

#47 Updated by intrigeri 2015-03-10 09:54:42

>> I’m still unable to run the test suite in Jessie. To be investigated…

> It looks like I finally can.

Woohoo!

#48 Updated by intrigeri 2015-05-16 09:35:18

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 80 to 100

Please open another ticket if the problem comes back.

#49 Updated by goupille 2016-02-09 15:11:48

  • related to Bug #11096: "Oh no!" / Xorg crash after logging in at the Greeter with Intel 855GM graphics added

#50 Updated by intrigeri 2016-04-29 13:38:04

  • related to deleted (Bug #11096: "Oh no!" / Xorg crash after logging in at the Greeter with Intel 855GM graphics)