Bug #16224

Black screen after the boot menu with Intel GPU (i915)

Added by goupille 2018-12-13 20:38:20 . Updated 2019-01-30 11:52:51 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Hardware support
Target version:
Start date:
2018-12-13
Due date:
% Done:
100%

Feature Branch:
Type of work:
End-user documentation
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Several users reported that since upgrading to 3.11, Tails no longer boots: it displays an empty black screen after the boot menu in normal mode, and in troubleshooting mode it ends with the following message:

Error starting GDM with your graphics card: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02).

Adding xorg-driver=intel to the startup options does not solve the issue.
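The reports in this ticket identify GPUs by the PCI vendor:device pair in brackets (8086 is Intel's PCI vendor ID), as printed for example by `lspci -nn`. Below is a minimal Python sketch for pulling that pair and the revision out of a pasted report line. It assumes the `[vvvv:dddd] (rev NN)` layout seen in the reports here; the names `parse_gpu_line` and `is_intel` are hypothetical, not part of any Tails tool.

```python
import re

# "[vvvv:dddd]" vendor:device pair, optionally followed by "(rev NN)".
# The PCI class code that `lspci -nn` also prints, e.g. "[0300]", has no
# colon and therefore does not match.
PCI_ID_RE = re.compile(
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
    r"(?:\s*\(rev\s*(?P<rev>[0-9a-f]{2})\))?",
    re.IGNORECASE,
)

INTEL_VENDOR_ID = "8086"


def parse_gpu_line(line):
    """Return (vendor, device, rev) in lowercase, or None if no ID is found.

    rev is None when the line has no "(rev NN)" suffix.
    """
    match = PCI_ID_RE.search(line)
    if match is None:
        return None
    rev = match.group("rev")
    return (
        match.group("vendor").lower(),
        match.group("device").lower(),
        rev.lower() if rev else None,
    )


def is_intel(line):
    """True if the line carries an Intel (vendor 8086) PCI ID."""
    parsed = parse_gpu_line(line)
    return parsed is not None and parsed[0] == INTEL_VENDOR_ID


if __name__ == "__main__":
    report = ("Intel Corporation Core Processor Integrated Graphics "
              "Controller [8086:0046] (rev 02)")
    print(parse_gpu_line(report))  # ('8086', '0046', '02')
```

Searching for the first bracketed pair that contains a colon is what lets the same function handle both raw `lspci -nn` output and the shortened forms users paste into reports, such as "(rev18)" without a space.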



Related issues

Related to Tails - Bug #16145: Upgrade Linux to 4.18.20 Resolved 2018-11-22
Related to Tails - Bug #16447: Gather information about regression on some Intel GPU (Braswell, Kaby Lake) In Progress 2019-02-08
Blocks Tails - Feature #15941: Core work 2018Q4 → 2019Q2: Technical writing Resolved 2018-09-11
Blocks Tails - Feature #15507: Core work 2019Q1: Foundations Team Resolved 2018-04-08
Blocked by Tails - Bug #16073: Upgrade Linux to 4.19 Resolved 2018-10-25

History

#1 Updated by goupille 2018-12-13 20:58:08

To be clearer, that’s the GPU on the ThinkPad X201.

#2 Updated by goupille 2018-12-13 22:34:14

Same issue with:

Intel HD Graphics [8086:0046] (rev 18)

(Intel Core i3 M380)

#3 Updated by goupille 2018-12-13 22:44:51

Two anonymous users reported the same problem (black screen) with the following card:

Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) [8086:2a02] (rev 0c)

#4 Updated by goupille 2018-12-13 23:08:40

It has also been reported in Debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914980

#5 Updated by goupille 2018-12-13 23:48:37

  • Subject changed from Black screen after the boot menu with Intel HD GPU first generation (Westmere) to Black screen after the boot menu with Intel GPU (i915)

#6 Updated by goupille 2018-12-14 11:12:58

  • Priority changed from Normal to Elevated

I set the priority to “elevated”, given the number of users reporting this.

#7 Updated by goupille 2018-12-14 12:01:37

  • Assignee changed from intrigeri to CyrilBrulebois

Adding this bug to 3.11’s known issues (with an anchor) could be a good thing for us (helpdesk)…

#8 Updated by emmapeel 2018-12-15 07:41:56

A user on XMPP reports that, going through the links in the Debian report, they found this patch:

https://patchwork.freedesktop.org/patch/265653/

The user volunteered to test Tails ISO images with the patch on their laptops Thinkpad X200, X200s, T500, T400s, X301, and T400. Tails 3.11 currently doesn’t work correctly on any of them.

#9 Updated by CyrilBrulebois 2018-12-15 13:55:42

I’ve just pushed a commit to the master branch adding this under “Known issues”. Feel free to adjust the wording if needed before calling for translations (if that isn’t done automatically):

kibi@armor:~/work/clients/tails/tails.git$ git show -- wiki/src/news/version_3.11.mdwn
commit 7523fcc7e35c002e2ebaf4b00660c5dd293d16f4
Author: Cyril Brulebois <ckb@riseup.net>
Date:   Sat Dec 15 14:48:36 2018 +0100

    Document Bug #16224 as a known issue.

    Requested-by: goupille (for frontdesk).

diff --git a/wiki/src/news/version_3.11.mdwn b/wiki/src/news/version_3.11.mdwn
index fd0aa3c806..112f3e3914 100644
--- a/wiki/src/news/version_3.11.mdwn
+++ b/wiki/src/news/version_3.11.mdwn
@@ -50,7 +50,11 @@ For more details, read our [[!tails_gitweb debian/changelog desc="changelog"]].

 # Known issues

-None specific to this release.
+- Tails may fail to start on some computers with Intel graphical
+  hardware: a regression in the i915 Linux kernel module can lead
+  to a black screen when trying to boot this Tails version
+  ([[!tails_ticket 16224]], [[!debbug 914980]]). Users may want
+  to delay upgrading until a solution has been identified.

 See the list of [[long-standing issues|support/known_issues]].

(This only shows the actual change in the MDWN file; PO files were updated as well in the same commit.)

#10 Updated by CyrilBrulebois 2018-12-15 13:57:11

I’ll check what happened in the upstream (mainline) and downstream (debian) kernels, and see whether I can build a patched kernel and then an ISO, that users could try. This might be material for an emergency release given the prominence of Intel GPUs…

#11 Updated by intrigeri 2018-12-15 18:12:01

  • related to Bug #16145: Upgrade Linux to 4.18.20 added

#12 Updated by CyrilBrulebois 2018-12-16 12:05:23

  • Assignee changed from CyrilBrulebois to anonym

No luck upstream, so I’ve tried to assess the situation on the Debian side, and came up with this suggestion: <https://bugs.debian.org/914980#50>

My patch against upstream had this commit message:

Revert "drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5"

This reverts commit 06e562e7f515292ea7721475950f23554214adde.

v4.18.20 regresses at least on gen4 as seen in these bug reports:
  https://bugs.freedesktop.org/108850
  https://bugs.freedesktop.org/108984
  https://bugs.debian.org/914980
  https://redmine.tails.boum.org/code/issues/16224

This patch landed in various drm-intel branches but hasn't found its way
to linux-4.18.y yet:
  https://patchwork.freedesktop.org/patch/265653/

Trying to apply it on top of v4.18.20 triggers several conflicts, so it
seems safer to just revert what seems to be the culprit, as confirmed by
a user reporting this revert fixes the problem for them, and by this
part of the commit message for the actual fix in drm-intel:

    commit 5179749925933575a67f9d8f16d0cc204f98a29f
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Tue Dec 4 14:15:16 2018 +0000

        drm/i915: Allocate a common scratch page
    […]
        Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
    […]

Signed-off-by: Cyril Brulebois <cyril@debamax.com>
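The culprit here was identified through the `Fixes:` trailer that Linux kernel patches conventionally carry, which names the abbreviated SHA (at least 12 hex digits) and the subject of the commit being repaired. Below is a sketch of matching such a trailer against a suspected commit. The helpers `fixes_trailers` and `culprit_of` are illustrative only, and the regular expression assumes the `Fixes: <sha> ("<subject>")` layout quoted above.

```python
import re

# Kernel trailer convention: Fixes: <sha, 12+ hex digits> ("<subject>")
FIXES_RE = re.compile(
    r'^\s*Fixes:\s*([0-9a-f]{12,40})\s+\("(.+?)"\)', re.MULTILINE
)


def fixes_trailers(message):
    """Return the (sha, subject) pairs named by Fixes: trailers."""
    return [(m.group(1), m.group(2)) for m in FIXES_RE.finditer(message)]


def culprit_of(fix_message, suspect_sha):
    """True if suspect_sha is the commit fix_message claims to fix.

    Both SHAs may be abbreviated; they are compared on their common
    prefix, which must be at least 12 hex digits long.
    """
    suspect = suspect_sha.lower()
    for sha, _subject in fixes_trailers(fix_message):
        n = min(len(sha), len(suspect))
        if n >= 12 and sha[:n] == suspect[:n]:
            return True
    return False


if __name__ == "__main__":
    fix = (
        "drm/i915: Allocate a common scratch page\n"
        "\n"
        '    Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after '
        'EMIT_INVALIDATE for gen4/gen5") # v4.18.20\n'
    )
    print(culprit_of(fix, "06e562e7f515292ea7721475950f23554214adde"))  # True
```

Requiring a 12-digit common prefix avoids false positives from very short abbreviations.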

As the person responsible for releasing 3.11 with this regression, I meant to prepare a fix against the Debian package and get a test ISO built with it, so that testers could report whether the patch was doing its job. The end goal was to find a short-term solution that would let us assess the feasibility of an emergency release.

Since then, I’ve learnt that 4.18.y is EOL, that the next upload to Debian is going to be 4.19-based anyway, and that the obvious way to deal with such an issue is to revert to the previous kernel, instead of building our own kernel…

Reassigning to anonym for input (as RM and as 4.18.20 merge submitter). I’m fine with doing the needed release work (if we end up doing an emergency release) once a solution has been found, but also fine with letting someone else handle it if that’s not desirable.

Meanwhile, I’ll be dealing with the last post-release (3.11) steps, which I had postponed for personal reasons; sorry for the breakage in Jenkins etc. in the meantime (Bug #16226).

#13 Updated by intrigeri 2018-12-17 10:27:33

  • Category set to Hardware support
  • Assignee changed from anonym to intrigeri
  • Priority changed from Elevated to High
  • Target version set to Tails_3.12

Dear kibi,

> As the person responsible for releasing 3.11 with this regression,

I greatly appreciate the work you’ve put into this: it feels very good to see that there are people around to tackle such issues when I’m away! Thank you :)

I think I understand why you’ve felt personally responsible:

  • You were the one who pushed the big red “release” button.
  • In other projects (e.g. Debian), dealing with such post-release fallout is on the release managers’ plate, be it formally or de facto.

Now, I’d like to provide another perspective:

  • I’ve started preparing a timeline for a blameless postmortem process and tl;dr: we’ve been very unlucky and nobody did anything very wrong. Whoever pushed the big red “release” button is a mere detail in a long series of unfortunate events, coincidences, small mistakes, and missing info/communication that led to releasing with this regression.
  • In Tails, the RM is not responsible for the actual code that’s in the release nor for such regressions. Dealing with such fallout is the FT’s job. In this specific case, well, perhaps you were the only active FT person last week (I, for one, was not) so it did not make a big difference in practice. But I think it’s worth clarifying that you did not have to handle this with your RM hat on: it’s great that you did it (and please report this work as part of your FT work!) but that’s not part of the expectations.

> […] and that the obvious way to deal with such an issue is to revert to the previous kernel

First I’ll quickly try to find a workaround we could document so that affected users can use Tails 3.11 and we don’t have to put out an emergency release. That would be ideal but I’m not holding my breath. Help desk, if anyone has already done this work, please tell me what is known not to work.

Then if I find no workaround, I’ll investigate the possibility of downgrading the kernel: on Saturday I only took a very quick look at the CVEs fixed by the kernel upgrade brought by Tails 3.11 and at first glance it did not seem too unreasonable to downgrade; but I need to look a bit closer before concluding that this is a viable option. If it turns out it’s not, then building our own kernel might be the best option on the table (upgrading to 4.19 should be considered too though).

> I’m fine with doing the needed release work (if we end up doing an emergency release) once a solution has been found, but also fine with letting someone else handle it

Excellent :)

Cheers!

#14 Updated by intrigeri 2018-12-17 10:35:39

#15 Updated by intrigeri 2018-12-17 11:14:14

  • Type of work changed from Research to End-user documentation

> First I’ll quickly try to find a workaround we could document so that affected users can use Tails 3.11

I could reproduce this bug on an X200 (8086:2a42 rev 07, for which we don’t force the intel X.Org driver in config/chroot_local-includes/usr/share/live/config/xserver-xorg/intel.ids).

Then I’ve tested some workarounds:

  • modprobe.blacklist=i915: OK (native resolution, vesa driver)
  • nomodeset: OK (native resolution, vesa driver)
  • nofb: crash in early boot
  • modprobe.blacklist=i915 xorg-driver=intel: GDM fails to start
  • nomodeset xorg-driver=intel: GDM fails to start
  • nofb xorg-driver=intel: crash in early boot
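Each workaround above is a set of extra boot options appended to the kernel command line. Below is a simplified Python sketch that splits such a string into options and checks whether it keeps i915 from doing kernel modesetting, either globally (`nomodeset`) or by blacklisting the module. It ignores quoted values containing spaces. The names `parse_cmdline`, `disables_i915_kms` and `X200_RESULTS` are hypothetical; the outcomes are copied from the X200 tests listed above.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Bare flags such as "nomodeset" map to None. Simplification: quoted
    values containing spaces are not handled.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def disables_i915_kms(options):
    """True if the options keep i915 from doing kernel modesetting:
    either KMS is disabled globally or the module is blacklisted."""
    blacklisted = (options.get("modprobe.blacklist") or "").split(",")
    return "nomodeset" in options or "i915" in blacklisted


# Workarounds tested on the X200, with the reported outcome.
X200_RESULTS = {
    "modprobe.blacklist=i915": "OK",
    "nomodeset": "OK",
    "nofb": "crash in early boot",
    "modprobe.blacklist=i915 xorg-driver=intel": "GDM fails to start",
    "nomodeset xorg-driver=intel": "GDM fails to start",
    "nofb xorg-driver=intel": "crash in early boot",
}

if __name__ == "__main__":
    for extra, outcome in X200_RESULTS.items():
        opts = parse_cmdline(extra)
        print(f"{extra!r}: i915 KMS off={disables_i915_kms(opts)}, {outcome}")
```

Reading the output side by side, both "OK" results have i915 KMS off and no forced intel driver, while `nofb` leaves KMS on.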

So I’ll document the workaround that’s easiest to type: nomodeset.

Help desk, please ask affected users to add the nomodeset option in the boot menu and report back if that’s enough to fix their problem. I expect that on some hardware, Tails won’t work as well as usual but I hope it’ll at least start and fulfil basic needs; expected issues: sluggish graphics performance (in particular with high screen resolutions), smaller resolution than the native one.

Unless we get reports that this workaround is not sufficient on a broad set of hardware, that’ll be good enough and we don’t need to put out an emergency release (which is good because all our options have issues: either kernel security regressions, or non-trivial initial dev costs + increased maintenance costs, or big risk of introducing other regressions).

#16 Updated by intrigeri 2018-12-17 11:37:05

  • Status changed from Confirmed to In Progress

Applied in changeset commit:tails|4a9f556290d928fbbbec923f28d7860b26ea481f.

#17 Updated by intrigeri 2018-12-17 11:44:11

  • Assignee changed from intrigeri to sajolida
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA

Hi sajolida! I’ve documented the workaround, trying to stick to the style we use in similar text elsewhere. Please review the 2 commits listed in “Associated revisions” above, that I’ve pushed straight to master given the pretty bad impact and scope of this regression. Thanks in advance!

(BTW, somewhat off-topic: https://tails.boum.org/doc/first_steps/startup_options/#boot_menu does not tell that in the Boot Loader Menu, the keyboard layout is US QWERTY. I suspect it’ll make it hard for many users to add the options we document here and there. If you agree, happy to check if we already have a ticket about that and file one if not. I guess that we could include a picture of a US QWERTY keyboard layout on that page.)

#18 Updated by goupille 2018-12-18 10:52:14

The workaround doesn’t solve the issue with the Ironlake-Arrandale GPU

Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02)

#19 Updated by mercedes508 2018-12-18 12:20:00

goupille wrote:
> the workaround doesn’t solve the issue with the Ironlake-Arrandale GPU
>
> […]

Received at least 4 bug reports today confirming this :)

#20 Updated by sajolida 2018-12-18 18:01:32

  • blocks Feature #15941: Core work 2018Q4 → 2019Q2: Technical writing added

#21 Updated by sajolida 2018-12-18 18:09:29

  • Assignee changed from sajolida to intrigeri
  • QA Check changed from Ready for QA to Info Needed

Both revisions look really fine!

#22 Updated by intrigeri 2018-12-19 08:01:33

  • Assignee changed from intrigeri to mercedes508

> goupille wrote:
>> the workaround doesn’t solve the issue with the Ironlake-Arrandale GPU
>>
>> […]

> Received at least 4 bug reports today confirming this :)

Are they really all on Ironlake-Arrandale?

#23 Updated by intrigeri 2018-12-19 08:07:13

intrigeri wrote:
> > goupille wrote:
> >> the workaround doesn’t solve the issue with the Ironlake-Arrandale GPU
> >>
> >> […]
>
> > Received at least 4 bug reports today confirming this :)
>
> Are they really all on Ironlake-Arrandale?

Apparently not: I’ve been forwarded a report that the workaround does not work on 8086:0046 (rev 02) either.

Help desk, please give us aggregated data: ideally the list of affected GPUs, and at least the subset of those where the workaround is reported not to work. Thanks!

#24 Updated by mercedes508 2018-12-19 10:08:51

intrigeri wrote:
> intrigeri wrote:
> > > goupille wrote:
> > >> the workaround doesn’t solve the issue with the Ironlake-Arrandale GPU
> > >>
> > >> […]
> >
> > > Received at least 4 bug reports today confirming this :)
> >
> > Are they really all on Ironlake-Arrandale?
>
> Apparently not: I’ve been forwarded a report that the workaround does not work on 8086:0046 (rev 02) either.
>
> Help desk, please give us aggregated data: ideally the list of affected GPUs, and at least the subset of those where the workaround is reported not to work. Thanks!

Hey, well, the 4 reports from yesterday are all for 8086:0046 (rev 02), which is the one described by goupille in comment #18 as well, or am I missing something?

#25 Updated by intrigeri 2018-12-19 10:14:42

> Hey, well the 4 reports from yesterday are all for 8086:0046 (rev 02) which is the one described by goupille in comment #18 as well or am I missing something?

Thanks for the clarification :)

If any other GPU is affected, please let us know.

#26 Updated by intrigeri 2018-12-19 10:41:07

For GPUs where nomodeset is not enough, try: nomodeset xorg-driver=vesa (we’re forcing the intel driver there and that may not work with nomodeset).
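The difference between the X200 (8086:2a42) and the 8086:0046 GPUs can be modelled as a small decision function: an explicit `xorg-driver=` option wins, otherwise IDs listed in intel.ids get the intel driver, otherwise X.Org autodetects. This is a hypothetical sketch, not the live-config or Tails code. It assumes that the intel X.Org driver cannot start once kernel modesetting is disabled (consistent with the "GDM fails to start" results on the X200) and that 8086:0046 is in the forced list, as the comment above implies.

```python
# Hypothetical model, NOT the actual live-config/Tails logic.
# Assumptions: an explicit xorg-driver= option always wins; IDs listed in
# intel.ids get the intel X.Org driver; the intel driver cannot start
# without kernel modesetting (KMS); and the i915 regression causes a black
# screen whenever KMS is left enabled on an affected GPU.


def kms_disabled(options):
    blacklisted = (options.get("modprobe.blacklist") or "").split(",")
    return "nomodeset" in options or "i915" in blacklisted


def select_xorg_driver(pci_id, options, forced_intel_ids):
    """Return the X.Org driver to force, or None to let X.Org autodetect."""
    override = options.get("xorg-driver")
    if override:
        return override
    if pci_id in forced_intel_ids:
        return "intel"
    return None  # assumed: autodetection ends up on vesa when KMS is off


def workaround_expected_to_work(pci_id, options, forced_intel_ids):
    """Predict whether an affected GPU reaches GDM with these boot options."""
    if not kms_disabled(options):
        return False  # assumed: the i915 KMS path hits the regression
    return select_xorg_driver(pci_id, options, forced_intel_ids) != "intel"


if __name__ == "__main__":
    forced = {"8086:0046"}  # illustrative, as implied by the comment above
    for pci_id, extra in [
        ("8086:2a42", {"nomodeset": None}),
        ("8086:0046", {"nomodeset": None}),
        ("8086:0046", {"nomodeset": None, "xorg-driver": "vesa"}),
    ]:
        print(pci_id, extra, workaround_expected_to_work(pci_id, extra, forced))
```

Under these assumptions, plain `nomodeset` is enough only when nothing forces the intel driver, which is why `xorg-driver=vesa` is needed on top of it for IDs in the forced list.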

#27 Updated by mercedes508 2018-12-19 11:42:42

Some basic stats from the last 3 days of bug reports:

  • [8086:0046] (rev 02): 7 reports, nomodeset doesn’t work
  • Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07): 1 report, nomodeset works
  • [8086:0046] (rev 12): 4 reports, nomodeset doesn’t work
  • [8086:0046] (rev 18): 1 report, nomodeset doesn’t work
  • [8086:2a42] (rev 07): 1 report, nomodeset works
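The help desk figures above can be cross-checked mechanically. Below is a sketch that tallies them, using only the numbers listed above; the data layout and the function names are assumptions made for illustration. It shows that all 12 reports where nomodeset alone failed are on device 8086:0046, across three revisions.

```python
import re
from collections import Counter

# (GPU as reported, number of reports, did "nomodeset" alone work?)
# Figures copied from the help desk summary above.
REPORTS = [
    ("[8086:0046] (rev 02)", 7, False),
    ("Intel Corporation Mobile 4 Series Chipset Integrated Graphics "
     "Controller (rev 07)", 1, True),
    ("[8086:0046] (rev 12)", 4, False),
    ("[8086:0046] (rev 18)", 1, False),
    ("[8086:2a42] (rev 07)", 1, True),
]

DEVICE_RE = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]", re.IGNORECASE)


def tally(reports):
    """Total number of reports where nomodeset worked vs. did not."""
    totals = Counter()
    for _gpu, count, works in reports:
        totals["works" if works else "fails"] += count
    return dict(totals)


def failures_by_device(reports):
    """Failed-workaround report counts, keyed by vendor:device ID."""
    counts = Counter()
    for gpu, count, works in reports:
        match = DEVICE_RE.search(gpu)
        if match and not works:
            counts[match.group(1).lower()] += count
    return dict(counts)


if __name__ == "__main__":
    print(tally(REPORTS))               # {'fails': 12, 'works': 2}
    print(failures_by_device(REPORTS))  # {'8086:0046': 12}
```

Grouping by vendor:device rather than by full report line is what makes the pattern visible: the revision differs between reports, but the device ID does not.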

#28 Updated by CyrilBrulebois 2018-12-19 12:48:11

In Bug #16226#note-15 we were wondering whether staying at/downgrading to 3.10.1 is documented as a workaround for this bug; if it is, we should keep the image around; otherwise we should delete the relevant files.

#29 Updated by mercedes508 2018-12-20 12:15:06

  • Assignee changed from mercedes508 to intrigeri

intrigeri wrote:
> For GPUs where nomodeset is not enough, try: nomodeset xorg-driver=vesa (we’re forcing the intel driver there and that may not work with nomodeset).

Just got the first 2 positive reports for this workaround on [8086:0046] (rev 02). Will let you know later if I get more.

#30 Updated by intrigeri 2018-12-21 07:29:47

  • Assignee changed from intrigeri to mercedes508

> intrigeri wrote:
>> For GPUs where nomodeset is not enough, try: nomodeset xorg-driver=vesa (we’re forcing the intel driver there and that may not work with nomodeset).

> Just got a first positive report for this workaround on [8086:0046] (rev 2).

Thanks, documented!

#31 Updated by mercedes508 2018-12-21 10:32:34

OK, so today:

  • [8086:0046] (rev 02): nomodeset xorg-driver=vesa works (5 reports)
  • [8086:0046] (rev 12): nomodeset xorg-driver=vesa doesn’t work (1 report)

#32 Updated by mercedes508 2018-12-30 11:35:47

OK, so it basically works for everyone now: I didn’t receive any reports about nomodeset xorg-driver=vesa not working, even though people complain a bit about the graphics quality.

#33 Updated by intrigeri 2018-12-30 12:31:39

  • Assignee changed from mercedes508 to intrigeri
  • QA Check changed from Info Needed to Ready for QA

OK, issue mitigated then. Great! :) Next step: test on a build from the devel branch to confirm that the problem is indeed solved there (without the workarounds).

#34 Updated by intrigeri 2018-12-30 12:31:51

#35 Updated by intrigeri 2018-12-30 12:31:56

  • blocked by deleted (Feature #15506: Core work 2018Q4: Foundations Team)

#36 Updated by intrigeri 2019-01-02 05:10:06

  • blocked by Bug #16073: Upgrade Linux to 4.19 added

#37 Updated by intrigeri 2019-01-02 09:03:33

  • Priority changed from High to Elevated

#38 Updated by anonym 2019-01-07 11:18:19

I think we need a blameless postmortem analysis for this issue. It might be as simple as this: whoever did the “bare metal” manual tests didn’t do a thorough enough job to catch this serious problem. Of course, whoever did them followed our current instructions, so it is not their fault; rather, our manual tests are clearly insufficient.

We need somewhat more rigorous hardware testing when bumping kernels (which should also be done during the merge request’s QA, not only release QA, since that’s a bit late), for example a list of very common hardware to test, and asking tails-testers@ for help. Modern Intel GPUs naturally belong on that list, considering that most Intel systems use them. Which reminds me that we should probably test on AMD hardware too, since we developers mostly (only?) use Intel hardware so far. And so on.

#39 Updated by intrigeri 2019-01-08 10:02:44

> I think we need a blameless postmortem analysis for this issue.

Yes! Feel free to initiate it somewhere (I’d rather privately). A good way to start is to cooperatively build a timeline of facts.

#40 Updated by intrigeri 2019-01-14 19:11:27

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

I confirm this is fixed on devel (ThinkPad X200) since Bug #16073 was merged.

#41 Updated by anonym 2019-01-30 11:52:51

  • Status changed from Fix committed to Resolved

#42 Updated by mercedes508 2019-02-08 13:26:13

  • related to Bug #16447: Gather information about regression on some Intel GPU (Braswell, Kaby Lake) added