Bug #11583

UEFI boot tests fail on Jenkins

Added by intrigeri 2016-07-21 03:16:16. Updated 2017-05-23 09:02:54.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Test suite
Target version:
Start date:
2016-07-21
Due date:
% Done:

100%

Feature Branch:
test/11583-uefi-boot-is-fragile-stretch
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Once Bug #10720 is worked around, “Booting Tails from a USB drive in UEFI mode” always fails with a black screen.


Subtasks


Related issues

Related to Tails - Bug #12141: UEFI boot on QEMU is broken since 2.10~rc1 Resolved 2017-01-13
Blocked by Tails - Bug #11588: Sometimes fails to boot from USB on Jenkins with I/O errors Resolved 2016-07-22

History

#1 Updated by intrigeri 2016-07-21 03:18:29

  • Feature Branch set to test/11583-uefi-boot-is-fragile

Flagged as fragile.

#2 Updated by intrigeri 2016-07-21 10:31:37

  • Assignee set to intrigeri
  • Target version set to Tails_2.6

Random idea: check if AppArmor blocks access to the OVMF firmware.

#3 Updated by intrigeri 2016-07-28 05:15:10

intrigeri wrote:
> Random idea: check if AppArmor blocks access to the OVMF firmware.

It doesn’t.

#4 Updated by bertagaz 2016-07-28 06:55:31

From what I saw while testing Bug #10777, the firmware does not seem very reliable. It sometimes fails to boot, with different symptoms (black screen, freezes on the boot device list screen, …). I’ll report in more depth later. The OVMF documentation says it can have trouble booting when KVM is used with recent QEMU versions. Using the -no-kvm option is said to help (that’s a “may”).

The Stretch OVMF package is much more up to date, so I’m testing with it at home, and I installed it on isotester6 as a job with the UEFI scenario was involved. There’s also a qemu-efi package in Stretch, with the edk2 EFI bootloader. It could be another candidate if OVMF is not reliable enough.

#5 Updated by intrigeri 2016-07-28 08:06:11

Thanks for looking into this!

> The Stretch OVMF package is much more up to date, so I’m testing with it at home, and I installed it on isotester6 as a job with the UEFI scenario was involved.

OK, let’s try this indeed!

However: let’s please not do any such thing without encoding it in Puppet, especially when no ticket is tracking the clean-up step: I’d like our Puppet recipes to remain an accurate description of the current state of our systems.

> There’s also a qemu-efi package in Stretch, with the edk2 EFI bootloader. It could be another candidate if OVMF is not reliable enough.

Great! Seems worth a try. Note that it’s built from the ovmf source package as well, and the package description doesn’t make it very clear how this firmware differs from the one shipped in the ovmf binary package (presumably it has fewer features, e.g. no Secure Boot support).

Another debugging step I want to take is to verify whether we are experiencing a mere display issue, or a more serious problem with that UEFI firmware (e.g. I would drop anything that expects to see the boot menu, let it time out and boot, and see if Tails boots as a result).

(And here again, I find it strange that this problem happens on Jenkins, while I’ve never seen it elsewhere.)

#6 Updated by intrigeri 2016-07-28 08:45:25

  • blocked by Bug #11588: Sometimes fails to boot from USB on Jenkins with I/O errors added

#7 Updated by bertagaz 2016-07-28 08:56:56

intrigeri wrote:
> > The Stretch OVMF package is much more up to date, so I’m testing with it at home, and I installed it on isotester6 as a job with the UEFI scenario was involved.
>
> OK, let’s try this indeed!

Got some errors at home; I have not yet checked whether they are the same as with the previous OVMF version. Will report later too.

> However: let’s please not do any such thing without encoding it in Puppet, especially when no ticket is tracking the clean-up step: I’d like our Puppet recipes to remain an accurate description of the current state of our systems.

Yeah, that’s bad I know. I wanted to give it a quick try, and thought I would note this by-hand change on this ticket.

> > There’s also a qemu-efi package in Stretch, with the edk2 EFI bootloader. It could be another candidate if OVMF is not reliable enough.
>
> Great! Seems worth a try. Note that it’s built from the ovmf source package as well, and the package description doesn’t make it very clear how this firmware differs from the one shipped in the ovmf binary package (presumably it has fewer features, e.g. no Secure Boot support).

From what I understood from the upstream sources, both firmwares share the same repository. I’m not sure either what the difference is; maybe just the integrated features and compile options.

> Another debugging step I want to take is to verify whether we are experiencing a mere display issue, or a more serious problem with that UEFI firmware (e.g. I would drop anything that expects to see the boot menu, let it time out and boot, and see if Tails boots as a result).

I can do that at home once my 50 runs of two scenarios are over.

> (And here again, I find it strange that this problem happens on Jenkins, while I’ve never seen it elsewhere.)

I’ve seen it several times at home, along with other weird behaviors of this bootloader. More on that soon.

#8 Updated by intrigeri 2016-07-29 07:34:13

bertagaz wrote:
> Yeah, that’s bad I know.

Then I’ve just reverted it. (I have zero faith in leaving a note on a ticket as a good reminder for such things — especially such a ticket that might not get fixed quickly, and then comments may accumulate, and that note may easily be lost deeeep in there; besides, adding a pinning entry with Puppet is cheap.)

#9 Updated by bertagaz 2016-07-29 09:21:23

So, I’ve run these two scenarios while testing Bug #10777:

  Scenario: Legacy boot
    Given I have started Tails without network from a USB drive without a persistent partition \
      and stopped at Tails Greeter's login screen
    And I log in to a new session
    Then Tails is running from USB drive "__internal"

  Scenario: UEFI boot
    Given I have started Tails without network from a USB drive without a persistent partition \
      and stopped at Tails Greeter's login screen
    Then I power off the computer
    Given the computer is set to boot in UEFI mode
    When I start Tails from USB drive "__internal" with network unplugged and I login
    Then Tails is running from USB drive "__internal"
    And Tails has started in UEFI mode

For the UEFI scenario, I ran the above scenarios 50 times with the Jessie OVMF, which resulted in:

  • 4 failures of type: After leaving the bootloader setup screen, UEFI never goes on to boot syslinux; it keeps displaying the device probing list, and gets killed by a timeout.
  • 3 failures of type: Gets to the kernel command line and types the additional boot options, then doesn’t seem to hit Enter or anything, and freezes. After 10 minutes, the timeout triggers and the VM is rebooted. It then gets to the syslinux screen, but the kernel command line doesn’t seem to be opened, and Tails starts after the 3-second syslinux timeout.
  • 1 failure of type: Shows the Tianocore logo screen, then switches to a black screen until it reaches a timeout.
  • 1 failure of type: Seems to freeze on the Tianocore logo screen, but finally goes on, passes the bootloader setup menu, and starts booting from the devices. It probes the usual two first, but then ends up in the UEFI shell.

I’ve also done a 50-run series of the above scenarios with OVMF from Stretch, which resulted in:

  • 2 failures of type: Shows the Tianocore logo, passes the bootloader setup menu, then displays a non-blinking cursor on a black screen until it reaches a timeout.
  • 1 failure of type: Gets as far as typing the options on the kernel command line, sits 5 minutes on this screen, then shows a kernel panic with the message “Initramfs unpacking failed: XZ-compressed data is corrupt”
  • 1 failure of type: Seems to freeze on the Tianocore logo screen, but finally goes on booting from the devices. It boots the usual two first, but then ends up in the UEFI shell.
  • 1 failure of type: Freezes on the Tianocore logo screen and gets killed by the timeout.

So there’s no big improvement with the more recent OVMF, it seems; there are still some bugs. There are some patterns in these failures.
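To put the two series side by side, here is a minimal sketch of the arithmetic. It assumes that every failure listed above is one distinct run out of 50; the dictionary keys and the `failure_rate` helper are names made up for this illustration.

```python
# Failure counts per failure type, copied from the two 50-run series above.
# Assumption: each counted failure is one distinct run.
RUNS = 50
failures = {
    "jessie": [4, 3, 1, 1],
    "stretch": [2, 1, 1, 1],
}


def failure_rate(counts, runs=RUNS):
    """Return the fraction of runs that failed."""
    return sum(counts) / runs


for name, counts in failures.items():
    print(f"{name}: {sum(counts)}/{RUNS} failed ({failure_rate(counts):.0%})")
```

With so few failures in only 50 runs per series, the gap between the two totals is small enough that it may well be noise.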

Tried once with the qemu-efi firmware, but it didn’t boot at all and only showed a black screen indefinitely. Maybe I’ve misconfigured it.

Note that Plymouth is broken in UEFI mode, so a possible short-term workaround could be to wait for the “Loading, please wait…” message to be displayed and, thanks to Bug #10777, reboot if it has not appeared after a certain time.
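The workaround idea could look roughly like the sketch below. This is not the test suite’s code: `start_vm`, `message_visible` and `reset_vm` are hypothetical callbacks standing in for whatever starts the VM, checks the screen for the “Loading, please wait…” message and resets the VM, and the timeout and attempt values are arbitrary.

```python
import time


def boot_with_retry(start_vm, message_visible, reset_vm,
                    timeout=120.0, max_attempts=3, poll=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Boot the VM, resetting it whenever the expected message does not
    show up within `timeout` seconds. Returns the attempt number that
    succeeded, or raises TimeoutError after `max_attempts` attempts."""
    for attempt in range(1, max_attempts + 1):
        if attempt == 1:
            start_vm()
        else:
            reset_vm()
        deadline = clock() + timeout
        while clock() < deadline:
            if message_visible():
                return attempt
            sleep(poll)
    raise TimeoutError(f"boot message not seen after {max_attempts} attempts")
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting. Note that such a retry only hides the firmware flakiness, it does not explain it.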

The OVMF documentation says it’s possible to save debug logs from the firmware on the host. I have a patch that sets up the necessary QEMU options to dump them and saves them on failure along with the other artifacts. But the Debian package uses the “-b RELEASE” compile option, which deactivates them. I’ll build a Debian package with debugging enabled to gather these stats from my runs at home. This could be useful if we ever want to report this upstream or ask them for help.
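For reference, the OVMF README documents capturing the firmware’s debug output through QEMU’s debug console on I/O port 0x402. The sketch below only builds those command-line arguments. It is not the patch mentioned above, the `ovmf_debug_args` helper and the log path are made up for illustration, and a setup that drives QEMU through libvirt would need to pass the options differently.

```python
def ovmf_debug_args(log_path):
    """Return the QEMU arguments that send OVMF debug output to a file.

    These only produce useful logs with a firmware built with debugging
    enabled (i.e. not a RELEASE build).
    """
    return [
        "-debugcon", f"file:{log_path}",
        "-global", "isa-debugcon.iobase=0x402",
    ]


# Hypothetical log location, chosen for this example only.
print(" ".join(ovmf_debug_args("/tmp/ovmf-debug.log")))
```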

#10 Updated by intrigeri 2016-07-30 03:28:25

Note: the topic branch has test/11588-usb-on-jenkins merged in, so it’s affected by any improvement or regression documented on Bug #11588 (e.g. currently: “crashes during memory erasure on shutdown, but with Bug #10733 merged on top it seems to be fine”; once I’ve confirmed that Bug #10733 fixes that later today, the fix will flow into the topic branch for this ticket as well).

#11 Updated by intrigeri 2016-07-30 03:35:25

  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_2.6)
  • Feature Branch changed from test/11583-uefi-boot-is-fragile to wip/test/11583-uefi-boot-is-fragile

bertagaz wrote:
> * 3 failures of type: Gets to the kernel command line and types the additional boot options, then doesn’t seem to hit Enter or anything, and freezes. After 10 minutes, the timeout triggers and the VM is rebooted. It then gets to the syslinux screen, but the kernel command line doesn’t seem to be opened, and Tails starts after the 3-second syslinux timeout.
> * 1 failure of type: Gets as far as typing the options on the kernel command line, sits 5 minutes on this screen, then shows a kernel panic with the message “Initramfs unpacking failed: XZ-compressed data is corrupt”

These failure modes look suspiciously like the ones described on Bug #11588. I suspect that a number of the other failures you’ve seen also share the same root cause. IMO we should fix Bug #11588 before we try to make any sense of this very ticket: as long as we have fragile USB mass storage device emulation, we can’t possibly have robust UEFI boot off USB, and we have no way to know for sure if the problems we see are specific to UEFI or not. This is just a single scenario, that passes reliably enough for 2 of our usual RMs (so is not a problem at release time) => let’s not spend too much time on it now, we have much higher-impact places to work on in this test suite.

#12 Updated by bertagaz 2016-07-30 04:44:59

  • Assignee set to intrigeri
  • Target version set to Tails_2.6

intrigeri wrote:
> bertagaz wrote:
> > * 3 failures of type: Gets to the kernel command line and types the additional boot options, then doesn’t seem to hit Enter or anything, and freezes. After 10 minutes, the timeout triggers and the VM is rebooted. It then gets to the syslinux screen, but the kernel command line doesn’t seem to be opened, and Tails starts after the 3-second syslinux timeout.
> > * 1 failure of type: Gets as far as typing the options on the kernel command line, sits 5 minutes on this screen, then shows a kernel panic with the message “Initramfs unpacking failed: XZ-compressed data is corrupt”
>
> These failure modes look suspiciously like the ones described on Bug #11588. I suspect that a number of the other failures you’ve seen also share the same root cause. IMO we should fix Bug #11588 before we try to make any sense of this very ticket: as long as we have fragile USB mass storage device emulation, we can’t possibly have robust UEFI boot off USB, and we have no way to know for sure if the problems we see are specific to UEFI or not. This is just a single scenario, that passes reliably enough for 2 of our usual RMs (so is not a problem at release time) => let’s not spend too much time on it now, we have much higher-impact places to work on in this test suite.

Interesting. Your reasoning makes sense, let’s see what results you get with Bug #11588.

#13 Updated by intrigeri 2016-07-30 06:00:03

  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_2.6)

#14 Updated by intrigeri 2016-10-29 13:39:13

  • Assignee set to intrigeri
  • Target version set to Tails_2.7
  • Feature Branch changed from wip/test/11583-uefi-boot-is-fragile to test/11583-uefi-boot-is-fragile

Refreshed the branch and added it to Jenkins. I’ll have a look in a week or so & see if it’s still broken (there’s a slim chance that Bug #11588 has fixed it).

#15 Updated by intrigeri 2016-10-30 08:01:37

Last 3 runs fail with “Boot Failed. EFI Floppy” followed by failed attempts at doing PXE. I can’t reproduce this failure locally (sid). Running it again now that Bug #10777 is fixed (who knows).

#16 Updated by intrigeri 2016-11-01 15:23:04

intrigeri wrote:
> Last 3 runs fail with “Boot Failed. EFI Floppy” followed by failed attempts at doing PXE. I can’t reproduce this failure locally (sid). Running it again now that Bug #10777 is fixed (who knows).

Same problem even with the branch for Bug #10777 (except we now go through the UEFI firmware setup a few times, and sometimes see “Boot Failed. EFI Floppy”, and sometimes a black screen with a cursor).

#17 Updated by intrigeri 2016-11-01 15:23:28

  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_2.7)

#18 Updated by anonym 2017-01-25 17:36:28

I just refreshed this branch for gathering data for Bug #12141. We’ll see.

#19 Updated by anonym 2017-01-25 17:36:42

  • related to Bug #12141: UEFI boot on QEMU is broken since 2.10~rc1 added

#20 Updated by anonym 2017-02-18 16:33:10

  • Assignee set to anonym

I just pushed a new branch called test/11583-uefi-boot-is-fragile-stretch (note the -stretch suffix) to see whether this problem remains on Stretch. I guess the problem will remain, but let’s see. I’ll take over the ticket until then.

#21 Updated by intrigeri 2017-05-18 11:07:12

  • Feature Branch changed from test/11583-uefi-boot-is-fragile to test/11583-uefi-boot-is-fragile-stretch

anonym wrote:
> I just pushed a new branch called test/11583-uefi-boot-is-fragile-stretch (note the -stretch suffix) to see whether this problem remains on Stretch. I guess the problem will remain, but let’s see. I’ll take over the ticket until then.

Updated and pushed it again. You might want to set a target version so you look at the results before the job is garbage collected :)

#22 Updated by intrigeri 2017-05-18 11:16:00

… and hopefully Bug #12511 will fix it, who knows :)

#23 Updated by intrigeri 2017-05-18 17:32:03

  • Status changed from Confirmed to In Progress
  • Target version set to Tails_3.0~rc1
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA

It now passes: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-11583-uefi-boot-is-fragile-stretch/1/cucumberTestReport/installing-tails-to-a-usb-drive/booting-tails-from-a-usb-drive-in-uefi-mode/ so please review and merge :)

#24 Updated by anonym 2017-05-19 19:18:59

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

intrigeri wrote:
> It now passes: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-11583-uefi-boot-is-fragile-stretch/1/cucumberTestReport/installing-tails-to-a-usb-drive/booting-tails-from-a-usb-drive-in-uefi-mode/ so please review and merge :)

Excellent! It also passes for me locally now (IIRC it has been broken for me for the past few months). Merged!

#25 Updated by intrigeri 2017-05-23 09:02:54

  • Status changed from Fix committed to Resolved