Bug #16843

Jenkins' isotesters fail to create VMs

Added by anonym 2019-06-27 08:11:03. Updated 2019-07-02 07:33:22.

Status:
Resolved
Priority:
High
Assignee:
intrigeri
Category:
Test suite
Target version:
Start date:
Due date:
% Done:

0%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Most tests (since we need a VM for almost all tests) are failing like this:

  Scenario: Tails will not enable disk swap                       # features/untrusted_partitions.feature:6
    Given a computer                                              # features/step_definitions/common_steps.rb:46
    And I temporarily create a 100 MiB disk named "swap"          # features/step_definitions/common_steps.rb:59
    And I create a gpt swap partition on disk "swap"              # features/step_definitions/untrusted_partitions.rb:1
    And I plug sata drive "swap"                                  # features/step_definitions/common_steps.rb:66
    When I start Tails with network unplugged and I login         # features/step_definitions/common_steps.rb:118
      Call to virDomainCreateWithFlags failed: internal error: process exited while connecting to monitor: 2019-06-26T07:52:47.558939Z qemu-system-x86_64: Property '.md-clear' not found (Guestfs::Error)
      ./features/support/helpers/vm_helper.rb:693:in `create'
      ./features/support/helpers/vm_helper.rb:693:in `/^I start the computer$/'

History

#1 Updated by anonym 2019-06-27 08:31:44

So something is up with the md-clear CPUID flag. Locally I tried both disabling it and making it optional (in the hope that one of these would work, and then checking whether it fixed the issue on Jenkins), but it does seem that this flag is needed to create the VM. Maybe it would be different if we didn’t use the host-model CPU?
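
For reference, the two variants tried above correspond to feature policies in libvirt's domain XML. The following is only a minimal sketch: it assumes a `<cpu mode='host-model'>` element, and `cpu_element` is a hypothetical helper, since the domain XML the test suite actually uses is not shown in this ticket.

```python
import xml.etree.ElementTree as ET

# Feature policies accepted by libvirt's <feature policy='...'> attribute.
ALLOWED_POLICIES = {"force", "require", "optional", "disable", "forbid"}


def cpu_element(policy):
    """Build a <cpu> element that sets the given policy for md-clear.

    Hypothetical helper for illustration; not taken from the Tails test suite.
    """
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unknown feature policy: {policy}")
    cpu = ET.Element("cpu", {"mode": "host-model"})
    ET.SubElement(cpu, "feature", {"policy": policy, "name": "md-clear"})
    return cpu


# Print the two variants mentioned in the comment: disabled and optional.
for policy in ("disable", "optional"):
    print(ET.tostring(cpu_element(policy), encoding="unicode"))
```

As noted in the comment, neither variant was enough locally, so this only illustrates what was tried, not a fix.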

Anyway, these problems started occurring on the 23rd (or maybe earlier?). Here are some relevant changelogs:

libvirt (3.0.0-4+deb9u4) stretch-security; urgency=medium

  * Fix CVEs related to privilege escalations on R/O connections.
    - CVE-2019-10161:
      CVE-2019-10161-api-disallow-virDomainSaveImageGetXMLDesc-.patch
    - CVE-2019-10167:
      api-disallow-virConnectGetDomainCapabilities-on-read-only.patch
  * cpu_map: Define md-clear CPUID bit.
    CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
  * Add spec-ctrl and ibpb CPU features and ibrs CPU models.
    CVE-2017-5753, CVE-2017-5715
  * Add ssbd CPU feature.
    CVE-2018-3639

 -- Guido Günther <agx@sigxcpu.org>  Wed, 12 Jun 2019 10:13:38 +0200

qemu (1:2.8+dfsg-6+deb9u7) stretch-security; urgency=medium

  * Fix the md_clear backport, thanks to Vincent Tondellier (Closes: #929067)

 -- Moritz Mühlenhoff <jmm@debian.org>  Wed, 05 Jun 2019 23:33:57 +0200

qemu (1:2.8+dfsg-6+deb9u6) stretch-security; urgency=medium

  [ Moritz Mühlenhoff <jmm@debian.org> ]
  * slirp-correct-size-computation-concatenating-mbuf-CVE-2018-11806.patch
    (Closes: #901017, CVE-2018-11806)
  * qga-check-bytes-count-read-by-guest-file-read-CVE-2018-12617.patch
    (Closes: #902725, CVE-2018-12617)
  * usb-mtp-use-O_NOFOLLOW-and-O_CLOEXEC-CVE-2018-16872.patch
    (Closes: #916397, CVE-2018-16872)
  * rtl8139-fix-possible-out-of-bound-access-CVE-2018-17958.patch
    (Closes: #911499, CVE-2018-17958)
  * lsi53c895a-check-message-length-value-is-valid-CVE-2018-18849.patch
    (Closes: #912535, CVE-2018-18849)
  * ppc-pnv-check-size-before-data-buffer-access-CVE-2018-18954.patch
    (Closes: #914604, CVE-2018-18954)
  * 9p-write-lock-path-in-v9fs-co_open2.patch
    9p-take-write-lock-on-fid-path-updates-CVE-2018-19364.patch
    (Closes: #914599, CVE-2018-19364)
  * 9p-fix-QEMU-crash-when-renaming-files-CVE-2018-19489.patch
    (Closes: #914727, CVE-2018-19489)
  * i2c-ddc-fix-oob-read-CVE-2019-3812.patch
    (Closes: #922635, CVE-2019-3812)
  * slirp-check-data-length-while-emulating-ident-function-CVE-2019-6778.patch
    (Closes: #921525, CVE-2019-6778)
  * slirp-check-sscanf-result-when-emulating-ident-CVE-2019-9824.patch
    (Closes: CVE-2019-9824)

  [ Michael Tokarev ]
  * enable-md-clear.patch
    define new CPUID for MDS
    (Closes: #929067)
    (Closes: CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091)
  * qxl-check-release-info-object-CVE-2019-12155.patch
    fixes null-pointer deref in qxl cleanup code
    (Closes: #929353, CVE-2019-12155)

 -- Michael Tokarev <mjt@tls.msk.ru>  Wed, 29 May 2019 14:39:09 +0300

#2 Updated by anonym 2019-06-27 08:36:33

I guess Jenkins’ nested setup could also be part of the problem, so it could be that the isotesters need a compatible policy for the md-clear CPUID flag.

#3 Updated by intrigeri 2019-06-30 15:21:11

  • Assignee changed from bertagaz to intrigeri

I’ll take a look. At first glance, it looks like we need to update our custom qemu package for isotesters so it supports the md-clear flag.

#4 Updated by intrigeri 2019-06-30 16:57:00

  • Status changed from Confirmed to Resolved

Built & uploaded 1:2.8+dfsg-6+deb9u7.0tails1, upgraded isotesters to it. Seems like it did the job.

#5 Updated by intrigeri 2019-07-01 05:44:08

  • Status changed from Resolved to In Progress

Meh, now all tests that need a persistent volume fail (which is precisely what our custom package is supposed to avoid, compared to pristine qemu from Stretch, i.e. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=851694).

#6 Updated by intrigeri 2019-07-01 05:59:41

Actually, the problem does not seem to be about persistence per se: “Scenario Booting Tails from a USB drive without a persistent partition and creating one” fails at the “Given I have started Tails without network from a USB drive without a persistent partition and stopped at Tails Greeter’s login screen” step:

execution expired (RemoteShell::Timeout)
./features/support/helpers/remote_shell.rb:34:in `readline'
./features/support/helpers/remote_shell.rb:34:in `block (2 levels) in communicate'
./features/support/helpers/remote_shell.rb:33:in `loop'
./features/support/helpers/remote_shell.rb:33:in `block in communicate'
./features/support/helpers/remote_shell.rb:30:in `communicate'
./features/support/helpers/remote_shell.rb:72:in `execute'
./features/support/helpers/remote_shell.rb:81:in `initialize'
./features/support/helpers/vm_helper.rb:433:in `new'
./features/support/helpers/vm_helper.rb:433:in `execute'
./features/support/helpers/vm_helper.rb:472:in `has_network?'
./features/step_definitions/common_steps.rb:33:in `post_snapshot_restore_hook'
./features/step_definitions/snapshots.rb:127:in `/^I\ have\ started\ Tails\ without\ network\ from\ a\ USB\ drive\ without\ a\ persistent\ partition\ and\ stopped\ at\ Tails\ Greeter's\ login\ screen$/'
features/usb_install.feature:73:in `Given I have started Tails without network from a USB drive without a persistent partition and stopped at Tails Greeter's login screen'

But if I run this scenario locally, in isolation, it passes.

#7 Updated by intrigeri 2019-07-01 06:18:42

In the QEMU logs I see “qemu-system-x86_64: usb-msd: Bad signature 53425300”, which looks suspiciously like https://bugzilla.redhat.com/show_bug.cgi?id=1436616. And indeed, the patch that was identified to cause trouble (debian/patches/xhci-dont-kick-in-xhci_submit-and-xhci_fire_ctl_transfer.patch) appeared in our QEMU source tree when I merged 1:2.8+dfsg-6+deb9u7 yesterday.
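
To make the logged value easier to read, here is a small sketch that decodes it next to the two signatures defined by the USB Mass Storage Bulk-Only Transport specification (`USBC` for a command block wrapper, `USBS` for a command status wrapper). It assumes that QEMU prints the signature as a 32-bit value read in little-endian order; that is an assumption about the log format, not something confirmed in this ticket.

```python
import struct

# Signatures from the USB Mass Storage Bulk-Only Transport specification.
CBW_SIGNATURE = 0x43425355  # command block wrapper, "USBC" on the wire
CSW_SIGNATURE = 0x53425355  # command status wrapper, "USBS" on the wire


def signature_bytes(value):
    """Return the four bytes of a 32-bit little-endian signature."""
    return struct.pack("<I", value)


# Value copied from the "usb-msd: Bad signature 53425300" log line.
logged = int("53425300", 16)

print("CBW:   ", signature_bytes(CBW_SIGNATURE))
print("CSW:   ", signature_bytes(CSW_SIGNATURE))
print("logged:", signature_bytes(logged))
```

Under that assumption, the logged value is not a valid command block signature, which is at least consistent with the transfer ordering problem described in the Red Hat bug.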

This regression is supposedly fixed upstream by “[PATCH] xhci: flush dequeue pointer to endpoint context” aka 243afe858b95765b98d1. I’ve cherry-picked it and am building 1:2.8+dfsg-6+deb9u7.0tails2. Buster already has the fix, so I won’t bother trying to get this fixed in Stretch proper, given our past experience trying to get qemu fixes via stable updates.

#8 Updated by intrigeri 2019-07-01 06:56:11

Upgraded all isotesters to 1:2.8+dfsg-6+deb9u7.0tails2, let’s see how https://jenkins.tails.boum.org/view/Tails_ISO/job/manual_test_Tails_ISO_stable/20/console goes (although that job does not reboot the isotester so I’m not sure if it’s supposed to work).

#9 Updated by intrigeri 2019-07-02 07:33:22

  • Status changed from In Progress to Resolved