Feature #11983

Check if the test suite has more failures on the reproducible ISO

Added by intrigeri 2016-11-21 15:31:13. Updated 2017-05-23 09:10:00.

Status: Resolved
Priority: Normal
Assignee:
Category: Test suite
Target version:
Start date: 2016-11-21
Due date:
% Done: 100%
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:
Deliverable for: 289

Description


Subtasks


Related issues

Related to Tails - Feature #11971: Consider migrating some of /lib/live/config/* to systemd unit files Resolved 2016-11-20
Blocks Tails - Feature #12348: Review'n'merge the reproducible builds branch into feature/stretch Resolved 2017-03-15
Blocked by Tails - Bug #12491: Test suite fails to start Chutney, which breaks all online tests on Jenkins Resolved 2017-04-29

History

#1 Updated by anonym 2017-03-13 11:04:52

Comparing results from Jenkins:

Actually, here the failures in feature/5630-deterministic-builds are a subset of those that currently occur in feature/stretch, so it seems this ticket is not relevant any more.

I’m wondering if we mean full (incl. @fragile) test suite runs. I doubt it, but I guess we can keep this ticket around until that has been investigated.

#2 Updated by intrigeri 2017-03-13 11:45:43

  • Status changed from Confirmed to In Progress

> I’m wondering if we mean full (incl. @fragile) test suite runs. I doubt it, but I guess we can keep this ticket around until that has been investigated.

I think it’s worth making sure soon that this branch does not bring regressions to our full test suite.

#3 Updated by intrigeri 2017-03-13 12:03:11

  • Subject changed from Check why the test suite has more failures on the reproducible ISO to Check if the test suite has more failures on the reproducible ISO
  • % Done changed from 0 to 50

#4 Updated by intrigeri 2017-03-16 05:56:34

I’ve seen a few test suite failures due to lack of disk space on Jenkins. Can you please investigate if it’s due to the ISO built from this branch being a bit larger (in which case sysadmins need to adjust the infra side) or anything else?

#5 Updated by intrigeri 2017-03-16 12:47:49

  • blocks Feature #12348: Review'n'merge the reproducible builds branch into feature/stretch added

#6 Updated by intrigeri 2017-04-17 12:22:31

  • Assignee changed from anonym to intrigeri
  • Target version set to Tails_3.0

I’m running tests on a system that should not be affected by the “no space left” problem; but I’d rather wait for my Feature #12348 branch to be reviewed’n’merged before I give more RAM to Jenkins isotesters: that branch might be enough to fix this problem. Then someone (possibly me) should do a full test suite run.

#7 Updated by intrigeri 2017-04-17 16:44:01

Here are test regressions I’ve seen. I’ll file a ticket about the relevant ones after trying workarounds. Note that this test run happened on a heavily loaded system, which might explain some of these failures.

“Tails can boot from live systems stored on hard drives”: SeaBIOS says the hard drive is not a bootable device. Same for “Writing a Tails isohybrid to a USB drive and booting it”. Looking at the diff with feature/stretch + what we have in the APT overlay, the only possible cause I can think of is:

 # Options passed to isohybrid
-AMNESIA_ISOHYBRID_OPTS="-h 255 -s 63"
+AMNESIA_ISOHYBRID_OPTS="-h 255 -s 63 --id '$SOURCE_DATE_EPOCH'"

… so perhaps the ID we pass causes problems. The syslinux source code says:

uint32_t id = 0;                /* MBR: 0 <= id <= 0xFFFFFFFF(4294967296) */

… so our SOURCE_DATE_EPOCH should fit in there. The command line argument is converted with id = strtoul(optarg, &err, 0). If it’s not set on the command line, then 4 bytes are read from the ISO at offset 440 and converted to an ID with id = lendian_int(id); only if that fails is the ID set to something random. The ID is then written to the MBR like this:

tmp = lendian_int(id);
memcpy(mbr, &tmp, sizeof(tmp));
mbr += sizeof(tmp);                             /* offset 444 */

I’ll pass --verbose to isohybrid so I can understand a bit more what’s going on.
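
To make the ID handling concrete, here is a minimal Python sketch of the logic described above. It is an illustration only, not the isohybrid code: it assumes the 32-bit ID sits at MBR offset 440 in little-endian byte order, as the quoted C snippet suggests. The function names and the example timestamp are made up for this sketch.

```python
import struct

MBR_ID_OFFSET = 440          # per the snippet above: the ID occupies bytes 440..443
MAX_MBR_ID = 0xFFFFFFFF      # largest value a uint32_t can hold


def write_mbr_id(mbr: bytearray, disk_id: int) -> None:
    """Store disk_id as a little-endian 32-bit integer at MBR_ID_OFFSET."""
    if not 0 <= disk_id <= MAX_MBR_ID:
        raise ValueError(f"ID {disk_id} does not fit in 32 bits")
    struct.pack_into("<I", mbr, MBR_ID_OFFSET, disk_id)


def read_mbr_id(mbr: bytes) -> int:
    """Read the little-endian 32-bit ID back from MBR_ID_OFFSET."""
    return struct.unpack_from("<I", mbr, MBR_ID_OFFSET)[0]


# Hypothetical SOURCE_DATE_EPOCH value, for illustration only.
example_epoch = 1492447441
mbr = bytearray(512)
write_mbr_id(mbr, example_epoch)
```

Under these assumptions, any Unix timestamp that fits in 32 bits round-trips unchanged, which matches the reasoning that SOURCE_DATE_EPOCH should fit in the ID field.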

“Install packages with Synaptic”: reloading the package list returns immediately, whereas apt-get update takes 2 minutes. The “I update APT using Synaptic” step implementation feels racy to me: it assumes a /usr/lib/apt/methods/tor+http process will be running as soon as the “Reload” button is clicked. I’ll add a sleep there.
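
One alternative to a fixed sleep would be to poll for the process with a timeout. The sketch below shows that idea in Python rather than in the test suite's own language. `wait_until` and the fake process check are hypothetical names, not part of the Tails test suite.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.1) -> bool:
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def make_fake_process_check(appears_after_calls: int) -> Callable[[], bool]:
    """Stand-in for 'is the tor+http APT method running?': True from the Nth call on."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        return calls["n"] >= appears_after_calls

    return check


# The process only shows up on the third poll, yet the wait still succeeds.
started = wait_until(make_fake_process_check(3), timeout=2.0, interval=0.01)
```

A step written this way would tolerate the process starting late, at the cost of needing a sensible timeout.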

“The emergency shutdown applet can shutdown Tails”: systemd segfaults and the system doesn’t shut down. Wow. I’ll retry.

“A screenshot is taken when the PRINTSCREEN key is pressed”: no screenshot was taken. Looking at the video, it seems we press the key very early, possibly before GNOME Shell has finished setting up its keybindings. I’ll check if a sleep helps. Same for “GNOME notifications are shown to the user”, I think we’re a bit too fast for GNOME Shell: in another run I’ve seen “all notifications have disappeared” fail with The Dogtail script raised: SearchError: child of [desktop frame | main]: "gnome-shell" application (RuntimeError), perhaps that’s related.

Then I’ve seen a few “the remote shell seems to be down” failures. I’m not quite sure what’s happening; I’ll retry.

#8 Updated by intrigeri 2017-04-17 16:48:00

I’ve seen the test suite use up to 15632812 * 1024 bytes in /tmp/TailsToaster when run on this branch. On Jenkins we allocate only 14680064 such 1024-byte blocks. I’ll see if I can allocate the missing 952748 * 1024 bytes so I can get test results from Jenkins.
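
For reference, the figures above can be checked with a few lines of Python. The variable names are made up for this sketch; the numbers are the ones quoted in this comment.

```python
KIB = 1024

peak_usage_kib = 15632812   # peak /tmp/TailsToaster usage seen on this branch
allocated_kib = 14680064    # what Jenkins currently allocates

missing_kib = peak_usage_kib - allocated_kib

print(f"allocated: {allocated_kib / KIB**2:.2f} GiB")
print(f"peak:      {peak_usage_kib / KIB**2:.2f} GiB")
print(f"missing:   {missing_kib} KiB (~{missing_kib / KIB:.0f} MiB)")
```

In other words, the current allocation is exactly 14 GiB and the shortfall is about 930 MiB.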

#9 Updated by intrigeri 2017-04-17 16:58:29

intrigeri wrote:
> I’ll see if I can allocate the missing 952748 * 1024 bytes so I can get test results from Jenkins.

Done.

#10 Updated by intrigeri 2017-04-17 18:39:13

intrigeri wrote:
> “Install packages with Synaptic”: […]
>
> “The emergency shutdown applet can shutdown Tails”: systemd segfaults and the system doesn’t shut down. Wow. I’ll retry.
>
> “A screenshot is taken when the PRINTSCREEN key is pressed”:

I didn’t see those failures during my last test run… but I’ve seen a bunch of other weird ones. I’ll compare with runs on current feature/stretch done in the same testing environment.

#11 Updated by intrigeri 2017-04-17 18:56:32

intrigeri wrote:
> “Tails can boot from live systems stored on hard drives”: SeaBIOS says the hard drive is not a bootable device. Same for “Writing a Tails isohybrid to a USB drive and booting it”.

That was Bug #12453.

#12 Updated by intrigeri 2017-04-17 20:09:28

intrigeri wrote:
> intrigeri wrote:
> > I’ll see if I can allocate the missing 952748 * 1024 bytes so I can get test results from Jenkins.
>
> Done.

… but that’s not enough as I pushed only half of the change.

#13 Updated by intrigeri 2017-04-17 20:41:44

intrigeri wrote:
> “Install packages with Synaptic”: […]

Seen that one on https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-stretch/319/console so it’s not caused by the Feature #5630 branch.

#14 Updated by intrigeri 2017-04-18 07:48:57

  • Assignee changed from intrigeri to anonym

I’ve done quite a few test suite runs (some full, some without the fragile tests). While some of them were promising (i.e. no false positives that I’ve never seen on feature/stretch), my gut feeling is that this branch tends to trigger more test suite issues that I’ve seen rarely, if at all, on feature/stretch (e.g. several kinds of remote shell communication problems, virt-viewer failing to start, the Greeter’s persistent volume passphrase being already filled with 4 chars when the Greeter appears in the Tor Browser persistent bookmarks scenario). Possible root causes I can think of: our fontconfig tricks (Feature #11971, possibly a systemd unit ordering problem or increased memory usage), or /etc/machine-id being empty.

Also, the test suite takes a while longer to run on this branch than on feature/stretch (I’m tempted to blame the fontconfig thing, Feature #11971#note-19 might fix that).

So I don’t feel comfortable merging this branch for 3.0~beta4.

Next possible steps: either investigate failures further, or wait for Feature #11971#note-19 to be done, or roughly bisect feature/stretch..feature/5630-deterministic-builds to identify what breaks stuff.

#15 Updated by intrigeri 2017-04-18 07:54:56

  • blocked by Feature #11971: Consider migrating some of /lib/live/config/* to systemd unit files added

#16 Updated by intrigeri 2017-04-18 15:27:18

  • Target version changed from Tails_3.0 to Tails_3.0~rc1

#17 Updated by intrigeri 2017-04-19 14:18:18

intrigeri wrote:
> the Greeter’s persistent volume passphrase being already filled with 4 chars when the Greeter appears in the Tor Browser persistent bookmarks scenario).

I’ve just seen that one on feature/stretch, so if Feature #5630 does something about it, it’s probably just increasing the race condition occurrence rate.

#18 Updated by intrigeri 2017-04-29 10:11:19

  • blocked by Bug #12491: Test suite fails to start Chutney, which breaks all online tests on Jenkins added

#19 Updated by intrigeri 2017-05-19 07:23:02

  • blocked by Bug #12565: Test failures on Jenkins due to lack of disk space added

#20 Updated by intrigeri 2017-05-19 10:16:43

  • blocks deleted (Feature #11971: Consider migrating some of /lib/live/config/* to systemd unit files)

#21 Updated by intrigeri 2017-05-19 10:16:46

  • related to Feature #11971: Consider migrating some of /lib/live/config/* to systemd unit files added

#22 Updated by intrigeri 2017-05-19 10:17:18

Feature #11971 won’t be done in time for 3.0~rc1 so we have to handle this ticket with the current state of the Feature #5630 branch.

#23 Updated by intrigeri 2017-05-19 11:15:24

IMO, today we should look at the results for bugfix/11971-fontconfig-cache-in-iso, which is more likely to be merged for 3.0~rc1 than feature/5630-deterministic-builds alone, as it’s less risky even though the resulting ISO won’t be reproducible. There will be test results on Jenkins around 5pm CEST today, but IMO we should start running the specific features that had issues reported above locally before then.

#24 Updated by intrigeri 2017-05-19 12:46:22

Re-reading this ticket, two specific features shall be tested: features/emergency_shutdown.feature and features/gnome.feature; I’ll do that. But the remote shell communication issues can happen anywhere, so looking at full test suite runs (likely on Jenkins) will be needed as well.

#25 Updated by intrigeri 2017-05-19 14:56:14

intrigeri wrote:
> Re-reading this ticket, two specific features shall be tested: features/emergency_shutdown.feature and features/gnome.feature; I’ll do that.

I’ve run both once locally and they passed just fine.

> But the remote shell communication issues can happen anywhere, so looking at full test suite runs (likely on Jenkins) will be needed as well.

https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-5630-deterministic-builds/69/cucumberTestReport/ and https://jenkins.tails.boum.org/job/test_Tails_ISO_bugfix-11971-fontconfig-cache-in-iso/6/cucumberTestReport/ are confidence-inspiring, even though they are not full test suite runs: only the USB tests fail (due to Bug #12565).

#26 Updated by anonym 2017-05-20 07:55:08

When I locally ran all tests for the state of this branch as merged into 3.0~rc1, all scenarios passed on the first run except:

  • one because of undefined tests (now fixed in commit:7afcda3ce5288dfe3dd16103bf8e80b87d858527)
  • 3-4 Pidgin ones (whose sessions I took over and finished manually)
  • the persistent bookmarks one (which is broken)

That’s pretty good! :)

#27 Updated by anonym 2017-05-20 07:59:06

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 50 to 100
  • QA Check set to Pass

#28 Updated by intrigeri 2017-05-23 09:09:36

  • blocks deleted (Bug #12565: Test failures on Jenkins due to lack of disk space)

#29 Updated by intrigeri 2017-05-23 09:10:00

  • Status changed from Fix committed to Resolved