Bug #12579

reproducibly_build_Tails_ISO_* Jenkins job are broken

Added by intrigeri 2017-05-22 08:18:35. Updated 2017-05-31 11:46:57.

Status: Resolved
Priority: Normal
Assignee:
Category: Continuous Integration
Target version:
Start date: 2017-05-22
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:
Deliverable for: 289

Description

It would be nice to have CI again for reproducible builds, given we would like 3.0 to be reproducible (BTW I’m going to create a similar job for feature/stretch).

See e.g. https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/1/console: it seems that mv tails-* build-artifacts/ should be adjusted to the place where artifacts now land.
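A minimal sketch of a more defensive version of that `mv` step, assuming the job script is plain shell. The helper name `collect_artifacts` and both directory arguments are hypothetical; the thread does not say where artifacts now land. The point is that the step fails loudly when no `tails-*` file is found instead of moving nothing.

```shell
#!/bin/sh
# Hypothetical helper: move every tails-* file from SRC_DIR (wherever the
# build now drops its artifacts) into DEST_DIR.
collect_artifacts() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    found=0
    for f in "$src_dir"/tails-*; do
        # An unmatched glob is left as a literal string, so skip non-files.
        [ -e "$f" ] || continue
        mv -- "$f" "$dest_dir"/
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no tails-* artifacts found in $src_dir" >&2
        return 1
    fi
}
```

It would be called as `collect_artifacts <source dir> build-artifacts`, with the source directory filled in once it is known.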



Related issues

Related to Tails - Bug #12599: /var/lib/libvirt/images gets filled on isobuilders Resolved 2017-05-25

History

#1 Updated by intrigeri 2017-05-24 06:20:09

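Two notes:

  • See commit 1b319e879c50eda576d4971f4521b164e477ac5e in puppet-tails.
  • These jobs report success even when diffoscope crashed (https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/12/console), which shouldn’t be the case.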

#2 Updated by intrigeri 2017-05-24 06:20:21

  • Subject changed from reproducibly_build_Tails_ISO_feature-5630-deterministic-builds Jenkins job is broken to reproducibly_build_Tails_ISO_* Jenkins job is broken

#3 Updated by intrigeri 2017-05-24 06:20:29

  • Subject changed from reproducibly_build_Tails_ISO_* Jenkins job is broken to reproducibly_build_Tails_ISO_* Jenkins job are broken

#4 Updated by intrigeri 2017-05-24 06:20:58

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

intrigeri wrote:
> it seems that mv tails-* build-artifacts/ should be adjusted to the place where artifacts now land.

At least that part has been fixed :)

#5 Updated by bertagaz 2017-05-24 10:03:56

intrigeri wrote:
> Two notes:
>
> * See commit 1b319e879c50eda576d4971f4521b164e477ac5e in puppet-tails.

I raised the diffoscope options because of this result which didn’t sound meaningful: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/3/artifact/build-artifacts/tails-diffoscope.html. I didn’t see other results before though.

> * These jobs report success even when diffoscope crashed (https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/12/console), which shouldn’t be the case.

Yes, I’ve seen that. I’m a bit surprised, as the script runs with set -e. I’ll work around that. It also kind of triggers a memory of you complaining during the sprint about the diffoscope version in Debian. I wonder if we should try the one in experimental, whose changelog has an item mentioning Tails (saying it’s faster for us now).
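One possible explanation, offered as an assumption since the job script is not shown in this ticket: `set -e` is ignored for a command on the left of `||`, for the condition of an `if`, and for every command of a pipeline except the last. The sketch below demonstrates this with a stand-in `crashing_tool` function (not the real diffoscope) run in a fresh shell.

```shell
#!/bin/sh
# Run SNIPPET in a fresh shell under `set -e` and print that shell's exit
# status. crashing_tool is a stand-in that always fails with status 2;
# it is NOT diffoscope.
run_under_errexit() {
    snippet=$1
    status=0
    sh -c "set -e
crashing_tool() { return 2; }
$snippet
: 'reached the end of the script'" || status=$?
    echo "$status"
}
```

`run_under_errexit 'crashing_tool'` prints 2, because the script aborts at the failing command. `run_under_errexit 'crashing_tool || true'` and `run_under_errexit 'crashing_tool | cat'` both print 0: the failure is swallowed and the script runs to the end.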

> it seems that mv tails-* build-artifacts/ should be adjusted to the place where artifacts now land.

Yes, but I still need to do some polishing here.

#6 Updated by intrigeri 2017-05-24 11:24:55

> Yes, I’ve seen that. I’m a bit surprised, as the script runs with set -e. I’ll work around that. It also kind of triggers a memory of you complaining during the sprint about the diffoscope version in Debian. I wonder if we should try the one in experimental, whose changelog has an item mentioning Tails (saying it’s faster for us now).

Yes, please :)

#7 Updated by intrigeri 2017-05-24 11:33:35

> I raised the diffoscope options because of this result which didn’t sound meaningful: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/3/artifact/build-artifacts/tails-diffoscope.html.

Note that in most cases, a more complete binary diff of the ISO file itself provides essentially no value: the useful info will likely be about the content of the ISO and SquashFS. This output feels incomplete though, but I doubt raising the diff numbers will fix it (I might be wrong though).

#8 Updated by intrigeri 2017-05-25 09:01:12

Here’s a different and interesting failure mode: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/18/console

#9 Updated by bertagaz 2017-05-25 12:17:04

intrigeri wrote:
> Here’s a different and interesting failure mode: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/18/console

It seems to have happened a few times already, as indicated in the Jenkins build logs. I bet the vagrant box is paused at some point, and the probable cause is lack of disk space in either /var/lib/jenkins or /var/lib/libvirt/images. The latter seems most likely, considering we host more baseboxes since the recent change in Feature #12409#note-34. I’ll have a look.
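A minimal sketch of how the disk-space hypothesis could be checked before a build starts, assuming a POSIX `df` and `awk`. The function names and any threshold passed to them are hypothetical, not taken from the isobuilders' configuration.

```shell
#!/bin/sh
# Print the free space, in KiB, of the filesystem that holds PATH.
free_kib() {
    # -P forces one line per filesystem; -k reports 1024-byte blocks.
    # Field 4 of the POSIX output format is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the filesystem holding PATH has at least MIN_KIB free.
has_free_space() {
    path=$1
    min_kib=$2
    avail=$(free_kib "$path")
    [ -n "$avail" ] && [ "$avail" -ge "$min_kib" ]
}
```

For example, `has_free_space /var/lib/libvirt/images 20971520 || exit 1` would abort early when less than 20 GiB is free; that figure is only an illustration.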

#10 Updated by bertagaz 2017-05-26 10:18:58

intrigeri wrote:
> > Yes, I’ve seen that. I’m a bit surprised, as the script runs with set -e. I’ll work around that. It also kind of triggers a memory of you complaining during the sprint about the diffoscope version in Debian. I wonder if we should try the one in experimental, whose changelog has an item mentioning Tails (saying it’s faster for us now).
>
> Yes, please :)

I’ve installed it by hand on isobuilder2 to test it. It leads to two conclusions: we’ll need more space in the system partition, as this version pulls a lot more packages, and we’ll probably need to mount /tmp/ as a tmpfs, as this version fails to run because it lacks disk space, as shown here

#11 Updated by bertagaz 2017-05-26 10:24:06

  • related to Bug #12599: /var/lib/libvirt/images gets filled on isobuilders added

#12 Updated by intrigeri 2017-05-26 10:33:44

> we’ll need more space in the system partition, as this version pulls a lot more packages,

ACK

> and we’ll probably need to mount /tmp/ as a tmpfs, as this version fails to run because it lacks disk space, […]

Can’t we point its TMPDIR to some place that already has enough disk space, e.g. in the workspace of the current Jenkins job?

Rationale: I’d rather not invest RAM into this yet — we’re short on RAM, and these jobs don’t run that often, so most of the time the added memory would be wasted. If we ever need to optimize I/O for diffoscope, we can reconsider (as part of Bug #11680), but let’s make it work first, and think about making it faster later, if needed.
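A minimal sketch of the TMPDIR idea, assuming diffoscope honours `TMPDIR` (Python's `tempfile` module does, but whether diffoscope relies on it is still an open question at this point in the thread). `WORKSPACE` is the variable Jenkins sets for the job's directory; the function names and the `diffoscope-tmp` prefix are hypothetical.

```shell
#!/bin/sh
# Create a private scratch directory under the job workspace and export it
# as TMPDIR. Falls back to the current directory when WORKSPACE is unset.
setup_workspace_tmpdir() {
    base=${WORKSPACE:-$PWD}
    TMPDIR=$(mktemp -d "$base/diffoscope-tmp.XXXXXX") || return 1
    export TMPDIR
}

# Remove the scratch directory again; only touches directories that
# setup_workspace_tmpdir could have created.
cleanup_workspace_tmpdir() {
    case ${TMPDIR:-} in
        */diffoscope-tmp.*) rm -rf -- "$TMPDIR" ;;
    esac
}
```

A job script would call `setup_workspace_tmpdir`, register `trap cleanup_workspace_tmpdir EXIT`, and then run diffoscope, so temporary files use the workspace's disk rather than /tmp or RAM.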

#13 Updated by bertagaz 2017-05-26 11:05:53

intrigeri wrote:
> Can’t we point its TMPDIR to some place that already has enough disk space, e.g. in the workspace of the current Jenkins job?
>
> Rationale: I’d rather not invest RAM into this yet — we’re short on RAM, and these jobs don’t run that often, so most of the time the added memory would be wasted. If we ever need to optimize I/O for diffoscope, we can reconsider (as part of Bug #11680), but let’s make it work first, and think about making it faster later, if needed.

I’ll investigate if diffoscope respects TMPDIR, but note that it does not necessarily mean having to buy more RAM: we already have around 14G of it assigned to isobuilders that is not used when diffoscope runs, so it may be that we don’t have to add any. But I agree my proposal may be overkill anyway. Let’s try yours.

#14 Updated by intrigeri 2017-05-26 11:40:16

> I’ll investigate if diffoscope respects TMPDIR

IIRC it does but I might be confused :)

> but note that it does not necessarily mean having to buy more RAM: we already have around 14G of it assigned to isobuilders that is not used when diffoscope runs, so it may be that we don’t have to add any.

Good news :)

> But I agree my proposal may be overkill anyway. Let’s try yours.

Well, with this info in hand: whatever, pick the one that’s easiest to implement :)

#15 Updated by bertagaz 2017-05-27 14:00:31

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

intrigeri wrote:
> * These jobs report success even when diffoscope crashed (https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/12/console), which shouldn’t be the case.

I’ve pushed fixes in puppet-tails’ master branch (referencing this ticket), that install diffoscope from experimental on all isobuilders and make it so that the build fails if diffoscope doesn’t report success. I think that was the last issue of this ticket, the others (disk space issues) are already tracked by Bug #12574, Bug #12595 or Bug #12599, so let’s put that ticket RfQA.
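A minimal sketch of what such a check could look like; it is not the code pushed to puppet-tails. It assumes diffoscope's usual exit-status convention (0 for no differences, 1 for differences, anything else for an error) and goes one step further than described above by telling "not reproducible" apart from "diffoscope crashed". The `DIFFOSCOPE` variable and the function name are hypothetical.

```shell
#!/bin/sh
# DIFFOSCOPE can be overridden (for example in tests); it defaults to the
# real tool.
: "${DIFFOSCOPE:=diffoscope}"

# Compare two images. Returns 0 if they are identical, 1 if they differ,
# and 2 for anything else (crash, missing tool, killed by a signal).
compare_images() {
    status=0
    # Capture the status explicitly so `set -e` cannot be bypassed silently.
    "$DIFFOSCOPE" "$1" "$2" || status=$?
    case $status in
        0) echo "reproducible" ;;
        1) echo "not reproducible"; return 1 ;;
        *) echo "diffoscope failed (exit status $status)" >&2; return 2 ;;
    esac
}
```

A job that ends with `compare_images first.iso second.iso` then fails in both the "differs" and the "crashed" case, while the console log still shows which of the two happened.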

#16 Updated by bertagaz 2017-05-27 14:28:15

I forgot to mention it has run there already

#17 Updated by intrigeri 2017-05-28 08:05:44

> I forgot to mention it has run there already

\o/

#18 Updated by intrigeri 2017-05-28 08:30:21

  • QA Check changed from Ready for QA to Info Needed

> I’ve pushed fixes in puppet-tails’ master branch (referencing this ticket),

Great, thanks :)

> that install diffoscope from experimental on all isobuilders

  • I’ve pushed commit ebb0b29 on top.
  • Why do we need to pin all packages from experimental to 100?
  • A few days ago I also did commit 26353652539a31902734a0ab19386c12e875a131 in the jenkins-jobs repo, but that’s not enough for the --html-dir to be archived. So I did 42738a9 there again. If that doesn’t work either I’ll simply do --html-dir "${ARTIFACTS_DIR}". I’ll track & handle this, not a blocker for this ticket.

> and make it so that the build fails if diffoscope doesn’t report success.

Looks great.

> I think that was the last issue of this ticket, the others (disk space issues) are already tracked by Bug #12574, Bug #12595 or Bug #12599, so let’s put that ticket RfQA.

OK! Do you think we can close this ticket once the single question above is addressed (we can still reopen it if we notice issues specific to these jobs i.e. that don’t happen on build_Tails_ISO_* again)? Or mark it as blocked by the tickets that track root causes of failures, so we don’t close it until it’s fully resolved in practice?

#19 Updated by intrigeri 2017-05-28 08:42:21

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 50 to 70

#20 Updated by intrigeri 2017-05-28 10:24:03

I’ve also pushed “[master ee1d87d] Reproducible ISO builds: clean old baseboxes before building (refs: Bug #12579)” to jenkins-jobs.git; let’s see if this helps. Let me know if there was a good reason not to do it, and sorry if that’s the case!

#21 Updated by bertagaz 2017-05-29 14:53:32

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Info Needed to Ready for QA

intrigeri wrote:
> * Why do we need to pin all packages from experimental to 100?

I didn’t know there was a default pinning for experimental. Pushed a commit that will remove it from isobuilders. Will remove these lines later.
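For context: APT gives archives marked `NotAutomatic: yes`, such as experimental, a default priority of 1, so nothing is pulled from there unless explicitly requested. A minimal sketch of raising the priority for a single package only follows; the function name, the priority value and the target file name are hypothetical and not taken from puppet-tails. Dependencies that exist only in experimental would need their own stanza.

```shell
#!/bin/sh
# Print an APT preferences stanza that raises the priority of ONE package
# from experimental, leaving every other experimental package at APT's
# default priority for that archive.
experimental_pin_for() {
    package=$1
    priority=${2:-990}   # hypothetical value, not taken from puppet-tails
    printf 'Package: %s\nPin: release a=experimental\nPin-Priority: %s\n' \
        "$package" "$priority"
}
```

The output of `experimental_pin_for diffoscope` could be written to a file such as /etc/apt/preferences.d/diffoscope.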

> > I think that was the last issue of this ticket, the others (disk space issues) are already tracked by Bug #12574, Bug #12595 or Bug #12599, so let’s put that ticket RfQA.
>
> OK! Do you think we can close this ticket once the single question above is addressed (we can still reopen it if we notice issues specific to these jobs i.e. that don’t happen on build_Tails_ISO_* again)? Or mark it as blocked by the tickets that track root causes of failures, so we don’t close it until it’s fully resolved in practice?

I think we can close it, and re-open when/if we stumble upon a new issue specific to the reproducible builds setup in Jenkins (or open a new one).

#22 Updated by intrigeri 2017-05-30 15:26:02

The last few builds failed, probably due to my tweaks wrt. diffoscope’s --html-dir artifacts, so I’ve dropped them. Let’s see how it goes now.

#23 Updated by intrigeri 2017-05-31 07:53:56

The last failure I’ve seen was caused by Bug #12618. Trying to build again and see if I can eventually see this job fail for a good reason.

#24 Updated by intrigeri 2017-05-31 11:46:57

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 70 to 100
  • QA Check changed from Ready for QA to Pass

OK, I’ve not seen failures specific to these jobs anymore, although I find it suspicious that 2 of the 3 occurrences of Bug #12618 happened with them. Closing for now anyway.