Bug #15107

Some topic branches are never built reproducibly during Jenkins daily runs => add option to specify which APT snapshot serials to use during build

Added by bertagaz 2017-12-26 11:10:18 . Updated 2019-01-30 11:48:15 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Continuous Integration
Target version:
Start date:
2017-12-26
Due date:
% Done:

100%

Feature Branch:
bugfix/15107-specify-APT-snapshots-serial-to-build-with
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

Our branches based on devel use “latest” snapshots for every APT archive used at build time => their “reproducibly” (sic) job will fail if one these APT snapshots is updated between the start of the original build job and the start of the “reproducibly” job. Our branches based on stable are also affected, but to a lesser degree: they use the “latest” snapshot only for the debian-security archive. Daily Jenkins runs are scheduled deterministically, based on the name of the branch. So inevitably, the automatic daily rebuild of some branches will always fail to build reproducibly.

On Bug #14924 we implemented a workaround for stable/testing/devel but long term, the only sane option is probably to teach our build system how to use snapshots specified on the command line (instead of “latest”), and to teach Jenkins to pass the correct parameters when trying to rebuild a given ISO image.


Files


Subtasks


Related issues

Related to Tails - Bug #14924: reproducibly_build_Tails_ISO_stable Jenkins job always fails Resolved 2017-11-06
Related to Tails - Bug #14944: jenkins-data-disk is running out of diskspace Resolved 2017-11-09

History

#1 Updated by bertagaz 2017-12-26 11:10:37

  • related to Bug #14924: reproducibly_build_Tails_ISO_stable Jenkins job always fails added

#2 Updated by bertagaz 2017-12-26 11:11:00

  • related to Bug #14944: jenkins-data-disk is running out of diskspace added

#3 Updated by bertagaz 2018-01-01 14:37:40

  • Status changed from Confirmed to In Progress

Applied in changeset commit:aafdf8da72df5e9ffdada2eabc920325a6e13520.

#4 Updated by intrigeri 2018-01-01 19:44:55

> Applied in changeset commit:aafdf8da72df5e9ffdada2eabc920325a6e13520.

Great to see someone picking this up! (Wanna assign it to yourself? Otherwise drop the target version: it feels weird to see a ticket for next release without any assignee.)

One comment: this seems to assume all “latest” serials are kept in sync, which is not true. You can very well have an ISO build where each “latest” pointer is resolved to different values. So I guess we need to store each such value, either serializing them all in one single variable with JSON or similar, or with one variable per repo (urgh).

#5 Updated by intrigeri 2018-01-03 18:26:36

  • Target version deleted (Tails_3.5)

We’ve triaged this during the monthly meeting “Volunteers to handle important tickets flagged for next release, but without assignee” topic. If you’re taking responsibility for this, please assign this ticket to yourself and then set whatever target version you want, but the status of a ticket flagged for next release, but without assignee, is too unclear.

#6 Updated by bertagaz 2018-01-03 23:01:54

intrigeri wrote:
> We’ve triaged this during the monthly meeting “Volunteers to handle important tickets flagged for next release, but without assignee” topic. If you’re taking responsibility for this, please assign this ticket to yourself and then set whatever target version you want, but the status of a ticket flagged for next release, but without assignee, is too unclear.

Ack. Too bad for Bug #14944. This ticket’s priority is ‘normal’ though.

#7 Updated by intrigeri 2018-01-04 05:47:43

> Too bad for Bug #14944.

I don’t understand what you mean: we certainly did not imply that you could not work on this if you fancy it or feel responsible for it.

#8 Updated by bertagaz 2018-01-04 15:48:01

intrigeri wrote:
> I don’t understand what you mean: we certainly did not imply that you could not work on this if you fancy it or feel responsible for it.

Ack, I was confused and surprised to see that change.

> One comment: this seems to assume all “latest” serials are kept in sync, which is not true. You can very well have an ISO build where each “latest” pointer is resolved to different values. So I guess we need to store each such value, either serializing them all in one single variable with JSON or similar, or with one variable per repo (urgh).

If by that you mean that in the build process when apt-snapshots-serials prepare-build is run, the latest snapshots are rotated at the same time and thus at least one of the serial returned during the resolving is different from the others, I agree. I did not think about that.

Now I think this scenario is possible but is rarely happening (and hits devel and branches based on it only). The development cost to implement a solution to that is quite high IMO, at least for me. OTOH the proposed patch is fixing the situaion for Bug #14944 for a very huge part of the builds. I have other things to do for 3.5, so I don’t think I’ll have the time to work on implementing more than what is proposed.

So I think we should merge the proposed patch, which would greatly help for Bug #14944, and see later if we need this much to spend our work time on fixing the problem you’re pointing.

Not sure how to proceed…

#9 Updated by intrigeri 2018-01-04 17:23:35

Hi!

>> One comment: this seems to assume all “latest” serials are kept in sync, which is not true. You can very well have an ISO build where each “latest” pointer is resolved to different values. So I guess we need to store each such value, either serializing them all in one single variable with JSON or similar, or with one variable per repo (urgh).

> If by that you mean that in the build process when apt-snapshots-serials prepare-build is run, the latest snapshots are rotated at the same time

No, that’s not what I meant (or I misunderstood what you think I meant). Each of the APT archives we maintain snapshots of is updated at different times: the load is spread over the entire day. So when we start an ISO build it’s entirely possible that the latest snapshot for the debian one is YYYYMMDD02 while the latest snapshot for the torproject archive is YYYYMMDD01. Whenever this happens, with the proposed patch the 2nd ISO build will fail because APT will try to fetch a snapshot of torproject indices that does not exist yet. And symetrically, if the first build used different “latest” serials for 2+ APT archives, with the proposed patch the second build will necessarily produce a different ISO, while right now there’s a good chance it would manage to reproduce the output of the first build.

Clearer?

> Now I think this scenario is possible but is rarely happening

Do you still think that with the above clarification? If yes, I’m curious why: according to git log -p config/APT_snapshots.d/, it happened 3 times in the 5 times we froze APT snapshots for a release in 2017. Granted, that’s a very small sample; if you have better data, please share.

> The development cost to implement a solution to that is quite high IMO, at least for me.

Understood. Let’s take this into account. If it may help, I think that I could get this part done painlessly (but I’d rather not handle the integration with Jenkins). Shall I give it a try (as a volunteer)?

> OTOH the proposed patch is fixing the situaion for Bug #14944 for a very huge part of the builds.

I have no idea if the proposed patch will fix more problems than it’ll introduce regressions. Our daily builds start at a deterministic time (based on a hash of the branch name) so:

  • It will fix the problem for a subset of the branches that currently fail to build reproducibly due to APT snapshots being updated during the first build. This is good.
  • It will break the second ISO build (due to trying to use non-existing snapshots) for another subset of the branches, that currently build reproducibly just fine. That’s more false positives.
  • It will successfully build a different second ISO, i.e. fail to build reproducibly for another subset of the branches, that currently build reproducibly just fine: if the first build was using different “latest” serials for 2+ APT archives, the second build will necessarily be different. That’s more false positives.

Similarly, for some part of the builds triggered by Git changes it will fix the problem, for some others it will break the second build, and for some other it will make the second build fail to reproduce.

I won’t do the maths right now. So either you do the maths yourself, or we simply try and remain ready to quickly revert if the proposed change makes things worse.
Or I implement the serialization thing I’ve proposed, that shouldn’t take me more time than arguing, or doing the maths, or pretending I’m not doing them ;)

What do you think?

> I have other things to do for 3.5, so I don’t think I’ll have the time to work on implementing more than what is proposed.

Fair enough!

#10 Updated by intrigeri 2018-01-05 10:36:12

  • % Done changed from 0 to 10

intrigeri wrote:
> > The development cost to implement a solution to that is quite high IMO, at least for me.
>
> Understood. Let’s take this into account. If it may help, I think that I could get this part done painlessly (but I’d rather not handle the integration with Jenkins). Shall I give it a try (as a volunteer)?

I pushed a draft implementation. I refrained from using Perl and instead did Ruby so that you and anonym understand what’s going on and can take ownership of this tiny piece of code if needed in the future. The prepare-build part seems to work just fine. I didn’t know what your plan was wrt. storing the serials of the snapshots used for a given build, so for now I’ve merely added a cat-json action as an example of how it would look like; I’m happy to adjust it to suit whatever interface is required by the integration technique you have in mind; who knows, perhaps calling it with tmp/APT_snapshots.d as an argument would be enough?

Hoping this helps.

#11 Updated by Anonymous 2018-01-15 09:50:12

@bertagaz: Is this ticket still relevant? If yes, can you assign it to yourself or the relevant person? Thanks!

#12 Updated by intrigeri 2018-04-08 17:39:12

  • Tracker changed from Feature to Bug
  • Subject changed from Add option to specify which APT snapshot serial to use during build to Some topic branches are never built reproducibly during Jenkins daily runs => add option to specify which APT snapshot serials to use during build

#13 Updated by intrigeri 2018-04-08 17:44:42

  • Description updated

#14 Updated by intrigeri 2018-04-08 17:55:38

  • Category changed from Build system to Continuous Integration
  • Assignee set to intrigeri
  • Parent task deleted (Feature #5630)
  • Feature Branch set to bugfix/15107-specify-APT-snapshots-serial-to-build-with
  • Deliverable for deleted (301)

#16 Updated by intrigeri 2018-12-02 13:40:57

intrigeri wrote:
> I pushed a draft implementation. I refrained from using Perl and instead did Ruby so that you and anonym understand what’s going on and can take ownership of this tiny piece of code if needed in the future. The prepare-build part seems to work just fine. I didn’t know what your plan was wrt. storing the serials of the snapshots used for a given build, so for now I’ve merely added a cat-json action as an example of how it would look like; I’m happy to adjust it to suit whatever interface is required by the integration technique you have in mind; who knows, perhaps calling it with tmp/APT_snapshots.d as an argument would be enough?

Here’s an implementation plan. It might not be the nicest approach but I’ve picked it because it only requires changes in tails.git, so it should be easy to try out on a topic branch without impacting any other branch.

During the first build (build_Tails_ISO_*), after figuring out which APT snapshots shall be used (apt-snapshots-serials prepare-build), run echo "APT_SNAPSHOT_SERIALS='$(apt-snapshots-serials-cat-json tmp/APT_snapshots.d)'" >> tails-build-env.list. Later, once the build has succeeded, the collect_build_environment Jenkins builder will run puppet-tails:files/jenkins/slaves/isobuilders/collect_build_environment that will append other variables to this file.

Then the reproducibly_build_Tails_ISO job already loads tails-build-env.list so rake and auto/build should do the right thing, i.e. pass that inherited chunk of JSON to apt-snapshots-serials prepare-build.

#17 Updated by intrigeri 2018-12-02 14:16:26

  • QA Check set to Ready for QA

(Forcing testing of build reproducibility.)

#18 Updated by intrigeri 2018-12-02 15:51:49

To test this, we need to start builds at a time that will trigger the failure condition:

  • baseline i.e. reproduce the problem: start a build of the devel branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)
  • validate the fix: start a build of this branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)

The debian-security snapshot update starts at 05:15, 11:15, 17:15, 23:15 UTC. It generally lasts less than an ISO build. Starting the first builds a few minutes before one of these times should be enough to almost guarantee that the 2nd builds will use a different snapshot and fail to reproduce.

#19 Updated by intrigeri 2018-12-03 07:43:23

  • Target version set to Tails_3.12
  • % Done changed from 10 to 40

intrigeri wrote:
> To test this, we need to start builds at a time that will trigger the failure condition:
>
> * baseline i.e. reproduce the problem: start a build of the devel branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)

As expected, https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_devel/427/artifact/build-artifacts/diffoscope.iso.html could not reproduce its upstream build.

> * validate the fix: start a build of this branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)

https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/1/ did reproduce its upstream build and the build log show that Fixing 'latest' APT snapshots serials to: '{"torproject":"2018091802","debian":"2018100901","debian-security":"2018120203"}' which matches the upstream build’s https://jenkins.tails.boum.org/job/build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/1/artifact/build-artifacts/tails-build-env.list/*view*/. Same for builds 2 of these two jobs.

And now I’ve made devel the base branch for this topic branch and will do this test again before submitting for review.

#20 Updated by intrigeri 2018-12-03 18:48:43

  • Assignee changed from intrigeri to anonym
  • % Done changed from 40 to 50

intrigeri wrote:
> And now I’ve made devel the base branch for this topic branch and will do this test again before submitting for review.

https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/6/ (started after the APT snapshots update) successfully reproduced https://jenkins.tails.boum.org/job/build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/6/ (that resolved the latest snapshots before the update).

Candidate for 3.12 i.e. merging into devel.

#21 Updated by intrigeri 2019-01-04 15:20:56

  • Assignee deleted (anonym)

#22 Updated by lamby 2019-01-06 17:35:51

  • Assignee set to lamby

#23 Updated by lamby 2019-01-06 21:52:06

  • Checked out feature branch
  • Built with rake build without environment set (see first buildlog)
  • Built and booted fine, etc.
  • Ran:
$ python3 -c "import json, yaml; data = yaml.load(open('tails-amd64-bugfix_15107-specify-APT-snapshots-serial-to-build-with-3.12-20190106T1750Z-ece9232b9d.build-manifest').read()); print('APT_SNAPSHOTS_SERIALS=\'{}\''.format(json.dumps({x: y['reference'] for x, y in data['origin_references'].items()}).replace(' ', '')))"

.. to get:

APT_SNAPSHOTS_SERIALS='{"debian":"2019010603","debian-security":"2019010603","torproject":"2019010603"}'
  • Then ran a build with:
$ APT_SNAPSHOTS_SERIALS='{"debian":"2019010603","debian-security":"2019010603","torproject":"2019010603"}' rake build
  • This built (see attached second buildlog)
  • I did not see anything in the build log regarding - I was expecting some message that is echoed. I did see that the serials did not change between these builds but that might be timing… Leaving as “Ready for QA

Questions:

  • Am I missing something? I think I am…!
  • I did get a tails-build-env.list file the first build … should this not have the branch name in it?

#24 Updated by intrigeri 2019-01-07 09:20:04

  • Assignee changed from intrigeri to lamby

Your test methodology looks good to me.

> * I did not see anything in the build log regarding - I was expecting some message that is echoed.

tl;dr: look at the console output (instead of the .buildlog file) and you should see that message.

Indeed, echo "Fixing 'latest' APT snapshots serials to: '${APT_SNAPSHOTS_SERIALS}'." should print something. This is the case on https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/consoleFull. But as you’ve noticed, that echo’ed line does not appear in the .buildlog file (https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/artifact/build-artifacts/2/tails-amd64-bugfix_15107-specify-APT-snapshots-serial-to-build-with-3.12-20190114T1015Z-ece9232b9d+devel@e0ec60c0e2.buildlog/*view*/). That’s because the .buildlog file only captures the output of lb build; for details, git grep BUILD_LOG -- auto.

> * I did get a tails-build-env.list file the first build … should this not have the branch name in it?

Why do you think it should contain the branch name? (Genuine, non-rhetorical question :)

Our Jenkins machinery appends info to this file, when trying to reproduce a 1st build, as a way to forward to the 2nd build the info it needs to build from the same branch/commit/etc., e.g. https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/artifact/build-artifacts/1/tails-build-env.list/*view*/. Implementation: https://git-tails.immerda.ch/jenkins-jobs/tree/macros/build_Tails_ISO.yaml#n27 calls https://git.tails.boum.org/puppet-tails/tree/files/jenkins/slaves/isobuilders/collect_build_environment. But we don’t do that outside of Jenkins. I think that’s because so far, we did not try to actively support the “try to reproduce locally a non-release build” usecase, so all the logic and plumbing is implemented only for Jenkins. The only exception IIRC is the APT snapshots logic, which is fully managed in tails.git; that’s why I’ve implemented my branch here in a way that makes the new feature usable both on Jenkins and outside of it. From a design and developer experience PoV, personally I’d like to move towards supporting “try to reproduce this” outside of Jenkins, with something like https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/artifact/build-artifacts/1/tails-build-env.list/*view*/ specifying what this is (similarly to a buildinfo file I guess). It would make it easier to work on this part of Tails locally without fiddling with our production Jenkins setup, and more generally it would lower our coupling with Jenkins (which makes it easier to use other CI systems in the future if we want). Some day, maybe :)

#25 Updated by lamby 2019-01-07 09:25:31

  • Assignee changed from lamby to intrigeri

intrigeri wrote:

> > * I did get a tails-build-env.list file the first build … should this not have the branch name in it?
>
> Why do you think it should contain the branch name?

Simply to match the other generated files. But, as you imply, the entire point is to be across builds or otherwise separate from each specific one.

> .buildlog file only captures the output of lb build; for details, git grep BUILD_LOG -- auto.

Ah, TIL!

#26 Updated by intrigeri 2019-01-07 10:14:52

FTR when invalid JSON is passed, the build fails as expected:

+ [ -n {"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",} ]
+ echo Fixing 'latest' APT snapshots serials to: '{"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}'.
Fixing 'latest' APT snapshots serials to: '{"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}'.
+ apt-snapshots-serials prepare-build {"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}
/usr/lib/ruby/2.3.0/json/common.rb:156:in `parse': 822: unexpected token at '{"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}' (JSON::ParserError)
    from /usr/lib/ruby/2.3.0/json/common.rb:156:in `parse'
    from /usr/lib/ruby/2.3.0/json/common.rb:335:in `load'
    from /tmp/tails-build.K6g1xPRu/auto/scripts/apt-snapshots-serials-load-json:16:in `<main>'

#27 Updated by intrigeri 2019-01-07 10:21:39

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

Applied in changeset commit:tails|c369ace306dc3f4588313e91b73a6eefa09a80d6.

#28 Updated by intrigeri 2019-01-07 10:22:07

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

#29 Updated by anonym 2019-01-30 11:48:15

  • Status changed from Fix committed to Resolved