Feature #12608

Analyze what's still not reproducible on current testing branch

Added by intrigeri 2017-05-27 07:18:44 . Updated 2017-08-04 10:17:51 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Target version:
Start date:
2017-05-27
Due date:
% Done:

100%

Feature Branch:
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:
289

Description

https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/3/artifact/build-artifacts/diffoscope.html seems to show differences that are not tracked by any ticket yet, and that will need to be addressed if we want 3.0 to build reproducibly. This analysis is needed for us to sanity check our goal of a reproducible 3.0, hence the Elevated priority.


Files


Subtasks


History

#1 Updated by intrigeri 2017-05-31 09:48:54

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

I did a little bit of that => Bug #12619, Bug #12620.

#2 Updated by anonym 2017-06-01 09:47:35

  • % Done changed from 10 to 30

I’m looking at: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/13/artifact/build-artifacts/diffoscope.html

It looks like we have unmerged fixes for all the issues I see there, except that some of the expected differences make live/filesystem.squashfs about 20 KiB different in size (nothing unexpected so far). But then I see that the value in the column after the date has changed for these files:

live/initrd.img
utils/linux/syslinux
utils/mbr/mbr.bin
utils/win32/syslinux.exe


Assuming that value is the extent/sector offset inside the ISO filesystem, can it be explained by the different size of live/filesystem.squashfs (i.e. it “bled over” into a new extent/sector, shifting the extent/sector these files end up in)?

#3 Updated by intrigeri 2017-06-01 10:04:06

> But then I see that the value in the column after the date has changed for these files:
> […]
> Assuming that value is the extent/sector offset inside the ISO filesystem, can it be explained by the different size of live/filesystem.squashfs (i.e. it “bled over” into a new extent/sector, shifting the extent/sector these files end up in)?

This hypothesis totally makes sense to me, especially given the constant delta (10) between these values, and the fact no file under 2192 has this problem. Once all the other pending fixes are merged into testing and in turn into feature/5630-deterministic-builds, https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/ (that should produce a deterministic SquashFS once we’re there) should allow you to confirm it.

(And now I wonder whether isoinfo displays more precise time info than a mere date; perhaps it simply doesn’t; perhaps the ISO9660 file format has no space for that. Whatever :)
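
For reference, the arithmetic behind that constant delta also checks out (a minimal sanity check, assuming the standard 2048-byte ISO9660 sector size and taking the ~20 KiB figure at face value):

$ echo $(( 20 * 1024 / 2048 ))
10

i.e. ~20 KiB of extra SquashFS data spills over into exactly 10 more sectors, matching the shift we see for the files listed above.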

#4 Updated by anonym 2017-06-01 10:18:17

intrigeri wrote:
> > But then I see that the value in the column after the date has changed for these files:
> > […]
> > Assuming that value is the extent/sector offset inside the ISO filesystem, can it be explained by the different size of live/filesystem.squashfs (i.e. it “bled over” into a new extent/sector, shifting the extent/sector these files end up in)?
>
> This hypothesis totally makes sense to me, especially given the constant delta (10) between these values, and the fact no file under 2192 has this problem.

Thanks! Now I feel confident to assume my hypothesis is true when estimating the overall status of the reproducibility effort. IOW, the status is: currently cloudy, but the forecast predicts sun and blue skies! :)

#5 Updated by anonym 2017-06-01 16:43:15

As of build #15 from commit:49dd3889caa1664c7511eeaf250852168a898b56 (where many of my recent reproducibility fixes were merged) only the fontconfig issue + the extents shift (Feature #12608#note-2) remain!

#6 Updated by anonym 2017-06-03 15:39:32

I am predicting that build #24 will be reproducible! :)

#7 Updated by intrigeri 2017-06-04 10:50:39

anonym wrote:
> I am predicting that build #24 will be reproducible! :)

Nope, due to a bug on our infra. Hopefully fixed so https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/27/ should be reproducible.

Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.

#8 Updated by anonym 2017-06-04 15:05:26

  • Assignee changed from anonym to intrigeri
  • QA Check set to Info Needed

intrigeri wrote:
> anonym wrote:
> > I am predicting that build #24 will be reproducible! :)
>
> Nope, due to a bug on our infra. Hopefully fixed so https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/27/ should be reproducible.

Yup! #27 and #28 were successful!

> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.

If you want to, sure, but I’d much rather see us try reproducing the exact same images when building from the same commit on Jenkins, my old laptop, my new laptop, your laptop, sib, bertagaz’s system, and so on. That will require a bit of coordination, namely that no package is uploaded to any APT overlay used by the testing branch (so I’m not bothering now, since I’m expecting you and Alan to upload new Greeter packages to bugfix-greeter-fixes-for-3.0). What do you think?

#9 Updated by intrigeri 2017-06-04 16:19:44

  • Assignee changed from intrigeri to anonym

>> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.

> If you want to, sure,

I think we should do that anyway (I mean, every time we build a second time, we should introduce as many variations as we can easily do), but not necessarily as a blocker for this ticket nor for 3.0 ⇒ ticket? I’ll check with bertagaz which of us does it.

> but I’d much rather see us try reproducing the exact same images when building from the same commit on Jenkins, my old laptop, my new laptop, your laptop, sib, bertagaz system, and so on.

Excellent idea.

> That will require a bit of coordination, namely that no package is uploaded to any APT overlay used by the testing branch […].

Indeed. IIRC we disable the stretch-security APT sources on the testing branch currently, so variations in the latest snapshot for debian-security should not matter. We’ll see how it goes in practice. But I am not aware of any plan to upload new packages to existing APT overlays, let alone to those that are already enabled on the testing branch: doing so implies that one applies a change straight to testing before one has a chance to see how it fares on Jenkins. So at least that part shouldn’t matter (I hope).

Do you want to coordinate this? If my analysis above is correct, all it takes is to pick a commit (as close as possible to the time we’ll be testing, so we test something as close as possible to 3.0 final), ask people to build, and gather the SHAAA.
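
A rough sketch of what each participant would run (assuming the usual Vagrant-based build; <commit> is a placeholder for whatever we agree on):

$ git fetch && git checkout <commit>
$ rake build
$ sha256sum tails-amd64-*.iso

and then everyone reports the resulting checksum.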

#10 Updated by anonym 2017-06-07 16:22:35

  • Assignee changed from anonym to intrigeri

Hi! I know that all of you have Tails build environments and/or have communicated interest in Tails’ build system. I’m adding you all as watchers of this ticket as an invitation to an event tomorrow (sorry for the short notice!):

Please join us on the tails-dev@conference.riseup.net XMPP channel during the afternoon (12:00 to 18:00, CEST) tomorrow, 2017-06-08, and participate in the first Tails reproducible builds (remote-)party!

It’s the first and perhaps last of these, so you are among an exclusive bunch of people. :) In short, we’ll all build Tails from the same state, hoping that we all produce bit-by-bit identical images despite all the differences in our hardware and host OSes. So, dust off your build machines (and possibly prepare with something like git fetch && git checkout testing && git reset --hard origin/testing && rake basebox:create to save some time during the party) and make sure you don’t miss this (potentially) once-in-a-lifetime opportunity! :)

intrigeri wrote:
> >> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
>
> > If you want to, sure,
>
> I think we should do that anyway (I mean, every time we build a second time, we should introduce as many variations as we can easily do), but not necessarily as a blocker for this ticket nor for 3.0 ⇒ ticket? I’ll check with bertagaz which of us does it.

Ok! Please create it, since you know better what you are talking about (I never tried these options) and thus will capture the ticket better than me! :)

> > That will require a bit of coordination, namely that no package is uploaded to any APT overlay used by the testing branch […].
>
> Indeed. IIRC we disable the stretch-security APT sources on the testing branch currently, so variations in the latest snapshot for debian-security should not matter. We’ll see how it goes in practice. But I am not aware of any plan to upload new packages to existing APT overlays, let alone to those that are already enabled on the testing branch: doing so implies that one applies a change straight to testing before one has a chance to see how it fares on Jenkins. So at least that part shouldn’t matter (I hope).

Ack.

Hm. Perhaps we should go all the way and branch off testing, update the changelog + tag a fake release + prep a custom APT overlay + create the tagged APT snapshot (that we later remove manually, if we care enough)? That should be pretty damn close to what 3.0 will be like! :) And people who cannot attend the party can try at their own leisure. :) What do you think? It should only be a few minutes of real work (and if there are problems I’ll just abort and we’ll fall back to building testing from the same commit).

#11 Updated by intrigeri 2017-06-07 16:51:50

anonym wrote:
> intrigeri wrote:
> > >> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
> >
> > > If you want to, sure,
> >
> > I think we should do that anyway (I mean, every time we build a second time, we should introduce as many variations as we can easily do), but not necessarily as a blocker for this ticket nor for 3.0 ⇒ ticket? I’ll check with bertagaz which of us does it.
>
> Ok! Please create it, since you know better what you are talking about (I never tried these options) and thus will capture the ticket better than me! :)

Done: Feature #12654.

#12 Updated by intrigeri 2017-06-07 16:52:25

  • Assignee changed from intrigeri to anonym
  • QA Check deleted (Info Needed)

#13 Updated by anonym 2017-06-08 21:57:00

  • % Done changed from 30 to 50

Post-party analysis

The first Tails reproducible builds (remote-)party was a success, both in participation (six persons!) and in finding more differences between builds! :) Below I’ll post some initial results and investigations; I’ll follow up in a later comment (tomorrow) on how we should deal with these results vs our plans for 3.0, and on what the next steps are.

The baseline

SHAAAAAA: 6954ac327cb6909fd0c5c9aad8515977ccf774a25e04ceb6854e6a598f964f7b

(Note that we have an insider joke: we say “SHAAAAAAA” (the number of A’s varies) as a reference to a long email exchange with a user who was very persistent in demanding that we give him the SHA-256 checksum for some release. SHAAAA!)

The party started really well, with us getting the same SHAAAA no matter what we tried. The following setups produced this “expected” SHAAAAAAAA:

  • anonym’s new laptop (including when playing with dateoffset, cpumodel, machinetype)
  • anonym’s old laptop
  • intrigeri’s laptop
  • sib
  • some of lizard’s isobuilders
  • drwhax
  • segfault

nodens

SHAAA: 00771c5bd98fbdab4ddc84a6fcca3ae988bfd08fcf6fd5e309631c27df5ecff7

It’s worth noting that nodens built twice, getting the above SHAAAAAAAAAAAAA both times.

The difference is limited to live/initrd.img, but the diff is huge. diffoscope says “No file format specific differences found inside, yet data differs (ASCII cpio archive (SVR4 with no CRC))” and cannot give us anything more meaningful than a gigantic binary diff.
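
One way to dig deeper than diffoscope’s binary diff would be to unpack both cpio archives and compare the member lists and individual files; a sketch (file names are placeholders, and if the archives turn out to be compressed they need to be decompressed first):

$ mkdir good nodens
$ ( cd good   && cpio -idm --no-absolute-filenames < ../good-initrd.img )
$ ( cd nodens && cpio -idm --no-absolute-filenames < ../nodens-initrd.img )
$ diff -r good nodens | head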

muri

SHAAAAAAAAAAAAAAA: d6e4ab08c076effbc0eb30386f41e88880334bc873b9f99cd87940785071a59c

The differences are inside live/filesystem.squashfs:

/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders.cache
/usr/lib/x86_64-linux-gnu/gio/modules/giomodule.cache
/usr/lib/x86_64-linux-gnu/gtk-2.0/2.10.0/immodules.cache
/usr/lib/x86_64-linux-gnu/gtk-3.0/3.0.0/immodules.cache
/usr/share/applications/mimeinfo.cache
/var/cache/cracklib/src-dicts
/var/lib/gconf/defaults/%gconf-tree-*.xml   (88 files for various locales)

All of them seem to be ordering issues (e.g. lists of files/stuff, XML element positions).
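
A quick way to test the ordering theory for the text-based ones, once the same file has been extracted from both images (the path is just an example):

$ diff <(sort good/usr/share/applications/mimeinfo.cache) \
       <(sort muri/usr/share/applications/mimeinfo.cache)

If that comes back empty while a plain diff does not, the content is identical and only the ordering differs.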

arnaud

SHAAAAAA: acbe13f4e88b7e2d9622d75a1e834cf81a38e30992559756d9890d81e6f6c14a

It’s worth noting that arnaud did the only disk build.

Differences (not just metadata; the file sizes differ) in:

live/filesystem.squashfs
live/initrd.img
live/vmlinuz
utils/linux/syslinux
utils/mbr/mbr.bin
utils/win32/syslinux.exe

The difference in live/initrd.img could be the same as for nodens.

The squashfs has tons of metadata differences for directories (the number listed before the date by diffoscope (inode?)). It also has the same content differences as muri, but in addition also:

/lib/modules/4.9.0-3-amd64/modules.alias
/lib/modules/4.9.0-3-amd64/modules.alias.bin
/lib/modules/4.9.0-3-amd64/modules.dep
/lib/modules/4.9.0-3-amd64/modules.dep.bin

Ignoring the .bin files, diffoscope shows something interesting:

├── /lib/modules/4.9.0-3-amd64/modules.alias
│ │ @@ -19383,8 +19383,7 @@
│ │  alias vport-type-3 vport_gre
│ │  alias net-pf-40 vmw_vsock_vmci_transport
│ │  alias vmware_vsock vmw_vsock_vmci_transport
│ │  alias virtio:d00000013v* vmw_vsock_virtio_transport
│ │  alias fs-vboxsf vboxsf
│ │  alias pci:v000080EEd0000BEEFsv*sd*bc*sc*i* vboxvideo
│ │  alias pci:v000080EEd0000CAFEsv00000000sd00000000bc*sc*i* vboxguest
│ │ -alias fs-aufs aufs
[...]
── /lib/modules/4.9.0-3-amd64/modules.dep
│ │ @@ -3395,8 +3395,7 @@
│ │  kernel/lib/mpi/mpi.ko:
│ │  kernel/lib/asn1_decoder.ko:
│ │  kernel/lib/oid_registry.ko:
│ │  kernel/virt/lib/irqbypass.ko:
│ │  updates/vboxsf.ko: updates/vboxguest.ko
│ │  updates/vboxvideo.ko: updates/vboxguest.ko kernel/drivers/gpu/drm/ttm/ttm.ko kernel/drivers/gpu/drm/drm_kms_helper.ko kernel/drivers/gpu/drm/drm.ko
│ │  updates/vboxguest.ko:
│ │ -kernel/fs/aufs/aufs.ko:

I wonder: is this what you’d expect if depmod wasn’t run after the aufs module was built?
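
If that hypothesis is right, re-running depmod inside the affected chroot should bring the aufs entries back; something like this (a guess only; the kernel version is taken from the paths above):

depmod -a 4.9.0-3-amd64
grep aufs /lib/modules/4.9.0-3-amd64/modules.dep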

Some random thoughts

  • SHAAAAAA
  • Hypothesis: we have six distinct problems:
    • live/initrd.img (nodens, arnaud)
    • live/vmlinuz (arnaud)
    • utils/ (arnaud)
    • metadata for directories inside the squashfs (arnaud)
    • *.cache, src-dicts, %gconf-tree-*.xml (muri, arnaud)
    • /lib/modules/4.9.0-3-amd64/modules.* (arnaud)
  • We should pairwise compare *.cache, src-dicts, %gconf-tree-*.xml from muri and arnaud’s images; perhaps they are identical between both images? (See the sketch after this list.)
  • We should figure out what information about the environment we should gather from all participants to help us narrow down the causes of these differences.
  • SHAAAAAAAAAAAA
  • Is our usage of caching safe? I tried rebuilding with a completely empty “vmproxy” APT cache and I still got the good SHAAAAAAAAA. This makes me less suspicious that it’s an issue about caching. It seems to falsify my hypothesis about me, intrigeri, lizard, sib having cached files days/weeks ago that for some reason have changed since then (scary!) and thus only ended up in the builds of those that built for the first time (in a while) today. I’m still not convinced this hypothesis is completely dead. So:
    • I’m gonna do a rebuild after rebuilding the base box.
    • I’m gonna do a rebuild without any caching at all.
  • It would be interesting to compare base boxes, at least those used by muri, arnaud and nodens vs one used to build one of the “baseline” images.
  • SHAAAAAAAAAAAAAAAAAAA
  • Assuming we cannot fix all these issues, what does this mean about our plan to call Tails 3.0 reproducible?
  • Can we realistically fix all this before 3.0? Related: should we call in the heavies (i.e. lamby)?
  • Where can I host all these images? I’d like to host them on lizard, so those of us with poor Internet connections can work with them remotely. Preferably while still being accessible to lamby, should that be needed.
  • SHA?
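
Regarding the pairwise comparison mentioned above, a possible sketch, after extracting live/filesystem.squashfs from each ISO (image and output names are illustrative):

$ unsquashfs -d muri-fs muri-filesystem.squashfs usr/share/applications/mimeinfo.cache
$ unsquashfs -d arnaud-fs arnaud-filesystem.squashfs usr/share/applications/mimeinfo.cache
$ diff muri-fs/usr/share/applications/mimeinfo.cache \
       arnaud-fs/usr/share/applications/mimeinfo.cache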

#14 Updated by segfault 2017-06-08 22:11:52

My build just finished and I also got the expected SHA :)

$ sha256sum tails-amd64-3.0~rc2.iso
6954ac327cb6909fd0c5c9aad8515977ccf774a25e04ceb6854e6a598f964f7b  tails-amd64-3.0~rc2.iso

#15 Updated by anonym 2017-06-08 22:33:29

anonym wrote:
> I’m gonna do a rebuild after rebuilding the base box.

Done! Got the good SHAAAAAAA!

#16 Updated by arnaud 2017-06-09 08:50:17

About the basebox

Here are some quick facts about the basebox, which I built just before building the Tails ISO.

$ cd .vagrant.d/boxes/tails-builder-amd64-jessie-20170529-60c74058de/0/libvirt/
$ cat metadata.json 
{
    "provider": "libvirt",
    "format": "qcow2",
    "virtual_size": 20
}
$ sha256sum box.img 
cf67cec1eaad12ceceb62bfcaec3b8bf0b55b3a844c965b31d2e2397f12f5032  box.img

#17 Updated by anonym 2017-06-09 09:51:41

arnaud wrote:
> About the basebox
>
> Here’s some quick facts about the basebox, that I build just before building the tails iso.

Thanks, arnaud, but note that the base box generation is not (yet) reproducible, so that hash is not useful. Sorry for not making this clear!

Most likely I’ll need the whole disk image. Actually, if you can upload your .img using the same service as yesterday, and pass me the link via private email, then I can have a casual look if there’s anything obviously wrong. Ok?

#18 Updated by anonym 2017-06-09 10:11:54

Attached you’ll find the diffoscope reports for arnaud and muri’s ISO images against the “good” one. I skipped nodens’ report since it’s only about live/initrd.img, and the report for that looks identical (modulo the actual bytes) to what you see for arnaud.

Since arnaud’s build seems to exhibit all six problems from my hypothesis, I guess we can focus on examining it, and only look at the others for more data or to check whether similar-looking differences actually are caused by the same source of non-determinism. You can get arnaud’s ISO image and the good one via the attached .torrents (they’re not seeded yet, but should be within the hour). I can also onionshare them upon request.

#19 Updated by arnaud 2017-06-09 10:26:48

@anonym OK sure I send you the link.

Just for my understanding…

As far as I understand, the base box generation starts with tools that are present on the host machine (aka vmdebootstrap). So these tools will be different on different configs. But later on, a base image is downloaded from the Debian servers, and this image is the same for everyone, right? Then we probably enter this environment (chroot or VM, I don’t know) and build upon this base image, downloading packages and configuring, until we get the tails builder image. These steps should also be identical for everyone, since they happen within the isolated environment.

So, where is it that the process is not reproducible?

#20 Updated by anonym 2017-06-09 10:50:46

arnaud wrote:
> @anonym OK sure I send you the link.
>
> Just for my understanding…
>
> As far as I understand, the base box generation starts with tools that are present on the host machine (aka vmdebootstrap). So these tools will be different on different configs. But later on, a base image is downloaded from the Debian servers, and this image is the same for everyone, right? Then we probably enter this environment (chroot or VM, I don’t know) and build upon this base image, downloading packages and configuring, until we get the tails builder image. These steps should also be identical for everyone, since they happen within the isolated environment.
>
> So, where is it that the process is not reproducible?

For instance, in vagrant/definitions/tails-builder/postinstall.sh we do:

echo "$(date)" > /var/lib/vagrant_box_build_time


which will be unique each time. That’s just one instance that breaks bit-by-bit reproducibility, and I’m sure there are many more issues (just think of file metadata like mtime).
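
For illustration only, one common way to make such a stamp deterministic (assuming the build exports SOURCE_DATE_EPOCH, which ours may not) would be to derive the timestamp from that variable rather than from the wall clock:

echo "$(date --utc --date="@${SOURCE_DATE_EPOCH}" '+%F %T')" > /var/lib/vagrant_box_build_time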

#21 Updated by anonym 2017-06-11 12:24:22

anonym wrote:
> * It would be interesting to compare base boxes, at least those used by muri, arnaud and nodens vs one used to build one of the “baseline” images.

I’ve compared my base box to arnaud’s. All that differs are UUIDs, timestamps and similar, so they should be effectively identical.
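
For the record, roughly how such a comparison can be done (file names are placeholders; depending on diffoscope’s installed helpers one may first need to convert the qcow2 images to raw with qemu-img):

$ qemu-img convert -O raw my-box.img my-box.raw
$ qemu-img convert -O raw arnaud-box.img arnaud-box.raw
$ diffoscope --html basebox-diff.html my-box.raw arnaud-box.raw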

#22 Updated by anonym 2017-06-12 16:07:22

  • Target version changed from Tails_3.0 to Tails_3.1

#23 Updated by anonym 2017-08-04 10:17:51

  • Status changed from In Progress to Resolved
  • Assignee deleted (anonym)
  • % Done changed from 50 to 100

I fail to see any further purpose for this ticket.