Feature #12608
Analyze what's still not reproducible on current testing branch
100%
Description
https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/3/artifact/build-artifacts/diffoscope.html seems to show differences that are not tracked by any ticket yet, and that will need to be addressed if we want 3.0 to build reproducibly. This analysis is needed for us to sanity check our goal of a reproducible 3.0, hence the Elevated priority.
Files
Subtasks
History
#1 Updated by intrigeri 2017-05-31 09:48:54
- Status changed from Confirmed to In Progress
- % Done changed from 0 to 10
I did a little bit of that => Bug #12619, Bug #12620.
#2 Updated by anonym 2017-06-01 09:47:35
- % Done changed from 10 to 30
I’m looking at: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/13/artifact/build-artifacts/diffoscope.html
It looks like we have unmerged fixes for all issues I see there, except the following: some of the expected differences made live/filesystem.squashfs about 20 KiB different in size (nothing unexpected so far). But then I see that the value in the column after the date has changed for these files:
live/initrd.img
utils/linux/syslinux
utils/mbr/mbr.bin
utils/win32/syslinux.exe
Assuming that value is the extent/sector offset inside the ISO filesystem, can it be explained by the different size of live/filesystem.squashfs (i.e. it “bled over” into a new extent/sector, shifting the extent/sector these files end up in)?
#3 Updated by intrigeri 2017-06-01 10:04:06
> But then I see that the value in the column after the date has changed for these files:
> […]
> Assuming that value is the extent/sector offset inside the ISO filesystem, can it be explained by the different size of live/filesystem.squashfs (i.e. it “bled over” into a new extent/sector, shifting the extent/sector these files end up in)?
This hypothesis totally makes sense to me, especially given the constant delta (10) between these values, and the fact no file under 2192 has this problem. Once all the other pending fixes are merged into testing and in turn into feature/5630-deterministic-builds, https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/ (that should produce a deterministic SquashFS once we’re there) should allow you to confirm it.
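In the meantime, something like this should let one eyeball the extent values directly (an untested sketch; the ISO filenames are placeholders):
# List every file with its starting extent/sector, then compare the two listings:
$ isoinfo -l -i tails-a.iso > a.lst
$ isoinfo -l -i tails-b.iso > b.lst
$ diff a.lst b.lst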
(And now I wonder whether isoinfo displays more precise time info than a mere date; perhaps it simply doesn’t; perhaps the ISO9660 file format has no space for that. Whatever :)
#4 Updated by anonym 2017-06-01 10:18:17
intrigeri wrote:
> > But then I see that the value in the column after the date has changed for these files:
> > […]
> > Assuming that value is the extent/sector offset inside the ISO filesystem, can it be explained by the different size of live/filesystem.squashfs (i.e. it “bled over” into a new extent/sector, shifting the extent/sector these files end up in)?
>
> This hypothesis totally makes sense to me, especially given the constant delta (10) between these values, and the fact no file under 2192 has this problem.
Thanks! Now I feel confident to assume my hypothesis is true when estimating the overall status of the reproducibility effort. IOW, the status is: currently cloudy, but the forecast predicts sun and blue skies! :)
#5 Updated by anonym 2017-06-01 16:43:15
As of build #15 from commit:49dd3889caa1664c7511eeaf250852168a898b56 (where many of my recent reproducibility fixes were merged) only the fontconfig issue + the extents shift (Feature #12608#note-2) remain!
#6 Updated by anonym 2017-06-03 15:39:32
I am predicting that build #24 will be reproducible! :)
#7 Updated by intrigeri 2017-06-04 10:50:39
anonym wrote:
> I am predicting that build #24 will be reproducible! :)
Nope, due to a bug on our infra. Hopefully fixed so https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/27/ should be reproducible.
Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
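For the record, a hypothetical invocation to introduce such variations in a local build (the option names are the ones mentioned in note #13 below; the exact mechanism, syntax and values are assumptions, not tested):
# Vary the build date offset and CPU model for the second build (placeholder values):
$ TAILS_BUILD_OPTIONS="dateoffset=-6 cpumodel=qemu64" rake build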
#8 Updated by anonym 2017-06-04 15:05:26
- Assignee changed from anonym to intrigeri
- QA Check set to Info Needed
intrigeri wrote:
> anonym wrote:
> > I am predicting that build #24 will be reproducible! :)
>
> Nope, due to a bug on our infra. Hopefully fixed so https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_testing/27/ should be reproducible.
Yup! #27 and #28 were successful!
> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
If you want to, sure, but I’d much rather see us try reproducing the exact same images when building from the same commit on Jenkins, my old laptop, my new laptop, your laptop, sib, bertagaz’ system, and so on. That will require a bit of coordination, namely that no package is uploaded to any APT overlay used by the testing branch (so I’m not bothering now, since I’m expecting you and Alan to upload new Greeter packages to bugfix-greeter-fixes-for-3.0). What do you think?
#9 Updated by intrigeri 2017-06-04 16:19:44
- Assignee changed from intrigeri to anonym
>> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
> If you want to, sure,
I think we should do that anyway (I mean, every time we build a second time, we should introduce as many variations as we can easily do), but not necessarily as a blocker for this ticket nor for 3.0 ⇒ ticket? I’ll check with bertagaz which of us does it.
> but I’d much rather see us try reproducing the exact same images when building from the same commit on Jenkins, my old laptop, my new laptop, your laptop, sib, bertagaz system, and so on.
Excellent idea.
> That will require a bit of coordination, namely that no package is uploaded to any APT overlay used by the testing branch […].
Indeed. IIRC we disable the stretch-security APT sources on the testing branch currently, so variations in the latest snapshot for debian-security should not matter. We’ll see how it goes in practice. But I am not aware of any plan to upload new packages to existing APT overlays, let alone to those that are already enabled on the testing branch: doing so implies that one applies a change straight to testing before one has a chance to see how it fares on Jenkins. So at least that part shouldn’t matter (I hope).
Do you want to coordinate this? If my analysis above is correct, all it takes is to pick a commit (as close as possible to the time we’ll be testing, so we test something as close as possible to 3.0 final), ask people to build, and gather the SHAAA.
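Concretely, each participant would do something like the following (a sketch, assuming the usual Vagrant-based build; the commit is a placeholder for whatever we pick together):
$ git fetch origin
$ git checkout <agreed commit>
$ rake build
$ sha256sum tails-amd64-*.iso    # then report this checksum back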
#10 Updated by anonym 2017-06-07 16:22:35
- Assignee changed from anonym to intrigeri
Hi! I know that all of you have Tails build environments and/or have communicated interest in Tails’ build system. I’m adding you all as watchers of this ticket as an invitation to an event tomorrow (sorry for the short notice!):
Please join us on the tails-dev@conference.riseup.net XMPP channel during the afternoon (12:00 to 18:00, CEST) tomorrow, 2017-06-08, and participate in the first Tails reproducible builds (remote-)party!
It’s the first and perhaps last of these, so you are among an exclusive bunch of people. :) In short, we’ll all build Tails from the same state, hoping that all of us produce bit-by-bit identical images despite all the differences in our hardware and host OSes. So, dust off your build machines (and possibly prepare with something like git fetch && git checkout testing && git reset --hard origin/testing && rake basebox:create to save some time during the party) and make sure you don’t miss this (potentially) once-in-a-lifetime opportunity! :)
intrigeri wrote:
> >> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
>
> > If you want to, sure,
>
> I think we should do that anyway (I mean, every time we build a second time, we should introduce as many variations as we can easily do), but not necessarily as a blocker for this ticket nor for 3.0 ⇒ ticket? I’ll check with bertagaz which of us does it.
Ok! Please create it, since you know better what you are talking about (I never tried these options) and thus will capture the ticket better than me! :)
> > That will require a bit of coordination, namely that no package is uploaded to any APT overlay used by the testing branch […].
>
> Indeed. IIRC we disable the stretch-security APT sources on the testing branch currently, so variations in the latest snapshot for debian-security should not matter. We’ll see how it goes in practice. But I am not aware of any plan to upload new packages to existing APT overlays, let alone to those that are already enabled on the testing branch: doing so implies that one applies a change straight to testing before one has a chance to see how it fares on Jenkins. So at least that part shouldn’t matter (I hope).
Ack.
Hm. Perhaps we should go all the way and branch off testing, update the changelog + tag a fake release + prep a custom APT overlay + create the tagged APT snapshot (that we later remove manually, if we care enough)? That should be pretty damn close to what 3.0 will be like! :) And people who cannot attend the party can try at their own leisure. :) What do you think? It should only be a few minutes of real work (and if there are problems I’ll just abort and we’ll fall back to building testing from the same commit).
#11 Updated by intrigeri 2017-06-07 16:51:50
anonym wrote:
> intrigeri wrote:
> > >> Is that enough to close this ticket, or do we want to test reproducibility with different CPUs and/or more build time difference? I think we could quite easily test that on Jenkins thanks to the build options I’ve introduced.
> >
> > > If you want to, sure,
> >
> > I think we should do that anyway (I mean, every time we build a second time, we should introduce as many variations as we can easily do), but not necessarily as a blocker for this ticket nor for 3.0 ⇒ ticket? I’ll check with bertagaz which of us does it.
>
> Ok! Please create it, since you know better what you are talking about (I never tried these options) and thus will capture the ticket better than me! :)
Done: Feature #12654.
#12 Updated by intrigeri 2017-06-07 16:52:25
- Assignee changed from intrigeri to anonym
- QA Check deleted (Info Needed)
#13 Updated by anonym 2017-06-08 21:57:00
- % Done changed from 30 to 50
Post-party analysis
The first Tails reproducible builds (remote-)party was a success, both in terms of participation (six people!) and in finding more differences between builds! :) Below I’ll post some initial results and investigations, but I’ll follow up in a later comment (tomorrow) with how we should deal with the results vs our plans for 3.0, and what the next steps are.
The baseline
SHAAAAAA: 6954ac327cb6909fd0c5c9aad8515977ccf774a25e04ceb6854e6a598f964f7b
(Note that we have an insider joke: we say “SHAAAAAAA” (the number of A:s vary) as a reference to a long email exchange with a user that was very persistent in demanding that we give him the SHA-256 checksum for some release. SHAAAA!)
The party started really well, with us getting the same SHAAAA no matter what we tried. The following setups produced this “expected” SHAAAAAAAA:
- anonym’s new laptop (including when playing with dateoffset, cpumodel, machinetype)
- anonym’s old laptop
- intrigeri’s laptop
- sib
- some of lizard’s isobuilders
- drwhax
- segfault
nodens
SHAAA: 00771c5bd98fbdab4ddc84a6fcca3ae988bfd08fcf6fd5e309631c27df5ecff7
It’s worth noting that nodens built twice, getting the above SHAAAAAAAAAAAAA both times.
The difference is limited to live/initrd.img, but the diff is huge. diffoscope says “No file format specific differences found inside, yet data differs (ASCII cpio archive (SVR4 with no CRC))” and cannot give us anything more meaningful than a gigantic binary diff.
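A rough way to dig deeper, assuming the initrd really is a plain SVR4 cpio archive as diffoscope suggests (prepend zcat/xzcat if it turns out to be compressed; paths are placeholders):
$ mkdir /tmp/initrd-good && cd /tmp/initrd-good
$ cpio -idmv < /path/to/good/live/initrd.img
# repeat for the other initrd, then: diff -r /tmp/initrd-good /tmp/initrd-nodens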
muri
SHAAAAAAAAAAAAAAA: d6e4ab08c076effbc0eb30386f41e88880334bc873b9f99cd87940785071a59c
The differences are inside live/filesystem.squashfs:
/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders.cache
/usr/lib/x86_64-linux-gnu/gio/modules/giomodule.cache
/usr/lib/x86_64-linux-gnu/gtk-2.0/2.10.0/immodules.cache
/usr/lib/x86_64-linux-gnu/gtk-3.0/3.0.0/immodules.cache
/usr/share/applications/mimeinfo.cache
/var/cache/cracklib/src-dicts
/var/lib/gconf/defaults/%gconf-tree-*.xml (88 files for various locales)
All of them seem to be ordering issues (e.g. lists of files/stuff, XML element positions).
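A quick check one could run on any of these files (a sketch with placeholder paths): if the diff disappears after sorting the lines, that supports the “ordering only” theory:
$ sort good/usr/share/applications/mimeinfo.cache > /tmp/good.sorted
$ sort muri/usr/share/applications/mimeinfo.cache > /tmp/muri.sorted
$ diff /tmp/good.sorted /tmp/muri.sorted    # ideally empty output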
arnaud
SHAAAAAA: acbe13f4e88b7e2d9622d75a1e834cf81a38e30992559756d9890d81e6f6c14a
It’s worth noting that arnaud did the only disk build.
Differences (not just metadata; the file sizes differ) in:
live/filesystem.squashfs
live/initrd.img
live/vmlinuz
utils/linux/syslinux
utils/mbr/mbr.bin
utils/win32/syslinux.exe
The difference in live/initrd.img could be the same as for nodens.
The squashfs has tons of metadata differences for directories (the number that diffoscope lists before the date; the inode?). It also has the same content differences as muri, but in addition:
/lib/modules/4.9.0-3-amd64/modules.alias
/lib/modules/4.9.0-3-amd64/modules.alias.bin
/lib/modules/4.9.0-3-amd64/modules.dep
/lib/modules/4.9.0-3-amd64/modules.dep.bin
Ignoring the .bin files, diffoscope shows something interesting:
├── /lib/modules/4.9.0-3-amd64/modules.alias
│ │ @@ -19383,8 +19383,7 @@
│ │ alias vport-type-3 vport_gre
│ │ alias net-pf-40 vmw_vsock_vmci_transport
│ │ alias vmware_vsock vmw_vsock_vmci_transport
│ │ alias virtio:d00000013v* vmw_vsock_virtio_transport
│ │ alias fs-vboxsf vboxsf
│ │ alias pci:v000080EEd0000BEEFsv*sd*bc*sc*i* vboxvideo
│ │ alias pci:v000080EEd0000CAFEsv00000000sd00000000bc*sc*i* vboxguest
│ │ -alias fs-aufs aufs
[...]
├── /lib/modules/4.9.0-3-amd64/modules.dep
│ │ @@ -3395,8 +3395,7 @@
│ │ kernel/lib/mpi/mpi.ko:
│ │ kernel/lib/asn1_decoder.ko:
│ │ kernel/lib/oid_registry.ko:
│ │ kernel/virt/lib/irqbypass.ko:
│ │ updates/vboxsf.ko: updates/vboxguest.ko
│ │ updates/vboxvideo.ko: updates/vboxguest.ko kernel/drivers/gpu/drm/ttm/ttm.ko kernel/drivers/gpu/drm/drm_kms_helper.ko kernel/drivers/gpu/drm/drm.ko
│ │ updates/vboxguest.ko:
│ │ -kernel/fs/aufs/aufs.ko:
I wonder: is this what you’d expect if depmod wasn’t run after the aufs module was built?
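For reference, re-running it inside the chroot would look something like this (a sketch, assuming the kernel version from the paths above):
# Regenerate modules.dep, modules.alias and friends for the installed kernel:
$ depmod -a 4.9.0-3-amd64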
Some random thoughts
- SHAAAAAA
- Hypothesis: we have six distinct problems:
  - live/initrd.img (nodens, arnaud)
  - live/vmlinuz (arnaud)
  - utils/ (arnaud)
  - metadata for directories inside the squashfs (arnaud)
  - *.cache, src-dicts, %gconf-tree-*.xml (muri, arnaud)
  - /lib/modules/4.9.0-3-amd64/modules.* (arnaud)
- We should pair-wise compare *.cache, src-dicts, %gconf-tree-*.xml from muri and arnaud’s images; perhaps they are identical between both images? (See the sketch after this list.)
- We should figure out what information about the environment we should gather from all participants to help us narrow down the causes of these differences.
- SHAAAAAAAAAAAA
- Is our usage of caching safe? I tried rebuilding with a completely empty “vmproxy” APT cache and I still got the good SHAAAAAAAAA. This makes me less suspicious that it’s an issue about caching. It seems to falsify my hypothesis about me, intrigeri, lizard, sib having cached files days/weeks ago that for some reason have changed since then (scary!) and thus only ended up in the builds of those that built for the first time (in a while) today. I’m still not convinced this hypothesis is completely dead. So:
  - I’m gonna do a rebuild after rebuilding the base box.
  - I’m gonna do a rebuild without any caching at all.
- It would be interesting to compare base boxes, at least those used by muri, arnaud and nodens vs one used to build one of the “baseline” images.
- SHAAAAAAAAAAAAAAAAAAA
- Assuming we cannot fix all these issues, what does this mean about our plan to call Tails 3.0 reproducible?
- Can we realistically fix all this before 3.0? Related: should we call in the heavies (i.e. lamby)?
- Where can I host all these images? I’d like to host them on lizard, so those of us with poor Internet connections can work with them remotely. Preferably while still being accessible to lamby, should that be needed.
- SHA?
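The pair-wise comparison mentioned above could look roughly like this (a sketch; the extraction directories and squashfs filenames are placeholders, and it assumes the two filesystems have already been copied out of the ISOs):
$ unsquashfs -d muri-root muri-filesystem.squashfs
$ unsquashfs -d arnaud-root arnaud-filesystem.squashfs
# then compare the suspect files, e.g.:
$ diff muri-root/usr/share/applications/mimeinfo.cache arnaud-root/usr/share/applications/mimeinfo.cache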
#14 Updated by segfault 2017-06-08 22:11:52
My build just finished and I also got the expected SHA :)
$ sha256sum tails-amd64-3.0~rc2.iso
6954ac327cb6909fd0c5c9aad8515977ccf774a25e04ceb6854e6a598f964f7b  tails-amd64-3.0~rc2.iso
#15 Updated by anonym 2017-06-08 22:33:29
anonym wrote:
> I’m gonna do a rebuild after rebuilding the base box.
Done! Got the good SHAAAAAAA!
#16 Updated by arnaud 2017-06-09 08:50:17
About the basebox
Here are some quick facts about the basebox, which I built just before building the Tails ISO.
$ cd .vagrant.d/boxes/tails-builder-amd64-jessie-20170529-60c74058de/0/libvirt/
$ cat metadata.json
{
"provider": "libvirt",
"format": "qcow2",
"virtual_size": 20
}
$ sha256sum box.img
cf67cec1eaad12ceceb62bfcaec3b8bf0b55b3a844c965b31d2e2397f12f5032 box.img
#17 Updated by anonym 2017-06-09 09:51:41
arnaud wrote:
> About the basebox
>
> Here are some quick facts about the basebox, which I built just before building the Tails ISO.
Thanks, arnaud, but note that the base box generation is not (yet) reproducible, so that hash is not useful. Sorry for not making this clear!
Most likely I’ll need the whole disk image. Actually, if you can upload your .img using the same service as yesterday, and pass me the link via private email, then I can have a casual look to see if there’s anything obviously wrong. Ok?
#18 Updated by anonym 2017-06-09 10:11:54
- File diffoscope-arnaud.html.xz added
- File diffoscope-muri.html.xz added
- File arnaud.iso.torrent added
- File good.iso.torrent added
Attached you’ll find the diffoscope reports for arnaud’s and muri’s ISO images against the “good” one. I skipped nodens’ report since it’s only about live/initrd.img, and the report for that looks identical (modulo the actual bytes) to what you see for arnaud.
Since arnaud’s build seems to exhibit all six problems from my hypothesis, I guess we can focus on examining it, and only look at the others for more data or to check whether similar-looking differences actually are caused by the same source of non-determinism. You can get arnaud’s ISO image and the good one via the attached .torrent files (they’re not seeded yet, but should be within the hour). I can also onionshare them upon request.
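For anyone who wants to regenerate such a report locally, the invocation would be something like (filenames are placeholders):
$ diffoscope --html diffoscope-arnaud.html good.iso arnaud.iso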
#19 Updated by arnaud 2017-06-09 10:26:48
@anonym OK, sure, I’ll send you the link.
Just for my understanding…
As far as I understand, the base box generation starts with tools that are present on the host machine (e.g. vmdebootstrap). So these tools will be different on different configs. But later on, a base image is downloaded from the Debian servers, and this image is the same for everyone, right? Then we probably enter this environment (chroot or VM, I don’t know), and build upon this base image, downloading packages and configuring things, until we get the Tails builder image. These steps should also be identical for everyone, since they happen within the isolated environment.
So, where is it that the process is not reproducible?
#20 Updated by anonym 2017-06-09 10:50:46
arnaud wrote:
> @anonym OK, sure, I’ll send you the link.
>
> Just for my understanding...
>
> As far as I understand, the base box generation starts with tools that are present on the host machine (e.g. vmdebootstrap). So these tools will be different on different configs. But later on, a base image is downloaded from the Debian servers, and this image is the same for everyone, right? Then we probably enter this environment (chroot or VM, I don’t know), and build upon this base image, downloading packages and configuring things, until we get the Tails builder image. These steps should also be identical for everyone, since they happen within the isolated environment.
>
> So, where is it that the process is not reproducible?
For instance, in vagrant/definitions/tails-builder/postinstall.sh we do:
echo "$(date)" > /var/lib/vagrant_box_build_time
which will be unique each time. That’s just one instance that breaks bit-by-bit reproducibility, and I’m sure there are many more issues (just think of file metadata like mtime).
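One common way to fix this kind of thing (just a sketch of the general technique, not something our build does yet; it assumes SOURCE_DATE_EPOCH is exported by the build wrapper) is to derive such timestamps from SOURCE_DATE_EPOCH instead of the wall clock:
# Same source state => same SOURCE_DATE_EPOCH => same file contents:
echo "$(date --utc --date="@${SOURCE_DATE_EPOCH}")" > /var/lib/vagrant_box_build_time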
#21 Updated by anonym 2017-06-11 12:24:22
anonym wrote:
> * It would be interesting to compare base boxes, at least those used by muri, arnaud and nodens vs one used to build one of the “baseline” images.
I’ve compared my base box to arnaud’s. All that differs are UUIDs, timestamps and similar, so they should be effectively identical.
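For reference, such a comparison can be done with diffoscope directly on the box images (a sketch; paths are placeholders, and it can take quite a while on full disk images):
$ diffoscope my-box/box.img arnaud-box/box.img > basebox-diff.txt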
#22 Updated by anonym 2017-06-12 16:07:22
- Target version changed from Tails_3.0 to Tails_3.1
#23 Updated by anonym 2017-08-04 10:17:51
- Status changed from In Progress to Resolved
- Assignee deleted (anonym)
- % Done changed from 50 to 100
I fail to see any further purpose for this ticket.