Bug #15990

Integrate disk image in build process

Added by Anonymous 2018-09-28 10:24:32 . Updated 2018-11-25 22:18:47 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Target version:
Start date:
2018-09-28
Due date:
% Done:

100%

Feature Branch:
feature/15292-usb-image
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:
316

Description


Subtasks


History

#1 Updated by Anonymous 2018-09-28 10:35:23

  • blocks Bug #15991: Code review & rubber-duck for USB Image added

#2 Updated by intrigeri 2018-10-09 09:23:18

  • blocked by deleted (Bug #15991: Code review & rubber-duck for USB Image)

#3 Updated by segfault 2018-10-22 11:56:30

  • Status changed from Confirmed to In Progress

Applied in changeset commit:bf602f9f815f43de3721f435eef18b9552c3ec97.

#4 Updated by segfault 2018-10-23 22:56:33

  • Assignee changed from segfault to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to feature/15292-usb-image

#5 Updated by segfault 2018-10-23 22:57:10

  • % Done changed from 0 to 10

#6 Updated by segfault 2018-10-23 22:58:35

There is now a ${BUILD_BASENAME}.img among the build artifacts :)

#7 Updated by intrigeri 2018-11-02 16:25:00

  • Assignee changed from intrigeri to segfault
  • QA Check changed from Ready for QA to Dev Needed

> There is now a ${BUILD_BASENAME}.img among the build artifacts :)

Woohoo! Great job. I’m going to start a build locally as soon as I’ve reviewed the rest of the branch, can’t wait to see a .img magically land on my disk :)

Meta: through this series of reviews I’ll push minor nitpicks directly to your branch. If you don’t mind I’ll do the same for slightly more involved changes: as seen elsewhere, this can be a nicer and more efficient way to communicate what I’m suggesting than writing plain text on Redmine.

  • This breaks at the end of the build process on Jenkins: the ${BUILD_DIR} is busy and cannot be unmounted once the build is completed. Our Jenkins ISO builders build in RAM (TAILS_RAM_BUILD), perhaps this works for you because you don’t? Either way, it looks like the code somehow does not clean up correctly after itself. We have clean up code in place to deal with this in many cases (killing processes and such) but it looks like this is a new, unsupported case. Without looking at the code, I would guess some of the tear down might be asynchronous and not fully completed by the time the script exits; or the cwd was changed to the build directory. FTR this matters because we don’t reboot ISO builders in our CI between builds, so a build that leaves a big chunk of data in a tmpfs may break every following build on the same node.
  • The changes in auto/build feels a bit like “let’s ignore the rest of the script and patch our new thing on top”. This is definitely good enough for a PoC but we should integrate the new thing better so it works together with the old code, otherwise the old code becomes very confusing, e.g.:
    • With these changes, the BUILD_DEST_FILENAME variable name becomes confusing. I think it should be renamed to BUILD_ISO_FILENAME.
    • Similarly, BUILD_FILENAME_EXT does not work well anymore.
    • Actually, I think you can ditch all the unsupported clauses of case "$LB_BINARY_IMAGES" now that we know we won’t use live-build’s own img-hdd support (and both iso-hybrid and tar are a thing of the past).
    • The other artifacts (build manifest, log, etc.) should not have .iso in their name anymore.
    • Making all this nicer has quite some potential to break things on Jenkins. No big deal as long as it leaves the ISO builder in a clean state in the end. If it breaks that until Bug #15993 is done, well, give the branch the wip/ prefix and Jenkins will ignore it (but don’t hesitate showing me the problem first, ‘cause perhaps that can be easily fixed on Jenkins’ side without waiting for the CI sprint at the end of the month :)
  • I see that the create-usb-image-from-iso script landed in tails.git. I’m fine with that. I suggest you clearly mark https://gitlab.com/segfault3/tails-usb-image-from-iso as deprecated (and the description of Bug #15984 updated) to avoid confusion: if I had to work on this script in 6 months, I would not be sure where I should do that; we have the same problem for htpdate and it has caused trouble a few times already.
  • Regarding the new build-deps, I think you can drop python3, which will be pulled by python3-gi. Just to simplify and avoid having one more line to update if things change in Debian. If you disagree for some reason, just skip this one, no need to argue.

#8 Updated by segfault 2018-11-11 14:50:47

> This breaks at the end of the build process on Jenkins: the ${BUILD_DIR} is busy and cannot be unmounted once the build is completed.

I see in the Jenkins log that the unmounting is not retried as it should, but the build aborts after sudo fuser --ismountpoint --mount /tmp/tails-build.CM9cKrAH --kill. Since there is not error message, this probably means that fuser couldn’t find any process accessing the directory, in which case it returns with a non-zero exit code. I don’t think we want to abort in that case, so I pushed a commit to fix this.

#9 Updated by segfault 2018-11-11 20:33:18

intrigeri wrote:
> * The changes in auto/build feels a bit like “let’s ignore the rest of the script and patch our new thing on top”. This is definitely good enough for a PoC but we should integrate the new thing better so it works together with the old code, otherwise the old code becomes very confusing, e.g.:
> With these changes, the BUILD_DEST_FILENAME variable name becomes confusing. I think it should be renamed to BUILD_ISO_FILENAME.
> Similarly, BUILD_FILENAME_EXT does not work well anymore.
> Actually, I think you can ditch all the unsupported clauses of case "$LB_BINARY_IMAGES" now that we know we won’t use live-build’s own img-hdd support (and both iso-hybrid and tar are a thing of the past).
> The other artifacts (build manifest, log, etc.) should not have .iso in their name anymore.
> Making all this nicer has quite some potential to break things on Jenkins. No big deal as long as it leaves the ISO builder in a clean state in the end. If it breaks that until Bug #15993 is done, well, give the branch the wip/ prefix and Jenkins will ignore it (but don’t hesitate showing me the problem first, ‘cause perhaps that can be easily fixed on Jenkins’ side without waiting for the CI sprint at the end of the month :)

Done.

> * I see that the create-usb-image-from-iso script landed in tails.git. I’m fine with that. I suggest you clearly mark https://gitlab.com/segfault3/tails-usb-image-from-iso as deprecated (and the description of Bug #15984 updated) to avoid confusion: if I had to work on this script in 6 months, I would not be sure where I should do that; we have the same problem for htpdate and it has caused trouble a few times already.

I kept https://gitlab.com/segfault3/tails-usb-image-from-iso up to date with the changes to create-usb-image-from-iso and would like to continue using it to create USB images from existing ISOs. This is not an official Tails repository after all - I doubt that you will actually assume that we build Tails releases with code that is somehow pulled from gitlab.com.

> * Regarding the new build-deps, I think you can drop python3, which will be pulled by python3-gi. Just to simplify and avoid having one more line to update if things change in Debian. If you disagree for some reason, just skip this one, no need to argue.

Done.

#10 Updated by segfault 2018-11-11 20:33:59

Next step: Test whether Jenkins builds still fail.

#11 Updated by segfault 2018-11-11 22:28:49

segfault wrote:
> Next step: Test whether Jenkins builds still fail.

They fail during provisioning and I don’t understand why.

#12 Updated by intrigeri 2018-11-12 11:40:48

> I see in the Jenkins log that the unmounting is not retried as it should, but the build aborts after sudo fuser --ismountpoint --mount /tmp/tails-build.CM9cKrAH --kill. Since there is not error message, this probably means that fuser couldn’t find any process accessing the directory, in which case it returns with a non-zero exit code. I don’t think we want to abort in that case, so I pushed a commit to fix this.

Indeed, that code is racy: between the umount failure and the call for fuser, the process that was blocking the umount operation can very well have vanished. Your fix seems correct to me.

#13 Updated by intrigeri 2018-11-12 11:45:02

>> * I see that the create-usb-image-from-iso script landed in tails.git. I’m fine with that. I suggest you clearly mark https://gitlab.com/segfault3/tails-usb-image-from-iso as deprecated (and the description of Bug #15984 updated) to avoid confusion: if I had to work on this script in 6 months, I would not be sure where I should do that; we have the same problem for htpdate and it has caused trouble a few times already.

> I kept https://gitlab.com/segfault3/tails-usb-image-from-iso up to date with the changes to create-usb-image-from-iso and would like to continue using it to create USB images from existing ISOs. This is not an official Tails repository after all - I doubt that you will actually assume that we build Tails releases with code that is somehow pulled from gitlab.com.

Of course I would easily understand which version is used for building a Tails ISO :) My concern is about where I should work on this script (at some point the detailed Git history was only in your external repo so it felt like it was the canonical place). Your answer suggests I should work in tails.git and simply ignore your tails-usb-image-from-iso repo. Fine by me!

#14 Updated by intrigeri 2018-11-12 12:21:30

>> Next step: Test whether Jenkins builds still fail.
> They fail during provisioning and I don’t understand why.

Weird indeed, I don’t think I’ve ever seen this. Vagrant provisioned this new box just fine on my own system.

I’ve made the failing script more verbose and I see:

12:09:24 + mktemp -d
12:09:24 + DEBOOTSTRAP_GNUPG_HOMEDIR=/tmp/tmp.emoPyECuSj
12:09:24 + gpg --homedir /tmp/tmp.emoPyECuSj --import ../../../config/chroot_sources/tails.chroot.gpg
12:09:24 gpg: keybox '/tmp/tmp.emoPyECuSj/pubring.kbx' created
12:09:24 gpg: cannot open '/dev/tty': No such device or address

I’ve added the --no-tty option and it fixed the problem :)

I suspect the gnupg upgrade in the Stretch 9.6 point release causes it to start sending output to /dev/tty. We’ll need this fix on every branch so please review commit:ccbfbe3920c929e43cc8c0138048186419cb932c and if happy, apply it to master, then merge master into stable and in turn into devel.

#15 Updated by segfault 2018-11-12 20:44:45

> I suspect the gnupg upgrade in the Stretch 9.6 point release causes it to start sending output to /dev/tty. We’ll need this fix on every branch so please review ccbfbe3920c929e43cc8c0138048186419cb932c and if happy, apply it to master, then merge master into stable and in turn into devel.

Done.

#16 Updated by intrigeri 2018-11-13 08:39:31

… and FTR this regression got reported to Debian: https://bugs.debian.org/913614

#17 Updated by segfault 2018-11-13 19:45:14

intrigeri wrote:
> My concern is about where I should work on this script (at some point the detailed Git history was only in your external repo so it felt like it was the canonical place). Your answer suggests I should work in tails.git and simply ignore your tails-usb-image-from-iso repo.

Yes, whoever works on this script in the future should do it in tails.git. I also switched to committing with more details to tails.git and only copying the resulting script to my tails-usb-image-from-iso repo.

#18 Updated by segfault 2018-11-13 19:51:52

> They fail during provisioning and I don’t understand why.

After your gpg fix and after merging the fix for Bug #16120, the build failed once and succeeded once. During the failed build, the temp directory could not be unmounted during the whole 12 retries / 60 seconds. During the successful build, the directory was unmounted successfully on the first try.

I will try to debug this later. I find the debugging very painful when it takes half an our between changing something and seeing the result :/

#19 Updated by segfault 2018-11-14 12:52:17

I fixed that on two occasions I used asynchronous methods to call udisks D-Bus method, which has the effect that errors are ignored if they are not fetched in a separate method call. One of these methods was the loop delete method. If this call failed, it would explain why the temp dir couldn’t be unmounted, because an underlying file was still mounted on a loop device. Let’s see whether we get an error message for the next build failure.

#20 Updated by intrigeri 2018-11-23 13:27:41

  • Assignee changed from segfault to intrigeri
  • % Done changed from 10 to 50
  • QA Check changed from Dev Needed to Ready for QA

No recent build failure except some caused by well-known CI robustness issues so I guess this is ready for QA and will do a last code review.

#21 Updated by intrigeri 2018-11-23 15:53:43

  • Assignee changed from intrigeri to segfault
  • % Done changed from 50 to 70

LGTM, thanks for the fixes. I’ve pushed a few polishing commits in the same area, please review and if happy, close this ticket as resolved.

#22 Updated by segfault 2018-11-25 22:18:47

  • Status changed from In Progress to Resolved
  • Assignee deleted (segfault)
  • % Done changed from 70 to 100
  • QA Check deleted (Ready for QA)

intrigeri wrote:
> I’ve pushed a few polishing commits in the same area, please review and if happy, close this ticket as resolved.

LGTM