Feature #17657

Allow building IUKs in parallel (locally)

Added by CyrilBrulebois 2020-04-26 11:41:22. Updated 2020-05-18 09:37:45.

Status:
In Progress
Priority:
Elevated
Assignee:
CyrilBrulebois
Category:
Build system
Target version:
Start date:
Due date:
% Done:

0%

Feature Branch:
feature/parallel-iuk-builds
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

I’ve instrumented the IUK builds to verify the “mksquashfs is likely the bottleneck” guesstimate.

It turns out that there’s another lengthy step! That’s the rsync run gathering the differences between the old and the new contents.

Of course, one might point at the SSD and figure things would be better from NVMe, but switching to a tmpfs changes almost nothing… Since I don’t suppose we can do much regarding the kernel’s loopback performance, I’ve investigated whether running several IUK builds in parallel would make it possible to run several rsync processes in parallel: each of them is slower on its own, but the overall wallclock time gets smaller.

I’ve collected data in the attached files, so that one gets an idea of what it looks like. All tests were performed on an HP Z220 CMT Workstation, equipped with an Intel® Xeon® CPU E3-1245 V2 (3.40GHz), on top of an SSD (~/tails-release) for the main part, and on top of a tmpfs (/scratch) for the right-hand side of the data. After a number of unclocked hours, I was a little lazy and didn’t re-run the serial case to gather actual data for the tmpfs case; but I can do that if that seems needed.

On the software side: see the feature/parallel-iuk-builds branches in tails.git (main repository) and in puppet-tails.git (https://mraw.org/git/?p=puppet-tails.git); the idea is basically to use the multiprocess module to start a number of jobs in parallel (one by default, sticking to the status quo until otherwise requested) on the puppet-tails.git side, and to add a little instrumentation and locking on the tails.git side.

As one would expect, the more rsync processes we run in parallel, the longer each of them takes, because spawning several of them at the same time means they influence each other. But overall, that leads to a smaller total runtime: for 11 IUKs, that goes from 78m in the serial case, to 57m with 2 in parallel, to 44m with 4 in parallel. Further increasing the number of jobs started in parallel doesn’t really help, as that makes each rsync last much longer. And given I’m using locking to ensure a single mksquashfs runs at any given time (having several of them compete didn’t look like a good plan), this means there’s a minimum total mksquashfs runtime that we cannot do anything about.
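To make the locking part concrete, here is a minimal sketch of the pattern; this is not the actual tails-create-iuk code (which goes through IO::LockedFile), and the lock file path and the mksquashfs arguments are placeholders:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Placeholder path: where the lock file should actually live is one of the
# open questions below.
my $lockfile = ($ENV{TMPDIR} // '/tmp') . '/tails-create-iuk-mksquashfs.lock';

open(my $lock_fh, '>', $lockfile) or die "Cannot open $lockfile: $!";
flock($lock_fh, LOCK_EX) or die "Cannot lock $lockfile: $!";

# Only one job at a time gets past this point, so only one mksquashfs runs
# at any given time, even when several IUK builds run in parallel.
system(qw{mksquashfs squashfs-source/ filesystem.squashfs}) == 0
    or die "mksquashfs failed: $?";

close($lock_fh);    # closing the handle releases the lock
```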

My conclusions so far:

  • It seems that on this particular machine, using 4 jobs (50% of the logical cores) is a reasonable approach, meaning a drastically reduced wallclock time without having the machine “overcommitted” (e.g. by running all jobs in parallel). Keep in mind that was only about 11 IUKs; that number is going to grow over time…
  • I’ve prepared those changes and tested them for 4.5~rc1 and 4.5 by amending the release process documentation to allow running a modified version of the otherwise cloned-from-upstream puppet-tails.git repository; and that seems to work fine. I’ve just rebased them on top of stable (tails.git) and master (puppet-tails.git).
  • I’m a little concerned regarding two things:
    • The lock file was dummily created as a temporary file, at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.
    • Contrary to say check_po, my proof of concept doesn’t catch errors in the multiprocess-based parallelization, and the caller needs to figure out whether all IUKs have been built; it’s arguably something that would blow up in the RM’s face soon enough, but a little extra check wouldn’t hurt. I’m not sure this should work considering merging those speed-ups though.

Dear anonym, intrigeri, reviews/comments/suggestions/ACKs/NACKs welcome.


Files


Subtasks


Related issues

Related to Tails - Bug #17435: Building many IUKs (v2) takes a while on the RM's system Confirmed
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed

History

#1 Updated by CyrilBrulebois 2020-04-26 11:46:45

  • Subject changed from Allow building IUKs in parallel to Allow building IUKs in parallel (locally)

#2 Updated by intrigeri 2020-04-29 16:03:32

  • related to Bug #17435: Building many IUKs (v2) takes a while on the RM's system added

#3 Updated by intrigeri 2020-04-29 16:04:23

Dear @CyrilBrulebois,

> Dear anonym, intrigeri, reviews/comments/suggestions/ACKs/NACKs welcome.

I won’t have energy to dive into this any time soon, so please don’t block on me.

#4 Updated by CyrilBrulebois 2020-05-04 16:42:19

  • Assignee set to anonym

Roger that; assigning to @anonym accordingly.

For what it’s worth, even if I wanted to get that committed without further review, I couldn’t, because of missing permissions on the puppet-tails repository as far as I remember.

It’s of course too late for 4.6, but it would be great if I didn’t have to implement workarounds for 4.7 again…

#5 Updated by CyrilBrulebois 2020-05-05 00:00:30

For the record, on my “quick” machine, the serial build run time got bumped to 1h30 already (and would likely take ~4h on my laptop).

#6 Updated by CyrilBrulebois 2020-05-06 04:29:01

  • Target version changed from Tails_4.6 to Tails_4.7

#7 Updated by anonym 2020-05-14 08:51:07

CyrilBrulebois wrote:
> I’ve instrumented the IUK builds to verify the “mksquashfs is likely the bottleneck” guesstimate.
>
> It turns out that there’s another lengthy step! That’s the rsync gathering the differences between both old and new contents.
>
> Of course, one might point at the SSD and figure things would be better from NVMe, but switching to a tmpfs almost changes nothing… Since I don’t suppose we can do much regarding the kernel’s loopback performances, I’ve investigated whether running several IUK builds in parallel would make it possible to run several rsync in parallel, each of them being slow on its own, but meaning a smaller wallclock time.

Excellent! Quite honestly, to me this seems like fairly essential stuff for RM sanity under our new “many, many IUKs” paradigm, so I think this should be classified as FT work. @intrigeri, can you confirm or reject?

#8 Updated by anonym 2020-05-14 08:51:23

#9 Updated by anonym 2020-05-14 09:57:55

  • Status changed from New to In Progress
  • Assignee changed from anonym to CyrilBrulebois

Ey, this looks awesome! Great work!

CyrilBrulebois wrote:
> I’ve collected data in the attached files, so that one gets an idea what it looks like. All tests performed on an HP Z220 CMT Workstation, equipped with Intel® Xeon® CPU E3-1245 V2 (3.40GHz), on top of an SSD (~/tails-release) for the main part, and on top of a tmpfs (/scratch) on the right side. After a number of unclocked hours, I was a little lazy and didn’t re-run the serial case to gather actual data for the tmpfs case; but I can do that if that feels needed.

Cool data! Much appreciated! \o/

Can you explain why the efficiency (average time) of mksquashfs seems to depend on the number of jobs? Up to jobs = 4 the efficiency decreases, but then it increases, which I find extremely odd; if I imagine the plot, I see a quadratic curve where I would expect a constant, horizontal line (since they run serially). If what I saw looked (semi-)linearly growing that would already be weird, but a quadratic “hill”? WTF? This is the kind of thing that makes me worry that there was some bug in the data collection process. :S

> On the software side: see the feature/parallel-iuk-builds branches in tails.git (main repository) and in puppet-tails.git (https://mraw.org/git/?p=puppet-tails.git); the idea is basically using the multiprocess module to start a number of jobs in parallel (one by default, sticking to the status quo until otherwise requested) on the puppet-tails.git side, and adding a little instrumenting and locking on the tails.git side.

As a code-review:y comment, the only part I’m not liking so much, and frankly had trouble understanding, is the bits about “use new_iso copies for parallel runs” (commit c97ff34 in puppet-tails.git).

First of all, I don’t get why the new ISO will be mounted multiple times (in parallel); each new (= target version) ISO should only be mounted once, right? If this was the old (= source version) ISO I would understand, since it obviously is involved in each IUK. What am I missing?

Second, what do you mean with “rw vs. ro mounts for the new_iso”? Surely we mount ISOs ro with plain old mount (and loop devices)? (IIRC there is some FUSE thing to mount ISOs rw in order to remaster them, but I am pretty sure that is not what we’re using.) How is rw involved here at all?

Third, since we already must mount the old ISO multiple times in parallel, but don’t have issues with it, I don’t get why mounting the new ISOs multiple times would be any different.

> As one would expect, the more we run rsync in parallel, the lengthier they get, because spawning several of them at the same time means they influence each other. But overall, that’s leading to a smaller total runtime: for 11 IUKs, that goes from 78m in the serial case, to 57m with 2 in parallel, to 44m with 4 in parallel. Further increasing the number of jobs started in parallel doesn’t really help as that makes rsync last for really longer. And given I’m using locking to ensure a single mksquashfs runs at a single time (having several of them compete didn’t look like a good plan), this means a minimal mksquashfs runtime that we cannot do anything about.

Makes sense to me!

> My conclusions so far:
>
> * It seems that on this particular machine, using 4 jobs (50% of the logical cores) is a reasonable approach, meaning a drastically reduced wallclock time without having the machine “overcommitted” (e.g. by running all jobs in parallel). Keeping in mind that was only about 11 IUKs, that’s going to grow over time…

So it sounds like we should, by default, run N jobs, where N = the number of physical cores (i.e. this seems like a workload that cannot utilize hyperthreading well; in fact, it adds devastating overhead!).

> * I’ve prepared those changes and tested them for 4.5~rc1 and 4.5 by amending the release process documentation to allow running a modified version of the otherwise cloned-from-upstream puppet-tails.git repository; and that seems to work fine. I’ve just rebased them on top of stable (tails.git) and master (puppet-tails.git).

I don’t see the changes to the release docs. I’m interested in trying to run this, so can you please add the doc changes to the branch!

> * I’m a little concerned regarding two things:
> The lock file was dummily created as a temporary file, at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.

Does IO::LockedFile() respect the TMPDIR env var? If so the caller (wrap_tails_create_iuks) could just export TMPDIR as the “dedicated temporary directory created for the whole job” and it will transparently Just Work.

However, unless there’s a specific concern that you really think could bite us, I’m fine with ignoring this instance of imperfection. :)

> Contrary to say check_po, my proof of concept doesn’t catch errors in the multiprocess-based parallelization, and the caller needs to figure out whether all IUKs have been built; it’s arguably something that would blow up in the RM’s face soon enough, but a little extra check wouldn’t hurt. I’m not sure this should work considering merging those speed-ups though.

I am confused about what you mean here. I definitely think we should collect the exit status of each job and then list the failing ones (if any) once all jobs have completed, to help prevent an RM meltdown some time in the future. But what do you mean with the last sentence in this context? The required change sounds simple to me, but perhaps I’m missing something?
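Concretely, collecting the exit statuses and reporting failures boils down to something like the following sketch. It is written in Perl only to match the other snippets on this ticket, whereas the real wrapper (wrap_tails_create_iuks) would do the equivalent with its multiprocess-based job pool; build_one_iuk() is a hypothetical stand-in for a single IUK build, and limiting the number of concurrent jobs is left out:

```perl
use strict;
use warnings;

# Fork one child per IUK, then report every failure at the end.
my @source_versions = qw(4.2 4.3 4.4);
my %job;    # pid => source version

for my $source_version (@source_versions) {
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        # build_one_iuk() is hypothetical: build one IUK, return true on success.
        exit(build_one_iuk($source_version) ? 0 : 1);
    }
    $job{$pid} = $source_version;
}

my @failed;
for my $pid (keys %job) {
    waitpid($pid, 0);
    push @failed, $job{$pid} if $? != 0;
}

die "Failed to build IUKs from: @failed\n" if @failed;
print "All IUKs built successfully\n";
```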

#10 Updated by intrigeri 2020-05-18 06:53:52

> Excellent! Quite honestly, to me this seems like fairly essential stuff for RM sanity under our new “many, many IUKs” paradigm, so I think this should be classified as FT work. intrigeri, can you confirm or reject?

Makes sense to me!

#11 Updated by intrigeri 2020-05-18 08:10:22

Hi,

(context: I wanted to lower the number of Jenkins jobs so I thought “hey, perhaps I can trivially merge the tails.git branch since it looks like it’s a no-op until the puppet-tails.git branch is merged”)

I only looked at the Perl stuff. I’ve pushed a minor code cleanup to the tails.git branch.

Would you mind if I replaced IO::LockedFile with the mostly-equivalent File::Flock::Retry? The former is really old-school Perl and seems mostly abandoned upstream, while the latter is actively maintained and follows current best practices. I would do this for fun, as a volunteer, and would test my changes.

Otherwise, if there’s a good reason to stick to IO::LockedFile, I’m happy to merge the tails.git branch ASAP (after replacing the problematic new IO::LockedFile syntax with ->new and adding the new dependency to the release process doc), in order to free some cycles on Jenkins!
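For reference, the syntax change is mechanical; assuming IO::LockedFile keeps IO::File’s open-style arguments (which is how I remember it), and with a placeholder lock file path, it looks like this:

```perl
use IO::LockedFile;

my $lockfile = '/tmp/example.lock';    # placeholder path

# Indirect object notation, which perlobj discourages because it can be
# parsed ambiguously:
#   my $lock = new IO::LockedFile(">$lockfile");

# Equivalent, unambiguous method-call syntax:
my $lock = IO::LockedFile->new(">$lockfile");

# ... serialized section ...

$lock->close;    # releases the lock
```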

>> The lock file was dummily created as a temporary file, at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.

I don’t understand :/

I don’t see any trace of a “dedicated temporary directory created for the whole job”.
Every tails-create-iuk instance creates & manages its own temporary directory under args.tmp_dir, but we need a lock file shared by all these instances, so those per-instance directories are not a solution.
What did I miss?

#12 Updated by CyrilBrulebois 2020-05-18 08:53:20

Hi folks,

And thanks for looking into this! I’ve been swamped and I’ll need some time to clean up my Redmine backlog… I’ll start with some possibly easy parts:

intrigeri wrote:
> I only looked at the Perl stuff. I’ve pushed a minor code cleanup to the tails.git branch.
>
> Would you mind if I replaced IO::LockedFile with the mostly-equivalent File::Flock::Retry? The former is really old-school Perl and seems mostly abandoned upstream, while the latter is actively maintained and follows current best practices. I would do this for fun, as a volunteer, and would test my changes.

You know I only manage to put together lines of code that work totally by accident, so I’m very happy to learn about best practices! Feel free to go ahead if you feel like it; otherwise I can totally look into doing that on my own. Don’t burden yourself.

Regarding the testing, to be honest I’m fine if something breaks or doesn’t work immediately for the next (few?) releases. I’ve already dealt with some screwdriver-based releases (to hack/test/check my proposed changes), so I can definitely do that again! So don’t worry too much about that.

> Otherwise, if there’s a good reason to stick to IO::LockedFile, I’m happy to merge the tails.git branch ASAP (after replacing the problematic new IO::LockedFile syntax with ->new and adding the new dependency to the release process doc), in order to free some cycles on Jenkins!

None as far as I’m concerned. (It looked like something that would do the job and was packaged; I think I noticed the old date but thought maybe it was just low-maintenance given the basic needs it covered; and more importantly: it did the job!)

> >> The lock file was dummily created as a temporary file, at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.
>
> I don’t understand :/
>
> I don’t see any trace of a “dedicated temporary directory created for the whole job”.
> Every tails-create-iuk instance creates & manages its own temporary directory under args.tmp_dir, but we need a lock file shared by all these instances, so those per-instance directories are not a solution.
> What did I miss?

I think I failed to mention it was “one level up”, in the caller. Quoting release_process.md:

Build the Incremental Upgrade Kits locally
------------------------------------------

    (
       set -eu
       WORK_DIR=$(mktemp -d)
       TAILS_REMOTE="$(git -C "$RELEASE_CHECKOUT" remote get-url origin)"
       PUPPET_TAILS_REMOTE=$(echo -n "$TAILS_REMOTE" | perl -p -E 's,:tails(:?[.]git)?\z,:puppet-tails,')
       cd "$WORK_DIR"
       git clone "$PUPPET_TAILS_REMOTE"
       ./puppet-tails/files/jenkins/slaves/isobuilders/wrap_tails_create_iuks
[…]

so we’re in "$WORK_DIR", and we can just create the lock file in that directory, rather than stashing it at the “top-level” of $TMPDIR (where removal can be tricky depending on who the owner is, because of the usual ‘+t’ sticky bit), which is why I resorted to 771db0d865bb8d3908a182b2be71e887fa09c823 in puppet-tails.git.

Hopefully that clarifies things, but I must confess it’s a little hard to context-switch back to this topic after having dived into a different project for several weeks; apologies if that doesn’t help, and feel free to let me know if I should just go back to running it once again, to get actual examples so that you get a better/clearer picture.

(Regarding the FT vs. RM distinction @anonym mentioned, am I correct in assuming that’s just about how we classify/clock our work on this?)

#13 Updated by CyrilBrulebois 2020-05-18 09:24:27

anonym wrote:
> Ey, this looks awesome! Great work!

Glad you like it!

> Can you explain why the efficiency (average time) of mksquashfs seems to depend on the number of jobs? Up to jobs = 4 the efficiency decreases, but then it increases, which I find extremely odd; if I imagine the plot, I see a quadratic curve where I would expect a constant, horizontal line (since they run serially). If what I saw looked (semi-)linearly growing that would already be weird, but a quadratic “hill”? WTF? This is the kind of thing that makes me worry that there was some bug in the data collection process. :S

Non-accurate, ballpark estimate incoming: I suspect what’s happening is that the rsync commands that are running are eating I/O plus CPU (as I have or should have mentioned, they’re rather slow/seem inefficient), meaning more contention for the running mksquashfs. That’s why I went as far as trying to throw all jobs at once (exceeding the number of logical cores), which means all rsync processes compete with each other and “take more time collectively”, but once they (mostly) finish around the same time, all resources can be taken by mksquashfs.

I don’t think the (trivial) data collection process was buggy; but then I’m absolutely no data scientist or seasoned data gatherer/analyser.

> > On the software side: see the feature/parallel-iuk-builds branches in tails.git (main repository) and in puppet-tails.git (https://mraw.org/git/?p=puppet-tails.git); the idea is basically using the multiprocess module to start a number of jobs in parallel (one by default, sticking to the status quo until otherwise requested) on the puppet-tails.git side, and adding a little instrumenting and locking on the tails.git side.
>
> As a code-review:y comment, the only part I’m not liking so much, and frankly had trouble understanding, is the bits about “use new_iso copies for parallel runs” (commit c97ff34 in puppet-tails.git).
>
> First of all, I don’t get why the new ISO will be mounted multiple times (in parallel); each new (= target version) ISO should only be mounted once, right? If this was the old (= source version) ISO I would understand, since it obviously is involved in each IUK. What am I missing?

OK. We do agree we have several rsync commands running in parallel, while we keep the mksquashfs calls serial?

What data is rsync chewing?

See config/chroot_local-includes/usr/src/iuk/lib/Tails/IUK.pm in tails.git:

```perl
run_as_root("mount", "-o", "loop,ro", $self->old_iso, $old_iso_mount);
my $old_squashfs = path($old_iso_mount, 'live', 'filesystem.squashfs');
croak "SquashFS '$old_squashfs' not found in '$old_iso_mount'" unless -e $old_squashfs;
run_as_root(qw{mount -t squashfs -o loop}, $old_squashfs, $old_squashfs_mount);

run_as_root("mount", "-o", "loop,ro", $self->new_iso, $new_iso_mount);
my $new_squashfs = path($new_iso_mount, 'live', 'filesystem.squashfs');
croak "SquashFS '$new_squashfs' not found in '$new_iso_mount'" unless -e $new_squashfs;
run_as_root(qw{mount -t squashfs -o loop}, $new_squashfs, $new_squashfs_mount);

# [ plus overlayfs/aufs mount ]

my @rsync_options = qw{--archive --quiet --delete-after --acls --checksum};
push @rsync_options, "--xattrs" if $self->union_type eq 'overlayfs';
run_as_root(
    "rsync", @rsync_options,
    sprintf("%s/", $new_squashfs_mount),
    sprintf("%s/", $union_mount),
);
```

so we’re mounting each “old iso” (initially-installed-version) once, and the “new iso” (target-version-to-be-released-if-luck-is-on-our-side) N times. The various mount/juggling instructions trigger the issue I mentioned, which I’m working around by creating copies.
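In other words, the workaround boils down to giving each parallel job its own copy of the new ISO before mounting it; a rough sketch follows, in Perl just to match the other snippets here, while the actual change is commit c97ff34 in puppet-tails.git, and private_new_iso_for_job(), $new_iso and $job_tmp_dir are hypothetical names:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;

# Hypothetical sketch: give each parallel job a private copy of the new ISO,
# so the per-job loop mounts don't step on each other.
sub private_new_iso_for_job {
    my ($new_iso, $job_tmp_dir) = @_;
    my $job_iso = File::Spec->catfile($job_tmp_dir, 'new.iso');
    copy($new_iso, $job_iso) or die "Cannot copy $new_iso to $job_iso: $!";
    return $job_iso;
}
```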

> Second, what do you mean with “rw vs. ro mounts for the new_iso”? Surely we mount ISOs ro with plain old mount (and loop devices)? (IIRC there is some FUSE thing to mount ISOs rw in order to remaster them, but I am pretty sure that is not what we’re using.) How is rw involved here at all?

If you’re not convinced, I can remove that code and post the zillion lines that are triggered.

> Third, since we already must mount the old ISO multiple times in parallel, but don’t have issues with it, I don’t get why mounting the new ISOs multiple times would be any different.

I don’t understand: there isn’t a single “old ISO”? Plus I’m not sure why we would mount it multiple times in parallel?

> > As one would expect, the more we run rsync in parallel, the lengthier they get, because spawning several of them at the same time means they influence each other. But overall, that’s leading to a smaller total runtime: for 11 IUKs, that goes from 78m in the serial case, to 57m with 2 in parallel, to 44m with 4 in parallel. Further increasing the number of jobs started in parallel doesn’t really help as that makes rsync last for really longer. And given I’m using locking to ensure a single mksquashfs runs at a single time (having several of them compete didn’t look like a good plan), this means a minimal mksquashfs runtime that we cannot do anything about.
>
> Makes sense to me!

Ah, I seem to have repeated myself in the first paragraph, oops. (But yay, consistency?)

> > My conclusions so far:
> >
> > * It seems that on this particular machine, using 4 jobs (50% of the logical cores) is a reasonable approach, meaning a drastically reduced wallclock time without having the machine “overcommitted” (e.g. by running all jobs in parallel). Keeping in mind that was only about 11 IUKs, that’s going to grow over time…
>
> So it sounds like we should, by default, run N jobs, where N = the number of physical cores (i.e. this seems like a workload that cannot utilize hyperthreading well; in fact, it adds devastating overhead!).

I would guess so, yes. (But as I’m nowhere near knowledgeable about hardware stuff, I tend to dummily look at logical cores all the time, hence my cores/2 approach. ;))
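For what it’s worth, if the default ever becomes “number of physical cores” rather than cores/2, a rough way to detect it on Linux is to count unique physical id/core id pairs in /proc/cpuinfo (nproc only reports logical CPUs); a sketch, assuming /proc/cpuinfo exposes those fields:

```perl
use strict;
use warnings;

# Count unique (physical id, core id) pairs in /proc/cpuinfo: that gives the
# number of physical cores, whereas logical cores also count hyperthreads.
sub physical_core_count {
    my %cores;
    my $physical_id = 0;
    open(my $fh, '<', '/proc/cpuinfo') or die "Cannot read /proc/cpuinfo: $!";
    while (my $line = <$fh>) {
        $physical_id = $1 if $line =~ /^physical id\s*:\s*(\d+)/;
        $cores{"$physical_id:$1"} = 1 if $line =~ /^core id\s*:\s*(\d+)/;
    }
    close($fh);
    # Some virtualized environments don't expose these fields; fall back to 1.
    return scalar(keys %cores) || 1;
}

print physical_core_count(), "\n";
```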

> > * I’ve prepared those changes and tested them for 4.5~rc1 and 4.5 by amending the release process documentation to allow running a modified version of the otherwise cloned-from-upstream puppet-tails.git repository; and that seems to work fine. I’ve just rebased them on top of stable (tails.git) and master (puppet-tails.git).
>
> I don’t see the changes to the release docs. I’m interested in trying to run this, so can you please add the doc changes to the branch!

Right, I think I meant to mention I’ve been maintaining changes to the release process docs (see other tickets I’ve published since then) and that I’ve been stashing changes on top of those. I don’t think I have committed/pushed them anywhere. I’ll look around and let you know, but I seem to remember it was basically(/only?) about replacing the git clone of puppet-tails.git with a clone of the local repository. Feel free to poke me back if that slips my mind.

> > * I’m a little concerned regarding two things:
> > The lock file was dummily created as a temporary file, at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.
>
> Does IO::LockedFile() respect the TMPDIR env var? If so the caller (wrap_tails_create_iuks) could just export TMPDIR as the “dedicated temporary directory created for the whole job” and it will transparently Just Work.
>
> However, unless there’s a specific concern that you really think could bite us, I’m fine with ignoring this instance of imperfection. :)

Hopefully my earlier answer to @intrigeri will help understand what I initially meant, and the switch to a different module (possibly written by someone who knows©®™ ;)) will fix this.

> > Contrary to say check_po, my proof of concept doesn’t catch errors in the multiprocess-based parallelization, and the caller needs to figure out whether all IUKs have been built; it’s arguably something that would blow up in the RM’s face soon enough, but a little extra check wouldn’t hurt. I’m not sure this should work considering merging those speed-ups though.

Oops! s/work/block/

> I am confused about what you mean here. I definitely think we should collect the exit status of each job and then list the failing ones (if any) once all jobs have completed, to help prevent a RM meltdown some time in the future. But what do you mean with the last sentence in this context? The required change sounds simple to me, but perhaps I’m missing something?

I meant “it’s not implementing proper checks yet, but can we please get that reviewed and possibly merged as-is, without demanding this get added”.

I think it’s not often that I kind of push/advocate for possibly merging “sooner-than-ready” code, but my line of thinking was:

  • It’s probably going to bite me in the first place (until it’s properly fixed of course);
  • I’ll probably be, hmm, more confident/happier to improve the reliability/cleanness of my changes once the bulk of the work is merged and usable for the next releases, instead of having to dive into it outside a release session, as a prerequisite.

To be frank, knowing myself, I’d expect to be working on that particular patch next time I have actual work to throw at the machine, rather than repeating the same IUK builds over and over again.

But if you feel this is not a reasonable approach, maybe I’ll bite the bullet and work on this during this week, as part of my Redmine catch-up session.

And thanks for the many comments!

#14 Updated by CyrilBrulebois 2020-05-18 09:37:45

@anonym, hopefully the snippet below is what I successfully used, and given the log file at the end, I think it is. :)

(Also, a file named release_process+parallel.mdwn was a pretty strong hint.)

Build the Incremental Upgrade Kits locally
------------------------------------------

    export JOBS=4
    time (
       set -eu
       set -x
       sudo true
       WORK_DIR=$(mktemp -d)
       TAILS_REMOTE="$(git -C "$RELEASE_CHECKOUT" remote get-url origin)"
       PUPPET_TAILS_REMOTE=$(echo -n "$TAILS_REMOTE" | perl -p -E 's,:tails(:?[.]git)?\z,:puppet-tails,')
       cd "$WORK_DIR"
       #git clone "$PUPPET_TAILS_REMOTE"
       rsync -av ~/work/clients/tails/puppet-tails.git .
       mv puppet-tails.git puppet-tails
       ./puppet-tails/files/jenkins/slaves/isobuilders/wrap_tails_create_iuks \
           --tails-git-remote "file://${RELEASE_CHECKOUT}/.git" \
           --tails-git-commit "$TAG" \
           --source-date-epoch "$SOURCE_DATE_EPOCH" \
           --local-isos-dir "$ISOS" \
           --tmp-dir "${TMPDIR:-/tmp}" \
           --output-dir "$IUKS_DIR" \
           --source-versions "$IUK_SOURCE_VERSIONS" \
           --new-version "$VERSION" \
           --verbose --jobs "$JOBS" --debug
       cd "$IUKS_DIR"
       sha256sum Tails_amd64_*_to_${VERSION}.iuk > "$IUKS_HASHES"
    ) 2>&1 | tee ~/full-$JOBS-tmpfs.log

OK, this isn’t a diff but basically: don’t clone puppet-tails over the network (I have no r/w access there), rsync it from a nearby directory instead. Add --debug for good measure, and the new, fancy --jobs "$JOBS" option (plus tee all the things into a per-$JOBS log file as part of the data collection effort).