Bug #12599

/var/lib/libvirt/images gets filled on isobuilders

Added by intrigeri 2017-05-25 19:58:42. Updated 2017-06-30 19:12:02.

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Continuous Integration
Target version:
Start date:
2017-05-25
Due date:
% Done:

100%

Feature Branch:
bugfix/12599-rake-task-removing-tails-builder-volumes,jenkins-jobs:bugfix/12599-rake-task-removing-tails-builder-volumes
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:
289

Description

On isobuilder3, right now:

total 2.8G
drwx------ 2 root         root          16K May  9 14:04 lost+found/
-rwxr--r-- 1 libvirt-qemu libvirt-qemu 741M May 25 16:45 tails-builder-amd64-jessie-20170510-d6ed806417_vagrant_box_image_0.img*
-rw------- 1 libvirt-qemu libvirt-qemu 1.3G May 25 18:27 tails-builder-amd64-jessie-20170524-8cc1ccbade_default.img
-rwxr--r-- 1 libvirt-qemu libvirt-qemu 743M May 25 17:37 tails-builder-amd64-jessie-20170524-8cc1ccbade_vagrant_box_image_0.img*

… while https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/22/console is running.


Subtasks


Related issues

Related to Tails - Bug #12579: reproducibly_build_Tails_ISO_* Jenkins job are broken Resolved 2017-05-22
Related to Tails - Feature #12002: Estimate hardware cost of reproducible builds in Jenkins Resolved 2016-11-28
Related to Tails - Bug #13302: /var/lib/libvirt/images sometimes gets filled on isobuilders, take 2 Resolved 2017-06-30
Blocks Tails - Bug #12637: Deploy rake libvirt volumes clean up task on all Jenkins build jobs Resolved 2017-06-04

History

#1 Updated by intrigeri 2017-05-25 19:59:06

/dev/vdc                   ext4      2.9G  2.8G     0 100% /var/lib/libvirt/images

#2 Updated by bertagaz 2017-05-26 10:24:05

  • related to Bug #12579: reproducibly_build_Tails_ISO_* Jenkins job are broken added

#3 Updated by bertagaz 2017-05-26 13:14:46

  • blocks Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all added

#4 Updated by intrigeri 2017-05-27 09:00:12

  • blocked by deleted (Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all)

#5 Updated by intrigeri 2017-05-27 09:01:08

  • blocks Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all added

#6 Updated by intrigeri 2017-05-27 09:12:20

#7 Updated by anonym 2017-05-28 11:26:37

To essentially halve the disk space usage per base box, see the end of Bug #12595#note-13.

#8 Updated by anonym 2017-05-28 12:56:46

  • Status changed from Confirmed to In Progress

Applied in changeset commit:69faba0c1d5e7b517616103f8e1c14528bdb55e8.

#9 Updated by intrigeri 2017-05-28 16:40:11

  • Feature Branch set to feature/12599

Rebased the topic branch on testing (devel FTBFS at the moment, and we’ll want to merge this into testing anyway) and force-pushed.

#10 Updated by intrigeri 2017-05-28 18:00:36

  • Assignee changed from bertagaz to anonym
  • % Done changed from 0 to 10

Sorry, I have bad news: this branch works neither on my system, nor on Jenkins: in both cases, I see several tails-builder-amd64-jessie-*_vagrant_box_image_0.img files in /var/lib/libvirt/images/. Looking at the code around old_domain =, it seems to me we’re only cleaning up 1 old VM, not “any residual VM”, no?

#11 Updated by intrigeri 2017-05-28 18:10:50

  • QA Check set to Info Needed

intrigeri wrote:
> Sorry, I have bad news: this branch works neither on my system, nor on Jenkins: in both cases, I see several tails-builder-amd64-jessie-*_vagrant_box_image_0.img files in /var/lib/libvirt/images/. Looking at the code around old_domain =, it seems to me we’re only cleaning up 1 old VM, not “any residual VM”, no?

Wait, actually perhaps it’s more a bootstrapping issue than an algorithmic one: in theory, assuming this code had been deployed on all branches forever, and it runs fine every time, then cleaning up only the previous default Vagrant VM is enough to ensure we never keep any old VM images around. So I’d be happy to merge this and manually clean up the VM images that are not tracked as the default one by Vagrant, and then in theory we should be good. Correct so far? If we agree on this, then I’ll check that this branch does what it’s meant to do (instead of what I believed it would do :) and merge it right away! That’ll already be a nice start :)

Now, I don’t think we can rely on these assumptions in the long term for a production deployment: sometimes a VM will be rebooted at the exact wrong time, or there will be a power issue and everything will suddenly go down, or something. So the only robust enough way to ensure we have no such VM image around when we start a build is to delete all these images before building, either with a Rake task (that we would perhaps run on Jenkins only) or an additional build step in jenkins-jobs.git.

Thoughts, both of you?

#12 Updated by anonym 2017-05-29 10:18:29

  • Assignee changed from anonym to bertagaz

intrigeri wrote:
> intrigeri wrote:
> > Sorry, I have bad news: this branch works neither on my system, nor on Jenkins: in both cases, I see several tails-builder-amd64-jessie-*_vagrant_box_image_0.img files in /var/lib/libvirt/images/. Looking at the code around old_domain =, it seems to me we’re only cleaning up 1 old VM, not “any residual VM”, no?
>
> Wait, actually perhaps it’s more a bootstrapping issue than an algorithmic one: in theory, assuming this code had been deployed on all branches forever, and it runs fine every time, then cleaning up only the previous default Vagrant VM is enough to ensure we never keep any old VM images around. So I’d be happy to merge this and manually clean up the VM images that are not tracked as the default one by Vagrant, and then in theory we should be good. Correct so far?

Correct! It’s

> If we agree on this, then I’ll check that this branch does what it’s meant to do (instead of what I believed it would do :) and merge it right away! That’ll already be a nice start :)
>
> Now, I don’t think we can rely on these assumptions in the long term for a production deployment: sometimes a VM will be rebooted at the exact wrong time, or there will be a power issue and everything will suddenly go down, or something. So the only robust enough way to ensure we have no such VM image around when we start a build is to delete all these images before building, either with a Rake task (that we would perhaps run on Jenkins only) or an additional build step in jenkins-jobs.git.
>
> Thoughts, both of you?

Yeah, I agree. And I’d rather fix this now, while I have these things remotely fresh in my head, so I’ve implemented it in the branch, commit:4e954097ad348f3b1bc9f93a8dedc5a0848c8d1e. This should cover us forever, except if really weird stuff happens.

#13 Updated by intrigeri 2017-05-29 17:10:27

anonym wrote:
> Correct! It’s

Missing bits?

So, anonym, what’s the next step then? bertagaz reviews your stuff, i.e. Ready for QA? Or you want to first check it works fine, and then reassign to you?

#14 Updated by bertagaz 2017-05-30 12:13:35

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:
> So, anonym, what’s the next step then? bertagaz reviews your stuff, i.e. Ready for QA? Or you want to first check it works fine, and then reassign to you?

I had a look already, ran it both in Jenkins and locally. I have two pieces of feedback:

First I think there’s a flaw in this part of Rakefile:490:

#{domain.name}_vagrant_box_image_0.img

This part does not work: domain.name contains the string default (probably the libvirt storage pool name), whereas this particular volume name does not, so it does not get deleted. I’ve indeed seen that, both locally and in Jenkins: this volume still exists after the build, but the snapshot of this volume is correctly removed.
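
To make the mismatch concrete with the names from this ticket’s description (an illustration only, not code from the branch):

# The libvirt domain Vagrant defines (cf. the *_default.img file in the description):
domain_name = 'tails-builder-amd64-jessie-20170524-8cc1ccbade_default'
# What the current code looks up -- no volume by this name exists in the pool:
"#{domain_name}_vagrant_box_image_0.img"
# => "tails-builder-amd64-jessie-20170524-8cc1ccbade_default_vagrant_box_image_0.img"
# What the pool actually contains (no "_default" in the box image name):
# "tails-builder-amd64-jessie-20170524-8cc1ccbade_vagrant_box_image_0.img"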

Second, as I understand the code, it will loop over existing domain names. That’s fine if at some point a previous build failed badly and the domain used at that time has not been undefined. But if it has been undefined, yet for some reason the related volumes have not been removed from the pool, then these old orphan volumes won’t get deleted. So I guess the idea is, to quote intrigeri, to have “an additional build step in jenkins-jobs.git”? I wonder if we should not just loop over the existing volumes in the pool to delete them all, no matter the domain name. What do you think?

#15 Updated by intrigeri 2017-05-31 08:53:51

I’ve found another, unrelated reason why we eat more space in /var/lib/libvirt/images than we could: at the end of a successful build, we do mv -f tails-* "${TAILS_GIT_DIR}/", and $TAILS_GIT_DIR is not a tmpfs, so the tails-builder-*_default.img that’s in use suddenly grows by ~1.2GB. This also causes I/O load we could happily live without, and increases build time a little bit. I guess we’re doing it this way in order to ease the “Retrieving artifacts from Vagrant build box” step, and anything else that relies on list_artifacts. I see two ways to solve this (perhaps minor?) issue: either mount a tmpfs on $TAILS_GIT_DIR before doing git clone /amnesia.git/.git "${TAILS_GIT_DIR}"; or drop the mv command, and teach list_artifacts to look in the most recent /tmp/tails-build.* directory, instead of in /home/vagrant/amnesia, e.g. something like (untested!):

--- a/Rakefile
+++ b/Rakefile
@@ -312,8 +312,11 @@ end

 def list_artifacts
   user = vagrant_ssh_config('User')
-  stdout = capture_vagrant_ssh("find '/home/#{user}/amnesia/' -maxdepth 1 " +
-                                        "-name 'tails-*.iso*'").first
+  build_dirs_stdout = capture_vagrant_ssh("ls -1 -d --sort=time " +
+                                          "/tmp/tails-build.*").first
+  build_dir = build_dirs_stdout.split("\n")[0]
+  stdout = capture_vagrant_ssh("find '#{build_dir}' -maxdepth 1 " +
+                               "-name 'tails-*.iso*'").first
   stdout.split("\n")
 rescue VagrantCommandError
   return Array.new
--- a/vagrant/provision/assets/build-tails
+++ b/vagrant/provision/assets/build-tails
@@ -110,5 +110,3 @@ cd "${BUILD_DIR}"
 as_root_do lb config --cache false

 as_root_do lb build
-
-mv -f tails-* "${TAILS_GIT_DIR}/"

What do you folks think?

#16 Updated by bertagaz 2017-05-31 13:19:59

intrigeri wrote:
> What do you folks think?

I think that’s a good idea. :)

#17 Updated by anonym 2017-06-01 01:09:56

intrigeri wrote:
> I’ve found another, unrelated reason why we eat more space in /var/lib/libvirt/images than we could: […]

I’m aware of this but sadly your fix is not enough; build-tails will clean up the tmpfs used for the build before we return to the Rakefile and realize that there are no build artifacts. I once actually started working on a branch where build-tails was modularized enough so we could fetch the artifacts directly from the tmpfs before its teardown, but I never finished it and now it’s too outdated. Now I’m wondering if we should perhaps go a step further and translate all of build-tails into tasks in the Rakefile.

#18 Updated by intrigeri 2017-06-01 06:21:09

> I’m aware of this but sadly your fix is not enough; build-tails will clean up the tmpfs used for the build before we return to the Rakefile and realize that there are no build artifacts.

OK, let’s forget about this issue for now then: it’s slightly OT on this ticket as the required space is a constant.
We can come back to it later, on another ticket, if needed (but as far as I’m concerned, I’ll try hard not to spend any time on such optimizations before having looked at the big picture and identified low-hanging fruits).

#19 Updated by intrigeri 2017-06-01 06:42:12

  • related to Feature #12002: Estimate hardware cost of reproducible builds in Jenkins added

#20 Updated by bertagaz 2017-06-01 10:47:27

intrigeri wrote:
> > I’m aware of this but sadly your fix is not enough; build-tails will clean up the tmpfs used for the build before we return to the Rakefile and realize that there are no build artifacts.

Oooh, good catch. :)

> OK, let’s forget about this issue for now then: it’s slightly OT on this ticket as the required space is a constant.
> We can come back to it later, on another ticket, if needed (but as far as I’m concerned, I’ll try hard not to spend any time on such optimizations before having looked at the big picture and identified low-hanging fruits).

So what should we do about this ticket? Remove the blocker on Feature #12576? Remove the Deliverable for and/or 3.0 target? Or just wait a bit before deciding?

#21 Updated by intrigeri 2017-06-01 11:20:54

> So what should we do about this ticket?

I’m sorry I apparently confused you with my “another, unrelated reason why we eat more space”. The comment you’re replying to was about this slightly OT sub-discussion, not at all about the general problem this ticket is about. This ticket is high priority as it’s currently the main cause of irrelevant build failures (Bug #12531). I see no reason to delay fixing it. Once the topic branch is fixed and merged, case closed, no? bertagaz, would you be ready to take this topic branch over? anonym seems to be plenty busy with other high-priority matters.

> Remove the blocker on Feature #12576?

I don’t know why this ticket blocks Feature #12576 (because I don’t know how clean_old and clean_all relate to /var/lib/libvirt/images, on current testing and on feature/12599).

#22 Updated by bertagaz 2017-06-01 12:46:55

  • Assignee changed from anonym to bertagaz

intrigeri wrote:
> > So what should we do about this ticket?
>
> I’m sorry I apparently confused you with my “another, unrelated reason why we eat more space”. The comment you’re replying to was about this slightly OT sub-discussion, not at all about the general problem this ticket is about. This ticket is high priority as it’s currently the main cause of irrelevant build failures (Bug #12531). I see no reason to delay fixing it. Once the topic branch is fixed and merged, case closed, no?

Ooch right, sorry, I should have finished my coffee before replying.

> bertagaz, would you be ready to take this topic branch over? anonym seems to be plenty busy with other high-priority matters.

Yes, let’s do that.

> > Remove the blocker on Feature #12576?
>
> I don’t know why this ticket blocks Feature #12576 (because I don’t know how clean_old and clean_all relate to /var/lib/libvirt/images, on current testing and on feature/12599).

Hmm, right, I’ve been confused here, will remove this.

#23 Updated by bertagaz 2017-06-01 12:47:16

  • blocked by deleted (Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all)

#24 Updated by anonym 2017-06-01 15:44:07

bertagaz wrote:
> intrigeri wrote:
> > So, anonym, what’s the next step then? bertagaz reviews your stuff, i.e. Ready for QA? Or you want to first check it works fine, and then reassign to you?
>
> I had a look already, ran it both in Jenkins and locally. I have two pieces of feedback:
>
> First I think there’s a flaw in this part of Rakefile:490:
>
> […]
>
> This part does not work: domain.name contains the string default (probably the libvirt storage pool name), whereas this particular volume name does not, so it does not get deleted. I’ve indeed seen that, both locally and in Jenkins: this volume still exists after the build, but the snapshot of this volume is correctly removed.

Whoops! This explains intrigeri’s problems reported in Bug #12599#note-10, and we can rule out his “bootstrap” theory from Bug #12599#note-11. I see you already have fixed this, and I had in parallel prepared this fix:

--- a/Rakefile
+++ b/Rakefile
@@ -487,7 +487,8 @@ def clean_up_builder_vms
     rescue Libvirt::RetrieveError
       # Expected if the pool does not exist
     else
-      for disk in ["#{domain.name}.img", "#{domain.name}_vagrant_box_image_0.img"] do
+      box_name = domain.name.sub(/_default$/, '')
+      for disk in ["#{domain.name}.img", "#{box_name}_vagrant_box_image_0.img"] do
         begin
           pool.lookup_volume_by_name(disk).delete
         rescue Libvirt::RetrieveError

As I noted during our CI meeting today, your fix, commit:d27bb28a2474748123b5c61158657e779ba229bd, needs to match disk names to /^tails-builder/ or similar before deleting, otherwise it will delete all my VMs’ disks! :)

> Second, as I understand the code, it will loop over existing domain names. That’s fine if at some point a previous build failed badly and the domain used at that time has not been undefined. But if it has been undefined, yet for some reason the related volumes have not been removed from the pool, then these old orphan volumes won’t get deleted.

Correct. My reasoning is that orphan volumes should be impossible; since we deal with the volume removal immediately after we undefine the domain there’s not really anything that can go wrong except that we fail to remove the volumes. Put a different way: such a failure would affect any other approach we take (e.g. the one you mention below) just the same.

> So I guess the idea is, to quote intrigeri, to have “an additional build step in jenkins-jobs.git”?

I think this is moot now. We want to do the same cleanup on all builders, whether on Jenkins or any contributor’s system.

> I wonder if we should not just loop over the existing volumes in the pool to delete them all, no matter the domain name. What do you think?

So this is what your commit:d27bb28a2474748123b5c61158657e779ba229bd does. For the reasons I mentioned above I don’t think this will be more reliable than my fix for the old approach, but I’m not against this approach (as long as it matches only tails-builder disks, and not all disks).

#25 Updated by bertagaz 2017-06-02 13:21:06

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Dev Needed to Info Needed

anonym wrote:
> As I noted during our CI meeting today, your fix, commit:d27bb28a2474748123b5c61158657e779ba229bd, needs to match disk names to /^tails-builder/ or similar before deleting, otherwise it will delete all my VMs’ disks! :)

Ooch, right. I’ve reverted my commit, and adapted yours as discussed on the chat yesterday.

> Correct. My reasoning is that orphan volumes should be impossible; since we deal with the volume removal immediately after we undefine the domain there’s not really anything that can go wrong except that we fail to remove the volumes. Put a different way: such a failure would affect any other approach we take (e.g. the one you mention below) just the same.
>
> > So I guess the idea is, to quote intrigeri, to have “an additional build step in jenkins-jobs.git”?
>
> I think this is moot now. We want to do the same cleanup on all builders, whether on Jenkins or any contributor’s system.
>
> > I wonder if we should not just loop over the existing volumes in the pool to delete them all, no matter the domain name. What do you think?
>
> So this is what your commit:d27bb28a2474748123b5c61158657e779ba229bd does. For the reasons I mentioned above I don’t think this will be more reliable than my fix for the old approach, but I’m not against this approach (as long as it matches only tails-builder disks, and not all disks).

Well, that’s a bit in conflict with what intrigeri pointed out at the end of Bug #12599#note-11. I agree we need a way to clean up that is the same in our infra as for the devs, but OTOH, I think we need a safety net in case something goes wrong in Jenkins. The situation is not really the same: while a developer will simply notice it and remove the unnecessary volumes by hand if it ever happens, as sysadmins I think we’d be glad if they get deleted automatically at the beginning of the next build because something like a power outage came into play at the wrong moment.

So I’ve also added a rake task in commit:e2d5817c7c03fe0aaa994aacac01cff57fa8d99a that removes all libvirt volumes, which we’ll be able to use in our infra. I think with both pieces of code we’d now have, we’ll be far more at ease on the Jenkins side.

Does it make sense to you? Care to have a quick look and check whether that rake task code looks good to you? Note that I’ve already tested it locally.

#26 Updated by anonym 2017-06-02 23:45:53

  • Assignee changed from anonym to bertagaz
  • QA Check deleted (Info Needed)

Sure, I didn’t think of power outages… :)

But let’s not ship a rake task that can accidentally destroy all VM images! Please only delete disks matching /^tails-builder-/!

In fact, why make this a rake task at all when it is code that no one but Jenkins needs? Can’t we just run the following one-liner on Jenkins before each build?

 
virsh -q vol-list default | awk '{ print $1 }' | grep '^tails-builder-' | xargs -n1 virsh vol-delete --pool default

#27 Updated by bertagaz 2017-06-03 10:07:54

  • Assignee changed from bertagaz to anonym
  • QA Check set to Info Needed

anonym wrote:
> Sure, I didn’t think of power outages… :)

:)

> But let’s not ship a rake task that can accidentally destroy all VM images! Please only delete disks matching /^tails-builder-/!

Fair enough, I’ll add this condition. I thought about adding one that would avoid doing anything if not run by Jenkins, but in the end, for people like me who use a dedicated computer, it’ll probably be useful.

> In fact, why make this a rake task at all when it is code that no one but Jenkins needs? Can’t we just run the following one-liner on Jenkins before each build? […]

I think I’m more confident having this code in the Rakefile rather than done by a shitty “${shell}”, don’t know why… :) Also because it can be reused elsewhere.

#28 Updated by bertagaz 2017-06-03 13:31:50

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Info Needed to Dev Needed

bertagaz wrote:
> anonym wrote:
> > But let’s not ship a rake task that can accidentally destroy all VM images! Please only delete disks matching /^tails-builder-/!
>
> Fair enough, I’ll add this condition. I thought about adding one that would avoid doing anything if not run by Jenkins, but in the end, for people like me who use a dedicated computer, it’ll probably be useful.

Done, I just pushed a commit implementing that. I noticed it may also be helpful for people using the internal apt-cacher-ng partition: they’ll be able to keep using it AND use this rake task to clean any remaining tails-builder-* volumes from their libvirt storage.
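
For reference, the whole task boils down to something like this sketch using the ruby-libvirt API (the pool name and the exact error handling below are illustrative assumptions, not the literal code of that commit):

require 'libvirt'

desc 'Remove leftover tails-builder-* volumes from the libvirt storage pool'
task :clean_up_libvirt_volumes do
  conn = Libvirt::open('qemu:///system')
  begin
    pool = conn.lookup_storage_pool_by_name('default')
    pool.refresh
    pool.list_volumes.each do |volume_name|
      # Safety net: never touch volumes that are not Tails builder disks.
      next unless volume_name.start_with?('tails-builder-')
      begin
        pool.lookup_volume_by_name(volume_name).delete
      rescue Libvirt::RetrieveError
        # The volume disappeared in the meantime: nothing to do.
      end
    end
  rescue Libvirt::RetrieveError
    # Expected if the pool does not exist: nothing to clean up.
  ensure
    conn.close
  end
end

On Jenkins, the jenkins-jobs side then only needs to invoke this task before each build.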

Not sure we need another QA round, given how simple it is. I’ve already tested it locally, and will see how it goes in Jenkins. If it goes well, I’ll merge that into testing.

#29 Updated by intrigeri 2017-06-03 13:49:49

> Not sure we need another QA round, given how simple it is. I’ve already tested it locally, and will see how it goes in Jenkins. If it goes well, I’ll merge that into testing.

Given the destructive effects simple mistakes / lack of robustness in this code may have, I’d rather see someone (anonym or me) review it before it’s run on innocent bystanders’ systems. (I understood the new code won’t be intentionally run by default, but rake<TAB><TAB>… mistakes are very easy to make – pun intended – at least with zsh.)

#30 Updated by intrigeri 2017-06-03 14:34:48

  • It seems to me that this branch changes the behavior of forcecleanup, so its doc is now outdated, no? Or are (some of) the modifications to clean_up_builder_vms leftovers that are now useless given we have clean_up_libvirt_volumes? I’ll now re-read the discussion above to better understand the context.
  • Typo in tails-buidler

#31 Updated by intrigeri 2017-06-03 14:43:03

After re-reading the conversation, it seems to me that the new clean_up_libvirt_volumes Rake target is enough to address the problem we’re facing, and it supersedes the changes made earlier in clean_up_builder_vms on the topic branch. Correct?

So I’d rather merge a branch that does only that: unless there’s a good reason I’ve missed, let’s please create a new branch with only the needed commits (a good time to rebase -i and clean up history a bit, BTW?). And if the changes done in clean_up_builder_vms are useful to solve another problem, please create another ticket about it, pointing to feature/12599. What do you think about this plan?

#32 Updated by bertagaz 2017-06-03 15:20:36

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:
> After re-reading the conversation, it seems to me that the new clean_up_libvirt_volumes Rake target is enough to address the problem we’re facing, and it supersedes the changes made earlier in clean_up_builder_vms on the topic branch. Correct?

Well, yes and no. clean_up_builder_vms is also there to ensure that the VM is destroyed and undefined, and targets both the Jenkins and the developer use case (the one that uses forcecleanup, at least). The new rake task is just there to ensure that no volumes are left over and the Libvirt images partition of our isobuilders doesn’t get filled because of some weird failures.

> So I’d rather merge a branch that does only that: unless there’s a good reason I’ve missed, let’s please create a new branch with only the needed commits (a good time to rebase -i and clean up history a bit, BTW?). And if the changes done in clean_up_builder_vms are useful to solve another problem, please create another ticket about it, pointing to feature/12599. What do you think about this plan?

Not sure. As said above, I think both pieces of code have their reasons to exist, so I think we can keep both of them, or maybe merge the volume removal into the clean_up_builder_vms function (closer to how I implemented it at first in commit:d27bb28a2474748123b5c61158657e779ba229bd).

I’ll assign to anonym, as he has written a good deal of this code and he may have relevant input about this discussion.

#33 Updated by intrigeri 2017-06-03 16:25:05

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Info Needed to Dev Needed

> The new rake task is just there to ensure that no volumes are left over and the Libvirt images partition of our isobuilders doesn’t get filled because of some weird failures.

Which is exactly what this ticket is about, right? :)

Summing up what I replied to bertagaz elsewhere:

  • I like the current semantics of forcecleanup, that we’ve had for a while and got used to.
  • I like the current semantics of TAILS_BUILD_OPTIONS, i.e. it (mostly?) affects the current build, rather than all future builds of any other branch. I don’t think TAILS_BUILD_OPTIONS is the best place to trigger such broad and potentially destructive changes.
  • If we need a way to properly remove all build VMs (including unmounting, destroying and undefining them before deleting storage volumes), which can be achieved thanks to the changes this topic branch brings to clean_up_builder_vms: fine by me. Let’s file another ticket about it, discuss there how it should be called, and refactor/merge the code with the one added to clean_up_libvirt_volumes. This is definitely not High priority IMO though: the problem this ticket is about has been hurting us too much for too long, so I really would rather not block on lower-priority improvements to get it fixed.

Fair enough?

#34 Updated by bertagaz 2017-06-04 09:34:31

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 10 to 50
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> > The new rake task is just there to ensure that no volumes are left over and the Libvirt images partition of our isobuilders doesn’t get filled because of some weird failures.
>
> Which is exactly what this ticket is about, right? :)
>
> Summing up what I replied to bertagaz elsewhere:
>
> * I like the current semantics of forcecleanup, that we’ve had for a while and got used to.
> * I like the current semantics of TAILS_BUILD_OPTIONS, i.e. it (mostly?) affects the current build, rather than all future builds of any other branch. I don’t think TAILS_BUILD_OPTIONS is the best place to trigger such broad and potentially destructive changes.
> * If we need a way to properly remove all build VMs (including unmounting, destroying and undefining them before deleting storage volumes), which can be achieved thanks to the changes this topic branch brings to clean_up_builder_vms: fine by me. Let’s file another ticket about it, discuss there how it should be called, and refactor/merge the code with the one added to clean_up_libvirt_volumes. This is definitely not High priority IMO though: the problem this ticket is about has been hurting us too much for too long, so I really would rather not block on lower-priority improvements to get it fixed.
>
> Fair enough?

Pushed a new branch (bugfix/12599-rake-task-removing-tails-builder-volumes) containing only the rake task, in the Tails repo as well as the jjb repo.

#35 Updated by intrigeri 2017-06-04 09:42:12

  • Feature Branch changed from feature/12599 to bugfix/12599-rake-task-removing-tails-builder-volumes,jenkins-jobs:bugfix/12599-rake-task-removing-tails-builder-volumes

#36 Updated by intrigeri 2017-06-04 09:48:49

Code review passes. I’ll deploy that and we’ll see what happens. If we’re unlucky and it breaks stuff I’ll revert quickly so we don’t have to fix the branch(es) under stress :)

#37 Updated by bertagaz 2017-06-04 10:05:33

  • blocks Bug #12637: Deploy rake libvirt volumes clean up task on all Jenkins build jobs added

#38 Updated by intrigeri 2017-06-04 10:17:13

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

Applied in changeset commit:94e99417870dff8c69bf14f7eebf478913694cf4.

#39 Updated by intrigeri 2017-06-04 10:18:15

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

Deployed and merged into testing + devel. Optimistically closing, even though only time will tell if this works well in all cases. Congrats!

#40 Updated by intrigeri 2017-06-04 10:47:32

Renamed feature/12599 to wip/feature/12599.

#41 Updated by intrigeri 2017-06-12 16:10:07

  • Status changed from Fix committed to Resolved

#42 Updated by intrigeri 2017-06-30 18:06:56

  • related to Bug #13302: /var/lib/libvirt/images sometimes gets filled on isobuilders, take 2 added

#43 Updated by intrigeri 2017-06-30 19:12:02

  • Priority changed from High to Elevated