Bug #12725

Sort out the apt-snapshots-disk partition situation on apt.lizard

Added by bertagaz 2017-06-16 13:50:25. Updated 2017-07-04 08:29:16.

Status: Resolved
Priority: High
Assignee:
Category: Infrastructure
Target version:
Start date: 2017-06-16
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:
Deliverable for: 289

Description

Because the Vagrant builds in Jenkins were deployed before Feature #12002 was done, we could not grow this partition as much as planned and needed. On top of that, we had to freeze a few more APT snapshots than usual for Feature #5630. So our monitoring is rightfully complaining that the disk space left on the apt-snapshots-disk partition of apt.lizard is critical.


Subtasks


Related issues

Related to Tails - Bug #12829: FTBFS due to buggy APT sources set in Vagrant build box by the provision script (Resolved, 2017-06-21)
Related to Tails - Feature #12002: Estimate hardware cost of reproducible builds in Jenkins (Resolved, 2016-11-28)
Related to Tails - Bug #13526: apt-snapshots partition lacks disk space (Resolved, 2017-07-27)

History

#1 Updated by intrigeri 2017-06-17 07:50:02

Given this operation is destructive and hard to revert (we don’t back up time-based snapshots), I strongly suggest you give me a list of the snapshots you want to force garbage collection for (and how you built the list) so I can review it before you delete stuff. Thanks!

#2 Updated by bertagaz 2017-06-19 09:33:23

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 0 to 20
  • QA Check set to Info Needed

intrigeri wrote:
> Given this operation is destructive and hard to revert (we don’t back up time-based snapshots), I strongly suggest you give me a list of the snapshots you want to force garbage collection for (and how you built the list) so I can review it before you delete stuff. Thanks!

True. So here’s what I found after searching for APT snapshots whose Valid-Until is not in June:

  • 2017040603, Valid-Until set to Sat, 15 Jul 2017
  • 2017042704, Valid-Until set to Sat, 28 Oct 2017

The former is the 2.12 snapshot and will disappear soon. The latter is one we’ve bumped while working on the Vagrant builds. It’s not used anymore AFAIK, since the basebox snapshot serials have been bumped with the 3.0 release, so it seems a good candidate for garbage collection.
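The search described above can be sketched roughly as follows. This is a minimal illustration, not the command that was actually run: the `VALID_UNTIL` mapping and the `not_expiring_in` helper are hypothetical names, the two real entries are the values listed above, and the June entry is a made-up placeholder that only shows what gets filtered out.

```python
from datetime import datetime

# Valid-Until values as listed in this comment. The last entry is a
# made-up placeholder (not a real serial) showing an excluded snapshot.
VALID_UNTIL = {
    "2017040603": "Sat, 15 Jul 2017",
    "2017042704": "Sat, 28 Oct 2017",
    "hypothetical-june-serial": "Sat, 24 Jun 2017",
}


def not_expiring_in(snapshots, year, month):
    """Return the serials whose Valid-Until is outside the given month."""
    selected = []
    for serial, value in snapshots.items():
        # %a and %b are locale-dependent; this assumes English names.
        expiry = datetime.strptime(value, "%a, %d %b %Y")
        if (expiry.year, expiry.month) != (year, month):
            selected.append(serial)
    return sorted(selected)


print(not_expiring_in(VALID_UNTIL, 2017, 6))
# → ['2017040603', '2017042704']
```

Building the candidate list this way keeps the selection rule explicit, which makes it easier to review before anything is deleted.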

#3 Updated by intrigeri 2017-06-19 11:25:48

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Dev Needed

> * 2017040603, Valid-Until set to Sat, 15 Jul 2017
> * 2017042704, Valid-Until set to Sat, 28 Oct 2017

> The former is the 2.12 snapshot and will disappear soon.

OK, so this one is fully expected, and I don’t see any reason to remove it manually.
(If we need to remove it manually, fine, but then something’s seriously wrong and the root cause should be tracked elsewhere.)

> The latter is one we’ve bumped while working on the Vagrant builds. It’s not used anymore AFAIK, since the basebox snapshot serials have been bumped with the 3.0 release, so it seems a good candidate for garbage collection.

OK, let’s force early expiration for this one then.

#4 Updated by bertagaz 2017-06-20 12:49:52

  • Status changed from Confirmed to In Progress
  • % Done changed from 20 to 50

intrigeri wrote:
> > * 2017040603, Valid-Until set to Sat, 15 Jul 2017
> > * 2017042704, Valid-Until set to Sat, 28 Oct 2017
>
> > The former is the 2.12 snapshot and will disappear soon.
>
> OK, so this one is fully expected, and I don’t see any reason to remove it manually.
> (If we need to remove it manually, fine, but then something’s seriously wrong and the root cause should be tracked elsewhere.)

Yes.

> > The latter is one we’ve bumped while working on the Vagrant builds. It’s not used anymore AFAIK, since the basebox snapshot serials have been bumped with the 3.0 release, so it seems a good candidate for garbage collection.
>
> OK, let’s force early expiration for this one then.

Done, it will be garbage collected tomorrow. Let’s see if it fixes the situation.

#5 Updated by intrigeri 2017-06-24 17:55:46

  • Subject changed from Clean up apt-snapshots-disk partition on apt.lizard to Sort out the apt-snapshots-disk partition situation on apt.lizard
  • Priority changed from Normal to High

bertagaz wrote:
> Done, it will be garbage collected tomorrow. Let’s see if it fixes the situation.

Time is passing, and as of today our snapshots system can’t do its job anymore:

NOT ENOUGH FREE SPACE on filesystem 0xfe20 (the filesystem '/srv/apt-snapshots/time-based/repositories/debian/db' is on)
available blocks 832090, needed blocks 1084079, block size is 4096.
"/usr/bin/reprepro" unexpectedly returned exit value 255 at /usr/local/bin/tails-update-time-based-apt-snapshots line 40.
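To put the numbers from this error message in perspective, here is a small worked conversion from blocks to GiB. It only uses the three figures printed above; the constant and function names are illustrative.

```python
# Figures copied from the reprepro error message above.
BLOCK_SIZE = 4096
AVAILABLE_BLOCKS = 832090
NEEDED_BLOCKS = 1084079

GIB = 1024 ** 3


def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a number of filesystem blocks to GiB."""
    return blocks * block_size / GIB


shortfall_blocks = NEEDED_BLOCKS - AVAILABLE_BLOCKS

print(f"available: {blocks_to_gib(AVAILABLE_BLOCKS):.2f} GiB")  # → available: 3.17 GiB
print(f"needed:    {blocks_to_gib(NEEDED_BLOCKS):.2f} GiB")     # → needed:    4.14 GiB
print(f"shortfall: {blocks_to_gib(shortfall_blocks):.2f} GiB")  # → shortfall: 0.96 GiB
```

So this particular run was short by a bit less than 1 GiB.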

#6 Updated by intrigeri 2017-06-25 10:10:37

  • related to Bug #12829: FTBFS due to buggy APT sources set in Vagrant build box by the provision script added

#7 Updated by intrigeri 2017-06-25 11:41:47

We’re far from having allocated everything planned for this partition, so I’m growing it by 10GB in the hope that’s enough to unbreak this temporarily. But a real solution is needed, and we don’t have enough space available to grow this partition as much as we planned, due to the Vagrant thing having been deployed and Feature #12002 not being done yet.

#8 Updated by intrigeri 2017-06-25 11:43:05

  • related to Feature #12002: Estimate hardware cost of reproducible builds in Jenkins added

#9 Updated by intrigeri 2017-06-30 19:04:54

  • Description updated

#10 Updated by intrigeri 2017-06-30 19:06:39

intrigeri wrote:
> > * 2017040603, Valid-Until set to Sat, 15 Jul 2017
>
> > The former is the 2.12 snapshot and will disappear soon.
>
> OK, so this one is fully expected, and I don’t see any reason to remove it manually.
> (If we need to remove it manually, fine, but then something’s seriously wrong and the root cause should be tracked elsewhere.)

Now that the root cause is clear, please do force early expiration of this one too, as yet another temporary mitigation of the problematic situation we’re in until the root cause is solved.

#11 Updated by bertagaz 2017-07-02 10:53:32

intrigeri wrote:
> Now that the root cause is clear, please do force early expiration of this one too, as yet another temporary mitigation of the problematic situation we’re in until the root cause is solved.

Ack, done. 2017040603 will expire tomorrow, we’ll see how much disk space it frees.

#12 Updated by bertagaz 2017-07-03 13:02:25

bertagaz wrote:
> Ack, done. 2017040603 will expire tomorrow, we’ll see how much disk space it frees.

Expiring this 2.12 snapshot has freed around 38G. We’re back to green here for now, with 54G left.

#13 Updated by intrigeri 2017-07-04 08:23:21

  • Status changed from In Progress to Resolved

> Expiring this 2.12 snapshot has freed around 38G. We’re back to green here for now, with 54G left.

Great! Closing then: this ticket was about the short-term emergency situation we were in, and the long-term fix is tracked by Feature #11806 (which is itself blocked by Feature #12002).

#14 Updated by intrigeri 2017-07-04 08:29:16

  • Assignee deleted (bertagaz)
  • % Done changed from 50 to 100
  • QA Check changed from Dev Needed to Pass

#15 Updated by intrigeri 2017-07-29 06:07:49

  • related to Bug #13526: apt-snapshots partition lacks disk space added