Bug #16020

acngtool shrink is insufficient to maintain acng cache size

Added by anonym 2018-10-02 13:15:53. Updated 2019-05-06 18:15:42.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Build system
Target version:
Start date:
2018-10-02
Due date:
% Done:

100%

Feature Branch:
bugfix/16020-fix-cache-shrinking
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

In build-tails we call acngtool shrink 10G before each build to prevent the cache from running out of disk space. From what I can tell it doesn’t clean APT indices correctly: e.g. in my /var/cache/apt-cacher-ng/time-based.snapshots.deb.tails.boum.org/debian I have snapshots dating back to January 2017. Each such snapshot takes 30-120 MB (the old multiarch ones especially are large), so it adds up; for me, to 8 GB. :S

Either we need to improve acngtool (for everyone’s benefit), or we manually find snapshots older than six months (or whatever) and purge them from acng’s cache, e.g. along the lines of the sketch below.
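
For the manual approach, a minimal sketch (assumptions: snapshot directories sit one level below the per-distribution directories and their mtimes reflect their age; the path and the 180-day cutoff are illustrative):

# Purge snapshot directories untouched for ~6 months.
# Swap "echo" in for "rm -rf" on a first run to preview what would go.
find /var/cache/apt-cacher-ng/time-based.snapshots.deb.tails.boum.org \
    -mindepth 2 -maxdepth 2 -type d -mtime +180 \
    -exec rm -rf {} +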


Subtasks


Related issues

Related to Tails - Bug #17288: ENOSPC during build, while upgrading the Vagrant box Resolved
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed

History

#1 Updated by bertagaz 2018-10-02 22:15:42

I’ve noticed that too locally: acng is quite sloppy about shrinking to the maximum size it is given, and always ends up at a somewhat higher value. I “solved” that by just lowering the number to get closer to my needs.
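
For instance (the 8G value here is just an illustration of “a lower number”):

# Ask acngtool for a smaller target than strictly needed, to
# compensate for it overshooting the requested maximum.
/usr/lib/apt-cacher-ng/acngtool shrink 8G -f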

#2 Updated by anonym 2018-10-03 09:12:11

That didn’t work for me: no matter the size argument I gave, nothing was removed. It is as if too high a ratio of non-debs (APT indices, TBB tarballs) throws off its calculations, and nothing gets freed.

#3 Updated by anonym 2018-10-08 15:24:37

intri suggested that acng’s daily cronjob should be able to clean it up, but that it takes a long time: “(it fetches all dists again to identify obsolete packages) so I doubt we can do that at every build”.
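
For reference, that clean-up can also be triggered by hand; a sketch, assuming the Debian packaging where the daily cron job drives acngtool’s maintenance task:

# Runs apt-cacher-ng's own expiration job; slow, since it re-fetches
# all dists to identify obsolete packages (as noted above).
/usr/lib/apt-cacher-ng/acngtool maint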

#4 Updated by segfault 2018-10-08 15:25:23

I had the same issue (see Bug #16032)

#5 Updated by anonym 2018-10-08 15:27:33

anonym wrote:
> intri suggested that acng’s daily cronjob should be able to clean it up, but that it takes a long time: “(it fetches all dists again to identify obsolete packages) so I doubt we can do that at every build”.

Or maybe not:

(17:25:53) intrigeri: ah ah on lizard, we do
"rm -rf /var/cache/apt-cacher-ng/*.tails.boum.org"
weekly, so no, the cronjob is not what saves us there.

#6 Updated by intrigeri 2019-03-18 06:38:04

#7 Updated by intrigeri 2019-03-18 06:39:46

  • Assignee deleted (anonym)

After segfault & anonym, @CyrilBrulebois was affected by this problem yesterday ⇒ added to the FT’s radar.

No progress here in a while ⇒ deassigning anonym for now; let’s make it clear that this ticket is up for grabs and can be tackled by whoever else has time for it :)

#8 Updated by CyrilBrulebois 2019-03-18 13:52:53

  • Assignee set to CyrilBrulebois

Hit this yesterday or the day before; it’s next on my list.

#9 Updated by CyrilBrulebois 2019-03-21 07:30:11

Let’s look at our code calling acngtool shrink (vagrant/provision/assets/build-tails):

if [ "${TAILS_PROXY_TYPE}" = "vmproxy" ]; then
    # The apt-cacher-ng cache disk is 15G, so let's ensure at most 10G
    # of it is used (i.e. at least 5G is free) before each build, which
    # should be enough for any build, even if we have to download a
    # complete set of new packages for a new Debian release.
    /usr/lib/apt-cacher-ng/acngtool shrink 10G -f || \
        echo "The clean-up of apt-cacher-ng's cache failed: this is" \
             "not fatal and most likely just means that some disk" \
             "space could not be reclaimed -- in order to fix that" \
             "situation you need to manually investigate " \
             "/var/cache/apt-cacher-ng/apt-cacher-ng-log/main_*.html" >&2
fi

It seems pretty straightforward: the || catches a failing exit code and displays an error message when that happens (while carrying on).

But upstream code (source/acngtool.cc in apt-cacher-ng 2-2) has:

        if(verbose)
                cout << "Found " << totalSize << " bytes of relevant data, reducing to " << wantedSize << endl;
        while(!delQ.empty())
        {
                bool todel = (totalSize > wantedSize);
                totalSize -= delQ.top().size;
                const char *msg = 0;
                if(verbose || dryrun)
                        msg = (todel ? "Delete: " : "Keep: " );
                auto& delpath(delQ.top().path);
                if(msg)
                        cout << msg << delpath << endl << msg << delpath << ".head" << endl;
                if(todel && apply)
                {
                        unlink(delpath.c_str());
                        unlink(mstring(delpath + ".head").c_str());
                }
                delQ.pop();
        }
        return 0;

Notice the utter lack of error handling, and the unconditional return 0;? That’s why we have missed this for so long: that command never fails. In verbose mode it looks like a lot of work is going on, with many “Delete:” and a few “Keep:” entries, but the filesystem is left untouched.
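
One way to see this for yourself (a hedged sketch: -v enables the verbose mode mentioned above, and running as the unprivileged build user reproduces the silent failure):

du -sh /var/cache/apt-cacher-ng            # note the size before
# Unprivileged, each unlink() fails (e.g. EACCES), and since the
# return value is ignored the command still exits 0.
/usr/lib/apt-cacher-ng/acngtool shrink 10G -f -v; echo "exit: $?"
du -sh /var/cache/apt-cacher-ng            # same size, yet exit is 0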

Prepending as_root_do to the command fixes the shrinking…
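
Concretely, the fixed call looks something like this (a sketch of the change, based on the snippet above; as_root_do is the build system’s existing run-as-root helper):

    # Run acngtool as root so its unlink() calls can actually remove
    # files under /var/cache/apt-cacher-ng, which the build user
    # cannot write to; the || error handling stays as before.
    as_root_do /usr/lib/apt-cacher-ng/acngtool shrink 10G -f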

#10 Updated by CyrilBrulebois 2019-03-21 07:36:10

  • Status changed from Confirmed to In Progress

Applied in changeset commit:tails|f68a97af7ce475c1efcd7a7208b60995adf59d52.

#11 Updated by CyrilBrulebois 2019-03-21 07:37:06

  • Status changed from In Progress to Confirmed
  • Assignee deleted (CyrilBrulebois)
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/16020-fix-cache-shrinking

#12 Updated by intrigeri 2019-03-21 08:12:18

  • Assignee set to segfault
  • Target version set to Tails_3.14

Thanks a lot, kibi, for your work here :)

Hi @segfault! Last time I checked you used the apt-cacher-ng maintained by our build system. Could you please review this branch?

#13 Updated by anonym 2019-04-02 15:53:36

  • Status changed from Confirmed to Fix committed
  • % Done changed from 0 to 100

Applied in changeset commit:tails|3269cb5870f1de236ec2724e43214ee923823313.

#14 Updated by anonym 2019-04-02 15:53:50

  • Assignee deleted (segfault)
  • % Done changed from 100 to 0
  • QA Check changed from Ready for QA to Pass

I tested it, and it worked perfectly for me! Woohoo!

I’ve merged into stable and devel, but skipped feature/buster so as not to force us all to build a new basebox, given our bandwidth limitations during the sprint.

#15 Updated by anonym 2019-04-02 15:53:59

  • % Done changed from 0 to 100

#16 Updated by intrigeri 2019-05-05 08:24:00

  • Target version changed from Tails_3.14 to Tails_3.13.2

#17 Updated by anonym 2019-05-06 15:00:39

  • Status changed from Fix committed to Resolved

#18 Updated by anonym 2019-05-06 15:03:17

  • Target version changed from Tails_3.13.2 to Tails_3.14

#19 Updated by intrigeri 2019-05-06 18:15:42

  • Target version changed from Tails_3.14 to Tails_3.13.2

#20 Updated by intrigeri 2019-12-12 07:10:29

  • related to Bug #17288: ENOSPC during build, while upgrading the Vagrant box added