Feature #9508

Evaluate freezable APT repo's storage needs

Added by intrigeri 2015-05-30 09:26:28. Updated 2015-06-23 09:11:19.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 2015-05-30
Due date:
% Done: 100%
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

This should be done for two different hypotheses:

  1. if we import only the packages we need;
  2. if we import the entire APT suites we fetch packages from.

In both cases, source packages must be taken into account.


Related issues

Related to Tails - Feature #9487: Research what solution to use for the freezable APT repository Resolved 2013-09-26
Related to Tails - Feature #6303: Adapt our infrastructure to be able to handle tons of packages Resolved 2016-01-04

History

#1 Updated by intrigeri 2015-05-30 11:51:57

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

Importing only selected packages

Here’s how to compute a very rough estimate of the total size of the source packages corresponding to the source+binary packages used while building Tails/Jessie, plus all the binary packages built from these source packages. It assumes that we’re only fetching stuff from Jessie main.

Get the list of source packages corresponding to source+binary packages used while building Tails/Jessie:

wget http://nightly.tails.boum.org/build_Tails_ISO_feature-jessie/latest.iso.{bin,src}pkgs && \
for PKG in $(cat latest.iso.binpkgs latest.iso.srcpkgs) ; do grep-dctrl -X -n -FPackage -sSource $PKG /var/lib/apt/lists/ftp.us.debian.org_debian_dists_jessie_*_binary-amd64_Packages ; done | cut -d ' ' -f1 | sort -u > srcpkgs
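The `cut -d ' ' -f1 | sort -u` tail of that pipeline strips the optional version that a Source field may carry and removes duplicates. A minimal sketch of just that step, using made-up field values (`foo`, `bar` and the version are placeholders, not real package data):

```shell
# Hypothetical Source field values: one with a version in parentheses,
# one without, and one duplicate.
normalized=$(printf '%s\n' 'foo (1.0-1)' 'bar' 'foo' | cut -d ' ' -f1 | sort -u)

# Only the bare source package names remain, one per line.
echo "$normalized"
```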

And then, in a Jessie i386 chroot with deb-src enabled, get the size of files from the source packages and corresponding binary packages built from them, and compute the total:

rm -f sizes
for SRCPKG in $(cat srcpkgs) ; do
  (
    grep-dctrl -S -sChecksums-Sha256 --no-field-names --exact-match $SRCPKG /var/lib/apt/lists/*_dists_jessie_*_source_Sources | grep -E -v '^\s*$' | cut -d ' ' -f3
    grep-dctrl -S -sSize --no-field-names --eregex "$SRCPKG(\s+\(.*\))?" /var/lib/apt/lists/*_dists_jessie_*_binary-i386_Packages 
  ) >> sizes
done
TOTAL=0
for s in $(cat sizes) ; do
  TOTAL=$(($TOTAL + $s))
done
echo $TOTAL

=> 14857315433 bytes, i.e. 15GB.
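The summing loop can also be written as a single awk pass, followed by a conversion from bytes to GB. This is only a sketch: the sample sizes are invented, `to_gb` is a hypothetical helper name, and GB is assumed to mean 10^9 bytes.

```shell
# Hypothetical sample input: one size in bytes per line, like the "sizes" file.
sample=$(mktemp)
printf '%s\n' 1000000000 2500000000 500000000 > "$sample"

# Sum every line in a single awk pass; %.0f keeps large totals out of
# scientific notation.
total_bytes=$(awk '{ t += $1 } END { printf "%.0f\n", t }' "$sample")
rm -f "$sample"

# Convert a byte count to decimal gigabytes (10^9 bytes), one decimal place.
to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e9 }'
}

echo "sample total: $total_bytes bytes = $(to_gb "$total_bytes") GB"
echo "ticket figure: $(to_gb 14857315433) GB"
```

Applied to the figure reported above, the conversion gives about 14.9 GB, which is rounded to 15GB here.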

A similar computation for current Tails stable (Wheezy) outputs about 12GB.

And soon enough, we’ll need the same for Stretch; assuming growth similar to that from Wheezy to Jessie (25%), let’s say 19GB.

In total, that’s 46GB. On top of that, let’s assume a 10% yearly growth due to security updates and point-releases, and all in all this gives us a total of about 50GB in a year.
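The arithmetic behind these figures can be replayed with shell integer arithmetic. This is a rough sketch using the rounded GB values quoted above; the variable names are made up for illustration.

```shell
# Per-release estimates in GB, taken from the figures above.
wheezy=12
jessie=15

# Stretch is projected as Jessie plus 25%, rounded to the nearest GB
# (adding 50 before dividing by 100 rounds instead of truncating).
stretch=$(( (jessie * 125 + 50) / 100 ))

subtotal=$((wheezy + jessie + stretch))

# Apply 10% yearly growth; integer division truncates the result.
after_one_year=$((subtotal * 110 / 100))

echo "Stretch: ${stretch}GB, subtotal: ${subtotal}GB, after one year: ~${after_one_year}GB"
```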

Importing entire APT suites for i386 and source

All dists

According to https://www.debian.org/mirror/size, as of today:

  • source = 77GB
  • all = 112GB
  • i386 = 114GB
  • backports = 55GB for all archs, so likely ~20GB for source+all+i386
  • security = 54GB for all archs, so likely ~20GB for source+all+i386

Total = 340GB
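For reference, the exact sum of the listed figures is slightly above the rounded total. A quick check, taking the two "~20GB" guesses at face value (variable names are illustrative only):

```shell
# Sizes in GB from the list above; the last two are the rough ~20GB guesses.
source_gb=77
all_gb=112
i386_gb=114
backports_gb=20
security_gb=20

total_gb=$((source_gb + all_gb + i386_gb + backports_gb + security_gb))
echo "total: ${total_gb}GB"
```

This comes to 343GB, which is rounded down to 340GB here.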

Only selected dists

As of today, this excludes the *-updates, squeeze*, backports and security dists, compared to mirroring the entire archive. Also, we’re not counting indices, but their size is negligible compared to the actual packages’.

debmirror -v --progress --dry-run --proxy=http://127.0.0.1:3142/ --method=http -h ftp.us.debian.org -d wheezy,jessie,testing,sid -s main,contrib,non-free -a i386 --rsync-extra=none --source ./debmirror
cd debmirror/.temp
grep-dctrl -sPackage,Version -S --no-field-names --eregex '.*' dists/*/*/source/Sources  | grep -v -E '^\s*$' | perl -E 'my $in_pkg = 1; while (my $l = <>) { if ($in_pkg) { chomp $l; print $l; $in_pkg = 0} else { print " $l" ; $in_pkg = 1} }' | sort -u | while read srcpkg version ; do grep-dctrl -sChecksums-Sha256  --no-field-names  -S -X $srcpkg -a -FVersion -X $version  dists/*/*/source/Sources ; done  | grep -E -v '^\s*$' | cut -d ' ' -f3 > srcpkgs_sizes
grep-dctrl -sPackage,Version,Size -P --no-field-names --eregex '.*' dists/*/*/binary-i386/Packages | grep -v -E '^\s*$' | perl -E 'my $in = "pkg"; while (my $l = <>) { if ($in eq "pkg") { chomp $l; print $l; $in = "version" } elsif ($in eq "version") { chomp $l; print " $l"; $in = "size" } else { print " $l" ; $in = "pkg" } }' | sort -u | cut -d ' ' -f3 > binpkgs_sizes
TOTAL=0
for s in $(cat {bin,src}pkgs_sizes) ; do
  TOTAL=$(($TOTAL + $s))
done
echo $TOTAL

Total = 340GB (which feels a bit wrong given the above results; is there a mistake somewhere in the process?)
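Since this total looks suspicious, one way to narrow things down is to sum each size file separately and check that neither is missing or empty. A minimal sketch, assuming srcpkgs_sizes and binpkgs_sizes sit in the current directory with one byte count per line; `sum_gb` is a hypothetical helper name and GB is taken as 10^9 bytes.

```shell
# Sum one file of byte counts and print the total in GB, one decimal place.
sum_gb() {
  awk '{ t += $1 } END { printf "%.1f\n", t / 1e9 }' "$1"
}

# Report each input separately so a missing or empty file is noticed
# instead of silently dropping out of the grand total.
for f in srcpkgs_sizes binpkgs_sizes ; do
  if [ -s "$f" ] ; then
    echo "$f: $(sum_gb "$f") GB"
  else
    echo "$f: missing or empty" >&2
  fi
done
```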

#2 Updated by intrigeri 2015-05-30 13:42:19

  • related to Feature #9487: Research what solution to use for the freezable APT repository added

#3 Updated by intrigeri 2015-05-30 13:42:38

  • related to Feature #6303: Adapt our infrastructure to be able to handle tons of packages added

#4 Updated by intrigeri 2015-05-30 19:18:13

  • Status changed from In Progress to Resolved
  • % Done changed from 10 to 100

As stated on Feature #9488#note-4, I’m now strongly leaning towards importing full APT dists, so let’s say the frozen APT repo will need 340GB + 25% growth = 425GB in a year.
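The projection is a simple percentage increase; as a sketch in shell integer arithmetic (variable names are illustrative):

```shell
# Current estimate for full APT dists, in GB, and the assumed yearly growth.
current_gb=340
growth_percent=25

projected_gb=$((current_gb * (100 + growth_percent) / 100))
echo "projected size in a year: ${projected_gb}GB"
```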

#5 Updated by bertagaz 2015-06-23 09:11:19

  • Assignee deleted (intrigeri)

Unassigning a resolved ticket.