Feature #6295

Evaluate consequences of importing large amounts of packages into reprepro

Added by intrigeri 2013-09-26 05:26:21. Updated 2016-01-04 15:03:21.

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Infrastructure
Target version:
Start date:
2013-09-26
Due date:
% Done:

100%

Feature Branch:
Type of work:
Sysadmin
Starter:
0
Affected tool:
Deliverable for:
269

Description

Try it and see how it behaves.

The information we need is:

  • information about the hardware and system load it was tested on
  • how much time each operation takes
  • how much peak memory each operation takes
  • disk space used after each operation
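A minimal sketch of how these numbers could be collected for each reprepro operation, assuming a Linux host and Python 3. The `measure` and `dir_size_bytes` helpers are hypothetical, not part of reprepro. Peak memory comes from `resource.getrusage(RUSAGE_CHILDREN)`, system load from `os.getloadavg()`, and disk usage is the apparent size of the files under the repository directory, which can differ somewhat from what `du` reports.

```python
import os
import resource
import subprocess
import time


def dir_size_bytes(path):
    """Total apparent size of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def measure(cmd, repo_dir):
    """Run cmd; report wall-clock time, peak child RSS, load and disk usage."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    # ru_maxrss is in KiB on Linux (bytes on macOS). For RUSAGE_CHILDREN it is
    # the maximum over all children waited for so far, so run one operation
    # per Python process to get a per-operation peak.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "seconds": elapsed,
        "peak_rss_kib": peak_kib,
        "load_avg": os.getloadavg(),  # 1, 5 and 15 minute load averages
        "disk_bytes": dir_size_bytes(repo_dir),
    }
```

For example, `measure(["reprepro", "-b", "/srv/reprepro", "update"], "/srv/reprepro")`, where the base directory path is made up for illustration.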

For the set of packages to import, as a first approximation, use the *.packages file found at https://tails.boum.org/torrents/files/. It lists binary packages, so the first step is to convert this list into the corresponding list of source packages, based on the APT sources (plus the corresponding deb-src lines) configured in Tails. This part of the work can well be done in a very hackish way to start with (baby steps!), but a nicer solution will have to be found later (Feature #6297).

Other people already import large sets of packages into reprepro, e.g. to maintain local mirrors:
http://vincent.bernat.im/en/blog/2014-local-apt-repositories.html

After the initial evaluation, we’ll want to keep an eye on resource usage and performance in production-like settings.


Subtasks


Related issues

Blocks Tails - Feature #6303: Adapt our infrastructure to be able to handle tons of packages Resolved 2016-01-04

History

#1 Updated by intrigeri 2013-12-29 03:20:31

  • Category set to Infrastructure

#2 Updated by sajolida 2013-12-31 04:57:19

We’ve been advised to ask the Grml people about their setup, which was designed to answer similar questions.

<http://deb.grml.org/>

#3 Updated by intrigeri 2014-06-21 13:21:15

#4 Updated by intrigeri 2015-05-28 15:25:44

#5 Updated by intrigeri 2015-05-28 15:27:03

#6 Updated by intrigeri 2015-05-28 15:28:14

#7 Updated by intrigeri 2015-05-28 15:29:27

  • blocks #8668 added

#8 Updated by intrigeri 2015-05-28 15:29:52

  • Assignee set to intrigeri
  • Target version changed from Sustainability_M1 to Tails_2.3

#9 Updated by intrigeri 2015-05-28 15:46:20

  • Target version changed from Tails_2.3 to 246

#10 Updated by intrigeri 2015-05-28 15:53:33

  • blocked by deleted (Feature #6303: Adapt our infrastructure to be able to handle tons of packages)

#11 Updated by intrigeri 2015-05-30 13:41:54

  • Description updated

#12 Updated by intrigeri 2015-05-30 17:42:20

#13 Updated by intrigeri 2015-08-26 05:59:40

  • Deliverable for set to 269

#14 Updated by intrigeri 2015-10-21 08:11:14

  • Status changed from Confirmed to In Progress
  • reprepro update to clone wheezy main i386 from a local mirror: 30 minutes => with a remote mirror, the network will be the limiting factor
  • reprepro pull to snapshot that clone into a new, empty distribution: blazingly fast
  • doing the same 200 more times: the first cloning operations take 4-5 seconds, and this time rises to 15-20 seconds for the last ones; the resulting db/packages.db file is 20GB
  • doing the same 600 more times: the first cloning operation takes 3 minutes (presumably because lots of data needs to be read from disk), and the following ones take 15-20 seconds each; the resulting db/packages.db file is 78GB
  • the raw number of entries in conf/distributions has little effect; what slows down operations is importing stuff into these distributions
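The repeated snapshot experiment above could be reproduced with generated configuration. This sketch emits one hypothetical `<source>-snap-NNNN` distribution per snapshot in reprepro's `conf/distributions` stanza format, all sharing a single `conf/pulls` rule that copies from the source distribution. The naming scheme and the default architecture and component are assumptions chosen to match the wheezy main i386 test.

```python
def snapshot_config(source, count, architectures="i386", components="main"):
    """Return (conf/distributions text, conf/pulls text) for `count` snapshots."""
    rule = f"from-{source}"
    # One pull rule is enough: every snapshot distribution pulls from `source`.
    pulls = f"Name: {rule}\nFrom: {source}\n"
    stanzas = []
    for i in range(1, count + 1):
        stanzas.append(
            f"Codename: {source}-snap-{i:04d}\n"
            f"Architectures: {architectures}\n"
            f"Components: {components}\n"
            f"Pull: {rule}\n"
        )
    # Stanzas in conf/distributions are separated by blank lines.
    return "\n".join(stanzas), pulls
```

The output would be appended to `conf/distributions` and `conf/pulls`; each snapshot is then taken with e.g. `reprepro -b BASEDIR pull wheezy-snap-0001`, timing each call separately.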

#15 Updated by intrigeri 2015-10-22 09:18:21

  • Description updated

#16 Updated by intrigeri 2015-10-22 09:19:15

  • Assignee changed from intrigeri to CyrilBrulebois
  • % Done changed from 0 to 10

#17 Updated by intrigeri 2015-10-22 09:43:35

  • Assignee changed from CyrilBrulebois to intrigeri

(Err, the next step, i.e. setting up time-based snapshots on our infra and checking how it goes, is on my own plate.)

#18 Updated by intrigeri 2015-10-23 09:36:03

After the initial mirroring, with reprepro update, of (Wheezy, Jessie) * i386 + (Stretch, sid, experimental) * (amd64, i386) into 5 distributions, the reprepro directory is 205GB, of which 833MB is in db/. After adding (oldstable * (base, updates, p-u, backports, sloppy-backports) + stable * (base, updates, p-u, backports) + testing * (base, updates, p-u) + sid + experimental), with i386 for each and amd64 for Stretch and newer: 226GB, including a 902MB DB.

#19 Updated by intrigeri 2015-10-24 02:47:30

  • Blueprint set to https://tails.boum.org/blueprint/freezable_APT_repository/

I’m doing tests with reprepro gensnapshot and will report about it on the blueprint.

#20 Updated by intrigeri 2015-11-02 02:46:02

  • Target version changed from 246 to Tails_1.8
  • % Done changed from 10 to 90
  • Starter changed from Yes to No

I think we’re done here. The only evaluation left is about disk space (since it blocks purchasing hardware), which I’ll move to another ticket since it no longer blocks our design choices. Off the top of my head, that would be:

  • for time-based snapshots, we don’t have both enough storage space and good 24/7 bandwidth in the same place, so we need to estimate based on:
    • total storage space needed by a complete mirror with no snapshots
    • apply to that the (size added by keeping N days of incremental snapshots / size of an initial mirror) ratio found for a partial mirror (some architectures disabled)
  • for tagged snapshots?
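The time-based estimate described above can be written as follows. The notation is introduced here for clarity: $S_{\text{complete}}$ is the size of a complete mirror with no snapshots, and $S_{\text{partial}}(0)$ and $S_{\text{partial}}(N)$ are the sizes of the partial mirror initially and after keeping N days of incremental snapshots. It assumes that the snapshot overhead scales proportionally with the size of the mirror.

```latex
S_{\text{time-based}} \approx S_{\text{complete}} \times \frac{S_{\text{partial}}(N)}{S_{\text{partial}}(0)}
  = S_{\text{complete}} \times \left(1 + \frac{S_{\text{partial}}(N) - S_{\text{partial}}(0)}{S_{\text{partial}}(0)}\right)
```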

#21 Updated by intrigeri 2015-12-13 05:48:24

  • Target version changed from Tails_1.8 to Tails_2.0

#22 Updated by intrigeri 2015-12-13 07:23:52

Starting everything needed and noting the initial stats, so that I have numbers for the time-based snapshots in ~10 days.

#23 Updated by intrigeri 2015-12-13 12:11:04

  • blocks Feature #6303: Adapt our infrastructure to be able to handle tons of packages added

#24 Updated by intrigeri 2015-12-13 12:18:38

  • blocked by deleted (Feature #6296: Configure reprepro to pull from foreign APT repositories)

#25 Updated by intrigeri 2015-12-13 12:20:12

  • Priority changed from Normal to Elevated

This will be blocking the actual deployment so I’d like to get it done early.

#26 Updated by intrigeri 2015-12-14 03:36:12

Initial partial mirror (not all suites/archs) on misc.lizard:

  • debian: 235G
  • debian-security: 9.5G
  • tails: 90M
  • torproject: 45M

=> total = 245G

#27 Updated by intrigeri 2015-12-24 06:53:24

10 days later:

  • Debian: 292G
  • debian-security: 12G
  • tails: 269M
  • torproject: 56M

Total = 304G, that is +24%.
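The +24% figure can be rechecked from the per-directory sizes reported in #26 and #27. A small sketch, assuming the sizes are `du -h`-style values where 1G = 1024M; the `to_gib` and `total` helpers are hypothetical.

```python
def to_gib(size):
    """Convert a du -h style size such as '9.5G' or '269M' to GiB."""
    factors = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}
    return float(size[:-1]) * factors[size[-1].upper()]


def total(sizes):
    return sum(to_gib(s) for s in sizes.values())


initial = {"debian": "235G", "debian-security": "9.5G",
           "tails": "90M", "torproject": "45M"}
after_10_days = {"debian": "292G", "debian-security": "12G",
                 "tails": "269M", "torproject": "56M"}

t0, t10 = total(initial), total(after_10_days)
print(f"{t0:.0f}G -> {t10:.0f}G, growth {t10 / t0 - 1:+.0%}")
```

This prints `245G -> 304G, growth +24%`, matching the totals above.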

#28 Updated by intrigeri 2015-12-29 01:20:55

Complete mirror without snapshots:

  • Debian: 330G
  • debian-security: 13G
  • tails: 287M
  • torproject: 44M

Total = 343G

#29 Updated by intrigeri 2016-01-04 15:01:03

  • time-based snapshots: in the current state of the archive, (size of a complete mirror without snapshots) * (size after keeping N days of incremental snapshots / size of an initial mirror) = 343G * 1.24 = 425G. Assuming +25% growth/year, in a year that’ll be 425G * 1.25 = 531G
  • tagged snapshots:
    • assuming packages are not stored multiple times (that is, we import them into a single, persistent reprepro instance, even though the filterlist etc. used for importing are volatile): Feature #9508’s results plus amd64, for 3 versions of Debian, should be around (15 + 15*1.25 + 15*1.25*1.25) * 1.43 (for adding amd64) = 82G; add 10% for security updates etc. over a year => 90G
    • I’ll have more up-to-date numbers once I’ve tested tails-prepare-tagged-apt-snapshot-import (Feature #10749), but let’s not block on it

=> 531G + 90G ≈ 620G in a year should be a good enough estimate to let us buy the hardware and stop blocking on lack of disk space.
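The arithmetic of this estimate can be rechecked with a few lines of Python. This only replays the numbers and assumptions stated above (the 1.24 snapshot ratio, +25% growth per year, a 15G base per Debian release with each release 25% larger than the previous one, ×1.43 for amd64, +10% for security updates); the variable names are made up, and the time-based figure is rounded to 425G before applying growth, as in the ticket.

```python
complete_mirror = 343   # G, complete mirror without snapshots (#28)
snapshot_ratio = 1.24   # partial mirror after 10 days / initial (#27)
yearly_growth = 1.25    # assumed archive growth per year
release_growth = 1.25   # each release assumed 25% larger than the previous

# Time-based snapshots, rounded to whole gigabytes as in the ticket.
time_based_now = round(complete_mirror * snapshot_ratio)
time_based_in_a_year = time_based_now * yearly_growth

# Tagged snapshots: 3 Debian releases, x1.43 for amd64, +10% for security etc.
per_release = [15 * release_growth**k for k in range(3)]
tagged = sum(per_release) * 1.43 * 1.10

grand_total = time_based_in_a_year + tagged
print(f"time-based: {time_based_now:.0f}G now, {time_based_in_a_year:.0f}G in a year")
print(f"tagged: {tagged:.0f}G; total: {grand_total:.0f}G")
```

This prints `time-based: 425G now, 531G in a year` and `tagged: 90G; total: 621G`, which the ticket rounds to 620G.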

#30 Updated by intrigeri 2016-01-04 15:03:21

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 90 to 100