Feature #6295

Feature #5926: Freezable APT repository

Feature #9487: Research what solution to use for the freezable APT repository

Evaluate consequences of importing large amounts of packages into reprepro

Added by intrigeri about 4 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Elevated
Assignee: -
Category: Infrastructure
Target version:
Start date: 09/26/2013
Due date:
% Done: 100%
QA Check:
Feature Branch:
Type of work: Sysadmin

Description

Try it and see how it behaves.

The information we need is:

  • information about the hardware and system load it was tested on
  • how much time each operation takes
  • how much peak memory each operation takes
  • disk space used after each operation
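
The measurements listed above could be collected with a small wrapper such as the following. This is only a sketch, not what was actually used for this ticket: the `measure` and `disk_usage_bytes` helpers are hypothetical names, it assumes a Unix-like system (the `resource` module), and the disk usage figure sums apparent file sizes without deduplicating hard links.

```python
import os
import resource
import subprocess
import time


def disk_usage_bytes(path):
    """Sum the apparent sizes of all regular files below path.

    Hard links are counted once per name, so this can overestimate
    the space actually used on disk.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def measure(command, repo_dir):
    """Run command, then report wall time, peak memory and disk usage."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    elapsed = time.perf_counter() - start
    # ru_maxrss is a high-water mark over all child processes waited for
    # so far (in KiB on Linux), so use one Python process per measured
    # operation to get a per-operation figure.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "seconds": elapsed,
        "peak_rss_kib": peak_kib,
        "disk_bytes": disk_usage_bytes(repo_dir),
    }
```

For example, `measure(["reprepro", "-b", "/srv/reprepro", "update"], "/srv/reprepro")` would time one `reprepro update` run; the `/srv/reprepro` path is a placeholder. Hardware and system load still need to be recorded separately.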

For the set of packages to import, as a first approximation, use the *.packages file found at https://tails.boum.org/torrents/files/. These are binary packages, so a first step would be to convert this list to the corresponding list of source packages, based on the APT sources (plus corresponding deb-src lines) configured in Tails. Note that this part of the work can very well be done in very hackish ways to start with (baby steps!), but a nicer solution will have to be found later (#6297).
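
One hackish way to do the binary-to-source conversion is to read a `Packages` index from the configured APT sources and use its `Source:` field, which is only present when the source package name differs from the binary package name. The sketch below assumes the index text is already downloaded and decompressed; `binary_to_source` and `source_packages_for` are hypothetical helper names, and the format of the `*.packages` file itself is not covered here.

```python
def binary_to_source(packages_index_text):
    """Map binary package names to source package names.

    Expects the text of a Debian Packages index: stanzas separated by
    blank lines, one "Field: value" per line, continuation lines
    starting with whitespace.
    """
    mapping = {}
    for stanza in packages_index_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (e.g. long descriptions).
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields:
            # "Source:" may carry a version, e.g. "foo (1.0-1)".
            source = fields.get("Source", fields["Package"])
            mapping[fields["Package"]] = source.split()[0]
    return mapping


def source_packages_for(binary_names, mapping):
    """Return the sorted set of source packages for known binary names."""
    return sorted({mapping[name] for name in binary_names if name in mapping})
```

Binary packages missing from the index are silently skipped here; a nicer solution would report them.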

People are doing that, e.g. to maintain local mirrors:
http://vincent.bernat.im/en/blog/2014-local-apt-repositories.html

After the initial evaluation, we'll want to keep an eye on resource usage and performance in production-like settings.


Related issues

Blocks Tails - Feature #6303: Adapt our infrastructure to be able to handle tons of packages Resolved 01/04/2016

History

#1 Updated by intrigeri almost 4 years ago

  • Category set to Infrastructure

#2 Updated by sajolida almost 4 years ago

We've been advised to ask the Grml people about their setup, which is designed to address similar questions.

<http://deb.grml.org/>

#3 Updated by intrigeri over 3 years ago

#4 Updated by intrigeri over 2 years ago

  • Parent task changed from #5926 to #9487

#5 Updated by intrigeri over 2 years ago

#6 Updated by intrigeri over 2 years ago

#8 Updated by intrigeri over 2 years ago

  • Assignee set to intrigeri
  • Target version changed from Sustainability_M1 to Tails_2.3

#9 Updated by intrigeri over 2 years ago

  • Target version changed from Tails_2.3 to 246

#10 Updated by intrigeri over 2 years ago

  • Blocks deleted (Feature #6303: Adapt our infrastructure to be able to handle tons of packages)

#11 Updated by intrigeri over 2 years ago

  • Description updated (diff)

#12 Updated by intrigeri over 2 years ago

#14 Updated by intrigeri about 2 years ago

  • Status changed from Confirmed to In Progress
  • reprepro update to clone wheezy main i386 from a local mirror: 30 minutes; with a remote mirror, the network will be the limiting factor
  • reprepro pull to snapshot that clone into a new, empty distribution: blazingly fast
  • doing the same 200 more times: the first cloning operations take 4-5 seconds each, rising to 15-20 seconds for the last ones; the resulting db/packages.db file is 20GB
  • doing the same 600 more times: the first cloning operation takes 3 minutes (presumably because lots of data needs to be read from disk), and each of the following ones takes 15-20 seconds; the resulting db/packages.db file is 78GB
  • the raw number of entries in conf/distributions has little effect; what slows down operations is importing stuff into these distributions

#15 Updated by intrigeri about 2 years ago

  • Description updated (diff)

#16 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to CyrilBrulebois
  • % Done changed from 0 to 10

#17 Updated by intrigeri about 2 years ago

  • Assignee changed from CyrilBrulebois to intrigeri

(Err, the next step, i.e. setting up time-based snapshots on our infra and checking how it goes, is on my own plate.)

#18 Updated by intrigeri about 2 years ago

After the initial mirroring, with reprepro update, of (Wheezy, Jessie) * i386 + (Stretch, sid, experimental) * (amd64, i386) into 5 distributions: the reprepro directory is 205GB, of which 833MB is in db/. After adding (oldstable * (base, updates, p-u, backports, sloppy-backports) + stable * (base, updates, p-u, backports) + testing * (base, updates, p-u) + sid + experimental), with i386 for each and amd64 for Stretch and newer: 226GB, including a 902MB DB.

#19 Updated by intrigeri about 2 years ago

  • Blueprint set to https://tails.boum.org/blueprint/freezable_APT_repository/

I'm doing tests with reprepro gensnapshot and will report on them on the blueprint.

#20 Updated by intrigeri about 2 years ago

  • Target version changed from 246 to Tails_1.8
  • % Done changed from 10 to 90
  • Easy changed from Yes to No

I think we're done here. The only evaluation left to do is about disk space (since it blocks purchasing hardware), which I'll move to another ticket since it no longer blocks our design choices. Off the top of my head that would be:

  • for time-based snapshots, we don't have enough storage space and good 24/7 bandwidth in the same place to measure this directly, so we need to estimate it by:
    • measuring the total storage space needed by a complete mirror with no snapshots
    • increasing that by the (size added by keeping N days of incremental snapshots / size of an initial mirror) ratio found for a partial mirror (some architectures disabled)
  • for tagged snapshots?

#21 Updated by intrigeri almost 2 years ago

  • Target version changed from Tails_1.8 to Tails_2.0

#22 Updated by intrigeri almost 2 years ago

Starting everything needed and taking note of the initial stats so that I can have numbers for the time-based snapshots in ~10 days.

#23 Updated by intrigeri almost 2 years ago

  • Blocks Feature #6303: Adapt our infrastructure to be able to handle tons of packages added

#24 Updated by intrigeri almost 2 years ago

  • Blocks deleted (Feature #6296: Configure reprepro to pull from foreign APT repositories)

#25 Updated by intrigeri almost 2 years ago

  • Priority changed from Normal to Elevated

This will be blocking the actual deployment so I'd like to get it done early.

#26 Updated by intrigeri almost 2 years ago

Initial partial mirror (not all suites/archs) on misc.lizard:

  • debian: 235G
  • debian-security: 9.5G
  • tails: 90M
  • torproject: 45M

=> total = 245G

#27 Updated by intrigeri almost 2 years ago

10 days later:

  • debian: 292G
  • debian-security: 12G
  • tails: 269M
  • torproject: 56M

Total = 304G, that is +24%.
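
As a sanity check of the totals and the +24% figure, the arithmetic can be reproduced as follows. This sketch assumes the sizes are du-style figures with binary units (1G = 1024M), which the ticket does not state; `to_gib` is a hypothetical helper.

```python
# Conversion factors to GiB, assuming binary units.
UNITS_IN_GIB = {"M": 1 / 1024, "G": 1.0}


def to_gib(size):
    """Convert a string such as '9.5G' or '90M' to a float in GiB."""
    return float(size[:-1]) * UNITS_IN_GIB[size[-1]]


# Sizes reported for the initial partial mirror and 10 days later.
initial = {"debian": "235G", "debian-security": "9.5G",
           "tails": "90M", "torproject": "45M"}
after_10_days = {"debian": "292G", "debian-security": "12G",
                 "tails": "269M", "torproject": "56M"}

total_initial = sum(to_gib(s) for s in initial.values())
total_after = sum(to_gib(s) for s in after_10_days.values())
growth_percent = (total_after / total_initial - 1) * 100

print(f"initial: {total_initial:.0f}G, after: {total_after:.0f}G, "
      f"growth: +{growth_percent:.0f}%")
```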

#28 Updated by intrigeri almost 2 years ago

Complete mirror without snapshots:

  • debian: 330G
  • debian-security: 13G
  • tails: 287M
  • torproject: 44M

Total = 343G

#29 Updated by intrigeri almost 2 years ago

  • time-based snapshots: in the current state of the archive, (size of a complete mirror without snapshots) * (size after keeping N days of incremental snapshots / size of an initial mirror) = 343G * 1.24 = 425G, using the ratio measured above over 10 days. Assuming +25% growth/year, in a year that'll be 425G * 1.25 = 531G
  • tagged snapshots:
    • assuming packages are not stored multiple times (that is, we import them into a single, persistent reprepro instance, even though the filterlist etc. used for importing are volatile): #9508's results + amd64 for 3 versions of Debian should be around (15 + 15*1.25 + 15*1.25*1.25) * 1.43 (for adding amd64) = 82G; add 10% for security updates etc. over a year => 90G
    • I'll have more up-to-date numbers once I've tested tails-prepare-tagged-apt-snapshot-import (#10749), but let's not block on it

=> 620G in a year should be a good enough estimate to allow us to get the hardware and stop blocking on lack of disk space.
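
For reference, the estimate above can be reproduced step by step. This is just the arithmetic from this comment written out, rounding at the same points as the text does; the variable names are mine.

```python
# Inputs taken from the figures quoted in this ticket.
complete_mirror_gib = 343   # complete mirror without snapshots
snapshot_ratio = 1.24       # size after 10 days of snapshots / initial size
yearly_growth = 1.25        # assumed +25% growth per year
tagged_base_gib = 15        # base figure from #9508, used as-is
amd64_factor = 1.43         # factor for adding amd64
updates_factor = 1.10       # +10% for security updates etc. over a year

# Time-based snapshots: now, then after one year of growth.
time_based_now = round(complete_mirror_gib * snapshot_ratio)
time_based_next_year = round(time_based_now * yearly_growth)

# Tagged snapshots: 15 + 15*1.25 + 15*1.25*1.25, then amd64, then updates.
tagged = round(
    sum(tagged_base_gib * yearly_growth**i for i in range(3)) * amd64_factor
)
tagged_with_updates = round(tagged * updates_factor)

total = time_based_next_year + tagged_with_updates
print(time_based_now, time_based_next_year, tagged, tagged_with_updates, total)
```

The total comes out at 621G, which the estimate rounds to 620G.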

#30 Updated by intrigeri almost 2 years ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 90 to 100
