Project

General

Profile

Bug #14622

Have the Installer detect when "Clone & Upgrade" won't fit in the destination partition before deleting anything

Added by intrigeri 3 months ago. Updated 15 days ago.

Status:
In Progress
Priority:
High
Assignee:
Category:
Installation
Target version:
Start date:
09/11/2017
Due date:
% Done:

10%

QA Check:
Info Needed
Feature Branch:
bugfix/14622-installer-check-space-before-deleting, installer:bugfix/14622-installer-check-space-before-deleting, installer:debian_bugfix/14622-installer-check-space-before-deleting
Type of work:
Code
Blueprint:
Starter:
Affected tool:
Installer

Description

When working on #12707 we noticed that "Clone & Upgrade" does that:

  1. delete all the known Tails system files on the target system partition
  2. check for the available free space (gui.py:self.live.check_free_space())
  3. copy the source file to the target system partition

So if the source system partition is larger than the target one, and enough automatic upgrades have been installed on the source system partition, then the check for the available free space will fail and we will abort the upgrade… after having deleted Tails from the target partition. This is not ideal, to say the least :] This situation will happen once the landspace of Tails devices is made of system partitions with a variety of sizes, filled by IUKs to various levels.

Thankfully we have some time to fix this after #12707 goes live, and before it becomes a problem in practice:

  • "the source system partition is larger than the target one" is unlikely before #12707 is deployed in the wild (there are surely Tails devices somewhere with a 1.5 GiB system partition, but then those are already affected by this problem anyway if they "Clone & Upgrade" from their friend's more recent Tails device so it's not a regression);
  • It'll take some time until a newly installed Tails 3.2, with a 4 or 8 GiB system partition, gathers "enough automatic upgrades have been installed on the source system partition".

Next step: figure out how much time we have, adjust "Target version" accordingly and set "Type of work" to "Code". I want to do this during the 3.3 cycle.

I'll handle this ticket with my Foundations Team hat.


Related issues

Related to Tails - Feature #12615: Consider basing Tails on quarterly snapshots of Debian testing In Progress 06/13/2017
Blocked by Tails - Feature #12705: Update the size of the system partition to >= 4 GiB Resolved 06/15/2017

History

#1 Updated by intrigeri 3 months ago

  • Parent task deleted (#12705)

#2 Updated by intrigeri 3 months ago

  • Blocked by Feature #12705: Update the size of the system partition to >= 4 GiB added

#3 Updated by intrigeri 3 months ago

  • Target version changed from Tails_3.3 to Tails_3.5

anonym, can you please double-check my reasoning below? (I'm not explicitly reassigning to you as I don't want to risk realizing late in the 3.4 cycle that this is still waiting on your plate and I still have a big chunk of dev work to do.)

intrigeri wrote:

Thankfully we have some time to fix this after #12707 goes live, and before it becomes a problem in practice:
[…]
  • It'll take some time until a newly installed Tails 3.2, with a 4 or 8 GiB system partition, gathers "enough automatic upgrades have been installed on the source system partition".

Next step: figure out how much time we have

So the problem will happen once a source device has more than 2.5G of Tails data on its system partition. The earliest it can happen is when a newly installed Tails 3.2 (with a system partition that's at least 4G big) will have accumulated enough automatic upgrades. I'll reason for the worst case.

  • Let's assume that installing a given IUK takes exactly that IUK's size on the device (in practice it takes less space is we replace some files when installing it).
  • Let's assume that any IUK to upgrade to a major release is 350M big if we stick to Debian stable: I don't see any that's bigger than 300M on http://dl.amnesia.boum.org/tails/stable/iuk/ currently. But if we base Tails on quarterly snapshots of Debian testing, we should expect larger IUKs: I've built a 3.0~beta2_to_3.0.iuk locally (3 months passed between these two releases) and it takes 588M.
  • Let's assume a user that tests all our release candidates using automatic upgrades, i.e. for each major release they install, they instead of having the "previous release to major release" update installed, they have "previous release to RC + RC to major release" updates installed. The 3.1 → 3.2~rc1 → 3.2 path is 35% bigger than the 3.1 → 3.2 one.
  • Let's assume that any IUK to upgrade to a bugfix release is 300M big: 3.0 to 3.1 was 200M, 2.10 to 2.11 was 300M but we upgraded the kernel for two archs back then.
  • Let's assume we put out one emergency release every three months, and its IUK takes the same size as the one for a bugfix release i.e. 300M.
  • Tails Upgrader requires 3 times the size of an IUK to install it, so a 4G can't be filled more than 4G - 3 * size of the smallest possible IUK. We've already published IUKs smaller than 100M so the resulting number will be bigger than 2.5G anyway, so forget it.

So, if we stick to Debian stable until at least next January:

  • after upgrading to 3.3: 1150 + 300 = 1450M
  • after upgrading to 3.3.1: 1450 + 300 = 1750M
  • after upgrading to 3.4~rc1 and then to 3.4: 1750 + 350*1.35 = 2225M
  • so after upgrading to 3.4.1 or 3.5 it will be larger than 2500M => we have to implement a fix for this bug in time for 3.4 (in case we have to release 3.4.1, it'll be too late to do that on short notice)

But if we force everyone to manually upgrade to 3.4 based on Debian testing in January, then we have some more time; so depending on the outcome of #12615 we may be able to postpone this.

#4 Updated by anonym 3 months ago

I believe you have missed stating an assumption: you assume that the 4 GiB stick was upgraded to 3.2 immediately, and not 3.2~rc1 in between. Let's check how your timeline would look if we include 3.2~rc1:

  • after upgrading to 3.2~rc1 and then to 3.3: 1150 + 350*1.35 = 1622.5M
  • after upgrading to 3.3.1: 1622.5 + 300 = 1922.5M
  • after upgrading to 3.4~rc1 and then to 3.4: 1922.5 + 350*1.35 = 2395.0M which is still < 2500M => we're good!

So 3.4 indeed seems like the last release we can fix this in with the above expectations of emergency releases. However, I think it is worth noting that a single additional emergency release (which would not be unprecedented) would mean we'd have to plan to fix this in Tails 3.3, otherwise users that are testing our RC IUKs will get into trouble in Tails 3.4.

Any way, I think we're running ahead of ourselves. It's not clear to me what the exact implications (= risks for our users) of this Tails Installer bug. In the description you say:

[...] the check for the available free space will fail and we will abort the upgrade… after having deleted Tails from the target partition [...]

So, to clarify: the next time the device is scanned by Tails Installer, it will be treated as an "empty" drive so the only operation allowed will be an "Install", and that will remove any persistence partition without warning (I'm disregarding users that would click "Reinstall (delete all data)" since they literally are asking for the persistence partition to be overwritten). If so, we're talking about data loss (although it should be doable to completely recover the partition), and then I'm not sure if we want to bet on only a single emergency release.

In other words: does Tails Installer fail closed in this instance w.r.t. data loss (or removal of the TailsData part from the partition table)?

OTOH, if we have two emergency releases, we know this happens, and we can warn our users. So, in the end I think I'm good with your plan, as long as we pay attention to this!

#5 Updated by anonym 3 months ago

anonym wrote:

In other words: does Tails Installer fail closed in this instance w.r.t. data loss (or removal of the TailsData part from the partition table)?

tl;dr: Tails Installer fails closed!

I actually had a test set up around so I could easily test this. When the problematic upgrade is done, only the contents of the target Tails partition is removed, so next time Tails Installer looks at it it will still suggest an "Upgrade". That Tails installation is now broken, of course, but it can be restored simply by upgrading from a source that is small enough (e.g. the ISO, an install consisting of fewer IUKs, ...). The persistence partition is kept at all times, and at no point is the users facing its removal without a warning.

So, IMHO, your plan is completely sane!

#6 Updated by anonym 3 months ago

anonym wrote:

you assume that the 4 GiB stick was upgraded to 3.2 immediately, and not 3.2~rc1 in between.

Gah, this is just wrong. It should be: "you assume that the (8 GiB) stick was initially installed with 3.2, and not 3.2~rc1". So...

  • after upgrading to 3.2~rc1 and then to 3.3: 1150 + 350*1.35 = 1622.5M
  • after upgrading to 3.3.1: 1622.5 + 300 = 1922.5M
  • after upgrading to 3.4~rc1 and then to 3.4: 1922.5 + 350*1.35 = 2395.0M which is still < 2500M => we're good!

... this is also wrong. You original calculation only misses this initial step:

  • after upgrading from 3.2~rc1 to 3.2: 1150 + 92 = 1242M

The "92" comes from the actual size of Tails_amd64_3.2~rc1_to_3.2.iuk. :) In the end this will just bump your numbers with 92, so in the end we have 2225 + 92 = 2317M which is < 2500M so we are still good. :)

#7 Updated by intrigeri 3 months ago

I believe you have missed stating an assumption:

Thanks a lot for finding issues in my reasoning! :)

OTOH, if we have two emergency releases, we know this happens, and we can warn our users. So, in the end I think I'm good with your plan, as long as we pay attention to this!

ACK

#8 Updated by intrigeri 3 months ago

  • Related to Feature #12615: Consider basing Tails on quarterly snapshots of Debian testing added

#9 Updated by intrigeri 3 months ago

The "92" comes from the actual size of Tails_amd64_3.2~rc1_to_3.2.iuk. :) In the end this will just bump your numbers with 92, so in the end we have 2225 + 92 = 2317M which is < 2500M so we are still good. :)

Great.

I think we'll be in a better position to take bets about #12615 on October 9, and then if we feel it's unlikely that our January release is based on Buster, I'll probably try to work on this during the 3.3 cycle, just to be on the safe side. And then we'll see if the required changes qualify for a bugfix release.

#10 Updated by intrigeri 15 days ago

  • Feature Branch set to bugfix/14622-installer-check-space-before-deleting, installer:bugfix/14622-installer-check-space-before-deleting, installer:debian_bugfix/14622-installer-check-space-before-deleting

#11 Updated by intrigeri 15 days ago

  • Status changed from Confirmed to In Progress
  • Assignee changed from intrigeri to sajolida
  • % Done changed from 0 to 10
  • QA Check set to Info Needed

sajolida, this is high priority because it really really needs to be shipped in Tails 3.5. If you can't answer my question below by December 10, please reassign to me and I'll go ahead with my proposal.

To reliably know whether an upgrade by cloning will work space-wise, we need to mount the candidate destination Tails system partition: checking its size is not enough, as we need to know how much room the OS files we're going to delete occupy (some users put non-Tails files on their drive even if we discourage against it). So I see three options:

  1. don't bother fixing the problem this ticket is about in case the user has added non-Tails files to their Tails system partition; then I think this ticket should be very easy to fix for everyone else with minimal code changes
  2. for each candidate destination Tails system partition, mount it before adding it to the list of validate candidates, so we can check how much room the OS files occupy; in many cases that would trigger a desktop notification, which is not great for a UX point of view I guess; note that #14958 will require mounting these filesystems as well, if we ever want to address it; I seem to remember there's a way to temporary disable these notifications (that we already display sometimes), not sure if it's real and suitable here; in any case I'm pretty sure the resulting code will make an ugly, hard to understand/debug part of the code even worse;
  3. postpone the space check and do it after the user has selected a destination drive and clicked Upgrade; I guess it sucks from a UX perspective, because we're proposing to the user something that we're not sure can work, and only later we tell them it cannot; it sucks only for the users affected by the problem this ticket is about but at some point there will be many of them

I dislike the 3rd option because it breaks the flow and introduces a UX regression, while this ticket is part of a project that's supposed to improve UX.

I propose I quickly look into the 2nd option and evaluate how much more expensive to implement it is than the 1st one, and how deep into ugly code one has to dive in order to implement it. If my analysis concludes "it's not that bad", then I'll do it. Otherwise I'll fallback to option 1.

sajolida, what do you think?

#12 Updated by intrigeri 15 days ago

intrigeri wrote:

I propose I quickly look into the 2nd option and evaluate how much more expensive to implement it is than the 1st one, and how deep into ugly code one has to dive in order to implement it. If my analysis concludes "it's not that bad", then I'll do it. Otherwise I'll fallback to option 1.

sajolida, what do you think?

Additional notes:

  • if you tell me "don't bother with option 2, go directly with option 1", I'll certainly won't complain
  • an automated test case for option 2 will definitely be much more expensive to implement than one for option 1

Also available in: Atom PDF