Project

General

Profile

Bug #15695

Avoid breaking automatic upgrades to Tails 3.9

Added by intrigeri 3 months ago. Updated 18 days ago.

Status:
Resolved
Priority:
Elevated
Assignee:
-
Category:
Build system
Target version:
Start date:
06/30/2018
Due date:
% Done:

100%

Estimated time:
0.50 h
QA Check:
Pass
Feature Branch:
bugfix/15695-avoid-breaking-automatic-upgrades-to-tails-3-9
Type of work:
Research
Blueprint:
Starter:
Affected tool:
Upgrader

Description

As noticed thanks to #15419, our devel branch changes the GID of the debian-tor user:

-vboxsf:x:112:
-monkeysphere:x:113:
-debian-tor:x:114:tor-launcher
-lpadmin:x:115:
+monkeysphere:x:112:
+debian-tor:x:113:tor-launcher
+lpadmin:x:114:
+vboxsf:x:115:

Related issues

Related to Tails - Bug #15407: Prevent system user uid:s and gid:s from changing between releases Resolved 06/28/2018
Related to Tails - Bug #15424: Use fixed UID and GID for debian-tor Rejected 03/16/2018
Related to Tails - Bug #13426: Tor does not start on Tails 3.0.1 automatically upgraded from 3.0 Resolved 07/05/2017
Blocks Tails - Feature #15334: Core work 2018Q3: Foundations Team Confirmed 02/20/2018
Blocks Tails - Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades Resolved 06/28/2018

Associated revisions

Revision 37a9dd76 (diff)
Added by lamby about 1 month ago

Factor-out fixing of tails-persistent-setup UID/GID settings into utility methods. (refs: #15695)

Revision a9767517 (diff)
Added by lamby about 1 month ago

Be more explicit when changing UIDs and GIDs. (refs: #15695)

Revision 1d3d7854 (diff)
Added by lamby about 1 month ago

Factor-out fixing of tails-persistent-setup UID/GID settings into utility methods. (refs: #15695)

v2 by Cyril Brulebois:
- Use the correct command to adjust group ID.

Signed-off-by: Cyril Brulebois <>

Revision 3682fde3 (diff)
Added by lamby about 1 month ago

Prevent change of gid of debian-tor user breaking automatic upgrades. (Closes: #15695)

As outlined in a774ba2aba7012f3f5bc630b0f3dab8ac696dbd7, since the
virtualbox 5.2.8-dfsg-7 upload the virtualbox-guest-utils package depends
on virtualbox-guest-dkms which in-turn creates a "vboxsf" group later than
before.

This results in the resulting debian-tor group ID being assigned a
different number due to change of relative ordering and, accordingly,
breaks automatic upgrades.

We therefore assign a number of groups, fresh, unreserved group IDs to
avoid collisions.

Revision 070e872b (diff)
Added by intrigeri about 1 month ago

Ensure GIDs are stable across releases (refs: #15695).

Fixup on top of 3682fde35080381eb97d649966b584fc9c777998: we should not
"assign a number of groups, fresh, unreserved group IDs" because then these
groups would have a different GID than in Tails 3.8, which is precisely
what we're trying to avoid here :)

Revision 6091605f (diff)
Added by intrigeri about 1 month ago

Acknowledge expected innocuous change to /etc/group (refs: #15695)

Only order changes, which does not matter.

Revision a158c465
Added by intrigeri about 1 month ago

Merge remote-tracking branch 'kibi/bugfix/15695-avoid-breaking-automatic-upgrades-to-tails-3-9' into devel (Fix-committed: #15695, #15407, #15419)

History

#1 Updated by intrigeri 3 months ago

As explained in a774ba2aba7012f3f5bc630b0f3dab8ac696dbd7, "Since the virtualbox 5.2.8-dfsg-7 upload, the virtualbox-guest-utils package depends on virtualbox-guest-dkms". I think this new dependency explains why virtualbox-guest-utils, that creates the vboxsf group, is installed later than before.

#2 Updated by intrigeri 3 months ago

  • Related to Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades added

#3 Updated by intrigeri 3 months ago

#4 Updated by intrigeri 3 months ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

Brainstorming our options:

  1. ship a patched virtualbox-guest-utils package that removes the added dependency and hope that's enough to fix this instance of the more general problem (while we're at it, we can as well backport the latest version from sid, that adds compatibility with Linux 4.17; too bad, virtualbox is now in stretch-backports which could have given us the opportunity to stop shipping it via our custom APT repo); that's relatively cheap and easy, and surely an acceptable way to postpone the problem, hoping we have more time to address its root cause next time another instance of this problem pops up.
  2. go back to the hacks from #15424: change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent; so while my concern about killing processes outside of the chroot remains, limiting the kill spree to gpg-agent processes might be safe enough; in our current Vagrant VM, I see no user that has obvious reasons to have gpg-agent running, and indeed I see no such process running during a build. That's also relatively cheap and easy, and most of the work has been done already. It'l be wasted work at some point if overlayfs is not affected by the bug, but well, at least we're covered until #8415 is done.
  3. find some way to stabilize all UIDs/GIDs (#15407): looks like more work than the two first options, and that work will be wasted if overlayfs is not affected by the bug, which makes this option very unappealing to me.
  4. complete #8415 (overlayfs) in time for Tails 3.9… assuming overlayfs is not affected by the bug that makes us worry about changing UIDs/GIDs (to be verified on #15689): there's not that much work left to do and we have to do it soon anyway, so clearly it would be the best place to invest our time in; but it's not clear whether it's realistic to do that in the next 1.5 months

I'm currently leaning towards the 2nd option (stabilize just the UIDs/GIDs we want), with the 1st one as a fallback. But depending on how our first shots at #15091 and #15023 go, and on how we distribute the work on these two big tasks, who knows, #8415 might be doable. I'll come back here in a few weeks and depending on our progress on these other fronts, I'll pick the best approach among the options that will be realistic at this point.

#5 Updated by intrigeri 3 months ago

  • Related to Bug #15407: Prevent system user uid:s and gid:s from changing between releases added

#6 Updated by intrigeri 3 months ago

  • Related to Bug #15424: Use fixed UID and GID for debian-tor added

#7 Updated by intrigeri 3 months ago

  • Related to deleted (Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades)

#8 Updated by intrigeri 3 months ago

  • Blocks Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades added

#9 Updated by intrigeri 3 months ago

  • Related to Bug #13426: Tor does not start on Tails 3.0.1 automatically upgraded from 3.0 added

#10 Updated by intrigeri about 2 months ago

intrigeri wrote:

I'm currently leaning towards the 2nd option (stabilize just the UIDs/GIDs we want), with the 1st one as a fallback. But depending on how our first shots at #15091 and #15023 go, and on how we distribute the work on these two big tasks, who knows, #8415 might be doable.

If I magically have time to do #8415 in time for the freeze, great, but let's not count on it => I think we should go for the 2nd option, aiming to have something merged by August 10. I'll try to find a FT member who can handle this.

#11 Updated by intrigeri about 2 months ago

lamby will take a look and tell me by Aug 11 whether he wants it (tentative budget: 2h).

#12 Updated by intrigeri about 2 months ago

  • Assignee changed from intrigeri to lamby

#13 Updated by intrigeri about 1 month ago

intrigeri wrote:

lamby will take a look and tell me by Aug 11 whether he wants it (tentative budget: 2h).

Ping? The freeze is on the 15th so if you don't want it, I'll work on it tomorrow or Tuesday; and if you want it, great :)

#14 Updated by lamby about 1 month ago

  • Assignee changed from lamby to intrigeri

Thanks for the ping. I've spent about 1h looking at this issue and trying to understand it.

In my attempt to properly understand this issue, I dived into the code, resulting in the following entirely-untested commits: https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9. These might be worth merging anyway as it's essentially "better commenting"

However, I got somewhat stuck on:

change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent

Three questions arise for me here: why would anything be running at all? Why would that be gpg-agent and not, well, something running as debian-tor? Why do we not do (or need?) the kill-process dance for the tails-persistent-setup chroot hack right now?

(Some quick personal notes for this bug: See #13426 for the background on why this breaks upgrades, the root cause being probably in aufs which will eventually be replaced with overlayfs in #8415.)

#15 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to lamby

In my attempt to properly understand this issue, I dived into the code, resulting in the following entirely-untested commits: https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9. These might be worth merging anyway as it's essentially "better commenting"

Good idea! Two potential issues that prevent me from merging it though:

  • In Change_gid I think you meant -exec chgrp instead of -exec chown.
  • Change_gid 112 150 feels strange because we had TPS_GROUP_STEALER=$(getent group 122 | awk -F ':' '{print $1}') (note 112 vs. 122)

By the way, basing your work on top of my branch for #15419 will help you validate it :) I've updated bugfix/15419-detect-uid-and-gid-changes so our CI builds it and I'll share the FTBFS message with you as soon as a build has failed, which will demonstrate better what the practical problem we need to solve now is.

However, I got somewhat stuck on:

change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent

Three questions arise for me here: why would anything be running at all?
Why would that be gpg-agent and not, well, something running as debian-tor?

On #15424 we noticed that the monkeysphere user had a running process. I've assumed this was gpg-agent because on our ISO build CI jobs we have to kill those on exit in order to be able to tear down a mountpoint. IIRC that's because the monkeysphere postinst performs GnuPG operations, that trigger gpg-agent startup (in the background, that's how GnuPG 2 works). The tor package does no such thing which is why there's no running process left behind running as the debian-tor user. Now, my hunch might be wrong (I don't recall whether I reality-checked it) but we'll know as soon as the workaround this ticket is now about is implemented :)

Why do we not do (or need?) the kill-process dance for the tails-persistent-setup chroot hack right now?

Do you mean: why don't we do it yet (in current devel branch)? Or why shouldn't we do it for 3.9 as part of this ticket? I guess because we've never needed it so far :)

#16 Updated by lamby about 1 month ago

  • Assignee changed from lamby to intrigeri

In Change_gid I think you meant -exec chgrp instead of -exec chown.

Fixed.

Change_gid 112 150 feels strange because we had TPS_GROUP_STEALER=$(getent group 122 | awk -F ':' '{print $1}') (note 112 vs. 122)

Fixed. That'll teach me to work in the sun...

By the way, basing your work on top of my branch for #15419 will help you validate it

Rebased.

#17 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to lamby

I'll review the (side-)changes you've done already (and will likely merge them!) but let's keep this assigned to you for tracking the main problem this ticket is about.

#18 Updated by intrigeri about 1 month ago

I've merged your changes into bugfix/15419-detect-uid-and-gid-changes, now building.

And here's the current problem we're trying to "solve" here:

Checking UIDs and GIDs stability
/usr/share/tails/build/passwd /etc/passwd differ: char 1310, line 25
/etc/passwd and/or /etc/group differs from expected:
--- /usr/share/tails/build/passwd    2018-06-09 15:22:28.000000000 +0000
+++ /etc/passwd    2018-08-13 09:48:10.437902256 +0000
@@ -22,8 +22,8 @@
 _apt:x:104:65534::/nonexistent:/bin/false
 messagebus:x:103:105::/var/run/dbus:/bin/false
 memlockd:x:105:110:memlockd system account,,,:/usr/lib/memlockd:/bin/false
-monkeysphere:x:106:113:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash
-debian-tor:x:107:114::/var/lib/tor:/bin/false
+monkeysphere:x:106:112:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash
+debian-tor:x:107:113::/var/lib/tor:/bin/false
 speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false
 colord:x:109:117:colord colour management daemon,,,:/var/lib/colord:/bin/false
 saned:x:110:118::/var/lib/saned:/bin/false

Content of '/etc/passwd':
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
_apt:x:104:65534::/nonexistent:/bin/false
messagebus:x:103:105::/var/run/dbus:/bin/false
memlockd:x:105:110:memlockd system account,,,:/usr/lib/memlockd:/bin/false
monkeysphere:x:106:112:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash
debian-tor:x:107:113::/var/lib/tor:/bin/false
speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false
colord:x:109:117:colord colour management daemon,,,:/var/lib/colord:/bin/false
saned:x:110:118::/var/lib/saned:/bin/false
pulse:x:111:119:PulseAudio daemon,,,:/var/run/pulse:/bin/false
hplip:x:112:7:HPLIP system user,,,:/var/run/hplip:/bin/false
Debian-gdm:x:113:121:Gnome Display Manager:/var/lib/gdm3:/bin/false
tails-persistence-setup:x:115:122::/home/tails-persistence-setup:/bin/false
clearnet:x:114:123::/home/clearnet:/bin/false
htp:x:116:124::/home/htp:/bin/false
tails-iuk-get-target-file:x:117:125::/home/tails-iuk-get-target-file:/bin/false
tails-upgrade-frontend:x:118:126::/home/tails-upgrade-frontend:/bin/false
tor-launcher:x:119:127::/home/tor-launcher:/bin/false
tails-install-iuk:x:120:128::/home/tails-install-iuk:/bin/false

--- /usr/share/tails/build/group    2018-06-09 15:22:28.000000000 +0000
+++ /etc/group    2018-08-13 09:48:10.505901231 +0000
@@ -48,10 +48,10 @@
 ssh:x:109:
 memlockd:x:110:
 ssl-cert:x:111:
-vboxsf:x:112:
-monkeysphere:x:113:
-debian-tor:x:114:tor-launcher
-lpadmin:x:115:
+monkeysphere:x:112:
+debian-tor:x:113:tor-launcher
+lpadmin:x:114:
+vboxsf:x:115:
 scanner:x:116:saned
 colord:x:117:
 saned:x:118:

Content of '/etc/group':
root:x:0:
daemon:x:1:
bin:x:2:
sys:x:3:
adm:x:4:
tty:x:5:
disk:x:6:
lp:x:7:
mail:x:8:
news:x:9:
uucp:x:10:
man:x:12:
proxy:x:13:
kmem:x:15:
dialout:x:20:
fax:x:21:
voice:x:22:
cdrom:x:24:
floppy:x:25:
tape:x:26:
sudo:x:27:
audio:x:29:pulse
dip:x:30:
www-data:x:33:
backup:x:34:
operator:x:37:
list:x:38:
irc:x:39:
src:x:40:
gnats:x:41:
shadow:x:42:
utmp:x:43:
video:x:44:
sasl:x:45:
plugdev:x:46:
staff:x:50:
games:x:60:
users:x:100:
nogroup:x:65534:
systemd-journal:x:101:
systemd-timesync:x:102:
systemd-network:x:103:
systemd-resolve:x:104:
input:x:106:
crontab:x:107:
netdev:x:108:
messagebus:x:105:
ssh:x:109:
memlockd:x:110:
ssl-cert:x:111:
monkeysphere:x:112:
debian-tor:x:113:tor-launcher
lpadmin:x:114:
vboxsf:x:115:
scanner:x:116:saned
colord:x:117:
saned:x:118:
pulse:x:119:
pulse-access:x:120:
Debian-gdm:x:121:
tails-persistence-setup:x:122:
clearnet:x:123:
htp:x:124:
tails-iuk-get-target-file:x:125:tails-install-iuk
tails-upgrade-frontend:x:126:
tor-launcher:x:127:
tails-install-iuk:x:128:

If these changes are innocuous, update these files in config/chroot_local-includes/usr/share/tails/build/.
See #15407 and #13426 for more context.
E: config/chroot_local-hooks/99-zzz_check_uids_and_gids failed (exit non-zero). You should check for errors.

#19 Updated by lamby about 1 month ago

  • Assignee changed from lamby to intrigeri

let's keep this assigned to you for tracking the main problem this ticket is about.

Thanks, good idea — as mentioned, I only really wrote and sent over these patches so I ensured I was understanding the issue, kinda like how making your own notes in a lecture is not the same as just reading someone else's! :) Do appreciate they are somewhat distracting however.

Here's the current problem we're trying to "solve" here:

Thanks for sending that over.

So, my reading of that diff of that we need to move the uid of vboxdrv from 112 → 151 and its gid from 112 → 151. That your quick assessment too? :)

#20 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to lamby

So, my reading of that diff of that we need to move the uid of vboxdrv from 112 → 151 and its gid from 112 → 151. That your quick assessment too? :)

I don't see vboxdrv involved anywhere in the error message I've pasted.

My quick assessment is that UIDs are all right (unchanged in the diff), only GIDs are problematic. Wrt. GIDs, 4 groups are involved and they're fighting for a stable set of 4 GIDs (112-115). What we need to do is to restore the state from Tails 3.8 (captured in /usr/share/tails/build/group) i.e.:

  • vboxsf: GID 115 → 112
  • monkeysphere: GID 112 → 113
  • debian-tor: GID 113 → 114
  • lpadmin: GID 114 → 115

The "fun" part is that I think we can't do this in place with atomic groupmod operations. So likely we need to temporarily give one of these groups another, free GID that's not in the 112-115 range, and then change the GID for the 3 other groups in the right order, and finally give its correct GID to the group that was temporarily moved out of the way.

Needless to say, all this is ugly and I don't expect it to get any better in the future… until we fix the root cause of the problem or devise a nicer solution to set all these IDs in stone.

#21 Updated by lamby about 1 month ago

I don't see vboxdrv involved anywhere in the error message I've pasted.

Ah, copy-paste fail. vboxsf.. :)

[…]

How about the attached? :)

#22 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to lamby

How about the attached? :)

I think it will work but:

  • Can you confirm you tested it and the new check for changed UIDs/GIDs does not abort the build?
  • The first part of the patch hard-codes the current (as in, on our devel branch today) GID allocated to these 4 groups. As the mere existence of this very ticket shows, it's a risky assumption. So next time these GIDs start dancing around, IMO this patch will make it harder to fix the problem than it could, because if we apply it as-is we may be changing the GID of unrelated groups in the future. So I'd rather see us do things like Change_gid $(gid_of monkeysphere) 1120. Makes sense?

#23 Updated by lamby about 1 month ago

I'd rather see us do thingws like Change_gid $(gid_of monkeysphere) 1120

Oh, good idea. This is not only a more stable solution, it is also much cleaner in the code (self-documenting!). Please see the attached 2 patches which I have also pushed to https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9.

Can you confirm you tested it and the new check for changed UIDs/GIDs does not abort the build?

Unfortunately I have not tested it (except in a local chroot actually, but that doesn't count). Unfortunately, I am semi-VAC today and tomorrow (first time off since 12th July...) so I won't be able to commit to testing this before the 15th unless it happens to rain this afternoon (!) which would mean that I would not need to be entertaining my dear mother :)

#24 Updated by intrigeri about 1 month ago

I'd rather see us do thingws like Change_gid $(gid_of monkeysphere) 1120

Oh, good idea. This is not only a more stable solution, it is also much cleaner in the code (self-documenting!). Please see the attached 2 patches which I have also pushed to https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9.

I like it! I'll take it over from here. I had a quick look and the only major problem I've noticed is that you're giving these groups different GIDs than in Tails 3.8, which is precisely what we want to avoid. But the machinery is in place on your branch so I should be able to easily use it to fix the problem we're tackling here :)

[…] unless it happens to rain this afternoon (!) which would mean that I would not need to be entertaining my dear mother :)

Indeed, rain is entertaining enough in itself :)

#25 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to segfault
  • % Done changed from 10 to 50
  • Estimated time set to 0.50 h
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/15695-avoid-breaking-automatic-upgrades-to-tails-3-9

This branch includes the check brought for #15419 so if it builds (and it did build at least until I've merged current devel into it), that means it does its job :) And indeed, until my last fixes that check did abort the build, which is confidence-inspiring.

#26 Updated by intrigeri about 1 month ago

  • Assignee changed from segfault to CyrilBrulebois

#27 Updated by lamby about 1 month ago

Damn, I just came back to this issue after I've been (somewhat failing) to build this on my local machine for the past hour due to reasons unrelated to the changes. I should build Tails more often, basically.

you're giving these groups different GIDs than in Tails 3.8

What's the problem there?

#28 Updated by intrigeri about 1 month ago

you're giving these groups different GIDs than in Tails 3.8

What's the problem there?

This would trigger another instance of the problem we had with the upgrade to 3.0.1 (#13426) and again with the upgrade to 3.6 (which made us create #15407 and #15419, which in turn made me notice we would have the same problem again for 3.9, which finally made me file this very ticket): due to what looks like an aufs bug, on devices upgraded via our incremental (automatic) update mechanism, the tor daemon won't be allowed to write to its /var/lib/tor because the GID of the group owning that directory changed between the different levels of the SquashFS stack we build with aufs.

#29 Updated by intrigeri about 1 month ago

  • Status changed from In Progress to Fix committed
  • Assignee deleted (CyrilBrulebois)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

#30 Updated by lamby about 1 month ago

  • Status changed from Fix committed to In Progress

#31 Updated by lamby about 1 month ago

  • Status changed from In Progress to Resolved

#32 Updated by intrigeri about 1 month ago

  • Status changed from Resolved to Fix committed

#33 Updated by intrigeri 18 days ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF