Feature #12428

Bug #12354: Fix shutdown and memory wipe regressions on 3.0~betaN

Ensure disk caches and aufs read-write branch are emptied during emergency shutdown

Added by intrigeri over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version:
Start date: 04/05/2017
Due date:
% Done: 100%
QA Check: Pass
Feature Branch: bugfix/12354-drop-kexec-memory-wipe
Type of work: Code
Blueprint:
Starter:
Affected tool:

Description

… so that they are overwritten by kernel memory poisoning. "Emergency" shutdown = when one unplugs the boot device.

This needs to be done in a way that's as reliable as possible: in particular, the storage medium may host the persistent filesystem. If unmounting doesn't work well enough (or just as an additional safeguard), we should probably echo 3 > /proc/sys/vm/drop_caches at some point during the shutdown process.

Writing an automated test for the "Tails with persistent volume unlocked" and "aufs read-write branch" use cases would help confirm that it actually works. It probably requires implementing a /lib/systemd/system-shutdown/ hook that pauses for a while when debug=wipemem is passed on the kernel command line, so that we can dump memory after we've tried unmounting the filesystems.
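As a rough illustration of the debug=wipemem check mentioned above, a hook could look for the word on the kernel command line along these lines. This is only a sketch: the function name cmdline_has and the CMDLINE_FILE override are made up for this example and are not taken from the Tails code.

```shell
#!/bin/sh
# Sketch only: report whether a given word appears on the kernel command line.
# CMDLINE_FILE is a hypothetical override so the check can be exercised
# without rebooting; it defaults to /proc/cmdline.

cmdline_has() {
    wanted="$1"
    cmdline_file="${CMDLINE_FILE:-/proc/cmdline}"
    line=""
    # The kernel command line is a single line of space-separated words.
    IFS= read -r line < "$cmdline_file" || true
    # Pad with spaces so that only whole words match.
    case " $line " in
        *" $wanted "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use in a hook:
# if cmdline_has debug=wipemem; then ...pause here...; fi
```

Matching whole words avoids false positives such as debug=wipememory.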

Associated revisions

Revision e6382573 (diff)
Added by intrigeri over 1 year ago

Drop kernel caches before shutting down (refs: #12428, #12354).

Let's optimize how much memory the kernel memory poisoning feature acts on.

Revision 690b968b (diff)
Added by intrigeri over 1 year ago

Test suite: add tests for memory erasure on shutdown (refs: #12428).

Revision 888ccc5a (diff)
Added by intrigeri over 1 year ago

Return to the initramfs (unpacked in /run/initramfs) on shutdown (refs: #12428, #12354, Debian#778849).

… otherwise the aufs read-write (tmpfs) branch, among possibly other things,
can't be properly unmounted and its content remains in memory.

Notes:

  • We have to handle some unmounting ourselves in initramfs-pre-shutdown-hook:
    systemd-shutdown doesn't manage to unmount the aufs read-write
    branch (/oldroot/lib/live/mount/overlay) as it is needed by the
    aufs (/oldroot) filesystem, and reciprocally it cannot unmount /oldroot as it
    is kept busy by /oldroot/lib/live/mount/*. So we disentangle this mess
    ourselves. And we have to manually empty the aufs read-write (tmpfs) branch,
    otherwise for some reason its content remains in memory. This code will of
    course need to be adapted for overlayfs some day.
  • We lock /bin/kill in memory: apparently systemd-exit.service needs it.
  • We remount /run on shutdown before dropping caches, just in case dropping
    caches removes what we've locked into memory.
  • We unpack the initramfs to /run/initramfs at boot time: sadly, I was not
    able to have it unpacked reliably in udev-watchdog-wrapper when the boot
    medium is ejected, so we'll use a little bit more RAM (instead of locking the
    compressed initramfs into memory, we're storing the uncompressed one there)
    and probably slow down the boot a bit, in order to make emergency shutdown
    robust. Note, however, that we save some of the RAM used by the uncompressed
    initramfs by deleting the worst offenders (kernel modules).
  • For now the whole procedure is quite noisy on the screen: the pre-shutdown
    hook runs under "set -x", doesn't run "clear", and spits out lots of
    debugging information. The goal is to enable users to provide useful
    debugging data if they have problems with emergency shutdown. Once we have
    shipped this code in a few releases and trust it's robust enough, we can
    surely reconsider and polish the UX by making the output less noisy.
  • We use absolute paths in many places to avoid $PATH lookup which might
    fail if the root filesystem is not there anymore.
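The commit message does not spell out how the mounts are disentangled. One conceivable ordering is sketched below as a dry run that only prints commands. The mount --move strategy, the /moved staging directory, the /bin/ command locations and the function name are all assumptions made for illustration; only the /oldroot paths come from the notes above, and whether aufs tolerates this sequence has not been verified here.

```shell
#!/bin/sh
# Dry-run sketch: print (never execute) one possible unmount sequence.
# Everything except the /oldroot/lib/live/mount paths is hypothetical.

print_unmount_plan() {
    oldroot="$1"   # old root filesystem, e.g. /oldroot
    staging="$2"   # hypothetical directory outside $oldroot, e.g. /moved
    shift 2
    # Remaining arguments: names of mounts below $oldroot/lib/live/mount.

    # 1. Move each submount out of the aufs tree so it stops keeping it busy.
    for name in "$@"; do
        echo "/bin/mount --move $oldroot/lib/live/mount/$name $staging/$name"
    done
    # 2. With nothing mounted below it, the aufs root can be unmounted.
    echo "/bin/umount $oldroot"
    # 3. Empty the read-write (tmpfs) branch by hand, as the notes require.
    echo "/bin/find $staging/overlay -mindepth 1 -delete"
    # 4. Finally unmount the moved branches themselves.
    for name in "$@"; do
        echo "/bin/umount $staging/$name"
    done
}

# Example (prints six commands, runs none of them):
# print_unmount_plan /oldroot /moved overlay medium
```

Absolute paths are used throughout, in line with the last note above.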

Revision 5f588e52
Added by intrigeri over 1 year ago

Merge remote-tracking branch 'origin/bugfix/12354-drop-kexec-memory-wipe' into feature/stretch (Fix-committed: #12428, #12354)

History

#1 Updated by intrigeri over 1 year ago

  • Subject changed from Ensure filesystems are unmounted during emergency shutdown to Ensure disk caches are emptied during emergency shutdown
  • Description updated (diff)

#2 Updated by intrigeri over 1 year ago

  • Priority changed from Elevated to High

#3 Updated by intrigeri over 1 year ago

systemd-shutdown(8) (src/core/shutdown.c in the systemd source tree) tries hard to detach all DM & loop devices and unmount all filesystems. Then it runs everything found in /lib/systemd/system-shutdown/ before actually shutting down or rebooting. So /lib/systemd/system-shutdown/ indeed seems to be a good place to drop a script that runs echo 3 > /proc/sys/vm/drop_caches and then pauses if /run/tails_shutdown_debugging exists. Note that "All executables in this directory are executed in parallel, and execution of the action is not continued before all executables finished", so what we want to do really needs to be in one single script.
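A minimal sketch of such a single script could look as follows. It is not the code that was eventually merged: the function name, the overridable variables and the 120-second pause are assumptions chosen so the logic can be tried against temporary files instead of the real /proc entry.

```shell
#!/bin/sh
# Hypothetical /lib/systemd/system-shutdown/ hook (sketch, not the Tails code).
# The defaults are the paths named in this ticket; the pause length is made up.

DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"
DEBUG_FLAG_FILE="${DEBUG_FLAG_FILE:-/run/tails_shutdown_debugging}"
DEBUG_PAUSE_SECONDS="${DEBUG_PAUSE_SECONDS:-120}"

drop_caches_and_maybe_pause() {
    # Flush dirty pages first: drop_caches only frees clean cache entries.
    sync
    # 3 = free the page cache plus dentries and inodes.
    echo 3 > "$DROP_CACHES_FILE"
    # Leave time for a test suite to dump memory before the machine goes down.
    if [ -e "$DEBUG_FLAG_FILE" ]; then
        sleep "$DEBUG_PAUSE_SECONDS"
    fi
}

# A real hook would end by calling the function:
# drop_caches_and_maybe_pause
```

Keeping both steps in one function reflects the constraint quoted above: separate executables in that directory run in parallel, so ordering can only be guaranteed inside a single script.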

#4 Updated by intrigeri over 1 year ago

  • Subject changed from Ensure disk caches are emptied during emergency shutdown to Ensure disk caches and aufs read-write branch are emptied during emergency shutdown
  • Description updated (diff)

#5 Updated by intrigeri over 1 year ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

My initial tests show that the content of the aufs read-write branch is not erased from memory on shutdown. It's not very surprising, as systemd's umount_all function (called from systemd-shutdown) does not try to unmount the root filesystem. Now, systemd-shutdown has code to return to the initrd and run /shutdown in there. Next step: check if this facility works in the context of Tails (might it be that only dracut supports this?). If it does, great! But it happens after /lib/systemd/system-shutdown/* so we cannot automatically test this with a script dropped in there.

#6 Updated by intrigeri over 1 year ago

intrigeri wrote:

Now, systemd-shutdown has code to return to the initrd and run /shutdown in there. Next step: check if this facility works in the context of Tails (might it be that only dracut supports this?).

This is indeed supported by dracut, but not by initramfs-tools. I'm assuming that fully switching to dracut requires more work than we're ready to put in before Tails 3.0. I can think of two other solutions, and I don't know which one would be cheaper:

  • hack support for this facility into our initramfs; requirements:
    • a /run/initramfs that systemd-shutdown can chroot into (when using dracut, dracut-shutdown.service executes /usr/lib/dracut/dracut-initramfs-restore which unpacks the initramfs to /run/initramfs), and that contains everything needed to run:
    • a /run/initramfs/shutdown executable, that systemd-shutdown will call after chroot'ing; it can access the old root filesystem in /oldroot; its main task would be to unmount the old root FS; presumably the shutdown script included in dracut-generated initrds would be an excellent source of inspiration
  • switch to a dracut-generated initramfs during shutdown; requirements:
    • install dracut-core
    • during ISO build, use dracut to build a second initramfs dedicated to shutdown; it can be very small, e.g. we don't need any kernel module in there
    • possibly disable some dracut systemd units
    • ensure dracut-shutdown.service and /run/initramfs/shutdown work fine

Requirements common to both cases:

  • All this must work reliably during emergency shutdown as well: all the needed files must be locked into memory, and whatever code is responsible for unpacking the initramfs to /run/initramfs must either have been run already, or must work even when the root filesystem is not available anymore.
  • /run/initramfs/shutdown must sleep for a while when /run/tails_shutdown_debugging exists, so we can write automated tests. This should be easy both with the initramfs-tools option (we write the shutdown script ourselves so it can do whatever we want) and with the dracut one (its shutdown script runs administrator-defined custom hooks after unmounting the old root FS).
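For the initramfs-tools option, the tail end of a hand-written /run/initramfs/shutdown might be sketched as below. According to the systemd initrd interface documentation, systemd-shutdown passes the shutdown verb as the first argument. The function name and the fallback for unknown verbs are assumptions for illustration, and the unmounting and debugging pause that would come first are left out.

```shell
#!/bin/sh
# Sketch of how a hypothetical /run/initramfs/shutdown could pick its last
# command. It runs after the old root FS in /oldroot has been dealt with.

final_command_for() {
    # $1 is the shutdown verb handed over by systemd-shutdown.
    case "$1" in
        poweroff) echo "poweroff -f" ;;
        reboot)   echo "reboot -f" ;;
        halt)     echo "halt -f" ;;
        # Assumption: anything else (kexec included) falls back to power-off.
        *)        echo "poweroff -f" ;;
    esac
}

# A real script would finish with something like:
# exec $(final_command_for "$1")
```

The -f flag makes the command act immediately instead of asking an init system that is no longer there.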

#7 Updated by intrigeri over 1 year ago

https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/ says that the ArchLinux initrd supports the initrd interface of systemd; see their:

And https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/ has some useful info.

#8 Updated by intrigeri over 1 year ago

This is indeed supported by dracut, but not by initramfs-tools.

There's a wishlist bug for it: https://bugs.debian.org/778849.

#9 Updated by anonym over 1 year ago

intrigeri wrote:

  • hack support for this facility into our initramfs; requirements:
    [vs]
  • switch to a dracut-generated initramfs during shutdown; requirements:

IMHO, if we plan to fully migrate to dracut soon (say, within a year) then let's consider going that way. Otherwise let's not introduce yet another technology that only you understand and that we only partially use; I worry that since we won't be in the "normal" use case of dracut, we will be a bit on our own, and any changes that cause breakage for us will frustrate me unless you take it as your responsibility to fix it. Also, it feels bloaty to have two different initramfs generation systems, and since you present no argument why the dracut approach is preferable to the one we have, I fail to see any reason for considering it.

#10 Updated by intrigeri over 1 year ago

Thanks for your feedback!

IMHO, if we plan to fully migrate to dracut soon (say, within a year)

I would not bet on this, and given what I read below, I'm not going to lead this effort unless you are happy to use it and learn how it works (until external changes force us to evolve, at least).

otherwise let's not introduce yet another technology that only you understand and that we only partially use;

Got it. I understand the reluctance about having to learn new tools, and I empathize with the frustration that comes with breakage related to new tools, breakage that's hard to understand before one has spent some time learning how they work. Now, IMO it's an essential part of the Foundations Team job to learn new technologies when we (need to) switch to them (e.g. systemd). Conclusion: I agree that we should be careful when introducing new technologies, and carefully weigh if they're worth the learning time they require from us all.

I worry that since we won't be in the "normal" use case of dracut, we will be a bit on our own, and any changes that cause breakage for us will frustrate me unless you take it as your responsibility to fix it.

Right, valid point, but:

and since you present no argument why the dracut approach is preferable to the one we have, I fail to see any reason for considering it.

There's no such thing as "the one we have": if we go the initramfs-tools way, we need to implement a brand new feature there, and then we'll probably have to maintain it ourselves. That's the main argument I was (implicitly) making in favour of the dracut approach, when I was comparing them above: it already does what we need (although surely nobody tested it in the same context yet). So in both cases, we will 1. have something Tails-specific to maintain; and 2. have to deal with breakage caused by external changes.

With this in mind, your argument in favour of initramfs-tools looks a bit like "let's write our own stuff from scratch, so that we don't have to learn about existing software that does essentially what we want already", which can be relevant sometimes, but doesn't feel very convincing in general :) In this case it's somewhat relevant: relevant since the "existing software" doesn't really support the context in which we want to use it. But only somewhat since writing our own code won't prevent external changes that can break it, and the initial implementation might require more Tails-specific work.

So as you can see, it's not clear cut to me what's best. I think I'll take your feedback/concerns into account, and will first give a try to the initramfs-tools option. If I realize that it involves reinventing too many wheels, I'll want to reconsider.

Thanks again!

#11 Updated by anonym over 1 year ago

Meta: I feel that you misunderstood me a lot, so I'll be overly clear to get my point across this time. Sorry for the verbosity!

intrigeri wrote:

otherwise let's not introduce yet another technology that only you understand and that we only partially use;

Got it. I understand the reluctance about having to learn new tools, and I empathize with the frustration that comes with breakage related to new tools, breakage that's hard to understand before one has spent some time learning how they work. Now, IMO it's an essential part of the Foundations Team job to learn new technologies when we (need to) switch to them (e.g. systemd).

Clarification: my "reluctance to learn new stuff" stems purely from time constraints. I feel excited about learning new stuff when I have the time to do it properly, which is rare. I hate learning stuff when I don't have the time, since that degenerates into stressful, frustrating trial and error while trying to get something to work ASAP. One then takes shortcuts in the learning process and misses essential stuff, and finally ends up with something one is seriously unsure does the right thing, plus a sour initial feeling towards this technology.

Conclusion: I agree that we should be careful when introducing new technologies, and carefully weigh if they're worth the learning time they require from us all.

Exactly! Let's just not forget that this is not only about the Foundations team learning new technologies, but about future contributors, auditors etc.

Also, let me refine my position like this: let's only use dracut if it comes out on top in the cost-benefit analysis with enough margin to justify introducing a new tool.

I worry that since we won't be in the "normal" use case of dracut, we will be a bit on our own, and any changes that cause breakage for us will frustrate me unless you take it as your responsibility to fix it.

Right, valid point, but:

and since you present no argument why the dracut approach is preferable to the one we have, I fail to see any reason for considering it.

There's no such thing as "the one we have":

Clarification: re-read the above with s/the one we have/the initramfs-tools approach/! That was what I meant, sorry for being unclear!

if we go the initramfs-tools way, we need to implement a brand new feature there, and then we'll probably have to maintain it ourselves. That's the main argument I was (implicitly) making in favour of the dracut approach, when I was comparing them above: it already does what we need (although surely nobody tested it in the same context yet). So in both cases, we will 1. have something Tails-specific to maintain; and 2. have to deal with breakage caused by external changes.

So the choice boils down to picking between:
  • maintaining a new feature for initramfs-tools
  • maintaining a probably unsupported use case of dracut

With this in mind, your argument in favour of initramfs-tools looks a bit like "let's write our own stuff from scratch, so that we don't have to learn about existing software that does essentially what we want already"

That is not my argument. My argument is: "Let's extend the tool we already are using, so that we don't have to learn about existing software that does essentially what we want already, but that we will use in an unusual (possibly unsupported) way, and it won't replace the other tool we are already using, but now we will use two tools."

And with my refined position, let's concatenate: "Unless extending the tool we already use turns out too costly."

In this case it's somewhat relevant: relevant since the "existing software" doesn't really support the context in which we want to use it. But only somewhat since writing our own code won't prevent external changes that can break it, and the initial implementation might require more Tails-specific work.

Agreed, so (again) once these are weighed against each other, let's pick dracut if it is superior enough to justify introducing another tool/technology.

So as you can see, it's not clear cut to me what's best.

Exactly, and take into account that I have no idea what dracut is beyond an event-driven initramfs-tools replacement and no experience of it whatsoever, so I focused on what I know, which are just some general points:

  • Introducing a new technology imposes a cost in time for learning it, both for current and future contributors.
  • Introducing a new technology in parallel to a similar technology introduces bloat and complexity.

Let me end by saying that I fully trust that your choice will be the right one! :)

#12 Updated by intrigeri over 1 year ago

My (local) work addresses this but so far emergency shutdown on boot medium removal doesn't return to the initramfs so the RAM is (presumably) not cleared. I'm working on this last part.

#13 Updated by anonym over 1 year ago

intrigeri wrote:

My (local) work addresses this but so far emergency shutdown on boot medium removal doesn't return to the initramfs so the RAM is (presumably) not cleared. I'm working on this last part.

What is the status on this?

#14 Updated by intrigeri over 1 year ago

anonym wrote:

intrigeri wrote:

My (local) work addresses this but so far emergency shutdown on boot medium removal doesn't return to the initramfs so the RAM is (presumably) not cleared. I'm working on this last part.

What is the status on this?

Exactly what I wrote above (I generally keep my tickets up to date, as I prefer storing status on Redmine rather than in my brain :)

#15 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.0 to Tails_3.0~rc1

#16 Updated by intrigeri over 1 year ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

#17 Updated by anonym over 1 year ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

See #12354 for the review (no blockers found). Please close when you merge!

#18 Updated by intrigeri over 1 year ago

  • Status changed from In Progress to Fix committed

#19 Updated by intrigeri over 1 year ago

  • Assignee deleted (intrigeri)

#20 Updated by intrigeri over 1 year ago

  • Status changed from Fix committed to Resolved
