Feature #8415

Migrate from aufs to overlayfs

Added by intrigeri about 3 years ago. Updated 4 months ago.

Status: In Progress
Priority: Elevated
Assignee:
Category: -
Target version:
Start date: 12/18/2014
Due date:
% Done: 55%
QA Check:
Feature Branch: wip/feature/8415-overlayfs-stretch
Type of work: Code
Blueprint:
Starter:
Affected tool:

Description

Tails 2.x uses aufs, which 1. is a pain to use with the grsecurity patchset; 2. is out-of-tree and requires aufs-dkms. overlayfs has neither of these problems. It's been merged into Linux 3.18, so the Debian kernel team has dropped aufs.

team: anonym, intrigeri (review)


Subtasks

Feature #8456: Test whiteouts support with overlayfs (Resolved)

Feature #8472: Wait for overlayfs to support multiple read-only lower layers (Resolved)

Feature #8473: Add support to live-boot to support multiple read-only lower layers with overlayfs (In Progress)

Feature #8474: Test overlayfs in Tails (Resolved)

Bug #8483: Fix overlayfs support in live-boot upstream (Resolved)

Feature #8801: Test our file ordering profiling process vs. overlayfs (Confirmed)

Bug #8908: bilibop doesn't support overlayfs and then the boot device has buggy permissions (Resolved)

Bug #9045: overlayfs breaks AppArmor (Resolved)

Feature #9373: Make tails-iuk support overlayfs (In Progress)

Feature #9421: Upgrade the overlayfs branch to Linux 4.x (Resolved)

Feature #12105: Adjust chrooted browsers to overlayfs (Confirmed)

Feature #12106: Adjust test suite to overlayfs (Confirmed)

Feature #12112: Drop aufs-specific kludges from AppArmor policy (Confirmed, intrigeri)

Associated revisions

Revision 81ae6446 (diff)
Added by Tails 12 months ago

Have live-boot use overlayfs as its union filesystem.

aufs doesn't play well with grsec.

Refs: #7649, #8415

Revision 59a9d8a7 (diff)
Added by intrigeri 12 months ago

AppArmor: add an alias that covers the overlayfs read-write branch.

Refs: #8415

Revision c1b80b4d (diff)
Added by intrigeri 12 months ago

AppArmor: pass the attach_disconnected flag to all profiles, for compatibility with overlayfs.

Refs: #8415

Revision e5879797 (diff)
Added by intrigeri 12 months ago

Stop including the aufs module (refs: #8415).

History

#1 Updated by intrigeri about 3 years ago

  • Description updated (diff)

#2 Updated by anonym almost 3 years ago

Bad news: by default overlayfs only supports stacking two layers and I've verified that at least the current 3.18 "trunk" kernel has this limitation. Given that, here is where we stand for the stated use cases:

  • regular usage

Not sure what this means.

  • persistence

It should be fine for us. The base Live system needs one layer for the filesystem.squashfs and COW-dir/tmpfs union, and we need no more for read-write persistence. Read-only persistence will indeed need a layer too, but it will be irrelevant since it will start a new union; its lowerdir is not from another union but the persistence source directory on the persistence medium.

Actually I think Debian Live's persistence is safe with this change in general.

  • incremental upgrades

Since the base Live system already occupies one layer, we'll be limited to only one incremental upgrade.

Next up: whiteouts. It seems it's a character device, but I didn't look into it.

  • chroots for web browsers

Affected by the same issue as incremental upgrades.

The stack depth limit of 2 has the following comment in the kernel sources: "Needs to be limited to prevent kernel stack overflow". That's vague, and 2 seems a bit arbitrary, so hopefully it can be increased without loss of stability, although it would require some serious testing. The question remains if we can get Debian to patch it to something higher, or, better yet, upstream Linux. For a stack depth limit of N, we can support N-1 incremental upgrades, so I guess we'd be happy with 8 or whatever.

To avoid more stacking than 2 layers we could resort to merging the contents of our squashfs when they're installed, but that would increase the RAM needed (for temporary disk space to extract them) to perform an upgrade quite a bit.

#3 Updated by intrigeri almost 3 years ago

anonym wrote:

  • regular usage

Not sure what this means.

I meant the stacked FS built by live-boot from the base squashfs and a tmpfs. Just what you describe below with "one layer for the [...]" :)

Next up: whiteouts. It seems it's a character device, but I didn't look into it.

Indeed, for non-directories it's a char device: does SquashFS support char devices?

But for directories it's done with an xattr to make it opaque: does SquashFS support xattr?

The stack depth limit of 2 has the following comment in the kernel sources: "Needs to be limited to prevent kernel stack overflow". That's vague, and 2 seems a bit arbitrary, so hopefully it can be increased without loss of stability, although it would require some serious testing. The question remains if we can get Debian to patch it to something higher, or, better yet, upstream Linux.

I wonder why "2" was chosen, as opposed to, e.g., 4 or 8.

For a stack depth limit of N, we can support N-1 incremental upgrades, so I guess we'd be happy with 8 or whatever.

Due to size limitations, we likely won't be able to install more than 4 IUKs in a row anyway, so N=5 would be just fine for us.

To avoid more stacking than 2 layers we could resort to merging the contents of our squashfs when they're installed, but that would increase the RAM needed (for temporary disk space to extract them) to perform an upgrade quite a bit.

... and CPU usage to recompress the merged squashfs!

Anyway, there's some hope: some work is in progress (by the current overlayfs maintainer) to support more than one read-only lower layer, and it's said to be the "top feature request". The code lives in git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git (branch overlayfs-next, currently). I'm not sure I got it right, but my understanding is that once this is implemented, we can merge all squashfs in one single mount in live-boot, and then both incremental upgrades (if whiteouts work right, otherwise we'll have to resort to crazy kludges) and chroot'ed browsers work again.
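A minimal sketch of what "all squashfs in one single mount" could look like, assuming the colon-separated lowerdir= syntax accepted by kernels that support multiple lower layers, where the leftmost entry is the topmost layer. The function name and all mount points below are hypothetical and only serve the illustration; this is not what live-boot does.

```python
from typing import Sequence


def overlay_mount_options(lower_layers: Sequence[str], upper: str, work: str) -> str:
    """Build the option string for a single overlayfs mount.

    lower_layers is ordered from bottom (base squashfs) to top (latest
    upgrade). In lowerdir= the leftmost entry is the topmost layer, so
    the list is reversed here.
    """
    if not lower_layers:
        raise ValueError("at least one read-only lower layer is required")
    for path in list(lower_layers) + [upper, work]:
        # ':' separates layers and ',' separates options; this sketch
        # rejects such paths instead of trying to escape them.
        if ":" in path or "," in path:
            raise ValueError(f"unsupported character in path: {path!r}")
    lowerdir = ":".join(reversed(lower_layers))
    return f"lowerdir={lowerdir},upperdir={upper},workdir={work}"


# Hypothetical mount points, for illustration only.
options = overlay_mount_options(
    ["/mnt/filesystem.squashfs", "/mnt/upgrade1.squashfs", "/mnt/upgrade2.squashfs"],
    upper="/mnt/cow/rw",
    work="/mnt/cow/work",
)
print(options)
```

The resulting string would be passed as the -o argument of a mount -t overlay call. The read-write upper directory and the work directory have to live on the same filesystem (the tmpfs, in the Live system case).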

Later on the same thread, there are also patches to make the stack depth limit configurable at runtime, but it was deemed risky by the overlayfs maintainer without more work to check that the stack will be safe.

#4 Updated by anonym almost 3 years ago

intrigeri wrote:

anonym wrote:

Next up: whiteouts. It seems it's a character device, but I didn't look into it.

Indeed, for non-directories it's a char device: does SquashFS support char devices?

Yes. I created a file with mknod test c 0 0, squashed it and then unsquashed it, and I got it back. However, when testing overlayfs I couldn't for the life of me find the location of these whiteout character devices after deleting files.
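For locating them: overlayfs hides whiteouts in the merged view and stores them in the upper (read-write) directory, as character devices with device number 0/0. A minimal sketch of a scanner for such entries follows; the function names are made up for this example, and the upper directory path has to be supplied by the caller.

```python
import os
import stat
from typing import Iterator


def is_whiteout(mode: int, rdev: int) -> bool:
    """True for a character device with device number 0/0."""
    return stat.S_ISCHR(mode) and os.major(rdev) == 0 and os.minor(rdev) == 0


def find_whiteouts(upperdir: str) -> Iterator[str]:
    """Yield paths under upperdir that look like overlayfs whiteouts."""
    for root, _dirs, files in os.walk(upperdir):
        # os.walk() reports every non-directory entry, device nodes
        # included, in the "files" list.
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if is_whiteout(st.st_mode, st.st_rdev):
                yield path


# Usage, with a hypothetical upper directory:
#     for path in find_whiteouts("/path/to/upperdir"):
#         print(path)
```

Opaque directories are marked with an xattr instead and are not covered by this sketch.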

But for directories it's done with an xattr to make it opaque: does SquashFS support xattr?

It seems it does: mksquashfs has the -xattrs option: "store extended attributes (default)."

For a stack depth limit of N, we can support N-1 incremental upgrades, so I guess we'd be happy with 8 or whatever.

Due to size limitations, we likely won't be able to install more than 4 IUKs in a row anyway, so N=5 would be just fine for us.

Right now, yes, but what about when we have same-day security updates? Then we probably would end up with several small IUKs between each planned (six week cycle) release.

To avoid more stacking than 2 layers we could resort to merging the contents of our squashfs when they're installed, but that would increase the RAM needed (for temporary disk space to extract them) to perform an upgrade quite a bit.

... and CPU usage to recompress the merged squashfs!

Right. While we cannot do anything about the CPU requirements I think we could at least reduce the memory needed for the squashfs merging by doing it on a file-by-file basis, i.e. use unsquashfs to list the contents of the new .squashfs, and for each of those files we extract it and add it into the previous .squashfs. We'd have to deal with deleted/whiteout:ed files somehow, of course.
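A toy model of that file-by-file merge, including the handling of deleted files. It assumes each layer can be represented as a {path: content} dict plus a list of whiteout paths; that representation and the function name are inventions of this sketch, and it does not touch real squashfs images.

```python
from typing import Dict, Iterable


def merge_layer(base: Dict[str, bytes], upgrade: Dict[str, bytes],
                whiteouts: Iterable[str]) -> Dict[str, bytes]:
    """Fold one upgrade layer into the base, one file at a time."""
    merged = dict(base)
    # Deletions first: a whiteout hides the path and anything below it.
    for gone in whiteouts:
        prefix = gone.rstrip("/") + "/"
        for path in list(merged):
            if path == gone or path.startswith(prefix):
                del merged[path]
    # Then additions and replacements, file by file.
    for path, content in upgrade.items():
        merged[path] = content
    return merged


# Made-up layer contents, for illustration only.
base = {"/etc/motd": b"old", "/usr/bin/tool": b"v1", "/opt/legacy/a": b"x"}
upgrade = {"/usr/bin/tool": b"v2", "/usr/bin/new": b"n"}
merged = merge_layer(base, upgrade, whiteouts=["/opt/legacy"])
print(sorted(merged))  # ['/etc/motd', '/usr/bin/new', '/usr/bin/tool']
```

Only one file needs to be held at a time in the second loop, which is where the memory saving would come from. The recompression cost is not modelled.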

Actually, if this approach is applied to the filesystem.squashfs, we'd eliminate much of the extra space needed for incremental upgrades since we'd end up with something that is (more or less) identical to a properly "Clone & Install":ed version of the target Tails version (modulo partition table stuff and maybe some other lower level stuff). This would allow users to do incremental upgrades essentially forever.

OTOH, perhaps modifying filesystem.squashfs is risky since the current live system is running from it, but if so we could do it in a separate upgrades.squashfs file that is the only extra overlayfs layer we'd ever add, then we could support incremental upgrades more or less forever too, provided something like 1 GB of free space on the Tails partition after a fresh install.

In either case, perhaps it'd be sound to require a fresh start ("Clone & Upgrade") for new Debian releases though. And with these approaches we probably could use tarballs instead of squashfs files for packaging the stuff inside the .iuk file. I would expect quite some complexity for dealing with deleted files, though.

I'll stop rambling but at least it's some food for thought.

Anyway, there's some hope: some work is in progress (by the current overlayfs maintainer) to support more than one read-only lower layer, and it's said to be the "top feature request". [...]

Indeed, that is good news! So I guess we'll be fine with living with the Jessie kernel until that feature hits a kernel in Debian Sid.

#5 Updated by intrigeri almost 3 years ago

anonym wrote:

intrigeri wrote:

anonym wrote:

Next up: whiteouts. It seems it's a character device, but I didn't look into it.

[...]

Thanks for the tests, and glad the results are good! Filed #8456 to track the remaining bits on this front.

For a stack depth limit of N, we can support N-1 incremental upgrades, so I guess we'd be happy with 8 or whatever.

Due to size limitations, we likely won't be able to install more than 4 IUKs in a row anyway, so N=5 would be just fine for us.

Right now, yes, but what about when we have same-day security updates? Then we probably would end up with several small IUKs between each planned (six week cycle) release.

Indeed.

To avoid more stacking than 2 layers we could resort to merging the contents of
our squashfs when they're installed, but that would increase the RAM needed (for
temporary disk space to extract them) to perform an upgrade quite a bit.

... and CPU usage to recompress the merged squashfs!

Right. While we cannot do anything about the CPU requirements I think we could at least reduce the memory needed for the squashfs merging by doing it [...] I'll stop rambling but at least it's some food for thought.

Wow, crazy idea -- I like it! Given multiple read-only lower layers support WIP, this will hopefully be related to, but not blocking, when it comes to solving the problem at hand. Capture it in a low priority research ticket?

Anyway, there's some hope: some work is in progress (by the current overlayfs maintainer) to support more than one read-only lower layer, and it's said to be the "top feature request". [...]

Indeed, that is good news! So I guess we'll be fine with living with the Jessie kernel until that feature hits a kernel in Debian Sid.

I hope so. Not sure if it's worth testing their preliminary patches right now. On the one hand, it's quite a lot of effort that has a good chance of being useless. OTOH, it would be sad to wait months for the feature to land, and only then discover that it's not fit for our use cases, for some reason. Thoughts?

#6 Updated by intrigeri almost 3 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check set to Info Needed
  • Type of work changed from Test to Research

anonym wrote:

  • chroots for web browsers

Affected by the same issue as incremental upgrades.

I don't get this part: we'll only be stacking one browser chroot mount on top of the system one, so it should be fine, no?

#7 Updated by anonym almost 3 years ago

intrigeri wrote:

To avoid more stacking than 2 layers we could resort to merging the contents of
our squashfs when they're installed, but that would increase the RAM needed (for
temporary disk space to extract them) to perform an upgrade quite a bit.

... and CPU usage to recompress the merged squashfs!

Right. While we cannot do anything about the CPU requirements I think we could at least reduce the memory needed for the squashfs merging by doing it [...] I'll stop rambling but at least it's some food for thought.

Wow, crazy idea -- I like it! Given multiple read-only lower layers support WIP, this will hopefully be related to, but not blocking, when it comes to solving the problem at hand.

This idea would work even with the current limitations of overlayfs; the key difference is that with the idea's approach we'd never introduce more than one squashfs due to incremental upgrades.

Capture it in a low priority research ticket?

Done in #8534. Unfortunately I discovered a serious limitation of the squashfs format while writing the ticket which now makes me a lot less enthusiastic about this whole idea.

Anyway, there's some hope: some work is in progress (by the current overlayfs maintainer) to support more than one read-only lower layer, and it's said to be the "top feature request". [...]

Indeed, that is good news! So I guess we'll be fine with living with the Jessie kernel until that feature hits a kernel in Debian Sid.

I hope so. Not sure if it's worth testing their preliminary patches right now. On the one hand, it's quite a lot of effort that has a good chance of being useless. OTOH, it would be sad to wait months for the feature to land, and only then discover that it's not fit for our use cases, for some reason. Thoughts?

The only thing that worries me is how stacking will work w.r.t. whiteouts. I would assume it behaves in a sane way that would make sense for us, but many disasters have started with such assumptions. But building a Debian kernel with these patches applied cannot be that hard, right?

  • chroots for web browsers

Affected by the same issue as incremental upgrades.

I don't get this part: we'll only be stacking one browser chroot mount on top of the system one, so it should be fine, no?

The problem isn't the mounting of the chroot inside the Live system's root filesystem, but the actual construction of the chroot, which at the bottom of the stack will have the filesystem.squashfs, and then any IUKs' .squashfs:s on top (and lastly a "COW" tmpfs), all to provide a "fresh" view of the Live system so no data trivially leaks from the running session into the browser's chroot. This was what we fixed for bugs #8152 and #8158 in Tails 1.2.1.

#8 Updated by intrigeri almost 3 years ago

  • Related to Feature #8604: Evaluate a grsec kernel from corsac's APT repository in Tails added

#9 Updated by intrigeri almost 3 years ago

  • Related to Feature #8600: Evaluate a grsec kernel from spender's build service in Tails added

#10 Updated by intrigeri almost 3 years ago

  • Subject changed from Evaluate overlayfs to Migrate from aufs to overlayfs

Actually, it's not as if we're going to maintain our own Linux kernel with aufs ourselves.

#11 Updated by intrigeri almost 3 years ago

  • Assignee deleted (anonym)
  • QA Check deleted (Info Needed)
  • Type of work changed from Research to Code

#12 Updated by intrigeri almost 3 years ago

  • Feature Branch set to feature/8415-overlayfs

#13 Updated by intrigeri over 2 years ago

  • Status changed from Confirmed to In Progress

#14 Updated by intrigeri over 2 years ago

  • Target version set to Sustainability_M1

(If we don't do that, we'll be stuck forever with Jessie's 3.16 kernel, which is not viable.)

#15 Updated by sajolida over 2 years ago

  • Target version changed from Sustainability_M1 to 2016

#16 Updated by intrigeri about 2 years ago

#17 Updated by intrigeri over 1 year ago

I've seen no progress upstream on #9045, and #10298 now has a branch ready that doesn't require overlayfs, so I'm in favour of postponing this at least to 2017, or rather: let's drop this from our roadmap until something forces us to migrate, if this ever happens. anonym, what do you think?

#18 Updated by BitingBird over 1 year ago

  • Assignee set to anonym
  • QA Check set to Info Needed

#19 Updated by intrigeri over 1 year ago

  • Assignee deleted (anonym)
  • Target version deleted (2016)

Given we could do #10298 without migrating to overlayfs, we removed this from our roadmap at the summit this year.

#20 Updated by BitingBird about 1 year ago

  • QA Check deleted (Info Needed)

#21 Updated by intrigeri 12 months ago

  • Related to deleted (Feature #8604: Evaluate a grsec kernel from corsac's APT repository in Tails)

#22 Updated by intrigeri 12 months ago

#23 Updated by intrigeri 12 months ago

  • Related to deleted (Feature #8600: Evaluate a grsec kernel from spender's build service in Tails)

#24 Updated by intrigeri 12 months ago

  • Related to Feature #7649: Include a grsecurity-patched kernel added

#25 Updated by intrigeri 12 months ago

  • Related to deleted (Feature #7649: Include a grsecurity-patched kernel)

#26 Updated by intrigeri 12 months ago

  • Blocks Feature #7649: Include a grsecurity-patched kernel added

#27 Updated by intrigeri 11 months ago

  • Feature Branch changed from feature/8415-overlayfs to feature/8415-overlayfs-stretch

#28 Updated by intrigeri 11 months ago

I've updated the branch, run the test suite, and created subtasks for what's left to do.

#29 Updated by intrigeri 11 months ago

  • Description updated (diff)

#30 Updated by intrigeri 7 months ago

  • Feature Branch changed from feature/8415-overlayfs-stretch to wip/feature/8415-overlayfs-stretch

#31 Updated by intrigeri 6 months ago

  • Assignee set to intrigeri

The Debian kernel maintainers are waiting for us to do that before they can stop applying the aufs compat patch, so re-adding to my radar in the hope I think about it when we discuss our roadmap this year.

#32 Updated by intrigeri 6 months ago

  • Blocks deleted (Feature #7649: Include a grsecurity-patched kernel)

#33 Updated by BitingBird 4 months ago

  • Assignee changed from intrigeri to anonym
  • Target version set to 2018

#34 Updated by BitingBird 4 months ago

  • Description updated (diff)
