Feature #14976

Upgrade the Linux kernel to get KPTI

Added by intrigeri 10 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Hardware support
Target version:
Start date:
11/17/2017
Due date:
% Done:

100%

QA Check:
Pass
Feature Branch:
feature/14976-linux-4.14+force-all-tests, feature/14976-linux-4.14-devel+force-all-tests
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

We've currently frozen it to 4.13.10-1. It's likely that security issues will be fixed in sid before Tails 3.4 is released.

If we upgrade to Linux 4.14 we may have to pin the AppArmor feature set to an older one (likely 4.13's) but beware of kernel bugs wrt. feature set pinning, e.g. https://bugs.debian.org/883703.

kernel-panic.png (37.2 KB) intrigeri, 01/04/2018 12:24 PM


Related issues

Related to Tails - Feature #15000: Ensure we benefit from new security features in Linux 4.14 Resolved 11/25/2017
Related to Tails - Bug #15148: Upgrade AMD processor microcodes to mitigate the Spectre attack Resolved 01/06/2018
Blocked by Tails - Feature #14999: Upgrade to Stretch 9.3 Resolved 11/25/2017
Blocks Tails - Feature #13245: Core work 2018Q1: Foundations Team Resolved 06/29/2017

Associated revisions

Revision acfd98cd (diff)
Added by intrigeri 9 months ago

Install Linux 4.14.0-2 from sid (refs: #14976)

Revision 867c1da6 (diff)
Added by intrigeri 9 months ago

Install Linux 4.14.0-2 from sid (refs: #14976)

Revision a64f76f8 (diff)
Added by intrigeri 9 months ago

Fix Linux 4.14 headers installation with a fake linux-compiler-gcc-7-x86 package (refs: #14976).

Revision 8624b4e5 (diff)
Added by intrigeri 9 months ago

Install Linux 4.14.0-2 from stretch-backports (refs: #14976)

Revision 5b864570 (diff)
Added by intrigeri 9 months ago

Install Linux 4.14.0-2 from sid (refs: #14976)

Revision d56633a3 (diff)
Added by intrigeri 9 months ago

Fix Linux 4.14 headers installation with a fake linux-compiler-gcc-7-x86 package (refs: #14976).

Revision 36adab31 (diff)
Added by intrigeri 9 months ago

Update Linux to 4.14.0-3 (current version: 4.14.12-1) with KPTI.

refs: #14976

Revision 9cf42c44 (diff)
Added by intrigeri 9 months ago

Workaround Debian bug #886366 that breaks DKMS modules build (refs: #14976).

XXX: don't merge

Let's drop this commit once we get Linux 4.14.12-2 which fixes that bug for
real (likely today around 5pm UTC). I'm only committing this now in order to
have automated tests results with Linux 4.14.12 ASAP.

Revision 321f8b9d
Added by anonym 9 months ago

Merge remote-tracking branch 'origin/feature/14976-linux-4.14+force-all-tests' into stable

Fix-committed: #14976

Revision 4f8b50af (diff)
Added by anonym 9 months ago

Use the same version for the fake linux-compiler-gcc-7-x86 as the kernel.

The `apt-cache policy` approach doesn't work for tagged
snapshots (e.g. when building releases) since they include only the
exact packages we use.

Refs: #14976

History

#1 Updated by intrigeri 10 months ago

#2 Updated by intrigeri 10 months ago

  • Status changed from Confirmed to Duplicate
  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_3.5)

#3 Updated by intrigeri 10 months ago

  • Subject changed from Consider upgrading Linux kernel in Tails 3.4 to Consider upgrading Linux kernel in Tails 3.5

#4 Updated by intrigeri 10 months ago

  • Related to Feature #15000: Ensure we benefit from new security features in Linux 4.14 added

#5 Updated by intrigeri 10 months ago

  • Category set to Hardware support
  • Status changed from Duplicate to Confirmed
  • Assignee set to intrigeri
  • Target version set to Tails_3.5

#6 Updated by intrigeri 10 months ago

#7 Updated by intrigeri 10 months ago

I'll try this once https://bugs.debian.org/880387 is fixed.

#8 Updated by intrigeri 9 months ago

  • Description updated (diff)

#9 Updated by intrigeri 9 months ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

I've looked at the CVEs fixed between the kernel we have in Tails 3.3 and src:linux 4.14.2-1:

apt-get changelog linux-image-4.14.0-1-amd64 \
  | dpkg-parsechangelog -l - --since 4.13.10-1 \
  | grep --color=never --extended-regexp -o 'CVE-[0-9]+-[0-9]+' \
  | while read -r cve; do
      echo "${cve}"
      curl --silent "http://cve.circl.lu/api/cve/${cve}" | \
      ruby -ryaml -rfacets -e \
          'h = YAML.load(STDIN.read);
           puts h ? h["summary"].word_wrap(72) : "RESERVED"'
      echo
    done
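The extraction step of the pipeline above can also be sketched in Ruby. This is only an illustration: `extract_cves` is a hypothetical helper, and the changelog text and CVE identifiers below are made-up placeholders. Unlike the grep step, this sketch removes duplicates, so a CVE mentioned twice would only be looked up once.

```ruby
# Hypothetical helper (not part of any Tails tooling): extract unique CVE
# identifiers from changelog text, preserving first-seen order.
def extract_cves(changelog)
  changelog.scan(/CVE-\d+-\d+/).uniq
end

# Placeholder changelog excerpt; the identifiers are made up.
sample = <<~CHANGELOG
  * Fix something (CVE-2017-0001)
  * Fix something else (CVE-2017-0002, CVE-2017-0001)
CHANGELOG

puts extract_cves(sample)
```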

tl;dr: nothing too scary apparently, as long as the adversary has no physical access to the machine. Other than that, it's worth noting that a great number of "unspecified other impact via a crafted USB device" issues were fixed, which should encourage us to spend time on hardening this with usbguard, usbauth or similar.

I'll look at it again later in the 3.5 cycle.

If the aufs-dkms package is not updated and we have to upgrade the kernel, worst case we can go back to building the aufs module ourselves.

#10 Updated by intrigeri 9 months ago

  • Feature Branch set to feature/14976-linux-4.14-devel

intrigeri wrote:

I'll try this once https://bugs.debian.org/880387 is fixed.

It's been fixed.

#11 Updated by intrigeri 9 months ago

I'll first evaluate how 4.14 would work on a branch based on devel. If I'm happy with the result and feel we should upgrade in Tails 3.5, I'll go through the more involved steps needed to get it in our stable branch; and if not, well, that'll be time saved for Tails 3.6 :)

#12 Updated by intrigeri 9 months ago

The branch FTBFS as the sid kernel headers depend on gcc-7 which is not in Stretch. Linux 4.14 was uploaded to stretch-backports but binary packages are not in the archive yet.

#13 Updated by intrigeri 9 months ago

#14 Updated by intrigeri 9 months ago

#15 Updated by intrigeri 9 months ago

intrigeri wrote:

Linux 4.14 was uploaded to stretch-backports but binary packages are not in the archive yet.

I've asked around and that's because the backports version check is broken: it looks for "is the uploaded version lower than the version in unstable" but ignores the fact that there can be multiple versions in unstable :/

#16 Updated by intrigeri 9 months ago

I finally have a branch that builds successfully, but I get a kernel panic on boot in the aufs module when mounting the rootfs, both on bare metal and in a VM; same in Troubleshooting mode. I wonder if the hack I had to do in order to build the aufs module with gcc-6 can cause this problem.

#17 Updated by intrigeri 9 months ago

intrigeri wrote:

I finally have a branch that builds successfully, but I get a kernel panic on boot in the aufs module when mounting the rootfs, both on bare metal and in a VM; same in Troubleshooting mode. I wonder if the hack I had to do in order to build the aufs module with gcc-6 can cause this problem.

I don't know if a more proper aufs.ko would fix that bug, but at least an ISO built that uses overlayfs + this branch merged in (wip/feature/8415-overlayfs-stretch) boots just fine.

#18 Updated by intrigeri 9 months ago

Hi anonym,

It may be that we have to upgrade our kernel really soon (to get KPTI) and I think our only realistic option is 4.14, so this "consider upgrading" job might quickly become "OMG we really need to do it now", which is why I've been working on it this week. I'm on it for now but I'm close to reaching the limits of my skills, and I wouldn't mind some help. If you can put some time into it, let me know and let's coordinate :)

intrigeri wrote:

intrigeri wrote:

I finally have a branch that builds successfully, but I get a kernel panic on boot in the aufs module when mounting the rootfs, both on bare metal and in a VM; same in Troubleshooting mode. I wonder if the hack I had to do in order to build the aufs module with gcc-6 can cause this problem.

I don't know if a more proper aufs.ko would fix that bug,

Ouch, I see the same bug with Linux 4.14.7-1~bpo9+1 + aufs-dkms (4.14+20171218-1) built with linux-compiler-gcc-6-x86 (4.14.7-1~bpo9+1), see attached screenshot. Booting with aufs.debug=1 gives a full trace of what aufs is doing; debug=1 puts live-boot in debug mode; it seems that things go wrong during the aufs mount operation or very shortly after it's done.

Things I'd like to try and misc ideas:

  • drop the noxino option (in live-boot), who knows
  • 4.14 adds set_fs() balance checking (https://outflux.net/blog/archives/2017/11/14/security-things-in-linux-v4-14/) and aufs uses set_fs() quite a lot; might be related?
  • Try to set up an aufs unionmount on a regular Stretch system with this kernel + aufs module. This might make it easier to debug what's going on; and if I can't reproduce in that environment, it'll be interesting info.
  • Upgrade aufs-tools to the version found in testing/sid: perhaps the old userspace is not compatible with the new kernel module?
  • Dump all this aufs.debug info via a (virtual) serial console and report a bug.
  • Other ideas?

#19 Updated by intrigeri 9 months ago

intrigeri wrote:

  • Try to set up an aufs unionmount on a regular Stretch system with this kernel + aufs module. This might make it easier to debug what's going on; and if I can't reproduce in that environment, it'll be interesting info.

Done: mounting works just fine, but merely running ls on the mountpoint segfaults with the same call trace. Dropping the noatime,noxino options => same result. Upgrading to aufs-tools (1:4.9+20170918-1) => same result.

Same result on a sid system.

The good news is that I now have a debugging environment that doesn't require building an ISO to try stuff.

Testing procedure: modprobe aufs debug=1 && mkdir /tmp/{ro,rw,mount} && touch /tmp/ro/bla && mount -t aufs -o dirs=/tmp/rw=rw:/tmp/ro=rr+wh aufs /tmp/mount && ls /tmp/mount

#20 Updated by intrigeri 9 months ago

Reported https://bugs.debian.org/886329, trying to implement a workaround in live-boot.

#21 Updated by intrigeri 9 months ago

intrigeri wrote:

trying to implement a workaround in live-boot.

My workaround seems to do the job! :)))

#22 Updated by intrigeri 9 months ago

  • Feature Branch changed from feature/14976-linux-4.14-devel to feature/14976-linux-4.14

#23 Updated by intrigeri 9 months ago

  • Subject changed from Consider upgrading Linux kernel in Tails 3.5 to Upgrade the Linux kernel to get KPTI
  • Target version changed from Tails_3.5 to Tails_3.4

#24 Updated by intrigeri 9 months ago

  • Feature Branch changed from feature/14976-linux-4.14 to feature/14976-linux-4.14+force-all-tests

#25 Updated by intrigeri 9 months ago

  • Type of work changed from Research to Code

For now I've simply bumped the debian APT snapshots. I'll inspect the build manifest diff to see if it seems reasonable; keep in mind that a kernel upgrade requires us to go through our entire QA anyway; if that's not reasonable for some reason, we'll have to import the new kernel in our custom APT repo. And regardless of what we decide on this front, we'll have to do it again once a kernel with KPTI is available in sid.

#26 Updated by intrigeri 9 months ago

intrigeri wrote:

For now I've simply bumped the debian APT snapshots. I'll inspect the build manifest diff to see if it seems reasonable

It does look reasonable to me.

#27 Updated by intrigeri 9 months ago

For test results, see:

We'll need to run more tests once the branch ships a kernel that has KPTI but I find it useful to first evaluate the impact of 4.14 without KPTI.

#28 Updated by intrigeri 9 months ago

I've analyzed a bunch of test suite runs. tl;dr: nothing particularly scary. Most failures seem to be caused by an overloaded CI infra.

Analyzing builds 1 to 6; note that lizard was extremely loaded during these tests (all ISO builders and testers busy). 11-15 failures per run, which is similar to what I see on https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-anonym-force-all-tests/. So at least Linux 4.14 does not seem to break tons of stuff. Unless specifically noted each problem happened once:

  • 'When I install "cowsay" using Synaptic' often fails but that's common on other branches too (#12586).
  • OpenPGP applet: text was not selected in gedit so clicking "copy" (that was grayed out but well) didn't do anything. I'll blame test suite robustness.
  • #12131 (many times)
  • #15006 (many times)
  • Thunderbird POP3 fails, likely transient network failure
  • Failure to load our homepage or labs.riseup.net in Tor Browser (a few times).
  • MAC spoofing failure notification never shown (3 times).
  • #15031 (twice)
  • Many times the Unsafe Browser does not start at all. I suspected it was caused by the same aufs bug I have worked around in live-boot, but I see no such error in the Journal. Bumped the timeout because 10s seemed short.

The only (partial) test suite that's been run so far passed.

The only (partial) test suite that's been run so far has too many failures so I won't analyze it: tons of transient network issues and weird behaviour. Looks like something went very wrong during this test run. If I don't see other similar cases I'll blame lizard being overloaded.

  • I'm also running tests locally.

Seen a full test suite pass. Other than that, the failures in other runs are explained by (each once unless specifically noted):

  • #11188
  • #15006 (quite a few times)
  • dogtail clicking the "Start Tor Browser" button in "I start the Tor Browser in offline mode" is not effective.
  • "The page was not saved to /home/amnesia/Tor Browser/index.html" in "The Tor Browser directory is usable"; I'm sure I've seen that elsewhere already but cannot find the ticket. We're waiting 20s already so I don't think it's a matter of bumping the timeout. Nothing weird in the Journal.
  • One Thunderbird test case bug.
  • "Gobby should only connect to [9050] but was seen connecting to 127.0.0.1:53"; in the Journal I see gobby-0.5[10438]: Failure during SRV record lookup: Host name lookup failure. Will go on with normal A/AAAA lookup, which is not present in any *.journal file on Jenkins. It looks like a temporary network problem triggering error handling code in Gobby. We run it with torsocks and AllowOutboundLocalhost 2 so that's not a proxy bypass. So I think this test case should ideally allow connecting to 127.0.0.1:53 without raising eyebrows. anonym, if you agree please consider applying this (untested) patch:
--- a/features/step_definitions/tor.rb
+++ b/features/step_definitions/tor.rb
@@ -295,6 +295,9 @@ Then /^I see that (.+) is properly stream isolated$/ do |application|
   info = stream_isolation_info(application)
   expected_ports = [info[:socksport]]
   expected_ports << 9051 if info[:controller]
+  # Apps run with torsocks can legitimately fall back to using the local
+  # DNS resolver
+  expected_ports << 53
   assert_not_nil(@process_monitor_log)
   log_lines = $vm.file_content(@process_monitor_log).split("\n")
   assert(log_lines.size > 0,
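The idea behind the proposed patch can be sketched in isolation as follows. This is not the test suite's actual code: `unexpected_connections` is a hypothetical helper, and the port and address values are illustrative. It shows how adding 53 to the list of allowed ports makes a fallback DNS lookup stop counting as a stream isolation failure.

```ruby
# Hypothetical helper, not part of the Tails test suite: returns the
# connections ("host:port" strings) whose destination port is not allowed.
def unexpected_connections(connections, allowed_ports)
  connections.reject do |conn|
    port = conn.split(':').last.to_i
    allowed_ports.include?(port)
  end
end

# Illustrative values: a SOCKS port plus the local DNS resolver port.
allowed_ports = [9050, 53]
seen = ['127.0.0.1:9050', '127.0.0.1:53']

leaks = unexpected_connections(seen, allowed_ports)
puts leaks.empty? ? 'stream isolation OK' : "unexpected: #{leaks.join(', ')}"
```

Note that, like the patch, this sketch allows port 53 regardless of the destination host; restricting it to 127.0.0.1 would be stricter.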

#29 Updated by intrigeri 9 months ago

  • Feature Branch changed from feature/14976-linux-4.14+force-all-tests to feature/14976-linux-4.14+force-all-tests, feature/14976-linux-4.14-devel+force-all-tests

I've noticed one regression on this branch: the splash screen is initially displayed, but it disappears as soon as the aufs bug is triggered and the kernel stack trace is displayed. I doubt we can do anything about it until the aufs bug is fixed. This should be documented in the 3.4 known issues.

#30 Updated by intrigeri 9 months ago

intrigeri wrote:

I've noticed one regression on this branch: the splash screen is initially displayed, but it disappears as soon as the aufs bug is triggered and the kernel stack trace is displayed. I doubt we can do anything about it until the aufs bug is fixed. This should be documented in the 3.4 known issues.

Draft known issue text:

The graphical splash screen usually displayed during Tails startup quickly disappears and is replaced by garbled text messages. As long as Tails appears to work fine for you otherwise, please ignore these messages, including the alarming message about a "kernel BUG" (which was [[!debbug 886329 desc="reported to Debian"]]): they do not affect the safety of your Tails system.

#31 Updated by intrigeri 9 months ago

  • Related to Bug #15148: Upgrade AMD processor microcodes to mitigate the Spectre attack added

#32 Updated by intrigeri 9 months ago

intrigeri wrote:

If we upgrade to Linux 4.14 we may have to pin the AppArmor feature set to an older one (likely 4.13's) but beware of kernel bugs wrt. feature set pinning, e.g. https://bugs.debian.org/883703.

I'm now testing this. It may be a hard decision to make:

  • Without pinning, any AppArmor profile that lacks rules for the new mediation features brought in 4.14 may break the confined app;
  • With pinning to the Linux 4.9 feature set, that won't be a problem; however, due to that kernel bug, all mount operations for confined apps will be blocked (even if they are explicitly allowed in the policy).

On my own sid system, the only bits of policy that have to allow mount operations are for libvirt. So I expect that broken mount operations for confined apps in Tails won't be a problem in practice, which is why I'm leaning towards pinning. (Granted, our test suite did not identify any breakage on Linux 4.14 without pinning; but we don't exercise our confined apps that much, so that doesn't mean much.)

#33 Updated by intrigeri 9 months ago

  • % Done changed from 10 to 20

First full test suite result with KPTI but without AppArmor feature set pinning (on my local Jenkins) passes, except that #15006 broke one scenario, woohoo! It took 6% longer than the average of my 3 previous runs; that might be the impact of KPTI, or it might be part of the usual deviation. Anyway. Next steps:

  1. wait for results with AppArmor feature set pinning
  2. once our APT snapshots have the fix for https://bugs.debian.org/886366, revert the corresponding workaround, trigger builds and wait for test results
  3. hopefully everything goes well and I can clean up the Git history and send this to anonym's plate for QA; otherwise, rinse & repeat.
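One way to judge whether a slowdown of this size stands out from the usual run-to-run variation is to compare it with the spread of the previous runs. The sketch below is only an illustration: the helper names are hypothetical and the durations are made-up placeholders, not the measured values from this ticket.

```ruby
# All durations below are made-up placeholders (in minutes), not measurements.
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation (divides by n - 1).
def sample_stddev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Slowdown of new_run relative to the mean of previous_runs, in percent.
def slowdown_percent(previous_runs, new_run)
  m = mean(previous_runs)
  (new_run - m) / m * 100.0
end

previous = [300.0, 310.0, 290.0]
new_run = 318.0

puts format('slowdown: %.1f%%', slowdown_percent(previous, new_run))
puts format('stddev of previous runs: %.1f%% of mean',
            sample_stddev(previous) / mean(previous) * 100.0)
```

With only three previous runs the spread estimate is very rough, so this can at best hint at whether the difference deserves a closer look.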

#34 Updated by intrigeri 9 months ago

intrigeri wrote:

First full test suite result with KPTI but without AppArmor feature set pinning (on my local Jenkins) passes, except that #15006 broke one scenario, woohoo! It took 6% longer than the average of my 3 previous runs; that might be the impact of KPTI, or it might be part of the usual deviation. Anyway. Next steps:

  1. wait for results with AppArmor feature set pinning
  2. once our APT snapshots have the fix for https://bugs.debian.org/886366, revert the corresponding workaround, trigger builds

Done.

and wait for test results

Almost there: https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-14976-linux-4.14-force-all-tests/12/ and following. Looks OK so far.

Older tests look good except there's a somewhat alarming amount of connection failures from Tor Browser to our website. Not sure if it's related. Sadly there's no Journal saved for these failures (reported on the corresponding ticket).

  1. hopefully everything goes well and I can clean up the Git history and send this to anonym's plate for QA; otherwise, rinse & repeat.

I'll skip the "clean up the Git history" part. It's not that ugly.

#35 Updated by intrigeri 9 months ago

Older tests look good except there's a somewhat alarming amount of connection failures from Tor Browser to our website. Not sure if it's related. Sadly there's no Journal saved for these failures (reported on the corresponding ticket).

https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-anonym-force-all-tests/ shows many similar failures so that's not a regression brought by this branch.

#36 Updated by intrigeri 9 months ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 20 to 50
  • QA Check set to Ready for QA

#37 Updated by anonym 9 months ago

intrigeri wrote:

intrigeri wrote:

For now I've simply bumped the debian APT snapshots. I'll inspect the build manifest diff to see if it seems reasonable

It does look reasonable to me.

Agreed, beyond the expected kernel related bumps I get:

  • virtualbox-guest-{dkms,utils,x11}: 5.2.2-dfsg-3 → 5.2.4-dfsg-2
  • torbrowser-launcher: 0.2.8-5 → 0.2.8-6

#38 Updated by anonym 9 months ago

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

Code looks good. Also, for automated tests, runs #12 + #13 + #14 together see all scenarios pass!

Merged!

#39 Updated by anonym 9 months ago

I also bumped the 2018010603 APT snapshot's expiry!

#40 Updated by intrigeri 9 months ago

I took the liberty of merging feature/14976-linux-4.14-devel+force-all-tests into devel myself; presumably you simply missed it in the "Feature Branch" field. Otherwise ISO images built from devel won't have what we want.

#42 Updated by anonym 9 months ago

  • Status changed from Fix committed to In Progress

#43 Updated by intrigeri 9 months ago

  • Status changed from In Progress to Fix committed

#44 Updated by anonym 9 months ago

  • Status changed from Fix committed to Resolved
