Feature #14976

Upgrade the Linux kernel to get KPTI

Added by intrigeri 2017-11-17 15:12:17. Updated 2018-01-09 20:55:13.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Hardware support
Target version:
Start date:
2017-11-17
Due date:
% Done:

100%

Feature Branch:
feature/14976-linux-4.14+force-all-tests, feature/14976-linux-4.14-devel+force-all-tests
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

We’ve currently frozen the kernel to 4.13.10-1. Security issues will likely be fixed in sid between now and Tails 3.4, and we would miss those fixes as long as we stay frozen.

If we upgrade to Linux 4.14 we may have to pin the AppArmor feature set to an older one (likely 4.13’s) but beware of kernel bugs wrt. feature set pinning, e.g. https://bugs.debian.org/883703.


Files


Subtasks


Related issues

Related to Tails - Feature #15000: Ensure we benefit from new security features in Linux 4.14 Resolved 2017-11-25
Related to Tails - Bug #15148: Upgrade AMD processor microcodes to mitigate the Spectre attack Resolved 2018-01-06
Blocked by Tails - Feature #14999: Upgrade to Stretch 9.3 Resolved 2017-11-25
Blocks Tails - Feature #13245: Core work 2018Q1: Foundations Team Resolved 2017-06-29

History

#1 Updated by intrigeri 2017-11-17 15:12:27

#2 Updated by intrigeri 2017-12-09 11:43:21

  • Status changed from Confirmed to Duplicate
  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_3.5)

Feature #14999

#3 Updated by intrigeri 2017-12-10 15:06:13

  • Subject changed from Consider upgrading Linux kernel in Tails 3.4 to Consider upgrading Linux kernel in Tails 3.5

#4 Updated by intrigeri 2017-12-10 15:06:23

  • related to Feature #15000: Ensure we benefit from new security features in Linux 4.14 added

#5 Updated by intrigeri 2017-12-10 15:06:43

  • Category set to Hardware support
  • Status changed from Duplicate to Confirmed
  • Assignee set to intrigeri
  • Target version set to Tails_3.5

#6 Updated by intrigeri 2017-12-10 15:06:58

#7 Updated by intrigeri 2017-12-10 15:07:37

I’ll try this once https://bugs.debian.org/880387 is fixed.

#8 Updated by intrigeri 2017-12-16 10:48:27

  • Description updated

#9 Updated by intrigeri 2017-12-23 09:10:53

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

I’ve looked at the CVEs fixed between the kernel we ship in Tails 3.3 and src:linux 4.14.2-1:

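# List the CVE identifiers mentioned in the kernel changelog since 4.13.10-1,
# then fetch each one's summary from the circl.lu CVE API; the Ruby one-liner
# uses the "facets" gem for its word_wrap helper.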
apt-get changelog linux-image-4.14.0-1-amd64 \
  | dpkg-parsechangelog -l - --since 4.13.10-1 \
  | grep --color=never --extended-regexp -o 'CVE-[0-9]+-[0-9]+' \
  | while read cve; do
      echo ${cve}
      curl --silent "http://cve.circl.lu/api/cve/${cve}" | \
      ruby -ryaml -rfacets -e \
          'h = YAML.load(STDIN.read);
           puts h ? h["summary"].word_wrap(72) : "RESERVED"'
      echo
    done

tl;dr: nothing too scary apparently, as long as the adversary doesn’t have physical access to the machine. Other than that, it’s worth noting that a great number of “unspecified other impact via a crafted USB device” issues were fixed, which should encourage us to spend time on hardening this with usbguard, usbauth or similar.
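
A minimal sketch of the kind of USB hardening alluded to above, using usbguard’s stock tooling (this is an assumption about one possible setup, not something Tails ships today):

# Build an allow-list from the devices currently attached, then enforce it;
# anything plugged in later is blocked until explicitly authorized.
usbguard generate-policy > /etc/usbguard/rules.conf
systemctl enable --now usbguard
# Inspect newly blocked devices and, if desired, allow one of them:
usbguard list-devices --blocked
usbguard allow-device 4    # "4" is a hypothetical device ID from the listing above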

I’ll look at it again later in the 3.5 cycle.

If the aufs-dkms package is not updated and we have to upgrade the kernel, worst case we can go back to building the aufs module ourselves.

#10 Updated by intrigeri 2017-12-26 15:35:13

  • Feature Branch set to feature/14976-linux-4.14-devel

intrigeri wrote:
> I’ll try this once https://bugs.debian.org/880387 is fixed.

It’s been fixed.

#11 Updated by intrigeri 2017-12-26 15:38:08

I’ll first evaluate how 4.14 would work on a branch based on devel. If I’m happy with the result and feel we should upgrade in Tails 3.5, I’ll go through the more involved steps needed to get it in our stable branch; and if not, well, that’ll be time saved for Tails 3.6 :)

#12 Updated by intrigeri 2017-12-31 14:04:25

The branch FTBFS because the sid kernel headers depend on gcc-7, which is not available in Stretch. Linux 4.14 was uploaded to stretch-backports but binary packages are not in the archive yet.

#13 Updated by intrigeri 2018-01-01 16:38:55

  • blocked by deleted (Feature #13244: Core work 2017Q4: Foundations Team)

#14 Updated by intrigeri 2018-01-01 16:38:58

#15 Updated by intrigeri 2018-01-02 15:29:45

intrigeri wrote:
> Linux 4.14 was uploaded to stretch-backports but binary packages are not in the archive yet.

I’ve asked around and that’s because the backports version check is broken: it looks for “is the uploaded version lower than the version in unstable” but ignores the fact that there can be multiple versions in unstable :/

#16 Updated by intrigeri 2018-01-03 13:47:18

I finally have a branch that builds successfully, but I get a kernel panic on boot in the aufs module when mounting the rootfs, both on bare metal and in a VM; same in Troubleshooting mode. I wonder if the hack I had to do in order to build the aufs module with gcc-6 can cause this problem.

#17 Updated by intrigeri 2018-01-03 15:39:03

intrigeri wrote:
> I finally have a branch that builds successfully, but I get a kernel panic on boot in the aufs module when mounting the rootfs, both on bare metal and in a VM; same in Troubleshooting mode. I wonder if the hack I had to do in order to build the aufs module with gcc-6 can cause this problem.

I don’t know if a more proper aufs.ko would fix that bug, but at least an ISO built that uses overlayfs + this branch merged in (wip/feature/8415-overlayfs-stretch) boots just fine.

#18 Updated by intrigeri 2018-01-04 13:38:43

Hi anonym,

It may be that we have to upgrade our kernel really soon (to get KPTI) and I think our only realistic option is 4.14, so this “consider upgrading” job might quickly become “OMG we really need to do it now”, which is why I’ve been working on it this week. I’m on it for now but I’m close to reaching the limits of my skills, and I wouldn’t mind some help. If you can put some time into it, let me know and let’s coordinate :)

intrigeri wrote:
> intrigeri wrote:
> > I finally have a branch that builds successfully, but I get a kernel panic on boot in the aufs module when mounting the rootfs, both on bare metal and in a VM; same in Troubleshooting mode. I wonder if the hack I had to do in order to build the aufs module with gcc-6 can cause this problem.
>
> I don’t know if a more proper aufs.ko would fix that bug,

Ouch, I see the same bug with Linux 4.14.7-1~bpo9+1 + aufs-dkms (4.14+20171218-1) built with linux-compiler-gcc-6-x86 (4.14.7-1~bpo9+1), see attached screenshot. Booting with aufs.debug=1 gives a full trace of what aufs is doing; debug=1 puts live-boot in debug mode; it seems that things go wrong during the aufs mount operation or very shortly after it’s done.

Things I’d like to try and misc ideas:

  • drop the noxino option (in live-boot), who knows
  • 4.14 adds set_fs() balance checking (https://outflux.net/blog/archives/2017/11/14/security-things-in-linux-v4-14/) and aufs uses set_fs() quite a lot; might be related?
  • Try to set up an aufs unionmount on a regular Stretch system with this kernel + aufs module. This might make it easier to debug what’s going on; and if I can’t reproduce in that environment, it’ll be interesting info.
  • Upgrade aufs-tools to the version found in testing/sid: perhaps the old userspace is not compatible with the new kernel module?
  • Dump all this aufs.debug info via a (virtual) serial console and report a bug (see the sketch after this list).
  • Other ideas?
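
A minimal sketch of the serial console idea, assuming the VM is a libvirt guest (the domain name below is a hypothetical placeholder):

# Boot the guest with console=ttyS0,115200 (in addition to aufs.debug=1 and
# debug=1) so kernel messages, including the aufs trace, go to the virtual
# serial port, then capture them from the host:
virsh console tails-414-test | tee aufs-debug.log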

#19 Updated by intrigeri 2018-01-04 13:47:53

intrigeri wrote:
> * Try to set up an aufs unionmount on a regular Stretch system with this kernel + aufs module. This might make it easier to debug what’s going on; and if I can’t reproduce in that environment, it’ll be interesting info.

Done: mounting works just fine, but merely running ls on the mountpoint segfaults with the same call trace. Dropping the noatime,noxino options => same result. Upgrading to aufs-tools (1:4.9+20170918-1) => same result.

Same result on a sid system.

The good news is that I now have a debugging environment that doesn’t require building an ISO to try stuff.

Testing procedure:

modprobe aufs debug=1 \
  && mkdir /tmp/{ro,rw,mount} \
  && touch /tmp/ro/bla \
  && mount -t aufs -o dirs=/tmp/rw=rw:/tmp/ro=rr+wh aufs /tmp/mount \
  && ls /tmp/mount

#20 Updated by intrigeri 2018-01-04 14:47:52

Reported https://bugs.debian.org/886329, trying to implement a workaround in live-boot.

#21 Updated by intrigeri 2018-01-04 15:40:36

intrigeri wrote:
> trying to implement a workaround in live-boot.

My workaround seems to do the job! :)))

#22 Updated by intrigeri 2018-01-04 18:21:34

  • Feature Branch changed from feature/14976-linux-4.14-devel to feature/14976-linux-4.14

#23 Updated by intrigeri 2018-01-04 18:23:06

  • Subject changed from Consider upgrading Linux kernel in Tails 3.5 to Upgrade the Linux kernel to get KPTI
  • Target version changed from Tails_3.5 to Tails_3.4

#24 Updated by intrigeri 2018-01-04 18:28:49

  • Feature Branch changed from feature/14976-linux-4.14 to feature/14976-linux-4.14+force-all-tests

#25 Updated by intrigeri 2018-01-04 18:37:36

  • Type of work changed from Research to Code

For now I’ve simply bumped the debian APT snapshots. I’ll inspect the build manifest diff to see if it seems reasonable; keep in mind that a kernel upgrade requires us to go through our entire QA anyway. If the diff doesn’t look reasonable for some reason, we’ll have to import the new kernel into our custom APT repo instead. And regardless of what we decide on this front, we’ll have to do it again once a kernel with KPTI is available in sid.
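
For reference, a minimal sketch of what bumping the debian APT snapshot amounts to in the Tails source tree; this assumes the usual config/APT_snapshots.d/ layout, and the serial value is a made-up example:

# Each APT origin is frozen to a time-based snapshot identified by a serial;
# bumping the serial makes the next build fetch packages (here: the 4.14
# kernel) from that newer snapshot.
echo 2018010401 > config/APT_snapshots.d/debian/serial
git commit config/APT_snapshots.d/debian/serial -m 'Bump debian APT snapshot.'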

#26 Updated by intrigeri 2018-01-04 19:06:44

intrigeri wrote:
> For now I’ve simply bumped the debian APT snapshots. I’ll inspect the build manifest diff to see if it seems reasonable

It does look reasonable to me.

#27 Updated by intrigeri 2018-01-04 19:09:24

For test results, see:

  • the correct branch, based on stable: https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_feature-14976-linux-4.14-force-all-tests/
  • FWIW basically the same branch, but based on devel:
    https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-14976-linux-4.14-devel/
    https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-14976-linux-4.14-devel-force-all-tests/
  • I’m also running tests locally.

We’ll need to run more tests once the branch ships a kernel that has KPTI, but I find it useful to first evaluate the impact of 4.14 without KPTI.

#28 Updated by intrigeri 2018-01-05 08:05:01

I’ve analyzed a bunch of test suite runs. tl;dr: nothing particularly scary. Most failures seem to be caused by an overloaded CI infra.

> * the correct branch, based on stable: https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_feature-14976-linux-4.14-force-all-tests/

Analyzing builds 1 to 6; note that lizard was extremely loaded during these tests (all ISO builders and testers busy). There were 11-15 failures per run, which is similar to what I see on https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-anonym-force-all-tests/. So at least Linux 4.14 does not seem to break tons of stuff. Unless specifically noted, each problem happened once:

  • ‘When I install “cowsay” using Synaptic’ often fails but that’s common on other branches too (Bug #12586).
  • OpenPGP applet: text was not selected in gedit, so clicking “copy” (which was grayed out anyway) didn’t do anything. I’ll blame test suite robustness.
  • Bug #12131 (many times)
  • Bug #15006 (many times)
  • Thunderbird POP3 fails, likely transient network failure
  • Failure to load our homepage or labs.riseup.net in Tor Browser (a few times).
  • MAC spoofing failure notification never shown (3 times).
  • Bug #15031 (twice)
  • Many times the Unsafe Browser does not start at all. I suspected it was caused by the same aufs bug I worked around in live-boot, but I see no such error in the Journal. Bumped the timeout because 10s seemed short.

> * FWIW basically the same branch, but based on devel:
> https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-14976-linux-4.14-devel/

The only (partial) test suite that’s been run so far passed.

> https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-14976-linux-4.14-devel-force-all-tests/

The only (partial) test suite that’s been run so far has too many failures so I won’t analyze it: tons of transient network issues and weird behaviour. Looks like something went very wrong during this test run. If I don’t see other similar cases I’ll blame lizard being overloaded.

> * I’m also running tests locally.

I’ve seen a full test suite pass. Other than that, the failures in other runs are explained by (each once unless specifically noted):

  • Bug #11188
  • Bug #15006 (quite a few times)
  • dogtail clicking the “Start Tor Browser” button in “I start the Tor Browser in offline mode” is not effective.
  • “The page was not saved to /home/amnesia/Tor Browser/index.html” in “The Tor Browser directory is usable”; I’m sure I’ve seen that elsewhere already but cannot find the ticket. We’re waiting 20s already so I don’t think it’s a matter of bumping the timeout. Nothing weird in the Journal.
  • One Thunderbird test case bug.
  • “Gobby should only connect to [9050] but was seen connecting to 127.0.0.1:53”; in the Journal I see gobby-0.5[10438]: Failure during SRV record lookup: Host name lookup failure. Will go on with normal A/AAAA lookup, which is not present in any *.journal file on Jenkins. It looks like a temporary network problem triggering error handling code in Gobby. We run it with torsocks and AllowOutboundLocalhost 2 so that’s not a proxy bypass. So I think this test case should ideally allow connecting to 127.0.0.1:53 without raising eyebrows. anonym, if you agree please consider applying this (untested) patch:
--- a/features/step_definitions/tor.rb
+++ b/features/step_definitions/tor.rb
@@ -295,6 +295,9 @@ Then /^I see that (.+) is properly stream isolated$/ do |application|
   info = stream_isolation_info(application)
   expected_ports = [info[:socksport]]
   expected_ports << 9051 if info[:controller]
+  # Apps run with torsocks can legitimately fall back to using the local
+  # DNS resolver
+  expected_ports << 53
   assert_not_nil(@process_monitor_log)
   log_lines = $vm.file_content(@process_monitor_log).split("\n")
   assert(log_lines.size > 0,
#29 Updated by intrigeri 2018-01-05 08:42:48

  • Feature Branch changed from feature/14976-linux-4.14+force-all-tests to feature/14976-linux-4.14+force-all-tests, feature/14976-linux-4.14-devel+force-all-tests

I’ve noticed one regression on this branch: the splash screen is initially displayed, but it disappears as soon as the aufs bug is triggered and the kernel stack trace is displayed. I doubt we can do anything about it until the aufs bug is fixed. This should be documented in the 3.4 known issues.

#30 Updated by intrigeri 2018-01-06 07:36:52

intrigeri wrote:
> I’ve noticed one regression on this branch: the splash screen is initially displayed, but it disappears as soon as the aufs bug is triggered and the kernel stack trace is displayed. I doubt we can do anything about it until the aufs bug is fixed. This should be documented in the 3.4 known issues.

Draft known issue text:

The graphical splash screen usually displayed during Tails startup quickly disappears and is replaced by garbled text messages. As long as Tails appears to work fine for you otherwise, please ignore these messages, including the alarming message about a "kernel BUG" (which was [[!debbug 886329 desc="reported to Debian"]]): they do not affect the safety of your Tails system.

#31 Updated by intrigeri 2018-01-06 08:03:16

  • related to Bug #15148: Upgrade AMD processor microcodes to mitigate the Spectre attack added

#32 Updated by intrigeri 2018-01-06 15:59:54

intrigeri wrote:
> If we upgrade to Linux 4.14 we may have to pin the AppArmor feature set to an older one (likely 4.13’s) but beware of kernel bugs wrt. feature set pinning, e.g. https://bugs.debian.org/883703.

I’m now testing this. It may be a hard decision to make:

  • Without pinning, any AppArmor profile that lacks rules for the new mediation features brought in 4.14 may break the confined app;
  • With pinning to the Linux 4.9 feature set, that won’t be a problem; except that, due to that kernel bug, all mount operations for confined apps will be blocked (even if they are explicitly allowed in the policy).

On my own sid system, the only bits of policy that have to allow mount operations are for libvirt. So I expect that broken mount operations for confined apps in Tails won’t be a problem in practice, which is why I’m leaning towards pinning. (Granted, our test suite did not identify any breakage on Linux 4.14 without pinning; but we don’t exercise our confined apps that much, so that doesn’t mean much.)
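
For the record, a minimal sketch of how such feature-set pinning is typically expressed; the feature file path is an assumption (Debian’s apparmor package ships a pinnable feature file under /usr/share/apparmor-features/), not necessarily what the branch will end up doing:

# /etc/apparmor/parser.conf accepts the same long options as apparmor_parser;
# pointing it at a pre-4.14 feature file means every profile is compiled
# against that older feature set, so the new 4.14 mediation classes are not
# enforced.
echo 'features-file=/usr/share/apparmor-features/features' >> /etc/apparmor/parser.conf
systemctl reload apparmor    # recompile and reload all profiles against the pinned feature set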

#33 Updated by intrigeri 2018-01-06 16:34:09

  • % Done changed from 10 to 20

First full test suite result with KPTI but without AppArmor feature set pinning (on my local Jenkins) passes, except that Bug #15006 broke one scenario. Woohoo! It did take 6% longer than the average of my 3 previous runs; might be the impact of KPTI, might be part of the usual variation. Anyway. Next steps:

  1. wait for results with AppArmor feature set pinning
  2. once our APT snapshots have the fix for https://bugs.debian.org/886366, revert the corresponding workaround, trigger builds and wait for test results
  3. hopefully everything goes well and I can clean up the Git history and send this to anonym’s plate for QA; otherwise, rinse & repeat.

#34 Updated by intrigeri 2018-01-06 22:25:30

intrigeri wrote:
> First full test suite result with KPTI but without AppArmor feature set pinning (on my local Jenkins) passes, except that Bug #15006 broke one scenario. Woohoo! It did take 6% longer than the average of my 3 previous runs; might be the impact of KPTI, might be part of the usual variation. Anyway. Next steps:
>
> # wait for results with AppArmor feature set pinning
> # once our APT snapshots have the fix for https://bugs.debian.org/886366, revert the corresponding workaround, trigger builds

Done.

> and wait for test results

Almost there: https://jenkins.tails.boum.org/job/test_Tails_ISO_feature-14976-linux-4.14-force-all-tests/12/ and following. Looks OK so far.

Older tests look good, except there’s a somewhat alarming number of connection failures from Tor Browser to our website. Not sure if it’s related. Sadly there’s no Journal saved for these failures (reported on the corresponding ticket).

> # hopefully everything goes well and I can clean up the Git history and send this to anonym’s plate for QA; otherwise, rinse & repeat.

I’ll skip the “clean up the Git history” part. It’s not that ugly.

#35 Updated by intrigeri 2018-01-06 22:57:43

> Older tests look good, except there’s a somewhat alarming number of connection failures from Tor Browser to our website. Not sure if it’s related. Sadly there’s no Journal saved for these failures (reported on the corresponding ticket).

https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-anonym-force-all-tests/ shows many similar failures so that’s not a regression brought by this branch.

#36 Updated by intrigeri 2018-01-07 00:57:23

  • Assignee changed from intrigeri to anonym
  • % Done changed from 20 to 50
  • QA Check set to Ready for QA

#37 Updated by anonym 2018-01-08 11:56:22

intrigeri wrote:
> intrigeri wrote:
> > For now I’ve simply bumped the debian APT snapshots. I’ll inspect the build manifest diff to see if it seems reasonable
>
> It does look reasonable to me.

Agreed: beyond the expected kernel-related bumps, I get:

  • virtualbox-guest-{dkms,utils,x11}: 5.2.2-dfsg-3 → 5.2.4-dfsg-2
  • torbrowser-launcher: 0.2.8-5 → 0.2.8-6

#38 Updated by anonym 2018-01-08 12:55:08

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

Code looks good. Also, for automated tests, runs #12 + #13 + #14 together see all scenarios pass!

Merged!

#39 Updated by anonym 2018-01-08 12:59:11

I also bumped the 2018010603 APT snapshot’s expiry!

#40 Updated by intrigeri 2018-01-08 18:35:56

I’ve dared to merge feature/14976-linux-4.14-devel+force-all-tests into devel myself; presumably you simply missed it in the “Feature Branch” field. Otherwise, ISO images built from devel won’t have what we want.

#42 Updated by anonym 2018-01-08 18:37:28

  • Status changed from Fix committed to In Progress

Applied in changeset commit:4f8b50afb10a1ce1faf7645971bc020d2eb5d7dd.

#43 Updated by intrigeri 2018-01-08 18:53:41

  • Status changed from In Progress to Fix committed

#44 Updated by anonym 2018-01-09 20:55:13

  • Status changed from Fix committed to Resolved