Bug #16100

Fix systemd CVE-2018-15687

Added by intrigeri 2018-11-05 14:12:58 . Updated 2018-12-07 12:59:49 .

Status:
Rejected
Priority:
Elevated
Assignee:
CyrilBrulebois
Category:
Target version:
Start date:
2018-11-05
Due date:
% Done:

0%

Feature Branch:
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:


Subtasks


Related issues

Related to Tails - Bug #16097: Memory erasure tests regression on the devel branch Resolved 2018-11-05
Related to Tails - Bug #16184: Intermittent test failures on the devel branch: fails to login "Failed to fully start up daemon: Permission denied" Resolved 2018-12-03
Blocks Tails - Feature #15506: Core work 2018Q4: Foundations Team Resolved 2018-04-08

History

#1 Updated by intrigeri 2018-11-05 14:13:17

#2 Updated by intrigeri 2018-11-05 14:14:52

3.11 is a bugfix release so we’ll need to use our “freeze exception” mechanism => one needs upload access to our custom APT repo in order to work on this. Problem is, this is probably going to introduce Bug #16097.

#3 Updated by intrigeri 2018-11-05 14:14:58

  • related to Bug #16097: Memory erasure tests regression on the devel branch added

#4 Updated by intrigeri 2018-11-05 14:16:23

> Problem is, this is probably going to introduce Bug #16097.

… unless, of course, we solve this by backporting the security fix into something like 237-3~bpo9+1.0tails1, which is an acceptable fallback option for 3.11 if we can’t fix Bug #16097 in time.

#5 Updated by CyrilBrulebois 2018-11-06 15:19:27

  • Assignee set to CyrilBrulebois

#6 Updated by intrigeri 2018-12-03 13:41:29

  • related to Bug #16184: Intermittent test failures on the devel branch: fails to login "Failed to fully start up daemon: Permission denied" added

#7 Updated by intrigeri 2018-12-03 13:42:59

Another regression brought by this upgrade: Bug #16184. Given how far we are in the 3.11 dev cycle, and we’re still discovering regressions incidentally, at this point it’s clear to me that we should not ship v239 in 3.11 but instead backport the security fix.

#8 Updated by CyrilBrulebois 2018-12-05 10:38:00

I’m afraid fixing the bug by cherry-picking won’t be reasonably possible: backporting the packaging fix (grabbing a couple of patches from upstream and quiltifying them) is rather easy, but the patched code depends a lot of new features in the codebase. I’ve tried chasing down all the needed additions but there’s a bunch of commits, and I fear we could run into side effects due to so much cherry-picking… If switching to a brand new backported version isn’t feasible due to regressions, is letting the bug unfixed a reasonable outcome? :(

#9 Updated by CyrilBrulebois 2018-12-05 11:58:34

The details follow.

Once this commit is cherry-picked (systemd debian packaging):

commit e3696255cf57ec98177b3144f6a3a0b4a8aac0a0
Author: Michael Biebl <biebl@debian.org>
Date:   Sun Oct 28 12:56:21 2018 +0100

    chown-recursive: Rework the recursive logic to use O_PATH

    Fixes a race condition in chown_one() which allows an attacker to cause
    systemd to set arbitrary permissions on arbitrary files.

    CVE-2018-15687
    LP: #1796692
    Closes: #912007

Then one needs to refresh the patch a bit to get dpkg-source to apply it cleanly.

Then, building fails with:

../src/core/chown-recursive.c: In function ‘chown_recursive_internal’:
../src/core/chown-recursive.c:104:37: error: implicit declaration of function ‘fd_reopen’ [-Werror=implicit-function-declaration]
                         subdir_fd = fd_reopen(path_fd, O_RDONLY|O_CLOEXEC|O_NOATIME);
                                     ^~~~~~~~~
../src/core/chown-recursive.c:104:25: warning: nested extern declaration of ‘fd_reopen’ [-Wnested-externs]
                         subdir_fd = fd_reopen(path_fd, O_RDONLY|O_CLOEXEC|O_NOATIME);
                         ^~~~~~~~~
../src/core/chown-recursive.c: In function ‘path_chown_recursive’:
../src/core/chown-recursive.c:150:41: error: implicit declaration of function ‘TAKE_FD’ [-Werror=implicit-function-declaration]
         return chown_recursive_internal(TAKE_FD(fd), &st, uid, gid); /* we donate the fd to the call, regardless if it succeeded or failed */
                                         ^~~~~~~
../src/core/chown-recursive.c:150:9: warning: nested extern declaration of ‘TAKE_FD’ [-Wnested-externs]
         return chown_recursive_internal(TAKE_FD(fd), &st, uid, gid); /* we donate the fd to the call, regardless if it succeeded or failed */
         ^~~~~~
cc1: some warnings being treated as errors

This means one needs to get at least fd_reopen and TAKE_FD defined. Starting from now, all mentioned commits are upstream commits (systemd.git).

The TAKE_FD addition is part of a merge (in 97a33b126c845327a3a19d6e66f05684823868fb), which touches all the codebase since a coccinelle patch is shipped and applied at the same time. The actual diff for the header can be copied/pasted on its own, no huge issue here.

diff --git a/src/basic/fd-util.h b/src/basic/fd-util.h
index e8d915bfa6..22bf02e933 100644
--- a/src/basic/fd-util.h
+++ b/src/basic/fd-util.h
@@ -106,3 +106,11 @@ int rearrange_stdio(int original_input_fd, int original_output_fd, int original_
 static inline int make_null_stdio(void) {
         return rearrange_stdio(-1, -1, -1);
 }
+
+/* Like TAKE_PTR() but for file descriptors, resetting them to -1 */
+#define TAKE_FD(fd)                             \
+        ({                                      \
+                int _fd_ = (fd);                \
+                (fd) = -1;                      \
+                _fd_;                           \
+        })

The fd_reopen part is less fun: we need the f2324783ceee84fbd24fd66f34d379fa1e6dc887 commit, which fails to apply cleanly because a lot of prototypes are missing in the src/basic/fd-util.h header:

  • fd_move_above_stdio
  • rearrange_stdio
  • make_null_stdio
  • TAKE_FD
  • fd_reopen

The beginning isn’t too scary, one can cherry-pick as follows:

  • 7fe2903c2315db9d9f501ae255a6b6e4f01ba46c OK (fd_move_above_stdio)
  • aa11e28bf2866cc8634b5706dfcd07f0e5579e17 OK (rearrange_stdio)
  • e7685a77b41bbd1b8289aeaf75fccaf4bb68a361 OK (safe_close_above_stdio, needed for the next one)
  • 8bb2db738e13f67cdfe17afefac5fb03ca68a7e6 almost OK (make_null_stdio): two deleted functions that appear in conflicts (likely their code changed a little, but the commit aims at removing them entirely anyway).

Then a stripped-down version of the TAKE_FD addition can be applied.

Now back to the actual commit adding fd_reopen (f2324783ceee84fbd24fd66f34d379fa1e6dc887), there are a lot of code changes in src/tmpfiles/tmpfiles.c, including function names/signatures. Having a chance to apply the said commit cleanly would mean being able to apply 936f6bdb803c432578e2cdcc5f93f3bfff93aff0, which in turn depends on 7ea5a87f92bbc7e30cf198bfbad2472a1ecdbf78, and there are more needed commits. I decided to stop here, as it seems to me we’re going to diverge way too much, risking side effects.

Thoughts?

#10 Updated by intrigeri 2018-12-05 13:02:42

  • Subject changed from Upgrade to systemd 239-11~bpo9+1 to fix CVE-2018-15687 to Fix systemd CVE-2018-15687

#11 Updated by intrigeri 2018-12-05 13:41:29

> I decided to stop here, as it seems to me we’re going to diverge way too much, risking side effects.

Yes, let’s stop here. Thanks for giving it a try!

> If switching to a brand new backported version isn’t feasible due to regressions, is letting the bug unfixed a reasonable outcome?

To answer this question we need to weigh the relative impact of these regressions vs. leaving CVE-2018-15687 unfixed.

Wrt. regressions:

  • Bug #16097 is quite annoying: even if probably few of our users rely on it, if they do rely on it, they’re at significant risk if we break it, and even more so if they miss the known issue we would document in the release notes.
  • The exact impact of Bug #16184 is not clear to me yet. Anyway, we could presumably cherry-pick its fix just to be sure.

Wrt. CVE-2018-15687: it’s in chown_one, which is only called by chown_recursive_internal, which is only called by path_chown_recursive, which itself is only called by setup_exec_directory. My understanding is that for a unit to allow exploiting the bug, it needs:

  • to use at least one of RuntimeDirectory=, StateDirectory=, CacheDirectory=, LogsDirectory= and ConfigurationDirectory= are affected
  • to have either User= set to non-root or DynamicUser= enabled

On Tails 3.10.1, AFAICT the only units that satisfy the pre-conditions for successful exploitation are systemd-networkd, systemd-resolved, and systemd-timesyncd, that are all either disabled or masked. So in the default configuration our users are safe and one can only be affected on Tails by installing an additional package that ships a unit that satisfies these pre-conditions; most of these options, except RuntimeDirectory=, friends were added in systemd v235, so I bet that there’s no unit in Stretch that uses them (they’re not mere bonus options, if they’re not available I expect most services would simply not start); only RuntimeDirectory= is old enough to be of real concern here. Still, I presume most Tails users install additional software that are apps with a .desktop file and no system systemd unit, so the risk seems very low to me.

Conclusion: IMO fixing CVE-2018-15687 is not worth breaking the memory erasure feature.

Even if we managed to fix Bug #16097 and Bug #16184 really soon, at this stage of the 3.11 release cycle it feels too late to gain confidence that v239 does not break more stuff, especially given 3.11 is a bugfix release with no release candidate, so I say let’s stick to v237 for Tails 3.11 and focus on making 3.12 solid.

Please review my reasoning and if you agree, reject this ticket: for 3.12 we will get v239 by mere inaction anyway.

#12 Updated by CyrilBrulebois 2018-12-05 15:13:57

Full ACK regarding the involved functions: setup_exec_directory → path_chown_recursive → chown_recursive_internal → chown_one.

Now, getting to the conditions:

https://bugzilla.redhat.com/show_bug.cgi?id=1639076#c1 mentions { CacheDirectory, LogsDirectory or StateDirectory } together with DynamicUser.

You mention: { RuntimeDirectory, StateDirectory, CacheDirectory, LogsDirectory, ConfigurationDirectory } with { User, DynamicUser }

Looking at the code in v237, we get r = path_chown_recursive(effective, uid, gid); while effective can be set in two ways, from this test:

                if (context->dynamic_user &&
                    !IN_SET(type, EXEC_DIRECTORY_RUNTIME, EXEC_DIRECTORY_CONFIGURATION)) {

but also from within its else block.

So the then block seems to match the assessment from bugzilla, since type not being in the [EXEC_DIRECTORY_RUNTIME, EXEC_DIRECTORY_CONFIGURATION] set means it’s in the [EXEC_DIRECTORY_STATE, EXEC_DIRECTORY_CACHE, EXEC_DIRECTORY_LOGS] set instead (based on the ExecDirectoryType definition).

I couldn’t pinpoint exactly where dynamic_user is set though but I would expect it to be set only with DynamicUser?

And there’s always the else block anyway?

#13 Updated by intrigeri 2018-12-06 09:00:52

> And there’s always the else block anyway?

I don’t fully understand/follow your reasoning but AFAIU you’re not concluding that the impact is bigger than what I thought. If that’s right, then please reject this ticket :)

#14 Updated by CyrilBrulebois 2018-12-07 12:59:49

  • Status changed from Confirmed to Rejected

Rejecting for now, we’ll get a newer version later.