Feature #11524

Blocklist insecure PI futexes to harden kernel

Added by cypherpunks 2016-06-12 05:13:31. Updated 2018-08-31 22:57:38.

Status: Rejected
Priority: Normal
Assignee: Dr_Whax
Category:
Target version:
Start date: 2016-06-12
Due date:
% Done: 0%
Feature Branch:
Type of work: Discuss
Blueprint:
Starter: 0
Affected tool:
Deliverable for:

Description

The futex system call is used for locking resources, allowing threads to play nicely together. PI (priority-inheritance) futexes are a class of futex operations which resist a situation of reduced performance called priority inversion. Unfortunately, there’s a PI-based 0day being sold by VUPEN (now Zerodium) which allows escalation to kernelmode, utilizing these futex operations. It is a race condition similar to the “Towelroot” vulnerability from 2014. I can’t get this fixed upstream, because I have no proper fix that does not involve sacrificing the PI futexes (futex.c is exceptionally complicated). Luckily for us, Tails does not make use of these futexes (few systems do, and they tend not to be necessary anyway).

About a year ago, I first heard a rumor of this vulnerability, and patched up my own systems by whitelisting only the futex calls which I needed. Recently, I was able to confirm that it existed, indirectly through a contact who actually works at VUPEN. Because of this, I wrote an LKM which mitigates this by hooking the futex system call and causing it to return ENOSYS and log details to the syslog if such a banned futex is called.

Unfortunately, Tails uses a kernel which is too old to support livepatch, so this is the only method which seems practical, and which has been used extensively in practice (though ironically, the methods used are more typical of rootkits. Not many people hook syscalls for benevolent reasons!). A possible alternative is to use kprobes, and modify the registers of val or futex_op to force it into an invalid state (which would be using official kernel APIs), but that seems a bit sloppy. Hooking the syscall table looks like the most stable solution.

I’ve attached the kernel module source file, the makefile, and a very simple program to test it by intentionally calling a blacklisted futex call. The module works on both the 64bit and 32bit kernels. I ran the module by pipacs (one of the developers of grsecurity, and the primary developer of PaX), and his only concern was that I did not flush the TLB after changing the r/w status of the syscall tables. I have since added local_flush_tlb() to the end of the relevant functions.

There are a few other unrelated issues with the Linux kernel which I have been made privately aware of, and I am still thinking of ways to deal with them.

Example usage:

root@amnesia:~/test# ls
Makefile  test_futex.c  vupensux.c
root@amnesia:~/test# apt-get -qq update && apt-get -qqy install build-essential linux-headers-$(uname -r)  
root@amnesia:~/test# make
make -C /lib/modules/3.16.0-4-amd64/build M=/root/test modules
make[1]: Entering directory '/usr/src/linux-headers-3.16.0-4-amd64'
Makefile:10: *** mixed implicit and normal rules: deprecated syntax
make[1]: Entering directory `/usr/src/linux-headers-3.16.0-4-amd64'
  CC [M]  /root/test/vupensux.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /root/test/vupensux.mod.o
  LD [M]  /root/test/vupensux.ko
make[1]: Leaving directory '/usr/src/linux-headers-3.16.0-4-amd64'
root@amnesia:~/test# ls 
Makefile  modules.order  Module.symvers  test_futex.c  vupensux.c  vupensux.ko  vupensux.mod.c  vupensux.mod.o  vupensux.o
root@amnesia:~/test# gcc -o test_futex test_futex.c
root@amnesia:~/test# ./test_futex
FUTEX_REQUEUE: Bad address
FUTEX_LOCK_PI: Bad address
root@amnesia:~/test# insmod ./vupensux.ko
root@amnesia:~/test# ./test_futex
FUTEX_REQUEUE: Bad address
FUTEX_LOCK_PI: Function not implemented
root@amnesia:~/test# rmmod vupensux
root@amnesia:~/test# ./test_futex
FUTEX_REQUEUE: Bad address
FUTEX_LOCK_PI: Bad address
root@amnesia:~/test# sudo dmesg | tail -n 3
[26848.859975] loaded vupensux module, pi futexes are disabled
[26851.106445] from test_futex[6927], attempted to call banned pi futex with futex_op 6 and val 0
[26855.793791] unloaded vupensux module, pi futexes are enabled

Files

test_futex.c (222 B) cypherpunks, 2016-06-12 04:57:44
Makefile (270 B) cypherpunks, 2016-06-12 04:59:43
vupensux.c (10323 B) cypherpunks, 2016-06-12 05:07:11
pa_no_prio_inherit.c (1793 B) cypherpunks, 2016-07-09 00:19:11

History

#1 Updated by mercedes508 2016-06-14 12:36:11

  • Type of work changed from Code to Discuss

I don’t understand, Tails doesn’t use futex so it can’t be affected, but you talk about patching the kernel?

Why don’t you make those issues public for all linux users to benefit from fixes?

#2 Updated by cypherpunks 2016-06-15 20:13:03

mercedes508 wrote:
> I don’t understand, Tails doesn’t use futex so it can’t be affected, but you talk about patching the kernel?

What are you talking about? Of course Tails uses futex. All glibc-based systems require futex.

See the status of futex support in the kernel, and how futex replies to the arguments “0, 0”:

amnesia@amnesia:~$ grep FUTEX /boot/config-$(uname -r)
CONFIG_FUTEX=y
amnesia@amnesia:~$ printf '#include<syscall.h>\nmain(){syscall(SYS_futex,0,0);perror("");}' | gcc -xc -
amnesia@amnesia:~$ ./a.out
Invalid argument

> Why don’t you make those issues public for all linux users to benefit from fixes?

I answered that already in the first paragraph.

#3 Updated by Dr_Whax 2016-06-18 04:39:42

  • Assignee set to Dr_Whax

mercedes508 wrote:
> I don’t understand, Tails doesn’t use futex so it can’t be affected, but you talk about patching the kernel?
>
> Why don’t you make those issues public for all linux users to benefit from fixes?

This was separately discussed on irc between the user who reported this, anonym and myself. We’ll look into this.

#4 Updated by cypherpunks 2016-06-21 05:13:36

On the latest Tails, Totem seems to want to call some PI futexes. That’s weird because I tested Totem with a couple dozen videos and it never tried to use a single PI futex.

[ 5991.744846] from threaded-ml[30951], attempted to call banned pi futex with futex_op 134 and val 1
[ 5991.744864] from threaded-ml[30951], attempted to call banned pi futex with futex_op 135 and val -316370944
[ 5991.744867] from threaded-ml[30951], attempted to call banned pi futex with futex_op 134 and val 1
[ 5991.744872] from threaded-ml[30951], attempted to call banned pi futex with futex_op 135 and val -316370944
...

It dies with an assert() (so I guess pulse, not Totem):

Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:110, function pa_mutex_unlock(). Aborting.
Aborted

This makes things slightly harder (PI futexes will have to be converted into regular futexes, rather than simply made to return -ENOSYS) but it shouldn’t be an issue. I’ll just have to look into the implementation specifics of PI futexes more to make sure I don’t misunderstand any implementation details.

#5 Updated by cypherpunks 2016-06-21 06:30:30

So it seems pulseaudio never used PI futexes before 2.4, but now it uses them heavily. On all pre-2.4 systems, even speaker-test worked, but on 2.4, it dies with an assertion failure when the LKM is loaded. I imagine everything that uses pulse will end that way now. So yeah, looks like I’ll have some work ahead of me to make this compatible. On the other hand, it might just be as easy as stripping the flags from the bitmask. I’ll play with it later this week.

#6 Updated by Dr_Whax 2016-06-25 04:37:30

  • Status changed from New to Confirmed

#7 Updated by cypherpunks 2016-07-01 04:59:24

So the issue is indeed the latest Pulseaudio, which uses PI futexes if compiled when HAVE_PTHREAD_PRIO_INHERIT is defined. The relevant code is at: https://github.com/pulseaudio/pulseaudio/blob/master/src/pulsecore/mutex-posix.c#L40

There’s no reason in the universe Lennart Poettering would change the code for us to have a configuration option or environment variable to control PI support, so the only solution I can think of would involve an LD_PRELOAD or ptrace-based injector against Pulseaudio to fix its behavior without requiring it be recompiled. I can’t see how this can be fixed from kernelmode, as the futexes have to either be completely allowed or completely blocked. Would a solution like that be acceptable?

The only offending PI futexes are FUTEX_LOCK_PI and FUTEX_UNLOCK_PI it seems. Pulseaudio survives when the other futexes return -ENOSYS.

#8 Updated by cypherpunks 2016-07-09 00:24:12

Here’s a simple library which is loaded via LD_PRELOAD, and fixes the aforementioned issue. It checks for the highest PA version number for libpulsecore before loading it in order to wrap the affected function. So far it works with every PA-linked application I’ve thrown at it (after whitelisting the path of the .so file in its AppArmor policy, of course). The ones I’ve tested most extensively are Totem, Pidgin, and speaker-test.

Basically, the problem is that PulseAudio, if compiled with HAVE_PTHREAD_PRIO_INHERIT set, will cause the function pa_mutex_new() to accept true as its second argument rather than ignoring it, which will enable PI futex optimizations. This leads to an assertion failure later on when the kernel module is loaded. All this library does is wrap around that function and set the second argument to false. The .c file is in the attachment. It’s short at only 74 lines. An abridged version would be:

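#include <dlfcn.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct pa_mutex pa;

/* Loaded via LD_PRELOAD: forwards to the real pa_mutex_new() but
 * always disables priority inheritance. */
pa *pa_mutex_new(bool recursive, bool inherit_priority)
{
        static pa *(*orig_mutex)(bool, bool);
        void *h;

        (void)inherit_priority; /* deliberately ignored */

        if (!orig_mutex) {
                h = dlopen("/usr/lib/libpulsecore-5.0.so", RTLD_LAZY|RTLD_LOCAL);
                if (!h)
                        abort();
                /* The handle is kept open so the pointer stays valid. */
                orig_mutex = dlsym(h, "pa_mutex_new");
                if (!orig_mutex)
                        abort();
        }

        return orig_mutex(recursive, false);
}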

I’d like to know if this is an acceptable solution worth pursuing. The library does not take any untrusted data which is not already owned and writable only by root (i.e. /usr/lib{32,}/libpulsecore-*.so for 32bit executables). It also does not change the internal behavior of PulseAudio (and programs linked against it) in any ways which are not intended. When compiled with HAVE_PTHREAD_PRIO_INHERIT unset, the exact same behavior triggered by this library occurs, so no new logic bugs can be brought out by forcing the second argument of the wrapped function to false.

#9 Updated by intrigeri 2016-07-16 09:04:55

> This was separately discussed on irc between the user who reported this, anonym and myself. We’ll look into this.

anonym, DrWhax: did you find time to do so?

#10 Updated by intrigeri 2016-07-16 09:14:03

> So it seems pulseaudio never used PI futexes before 2.4, but now it uses them heavily.

Wow, this is strange. I checked the packages list diff and could not spot anything relevant, and the pulseaudio package hasn’t changed. I’ve read our 2.4 changelog and here’s a naive newbie question: can this possibly have been triggered by the kernel cmdline options that were added in 2.4 (slab_nomerge slub_debug=FZ mce=0 vsyscall=none)?

#11 Updated by intrigeri 2016-07-16 09:33:15

(Adding co6 and jvoisin into the loop, I’d love to hear what they think about all this.)

Thanks for your efforts! I’m no expert in this area, so all the input I have is about our project’s strategies, and I won’t be able to comment about the implementation details.

To say the least, I’m not thrilled at the idea of shipping a custom LKM + LD_PRELOAD library to work around an undisclosed Linux security vulnerability that we are hearing of by way of a 2-hop circuit whose nodes are equally undisclosed.

Even if we decide to blindly believe that said vulnerability does exist, having the LKM and library reviewed and audited by skilled people will take quite some time and energy, and perhaps a bit too much for a Tails-specific workaround. Ditto for integrating all this into Tails itself.

Let’s keep in mind that this bonus code will have to be maintained, upgraded, adjusted (e.g. what apps will need PI futexes in Tails 3.0, based on Debian Stretch?), presumably forever since it seems that there’s no ongoing effort being made to fix the bug in Linux mainline. Speaking of which: from the initial ticket description, I understand that it is hard to fix that vulnerability, and it may also be hard to look for help to fix it, since the vuln itself is undisclosed — is my understanding correct?

#12 Updated by cypherpunks 2016-07-18 19:04:41

intrigeri wrote:
> > So it seems pulseaudio never used PI futexes before 2.4, but now it uses them heavily.
>
> Wow, this is strange. I checked the packages list diff and could not spot anything relevant, and the pulseaudio package hasn’t changed. I’ve read our 2.4 changelog and here’s a naive newbie question: can this possibly have been triggered by the kernel cmdline options that were added in 2.4 (slab_nomerge slub_debug=FZ mce=0 vsyscall=none)?

I’ve been adding those manually even beforehand while testing this, so I’m pretty sure those couldn’t cause it. Remember that the issue is triggered by a configuration option being enabled at compile time, not necessarily the PulseAudio source code being changed, so checking a source code diff might not turn anything up.

#13 Updated by cypherpunks 2016-07-18 19:47:33

intrigeri wrote:
> (Adding co6 and jvoisin into the loop, I’d love to hear what they think about all this.)
>
> To say the least, I’m not thrilled at the idea of shipping a custom LKM + LD_PRELOAD library to work around an undisclosed Linux security vulnerability that we are hearing of by way of a 2-hop circuit whose nodes are equally undisclosed.

The 2-hop circuit was only what was necessary for me to verify that it has been or is being sold by VUPEN/Zerodium, not that it exists. I didn’t originally hear about it through a convoluted network of people playing telephone. It’d be more accurate to represent what happened as:

Person on IRC -> Me
(a year later)
Person I know -> Me
(later, at my request)
VUPEN -> Person I know -> Me

After I got word that VUPEN/Zerodium was indeed selling it, I talked with several people who said that, if there was a vulnerability in the futex syscall, it probably would manifest in the way described (a race condition in PI locking), due to the complexity of those operations.

Although it’s not officially disclosed, it’s not like this is the first time anyone’s heard about it. It’s been known about by multiple people. Like I said, I had heard about it off-handedly a year or so before I was told about it from someone who actually was able to confirm it.

If trust itself is a major issue, then on IRC, I can put you in contact with someone who I believe you already know and trust who can likely vouch for the honesty, or at least technical competence, of the “circuit”.

> Even if we decide to blindly believe that said vulnerability does exist, having the LKM and library reviewed and audited by skilled people will take quite some time and energy, and perhaps a bit too much for a Tails-specific workaround. Ditto for integrating all this into Tails itself.

The LKM is extremely trivial. The vast majority of it is composed of functions to locate the syscall tables, and only a tiny portion is actually dedicated to hooking the relevant syscalls.

I already had it given a quick “audit” by pipacs (PaXTeam), whose only suggestion was to flush the TLB after changing the r/w bits. Regarding the effects on the kernel of disabling PI futexes, it will have no undesirable effects. If the cmpxchg instruction is not present, they are disabled anyway, so it won’t cause any undefined behavior (in other words, the kernel is designed to handle PI futexes being disabled).

The shared object is also extremely simple, simply hooking pa_mutex_new() and setting its second argument to false. Like the PI futex situation, PulseAudio is designed to handle the second argument being “stuck” at false, as that is the intended behavior when it is compiled without PI futex support.

> Let’s keep in mind that this bonus code will have to be maintained, upgraded, adjusted (e.g. what apps will need PI futexes in Tails 3.0, based on Debian Stretch?), presumably forever since it seems that there’s no ongoing effort being made to fix the bug in Linux mainline. Speaking of which: from the initial ticket description, I understand that it is hard to fix that vulnerability, and it may also be hard to look for help to fix it, since the vuln itself is undisclosed — is my understanding correct?

The LKM code will be very easy to maintain. The syscall table-finding code is present there specifically to make maintenance easier, so all that has to be done is compiling it against the next kernel. The last major change that would have needed to be made to it would be when the Pentium 4 came out I believe, and the LSTAR MSR was created. Until we get 128 bit processors or switch to a new architecture, there won’t need to be any major changes to the LKM. The only minor changes would be adding new PI futexes as they come out. That will be very easy: simply add new PI futexes from futex.h to the switch statement.

As for other applications that need PI futexes, they will much more likely gracefully handle returned -ENOSYS. In a discussion on IRC, I found out that PulseAudio and Glibc’s pthread implementation had several bugs which, together, caused it to handle failed futexes poorly, resulting in the assertion failures. PA even has code to gracefully handle the failures and fall back to regular futexes, but it is broken due to issues with Glibc. Other applications will tend not to freak out like PA does.

It is hard to fix the underlying vulnerability, correct. It’s nasty enough that multiple race conditions were lurking around futex.c. All that I am aware of that can be done is mitigating it through disabling the affected futex operations.

I completely understand not being thrilled to ship a custom LKM, but when Tails is a known target of VUPEN/Zerodium, and they are known to have multiple 0days that can break out of Tor Browser, I think it’s of paramount importance to find a way to mitigate this threat. I do believe that I can convince you that at least the PI futexes contain an insane amount of attack surface area, and that if a vulnerability will be anywhere, it will be there.

The only other mitigation I can think of would be to use mode 2 seccomp, either on every at-risk application, or starting at the init, with PR_SET_NO_NEW_PRIVS disabled. That would be a bit of a pain because it would also require blacklisting certain ptrace operations (PTRACE_POKEUSER, PTRACE_SETREGSET, etc), among other things. Escaping a blacklist seccomp sandbox is not hard. And of course, that would still require using LD_PRELOAD.

#14 Updated by intrigeri 2016-07-19 02:33:21

> Remember that the issue is triggered by a configuration option being enabled at compile time, not necessarily the PulseAudio source code being changed, so checking a source code diff might not turn anything up.

Sure. Build configuration options are encoded in the Debian source package. Anyway, I’ve checked and the package has not been recompiled (this is Debian stable so no big surprise).

#15 Updated by cypherpunks 2016-08-22 20:50:18

intrigeri wrote:
> > Remember that the issue is triggered by a configuration option being enabled at compile time, not necessarily the PulseAudio source code being changed, so checking a source code diff might not turn anything up.
>
> Sure. Build configuration options are encoded in the Debian source package. Anyway, I’ve checked and the package has not been recompiled (this is Debian stable so no big surprise).

It’s possible that I got the version wrong, and I was on an older Tails than I thought I was.

#16 Updated by cypherpunks 2016-09-28 20:05:23

Finally! Now that Tails uses a 4.x kernel, livepatch is supported. That should be a much nicer mitigation than an LKM with a syscall hooker.

#17 Updated by cypherpunks 2016-12-06 07:23:42

Is there still any interest in this? I can rewrite this using livepatch now that it’s officially supported by the Tails kernel if so.

Btw, a lot of the patch can be removed and replaced with a call to kallsyms_lookup_name(). AFAIK it was not available in all kernels, but apparently that hasn’t been the case for a long time, so the sloppy LSTAR_MSR and int80 interrupt-based table-finding code is unnecessary.

#18 Updated by intrigeri 2016-12-06 10:16:49

FTR I’ll let DrWhax handle this, unless he explicitly states he doesn’t want, or can’t, do it.

#19 Updated by Anonymous 2018-08-17 15:43:34

  • QA Check set to Info Needed

ping? Are you interested in looking into this?

#20 Updated by Dr_Whax 2018-08-22 17:41:21

Even though it might be a good intention to ship an LKM to mitigate this attack, I’m against shipping stuff like this; we are not kernel developers, nor do we want to become kernel developers (afaik).

So.. I could ask people to audit this part of the kernel or we can close this for now..

Thoughts, intrigeri?

#21 Updated by mercedes508 2018-08-23 06:28:39

  • Subject changed from Blacklist insecure PI futexes to harden kernel to Blocklist insecure PI futexes to harden kernel

#22 Updated by intrigeri 2018-08-24 05:39:00

  • Status changed from Confirmed to Rejected

> Even though it might be a good intention to ship an LKM to mitigate this attack, I’m against shipping stuff like this; we are not kernel developers, nor do we want to become kernel developers (afaik).

> So.. I could ask people to audit this part of the kernel or we can close this for now..

The benefit is very hard to evaluate given we’re talking about undisclosed issues.
The cost is not that small and I don’t think that’s where we should put our efforts.
Anyone interested in making Tails users safer on this front, please make Linux users safer by ensuring this problem is known and fixed in Linux mainline.

#23 Updated by cypherpunks 2018-08-31 22:57:38

Dr_Whax wrote:
> Even though it might be a good intention to ship an LKM to mitigate this attack, I’m against shipping stuff like this; we are not kernel developers, nor do we want to become kernel developers (afaik).

That’s fair enough. Especially with things like grsecurity/PaX no longer being on the radar, there’s really not much we can hope to do.

> So.. I could ask people to audit this part of the kernel or we can close this for now..

This part of the kernel is stupidly convoluted if you’re looking for a race condition anything like those seen before with PI futexes. While security audits are always good, the kind folk at the Linux Foundation are not particularly interested in security when it comes down to it, as much as it may hurt to hear.

intrigeri wrote:
> > Even though it might be a good intention to ship an LKM to mitigate this attack, I’m against shipping stuff like this; we are not kernel developers, nor do we want to become kernel developers (afaik).
>
> > So.. I could ask people to audit this part of the kernel or we can close this for now..
>
> The benefit is very hard to evaluate given we’re talking about undisclosed issues.
> The cost is not that small and I don’t think that’s where we should put our efforts.
> Anyone interested in making Tails users safer on this front, please make Linux users safer by ensuring this problem is known and fixed in Linux mainline.

Unfortunately, Linux attempts to support far more use-cases than Tails does, meaning they would absolutely not disable PI futexes (and reintroduce the risk of priority inversion in certain scenarios which are likely irrelevant to Tails). The only thing I can think of is to create a kernel build option or even boot parameter to disable said futexes.