Feature #11179: Enable automatic NUMA balancing on lizard
% Done: 100%
Description
After reading chapter 8 of https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-NUMA-Auto_NUMA_Balancing and other Red Hat performance tuning documentation, I came to the conclusion that we should enable automatic NUMA balancing on lizard:

echo 1 > /proc/sys/kernel/numa_balancing

i.e., via Puppet:

sysctl::value { 'kernel.numa_balancing': value => 1 }
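For reference, a minimal sketch of flipping the setting at runtime and checking that it took effect (the sysctl interface is the standard kernel one; the Puppet resource above is what persists it):

# Enable automatic NUMA balancing at runtime
echo 1 | sudo tee /proc/sys/kernel/numa_balancing
# Confirm the kernel picked it up (should print kernel.numa_balancing = 1)
sysctl kernel.numa_balancing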
It is supposed to give us better performance, and the other options are not practical:
- manual NUMA tuning: quite some initial and ongoing maintenance work; let’s avoid this if we can
- numad: the package is in Stretch but not in Jessie, and according to Red Hat it’s not really more efficient than automatic NUMA balancing
To be able to evaluate how well this works, we can:
- benchmark
- use numastat -c qemu-system-x86_64, which gives per-command information, so we can easily check which NUMA nodes our KVM processes have their memory on; see section 8.3 of https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html
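As a sketch, both evaluation methods boil down to commands like these (the watch interval is an arbitrary choice):

# Per-process NUMA placement of all QEMU processes
sudo numastat -c qemu-system-x86_64
# System-wide per-node allocation counters, refreshed every 10 seconds
watch -n 10 numastat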
Subtasks
Related issues
Related to Tails - Bug #15832: lizard kernel oops on 4.9.0-8 kernel | Resolved | 2018-08-22
Blocked by Tails - Feature #11178: Upgrade lizard host system to Jessie | Resolved | 2016-02-29
Blocked by Tails - Feature #11817: Optimize I/O settings on lizard | Resolved | 2016-09-20
History
#1 Updated by intrigeri 2016-02-29 01:24:12
- blocked by Feature #11178: Upgrade lizard host system to Jessie added
#2 Updated by intrigeri 2016-02-29 01:24:31
- Target version set to Tails_2.3
I’d like to do that late March / early April. If we don’t manage to do it, no big deal: it’s no emergency, so we can postpone the target version quite a bit.
#4 Updated by intrigeri 2016-04-16 15:41:09
- Target version changed from Tails_2.3 to Tails_2.4
#5 Updated by intrigeri 2016-04-29 14:26:02
- Target version deleted (Tails_2.4)
#6 Updated by intrigeri 2016-09-20 09:44:54
- Description updated
#7 Updated by intrigeri 2016-09-20 09:48:13
- Description updated
#8 Updated by intrigeri 2016-09-20 15:17:03
- blocked by Feature #11817: Optimize I/O settings on lizard added
#9 Updated by intrigeri 2016-09-20 15:17:27
(I don’t want to mix two concurrent experiments, so I’ll wait.)
#10 Updated by intrigeri 2016-09-20 15:19:10
- Description updated
#11 Updated by intrigeri 2016-11-06 10:04:46
Before enabling:
sudo numastat -c qemu-system-x86_64
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ ------
8121 (sudo) 3 1 4
23663 (qemu-syst 8855 34 8889
23685 (qemu-syst 110 534 644
23709 (qemu-syst 89 529 618
23727 (qemu-syst 101 528 629
23745 (qemu-syst 79 23575 23653
23775 (qemu-syst 641 12 653
23793 (qemu-syst 168 8732 8900
23840 (qemu-syst 101 23570 23672
23866 (qemu-syst 130 539 668
23885 (qemu-syst 23645 12 23657
23925 (qemu-syst 1120 14 1134
23943 (qemu-syst 151 2745 2895
23961 (qemu-syst 90 530 620
24031 (qemu-syst 145 8738 8883
24140 (qemu-syst 100 1563 1663
24159 (qemu-syst 1193 17 1209
24179 (qemu-syst 189 4147 4336
24198 (qemu-syst 92 23577 23669
24235 (qemu-syst 155 1552 1707
24295 (qemu-syst 3342 20335 23676
24446 (qemu-syst 8869 32 8901
24468 (qemu-syst 23644 21 23665
31929 (qemu-syst 23645 22 23667
31970 (qemu-syst 23648 16 23664
46498 (qemu-syst 1083 25 1108
--------------- ------ ------ ------
Total 121386 121397 242783
#12 Updated by intrigeri 2016-11-06 10:23:51
- Status changed from Confirmed to In Progress
- % Done changed from 0 to 10
And after enabling it, shutting down a bunch of VMs, and starting the ones that are allocated large amounts of RAM first:
sudo numastat -c qemu-system-x86_64
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ ------
16421 (qemu-syst 8745 5 8750
16559 (qemu-syst 23 8731 8753
16625 (qemu-syst 8736 4 8740
17331 (qemu-syst 27 23562 23589
17447 (qemu-syst 23581 10 23590
17626 (qemu-syst 68 23523 23592
17717 (qemu-syst 23575 15 23590
19667 (qemu-syst 22 523 545
21982 (qemu-syst 15 23574 23590
22173 (qemu-syst 1725 2409 4134
22210 (qemu-syst 1552 22 1575
22659 (sudo) 3 1 4
23709 (qemu-syst 90 528 618
23727 (qemu-syst 99 530 629
23745 (qemu-syst 78 23575 23653
23775 (qemu-syst 639 14 653
23793 (qemu-syst 165 8735 8900
23866 (qemu-syst 130 538 668
23925 (qemu-syst 1117 16 1134
23943 (qemu-syst 150 2746 2896
23961 (qemu-syst 88 531 620
24140 (qemu-syst 97 1563 1661
24159 (qemu-syst 1193 16 1209
24468 (qemu-syst 23645 20 23665
31929 (qemu-syst 23645 22 23667
46498 (qemu-syst 1085 22 1108
--------------- ------ ------ ------
Total 120295 121237 241532
The isotesters (and all isobuilders but one, which I did not restart) now have their memory on the right NUMA node. I’m not sure what’ll happen on the next reboot: will it look nice, or will it depend on the startup order of the VMs?
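To pick a sensible startup order by RAM size, something like this sketch would list each defined guest with its configured memory (guest names are whatever virsh reports):

# List every defined guest along with its configured memory
for vm in $(sudo virsh list --all --name); do
  printf '%s: ' "$vm"
  sudo virsh dumpxml "$vm" | grep -m1 '<memory'
done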
#13 Updated by intrigeri 2017-05-28 13:59:09
Ouch, things are still not ideally balanced:
$ sudo numastat
node0 node1
numa_hit 393402402 453822993
numa_miss 284650 544264909
numa_foreign 544105750 443809
interleave_hit 59805 59815
local_node 393402402 453822993
other_node 0 0
$ sudo numastat -c qemu-system-x86_64
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ ------
12071 (qemu-syst 26505 37 26542
12119 (qemu-syst 1566 24 1590
12160 (qemu-syst 775 6 780
12203 (qemu-syst 6 2744 2750
12269 (qemu-syst 1256 11 1267
12451 (qemu-syst 890 25650 26539
12511 (sudo) 1 2 4
12790 (qemu-syst 3106 1048 4155
12837 (qemu-syst 462 1122 1584
12977 (qemu-syst 11 26530 26541
13040 (qemu-syst 35 532 567
13083 (qemu-syst 9 14546 14555
13131 (qemu-syst 5 874 879
13172 (qemu-syst 7 553 560
13213 (qemu-syst 556 4 560
13254 (qemu-syst 27 533 560
13296 (qemu-syst 10489 16054 26543
13350 (qemu-syst 10 553 563
13393 (qemu-syst 10 554 564
13438 (qemu-syst 14551 5 14556
13492 (qemu-syst 24073 2469 26541
13548 (qemu-syst 9 14545 14553
13608 (qemu-syst 8 555 563
--------------- ------ ------ ------
Total 84367 108950 193316
So I’ve shut down all isobuilders and isotesters, and tried the following (the commands for toggling between the two mechanisms are sketched after this list):
- turning off automatic NUMA balancing, running numad, starting isotesters one after the other: several are badly aligned
- turning on automatic NUMA balancing, stopping numad, starting 5 isotesters sequentially: several are badly aligned
- turning off both automatic NUMA balancing and numad, starting 5 isotesters sequentially: all correctly aligned
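For reference, switching between the two mechanisms in these experiments boils down to something like this sketch (assuming the numad package ships a systemd unit called numad, as it does on Debian):

# Hand placement over to numad: balancing off, daemon on
echo 0 | sudo tee /proc/sys/kernel/numa_balancing
sudo systemctl start numad
# ...or the reverse: daemon off, automatic balancing on
sudo systemctl stop numad
echo 1 | sudo tee /proc/sys/kernel/numa_balancing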
So I’m not quite sure what’s going on. Possible explanations and ideas:
- One of the algorithms used by automatic NUMA balancing is Migrate-on-Fault, so it’s plausible that the situation gets better over time, and looking at stats immediately after starting the VMs might be mostly worthless.
- We use <vcpu placement='static'>, while <vcpu placement='auto'> would query numad; this is best combined with <numatune><memory mode='strict' placement='auto'/></numatune>, which queries numad too (see the sketch below).
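As a sketch, here is how one would apply and then double-check those settings on a given guest (isotester1 is a hypothetical guest name):

# Edit the domain definition: set <vcpu placement='auto'> and add
# <numatune><memory mode='strict' placement='auto'/></numatune>
sudo virsh edit isotester1
# Check the persistent configuration (takes effect on next guest restart)
sudo virsh dumpxml --inactive isotester1 | grep -E '<vcpu|<numatune|<memory mode'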
#14 Updated by intrigeri 2017-05-28 14:10:48
intrigeri wrote:
> * We use <vcpu placement='static'>, while <vcpu placement='auto'> would query numad; this is best combined with <numatune><memory mode='strict' placement='auto'/></numatune>, which queries numad too.
I’ve tried that (with numad running), let’s see how it goes.
#15 Updated by intrigeri 2017-06-29 10:01:23
- blocks Feature #13232: Core work 2017Q2: Sysadmin (Maintain our already existing services) added
#16 Updated by intrigeri 2017-06-29 13:33:33
- blocked by deleted (Feature #13232: Core work 2017Q2: Sysadmin (Maintain our already existing services))
#17 Updated by intrigeri 2018-08-18 09:28:04
intrigeri wrote:
> intrigeri wrote:
> > * We use <vcpu placement='static'>, while <vcpu placement='auto'> would query numad; this is best combined with <numatune><memory mode='strict' placement='auto'/></numatune>, which queries numad too.
>
> I’ve tried that (with numad running), let’s see how it goes.
I get better results than previously: all but one of our iso{builder,tester}s are correctly placed on a single NUMA node. So I’ll configure the same settings for all our other VMs and call this done.
#18 Updated by intrigeri 2018-08-18 09:35:22
- Target version set to Tails_3.9
- % Done changed from 10 to 70
- QA Check set to Ready for QA
Done, but I have not restarted the VMs, so the changes are not really applied yet. I’ll check how things are after the next reboot.
Updated our VM creation doc accordingly.
#19 Updated by intrigeri 2018-08-19 12:59:49
- Status changed from In Progress to Resolved
- Assignee deleted (intrigeri)
- % Done changed from 70 to 100
- QA Check changed from Ready for QA to Pass
Interestingly, 4 out of our 9 iso{build,test}ers still seem to be badly balanced. We could probably improve things by tweaking the order in which stuff is started. But the numa_miss / numa_hit ratio is 0 on node0 and ~1% on node1, which is much better than what I’ve seen before, so I’ll call this good enough.
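For the record, that ratio can be extracted from the numastat counters with something like this sketch:

# Per-node numa_miss / numa_hit ratio, in percent
numastat | awk '/^numa_hit/ {h0=$2; h1=$3} /^numa_miss/ {m0=$2; m1=$3} END {printf "node0: %.2f%%, node1: %.2f%%\n", 100*m0/h0, 100*m1/h1}'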
#20 Updated by groente 2018-08-22 09:48:42
- related to Bug #15832: lizard kernel oops on 4.9.0-8 kernel added