Bug #15832
lizard kernel oops on 4.9.0-8 kernel
80%
Description
Booting lizard on a 4.9.0-8 kernel resulted in a non-functional system spewing out the following oops:
Aug 22 08:27:57 lizard kernel: [ 387.963350] Oops: 0000 [#1] SMP
Aug 22 08:27:57 lizard kernel: [ 388.244093] Call Trace:
Aug 22 08:27:57 lizard kernel: [ 388.246539] [
Aug 22 08:27:57 lizard kernel: [ 388.252884] [
Aug 22 08:27:57 lizard kernel: [ 388.258888] [
Aug 22 08:27:57 lizard kernel: [ 388.264799] [
Some suggestions on how to proceed:
- perhaps we can reboot on the 4.9.0-8 kernel but disable the l1tf fixes (iirc one can enable/disable parts of them selectively); they’re the only change in -8, so likely the cause of the trouble
- reboot on the 4.9.0-8 kernel, disabling libvirtd and numad on the kernel cmdline (iirc systemd has means to disable service startup this way), log in, start numad, make sure it’s really really up and ready (Type=forking does not really guarantee the daemon is ready to answer requests), disable autostarting of all VMs, start libvirtd, start the biggest (RAM-wise) VMs one after the other, check numa allocation, then start everything else if no trouble. maybe that would help diagnose what’s going on wrt numa.
- reboot on a much newer kernel, in the hope that the problem is the backport of this big pile of fixes to 4.9
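The first two suggestions both come down to booting with extra kernel command-line parameters. Below is a minimal sketch in Python of how such a command line could be assembled. It assumes that the `l1tf=` switch described in the upstream kernel's L1TF documentation is available in this Debian backport (not verified here), and that the units are called libvirtd.service and numad.service. `systemd.mask=` is systemd's kernel command-line option for masking a unit at boot. The helper name `build_cmdline` and the base parameters are hypothetical.

```python
def build_cmdline(base, disable_l1tf=False, masked_units=()):
    """Return `base` with diagnostic boot parameters appended.

    base          -- existing kernel command line (illustrative value below)
    disable_l1tf  -- append l1tf=off (assumes the backport supports it)
    masked_units  -- systemd units to mask for this boot only
    """
    params = base.split()
    if disable_l1tf:
        params.append("l1tf=off")
    for unit in masked_units:
        # systemd.mask= keeps the unit from starting during this boot
        params.append(f"systemd.mask={unit}")
    return " ".join(params)


if __name__ == "__main__":
    # Unit names are assumptions; check them with systemctl on the host.
    print(build_cmdline("ro quiet",
                        masked_units=("libvirtd.service", "numad.service")))
```

The resulting string would be typed at the bootloader prompt for a one-off boot, so nothing persistent changes if the experiment goes wrong.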
Subtasks
Related issues
Related to Tails - Feature #11179: Enable automatic NUMA balancing on lizard | Resolved | 2016-02-29 |
Blocks Tails - Feature #13242: Core work: Sysadmin (Maintain our already existing services) | Confirmed | 2017-06-29 |
History
#1 Updated by groente 2018-08-22 09:48:41
- related to Feature #11179: Enable automatic NUMA balancing on lizard added
#2 Updated by groente 2018-08-22 09:49:41
intri: i’ve assigned this to you for now since i won’t have time to properly look into it the next few weeks, please feel free to reassign to me if you don’t have time either :)
#3 Updated by intrigeri 2018-09-19 17:39:13
- Assignee changed from intrigeri to groente
groente wrote:
> intri: i’ve assigned this to you for now since i won’t have time to properly look into it the next few weeks, please feel free to reassign to me if you don’t have time either :)
Indeed, I don’t have time either, so please go ahead. We’re now far enough from the major 3.9 release to afford a little bit of well-managed downtime. Please keep https://tails.boum.org/contribute/calendar/ in mind when scheduling this work :)
If you want to first try to revert my recent NUMA changes, in order to check whether they’re the culprit, that’s 0ac2378b0919c3778a41e59a5609317864d373f2 in lizard’s /etc.
#4 Updated by intrigeri 2018-09-19 17:39:31
- blocks Feature #13242: Core work: Sysadmin (Maintain our already existing services) added
#5 Updated by groente 2018-10-08 14:12:15
- Status changed from Confirmed to Resolved
upgraded the kernel to 4.18 from stretch-backports and the problem disappeared \o/
#6 Updated by intrigeri 2018-10-11 11:54:28
- Status changed from Resolved to In Progress
- % Done changed from 0 to 80
I’ve reviewed the corresponding Puppet changes and have found two issues:
- ensure => $ensure feels wrong/useless (there’s no $ensure variable in this context, is there?)
- missing origin in the APT pinning; I think you want release o=Debian Backports,a=stretch-backports
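For illustration, here is a minimal Python sketch of what an APT preferences stanza with both the origin and the archive could look like. The package name and the priority value are assumptions; the actual Puppet manifest is not shown in this ticket. `render_pin` is a hypothetical helper.

```python
def render_pin(package, origin, archive, priority):
    """Render one APT preferences stanza pinning `package` to a release.

    Matching on both o= (origin) and a= (archive) is the form suggested
    in the review above.
    """
    return (
        f"Package: {package}\n"
        f"Pin: release o={origin},a={archive}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # "linux-image-amd64" and 990 are illustrative assumptions.
    print(render_pin("linux-image-amd64", "Debian Backports",
                     "stretch-backports", 990))
```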
#7 Updated by groente 2018-10-11 12:47:42
- Assignee changed from groente to intrigeri
- QA Check changed from Dev Needed to Info Needed
> * ensure => $ensure feels wrong/useless (there’s no $ensure variable in this context, is there?)
> * missing origin in the APT pinning; I think you want release o=Debian Backports,a=stretch-backports
Fixed both, but it does make me wonder about the diffoscope pinning for isobuilders, is the pinning in puppet-tails:manifests/iso_builder.pp on line 18 broken then?
#8 Updated by intrigeri 2018-10-11 12:56:41
- Assignee changed from intrigeri to groente
- QA Check changed from Info Needed to Dev Needed
> but it does make me wonder about the diffoscope pinning for isobuilders, is the pinning in puppet-tails:manifests/iso_builder.pp on line 18 broken then?
I think it’s a no-op indeed. No idea how diffoscope from backports got installed there anyway. Best would be to check: deinstall diffoscope, run Puppet again, see which version it installs.
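The "see which version it installs" step could be scripted roughly as follows. This is a sketch in Python that assumes the usual `apt-cache policy` output layout, with indented "Installed:" and "Candidate:" lines. The helper names are hypothetical.

```python
import subprocess


def parse_policy(text):
    """Extract the Installed and Candidate versions from `apt-cache policy` output."""
    versions = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key in ("Installed", "Candidate"):
            versions[key] = value
    return versions


def policy(package):
    """Run `apt-cache policy` for one package and parse the result."""
    out = subprocess.run(
        ["apt-cache", "policy", package],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_policy(out)


if __name__ == "__main__":
    # Run before and after the Puppet run to compare the versions.
    print(policy("diffoscope"))
```

If the pin really is a no-op, the candidate version after removing diffoscope should come from stretch proper rather than from backports.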
#9 Updated by groente 2018-10-11 13:23:17
- Status changed from In Progress to Resolved
Okay, the diffoscope pin for isobuilders was indeed broken, that’s also fixed now, thanks for the review!
#10 Updated by intrigeri 2018-10-11 17:27:20
> Okay, the diffoscope pin for isobuilders was indeed broken, that’s also fixed now, thanks for the review!
Glad it had the side effect of fixing something else :)