Bug #16084

VMs crashing

Added by groente 2018-10-31 10:55:40. Updated 2018-11-05 21:18:43.

Status:
Resolved
Priority:
Urgent
Assignee:
groente
Category:
Infrastructure
Target version:
Start date:
2018-10-31
Due date:
% Done:

0%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

VMs have started to crash randomly recently:

root@lizard:/var/log/libvirt# grep 'reason=crashed' qemu/*log
qemu/isotester1.log:2018-10-28 17:10:10.123+0000: shutting down, reason=crashed
qemu/isotester1.log:2018-10-30 00:05:38.091+0000: shutting down, reason=crashed
qemu/isotester2.log:2018-10-30 07:02:28.928+0000: shutting down, reason=crashed
qemu/isotester3.log:2018-10-29 05:05:18.158+0000: shutting down, reason=crashed
qemu/isotester4.log:2018-10-26 11:02:03.757+0000: shutting down, reason=crashed
qemu/isotester5.log:2018-10-23 16:05:19.535+0000: shutting down, reason=crashed
qemu/survey.log:2018-10-25 11:05:59.783+0000: shutting down, reason=crashed
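
For a per-VM cross-check, libvirt itself can report why a domain last stopped (the domain name below is just an example); a crashed guest shows up as "shut off (crashed)":

virsh domstate --reason isotester1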


Subtasks


History

#1 Updated by groente 2018-10-31 11:30:55

Looks like it’s OOM-related on lizard:

Oct 29 05:05:10 lizard kernel: [1782677.734096] IO iothread2 invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0

Oct 30 00:05:37 lizard kernel: [1851103.428884] IO iothread3 invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0

Oct 30 07:02:28 lizard kernel: [1876113.582543] IO iothread1 invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
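
In case it helps correlate things later, the OOM kills can also be pulled out of the kernel log on lizard, along with a quick look at current headroom:

journalctl -k | grep -iE 'oom-killer|killed process'
free -h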

#2 Updated by groente 2018-10-31 11:40:15

  • Assignee changed from groente to bertagaz
  • QA Check set to Info Needed

I’m tempted to give lizard some swap space (I’d say 8GB); are there any reasons it doesn’t currently have swap?
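
For reference, if we go that route, a minimal sketch of adding 8GB of swap as a file (path and sizing are just illustrative, nothing has been applied):

fallocate -l 8G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# plus an /etc/fstab entry like: /swapfile none swap sw 0 0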

#3 Updated by intrigeri 2018-10-31 12:44:02

Note that we have lots of unused huge pages; we could free some of those so the host system has more RAM.
But that’s weird, because the host already has ~7G of RAM available even once you set aside the huge pages it cannot use.
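
Before freeing anything, the current pool size and how much of it is actually in use can be checked on the host:

grep -i hugepages /proc/meminfo
cat /proc/sys/vm/nr_hugepages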

#4 Updated by groente 2018-11-01 11:26:34

isobuilder1 just crashed as well; it looks like this is becoming a daily thing now. Please advise ASAP!

#5 Updated by bertagaz 2018-11-01 11:46:01

  • Assignee changed from bertagaz to groente
  • QA Check deleted (Info Needed)

groente wrote:
> isobuilder1 just crashed as well; it looks like this is becoming a daily thing now. Please advise ASAP!

Sorry, my head was busy elsewhere. I’d say the same as intrigeri: maybe we can free some of the huge pages to give a bit more memory to the host? Not sure about the amount, maybe 2G to start with?

I’ve lately seen isotesters going OOM too. I don’t know what’s happening memory-wise; it might be related, or not.

#6 Updated by groente 2018-11-01 12:14:33

  • QA Check set to Info Needed

Okay, decreased the number of huge pages by 1024, which resulted in roughly 2GB extra MemFree on lizard. Let’s see how it holds up over the next couple of days…
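
For the record, the change boils down to lowering vm.nr_hugepages by 1024 (with 2MB pages, hence the ~2GB); the numbers below are illustrative, not the actual values on lizard:

cat /proc/sys/vm/nr_hugepages          # e.g. 12288 before the change
sysctl -w vm.nr_hugepages=11264        # 1024 pages fewer
grep -E 'HugePages_Total|MemFree' /proc/meminfo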

#7 Updated by smokepath 2018-11-02 03:48:28

Hugepages > 120 or even 256 is generally very unnecessary and decreases performance; hugepages allocate up to a max of only a gigabyte. The hugepages value generally works best at anything under 128 for me.
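
For what it’s worth, the page size actually configured on the host can be checked directly (2048 kB is the usual default on x86_64; 1GB pages are a separate, optional size):

grep Hugepagesize /proc/meminfo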

#8 Updated by groente 2018-11-05 21:18:43

  • Status changed from Confirmed to Resolved

That seems to have done the trick, no more crashes \o/