Bug #16126

no alerts when icinga2 is down

Added by groente 2018-11-14 13:23:35. Updated 2019-06-02 15:18:35.

Status: Confirmed
Priority: Elevated
Assignee:
Category:
Target version:
Start date: 2018-11-14
Due date:
% Done: 0%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

icinga2 on monitor was down for several days without us noticing. the web frontend showed no indication of the backend being down and there seem to be no other checks outside of icinga to keep an eye on whether our monitoring is still actually functioning.

let’s set an hourly cron for a simple script called that attempts to connect to monitor on port 5665 and mails tails-sysadmins on failure. i’d propose running this script on ecours, what do you think?
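A minimal sketch of what such a script could look like, for discussion only. The host name, the sender and recipient addresses, and the use of a local MTA on `localhost` are all assumptions made for illustration, not settings taken from our infrastructure.

```python
#!/usr/bin/env python3
"""Sketch: check that the Icinga2 port on the monitor host accepts connections."""
import smtplib
import socket
from email.message import EmailMessage

HOST = "monitor.lizard"                     # assumption: name reachable from the checking host
PORT = 5665                                 # port named in this ticket
SENDER = "cron@example.org"                 # hypothetical sender address
RECIPIENT = "tails-sysadmins@example.org"   # hypothetical recipient address


def port_is_open(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def build_alert(host, port):
    """Build the alert mail sent when the port is unreachable."""
    msg = EmailMessage()
    msg["Subject"] = f"icinga2 unreachable on {host}:{port}"
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg.set_content(f"Could not open a TCP connection to {host}:{port}.")
    return msg


def main():
    """Mail the alert if the port is closed; stay silent otherwise."""
    if not port_is_open(HOST, PORT):
        # Assumption: an MTA is listening on localhost:25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_alert(HOST, PORT))
```

The hourly cron entry would then call `main()`. Note that this only proves the port accepts connections, not that icinga2 is actually processing check results.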


Subtasks


History

#1 Updated by intrigeri 2018-11-14 13:42:55

> there seem to be no other checks outside of icinga to keep an eye on whether our monitoring is still actually functioning.

Assuming that icinga2.service works decently well, the systemd check for monitor.lizard should tell us if icinga2.service is not in good shape on that host. Now of course, if that service is down, Icinga2 won’t report about itself being down. But I would expect one could teach the central monitoring aggregator (Icinga2 on ecours) to treat “no recent results from check X” as a check failure. Had we had that in place, we would have seen that something was wrong.
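To make the “no recent results” idea concrete, here is a rough sketch of the staleness logic only. It is not Icinga2 configuration: the function name and the two-hour threshold in the usage note are hypothetical, and how the aggregator would expose the timestamp of the last result is left open.

```python
from datetime import datetime, timedelta, timezone

# Conventional monitoring plugin states.
OK = 0
CRITICAL = 2


def freshness_state(last_result, now, max_age):
    """Treat a missing or too-old check result as a failure.

    last_result: datetime of the most recent result, or None if there is none.
    now:         current datetime (passed in so the logic is easy to test).
    max_age:     timedelta after which a result counts as stale.
    """
    if last_result is None or now - last_result > max_age:
        return CRITICAL
    return OK
```

For example, with `max_age=timedelta(hours=2)`, a last result from three hours ago yields `CRITICAL`, while one from ten minutes ago yields `OK`.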

> let’s set an hourly cron for a simple script called that attempts to connect to monitor on port 5665 and mails tails-sysadmins on failure. i’d propose running this script on ecours, what do you think?

I’m all for adding an external check for that service. Any reason we can’t do this with an icinga check (running on ecours) instead of a cronjob?
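For comparison, the same connection test written in the style of a monitoring plugin, which reports through a status line and an exit code rather than by mail. This is only a sketch: `check_port` is a hypothetical name, and the host would be supplied by whatever check definition ends up running on ecours.

```python
import socket

# Standard plugin exit codes: 0 = OK, 2 = CRITICAL.
OK = 0
CRITICAL = 2


def check_port(host, port, timeout=10.0):
    """Return (exit_code, status_line) for a TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Refused connection, timeout or name resolution failure.
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port} ({exc})"
```

A wrapper would print the status line and pass the code to `sys.exit()`. Compared with the cron approach, notifications, retries and acknowledgements would then go through the same channels as every other check.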

#2 Updated by intrigeri 2019-06-02 15:18:35

  • Assignee deleted (bertagaz)
  • QA Check deleted (Info Needed)