Bug #16126
no alerts when icinga2 is down
0%
Description
icinga2 on monitor was down for several days without us noticing. the web frontend showed no indication of the backend being down, and there seem to be no other checks outside of icinga to keep an eye on whether our monitoring is still actually functioning.
let’s set up an hourly cron job for a simple script that attempts to connect to monitor on port 5665 and mails tails-sysadmins on failure. i’d propose running this script on ecours, what do you think?
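A minimal sketch of such a script, to make the proposal concrete. The host name, the mail addresses, the local SMTP relay on localhost and the function names are all placeholders assumed for illustration; only port 5665 comes from this ticket.

```python
import smtplib
import socket
from email.message import EmailMessage

# Placeholders (assumptions): replace with the real host and addresses.
MONITOR_HOST = "monitor.example.org"
MONITOR_PORT = 5665
ALERT_FROM = "cron@example.org"
ALERT_TO = "sysadmins@example.org"


def port_is_open(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def build_alert(host, port):
    """Build the alert mail for an unreachable host:port."""
    msg = EmailMessage()
    msg["Subject"] = f"icinga2 unreachable on {host}:{port}"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(f"Could not open a TCP connection to {host}:{port}.")
    return msg


def main():
    """Send one alert mail if the port cannot be reached."""
    if not port_is_open(MONITOR_HOST, MONITOR_PORT):
        # Assumes an SMTP relay is listening on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_alert(MONITOR_HOST, MONITOR_PORT))
```

A cron entry would run a file that ends with a call to main(). Note that a successful TCP connection only shows that something is listening on the port, not that icinga2 is processing checks.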
Subtasks
History
#1 Updated by intrigeri 2018-11-14 13:42:55
> there seem to be no other checks outside of icinga to keep an eye on whether our monitoring is still actually functioning.
Assuming that icinga2.service works decently well, the systemd check for monitor.lizard should tell us if icinga2.service is not in good shape on that host. Now of course, if that service is down, Icinga2 won’t report about itself being down. But I would expect one could teach the central monitoring aggregator (Icinga2 on ecours) to treat “no recent results from check X” as a check failure. Had we had that in place, we would have seen that something was wrong.
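The “no recent results from check X” rule can be sketched as a simple staleness test. This only illustrates the logic; it is not how Icinga2 is configured, and the function name, parameters and the two-hour threshold in the usage note are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_result, max_age, now=None):
    """Return True if the last check result is missing or older than max_age.

    last_result: timezone-aware datetime of the most recent result, or None.
    max_age: timedelta after which a missing update counts as a failure.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_result is None:
        # Never having received a result is treated as a failure too.
        return True
    return now - last_result > max_age
```

For example, with max_age=timedelta(hours=2), a check whose last result arrived three hours ago would be flagged as failed, while one updated 30 minutes ago would not.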
> let’s set up an hourly cron job for a simple script that attempts to connect to monitor on port 5665 and mails tails-sysadmins on failure. i’d propose running this script on ecours, what do you think?
I’m all for adding an external check for that service. Any reason we can’t do this with an icinga check (running on ecours) instead of a cronjob?
#2 Updated by intrigeri 2019-06-02 15:18:35
- Assignee deleted (bertagaz)
- QA Check deleted (Info Needed)