Bug #16125

Icinga2 config not properly cleaned up after host is renamed or deleted

Added by groente 2018-11-14 13:16:51. Updated 2018-11-15 17:53:02.

Status:
Confirmed
Priority:
Normal
Assignee:
Category:
Infrastructure
Target version:
Start date:
2018-11-14
Due date:
% Done:

0%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

After renaming buse, the old buse.riseup.net entry in /etc/icinga2/zones.d was never cleaned up, which caused errors and prevented icinga2 from starting.
Is this expected behaviour from the Puppet module and did we simply forget a step in the renaming process, and/or should something be fixed in the Puppet code to prevent such an incident from occurring again?


Subtasks


History

#1 Updated by groente 2018-11-14 18:11:54

Add to that /etc/icinga2/conf.d/servicegroup_buse.riseup.net.conf on ecours, which kept hanging around.

#2 Updated by groente 2018-11-14 22:07:57

Ah, and check_puppetmaster is still looking for buse.riseup.net as well.

#3 Updated by intrigeri 2018-11-15 17:28:57

  • Target version set to Tails_3.11

> after renaming buse, the old buse.riseup.net entry in /etc/icinga2/zones.d was never cleaned up, which caused errors and prevented icinga2 from starting.

FTR and FWIW, what I did about this after renaming the node was:

  • I cleaned it up manually on the hosts I expected to be affected (buse and ecours) but:
    • I forgot that our Icinga2 setup is made in a way that the config for all hosts lands basically everywhere :/
    • I missed /etc/icinga2/conf.d/servicegroup_buse.riseup.net.conf
  • I manually cleaned up as much as I could about the old hostname in PuppetDB. I might have missed some stuff, though.

> is this expected behaviour from the puppet module and did we simply forget to take a step in the renaming process and/or should something be fixed in the puppet code to prevent such an incident from occurring again?

We have a documented process to add a node to our monitoring but I’m not aware of any doc about renaming or removing. I was clearly expecting too much from that Puppet code. I guess I could have tried this instead:

  1. disable monitoring for that node, using ensure => absent, and make sure it's cleaned up everywhere
  2. rename the node
  3. re-enable monitoring for that node

But I have no clue whether it would have been sufficient: chances are that some classes we apply there expect the monitoring to be set up.
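For the record, step 1 might look roughly like this in our node manifests. This is only a sketch: tails::monitoring::agent and its ensure parameter are hypothetical names for illustration, not necessarily what our Puppet code actually provides.

```puppet
# Hypothetical sketch: remove the node's monitoring configuration
# (and, ideally, its exported Icinga2 objects) before renaming it.
# 'tails::monitoring::agent' is an assumed wrapper class name.
class { 'tails::monitoring::agent':
  ensure => absent,
}
```

Whether ensure => absent actually un-exports the node's Icinga2 objects depends on how the module is written, which is part of what this ticket should clarify.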

The proper solution to this class of problems is that the Puppet code for our monitoring should not merely drop files into /etc/icinga2/conf.d/: it should take full responsibility for that directory's content and purge unmanaged files (recurse and purge attributes, like the sudo module we're using does). Then there's no need for a fancy dance when we rename or delete a node: the only manual step would be to delete the resources exported by that node from PuppetDB, and on the next run Puppet would remove the corresponding files from /etc/icinga2/conf.d/.
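A minimal sketch of that approach, using the standard Puppet file type (the ownership, mode, and service name below are assumptions for illustration and would need to match what our Icinga2 module actually uses):

```puppet
# Make Puppet fully own /etc/icinga2/conf.d: any file in there that
# is not backed by a Puppet resource gets deleted on the next run.
file { '/etc/icinga2/conf.d':
  ensure  => directory,
  recurse => true,   # manage the directory's contents recursively
  purge   => true,   # delete files Puppet does not manage
  force   => true,   # also purge unmanaged subdirectories
  owner   => 'nagios',   # assumed; match the module's settings
  group   => 'nagios',   # assumed; match the module's settings
  notify  => Service['icinga2'],
}
```

With this in place, removing a node's exported resources from PuppetDB (e.g. via puppet node deactivate) should be enough: the next Puppet run on each monitoring host would remove the stale files.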

> ah, and check_puppetmaster is still looking for buse.riseup.net as well

I’ll try to fix that, looks like I did not clean PuppetDB sufficiently.

#4 Updated by intrigeri 2018-11-15 17:49:54

>> ah, and check_puppetmaster is still looking for buse.riseup.net as well

> I’ll try to fix that, looks like I did not clean PuppetDB sufficiently.

Fixed with sudo puppet node deactivate buse.riseup.net. Next time I’ll read the doc…

#5 Updated by intrigeri 2018-11-15 17:53:02

  • Subject changed from icinga config not properly cleaned up after host rename to Icinga2 config not properly cleaned up after host is renamed or deleted
  • Category set to Infrastructure
  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_3.11)
  • QA Check deleted (Info Needed)

OK, I’m done with the clean-up after the buse renaming => repurposing this ticket to track the more general problem it’s about.