Bug #16271

Monitoring check for Postfix mail queue never reaches warning/critical state

Added by intrigeri 2019-01-04 13:30:20. Updated 2019-08-17 09:39:46.

Status: Resolved
Priority: Elevated
Assignee: groente
Category: Infrastructure
Target version:
Start date: 2019-01-04
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

There were dozens of messages in the queue today, and I don’t think we received any alert.

Related issues

Related to Tails - Bug #12086: Monitor the size of WhisperBack SMTP relay's queue Resolved 2016-12-26
Blocks Tails - Feature #13242: Core work: Sysadmin (Maintain our already existing services) Confirmed 2017-06-29

History

#1 Updated by intrigeri 2019-01-04 13:41:56

  • Target version set to Tails_3.12
  • Type of work changed from Security Audit to Sysadmin

#2 Updated by intrigeri 2019-01-26 17:42:01

  • blocks Feature #13242: Core work: Sysadmin (Maintain our already existing services) added

#3 Updated by intrigeri 2019-01-26 17:42:54

(I’m not sure why I’ve assigned this to myself: arguably this is about keeping things working, i.e. weekly shifts. But whatever, IIRC I implemented these checks initially so I feel kinda responsible, and I’m curious, so I’ll give it a try :)

#4 Updated by intrigeri 2019-01-26 17:54:36

  • related to Bug #12086: Monitor the size of WhisperBack SMTP relay's queue added

#5 Updated by intrigeri 2019-01-26 17:58:00

  • Subject changed from Monitoring check for WhisperBack mail queue is broken to Monitoring check for Postfix mail queue never reaches warning/critical state
  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

The check works just fine as far as collecting data (which we don’t store) goes, but “By default, all thresholds are 0 except corrupt_crit”. So for 2 years, Icinga has never told us about anything going wrong on the email front.

#6 Updated by intrigeri 2019-01-26 18:01:23

To be clear, that comment is actually wrong: all thresholds are empty by default. I had mistakenly believed this comment and concluded that if the threshold is 0, then any value greater than 0 must trigger a notification, i.e. that the default settings were good for us. Reading the code leads to a different understanding.
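The difference between an empty threshold and a threshold of 0 can be sketched in Python. This is a hypothetical `evaluate()` function, not the plugin's actual code: it assumes that a value strictly greater than a threshold triggers that state (as in the reasoning above), models "empty" as `None`, and uses the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
from typing import Optional

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value: int, warn: Optional[int], crit: Optional[int]) -> int:
    """Hypothetical mapping of a queue count to a plugin state.

    A threshold of None means "not set": it can never fire.
    A threshold of 0 fires as soon as the value exceeds 0.
    """
    if crit is not None and value > crit:
        return CRITICAL
    if warn is not None and value > warn:
        return WARNING
    return OK


# Empty thresholds (the real default): dozens of queued messages stay OK.
print(evaluate(42, warn=None, crit=None))  # 0 (OK)
# Thresholds of 0 (what the comment suggested): any queued message alerts.
print(evaluate(42, warn=0, crit=0))  # 2 (CRITICAL)
```

With empty thresholds the check can only ever return OK for the queue size, which matches what we observed: data was collected but no alert was possible.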

#7 Updated by intrigeri 2019-01-26 18:34:26

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

Fix deployed. Please review:

It would be nice to actually trigger an email delivery problem somehow, and make sure we actually get notifications. Or take advantage of the fact that some of our Postfix instances currently have deferred email, set the threshold low enough, and profit :)

#8 Updated by intrigeri 2019-01-28 13:10:57

We just received “Subject: PROBLEM - jenkins.lizard - mailqueue@jenkins.lizard is CRITICAL” so it looks like it’s working :)

#9 Updated by anonym 2019-01-30 11:59:37

  • Target version changed from Tails_3.12 to Tails_3.13

#10 Updated by CyrilBrulebois 2019-03-20 14:34:11

  • Target version changed from Tails_3.13 to Tails_3.14

#11 Updated by CyrilBrulebois 2019-05-23 21:23:30

  • Target version changed from Tails_3.14 to Tails_3.15

#12 Updated by intrigeri 2019-06-02 14:42:57

  • Status changed from In Progress to Needs Validation

#13 Updated by intrigeri 2019-06-11 18:25:32

  • Target version deleted (Tails_3.15)

(Would be nice to have a review at some point but let’s stop pretending it’s urgent.)

#14 Updated by groente 2019-08-01 09:30:33

  • Assignee changed from bertagaz to Sysadmins

#15 Updated by groente 2019-08-17 09:39:46

  • Status changed from Needs Validation to Resolved
  • Assignee changed from Sysadmins to groente
  • % Done changed from 50 to 100

looks good!