Feature #11598

Replace reboot-notifier cron email notification with an Icinga check

Added by intrigeri about 2 years ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 07/22/2016
Due date:
% Done: 50%
QA Check: Pass
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:

Description

I'd love to receive less email, and to have our monitoring dashboard be the place to go to know what we have to do on our systems. Also, the Icingaweb2 dashboard always knows the current state of things and has a concept of state transition, unlike a set of emails.


Related issues

Related to Tails - Bug #12455: Replace Puppet last run check cron email notifications with an Icinga check Resolved 04/18/2017

History

#1 Updated by intrigeri about 2 years ago

  • Assignee set to bertagaz
  • QA Check set to Info Needed

bertagaz, what do you think? Please reassign to me for implementation if you agree, I'll use it as a way to test the fancy new monitoring setup doc.

#2 Updated by bertagaz about 2 years ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:

bertagaz, what do you think? Please reassign to me for implementation if you agree, I'll use it as a way to test the fancy new monitoring setup doc.

Why not? That will probably not solve the emails problem, as Icinga2 will send some too, unless you intend to disable notifications for that. But I like the "one place to rule them all" idea. :) Hope the doc won't be too fuzzy.

#3 Updated by intrigeri about 2 years ago

That will probably not solve the emails problem, as Icinga2 will send some too, unless you intend to disable notifications for that.

Icinga2 sends email only on state change, which I find much more manageable :)

#4 Updated by intrigeri about 2 years ago

  • Type of work changed from Discuss to Sysadmin

#5 Updated by intrigeri almost 2 years ago

  • QA Check deleted (Dev Needed)

If there's no existing check for "does a given file exist", it's trivial to write our own: our custom check_number_in_file is basically a more complex version of that.

#6 Updated by intrigeri over 1 year ago

  • Description updated (diff)

#7 Updated by intrigeri over 1 year ago

  • Related to Bug #12455: Replace Puppet last run check cron email notifications with an Icinga check added

#8 Updated by intrigeri 8 months ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

Here's a plan:

  1. use the check_file_age plugin with its --ignore-missing option, point it to the flag file created by reboot-notifier (/run/reboot-required), and tweak its --warning-age and --critical-age settings (to start with, "reboot needed" will be a warning as soon as it is detected, and will become critical after 48 hours)
  2. test that this new check works as intended
  3. disable email notification from reboot-notifier: set NOTIFICATION_EMAIL= in /etc/default/reboot-notifier
  4. if we still receive too much email from Icinga about this (I think I was wrong when I wrote "Icinga2 sends email only on state change" above), tweak the notification settings for this check
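The state logic that step 1 of this plan aims for can be sketched in Python as follows. This is only an illustration of the intended thresholds, not the check_file_age plugin itself; the function name reboot_flag_state and its parameters are hypothetical.

```python
import os
import time

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def reboot_flag_state(path="/run/reboot-required", critical_age=48 * 3600, now=None):
    """Return (exit_code, message) for the reboot flag file at `path`.

    Hypothetical helper mirroring the plan: a missing file is OK,
    an existing file is WARNING at once, and CRITICAL once it is
    at least `critical_age` seconds old.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # Equivalent in spirit to ignoring a missing file: no reboot needed.
        return OK, "no reboot required"
    age = now - mtime
    if age >= critical_age:
        return CRITICAL, f"reboot required for {age / 3600:.0f}h"
    return WARNING, f"reboot required for {age / 3600:.0f}h"
```

A wrapper script could print the message and call sys.exit() with the returned code, which is the usual convention for Icinga plugins.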

#9 Updated by intrigeri 8 months ago

  • % Done changed from 10 to 20

First step done.

#10 Updated by intrigeri 8 months ago

Created /run/reboot-required on bridge.lizard in order to test that the check works. I'll wait 48h to make sure it switches to critical in due time.

#11 Updated by intrigeri 8 months ago

  • % Done changed from 20 to 30

intrigeri wrote:

disable email notification from reboot-notifier: set NOTIFICATION_EMAIL= in /etc/default/reboot-notifier

Done: https://git-tails.immerda.ch/puppet-tails/commit/?id=bc3c728a6fb32f818a4aca9234dfbb70b81ba46e

if we still receive too much email from Icinga about this (I think I was wrong when I wrote "Icinga2 sends email only on state change" above), tweak the notification settings for this check

By default, we'll receive 1 email/day from Icinga2 for each service that needs rebooting (just like with reboot-notifier), but that'll start only 48h after the reboot need is identified. So the sysadmin on duty now has a good chance to reboot systems, or to acknowledge the problem if they have a good reason to postpone the reboots, before Icinga2 starts spamming the whole team. If we want to change the notification rate, we need to add a new notification type to templates/monitoring/notifications.conf.erb, with conditionals on a custom vars.$something that we could set in services.
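To make the resulting email schedule concrete, here is a small Python sketch of the arithmetic. It assumes that only the critical state triggers email and that Icinga2 re-notifies exactly every 24 hours; the function name expected_notifications is hypothetical.

```python
from datetime import datetime, timedelta


def expected_notifications(detected_at, count,
                           delay=timedelta(hours=48),
                           interval=timedelta(days=1)):
    """List the times of the first `count` emails for one unacknowledged check.

    Assumption: the first email goes out when the check turns critical,
    `delay` after detection, then one more per `interval`.
    """
    return [detected_at + delay + i * interval for i in range(count)]
```

For a reboot need detected on a Monday at noon, the first email would go out on Wednesday at noon, then one more each following day until the system is rebooted or the problem is acknowledged.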

#12 Updated by intrigeri 8 months ago

  • Subject changed from Consider replacing reboot-notifier with an Icinga check to Replace reboot-notifier cron email notification with an Icinga check

#13 Updated by intrigeri 8 months ago

  • Target version set to Tails_3.5

#14 Updated by intrigeri 8 months ago

  • Assignee changed from intrigeri to groente
  • % Done changed from 30 to 50
  • QA Check set to Ready for QA

intrigeri wrote:

Created /run/reboot-required on bridge.lizard in order to test that the check works. I'll wait 48h to make sure it switches to critical in due time.

It did switch to critical after 48h, and we got a notification about it. Then one of us (who might not have followed the discussion here, and thus was perhaps not aware it was part of an experiment) rebooted that VM, so the check switched back to normal, which is expected.

So I think we're done here. If we ever want to fine-tune the notification rate, see #11598#note-11.

#15 Updated by groente 8 months ago

  • Status changed from In Progress to Resolved
  • QA Check changed from Ready for QA to Pass
