Bug #11858

Monitor if isobuilders systems are running fine

Added by bertagaz almost 2 years ago. Updated 18 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 10/03/2016
Due date:
% Done: 50%
QA Check: Dev Needed
Feature Branch: puppet-tails:feature/11858-monitor-systemd
Type of work: Sysadmin
Blueprint:
Starter: Yes
Affected tool:

Description

We have experienced times when our isobuilders slowly all went down because a branch triggered the OOM killer during its build.

We should use our monitoring system to check, via systemd and/or anything else, whether the isobuilder systems are running fine, so that we know whether we have to restart them or their jenkins-slave service.
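
For a single host, a manual spot-check along these lines answers the same question (a sketch; jenkins-slave is the service mentioned above):

  # Overall state of the systemd manager: "running", "degraded", ...
  systemctl is-system-running
  # State of the service we care about most on the isobuilders:
  systemctl status jenkins-slave.service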


Related issues

Related to Tails - Bug #11632: ISO builds from branch that need more RAM can break all our Jenkins isobuilders without us being notified Resolved 08/11/2016
Related to Tails - Bug #12009: Jenkins ISO builders are highly unreliable Resolved 12/01/2016
Related to Tails - Bug #13582: Monitoring bridge Duplicate 08/04/2017
Blocks Tails - Feature #13242: Core work 2017Q4 → 2019Q2: Sysadmin (Maintain our already existing services) Confirmed 06/29/2017

History

#1 Updated by intrigeri almost 2 years ago

  • Assignee set to bertagaz

(Assuming that's what you meant given you've set a target version.)

#2 Updated by intrigeri almost 2 years ago

  • Related to Bug #11632: ISO builds from branch that need more RAM can break all our Jenkins isobuilders without us being notified added

#3 Updated by bertagaz almost 2 years ago

  • Target version changed from Tails_2.7 to Tails_2.9.1

#4 Updated by intrigeri almost 2 years ago

  • Related to Bug #12009: Jenkins ISO builders are highly unreliable added

#5 Updated by anonym almost 2 years ago

  • Target version changed from Tails_2.9.1 to Tails_2.10

#6 Updated by anonym over 1 year ago

  • Target version changed from Tails_2.10 to Tails_2.11

#7 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_2.11 to Tails_2.12

#8 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_2.12 to Tails_3.0

#9 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_3.0 to Tails_3.1

#10 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_3.1 to Tails_3.2

#11 Updated by intrigeri about 1 year ago

  • Blocks Feature #13233: Core work 2017Q3: Sysadmin (Maintain our already existing services) added

#12 Updated by groente about 1 year ago

  • Starter set to Yes

A simple check whether

systemctl --quiet is-failed \*

returns 0 (in which case something is wrong) should do the trick, both for the isobuilders and for #13582.
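
Wrapped as a Nagios/Icinga-style plugin, this could look roughly like the following (a sketch using the usual plugin exit codes, 0 = OK and 2 = CRITICAL; not the deployed check):

  #!/bin/sh
  # The escaped \* passes a literal "*" pattern to systemctl instead of
  # letting the shell expand it; is-failed exits 0 as soon as at least
  # one matching unit is in the failed state.
  if systemctl --quiet is-failed \* ; then
      echo "CRITICAL: one or more systemd units are in the failed state"
      systemctl --state=failed --no-legend
      exit 2
  fi
  echo "OK: no failed systemd units"
  exit 0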

#13 Updated by groente about 1 year ago

#14 Updated by intrigeri about 1 year ago

systemctl is-system-running might do exactly what we want.
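
If so, a check built on it would mainly need to map the reported state to monitoring states, along these lines (a sketch, not the actual check):

  #!/bin/sh
  # is-system-running prints the overall manager state and exits non-zero
  # for anything other than "running".
  state=$(systemctl is-system-running)
  case "$state" in
      running)  echo "OK: system is $state"; exit 0 ;;
      degraded) echo "CRITICAL: system is $state (at least one unit failed)"; exit 2 ;;
      *)        echo "UNKNOWN: system is $state"; exit 3 ;;
  esac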

#15 Updated by bertagaz about 1 year ago

  • Target version changed from Tails_3.2 to Tails_3.3

#16 Updated by bertagaz about 1 year ago

  • Blocks deleted (Feature #13233: Core work 2017Q3: Sysadmin (Maintain our already existing services))

#17 Updated by bertagaz about 1 year ago

  • Blocks Feature #13242: Core work 2017Q4 → 2019Q2: Sysadmin (Maintain our already existing services) added

#18 Updated by bertagaz 11 months ago

pynagsystemd sounds like a good candidate. I'll give this one a try.

#19 Updated by bertagaz 11 months ago

  • Status changed from Confirmed to In Progress
  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA
  • Feature Branch set to puppet-tails:feature/11858-monitor-systemd

bertagaz wrote:

pynagsystemd sounds like a good candidate. I'll give this one a try.

I've committed everything in the dedicated branch, merged it into master, and deployed it. We now have a systemd check on all agents, as we discussed in #13582. To test it, find a check that will run soon and make one service fail on the related host (e.g. by misconfiguring it and restarting it so that it fails to start). You'll then see an alert in icinga2 about this failing service.
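
A low-impact way to produce such a failure, instead of misconfiguring a real service, might be a transient unit that exits non-zero (the unit name here is made up for the test):

  # Leaves test-11858.service in the "failed" state once /bin/false exits:
  systemd-run --unit=test-11858 /bin/false
  # After the alert has shown up in icinga2, clear the failed state again:
  systemctl reset-failed test-11858.service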

#20 Updated by intrigeri 11 months ago

  • Assignee changed from intrigeri to groente

(As per "Shifts for 2018Q1 + intrigeri's involvement in the sysadmin team".)

#21 Updated by intrigeri 11 months ago

As reported by groente today, apparently this does not work for the jenkins-slave service, which is precisely the one that made us create this ticket in the first place.

#22 Updated by anonym 10 months ago

  • Target version changed from Tails_3.3 to Tails_3.5

#23 Updated by intrigeri 10 months ago

  • Assignee changed from groente to bertagaz
  • QA Check changed from Ready for QA to Dev Needed

intrigeri wrote:

As reported by groente today, apparently this does not work for the jenkins-slave service, which is precisely the one that made us create this ticket in the first place.

Reproduced again: isotester4 was offline in Jenkins for ~1.5 days, but the jenkins-slave service was seen as successfully started by systemd. jenkins-slave.log said Error: Invalid or corrupt jarfile /var/run/jenkins/slave.jar. So I guess this ticket should be blocked by a new one about making the jenkins-slave service report its state reliably.
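
In the meantime, a complementary check for this specific failure mode could test the jar itself (a sketch; the path is taken from the log message above):

  # A corrupt slave.jar can still count as "successfully started" for
  # systemd, so verify the archive directly: unzip -t tests zip/jar
  # integrity and exits non-zero on corruption.
  if ! unzip -tq /var/run/jenkins/slave.jar >/dev/null 2>&1; then
      echo "CRITICAL: /var/run/jenkins/slave.jar is invalid or corrupt"
      exit 2
  fi
  echo "OK: slave.jar looks intact"
  exit 0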

#24 Updated by anonym 8 months ago

  • Target version changed from Tails_3.5 to Tails_3.6

#25 Updated by bertagaz 6 months ago

  • Target version changed from Tails_3.6 to Tails_3.7

#26 Updated by bertagaz 5 months ago

  • Target version changed from Tails_3.7 to Tails_3.8

#27 Updated by intrigeri 3 months ago

  • Target version changed from Tails_3.8 to Tails_3.9

#28 Updated by intrigeri 18 days ago

  • Target version changed from Tails_3.9 to Tails_3.10
