Bug #11562

Monitor servers from the htpdate pools

Added by bertagaz about 2 years ago. Updated about 2 months ago.

Status: Confirmed
Priority: Normal
Assignee:
Category: Time synchronization
Target version:
Start date: 07/14/2016
Due date:
% Done: 0%
QA Check:
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:

Description

While tackling #10494, it came up that some of the HTTP servers in the htpdate pools were buggy. This affects whether Tails boots correctly and whether our test suite runs reliably. We should monitor whether these servers are up and answering correctly to the curl requests made by htpdate, to ensure this service is reliable.


Related issues

Related to Tails - Bug #13472: Replace www.centos.org in htpdate pools Resolved 07/15/2017
Related to Tails - Bug #10494: Retry htpdate when it fails In Progress 07/17/2016
Blocks Tails - Bug #10495: The 'the time has synced' step is fragile In Progress 11/06/2015
Blocks Tails - Feature #13242: Core work 2017Q4 → 2018Q3: Sysadmin (Maintain our already existing services) Confirmed 06/29/2017

History

#1 Updated by intrigeri about 2 years ago

Excellent idea!

The consequences of a failing check will likely need to be different from what we do for our own services: we can't fix the web servers that are in the HTP pools; all we can do is drop them from the pool in the next Tails release. So, what matters here is aggregated availability stats, rather than real-time up/down status info.

Email notifications would be useless noise, and as a sysadmin I'd rather not see info about such failures on our dashboard's "Current Incidents" page, if possible: sysadmins' duty does not include maintaining the HTP pools we use, and I don't want to train myself to ignore incidents.

But the RM (or the Foundations team?) needs to regularly check, e.g. at the beginning of each release cycle, if some servers in the pool are too unreliable, so that they can be replaced. How can they be given access to the aggregated availability stats they need to do this job? The easier their task, the greater the chances that it'll actually be done regularly.
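As a rough illustration of what "aggregated availability stats" could look like, here is a minimal Python sketch. It assumes check results have already been collected somewhere as (host, ok) pairs. The `CheckResult` type, the function names, the 0.95 threshold and the example hostnames are all hypothetical, and nothing here reflects an existing Tails tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class CheckResult(NamedTuple):
    """One probe of one pool server (hypothetical record format)."""
    host: str
    ok: bool


def availability_by_host(results: Iterable[CheckResult]) -> Dict[str, float]:
    """Return the fraction of successful checks for each host."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for result in results:
        totals[result.host] += 1
        if result.ok:
            successes[result.host] += 1
    return {host: successes[host] / totals[host] for host in totals}


def unreliable_hosts(results: Iterable[CheckResult],
                     threshold: float = 0.95) -> List[str]:
    """List hosts whose availability is below the (assumed) threshold."""
    stats = availability_by_host(results)
    return sorted(host for host, ratio in stats.items() if ratio < threshold)


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    sample = [
        CheckResult("a.example.org", True),
        CheckResult("a.example.org", False),
        CheckResult("b.example.net", True),
    ]
    print(unreliable_hosts(sample))
```

A report like this could be generated once per release cycle, so that whoever reviews the pools only has to read a short list of candidates for replacement.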

#2 Updated by anonym almost 2 years ago

  • Target version changed from Tails_2.6 to Tails_2.7

#3 Updated by bertagaz almost 2 years ago

  • Target version changed from Tails_2.7 to Tails_2.9.1

#4 Updated by anonym over 1 year ago

  • Target version changed from Tails_2.9.1 to Tails 2.10

#5 Updated by intrigeri over 1 year ago

  • Target version changed from Tails 2.10 to Tails_2.11

#6 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_2.11 to Tails_2.12

#7 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_2.12 to Tails_3.0

#8 Updated by intrigeri over 1 year ago

  • Type of work changed from Code to Sysadmin

#9 Updated by bertagaz about 1 year ago

  • Target version changed from Tails_3.0 to Tails_3.1

#10 Updated by bertagaz about 1 year ago

  • Target version changed from Tails_3.1 to Tails_3.2

#11 Updated by intrigeri about 1 year ago

  • Blocks Feature #13233: Core work 2017Q3: Sysadmin (Maintain our already existing services) added

#12 Updated by intrigeri about 1 year ago

  • Blocks Bug #10495: The 'the time has synced' step is fragile added

#13 Updated by bertagaz about 1 year ago

  • Related to Bug #13472: Replace www.centos.org in htpdate pools added

#14 Updated by bertagaz 11 months ago

  • Target version changed from Tails_3.2 to Tails_3.3

#15 Updated by bertagaz 11 months ago

  • Blocks deleted (Feature #13233: Core work 2017Q3: Sysadmin (Maintain our already existing services))

#16 Updated by bertagaz 11 months ago

  • Blocks Feature #13242: Core work 2017Q4 → 2018Q3: Sysadmin (Maintain our already existing services) added

#17 Updated by bertagaz 11 months ago

  • Target version changed from Tails_3.3 to Tails_3.5

#18 Updated by bertagaz 10 months ago

One idea about this: with #13541 and the merge of the feature/13541-save-more-data-on-htpdate-or-tor-failures branch, we're now collecting htpdate logs each time there's such a failure on our isotesters. We could gather these files and use them as a source to output statistics about server failures. That would give an overview closer to server failures in an almost real Tails context, rather than using basic URL fetching or coding some simulation of htpdate's behavior (depending on how we want to test these servers).

intrigeri wrote:

But the RM (or the Foundations team?) needs to regularly check, e.g. at the beginning of each release cycle, if some servers in the pool are too unreliable, so that they can be replaced. How can they be given access to the aggregated availability stats they need to do this job? The easier their task, the greater the chances that it'll actually be done regularly.

Then maybe there are different options:

  • It could be accessible through a web page, which could be hosted on www.lizard. That could even be the start of some status.t.b.o page, where we would output such information and also publicly output Jenkins build statuses. Or maybe it could be combined with other types of stats on a metrics.t.b.o page?
  • Given the people we're talking about, and the impact it has on our test suite in Jenkins, maybe the tails-ci list is a good recipient. We could send email notifications there.

#19 Updated by bertagaz 9 months ago

  • Target version changed from Tails_3.5 to Tails_3.6

#20 Updated by u 7 months ago

  • Related to Bug #10494: Retry htpdate when it fails added

#21 Updated by bertagaz 5 months ago

  • Target version changed from Tails_3.6 to Tails_3.7

#22 Updated by intrigeri 5 months ago

FWIW I was told that some servers in our pool don't send a Date header anymore, which could explain issues we've seen. I've not verified it myself but to identify such issues, here also: "what matters here is aggregated availability stats, rather than real-time up/down status info".
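To make the "no Date header" case concrete, here is a minimal Python sketch of how a check could classify a server's response headers. It is only an assumption about how such a check might be written: `classify_date_header` is a hypothetical helper, it takes headers that were already fetched by some other means, and it does not reproduce how htpdate itself queries servers.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def classify_date_header(headers: Mapping[str, str]) -> str:
    """Classify the Date header as 'ok', 'missing' or 'unparseable'.

    `headers` is a plain mapping of HTTP response header names to values;
    header names are matched case-insensitively.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("date")
    if raw is None:
        return "missing"
    try:
        # Raises ValueError (or TypeError on older Python versions)
        # when the value is not a valid RFC 5322 style date.
        parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return "unparseable"
    return "ok"


if __name__ == "__main__":
    print(classify_date_header({"Date": "Thu, 14 Jul 2016 12:00:00 GMT"}))
    print(classify_date_header({"Server": "example"}))
```

Recording this classification for every probe, rather than alerting on it, would let the result feed the same aggregated stats instead of a real-time up/down status.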

#23 Updated by bertagaz 3 months ago

  • Target version changed from Tails_3.7 to Tails_3.8

#24 Updated by intrigeri about 2 months ago

  • Target version changed from Tails_3.8 to Tails_3.9
