Feature #11355

Feature #9614: Automated builds: complete phase two

Re-enable Jenkins notifications on ISO build/test failure

Added by intrigeri over 1 year ago. Updated 21 days ago.

Status:In ProgressStart date:04/15/2016
Priority:ElevatedDue date:
Assignee:bertagaz% Done:

60%

Category:Continuous Integration
Target version:Tails_3.2
QA Check: Blueprint:
Feature Branch:puppet-tails:feature/11355-reenable-jenkins-notifications;jenkins-jobs:feature/11355-reenable-jenkins-notifications Easy:
Type of work:Code Affected tool:

Description

Re-enable them one by one, carefully evaluating the outcome at each step, to make sure we continuously deliver a good enough UX to contributors.


Related issues

Related to Tails - Feature #9719: Configure Jenkins notifications to Redmine In Progress 07/10/2015
Related to Tails - Bug #10288: Fix newly identified issues to make our test suite more robust and faster In Progress 02/26/2015
Related to Tails - Bug #12647: Deal with June 2107 false positive CI notifications Resolved 06/07/2017
Related to Tails - Bug #12648: Deal with July 2017 false positive CI notifications Confirmed 06/07/2017
Related to Tails - Bug #12649: Deal with August 2017 false positive CI notifications Confirmed 06/07/2017
Related to Tails - Bug #12650: Deal with September 2017 false positive CI notifications Confirmed 06/07/2017
Blocked by Tails - Bug #11354: Disable Jenkins notifications on ISO build/test failure Resolved 04/15/2016

History

#1 Updated by intrigeri over 1 year ago

Next step is probably to clearly identify the blockers.

#2 Updated by intrigeri over 1 year ago

  • Blocked by Bug #11354: Disable Jenkins notifications on ISO build/test failure added

#3 Updated by intrigeri over 1 year ago

A first change we decided to make is to only send Jenkins ISO build/test failure notifications for Ready for QA tickets. This is a pre-requisite to re-enabling such notifications (except on base branches, of course). Note that #9719 has technical info that may be relevant for the implementation.

#4 Updated by intrigeri over 1 year ago

  • Blocked by deleted (Bug #11354: Disable Jenkins notifications on ISO build/test failure)

#5 Updated by intrigeri over 1 year ago

  • Parent task set to #9614

#6 Updated by intrigeri over 1 year ago

  • Blocked by Bug #11354: Disable Jenkins notifications on ISO build/test failure added

#7 Updated by intrigeri over 1 year ago

  • Related to Feature #9719: Configure Jenkins notifications to Redmine added

#8 Updated by bertagaz about 1 year ago

  • Status changed from Confirmed to In Progress

As a first step, I'll do some stats on the last months build run to see if we can at least re-enable notifications for them.

#9 Updated by intrigeri about 1 year ago

What's the status / updated timeline on this front?

#10 Updated by bertagaz 12 months ago

  • Target version changed from Tails_2.5 to Tails_2.6

#11 Updated by intrigeri 11 months ago

  • Related to Bug #10288: Fix newly identified issues to make our test suite more robust and faster added

#12 Updated by anonym 10 months ago

  • Target version changed from Tails_2.6 to Tails_2.7

#13 Updated by bertagaz 10 months ago

  • Target version changed from Tails_2.7 to Tails_2.9.1

#14 Updated by anonym 7 months ago

  • Target version changed from Tails_2.9.1 to Tails 2.10

#15 Updated by intrigeri 7 months ago

  • Target version changed from Tails 2.10 to Tails_2.11

#16 Updated by bertagaz 5 months ago

  • Target version changed from Tails_2.11 to Tails_2.12

#17 Updated by bertagaz 5 months ago

  • Priority changed from Normal to Elevated

#18 Updated by intrigeri 4 months ago

The current plan for the 1st iteration is #11355#note-3, and having a global kill switch for this feature so that any sysadmin can disable it. The kill switch must live outside of the code (e.g. somewhere in /etc) so we don't keep modifying a script merely to configure it (we've seen how reverts-of-reverts-of-reverts have lead us to problematic situations in the past).

#19 Updated by intrigeri 3 months ago

  • Target version changed from Tails_2.12 to Tails_3.0~rc1

Please ensure the new target version is more realistic than 2.12, after having looked at your 3.0~rc1 and 3.0 tickets. Also, IIRC you mentioned you already have had some WIP locally for a while, that you might want to push to a branch somewhere, so it's not stored on your hard drive only, and others can help move this forward :)

#20 Updated by intrigeri 2 months ago

  • Target version changed from Tails_3.0~rc1 to Tails_3.0

intrigeri wrote:

Please ensure the new target version is more realistic than 2.12, after having looked at your 3.0~rc1 and 3.0 tickets. Also, IIRC you mentioned you already have had some WIP locally for a while, that you might want to push to a branch somewhere, so it's not stored on your hard drive only, and others can help move this forward :)

Ping :)

#21 Updated by bertagaz 2 months ago

  • % Done changed from 0 to 20
  • Feature Branch set to puppet-tails:feature/11355-reenable-jenkins-notifications;jenkins-jobs:feature/11355-reenable-jenkins-notifications

intrigeri wrote:

intrigeri wrote:

Please ensure the new target version is more realistic than 2.12, after having looked at your 3.0~rc1 and 3.0 tickets. Also, IIRC you mentioned you already have had some WIP locally for a while, that you might want to push to a branch somewhere, so it's not stored on your hard drive only, and others can help move this forward :)

Ping :)

Right, I've just reworked a bit what I had and pushed it. It's for now completely untested but close to what it will be. If you want to have a first look at the design and code, you're welcome.

#22 Updated by intrigeri 2 months ago

Right, I've just reworked a bit what I had and pushed it. It's for now completely untested but close to what it will be. If you want to have a first look at the design and code, you're welcome.

Thanks, I'll try! :)

#23 Updated by intrigeri 2 months ago

Very quick initial look:

  • I like the design choices, so I'll only comment about the implementation here. Congrats!
  • /usr/local/bin/setup_notifications should have a less generic name
  • I'm not happy with the name of command line arguments to setup_notifications; I can certainly improve them myself later on during this process, but I would appreciate if you first sat down, looked at the use cases, thought about how confusing they can be (e.g. disable_build does not disable any "build") and improved them a bit; hint: the help string for each argument are pretty good and provide good inspiration for the corresponding option name :)
  • /etc/redmine/apikey should live in /etc/jenkins IMO: it is surely about Redmine, but what it configures is Jenkins stuff, not Redmine
  • I'd vastly prefer if setup_notifications output'ed its results (as a pure function!) instead of reporting them as a side effect by writing to $WORKSPACE/tmp/recipient
  • I'm confused when looking at the diff of files/jenkins/slaves/isobuilders/collect_build_environment: did you mean NOTIFY_BUILD_TO?
  • Typo in recipent
  • Please use named arguments for the setup_notifications function instead of positional ones (if Python lets you do that, I don't remember).
  • ticket_state variable name is confusing: it's used for something that is not Redmine ticket "Status".
  • Please extract the checking for RfQA in a predicate function.

#24 Updated by bertagaz 2 months ago

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 20 to 50
  • QA Check set to Ready for QA

intrigeri wrote:

Very quick initial look:

  • I like the design choices, so I'll only comment about the implementation here. Congrats!

Thanks! Most difficult was to remember how it was done in the several related scripts and repos. :)

  • /usr/local/bin/setup_notifications should have a less generic name

Renamed in 4411e80e7cc40726e708a09af74a6d4427eaab06

  • I'm not happy with the name of command line arguments to setup_notifications; I can certainly improve them myself later on during this process, but I would appreciate if you first sat down, looked at the use cases, thought about how confusing they can be (e.g. disable_build does not disable any "build") and improved them a bit; hint: the help string for each argument are pretty good and provide good inspiration for the corresponding option name :)

I wasn't too happy about that too. Tentatively done in 9859ab0093a0e4906fcd452e8fb6225a073811d8.

  • /etc/redmine/apikey should live in /etc/jenkins IMO: it is surely about Redmine, but what it configures is Jenkins stuff, not Redmine

Makes sense. Done in bd7efbb5e9e6636607514807d18ea68b2b9a4ce3

  • I'd vastly prefer if setup_notifications output'ed its results (as a pure function!) instead of reporting them as a side effect by writing to $WORKSPACE/tmp/recipient

Done in 6cb019f677cc7a0924c4046b80be888eaf083a7f

  • I'm confused when looking at the diff of files/jenkins/slaves/isobuilders/collect_build_environment: did you mean NOTIFY_BUILD_TO?

Nop, this script collects the environment variable used during the build to pass and use them in the test job environment. So we don't care about the build notification recipient here, we just need to pass the test one.

  • Typo in recipent

Corrected that yesterday already,in 7e15570ca653952f17ec17ec37912f025aa5dc3c

  • Please use named arguments for the setup_notifications function instead of positional ones (if Python lets you do that, I don't remember).

They were already named ones, but in Python they can be used as positional ones too. I've still changed that accordingly in 9731bebd335e7010ce3828f81da25f3e746ed9e8

  • ticket_state variable name is confusing: it's used for something that is not Redmine ticket "Status".
  • Please extract the checking for RfQA in a predicate function.

Done both in fee1b2c675a0934036130ff42d7f8f1070e70e34.

From local tests I made it works quite well, so I think it can be reviewed now.

#25 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Ready for QA to Dev Needed

Note: I've pushed some (untested) improvements.

Thanks! Most difficult was to remember how it was done in the several related scripts and repos. :)

Design doc, anyone? (Sorry if I'm beating a dead horse.)

  • I'm not happy with the name of command line arguments to setup_notifications; I can certainly improve them myself later on during this process, but I would appreciate if you first sat down, looked at the use cases, thought about how confusing they can be (e.g. disable_build does not disable any "build") and improved them a bit; hint: the help string for each argument are pretty good and provide good inspiration for the corresponding option name :)

I wasn't too happy about that too. Tentatively done in 9859ab0093a0e4906fcd452e8fb6225a073811d8.

OK, great! Improved a tiny bit on top.

  • /etc/redmine/apikey should live in /etc/jenkins IMO: it is surely about Redmine, but what it configures is Jenkins stuff, not Redmine

Makes sense. Done in bd7efbb5e9e6636607514807d18ea68b2b9a4ce3

Cool. Then please adjust puppet:///modules/tails_secrets_jenkins/jenkins/slaves/isobuilders/redmine/redmine_apikey and require => File['/etc/redmine/apikey'] accordingly (that Puppet code can't possibly work; please test stuff before sending for QA!).

  • I'd vastly prefer if setup_notifications output'ed its results (as a pure function!) instead of reporting them as a side effect by writing to $WORKSPACE/tmp/recipient

Done in 6cb019f677cc7a0924c4046b80be888eaf083a7f

Good. So the name of the setup_notifications function doesn't reflect what it does anymore. Same for the script: it doesn't setup anything AFAICT.

  • I'm confused when looking at the diff of files/jenkins/slaves/isobuilders/collect_build_environment: did you mean NOTIFY_BUILD_TO?

Nop, this script collects the environment variable used during the build to pass and use them in the test job environment. So we don't care about the build notification recipient here, we just need to pass the test one.

OK. Please add a comment then :)

From local tests I made it works quite well, so I think it can be reviewed now.

Cool! How did you test this? In a clone of our Jenkins setup, or anything else? There was a syntax error in the Puppet code so I'm wondering how it can have been tested.

Some more questions/comments:

What's the requested kill switch? Is it the --disable* args one can pass in macros/builders.yaml?

What happens if TAILS_TICKET 0 and args.rfqa is true? To me it looks like print(notification_vars) will raise an exception. Same question when args.rfqa is false and TAILS_TICKET 0. And once you've added code to handle the missing cases, IMO the nested if/else thing will have grown out of control and will want a refactoring.

Why does the jenkins user need write access to /etc/jenkins?

#26 Updated by bertagaz 2 months ago

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 50 to 60
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

Note: I've pushed some (untested) improvements.

Thanks! doesn't seem to break anything.

Thanks! Most difficult was to remember how it was done in the several related scripts and repos. :)

Design doc, anyone? (Sorry if I'm beating a dead horse.)

Doesn't hurt. I'll do that while doing the documentation for #12409.

OK, great! Improved a tiny bit on top.

Good one!

Cool. Then please adjust puppet:///modules/tails_secrets_jenkins/jenkins/slaves/isobuilders/redmine/redmine_apikey and require => File['/etc/redmine/apikey'] accordingly (that Puppet code can't possibly work; please test stuff before sending for QA!).

Oops, forgot to push the update of the tails_secrets_jenkins repo. Done. Fix the missing one in cbb9224aa082e912358af7557977d73e11dc9f77

Good. So the name of the setup_notifications function doesn't reflect what it does anymore. Same for the script: it doesn't setup anything AFAICT.

Fixed in 6cc5811cb4028d08ebc66ab179d3aa533005f7a1 and b6e15293a33e2df8072544da68745602e0d38431

Nop, this script collects the environment variable used during the build to pass and use them in the test job environment. So we don't care about the build notification recipient here, we just need to pass the test one.

OK. Please add a comment then :)

458c8ef5272719eb3c7c5b6f15589ed42916eb0e

From local tests I made it works quite well, so I think it can be reviewed now.

Cool! How did you test this? In a clone of our Jenkins setup, or anything else? There was a syntax error in the Puppet code so I'm wondering how it can have been tested.

I did test the script with a manual test suite. On the puppet side, considering it was just a matter of installing two files, I did not bother testing it and am willing to fix any problem that may arise at the deployment step.

What's the requested kill switch? Is it the --disable* args one can pass in macros/builders.yaml?

Yes. I'll document that in the design doc.

What happens if TAILS_TICKET 0 and args.rfqa is true? To me it looks like print(notification_vars) will raise an exception. Same question when args.rfqa is false and TAILS_TICKET 0. And once you've added code to handle the missing cases, IMO the nested if/else thing will have grown out of control and will want a refactoring.

Ooch, my manual test suite did not include testing when ticket was '0'. Indeed it would not work. Fixed in 52f0191eb42ba50c69084bce6f3f8b4d2e101f20

Why does the jenkins user need write access to /etc/jenkins?

Because I'm too permissive? Fixed in 11e204a024367155ecaa41e677829dfca354947b

#27 Updated by bertagaz 2 months ago

bertagaz wrote:

intrigeri wrote:

Design doc, anyone? (Sorry if I'm beating a dead horse.)

Doesn't hurt. I'll do that while doing the documentation for #12409.

I just had a look and it seems it's already documented in the blueprint. I just did not think about looking at it, and dig into the code.

#28 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Ready for QA to Dev Needed

The code now looks OK to me, so please go ahead with the deployment whenever you have time to deal with any issues in the following N days. Please notify our fellow contributors before deploying, making it clear that we've learned from last year's experience, and this time we are ready to immediately disable the whole thing as soon as they notice they are ignoring such notifications. Yeepee :)

Then be careful, don't close this ticket: what's been done so far is "only" the first iteration and this ticket is about something bigger. I'll let you decide if using subtasks for the next steps would help you organize your work.

#29 Updated by intrigeri 2 months ago

I just had a look and it seems it's already documented in the blueprint. I just did not think about looking at it, and dig into the code.

The current state of our blueprints is very diverse: some reflect initial ideas that predated the actual (slightly different) implementation, some are almost up-to-date design doc, etc. So one cannot trust a blueprint to accurately describe what we're running. Ergo, a blueprint is not the right place to document something that works. Thankfully we now have a space (under contribute/working_together/roles/sysadmins) to store our design doc, so please consider moving there whatever blueprints content that is current and relevant.

#30 Updated by intrigeri 2 months ago

  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

The code now looks OK to me, so please go ahead with the deployment whenever […]

Oops, I forgot to ask: what's happening to the notifications for the base branches, with the proposed changes? IMO they should either be kept as-is (i.e. sent to the RMs) for this first iteration, or be sent to the RM + last committer(s?).

#31 Updated by bertagaz 2 months ago

  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:

intrigeri wrote:

The code now looks OK to me, so please go ahead with the deployment whenever […]

Oops, I forgot to ask: what's happening to the notifications for the base branches, with the proposed changes? IMO they should either be kept as-is (i.e. sent to the RMs) for this first iteration, or be sent to the RM + last committer(s?).

The proposed change does not modify the current behavior: the script that generates the jobs sets the build and test recipient to tails-rm@ for base branches (as before), while for other branches it sets it to the NOTIFY_*_TO variables, which is then set by the new script at the beginning of the build to the last commit author.

#32 Updated by intrigeri 2 months ago

The proposed change does not modify the current behavior: […]

Thanks! Sounds perfect to me for this 1st iteration :)

#33 Updated by bertagaz about 2 months ago

  • Related to Bug #12647: Deal with June 2107 false positive CI notifications added

#34 Updated by bertagaz about 2 months ago

  • Related to Bug #12648: Deal with July 2017 false positive CI notifications added

#35 Updated by bertagaz about 2 months ago

  • Related to Bug #12649: Deal with August 2017 false positive CI notifications added

#36 Updated by bertagaz about 2 months ago

  • Related to Bug #12650: Deal with September 2017 false positive CI notifications added

#37 Updated by bertagaz about 2 months ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

The proposed change does not modify the current behavior: […]

Thanks! Sounds perfect to me for this 1st iteration :)

Merged the branches in their respective repos and had to fix some minor bugs. It now works as expected, see https://jenkins.tails.boum.org/view/All/job/build_Tails_ISO_doc-12000-graphical-as-root/24/console for non-RfQA branch, https://jenkins.tails.boum.org/job/build_Tails_ISO_feature-12635-tor-browser-7.0/10/console for a RfQA branch, and also have a look at the base branches.

I've also added tickets to track the number of false positives regulary, and sent an email on tails-dev@ to warn people that the notifications are back.

#38 Updated by intrigeri about 2 months ago

  • Assignee changed from intrigeri to bertagaz
  • Target version changed from Tails_3.0 to Tails_3.1
  • QA Check changed from Ready for QA to Dev Needed

I had already reviewed the code, and I've verified it seems to work fine at first glance. Some remaining work is listed on #11355#note-28 and #11355#note-29 (I suspect subtasks would be better suited to track them though). Next step: ensure there's a plan to finish this work; the end goal is not clearly defined beyond this first iteration, and I don't think we should try to define it better now. Let's wait and see how this first iteration works first.

#39 Updated by intrigeri about 2 months ago

In collect_build_environment, [ $NOTIFY_TEST_TO ] || NOTIFY_TEST_TO="" doesn't do what you think: we're running under set -u.

#40 Updated by intrigeri about 1 month ago

Here's my first report as a developer: in the test suite failure notifications, I can see the name of the scenarios that failed, which is useful info; but apart of that I feel I'm receiving both too much, and too little info, to be in a position to efficiently (and realistically) do something out of such email:

  • To see the list of failing scenarios, I have to scroll past the last few dozens lines of output, that are most often useless since the failure happened way earlier; this is not very practical. I suggest not emailing this useless, partial output at all.
  • In most cases, the only way I have to triage common false positives (e.g. time sync'ing failures) is to open the test result web page, as the info I need to do so is not in the email I receive. I'm too lazy to do that, e.g. in the last 5 days I've received 2 such email/day on average; I suspect other developers won't do that either, and they'll essentially start ignoring these email again. if I was sent the output of the failing tests, then I'm pretty sure I would actually do this triaging.

Other than that, notifications have been re-enabled 2 weeks ago, and I think it's time asking contributors what they actually do with them, in order to identify usability issues before they start getting bad habits that'll be hard to fix later: at least I have already started training myself to vaguely skim through such email without diving into the actual possible problems they're supposed to make me aware of.

Thanks in advance!

#41 Updated by intrigeri 29 days ago

I've disabled email notifications for tests.

#42 Updated by intrigeri 27 days ago

  • Priority changed from Elevated to Normal

#43 Updated by bertagaz 25 days ago

  • Priority changed from Normal to Elevated

intrigeri wrote:

Here's my first report as a developer:

Thanks!

in the test suite failure notifications, I can see the name of the scenarios that failed, which is useful info; but apart of that I feel I'm receiving both too much, and too little info, to be in a position to efficiently (and realistically) do something out of such email:

Ok

  • To see the list of failing scenarios, I have to scroll past the last few dozens lines of output, that are most often useless since the failure happened way earlier; this is not very practical. I suggest not emailing this useless, partial output at all.
  • In most cases, the only way I have to triage common false positives (e.g. time sync'ing failures) is to open the test result web page, as the info I need to do so is not in the email I receive. I'm too lazy to do that, e.g. in the last 5 days I've received 2 such email/day on average; I suspect other developers won't do that either, and they'll essentially start ignoring these email again. if I was sent the output of the failing tests, then I'm pretty sure I would actually do this triaging.

Sadly, the content of the emails sent does not seem easy to customize. If we want to adjust such things, we'll have to switch to the email-ext plugin.

Other than that, notifications have been re-enabled 2 weeks ago, and I think it's time asking contributors what they actually do with them, in order to identify usability issues before they start getting bad habits that'll be hard to fix later: at least I have already started training myself to vaguely skim through such email without diving into the actual possible problems they're supposed to make me aware of.

Before doing so, I made some stats to know who has received notifications in June. It appears apart from tails-rm, you're the only one that had (--rfqa-only!), for the #12686, #12685 and #12684 branch. So I'm not sure such a broad email to the developers is needed. I'll get anonym's feedback though.

#44 Updated by intrigeri 23 days ago

Sadly, the content of the emails sent does not seem easy to customize. If we want to adjust such things, we'll have to switch to the email-ext plugin.

OK. It's good to know that we have an option to make such email useful enough to be worth sending :)

(In passing: interestingly, we already install that plugin, but don't seem to use it yet. It seems you planned at some point (2 years ago) to have Jenkins interact with Redmine over email using that plugin, instead of using the Redmine HTTP API.)

Before doing so, I made some stats to know who has received notifications in June. It appears apart from tails-rm, you're the only one that had (--rfqa-only!), for the #12686, #12685 and #12684 branch.

OK, good to hear :) I'm not surprised as very few of the people who have commit access to tails.git have submitted branches lately (either they're simply not doing much development at all anymore, or what they do happens elsewhere than in tails.git).

So I'm not sure such a broad email to the developers is needed.

Well, indeed it would be useless noise. Skip it.

I'll get anonym's feedback though.

This is needed indeed.

#45 Updated by bertagaz 22 days ago

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

I'll get anonym's feedback though.

This is needed indeed.

So anonym, what feedbacks do you have regarding Jenkins notifications? Do you read them? Are they useful?

#46 Updated by anonym 21 days ago

  • Assignee changed from anonym to bertagaz
  • QA Check deleted (Info Needed)

bertagaz wrote:

intrigeri wrote:

I'll get anonym's feedback though.

This is needed indeed.

So anonym, what feedbacks do you have regarding Jenkins notifications? Do you read them? Are they useful?

I'm sorry, but I don't think my answer is relevant -- I was AFK half of this time, and the rest was super stressful for me for various reasons, so I only read the emails I absolutely had to, i.e. I still ignore my folder where the Jenkins notifications are sorted into.

I think I need a "normal" month to evaluate this. I expect September to be such a month. Sorry if this doesn't help you for what you need now. :/

#47 Updated by bertagaz 21 days ago

anonym wrote:

I'm sorry, but I don't think my answer is relevant -- I was AFK half of this time, and the rest was super stressful for me for various reasons, so I only read the emails I absolutely had to, i.e. I still ignore my folder where the Jenkins notifications are sorted into.

I think I need a "normal" month to evaluate this. I expect September to be such a month. Sorry if this doesn't help you for what you need now. :/

Ok, thanks a lot to reply! That's fine, we'll have other evaluation rounds, so we can wait a bit more your feedbacks. Take a good rest from the stress meanwhile!

#48 Updated by intrigeri 21 days ago

  • Target version changed from Tails_3.1 to Tails_3.2

(post-September)

Also available in: Atom PDF