Feature #10296

Reevaluate how the email notifications for failed automatic tests scale

Added by bertagaz 2015-09-28 03:56:15. Updated 2015-12-18 08:40:03.

Status: Resolved
Priority: High
Assignee:
Category: Continuous Integration
Target version:
Start date: 2015-09-28
Due date:
% Done: 100%
Feature Branch:
Type of work: Research
Blueprint:
Starter: 0
Affected tool:
Deliverable for: 267

Description

During the first iteration of the automated test suite deployment, we’ll have a first-week phase where the email notifications will only be sent to a few individuals, to evaluate whether we can release these notifications into the wild and to see how they will impact our contributors’ mailboxes (in short, their effectiveness). This is set up by Feature #10287.

We’ll periodically reevaluate this to decide if we’re ok to email contributors.

A first reevaluation will happen on Oct. 22, another on Oct. 28. Each needs a bit of preparation: the email statistics have to be sorted out beforehand to get an idea of the situation.



Related issues

Blocked by Tails - Feature #10287: Set up limited email notification on automatic test failure for the initial deployment Resolved 2015-09-27
Blocks Tails - Feature #10382: Implement the specified notification system for test suite failures on Jenkins Resolved 2015-10-16

History

#1 Updated by bertagaz 2015-09-28 03:57:42

Assigning to anonym, but the reevaluation will be made collectively.

#2 Updated by bertagaz 2015-09-28 03:59:22

  • blocked by Feature #10287: Set up limited email notification on automatic test failure for the initial deployment added

#3 Updated by intrigeri 2015-10-03 15:05:24

  • Subject changed from Reevaluate how the email notifications scale to Reevaluate how the email notifications for failed automatic tests scale

#4 Updated by intrigeri 2015-10-03 15:05:51

  • blocks #8668 added

#5 Updated by bertagaz 2015-10-14 03:33:43

I’m not sure this ticket should be a child of Feature #5288. This last ticket will probably be closed on time, and this very ticket is meant to track how the notification behaves once Feature #5288 is deployed.

#6 Updated by intrigeri 2015-10-14 05:36:17

> I’m not sure this ticket should be a child of Feature #5288. This last ticket will probably be closed on time, and this very ticket is meant to track how the notification behaves once Feature #5288 is deployed.

Let me clarify what these tickets encode: Feature #5288 is not about the stage 1 of the deployment, but about the completion of the deliverable, which includes implementing the autotest specs, which implies it can’t be closed until Feature #10296 and friends are done.

But I think I see what you mean. It seems that you’re lacking a ticket for stage 1 of the deployment, that you can close once you have an initial deployment of “run the test suite automatically on ISOs built on Jenkins” live, that is real soon now. I suggest you resurrect Feature #6565 for this matter. This will also help on reporting etc. :)

#7 Updated by bertagaz 2015-10-15 00:26:34

intrigeri wrote:
> > I’m not sure this ticket should be a child of Feature #5288. This last ticket will probably be closed on time, and this very ticket is meant to track how the notification behaves once Feature #5288 is deployed.
>
> Let me clarify what these tickets encode: Feature #5288 is not about the stage 1 of the deployment, but about the completion of the deliverable, which includes implementing the autotest specs, which implies it can’t be closed until Feature #10296 and friends are done.

Ah ok, I thought we could close Feature #5288 in time, given the issue is about test suite robustness, and not the deployment of it in our infra, but it seems we’ll be late then.

> But I think I see what you mean. It seems that you’re lacking a ticket for stage 1 of the deployment, that you can close once you have an initial deployment of “run the test suite automatically on ISOs built on Jenkins” live, that is real soon now. I suggest you resurrect Feature #6565 for this matter. This will also help on reporting etc. :)

Should be today if everything goes fine. But I prefer not to open a duplicate ticket to track this. Let’s use Feature #5288 and its sub-tickets then, if it isn’t closed.

#8 Updated by intrigeri 2015-10-16 03:03:52

  • blocks Feature #10382: Implement the specified notification system for test suite failures on Jenkins added

#9 Updated by bertagaz 2015-10-21 04:12:40

I’ve compiled a list of the test_Tails_ISO_* jobs that have run since the deployment and that are of interest for the evaluation. I’ve removed branches that do not contain a804982ae08a693e8eed96e477ac07f47c24de96 and builds that have been aborted. It will help in preparing tomorrow’s evaluation and in comparing with the notifications people received.

The branches based on devel have a lot of recurring failures that do not occur in the experimental branch.

Only one successful test job so far: test_Tails_ISO_test-10208-clean-up-deps #1

#10 Updated by bertagaz 2015-11-06 06:37:47

  • Status changed from Confirmed to In Progress
  • Assignee changed from anonym to bertagaz
  • Target version changed from Tails_1.7 to Tails_1.8
  • % Done changed from 0 to 40
  • Starter set to No

Things are getting better, and after a round of @fragile test tagging today, we’ll see how the runs in Jenkins behave in a week and check if we can unleash the notifications to committers or have to fix some last issues we’ll identify then. Kytv and I will take care of this evaluation on Friday, November 13, 2PM CET.

#11 Updated by intrigeri 2015-11-06 07:31:10

  • Priority changed from Normal to Elevated

(Blocks a ticket that has priority = elevated.)

#12 Updated by intrigeri 2015-11-06 07:32:53

  • Deliverable for changed from 268 to 267

#13 Updated by intrigeri 2015-12-05 15:44:22

  • % Done changed from 40 to 50

We’re going to go ahead and mark more stuff as fragile (Monday, 3pm CET), even if it disables 80% of the test suite.

#14 Updated by intrigeri 2015-12-07 09:02:38

  • Priority changed from Elevated to High

It’s been harder than I expected.

I could not find how one usually does the analysis without too many by-hand operations, so I spent some time automating this (Bug #10288#note-29).

The results in the summary are dominated by snapshots, so I don’t know if we’re really looking at the biggest offenders, that is the (hidden) steps that make snapshots fail. Then bertagaz let me know about --steps, which helped but then it gave a fucking lot of info: /code/attachments/download/1090/summary-20151201-to-20151207.txt
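
For illustration only (this is not the script from Bug #10288#note-29): a minimal sketch of how failures could be tallied per step rather than per scenario, assuming the reports follow the usual Cucumber JSON layout (features containing "elements", each with "steps" carrying a "result" status). The function name, step names and sample data below are hypothetical.

```python
from collections import Counter


def count_failed_steps(features):
    """Count how often each step name failed across parsed Cucumber-style JSON reports.

    `features` is the list obtained by loading one report (or several reports
    concatenated) with json.load(); the layout is an assumption, see above.
    """
    failures = Counter()
    for feature in features:
        for scenario in feature.get("elements", []):
            for step in scenario.get("steps", []):
                if step.get("result", {}).get("status") == "failed":
                    failures[step.get("name", "<unnamed step>")] += 1
    return failures


# Hypothetical sample data, not taken from the actual Jenkins runs.
sample_report = [
    {
        "name": "Feature A",
        "elements": [
            {"name": "Scenario 1", "steps": [
                {"name": "Tor is ready", "result": {"status": "failed"}},
                {"name": "I start the installer", "result": {"status": "skipped"}},
            ]},
            {"name": "Scenario 2", "steps": [
                {"name": "Tor is ready", "result": {"status": "failed"}},
            ]},
        ],
    },
    {
        "name": "Feature B",
        "elements": [
            {"name": "Scenario 3", "steps": [
                {"name": "I start the installer", "result": {"status": "failed"}},
                {"name": "the desktop is shown", "result": {"status": "passed"}},
            ]},
        ],
    },
]

# Steps sorted from most to least frequently failing.
top_offenders = count_failed_steps(sample_report).most_common()
```

Counting by step name instead of by scenario is one way to keep a single failing setup step from showing up as many unrelated scenario failures.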

Some tests have been failing repeatedly on feature/jessie only, because they hadn’t been adapted to Jessie yet, which makes the summary less useful than it could be. I think we should either look at Wheezy only for now, or finish porting the test suite to Jessie and look at Jessie results only. Once 1.8 has been released the latter will be true (except for bugfix branches based on stable), so it’ll be easier; wait and see.

So at this point I wonder if json-analysis is relevant (I trust it’ll become relevant again in the future though), and perhaps we should instead look at test results by hand.

Anyway.

bertagaz tagged as fragile the tests that rely on Tor having bootstrapped.

On my side, I looked at the buggy Tails Installer behaviour that makes way too many tests fail (Bug #10717) => I’ve filed Bug #10720 to track the actual installer bug (or a symptom of something in our test suite stack on Jenkins, whatever), and tagged these tests @fragile, because I don’t see that behaviour anywhere other than on Jenkins.

I believe that what’s now blocking us is:

  • Issues with the remote shell (Bug #10502): this still happens a lot (178 times since the beginning of the month), but apparently much less on Jessie, so I’ve put it on bertagaz’s plate (Bug #10502#note-14) to come up with a short-term plan; still, longer term, input from test suite folks would be useful on that ticket, starting with: do you ever see that when testing (Jessie) ISO images at home?
  • Whatever else we’ll notice in the next days. I expect bertagaz to stay on top of this, and will mark as blocking Feature #10382 whatever blocks it, so we know what needs to be fixed before we can turn on the big red switch.

#15 Updated by bertagaz 2015-12-15 03:39:01

  • Target version changed from Tails_1.8 to Tails_2.0

Postponing

#16 Updated by intrigeri 2015-12-18 07:16:08

  • Assignee changed from bertagaz to intrigeri

Taking this over for today; I will update this comment to log what I did:

  • triaged and marked as fragile tests that failed on stable, devel or experimental since the beginning of the month

#17 Updated by intrigeri 2015-12-18 08:40:03

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100

I think we won’t learn much by waiting more, and should go ahead. Closing.