Bug #10993: Define and bootstrap a process to collect & analyze false positives in Jenkins test suite runs

Added by intrigeri 2016-01-25 12:29:49 . Updated 2016-02-09 18:40:46 .

Status:

Resolved

Priority:

Elevated

Assignee:

Category:

Continuous Integration

Target version:

Tails_2.2

Start date:

2016-01-25

Due date:

% Done:

100%

Feature Branch:

Type of work:

Communicate

Blueprint:

Starter:

Affected tool:

Deliverable for:

267

Description

We enabled notifications for these runs a bit more than a month ago. Since then, there have been lots of false positives. Presumably because there are so many of them (bogus notifications are the rule rather than the exception), nobody told the CI team about them. And nobody on the CI team, whose members received these notifications as well, dealt with them either. So in the end, until today, fragile tests were not flagged as such. This shows that we need a formal process to keep catching false positives and to flag scenarios as fragile.

Subtasks

History

#1 Updated by bertagaz 2016-02-03 11:56:48

Assignee changed from bertagaz to intrigeri
% Done changed from 0 to 10
QA Check set to Info Needed

I’ve given this issue a bit of thought. I think we mostly failed at this because having only one person responsible for this survey is not realistic given our respective workloads, and we can’t really only count on stating “it’s a collective effort” without formalizing a process.

So here’s a proposal to have something more reliable:

We should use formal shifts as we sometimes do for other Tails areas. We should define a time range where someone is responsible for this task, that could be something like a week. So during each meeting of the CI team, we spend a moment defining who will handle this task for each week/time range until the next meeting and take note of it in the CI team git repo.

So I propose we discuss this idea at the next CI meeting, and apply it if everyone agrees.

#2 Updated by intrigeri 2016-02-05 17:33:23

Assignee changed from intrigeri to bertagaz
QA Check changed from Info Needed to Dev Needed

> I think we mostly failed at this because having only one person responsible for this survey is not realistic given our respective workloads,

I agree it would not be realistic.
But JFTR I don’t think we even tried it (if we did try, then who was this responsible person?)

> and we can’t really only count on stating “it’s a collective effort” without formalizing a process.

ACK. Hence this ticket.

> So here’s a proposal to have something more reliable:

> We should use formal shifts as we sometimes do for other Tails areas. We should define a time range where someone is responsible for this task, that could be something like a week.

The idea of shifts sounds good.

> So during each meeting of the CI team, we spend a moment defining who will handle this task for each week/time range until the next meeting and take note of it in the CI team git repo.

This work is not something that necessarily needs to be done continuously, nor with a low latency (e.g. one pass per month is enough IMO). And it’s just a couple hours of work a month. So I think that 1 month is a good duration for these shifts (the longer the shifts, the lower the bureaucratic overhead): even if you’re away two weeks, no big deal.

So I say we can schedule 11 monthly shifts for what remains of 2016 and be done with it.
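
As a rough sketch of that scheduling arithmetic: assuming the shifts start in February 2016 and rotate round-robin, one shift per calendar month through December gives 11 shifts. The function name and the member names below are hypothetical, used only for illustration, and are not the actual CI team roster.

```python
from datetime import date
from itertools import cycle


def monthly_shifts(start, members):
    """Return one (year, month, member) tuple per month from `start` through December."""
    rotation = cycle(members)  # round-robin over the given members
    return [(start.year, month, next(rotation)) for month in range(start.month, 13)]


# Hypothetical member names, purely for illustration.
shifts = monthly_shifts(date(2016, 2, 1), ["alice", "bob", "carol"])
for year, month, member in shifts:
    print(f"{year}-{month:02d}: {member}")
```

With a start date in February, the list covers February through December, i.e. the 11 monthly shifts mentioned above.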

> So I propose we discuss this idea at the next CI meeting, and apply it if everyone agrees.

Sounds like a good place to make sure we’re on the same page, indeed.

What the current proposal lacks, though, is answers to these questions:

* How can one actually collect & analyze false positives? Remember that we’re not in the situation where autotest developers receive all notifications.
* Who will take these shifts?

#3 Updated by bertagaz 2016-02-05 20:05:22

% Done changed from 10 to 20

intrigeri wrote:
> > I think we mostly failed at this because having only one person responsible for this survey is not realistic given our respective workloads,
>
> I agree it would not be realistic.
> But JFTR I don’t think we even tried it (if we did try, then who was this responsible person?)

We did that in the past, in late 2015, but in the end only one of us was really trying to do it, and it was too much load at the time.

> This work is not something that necessarily needs to be done continuously, nor with a low latency (e.g. one pass per month is enough IMO). And it’s just a couple hours of work a month. So I think that 1 month is a good duration for these shifts (the longer the shifts, the lower the bureaucratic overhead): even if you’re away two weeks, no big deal.
>
> So I say we can schedule 11 monthly shifts for what remains of 2016 and be done with it.

Fine with me.

> What the current proposal lacks, though, is answers to these questions:
>
> * How can one actually collect & analyze false positives? Remember that we’re not in the situation where autotest developers receive all notifications.

Please stop asking good questions! :)

In order to do it, I guess one needs to have access to the jobResult file for the month they are responsible for. We could either share these files online somewhere, or have someone with the right access send the file by email. The latter seems enough to me, unless I’m missing something. For the rest, people already have access to our Jenkins web interface and to nightly.t.b.o
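
To make the analysis step a bit more concrete, here is a minimal sketch of one possible heuristic: a scenario that both passed and failed on the same commit is a candidate for being flagged as fragile. The record layout (scenario, commit, passed) is an assumption made for this example and is not the real jobResult format; the function name and the sample data are made up.

```python
from collections import defaultdict


def fragile_candidates(results):
    """Return the sorted names of scenarios that both passed and failed on one commit.

    `results` is an iterable of (scenario, commit, passed) tuples. This layout
    is an assumption for the sketch, not the actual jobResult file format.
    """
    outcomes = defaultdict(set)
    for scenario, commit, passed in results:
        outcomes[(scenario, commit)].add(passed)
    # Mixed outcomes on the same commit suggest a flaky test, not a real regression.
    return sorted(
        {scenario for (scenario, _), seen in outcomes.items() if seen == {True, False}}
    )


# Made-up sample data, for illustration only.
sample = [
    ("Scenario A", "abc123", True),
    ("Scenario A", "abc123", False),
    ("Scenario B", "abc123", False),
    ("Scenario B", "def456", False),
]
print(fragile_candidates(sample))
```

A scenario that fails consistently (like the second one in the sample) is not reported, since that looks more like a real failure than a false positive; whoever is on shift would still need to review the candidates by hand.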

> * Who will take these shifts?

I was probably too implicit (mentioning only the CI team meeting), but I propose that the people taking care of it be the CI team members.

#4 Updated by bertagaz 2016-02-09 11:24:44

Assignee changed from bertagaz to intrigeri
% Done changed from 20 to 50
QA Check changed from Dev Needed to Ready for QA

Created the batch of tickets as agreed during the meeting. See this search to get them all. The shifts were defined in https://pad.riseup.net/p/A7NxgoINRsZs

#5 Updated by intrigeri 2016-02-09 18:40:46

Status changed from Confirmed to Resolved
Assignee deleted (~~intrigeri~~)
% Done changed from 50 to 100
QA Check changed from Ready for QA to Pass

I tweaked these tickets a bit; they now look all right to me :)