Bug #12259

reboot_job is broken since Jenkins was upgraded to Stretch, which often breaks the test suite

Added by anonym 2017-02-24 14:22:54 . Updated 2017-03-08 09:01:52 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Infrastructure
Target version:
Start date:
2017-02-24
Due date:
% Done:

100%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

See e.g.: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-12019-totem-add-local-video-action/29/console

[...]
19:27:23 Command failed (returned pid 4334 exit 255): ["/var/lib/jenkins/workspace/test_Tails_ISO_test-12019-totem-add-local-video-action/submodules/chutney/chutney", "start", "/var/lib/jenkins/workspace/test_Tails_ISO_test-12019-totem-add-local-video-action/features/chutney/test-network", {:err=>[:child, :out]}]:
19:27:23 Using Python 2.7.9
19:27:23
19:27:23 Starting nodes
19:27:23
19:27:23 Couldn't launch test000auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/000auth/torrc): 1
19:27:23
19:27:23 Couldn't launch test001auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/001auth/torrc): 1
19:27:23
19:27:23 Couldn't launch test002auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/002auth/torrc): 1
[... same failures for all other nodes ...]


I have seen the same failure locally: I run the test suite (and hence Chutney), press Ctrl+C (so Chutney is not cleaned up), and remove the TMPDIR (so the PID references are lost). If I then restart the test suite, the old tor instances are still listening on the expected TCP ports, and Chutney fails to start.


Subtasks


History

#1 Updated by anonym 2017-02-24 14:24:17

  • Description updated

#2 Updated by intrigeri 2017-03-05 10:46:08

  • Subject changed from Chutney sometimes fails to start on Jenkins to reboot_job is broken since Jenkins was upgraded to Stretch, which often breaks the test suite
  • Status changed from Confirmed to In Progress
  • Assignee set to intrigeri
  • Priority changed from Normal to High
  • Target version set to Tails_2.11
  • % Done changed from 0 to 10

Pushed a tentative fix + made it so I’ll get more debugging output.

#3 Updated by intrigeri 2017-03-05 10:58:25

  • % Done changed from 10 to 50

Seems to work: https://jenkins.tails.boum.org/job/wrap_test_Tails_ISO_feature-stretch/ just failed, which is exactly what should happen. And it rebooted isotester5. Will verify it’s fixed consistently later.

#4 Updated by intrigeri 2017-03-05 11:43:26

So, with anonym we did a little post-mortem to find out how we could have noticed this problem earlier. Our best idea so far is to have the Jenkins test suite wrapper:

  1. exit with a non-zero exit code if some flag file exists
  2. create the flag file
  3. run the test suite

If we had had this in place, then all test suite runs would have failed and we would have had an obvious explanation of what went wrong.
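
For illustration, the three steps above could be sketched roughly as follows. This is not the wrapper that was actually deployed: the function name run_guarded and the flag file path passed to it are hypothetical, and the sketch assumes the flag file lives somewhere that is wiped on reboot (e.g. a tmpfs), so that it only survives when the expected reboot did not happen.

```shell
# Hypothetical guard around a test suite run.
# Usage: run_guarded FLAG_FILE COMMAND [ARGS...]
# Assumption: FLAG_FILE is on storage that is cleared by a reboot.
run_guarded() {
    flag_file=$1
    shift

    # 1. Refuse to run if the flag left by a previous run still exists.
    if [ -e "$flag_file" ]; then
        echo "E: $flag_file exists: no reboot since the last run?" >&2
        return 1
    fi

    # 2. Create the flag file to mark this machine as used.
    : > "$flag_file" || return 1

    # 3. Run the test suite (whatever command was passed in)
    #    and return its exit status.
    "$@"
}
```

With this in place, a second run_guarded call on a machine that was not rebooted in between returns a non-zero status and prints the reason, instead of letting the test suite fail later for a less obvious reason.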

#5 Updated by intrigeri 2017-03-05 13:45:57

  • Priority changed from High to Elevated
  • Target version changed from Tails_2.11 to Tails_2.12

Now that the immediate problem is fixed, I’ll deal with the “how can we avoid that in the future” later.

#6 Updated by intrigeri 2017-03-05 17:43:59

intrigeri wrote:
> So, with anonym we did a little post-mortem to find out how we could have noticed this problem earlier. Our best idea so far is to have the Jenkins test suite wrapper:
>
> 1. exit with a non-zero exit code if some flag file exists
> 2. create the flag file
> 3. run the test suite
>
> If we had had this in place, then all test suite runs would have failed and we would have an obvious explanation of what went wrong.

Pushed an untested implementation to production (“what can possibly go wrong, it’s 9 simple lines of shell?” — famous last words).

#7 Updated by intrigeri 2017-03-08 09:01:52

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100