Bug #10501: Step 'the "10CC5BC7" key is in the live user's public keyring' is fragile

Added by bertagaz 2015-11-06 10:57:37 . Updated 2015-12-16 11:33:51 .

Status:

Resolved

Priority:

Normal

Assignee:

Category:

Test suite

Target version:

Tails_1.8

Start date:

2015-11-06

Due date:

% Done:

100%

Feature Branch:

kytv/test/9095-robust-seahorse

Type of work:

Code

Blueprint:

Starter:

Affected tool:

Deliverable for:

270

Description

We’ve seen this step fail occasionally in Jenkins (2 times in September, 3 times in October).

We’re not sure why it fails, so bertagaz will have to dig into the history, although it may have been erased with the 1.7 release.
Meanwhile, the retry_tor number has been bumped, which may have helped work around it.

One question that was raised and may help: currently we fetch the key and then check whether it has actually been fetched within 2 minutes.
Should we instead enforce a time limit in the fetch step itself, cancel the fetch if it takes too long, and then retry?
IIRC that was suggested for the Git tests.
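The "limit, cancel, retry" idea could look roughly like the sketch below. It uses Ruby's standard `Timeout` module; the helper name `retry_with_timeout` and its parameters are hypothetical and are not taken from the test suite.

```ruby
require 'timeout'

# Hypothetical helper (not part of the Tails test suite): run the block,
# abandon each attempt after `per_try_timeout` seconds, and retry up to
# `max_tries` times. The attempt number is passed to the block.
def retry_with_timeout(max_tries: 3, per_try_timeout: 30)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(per_try_timeout) { yield(tries) }
  rescue Timeout::Error
    # Start the begin block again unless the attempts are used up.
    retry if tries < max_tries
    raise
  end
end
```

Note that `Timeout.timeout` only interrupts the Ruby block; a child process started inside the block would still have to be killed explicitly.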

Subtasks

History

#1 Updated by bertagaz 2015-11-06 10:58:39

Deliverable for changed from 267 to 270

#2 Updated by bertagaz 2015-11-12 05:18:43

Assignee changed from bertagaz to anonym
Type of work changed from Research to Discuss

Here is some information about why it fails:

In job test_Tails_ISO_testing #35, it fails because the keyserver replied with a “Bad gateway” error:
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_testing/35/artifact/build-artifacts/03%3A16%3A02_Fetching_OpenPGP_keys_using_Seahorse_via_the_Tails_OpenPGP_Applet_should_work_and_be_done_over_Tor..png

In job test_Tails_ISO_testing #30, it fails because the keyserver replied with a “Gateway time-out” error:
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_testing/30/artifact/build-artifacts/03%3A22%3A58_Fetching_OpenPGP_keys_using_Seahorse_should_work_and_be_done_over_Tor..png

In several jobs, it fails because of a hostname resolution error for pool.sks-keyservers.net:
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_stable/21/artifact/build-artifacts/03%3A14%3A36_Fetching_OpenPGP_keys_using_Seahorse_via_the_Tails_OpenPGP_Applet_should_work_and_be_done_over_Tor..png
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_isotester1_metrics/5/artifact/build-artifacts/torified_gnupg-2015-09-28T18%3A37%3A25-07%3A00.png
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_isotester1_metrics/6/artifact/build-artifacts/torified_gnupg-2015-09-29T02%3A00%3A11-07%3A00.png

In job test_Tails_ISO_isotester1_metrics #42, it fails with the error “Cannot connect to destination”:
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_isotester1_metrics/42/artifact/build-artifacts/torified_gnupg-2015-10-10T12%3A36%3A19-07%3A00.png

In job test_Tails_ISO_feature-5926-freezable-apt-repository #7, fetching the key took longer than the 2-minute timeout:
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_feature-5926-freezable-apt-repository/7/artifact/build-artifacts/03%3A06%3A17_Fetching_OpenPGP_keys_using_Seahorse_via_the_Tails_OpenPGP_Applet_should_work_and_be_done_over_Tor..png

So some retry logic could help, since these seem to be mostly network errors, either on our side or on the SKS keyserver side.
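As a rough illustration, retry logic could first decide whether an error message looks like one of the network failures listed above. The constant and method names below are hypothetical, and treating all of these errors as retryable is an assumption. The hostname resolution failure is left out because the exact wording of that error is not quoted here.

```ruby
# Error strings quoted from the failing jobs above. Treating them all as
# transient (worth retrying) is an assumption of this sketch.
TRANSIENT_ERROR_PATTERNS = [
  /bad gateway/i,
  /gateway time-?out/i,
  /cannot connect to destination/i
].freeze

# Hypothetical helper: true if `message` matches a known transient error.
def transient_error?(message)
  TRANSIENT_ERROR_PATTERNS.any? { |pattern| pattern.match?(message) }
end
```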

Note that the Seahorse error window always shows this stop icon. Maybe that could help us detect when to retry?

Assigning to anonym (and adding kytv as watcher) so that you can organize the next steps (define the fix and update the ticket metadata).

#3 Updated by kytv 2015-11-12 10:25:54

I’m confident that my work on ~~Bug #9095~~ (& incidentally ~~Bug #9791~~) will increase general robustness of the entire features/torified_gnupg.feature feature. :)

#4 Updated by kytv 2015-11-12 10:31:49

One of the problems that I discovered is that Seahorse WILL always segfault if there’s a network error. It may not segfault until after the close button is clicked, but it will segfault. The way I decided to handle it is to always kill Seahorse during the key syncing step and restart it, since we know that it will segfault 100% of the time.

For Seahorse, I’m going to assume that if there’s a close button on the screen, that’s an error. Much of the code I added earlier has been refactored and made less convoluted, partly because I’m using what I’ve learned about Ruby since I wrote it, and partly because I understand Seahorse’s failure modes better. :)

I’m also going to kill the gpg binary if key fetching isn’t successful within 60 seconds, force a new Tor circuit, then retry.
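A minimal sketch of that kill-and-retry loop, under stated assumptions: the method name, its parameters, and the `on_failure` hook (where a new Tor circuit would be requested) are all hypothetical and are not the code from the feature branch.

```ruby
# Hypothetical sketch: run `cmd` (an array of program and arguments), kill
# it if it has not exited after `deadline` seconds, call `on_failure`
# (e.g. to force a new Tor circuit), and try again, up to `max_tries` times.
# Returns true on the first successful exit, false otherwise.
def run_with_kill_and_retry(cmd, deadline: 60, max_tries: 3, on_failure: -> {})
  max_tries.times do
    pid = Process.spawn(*cmd)
    finish_by = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline
    status = nil
    loop do
      # WNOHANG makes waitpid2 return nil while the child is still running.
      _, status = Process.waitpid2(pid, Process::WNOHANG)
      break if status
      break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= finish_by
      sleep 0.1
    end
    if status.nil?
      # Deadline reached: kill the child and reap it.
      Process.kill('KILL', pid)
      Process.waitpid(pid)
    elsif status.success?
      return true
    end
    on_failure.call
  end
  false
end
```

Polling with `WNOHANG` keeps a handle on the child, so the process itself is killed at the deadline, not just the Ruby code waiting for it.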