Bug #10495: The 'the time has synced' step is fragile

Added by anonym 2015-11-06 10:22:03 . Updated 2017-08-14 18:15:07 .

Status: In Progress
Priority: Elevated
Assignee:
Category: Test suite
Target version:
Start date: 2015-11-06
Due date:
% Done:
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

See ~~Bug #10494~~ which will fix this in Tails, not the test suite. This ticket is really just to acknowledge this robustness issue, even though nothing will be done in the test suite.

Subtasks

Related issues

Related to Tails - ~~Bug #13472~~: Replace www.centos.org in htpdate pools	Resolved	2017-07-15
Blocked by Tails - ~~Bug #10494~~: Retry htpdate when it fails	Rejected	2016-07-17
Blocked by Tails - ~~Feature #9521~~: Use the chutney Tor network simulator in our test suite	Resolved	2016-04-15
Blocked by Tails - Bug #11562: Monitor servers from the htpdate pools	Confirmed	2016-07-14
Blocks Tails - Feature #16209: Core work: Foundations Team	Confirmed

History

#1 Updated by anonym 2015-11-06 10:22:25

blocked by ~~Bug #10494~~: Retry htpdate when it fails added

#2 Updated by anonym 2015-11-06 10:22:41

blocks #8668 added

#3 Updated by anonym 2015-11-06 10:23:35

Assignee set to kytv
Target version set to Tails_1.8
Parent task set to Bug #10288
Deliverable for set to 270

#4 Updated by bertagaz 2015-11-16 04:01:23

Since Nov 6, 2015 (test_Tails_ISO_experimental #42), this step has been one of the most common failures: it broke a test job 19 times, and some of those jobs would probably have passed without it, as it was their only problem.

This should probably be worked on ASAP to complete the test suite robustness work.

#5 Updated by intrigeri 2015-12-05 08:17:15

This might be a duplicate of ~~Bug #10440~~.

#6 Updated by intrigeri 2015-12-05 13:24:32

Target version changed from Tails_1.8 to Tails_2.0

(We’re going to mark as fragile all tests that depend on Tor to have bootstrapped for the moment => not so urgent.)

#7 Updated by intrigeri 2015-12-07 07:38:29

bertagaz wrote:
> Since Nov 6, 2015 (test_Tails_ISO_experimental #42), this step has been one of the most common failures: it broke a test job 19 times, and some of those jobs would probably have passed without it, as it was their only problem.

I don’t see it at all in the latest summary I generated and posted on Bug #10288. bertagaz, can you please check?

#8 Updated by intrigeri 2015-12-19 10:29:18

It would be fine to postpone this. In any case, please prioritize your SponsorsM4 stuff (Icedove) higher.

#9 Updated by intrigeri 2016-01-06 13:58:15

Target version changed from Tails_2.0 to Tails_2.2

#10 Updated by intrigeri 2016-02-05 13:52:50

Target version changed from Tails_2.2 to Tails_2.3

intrigeri wrote:
> It would be fine to postpone this. In any case, please prioritize your SponsorsM4 stuff (Icedove) higher.

Still the case.

#11 Updated by anonym 2016-03-03 16:08:49

Assignee changed from kytv to anonym
Target version changed from Tails_2.3 to Tails_2.4
Type of work changed from Wait to Research

Hopefully Chutney (~~Feature #9521~~) will fix the tordate parts.

#12 Updated by anonym 2016-03-03 16:18:16

~~Bug #10238 is probably related, but Redmine forbids adding a relationship due to circularity.~~

#13 Updated by anonym 2016-03-03 16:18:41

blocked by ~~Feature #9521~~: Use the chutney Tor network simulator in our test suite added

#14 Updated by intrigeri 2016-05-14 13:44:13

blocked by deleted (~~#8668~~)

#15 Updated by intrigeri 2016-05-18 15:17:21

Again: is it a duplicate of ~~Bug #10440~~?

#16 Updated by intrigeri 2016-05-26 09:10:13

intrigeri wrote:
> Again: is it a duplicate of ~~Bug #10440~~?

Actually, let’s say no: ~~Bug #10440~~ is about the scenarios that are specifically testing time sync, while this one is about the ‘the time has synced’ step, that all online scenarios rely on (via “Tor is ready”). So even though ~~Bug #10440~~ is “fixed” (by disabling some tests) in test/10497-tor-bootstrap-is-fragile, we still have a problem here, and unsurprisingly I’ve seen it break tests again.

#17 Updated by anonym 2016-06-08 01:34:59

Target version changed from Tails_2.4 to Tails_2.5

#18 Updated by bertagaz 2016-07-15 04:16:14

QA Check set to Ready for QA

So, as noted on ~~Bug #10494~~, I’ll report my test suite run results here.

I reported on my first runs in ~~Bug #10494#note-25~~. As promised, I tried the --connect-timeout option, but it didn’t improve things much: this step failed 3 times in 120 runs (so a bit more than without it). Still, I think this option makes sense: waiting 2 minutes for a single request sounds like too much to me, even over Tor.

After that I found that 3 URLs in the pools were faulty, and fixed them (as stated in ~~Bug #10494#note-30~~). Since then, I’ve run something like 150 times the scenario mentioned in ~~Bug #10494#note-25~~, and seen no failure!

So to me it seems that the few errors that appeared in the previous runs were probably due to these faulty URLs. 2 of them were in the HTP_POOL_PAL pool, which may explain things, if htpdate tries 5 times for a pool before erroring out. I still see htpdate being restarted sometimes, though it seems to happen a bit less often than before.

So in the end, I think the enhancement brought by ~~Bug #10494~~ fixes this step. Actually, it may very well have been a bug in Tails. I believe this ticket can also be considered RfQA now, so setting it accordingly.
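
To illustrate why a couple of faulty URLs in a single pool could matter, here is a minimal Python sketch. It assumes, as hypothesized above, that a pool is given up on after 5 failed attempts, and additionally assumes that each attempt picks a URL at random (with replacement). The function names, the random selection and the pool sizes are illustrative assumptions, not htpdate's actual code.

```python
import random


def query_pool(pool, fetch, max_attempts=5, rng=random):
    """Try up to max_attempts randomly chosen URLs from the pool.

    Returns the first URL for which fetch(url) is truthy, or None if
    every attempt failed. (Hypothetical model of the retry behaviour.)
    """
    for _ in range(max_attempts):
        url = rng.choice(pool)
        if fetch(url):
            return url
    return None


def failure_probability(pool_size, faulty, max_attempts=5):
    """Chance that every attempt lands on a faulty URL.

    Assumes independent picks with replacement, which may not match
    what htpdate really does.
    """
    return (faulty / pool_size) ** max_attempts
```

Under these assumptions, a hypothetical pool of 4 URLs with 2 faulty ones would fail as a whole with probability `failure_probability(4, 2)`, i.e. 0.03125. The real pool sizes are not stated in this ticket.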

#19 Updated by bertagaz 2016-07-15 08:58:30

Assignee changed from anonym to bertagaz
QA Check changed from Ready for QA to Dev Needed

bertagaz wrote:
> Since then, I’ve run something like 150 times the scenario mentioned in ~~Bug #10494#note-25~~, and seen no failure!

And it seems I needed to post this note to see one. :/

So this step is not entirely fixed, and I noticed the failure too late to find out why it happened. When I inspected the htpdate logs, it claimed to have succeeded. So this could be due to Tor bootstrapping problems, maybe.

I’ll do more tests and raise the try_for time to see if I still get failures.
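
For readers unfamiliar with this kind of helper, the idea behind raising the try_for time can be sketched as follows. This is a Python illustration of a generic poll-until-timeout helper, not the test suite's implementation: the signature, the `delay` parameter, the exception name and the injectable clock are all assumptions made for the example.

```python
import time


class TimeoutReached(Exception):
    """Raised when the condition did not become true in time."""


def try_for(timeout, check, delay=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns a truthy value or timeout seconds pass.

    clock and sleep are injectable so the helper can be exercised
    without actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            raise TimeoutReached(f"condition not met within {timeout} seconds")
        sleep(delay)
```

With such a helper, a larger timeout only helps if the condition eventually becomes true. If the step fails for another reason (for example a Tor bootstrapping problem), the failure just shows up later.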

#20 Updated by intrigeri 2016-07-15 09:35:26

Hold on, see my latest comments on ~~Bug #10494~~. IMO we should do something simpler and less risky first before we invest even more time here.

#21 Updated by intrigeri 2016-07-18 07:22:26

Target version changed from Tails_2.5 to Tails_2.6
Deliverable for changed from 270 to SponsorS_Internal

#23 Updated by intrigeri 2016-08-18 07:30:58

Target version deleted (~~Tails_2.6~~)

#24 Updated by intrigeri 2016-08-18 07:35:53

Assignee deleted (~~bertagaz~~)
Deliverable for deleted (~~SponsorS_Internal~~)

#25 Updated by bertagaz 2017-07-12 11:19:41

Priority changed from Normal to Elevated

Happened 60 times on all currently known branches in June, 104 times in total for what 2017 logs we have.

That’s a lot, so raising priority. First step would probably be to check if HTTP servers used by htpdate are OK, then tackle ~~Bug #10494~~.

#26 Updated by intrigeri 2017-07-13 18:36:35

blocked by Bug #11562: Monitor servers from the htpdate pools added

#27 Updated by intrigeri 2017-07-13 18:38:17

> Happened 60 times on all currently known branches in June, 104 times in total for what 2017 logs we have.

Ouch.

> That’s a lot, so raising priority. First step would probably be to check if HTTP servers used by htpdate are OK, then tackle ~~Bug #10494~~.

IMO next step is Bug #10495 (on your plate): there might be issues in our current HTP pool, and there’s some hope that fixing them will avoid having to do ~~Bug #10494~~ at all.

#28 Updated by bertagaz 2017-07-15 14:41:16

intrigeri wrote:
> > Happened 60 times on all currently known branches in June, 104 times in total for what 2017 logs we have.
>
> Ouch.
>
> > That’s a lot, so raising priority. First step would probably be to check if HTTP servers used by htpdate are OK, then tackle ~~Bug #10494~~.
>
> IMO next step is Bug #10495 (on your plate): there might be issues in our current HTP pool, and there’s some hope that fixing them will avoid having to do ~~Bug #10494~~ at all.

To fix this sooner than when we have monitoring in place, I quickly checked the URLs of the different HTP pools, and found out that www.centos.org always fails to reply to curl (with the same command line as the one used by htpdate).

So I’ll open a ticket and prepare a branch replacing www.centos.org with something like https://getfedora.org/, which seems to be reliable.
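
A quick check of this kind could be scripted along the following lines. This is a minimal Python sketch under stated assumptions: the curl options shown are illustrative and are not the command line used by htpdate, and the pool names and URLs passed in are whatever the caller provides (nothing here is taken from the real pool configuration).

```python
import subprocess


def curl_ok(url, connect_timeout=30):
    """Return True if curl gets a reply to a HEAD request for url.

    The options are an illustrative choice, not htpdate's command line.
    """
    cmd = ["curl", "--silent", "--head",
           "--connect-timeout", str(connect_timeout), url]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def failing_urls(pools, is_ok=curl_ok):
    """Map each pool name to the list of its URLs that did not reply.

    Pools without any failing URL are left out of the report.
    """
    report = {}
    for name, urls in pools.items():
        bad = [url for url in urls if not is_ok(url)]
        if bad:
            report[name] = bad
    return report
```

The `is_ok` parameter makes it possible to swap in another checker, or a fake one for testing, without touching the network.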

#29 Updated by bertagaz 2017-07-15 14:44:07

related to ~~Bug #13472~~: Replace www.centos.org in htpdate pools added

#30 Updated by intrigeri 2017-07-16 11:58:32

> To fix this sooner than when we have monitoring in place, I quickly checked the URLs of the different HTP pools, and found out that www.centos.org always fails to reply to curl (with the same command line as the one used by htpdate).

Amazing!

#31 Updated by bertagaz 2017-08-14 18:15:07

Status changed from Confirmed to In Progress

Applied in changeset commit:aeb903a6fe8422c3beb677110c56f04b28c6108b.

#32 Updated by intrigeri 2019-03-08 15:36:50

blocks ~~Feature #13241~~: Core work: Test suite maintenance added

#33 Updated by intrigeri 2019-03-20 14:48:22

blocks Feature #16209: Core work: Foundations Team added

#34 Updated by intrigeri 2019-03-20 14:48:37

blocked by deleted (~~Feature #13241: Core work: Test suite maintenance~~)