Bug #10444

Git tests are fragile

Added by kytv 2015-10-28 08:42:55 . Updated 2016-08-02 09:29:47 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Test suite
Target version:
Start date:
2015-10-28
Due date:
% Done:

100%

Feature Branch:
test/10444-git-tests-are-fragile
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:
270

Description

When a git clone fails in the test suite, we should force a new Tor circuit and retry the clone.


Files


Subtasks


Related issues

Related to Tails - Bug #11563: Git over HTTPS scenario is fragile Resolved 2016-07-14

History

#1 Updated by kytv 2015-10-28 08:43:37

  • Parent task set to Bug #10288
  • Feature Branch set to kytv/test/10444-git-over-https-is-fragile

#2 Updated by intrigeri 2015-10-31 05:51:07

  • blocks #8668 added

#3 Updated by intrigeri 2015-11-06 06:08:56

  • Assignee set to kytv
  • Deliverable for set to 270

#4 Updated by kytv 2015-11-06 11:26:38

  • Subject changed from Cloning a git repository over HTTPS test is fragile to Git tests are fragile
  • Target version set to Tails_1.8

The whole feature should use retrying upon failure.

#5 Updated by kytv 2015-11-06 12:41:35

  • Feature Branch changed from kytv/test/10444-git-over-https-is-fragile to kytv/test/10444-git-tests-are-fragile

#6 Updated by kytv 2015-11-07 05:56:59

  • Description updated
  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

#7 Updated by kytv 2015-11-07 06:47:44

Attaching example of a failed clone via SSH.

#9 Updated by kytv 2015-11-07 14:37:53

  • Assignee changed from kytv to anonym
  • % Done changed from 10 to 40
  • QA Check set to Ready for QA

#10 Updated by intrigeri 2015-12-05 13:23:13

  • Assignee changed from anonym to intrigeri

I’ll try to review this to lighten a bit anonym’s plate and keep kytv moving :)

#11 Updated by bertagaz 2015-12-05 13:46:41

  • Assignee changed from intrigeri to bertagaz

intrigeri wrote:
> I’ll try to review this to lighten a bit anonym’s plate and keep kytv moving :)

Nop, I am.

#12 Updated by bertagaz 2015-12-15 03:26:18

  • Target version changed from Tails_1.8 to Tails_2.0

Postponing

#13 Updated by kytv 2015-12-16 13:23:30

I merged devel in to this branch, made a simple change for jessie, and pushed.

#14 Updated by kytv 2015-12-16 13:26:45

  • Assignee changed from bertagaz to kytv
  • QA Check changed from Ready for QA to Dev Needed

Nevermind, still more to update for Jessie.

#15 Updated by kytv 2015-12-16 15:44:02

  • Assignee changed from kytv to bertagaz
  • QA Check changed from Dev Needed to Ready for QA

This is now working with Jessie.

Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we’re ready for it, such as typing ‘yes’ when verifying the fingerprint.

I don’t have a problem after setting the TailsToaster VM to use multiple cores/CPUs.

#16 Updated by kytv 2015-12-17 01:34:26

kytv wrote:

> Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we’re ready for it, such as typing ‘yes’ when verifying the fingerprint.
>
> I don’t have a problem after setting the TailsToaster VM to use multiple cores/CPUs.

Note, further, that this could not be reproduced on a system without additional load.

#17 Updated by intrigeri 2015-12-17 05:46:14

>> Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we’re ready for it, such as typing ‘yes’ when verifying the fingerprint.

This seems to indicate that there’s a missing step in the test, no?

#18 Updated by kytv 2015-12-17 15:32:26

intrigeri wrote:
> >> Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we’re ready for it, such as typing ‘yes’ when verifying the fingerprint.
>
> This seems to indicate that there’s a missing step in the test, no?

What I saw (and showed anonym) was that in response to the “Are you sure you want to continue connecting” prompt, sikuli is supposed to type “yes” then press enter. Instead we got an “s”. After that, “please type ‘yes’ or ‘no’” Then there was a steady stream of y:s. (I don’t have the video anymore, but maybe I can reproduce it if wanted)

We’re already waiting for the question to be asked before typing our response; that step is there. Otherwise, we could introduce another static sleep after the “Are you sure you want to continue connecting” prompt.

#19 Updated by intrigeri 2015-12-18 03:06:37

> What I saw (and showed anonym) was that in response to the “Are you sure you want to
> continue connecting” prompt, sikuli is supposed to type “yes” then press enter.
> Instead we got an “s”. After that, “please type ‘yes’ or ‘no’” Then there was
> a steady stream of y:s. (I don’t have the video anymore, but maybe I can reproduce
> it if wanted)

> We’re already waiting for the question to be asked before typing our response; that
> step is there. Otherwise, we could introduce another static sleep after the “Are
> you sure you want to continue connecting” prompt.

I see. Most programs are racy in this respect, I think: they print the prompt and then setup whatever is needed to read the answer. Perhaps we indeed need to wait a bit on our side to make this really robust in practice, then? I don’t think it’s blocking reviewing/testing/merging the current state of your work (iterations!) but please make sure the resulting state of the test suite is robust (including on Jenkins, modulo lack of robustness in dependencies such as Tor bootstrapping process) before this ticket is marked as resolved. Cheers!

#20 Updated by kytv 2015-12-20 10:08:55

I pushed updates which should make it incredibly robust. I’m quite pleased with the end result. :)

#21 Updated by bertagaz 2016-01-27 10:49:47

  • Target version changed from Tails_2.0 to Tails_2.2

#22 Updated by bertagaz 2016-03-10 18:51:05

  • Target version changed from Tails_2.2 to Tails_2.3

#23 Updated by bertagaz 2016-04-15 08:04:36

  • Target version changed from Tails_2.3 to Tails_2.4

#24 Updated by bertagaz 2016-06-07 12:31:34

  • Target version changed from Tails_2.4 to Tails_2.5

#25 Updated by bertagaz 2016-06-30 03:44:40

  • Feature Branch changed from kytv/test/10444-git-tests-are-fragile to test/10444-git-tests-are-fragile

Updated the branch with origin/stable and imported Kytv’s patch. Let see how it behaves in Jenkins while I test it $HOME.

#26 Updated by bertagaz 2016-07-02 03:05:16

  • Assignee changed from bertagaz to anonym
  • % Done changed from 40 to 50

Pushed a few commits that make this tests more robust.

I’ve tested it a lot at home, with a uplink going up and down all day long, and the Git and SSH protocol tests went well and recovered from this network failures. The HTTPS test has proven to be a bit less reliable, but seems to be due to the way Git handles HTTP, or the way it interacts with tsocks. When network is ok, it passes.

In Jenkins they all pass without problem, so I’m not sure we can harden this scenarios more than that.

If the HTTPS scenario seems to be too fragile upon review, I’d be to open another ticket, and mark it as fragile with this one, that way we can merge this branch and still have the other scenarios back in the test suite.

Assigning to anonym for review, but I’m not sure he’s the right one. Should be a quick one anyway.

#27 Updated by bertagaz 2016-07-11 09:32:43

Note that this ticket branch has been merged into bugfix/10494-retry-htpdate-with-more-fragile-tests, so it has also been tested in Jenkins in this other branch.

#28 Updated by intrigeri 2016-07-13 13:27:41

  • Assignee changed from anonym to bertagaz
  • % Done changed from 50 to 60
  • QA Check changed from Ready for QA to Dev Needed

> If the HTTPS scenario seems to be too fragile upon review, I’d be to open another ticket, and mark it as fragile with this one, that way we can merge this branch and still have the other scenarios back in the test suite.

Yes, please do so. Note that we have a test/10444-git-over-https-is-fragile branch.

I’m very much unconvinced by the wording change brought by commit:5111b1c, since it hides the fact that we’re actually running a Git clone command: instead it suggests that we’re merely waiting for it to complete. This is a blocker IMO.

My next to comments are about style, so either you want them and you apply them as-is, or we try to have a useful discussion about it, or you can just ignore them.

You don’t need to test if a directory exists before rm -rf’ing it. Just rm -rf it and the code will be leaner and will feel less prone to race conditions :)

Similarly, in:

    if $vm.has_process?("git")
      step 'I kill the process "git"'
    end

… I think you can skip the test (that is racy anyway, which is what confused me initially, so better drop it).

#29 Updated by bertagaz 2016-07-14 04:34:20

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> > If the HTTPS scenario seems to be too fragile upon review, I’d be to open another ticket, and mark it as fragile with this one, that way we can merge this branch and still have the other scenarios back in the test suite.
>
> Yes, please do so. Note that we have a test/10444-git-over-https-is-fragile branch.

Bug #11563

> I’m very much unconvinced by the wording change brought by commit:5111b1c, since it hides the fact that we’re actually running a Git clone command: instead it suggests that we’re merely waiting for it to complete. This is a blocker IMO.

Ok, I guess I could argue, but that change has not that much importance. :)

Reverted, as well as commit:d834073

> You don’t need to test if a directory exists before rm -rf’ing it. Just rm -rf it and the code will be leaner and will feel less prone to race conditions :)
>
> Similarly, in:
>
> … I think you can skip the test (that is racy anyway, which is what confused me initially, so better drop it).

Right, racy. That’s older code I didn’t touch. It has been written this way in other places (as you pointed in other tickets), that’s why I thought it was the way to go.

Anyway, commit:746971d

#30 Updated by intrigeri 2016-07-14 17:28:24

  • Assignee changed from intrigeri to anonym

> That’s older code I didn’t touch.

It might be older, but I was checking the diff between current stable and the proposed topic branch. If I’m not mistaken, the code that I was commenting was introduced by the branch you’ve proposed for merging.

> Anyway, commit:746971d

Thanks for all the changes! Code review now passes for me, so I’m reassigning to anonym for the last steps :)

#31 Updated by intrigeri 2016-07-21 06:01:50

  • Assignee changed from anonym to intrigeri

#32 Updated by intrigeri 2016-07-21 08:53:22

  • related to Bug #11563: Git over HTTPS scenario is fragile added

#33 Updated by intrigeri 2016-07-21 08:54:59

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 60 to 100
  • QA Check changed from Ready for QA to Pass

Merged, thanks!

#34 Updated by intrigeri 2016-08-02 09:29:47

  • Status changed from Fix committed to Resolved