Bug #15799

"Additional software packages are installed even without network" test always fail in my environment

Added by intrigeri 2018-08-17 06:53:38 . Updated 2018-09-05 16:21:29 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Test suite
Target version:
Start date:
2018-08-17
Due date:
% Done:

100%

Feature Branch:
test/13541-more-robust-tor-bootstrap+force-all-tests
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Additional Software Packages
Deliverable for:
299

Description

Hi! I’ve briefly mentioned this a few times already but never managed to file a ticket, so here we go. I’m aware that this test works fine on your machine and on lizard, but the fact it always fails in my environment (a replica of our Jenkins setup modulo I have a poor Internet connection), which otherwise runs the rest of the test suite in a more reliable manner than lizard, may be an indicator of a real problem, that could potentially affect more developers and not just me.

01:59:04   Scenario: Additional software packages are installed even without network                                                                       # features/additional_software_packages.feature:8
02:06:13     Given I have started Tails without network from a USB drive with a persistent partition enabled and logged in with an administration password # features/step_definitions/snapshots.rb:186
02:06:13     And the network is plugged                                                                                                                    # features/step_definitions/common_steps.rb:79
02:06:13     And Tor is ready                                                                                                                              # features/step_definitions/common_steps.rb:347
02:06:13       Tor failed to bootstrap (TorBootstrapFailure)
02:06:13       ./features/support/helpers/misc_helpers.rb:170:in `rescue in wait_until_tor_is_working'
02:06:13       ./features/support/helpers/misc_helpers.rb:164:in `/^Tor has built a circuit$/'
02:06:13       features/additional_software_packages.feature:11:in `And Tor is ready'

I’m attaching all the debug artifacts Jenkins saved for this scenario when I’ve run the full test suite on the 3.9~rc1 tentative ISO.

Potentially fishy things:

  • tor-has-bootstrapped returns 1 for 4 seconds and then returns 3 forever.
  • Logs begin at Fri 2018-08-17 00:50:17 UTC, end at Fri 2018-08-17 02:04:12 UTC means — I guess — that the oldest snapshot in the tree used here is already more than 1h old when we run this scenario; I wonder if ordering matters and if it’s just bad luck if this scenario fails and perhaps another similar one that would run at the same (relative) time would expose the same problem; I’ll try running only this scenario in the same environment, and perhaps running it at a different (relative) time
  • In the logs I see: Unexpected valid-until: [2018-08-17 02:02:00] is not [2018-08-17 02:01:40 + 3h]. Could this possibly make tor fail to bootstrap?

What other info should I provide to help you debug this? E.g. I could apply patches you give me to get more debug output, this sort of things.


Files


Subtasks


Related issues

Related to Tails - Feature #14572: Implement automated tests for Additional Software/Offline Mode Resolved 2017-08-30 2018-01-15
Related to Tails - Feature #14596: Write automated tests for Additional Software GUI Resolved 2017-09-04 2018-03-06
Related to Tails - Bug #13541: Tor still sometimes fails to bootstrap in the test suite In Progress 2017-07-31
Blocks Tails - Feature #14597: Review automated tests for Additional Software GUI Rejected 2017-09-04 2018-04-01
Blocks Tails - Feature #13241: Core work: Test suite maintenance Rejected 2017-06-29

History

#1 Updated by intrigeri 2018-08-17 06:54:10

  • related to Feature #14572: Implement automated tests for Additional Software/Offline Mode added

#2 Updated by intrigeri 2018-08-17 06:54:15

  • related to Feature #14596: Write automated tests for Additional Software GUI added

#3 Updated by intrigeri 2018-08-17 07:35:22

intrigeri wrote:
> * In the logs I see: Unexpected valid-until: [2018-08-17 02:02:00] is not [2018-08-17 02:01:40 + 3h]. Could this possibly make tor fail to bootstrap?

I suspect that’s caused by V3AuthVotingInterval, for which bertagaz proposed a workaround (see Bug #13541#note-16). I’ll try it and we’ll see.

#4 Updated by intrigeri 2018-08-17 13:41:03

intrigeri wrote:
> * Logs begin at Fri 2018-08-17 00:50:17 UTC, end at Fri 2018-08-17 02:04:12 UTC means — I guess — that the oldest snapshot in the tree used here is already more than 1h old when we run this scenario; I wonder if ordering matters and if it’s just bad luck if this scenario fails and perhaps another similar one that would run at the same (relative) time would expose the same problem; I’ll try running only this scenario in the same environment, and perhaps running it at a different (relative) time

Indeed, if I run this scenario 1st it passes.

#5 Updated by intrigeri 2018-08-17 14:05:42

  • related to Bug #13541: Tor still sometimes fails to bootstrap in the test suite added

#6 Updated by intrigeri 2018-08-18 11:06:59

  • Assignee changed from bertagaz to intrigeri
  • QA Check deleted (Info Needed)
  • Feature Branch set to test/13541-more-robust-tor-bootstrap+force-all-tests

#7 Updated by intrigeri 2018-08-18 11:08:12

  • blocks Feature #14597: Review automated tests for Additional Software GUI added

#8 Updated by intrigeri 2018-08-18 11:17:21

  • Status changed from Confirmed to In Progress
  • Assignee changed from intrigeri to CyrilBrulebois
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA

intrigeri wrote:
> intrigeri wrote:
> > * In the logs I see: Unexpected valid-until: [2018-08-17 02:02:00] is not [2018-08-17 02:01:40 + 3h]. Could this possibly make tor fail to bootstrap?
>
> I suspect that’s caused by V3AuthVotingInterval, for which bertagaz proposed a workaround (see Bug #13541#note-16). I’ll try it and we’ll see.

Gotcha! With that workaround applied, this scenario passes as part of a full test suite run, in the very environment where I’ve never seen it past before. I’ve run the full test suite on this branch both on Tails’ Jenkins and mine and could not spot any regression.

kibi, can you please review this? The only change is in the chutney submodule. You don’t need to learn everything about how the Tor network does, in a nutshell Tor clients select circuits using info they find in the “consensus”, which is built and voted on by a number of “directory authorities”. By default they would vote every hour but in the small, neat Tor network our test suite uses (run by Chutney) it was 10s, causing trouble as seen in the log I’m quoting above. The corresponding doc in torrc(5) is:

       V3AuthVotingInterval N minutes|hours
           V3 authoritative directories only. Configures the server’s preferred
           voting interval. Note that voting will actually happen at an interval
           chosen by consensus from all the authorities' preferred intervals. This
           time SHOULD divide evenly into a day. (Default: 1 hour)

If this change does not break test cases that used to pass so far in your environment, and you don’t spot any backdoor in my commit, I think we’re good to go :)

#9 Updated by intrigeri 2018-08-18 11:17:56

  • Type of work changed from Research to Code

#10 Updated by intrigeri 2018-08-19 13:52:44

#11 Updated by CyrilBrulebois 2018-08-19 14:15:39

Looking at the various related parameters, I initially wondered about maybe adjusting V3AuthVoteDelay accordingly, but that doesn’t seem to be needed: votes have to be quick (which they seem to be), we’re just asking to vote less often.

So that looks good to me, even if I can’t really run the test suite reliably as of now; I’ll try to finish setting up my machine(s) shortly.

#12 Updated by intrigeri 2018-08-19 14:53:03

> So that looks good to me, even if I can’t really run the test suite reliably as of now; I’ll try to finish setting up my machine(s) shortly.

OK! I’m not sure we should block on that and I’m pondering merging anyway.

#13 Updated by CyrilBrulebois 2018-09-03 23:26:19

  • Assignee changed from CyrilBrulebois to intrigeri
  • QA Check changed from Ready for QA to Pass

I’ve had a failure here:

    Step failed while creating checkpoint: a Tails persistence partition exists on USB drive "__internal"

(that I think I’ve seen on another machine, and that needs to be investigated separately)

but anything related to the Tor machinery looks great. So I’d go for a merge.

#14 Updated by intrigeri 2018-09-04 06:30:01

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100

Thanks, merged.

#15 Updated by intrigeri 2018-09-05 16:21:29

  • Status changed from Fix committed to Resolved