Feature #9521

Use the chutney Tor network simulator in our test suite

Added by anonym 2015-06-02 14:23:03 . Updated 2016-06-26 10:44:17 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Test suite
Target version:
Start date:
2016-04-15
Due date:
% Done:

100%

Feature Branch:
test/9521-chutney
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:
270

Description

See parent ticket (Feature #9519) for the rationale.

We may want to use chutney to simulate the Tor network for increased determinism in our test suite.

The Tor network is a large part of the test suite’s indeterminism, both from transient network issues when communicating with the Tor network (or internal issues, e.g. bad circuits), and from the chosen exit node being blocked. Simulating the Tor network on the testing host would eliminate all such issues.

There is, however, a potential for making the testing host blacklisted/blocked if services identify the repeated connections as spam (example: irc.oftc.net blocks lizard, potentially because of the multiple IRC connections made per hour). If that is a problem, Feature #9520 would solve that.


Subtasks


Related issues

Related to Tails - Bug #9478: How to deal with transient network errors in the test suite? Resolved 2015-05-27
Related to Tails - Feature #9520: Investigate the Shadow network simulator for use in our test suite Rejected 2015-06-02
Related to Tails - Feature #11356: Add Chutney to our isotesters Rejected 2016-04-15
Blocks Tails - Bug #10442: Totem "Watching a WebM video over HTTPS" test never passes on Jenkins Resolved 2015-10-28
Blocks Tails - Bug #10381: The "I open the address" steps are fragile Resolved 2015-10-15
Blocks Tails - Feature #10379: Check that we do not see any error pages in the "I open the address" step. Rejected 2015-10-15
Blocks Tails - Bug #10376: The "the Tor Browser loads the (startup page|Tails roadmap)" step is fragile Resolved 2015-10-15
Blocks Tails - Bug #10497: wait_until_tor_is_working helper is fragile Resolved 2015-11-06
Blocks Tails - Bug #10495: The 'the time has synced' step is fragile In Progress 2015-11-06
Blocks Tails - Feature #11351: Upgrade to Tor 0.2.8 Resolved 2016-03-26
Blocks Tails - Bug #9654: "IPv4 TCP non-Tor Internet hosts were contacted" during the test suite Resolved 2015-06-29
Blocks Tails - Bug #10440: Time syncing scenarios are fragile Resolved 2015-10-28

History

#1 Updated by anonym 2015-06-02 14:23:29

  • related to Bug #9478: How to deal with transient network errors in the test suite? added

#2 Updated by anonym 2015-06-02 14:25:23

  • related to Feature #9520: Investigate the Shadow network simulator for use in our test suite added

#3 Updated by anonym 2015-06-02 14:25:34

Mostly copy-pasted from Bug #9478:

chutney doesn’t look as polished as Shadow (the first line of the README isn’t very encouraging: “This is chutney. It doesn’t do much so far. It isn’t ready for prime-time.”). It is, however, dirt simple to set up, which is nice:

git clone https://git.torproject.org/chutney.git
cd chutney
./chutney configure networks/basic
./chutney start networks/basic


And that’s literally it for this simple setup (I guess we want a slightly bigger network, plus bridges, which seems doable even if we have to make some templates for the latter ourselves). :) I could use the two clients that networks/basic defines, and the traffic would exit from my computer as expected, which is what we want for the mid-term goal. I suspect chutney will be trivial to package for Debian, as it only depends on python (2.7+). Yay!

Barring any issues due to its supposed immaturity, it seems chutney would indeed work well for the mid-term goal. As for the long-term goal, chutney clearly isn’t designed to “simulate the Internet” like Shadow is. I don’t know how much that matters, though. It seems fairly easy to set up a virtual network where the services we need would run directly on the testing host, and the exits would reach them. However, I wonder how it’ll work if that network is using the private IP space. Relays/exits will work fine once we set ExitPolicyRejectPrivate 0, so that’s fine, but the Tails client will go ballistic if it resolves a domain to a private address, right (that’s a feature of Tor, IIRC)? And perhaps the differences in how resolving works in SOCKS4 vs 5 will cause problems (just brainstorming). Perhaps we can pick some random non-private IP range and use it in the local network, and play with the testing host’s routing table, to work around this? Network namespaces can probably be useful.
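As a rough illustration of the private-address concern (a sketch, not code from the test suite): Python's ipaddress module can tell which candidate ranges count as private. Its definition is only an approximation of whatever Tor itself rejects, and the addresses below are arbitrary examples.

```python
import ipaddress


def is_private(addr):
    """Return True if Python's ipaddress module classifies addr as private.

    Assumption: this only approximates Tor's own notion of an
    "internal" address; it is meant as a quick sanity check.
    """
    return ipaddress.ip_address(addr).is_private


# Arbitrary example addresses: three from RFC 1918 ranges, one public.
for addr in ("10.2.2.1", "192.168.0.10", "172.16.5.4", "8.8.8.8"):
    print(addr, "private" if is_private(addr) else "public")
```

A check like this could help when picking a non-private range for the local network, but what matters in the end is how the Tor client in Tails reacts.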

It should be noted that I have no idea if Shadow actually solves this better given our requirements. Also, this presumed advantage of Shadow should be weighed against the advantages of chutney. Hm. Not having to patch the Tor client + super easy setup sounds really compelling, even if we have to do some custom tricks for the long-term “simulate the Internet” goal. IMHO, chutney actually looks like the better option.

#4 Updated by anonym 2015-07-16 13:10:23

After having had a deeper look at how we could use chutney in the Tails automated test suite I’ve hit a blocker. Obviously the chutney Tor network would run on the testing host, and the testing guest would access it over the virtual LAN. However, chutney hardcodes 127.0.0.1 as the destination in e.g. the authority certificates, and Tor will not like this discrepancy. In short, chutney is designed to run the whole simulation, including all clients, on the same host, using loopback. Fixing this should be pretty simple, e.g. by adding an env var CHUTNEY_LISTEN_ADDRESS that we could export as 10.2.2.1 or whatever address the virtual host interface has. Hopefully upstream will accept such a patch.

#5 Updated by anonym 2015-07-16 14:20:05

Minimal patch that fixes the above:

--- a/lib/chutney/TorNet.py
+++ b/lib/chutney/TorNet.py
@@ -672,7 +672,7 @@ DEFAULTS = {
     'tor': os.environ.get('CHUTNEY_TOR', 'tor'),
     'tor-gencert': os.environ.get('CHUTNEY_TOR_GENCERT', None),
     'auth_cert_lifetime': 12,
-    'ip': '127.0.0.1',
+    'ip': os.environ.get('CHUTNEY_LISTEN_ADDRESS', '127.0.0.1'),
     'ipv6_addr': None,
     'dirserver_flags': 'no-v2',
     'chutney_dir': '.',


Of course, it may break other stuff where 127.0.0.1 is assumed; I really didn’t look deeply, especially since this worked for my purposes: after adding the appropriate torrc lines (TestingTorNetwork 1, a suitable DirAuthority line for each simulated authority, etc.; looking at a simulated client’s generated torrc should give a good idea of what to do) to a Tails session, it worked just fine. Yay!

Unfortunately bootstrap (and re-bootstrap from restarting Tor, which we still do every time we restore from a snapshot) isn’t much faster than when using the real network. It’s still 10-15 seconds in general. That’s a shame as I was hoping that using chutney would more or less eliminate that waiting time, shortening a full test suite run by something like NUMBER_OF_SCENARIOS_USING_TOR * 10 seconds. Given that NUMBER_OF_SCENARIOS_USING_TOR currently is something like 100, that means 1000 seconds, or a bit more than 15 minutes. Oh well.
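For reference, the back-of-the-envelope estimate above can be written out as follows; both figures are the rough guesses from the comment, not measurements.

```python
# Rough figures from the comment above, not measured values.
NUMBER_OF_SCENARIOS_USING_TOR = 100
SECONDS_SAVED_PER_SCENARIO = 10

total_seconds = NUMBER_OF_SCENARIOS_USING_TOR * SECONDS_SAVED_PER_SCENARIO
total_minutes = round(total_seconds / 60, 1)
print(total_seconds, "seconds =", total_minutes, "minutes")
```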

#6 Updated by intrigeri 2015-07-16 16:42:42

> Unfortunately bootstrap (and re-bootstrap from restarting Tor, which we still do every time we restore from a snapshot) isn’t much faster than when using the real network. It’s still 10-15 seconds in general.

I’m under the impression that it takes longer than 10-15 seconds on isotester1.lizard and in the settings where I run the test suite locally most often, but I didn’t measure it. I would measure it if there was some trivial way for me to do so, in case it matters (your call :)

#7 Updated by anonym 2015-07-18 05:55:19

intrigeri wrote:
> I’m under the impression that it takes longer than 10-15 seconds on isotester1.lizard and in the settings where I run the test suite locally most often, but I didn’t measure it. I would measure it if there was some trivial way for me to do so, in case it matters (your call :)

You can try this:

--- a/features/support/helpers/misc_helpers.rb
+++ b/features/support/helpers/misc_helpers.rb
@@ -69,6 +69,11 @@ end
 def wait_until_tor_is_working
   try_for(270) { @vm.execute(
     '. /usr/local/lib/tails-shell-library/tor.sh; tor_is_working').success? }
+  tor_log_lines = @vm.file_content("/var/log/tor/log").split("\n")
+  tor_start = DateTime.parse(tor_log_lines.first)
+  tor_done = DateTime.parse(tor_log_lines.grep(/Bootstrapped 100%: Done/).first)
+  diff = tor_done.to_time - tor_start.to_time
+  STDERR.puts "XXX: Tor bootstrap time (seconds): #{diff}"
 end

 def convert_bytes_mod(unit)
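The same measurement can be sketched outside the test suite, e.g. against a copy of /var/log/tor/log. This is a hedged Python equivalent of the Ruby snippet above, not part of the branch: the function names are made up, the sample log is invented, and since Tor log timestamps carry no year, an arbitrary year is assumed for parsing.

```python
from datetime import datetime

ASSUMED_YEAR = "2000"  # Tor log lines have no year; any leap year works here


def parse_tor_timestamp(line):
    """Parse the leading 'May 12 07:35:20.159' part of a Tor log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(ASSUMED_YEAR + " " + stamp, "%Y %b %d %H:%M:%S.%f")


def bootstrap_seconds(log_text):
    """Seconds between the first log line and 'Bootstrapped 100%: Done'."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    start = parse_tor_timestamp(lines[0])
    done_line = next(line for line in lines if "Bootstrapped 100%: Done" in line)
    return (parse_tor_timestamp(done_line) - start).total_seconds()


# Invented sample log, only for illustration.
sample_log = """\
May 12 07:35:20.159 [notice] Tor starting
May 12 07:35:33.500 [notice] Bootstrapped 100%: Done
"""
print("Tor bootstrap time (seconds):", bootstrap_seconds(sample_log))
```

Like the Ruby version, this takes the first "Bootstrapped 100%: Done" line, so it measures the first bootstrap recorded in the log and ignores later re-bootstraps.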

#10 Updated by intrigeri 2015-07-18 07:33:57

> You can try this:

Done!

  • on isotester1.lizard (features/torified_browsing.feature run twice, so only two full bootstraps, and the rest is re-bootstraps):
    • bootstrap times: 13, 13, 27, 75, 203, 12, 141, 27, 12, 21, 12, 27, 13, 13, 139, 139, 13, 140, 139, 12
    • mean bootstrap time: 59.55
    • median bootstrap time: 24.0
  • in my usual local testing environment (all tests up to, and including, electrum.feature, so a few full bootstraps + a few re-bootstraps):
    • bootstrap times: 22, 13, 14, 12, 26, 22, 20, 19, 23, 21, 21, 21, 22, 104, 24, 95
    • mean bootstrap time: 29.94
    • median bootstrap time: 21.5

=> so it seems that going down to 10-15 seconds would be a performance improvement in these settings, especially since (I guess) it would remove outliers that make the mean higher than the median. I’m not saying that this, in itself, is worth going the chutney way, but we have other reasons to investigate it anyway :)
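For anyone who wants to re-check the figures, the mean and median can be recomputed from the raw numbers with Python's statistics module (a small sketch; the variable names are arbitrary):

```python
from statistics import mean, median

# Bootstrap times (seconds) as reported above.
isotester = [13, 13, 27, 75, 203, 12, 141, 27, 12, 21,
             12, 27, 13, 13, 139, 139, 13, 140, 139, 12]
local = [22, 13, 14, 12, 26, 22, 20, 19, 23, 21, 21, 21, 22, 104, 24, 95]

for name, times in (("isotester1.lizard", isotester), ("local", local)):
    print(name, "mean:", round(mean(times), 2), "median:", median(times))
```

The gap between mean and median in both data sets comes from the handful of runs above 90 seconds.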

#11 Updated by anonym 2015-09-02 02:13:29

This may be a good read in the future: https://trac.torproject.org/projects/tor/wiki/doc/TorChutneyGuide

#12 Updated by intrigeri 2015-09-27 08:59:19

  • Target version set to 2016

(As added by anonym post-summit.)

#13 Updated by anonym 2015-09-27 09:09:54

  • Description updated

#15 Updated by anonym 2016-02-20 14:57:17

  • blocks Bug #10442: Totem "Watching a WebM video over HTTPS" test never passes on Jenkins added

#16 Updated by anonym 2016-02-20 14:57:28

  • blocks Bug #10381: The "I open the address" steps are fragile added

#17 Updated by anonym 2016-02-20 14:57:38

  • blocks Feature #10379: Check that we do not see any error pages in the "I open the address" step. added

#18 Updated by anonym 2016-02-20 14:57:49

  • blocks Bug #10376: The "the Tor Browser loads the (startup page|Tails roadmap)" step is fragile added

#19 Updated by anonym 2016-02-20 14:58:44

  • Subject changed from Investigate the chutney Tor network simulator for use in our test suite to Use the chutney Tor network simulator in our test suite
  • Status changed from Confirmed to In Progress
  • Priority changed from Normal to Elevated
  • Target version changed from 2016 to Tails_2.4
  • % Done changed from 0 to 20
  • Type of work changed from Research to Code
  • Deliverable for set to 270

#20 Updated by intrigeri 2016-02-20 14:59:17

  • blocks Bug #10497: wait_until_tor_is_working helper is fragile added

#21 Updated by anonym 2016-03-03 16:18:41

  • blocks Bug #10495: The 'the time has synced' step is fragile added

#22 Updated by anonym 2016-04-15 05:00:55

  • % Done changed from 20 to 30
  • Feature Branch set to test/9521-chutney

The current branch allows us to seamlessly run the tests with either the real Tor network (i.e. like before) or a simulated one (by Chutney) based on the local configuration; to use Chutney’s simulated Tor network one has to configure something like

Chutney:
  src_dir: "/path/to/chutney-src-tree"


and the Chutney sources must have the patches I have submitted upstream to these two tickets:

What remains is (at least):

  • Consider adding chutney as a Git submodule instead of having the user provide a path to a correct Chutney distribution. Long-term we’d prefer to have Chutney packaged in Debian, of course, but let’s not bother with even starting to think about that now.
  • Figure out how we want to use this:
    • Should the simulated network be used by default? How to control that (run_test_suite --real-tor-network)?
    • Or do we actually always want to use it, except in some feature that explicitly uses the real Tor network?
    • What to do for the run we do for tentative release images?
  • Probably drop everything using check.tp.o, see commits commit:197e795 and commit:ac10a9e for why it is ugly and problematic with Chutney.
  • Currently the only scenarios that fail are the ones using bridges (\o/) because there’s no support for that yet. I think I could pretty easily add support for “normal” bridges, since there is a Chutney torrc template for it; for the other transports we need to either: (1) patch Chutney so we can provide our own bridge templates with the transports we want to test, or (2) upstream templates for all transports we want to test. Perhaps we want to do both; (2) because we are nice with upstream, (1) because we want our own way to add torrc templates so we don’t have to wait for (2) each time we want to add a test for a new transport.

Also, currently Jenkins will run these tests with the real Tor network, but I’d like to have it start testing with the simulated network soon. The easiest would be to put a Chutney source tree checkout with my patches applied and the required test suite configuration (which won’t interfere when other branches are being tested) on all isotesters.
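To make the "based on the local configuration" switch concrete, here is a hypothetical sketch of the decision, written in Python although the test suite itself is Ruby. The function name is invented and the actual branch may decide differently; the only thing taken from this ticket is the Chutney: src_dir setting.

```python
def tor_network_mode(config):
    """Pick the Tor network for a test run from the local configuration.

    Assumption: the simulated network is used whenever a Chutney
    src_dir is configured, and the real Tor network otherwise.
    """
    chutney = config.get("Chutney") or {}
    return "chutney" if chutney.get("src_dir") else "real"


print(tor_network_mode({"Chutney": {"src_dir": "/path/to/chutney-src-tree"}}))
print(tor_network_mode({}))
```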

#23 Updated by anonym 2016-04-15 08:30:36

anonym wrote:
> * Consider adding chutney as a Git submodule instead of having the user provide a path to a correct Chutney distribution. Long-term we’d prefer to have Chutney packaged in Debian, of course, but let’s not bother with even starting to think about that now.

Due to the pain of using Git submodules, we decided to go with the current approach where we point to a Chutney checkout somewhere on the filesystem. It will be dealt with on the isotesters with Feature #11356.

> Should the simulated network be used by default?

We all agree that: yes, we should do this …

> How to control that (run_test_suite --real-tor-network)?

… and we don’t care about this. I’ll keep all this “pluggable” though, so if we come up with a case where we want to support this, it’ll be easy. It will not complicate things, and only add minimal bloat (two if:s that always will be true) so why not?

> Or do we actually always want to use it, except in some feature that explicitly uses the real Tor network?
> What to do for the run we do for tentative release images?

We will use Chutney in all tests in the short-term and probably mid-term, so that is what we will focus on now.

Long-term we want to make Jenkins an integral part of the release process, and delegate the automated tests to it (or at least make it another player in that game), but it first has to be robust enough. In this case we’ll probably want it to run at least some basic test(s) with the normal Tor network, as a sanity check. An idea is: that feature is marked @release and normally Jenkins runs tests with --tags ~@release (just like how we do with the @fragile tag), except when we build from a release tag. Note: we currently do not build release tags, but presumably we’d want to do that once we reach the state of “involving Jenkins in releases”.
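The tag idea could look roughly like this on the Jenkins side. This is a hypothetical Python sketch with a made-up function name, assuming the old-style Cucumber "~@tag" exclusion syntax mentioned above; nothing like this exists in the branch yet.

```python
def cucumber_tag_args(building_from_release_tag):
    """Build the --tags arguments for a Jenkins test run.

    Assumption: @fragile scenarios are always excluded, and @release
    scenarios (the ones using the real Tor network) only run when
    building from a release tag.
    """
    args = ["--tags", "~@fragile"]
    if not building_from_release_tag:
        args += ["--tags", "~@release"]
    return args


print(cucumber_tag_args(building_from_release_tag=False))
print(cucumber_tag_args(building_from_release_tag=True))
```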

(The next step would be that when we have reproducible builds, we just have to inform Jenkins that the RM’s locally built image’s hash matches the one Jenkins built, and then Jenkins (or something else, automated) will proceed with publishing the image over bittorrent and our HTTP mirrors)

> Also, currently Jenkins will run these tests with the real Tor network, but I’d like to have it start testing with the simulated network soon. The easiest would be to put a Chutney source tree checkout with my patches applied and the required test suite configuration (which won’t interfere when other branches are being tested) on all isotesters.

Again, Feature #11356.

#24 Updated by anonym 2016-04-22 05:02:25

So I spent most of the past four days improving and extensively testing this branch. It certainly looks like a big improvement: while I still see occasional bootstrap failures, I think I have only seen one transient post-bootstrap error (in git.feature, cloning over HTTPS randomly failed once).

I’d really like us to get this running on Jenkins (Feature #11356 => Elevated) for some stats gathering. Beyond this branch, I will also create these branches for testing purposes only:

  • devel with all Tor-related @fragile tags removed
  • test/9521-chutney with all Tor-related @fragile tags removed

I think I’d also like to have variants that kill the Tor bootstrap restarting stuff we have in the restart-tor script (which I suspect makes things worse sometimes nowadays), and maybe also try with tor 0.2.8.x (I guess both on the test suite host and in Tails) since it affects bootstrapping in some interesting ways (see Bug #11285, which would have to be solved for the test suite first).

#25 Updated by anonym 2016-04-25 17:40:20

anonym wrote:
> I will also create these branches for testing purposes only:
>
> * devel with all Tor-related @fragile tags removed

Done in the test/9521-with-fragile-scenarios branch.

> * test/9521-chutney with all Tor-related @fragile tags removed

Done in the test/9521-chutney-with-fragile-scenarios branch.

> I think I’d also like to have variants that kill the Tor bootstrap restarting stuff we have in the restart-tor script (which I suspect makes things worse sometimes nowadays), and maybe also try with tor 0.2.8.x (I guess both on the test suite host and in Tails) since it affects bootstrapping in some interesting ways (see Bug #11285, which would have to be solved for the test suite first).

We’ll see how the above branches fare before testing any of this.

#26 Updated by bertagaz 2016-04-26 03:01:33

I think there’s a bug somewhere in the new branch without fragile tags (and the chutney branch itself). It's failing constantly on the @check_tor_leaks scenarios with a "no implicit conversion of nil into Array (TypeError)" error in features/support/hooks.rb:271:in `After'.

It seems to have appeared since commit:acc3a1905db53a9e2707fa56d67d7828591b602f was merged into this branch. See e.g. the first Jenkins run that started to exhibit this failure.

#27 Updated by anonym 2016-04-28 15:58:29

bertagaz wrote:
> I think there’s a bug somewhere in the new branch without fragile tags (and the chutney branch itself). It's failing constantly on the @check_tor_leaks scenarios with a "no implicit conversion of nil into Array (TypeError)" error in features/support/hooks.rb:271:in `After'.
>
> It seems to have appeared since commit:acc3a1905db53a9e2707fa56d67d7828591b602f was merged into this branch. See e.g. the first Jenkins run that started to exhibit this failure.

Whoops, I didn’t see your comment. Yes, I noticed this independently and fixed it. It was force-pushed, but it is confined to the corresponding commit, namely commit:8474c25005395aa9866f29188856cf26c490bb62.

#28 Updated by anonym 2016-05-08 03:37:27

#29 Updated by anonym 2016-05-10 09:14:44

  • Assignee changed from anonym to intrigeri
  • % Done changed from 30 to 50
  • QA Check set to Ready for QA

As decided during the CI meeting, I’m assigning it to you, intrigeri, for a code review. Also, if you haven’t reviewed it by Wednesday next week (2016-05-18), I’m supposed to just merge it anyway, and I guess we’ll do a post-merge code review.

#30 Updated by intrigeri 2016-05-11 04:41:11

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Great job! I’ve pushed a couple typo fixes, but did not test the thing yet.

Regarding the doc to set up test suite with Chutney:

  • It feels that cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?
  • “For now you also have to copy (or, better, symlink)” ← how hard would it be to have the test suite do it itself?

Regarding things like assert_all_connections(sniffer.pcap_file) do |host| → s/host/connection/ (reusing the terminology from pcap_connections_helper), for better clarity, and then make it clear that we’re talking about the destination host+port? Or instead s/host/destination_host/, maybe. And then, I’m not convinced that the “Convenience aliases” help (the test developer) more than they add to confusion (for me): e.g. I found it weird to read [host.mac_saddr, host.mac_daddr].include?($vm.real_mac) == is_leaking (it looks like we’re asking the saddr of the destination host, which feels awkward).

Now, a couple rephrasing suggestions:

  • “there was no traffic sent to the web server on the LAN”: maybe “no traffic was sent to […]” instead?
  • “Unexpected hosts were contacted”: maybe “Unexpected packets were seen” instead? (we’re not necessarily selecting on the destination host only)

Is it me, or will this branch fix Bug #8961?

Why was require 'ipaddr' added to vm_helper.rb?

#31 Updated by anonym 2016-05-11 06:23:44

#32 Updated by anonym 2016-05-11 06:58:23

  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 60

intrigeri wrote:
> Great job! I’ve pushed a couple typo fixes, but did not test the thing yet.
>
> Regarding the doc to set up test suite with Chutney:
>
> * It feels that cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?

This is actually what I originally suggested, but it was rejected (“Git submodules are awkward to work with” or something) when we talked about it during some CI meeting; see beginning of Feature #9521#note-23.

Personally I would love this solution. Since chutney then becomes self-contained, no setup instructions are needed any more and we can get rid of the Chutney: src_dir crap from the local test suite configuration; and new patches will, indeed, be much easier to add since there is no coordination with sysadmins required. If you create a repo forked from the upstream, I can get this into shape (hence “Dev Needed”).

> * “For now you also have to copy (or, better, symlink)” ← how hard would it be to have the test suite do it itself?

It would not be hard, but if we have chutney as a Git submodule I’ll just push the templates in there instead (I guess the one we have could be upstreamed, eventually, so it makes sense that way too). Then I can forget about Bug #11364 and other upstream chutney work for a while (until the autumn or something), which is welcome in these times of — well — not enough time. :)

> Regarding things like assert_all_connections(sniffer.pcap_file) do |host| → s/host/connection/ (reusing the terminology from pcap_connections_helper), for better clarity

Fixed in commit:d098f3c.

> and then make it clear that we’re talking of the destination host+port? Or instead s/host/destination_host/, maybe. And then, I’m not convinced that the “Convenience aliases” help more (the test developer) than they add to confusion (for me):

Agreed. The assertion is made from the Tails VM’s perspective, so what “source” and “destination” mean should be clear (added a comment in commit:800fd2f just to be sure), but, indeed, let’s make it explicit which end of a connection we are looking at by dropping the “convenience aliases” and using daddr and dport instead. Fixed in commit:d6b1752. Note that we essentially never will have to look at the source address/port; that’s why I added the convenience aliases.
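To illustrate the explicit daddr/dport style, here is a hypothetical Python analogue of such an assertion. The real helper is Ruby and reads the sniffer's pcap file; this sketch just takes a list of connection records, and the addresses and ports are made up.

```python
from collections import namedtuple

# One record per observed connection, seen from the Tails VM's perspective.
Connection = namedtuple("Connection", "saddr sport daddr dport")


def assert_all_connections(connections, predicate):
    """Fail if any connection does not satisfy the predicate."""
    unexpected = [c for c in connections if not predicate(c)]
    assert not unexpected, "Unexpected connections were made: %r" % (unexpected,)


# Made-up capture: everything goes to the simulated network on 10.2.2.1.
captured = [
    Connection("10.2.2.15", 40000, "10.2.2.1", 5000),
    Connection("10.2.2.15", 40001, "10.2.2.1", 7000),
]
assert_all_connections(captured, lambda c: c.daddr == "10.2.2.1")
print("all connections went to the expected destination")
```

Naming the destination fields explicitly in the predicate avoids the ambiguity that the aliases introduced.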

> e.g. I found it weird to read [host.mac_saddr, host.mac_daddr].include?($vm.real_mac) == is_leaking (it looks like we’re asking the saddr of the destination host, which feels awkward).

With these changes and explanations in place, we’re good here, right?

> Now, a couple rephrasing suggestions:
>
> * “there was no traffic sent to the web server on the LAN”: maybe “no traffic was sent to […]” instead?

Agreed, fixed in commit:2b6626a.

> * “Unexpected hosts were contacted”: maybe “Unexpected packets were seen” instead? (we’re not necessarily selecting on the destination host only)

Absolutely, but I think I’ll stick with the “connections” term, so: “Unexpected connections were made”. Fixed in commit:69b6da7.

> Is it me, or will this branch fix Bug #8961?

It should, yes. I’ll have to go through these Tor-related test suite tickets and see which ones should be affected.

> Why was require 'ipaddr' added to vm_helper.rb?

vm_helper.rb uses ipaddr’s IPAddr in bridge_ip_addr() so it should have been there in the first place. Ruby is happy as long as some file requires a module, so that’s why it worked before.

#34 Updated by intrigeri 2016-05-11 07:29:32

  • Assignee changed from intrigeri to anonym

>> * It feels that cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?

> Personally I would love this solution. Since chutney then becomes self-contained, no setup instructions are needed any more and we can get rid of the Chutney: src_dir crap from the local test suite configuration; and new patches will, indeed, be much easier to add since there is no coordination with sysadmins required. If you create a repo forked from the upstream, I can get this into shape (hence “Dev Needed”).

Excellent. As clarified over IM, the repo already exists => please go ahead :)

>> * “For now you also have to copy (or, better, symlink)” ← how hard would it be to have the test suite do it itself?

> It would not be hard, but if we have chutney as a Git submodule I’ll just push the templates in there instead (I guess the one we have could be upstreamed, eventually, so it makes sense that way too). Then I can forget about Bug #11364 and other upstream chutney work for a while (until the autumn or something), which is welcome in these times of — well — not enough time. :)

Cool. Please go ahead.

> With these changes and explanations in place, we’re good here, right?

Yes!

Code-reviewed the branch again up to commit:69b6da72353431acc5fa8959987d3ab2bf65085f, fine with me!

#35 Updated by anonym 2016-05-11 13:35:46

  • Assignee changed from anonym to intrigeri
  • % Done changed from 60 to 70
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> >> * It feels that cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?
>
> > Personally I would love this solution. Since chutney then becomes self-contained, no setup instructions are needed any more and we can get rid of the Chutney: src_dir crap from the local test suite configuration; and new patches will, indeed, be much easier to add since there is no coordination with sysadmins required. If you create a repo forked from the upstream, I can get this into shape (hence “Dev Needed”).
>
> Excellent. As clarified over IM, the repo already exists => please go ahead :)
>
> >> * “For now you also have to copy (or, better, symlink)” ← how hard would it be to have the test suite do it itself?
>
> > It would not be hard, but if we have chutney as a Git submodule I’ll just push the templates in there instead (I guess the one we have could be upstreamed, eventually, so it makes sense that way too). Then I can forget about Bug #11364 and other upstream chutney work for a while (until the autumn or something), which is welcome in these times of — well — not enough time. :)
>
> Cool. Please go ahead.

Done in:

  • 97ada21 Add our temporary Chutney fork as a Git submodule.
  • a1f8a47 Move bridge-obfs4.tmpl into the Chutney submodule.
  • 77d16ec Use our Chutney Git submodule instead of an external checkout.

> Code-reviewed the branch again up to commit:69b6da72353431acc5fa8959987d3ab2bf65085f, fine with me!

Please also look at commit:e500661 (dogtail! :)) since the use of Tor Check just felt so wrong to keep using now that it will always tell us we do not use Tor thanks to Chutney.

#36 Updated by intrigeri 2016-05-12 03:07:13

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Info Needed

Code review passes. I’ve tried to run the test suite from that branch, and here’s what I see:

Command failed (returned pid 28987 exit 255): ["/srv/git/submodules/chutney/chutney", "start", "/srv/git/features/chutney/test-network", {:err=>[:child, :out]}]:
Using Python 2.7.9

Starting nodes

Couldn't launch test000auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/000auth/torrc): 255

Couldn't launch test001auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/001auth/torrc): 255

Couldn't launch test002auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/002auth/torrc): 255

Couldn't launch test003auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/003auth/torrc): 255


.
<0> expected but was
<#<Process::Status: pid 28987 exit 255>>. (Test::Unit::AssertionFailedError)
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:55:in `block in assert_block'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:1593:in `call'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:1593:in `_wrap_assertion'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:53:in `assert_block'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:240:in `assert_equal'
/srv/git/features/support/helpers/misc_helpers.rb:196:in `block in cmd_helper'
/srv/git/features/support/helpers/misc_helpers.rb:192:in `popen'
/srv/git/features/support/helpers/misc_helpers.rb:192:in `cmd_helper'
/srv/git/features/step_definitions/chutney.rb:27:in `block (2 levels) in ensure_chutney_is_running'
/srv/git/features/step_definitions/chutney.rb:26:in `chdir'
/srv/git/features/step_definitions/chutney.rb:26:in `block in ensure_chutney_is_running'
/srv/git/features/step_definitions/chutney.rb:49:in `call'
/srv/git/features/step_definitions/chutney.rb:49:in `ensure_chutney_is_running'
/srv/git/features/support/hooks.rb:177:in `block in <top (required)>'
/srv/git/features/support/extra_hooks.rb:35:in `call'
/srv/git/features/support/extra_hooks.rb:35:in `invoke'
/srv/git/features/support/extra_hooks.rb:114:in `block in before_feature'
/srv/git/features/support/extra_hooks.rb:113:in `each'
/srv/git/features/support/extra_hooks.rb:113:in `before_feature'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:181:in `block in send_to_all'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:179:in `each'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:179:in `send_to_all'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:169:in `broadcast'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:26:in `visit_feature'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:28:in `block in accept'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:17:in `each'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:17:in `each'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:27:in `accept'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:21:in `block in visit_features'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:170:in `broadcast'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:20:in `visit_features'
/usr/lib/ruby/vendor_ruby/cucumber/runtime.rb:49:in `run!'
/usr/lib/ruby/vendor_ruby/cucumber/cli/main.rb:42:in `execute!'
/usr/bin/cucumber:13:in `<main>'

How can I debug this?

#37 Updated by bertagaz 2016-05-12 03:42:52

intrigeri wrote:
> How can I debug this?

You can remove the --quiet option used in chutney to start the Tor instances (only used in two occurrences), you’ll get the reason why they failed to start. Are you using our dedicated branch with all the patches? This error sounds a lot like the one I experienced in Jenkins when chutney was sourcing the system torrc.

#38 Updated by anonym 2016-05-12 04:29:03

bertagaz wrote:
> intrigeri wrote:
> > How can I debug this?
>
> You can remove the --quiet option used in chutney to start the Tor instances (only used in two occurrences), you’ll get the reason why they failed to start.

If it helps, this is what I’d use:

sed -i '/"--quiet",/d' submodules/chutney/lib/chutney/TorNet.py

> Are you using our dedicated branch with all the patches?

As you can see in the log, he uses the chutney Git submodule that I pushed yesterday, and it tracks the correct branch so all required commits should be there.

> This error sounds a lot like the one I experienced in Jenkins when chutney was sourcing the system torrc.

From the log I can see that it is not the same command line that fails (e.g. there is no “--list-fingerprint” this time); now it indeed must be the start() method that fails, so this is another issue.

intrigeri, was this the first time you tried running it, or did this work before? And what version of Tor are you running? I’m again wondering if there is some permissions issue involved here. :)

#39 Updated by anonym 2016-05-12 04:29:11

  • Assignee changed from anonym to intrigeri

#40 Updated by intrigeri 2016-05-12 04:38:23

  • Assignee changed from intrigeri to anonym

> If it helps, this is what I’d use:

Thanks! So I re-ran the command line that failed, and:

# tor -f /tmp/TailsToaster/chutney-data/nodes/003auth/torrc
May 12 07:35:20.159 [notice] Tor v0.2.5.12 (git-3731dd5c3071dcba) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1k and Zlib 1.2.8.
May 12 07:35:20.159 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
May 12 07:35:20.159 [notice] Read configuration file "/tmp/TailsToaster/chutney-data/nodes/003auth/torrc".
May 12 07:35:20.170 [notice] Based on detected system memory, MaxMemInQueues is set to 8192 MB. You can override this by setting MaxMemInQueues by hand.
May 12 07:35:20.170 [warn] You have used DirAuthority or AlternateDirAuthority to specify alternate directory authorities in your configuration. This is potentially dangerous: it can make you look different from all other Tor users, and hurt your anonymity. Even if you've specified the same authorities as Tor uses by default, the defaults could change in the future. Be sure you know what you're doing.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] Failed to parse/validate config: V3AuthVotingInterval is insanely low.
May 12 07:35:20.170 [err] Reading config failed--see warnings above.

So “V3AuthVotingInterval is insanely low” is probably the problem, right?
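When reading output like the above, the decisive lines are the [warn] and [err] entries; the rest is noise. A minimal sketch of filtering them out of a captured log follows. The helper name `tor_problems` is hypothetical and is not part of chutney or the Tails test suite.

```ruby
# Hypothetical helper for illustration only; not part of chutney or the
# Tails test suite. It keeps the [warn] and [err] messages from a captured
# tor log, drops the timestamps, and removes repeated messages.
def tor_problems(log)
  log.each_line
     .map { |line| line[/\[(?:warn|err)\] .*/] } # nil for other log levels
     .compact
     .map(&:strip)
     .uniq
end
```

Fed the log above, this would leave four distinct messages, with the V3AuthVotingInterval complaint directly before the final [err] line.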

> intrigeri, was this the first time you tried running it, or did this work before?

First time.

> And what version of Tor are you running?

0.2.5.12-1 from Jessie. Do we need anything newer?

#41 Updated by intrigeri 2016-05-12 04:47:07

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Info Needed to Dev Needed

Indeed, upgrading to 0.2.7.6-1~bpo8+1 fixes the problem => doc :)

#42 Updated by intrigeri 2016-05-12 04:48:06

  • Assignee changed from intrigeri to anonym

#43 Updated by anonym 2016-05-12 04:59:00

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> Indeed, upgrading to 0.2.7.6-1~bpo8+1 fixes the problem => doc :)

Ah, yes. Specifically, Tor 0.2.6.x or newer is required; sorry for not having made this clear before. Docs fixed in commit:1dd8cb7.
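To fail early instead of hitting the “V3AuthVotingInterval is insanely low” error, the version requirement could be checked against the banner line tor prints on startup (“Tor v0.2.5.12 (git-…) running on Linux …” in the log above). A minimal sketch, assuming that banner format; the helper names and the constant are hypothetical and do not exist in the test suite.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default on modern Ruby)

# Assumption: any 0.2.6.x or newer release is acceptable.
MINIMUM_TOR_VERSION = Gem::Version.new('0.2.6')

# Hypothetical helper: extracts the numeric version from a startup banner
# such as "Tor v0.2.5.12 (git-3731dd5c3071dcba) running on Linux ...".
def parse_tor_version(banner)
  match = banner.match(/Tor v(\d+(?:\.\d+)+)/)
  raise ArgumentError, "no Tor version found in: #{banner.inspect}" unless match
  Gem::Version.new(match[1])
end

# Hypothetical helper: true if the banner reports a new enough Tor.
def tor_new_enough?(banner)
  parse_tor_version(banner) >= MINIMUM_TOR_VERSION
end
```

With the two versions mentioned in this ticket, 0.2.5.12 would be rejected and 0.2.7.6 accepted.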

#44 Updated by intrigeri 2016-05-12 10:31:40

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Code review passes!

I’m doing a full test suite run (and will report back here later), but sending back to anonym’s plate so that he can handle failures such as https://jenkins.tails.boum.org/job/test_Tails_ISO_test-9521-chutney-with-fragile-scenarios/40/cucumberTestReport/the-tor-enforcement-is-effective/anti-test_-detecting-udp-leaks-of-dns-lookups-with-the-firewall-leak-detector/.

#45 Updated by intrigeri 2016-05-13 00:40:24

So, here are my test results:

  • I see most time syncing scenarios with a modified clock fail (timeout in “Tor is ready”) twice in a row. Interestingly, the two “Clock is one day in the future” scenarios pass each time.
  • (probably caused by dogtail rather than chutney) I see “When I disable the first persistence preset” fail: on the video, I see the apps->Tails submenu is open, but the persistent volume assistant entry is never clicked. Want a ticket about it?
  • (probably unrelated to chutney) I see the second “Then Pidgin automatically enables my XMPP account” fail in “Scenario: Using a persistent Pidgin configuration”: I see the account manager, and the XMPP account we created before rebooting is not listed (what?!). OTOH, “Pidgin has the expected persistent accounts configured” has succeeded, so clearly something is wrong; and FWIW, I’ve verified that the random IRC nickname listed in the account manager is the same as during the previous boot. Shall I report this separately?

#46 Updated by anonym 2016-05-13 04:19:05

intrigeri wrote:
> * I see most time syncing scenarios with a modified clock fail (timeout in “Tor is ready”) twice in a row. Interestingly, the two “Clock is one day in the future” scenarios pass each time.

Interesting. I’m not sure what to do with this. We know that “tordate” is a mess. I really do not expect us to try to fix it — in fact, I hope we don’t and instead spend that energy on Feature #5774 (Robust time syncing). Hence, I would actually like to remove all “tordate”-related scenarios from time_syncing.feature (i.e. all scenarios but the last two), and only add more scenarios once Feature #5771 is solved.

So, to me it doesn’t really matter if Chutney made “tordate” less robust. “tordate” is dead to me :). Besides, given how crazy “tordate” is at depending on exact behavior of the client vs the rest of the network, it surely could be that Chutney’s network and the real network differ enough that “tordate” will work differently between them.

> * (probably caused by dogtail rather than chutney) I see “When I disable the first persistence preset” fail: on the video, I see the apps->Tails submenu is open, but the persistent volume assistant entry is never clicked. Want a ticket about it?

Please!

> * (probably unrelated to chutney) I see the second “Then Pidgin automatically enables my XMPP account” fail in “Scenario: Using a persistent Pidgin configuration”: I see the account manager, and the XMPP account we created before rebooting is not listed (what?!). OTOH, “Pidgin has the expected persistent accounts configured” has succeeded, so clearly something is wrong; and FWIW, I’ve verified that the random IRC nickname listed in the account manager is the same as during the previous boot. Shall I report this separately?

Please!

#47 Updated by anonym 2016-05-13 06:14:50

intrigeri wrote:
> I’m doing a full test suite run (and will report back here later), but sending back to anonym’s plate so that he can handle failures such as https://jenkins.tails.boum.org/job/test_Tails_ISO_test-9521-chutney-with-fragile-scenarios/40/cucumberTestReport/the-tor-enforcement-is-effective/anti-test_-detecting-udp-leaks-of-dns-lookups-with-the-firewall-leak-detector/.

I am pretty sure that the static sleep I added in commit:7675d1e is enough for my system, but not on Jenkins. If I remove it, I can reproduce the same problem most of the time. I am pretty sure that if I bumped it, it’d work on Jenkins too.

I spent some time researching this and found a seemingly perfect solution in commit:ca2b65a (see its commit message in particular). It works perfectly on my system, but let’s see how it does on Jenkins.
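For readers unfamiliar with why a static sleep behaves differently on a fast workstation and on a loaded Jenkins slave: a fixed delay is either too short on the slow host or wastes time on the fast one, whereas polling for the condition with a deadline adapts to both. The sketch below illustrates that general pattern only; it is not a description of commit:ca2b65a, whose content is not shown here, and `wait_until` is a hypothetical name.

```ruby
# Hypothetical helper for illustration only. It re-evaluates the given block
# until it returns a truthy value, and raises once `timeout` seconds have
# passed without success.
def wait_until(timeout:, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end
```

A caller would pass the actual readiness check as the block, so the wait ends as soon as the condition holds rather than after a fixed delay.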

#48 Updated by anonym 2016-05-13 07:26:02

  • blocks Bug #9654: "IPv4 TCP non-Tor Internet hosts were contacted" during the test suite added

#49 Updated by intrigeri 2016-05-14 05:41:56

> Interesting. I’m not sure what to do with this. We know that “tordate” is a mess. I really do not expect us to try to fix it — in fact, I hope we don’t and instead spend that energy on Feature #5774 (Robust time syncing).

ACK.

> Hence, I would actually like to remove all “tordate”-related scenarios from time_syncing.feature (i.e. all scenarios but the last two), and only add more scenarios once Feature #5771 is solved.

Let’s do that, but keep “Clock with host’s time”, “Clock with host’s time in bridge mode”, and the last three scenarios, OK?

> So, to me it doesn’t really matter if Chutney made “tordate” less robust. “tordate” is dead to me :). Besides, given how crazy “tordate” is at depending on exact behavior of the client vs the rest of the network, it surely could be that Chutney’s network and the real network differ enough that “tordate” will work differently between them.

Sounds totally reasonable to me, let’s be pragmatic here.

#50 Updated by anonym 2016-05-14 06:16:34

Ack, most “tordate” scenarios were removed in commit:ffba63d.

#51 Updated by anonym 2016-05-14 06:18:51

  • blocks Bug #10440: Time syncing scenarios are fragile added

#52 Updated by intrigeri 2016-05-14 06:23:36

>> * (probably caused by dogtail rather than chutney) I see “When I disable the first persistence preset” fail: on the video, I see the apps->Tails submenu is open, but the persistent volume assistant entry is never clicked. Want a ticket about it?

> Please!

I’ve fixed it in commit:41952f6, works for me. Pushed directly to devel, please take a look.

>> * (probably unrelated to chutney) I see the second “Then Pidgin automatically enables my XMPP account” fail in “Scenario: Using a persistent Pidgin configuration”: I see the account manager, and the XMPP account we created before rebooting is not listed (what?!). OTOH, “Pidgin has the expected persistent accounts configured” has succeeded, so clearly something is wrong; and FWIW, I’ve verified that the random IRC nickname listed in the account manager is the same as during the previous boot. Shall I report this separately?

Now known as Bug #11413.

#53 Updated by intrigeri 2016-05-14 07:33:21

  • Assignee changed from anonym to intrigeri
  • % Done changed from 70 to 90
  • QA Check changed from Dev Needed to Ready for QA

So, we compared test results on Jenkins (devel vs. this branch, and devel+fragile vs. this branch+fragile), and it looks like we’re good and can merge this! I’ll do a last full test suite run locally and then I expect to merge.

#54 Updated by intrigeri 2016-05-14 13:03:40

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 90 to 100
  • QA Check changed from Ready for QA to Pass

Merged! Now you can look at the bunch of tickets that were blocked by this one, and hopefully a few of those will be fixed for free :)

#55 Updated by sajolida 2016-05-15 11:07:49

Woohoo!

#56 Updated by anonym 2016-06-08 01:24:03

  • Status changed from Fix committed to Resolved

#57 Updated by BitingBird 2016-06-26 10:44:17

  • Priority changed from Elevated to Normal