Bug #10732

Fix tor-has-bootstrapped semantics on network reconnect

Added by intrigeri 2015-12-09 06:09:08 . Updated 2016-01-27 13:21:22 .

Status:
Resolved
Priority:
High
Assignee:
Category:
Target version:
Start date:
2015-12-09
Due date:
% Done:

100%

Feature Branch:
bugfix/10732-sanest-tor-has-bootstrapped-semantics
Type of work:
Research
Blueprint:

Starter:
Affected tool:
Deliverable for:
268

Description

It seems that it returns true before tor has reconnected to the Tor network.

“what the test suite needs is something that tells whether Tor currently thinks it should be working (be able to build circuits)”

> I wonder if it happens in normal conditions, that is without restoring snapshots. But let me not try to pretend it’s not on my plate ;)
> I bet that once we’ve reached tails-tor-has-bootstrapped.target it never gets reset to non-active.
> Which, I mean, is a matter of semantics. It currently means: “Tor has already bootstrapped once on this system”.
> I wonder if this is buggy in supported non-test-suite case, e.g. on network reconnect. Or if we’re back to “please, test suite folks, tell me what semantics you need from that program”.
> maybe we should just restart tails-wait-until-tor-has-bootstrapped.service on network reconnect (not sure if it puts the target back to inactive state)


Subtasks


Related issues

Blocks Tails - Feature #7563: Update the automated test suite for Jessie ISO images Resolved 2014-11-26 2016-01-15

History

#1 Updated by intrigeri 2015-12-09 06:10:31

  • blocks Feature #7563: Update the automated test suite for Jessie ISO images added

#2 Updated by intrigeri 2015-12-09 06:10:49

  • Deliverable for set to 268

#3 Updated by intrigeri 2015-12-09 13:56:25

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10
  • Feature Branch set to bugfix/10732-sanest-tor-has-bootstrapped-semantics

#4 Updated by intrigeri 2015-12-09 15:43:16

  • Subject changed from Investigate tor-has-bootstrapped behaviour on network reconnect to Fix tor-has-bootstrapped semantics on network reconnect
  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

#5 Updated by anonym 2015-12-09 20:16:24

  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 40
  • QA Check changed from Ready for QA to Dev Needed

Unfortunately this branch doesn’t work. :/ I was immediately hit with Pidgin trying to connect at a time when Tor clearly wasn’t working. To investigate further I re-ran the tests with the following local modification, which sort of compares it with the wait_until_tor_is_working() implementation in Wheezy where we rely on what Tor thinks the bootstrap progress is:

--- a/features/support/helpers/misc_helpers.rb
+++ b/features/support/helpers/misc_helpers.rb
@@ -138,6 +139,8 @@ end

 def wait_until_tor_is_working
   try_for(270) { $vm.execute('/usr/local/sbin/tor-has-bootstrapped').success? }
+  bootstrap = $vm.execute('tor_bootstrap_progress', :libs => 'tor').stdout
+  info_log("Tor bootstrap progress: #{bootstrap}")
 rescue Timeout::Error => e
   c = $vm.execute("journalctl SYSLOG_IDENTIFIER=restart-tor")
   if c.success?

For me it said that the Tor bootstrap progress was 80%. Then I looked at Tor’s log and saw this:

Dec 10 03:16:15.000 [notice] Tor 0.2.7.5 (git-6184c873e90d93b2) opening new log file.
[...]
Dec 10 03:16:25.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Dec 10 03:16:25.000 [notice] Signaled readiness to systemd

so the .target is reached immediately when systemd is notified of Tor’s readiness. Also note that, contrary to what you say in your commit message of commit:09d396dc, Tor waits 10 seconds before doing notifying systemd; actually Tor just idles for 10 seconds — this is what I said to you privately yesterday when talking about Tor’s bootstrapping issues (Bug #10238). A bit beside the point, perhaps…

In any case, I think I’ve found the fix, which can be demonstrated with this diff:

--- a/features/step_definitions/common_steps.rb
+++ b/features/step_definitions/common_steps.rb
@@ -103,8 +103,8 @@ def post_snapshot_restore_hook
   if $vm.has_network?
     if $vm.execute("systemctl --quiet is-active tor@default.service").success?
       $vm.execute("systemctl stop tor@default.service")
-      $vm.execute("systemctl --no-block restart tails-tor-has-bootstrapped.target")
       $vm.execute("rm -f /var/log/tor/log")
+      $vm.execute("systemctl --no-block restart tails-tor-has-bootstrapped.target")
       $vm.host_to_guest_time_sync
       $vm.spawn("restart-tor")
       wait_until_tor_is_working


The issue is that when tails-tor-has-bootstrapped.target is restarted it immediately makes tails-wait-until-tor-has-bootstrapped.service run, and it runs so fast that the log is not removed, and our horrible implementation of tor_is_ready() in our shell library depends on the Tor log to be cleared. Woo! :) So the corresponding reordering must be done in the 10-tor.sh NM hook.

To end with, I’m surprised that we have to touch 10-tor.sh at all. With systemd I’d expect us to be able to express a dependency such that tails-tor-has-bootstrapped.target stops when tordefault.service@ stops, and (re)starts with it too. If we can do that, all this would be much simpler. However, due to the Tor log issue, we would need to also do one of the following then:

  • We make tordefault.service@ remove Tor’s log when it stops.
  • We switch to the GETINFO based implementation (maybe the issues we had with it are gone now?) for getting the bootstrap progress in our shell library, so tor_is_ready() doesn’t depend on clearing the Tor log.

Super exciting fun stuff!

#6 Updated by intrigeri 2015-12-10 02:47:04

  • Assignee changed from intrigeri to anonym

> In any case, I think I’ve found the fix, which can be demonstrated with this diff:
[…]
> The issue is that when tails-tor-has-bootstrapped.target is restarted it immediately makes tails-wait-until-tor-has-bootstrapped.service run, and it runs so fast that the log is not removed, and our horrible implementation of tor_is_ready() in our shell library depends on the Tor log to be cleared. Woo! :) So the corresponding reordering must be done in the 10-tor.sh NM hook.

Great! While I’m away, please / feel free to (on top of the proposed branch):

  1. apply this trick to the test suite
  2. apply the same trick in 10-tor.sh
  3. merge the result into feature/jessie

(We’ll do another review of recent feature/jessie changes next week anyway :)

And if you’re not comfortable with taking the changes to 10-tor.sh due to what follows, then just cherry-pick the bits that modify the test suite and keep going.

> To end with, I’m surprised that we have to touch 10-tor.sh at all. With systemd I’d expect us to be able to express a dependency such that tails-tor-has-bootstrapped.target stops when tordefault.service@ stops, and (re)starts with it too. If we can do that, all this would be much simpler. However, due to the Tor log issue, we would need to also do one of the following then:

> * We make tordefault.service@ remove Tor’s log when it stops.

(At least that one is trivial to implement… assuming it would not break anything unexpectedly.)

> * We switch to the GETINFO based implementation (maybe the issues we had with it are gone now?) for getting the bootstrap progress in our shell library, so tor_is_ready() doesn’t depend on clearing the Tor log.

> Super exciting fun stuff!

Right, yes, totally. I agree there’s room for improvement here. I doubt the “target” concept works in a way that allows us to do what you suggest, but regardless I’m sure we can indeed bind this service and that target (or whatever we turn the latter into) more tighly, somehow — we’ll see :)

As as see it though, we’re drifting a bit away of what’s blocking Feature #7563 and Tails 2.0 here, so please file a dedicated ticket (no milestone, assign to me unless you want it yourself) to investigate these improvements, and let’s keep the ball rolling on what is my current primary focus on this ticket, OK?

#7 Updated by intrigeri 2015-12-17 10:26:15

  • Assignee changed from anonym to intrigeri

intrigeri wrote:
> Great! While I’m away, please / feel free to (on top of the proposed branch):
>
> # apply this trick to the test suite
> # apply the same trick in 10-tor.sh
> # merge the result into feature/jessie

I’m taking back this ticket to do that on the topic branch. Then we’ll see, depending on anonym’s availability vs. my scheduling needs for 2.0~beta1.

#8 Updated by intrigeri 2015-12-17 10:41:27

> > To end with, I’m surprised that we have to touch 10-tor.sh at all. With systemd I’d expect us to be able to express a dependency such that tails-tor-has-bootstrapped.target stops when tordefault.service@ stops, and (re)starts with it too.

I think we can actually do that with PartOf=tor.service, or similar. It has too much potential for breaking other stuff for me to want to do that at this point of the 2.0 cycle, though.

#9 Updated by anonym 2015-12-18 04:13:48

  • Status changed from In Progress to Fix committed
  • % Done changed from 40 to 100

Applied in changeset commit:161a88656f0e02d780277838ce16e9cc018f97da.

#10 Updated by intrigeri 2015-12-18 10:16:05

  • Assignee deleted (intrigeri)
  • QA Check changed from Dev Needed to Pass

#11 Updated by anonym 2016-01-27 13:21:22

  • Status changed from Fix committed to Resolved