Bug #10537

Fix restart-tor and tor-has-bootstrapped semantics vs. test suite on Jessie

Added by intrigeri 2015-11-12 01:33:06 . Updated 2015-11-17 10:36:54 .

Status:
Resolved
Priority:
High
Assignee:
Category:
Target version:
Start date:
2015-11-12
Due date:
% Done:

100%

Feature Branch:
feature/jessie
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:
269

Description

On feature/jessie, the “Tor has bootstrapped” code has been revamped and ported to systemd; some optimization that has gone into it apparently breaks assumptions that the automated test suite was relying on already, and/or that code added for Feature #9516 started relying on. We should identify what exactly the test suite needs, and then check whether we’d better change the semantics of restart-tor/tor-has-bootstrapped to satisfy the test suite’s needs, or instead have the test suite take into account the changed semantics somehow.


Subtasks


Related issues

Related to Tails - Feature #9516: Restart Tor if bootstrapping stalls for too long Resolved 2015-06-02
Blocks Tails - Feature #7563: Update the automated test suite for Jessie ISO images Resolved 2014-11-26 2016-01-15

History

#1 Updated by intrigeri 2015-11-12 01:34:37

  • Description updated

#2 Updated by intrigeri 2015-11-12 01:40:41

  • Assignee set to anonym
  • QA Check set to Info Needed

anonym/kytv, can you please sum up to me what semantics the test suite expects out of restart-tor / tor-has-bootstrapped / whatever is broken from the test suite’s PoV currently? Ideally, please also point me to the exact place/context in the test suite’s code that relies on this.

And then I’ll check if such needs+context are test suite -specific (and thus I can add whatever code is needed in the test suite itself, e.g. when post-snapshot-restore or something), or if instead we have a regression for supported use cases (in which case I’ll fix the restart-tor/tor-has-bootstrapped semantics).

#3 Updated by intrigeri 2015-11-12 01:41:04

  • blocks Feature #7563: Update the automated test suite for Jessie ISO images added

#4 Updated by intrigeri 2015-11-12 01:41:28

  • Priority changed from Normal to High
  • Target version changed from Tails_2.0 to Tails_1.8

(Blocks Feature #7563 if I got it right.)

#5 Updated by intrigeri 2015-11-12 01:41:37

  • Feature Branch set to feature/jessie

#6 Updated by intrigeri 2015-11-12 01:43:03

  • blocks #8668 added

#7 Updated by kytv 2015-11-12 08:14:52

intrigeri wrote:
> anonym/kytv, can you please sum up to me what semantics the test suite expects out of restart-tor / tor-has-bootstrapped / whatever is broken from the test suite’s PoV currently? Ideally, please also point me to the exact place/context in the test suite’s code that relies on this.
>
> And then I’ll check if such needs+context are test suite -specific (and thus I can add whatever code is needed in the test suite itself, e.g. when post-snapshot-restore or something), or if instead we have a regression for supported use cases (in which case I’ll fix the restart-tor/tor-has-bootstrapped semantics).

I might be stating the obvious. If so, apologies for my useless text below.

In features/step_definitions/common.rb (lines 108-114) we have:

  if $vm.has_network?
    if $vm.execute("systemctl --quiet is-active tor@default.service").success?
      $vm.execute("systemctl stop tor@default.service")
      $vm.execute("rm -f /var/log/tor/log")
      $vm.host_to_guest_time_sync
      $vm.spawn("restart-tor")
      wait_until_tor_is_working

wait_until_tor_is_working lives in features/support/helpers/misc_helpers.rb at line 139.

def wait_until_tor_is_working
  try_for(270) { $vm.execute('/usr/local/sbin/tor-has-bootstrapped').success? }
rescue Timeout::Error => e
  c = $vm.execute("grep restart-tor /var/log/syslog")
  if c.success?
    debug_log("From syslog:\n" + c.stdout.sub(/^/, "  "))
  else
    debug_log("Nothing was syslog:ed about 'restart-tor'")
  end 
  raise e
end

As can be seen at line 142, this part of the test suite needs to be migrated to jessie since we’ll want to use the journal instead.

WRT to the test suite, the problem I experienced was that after restoring a network-enabled snapshot we restart tor using restart-tor. We check that tor is ready with /usr/local/sbin/tor-has-bootstrapped which returns 0, e.g., Tor is ready, the pre-snapshot status of Tor.

I don’t know how much more than this I can provide since I’ve done almost no investigating of the problem, having only “worked around” it by raising the number of Tor retries while working on Bug #9791, but I’m happy to try.

#8 Updated by intrigeri 2015-11-12 08:22:32

  • Assignee changed from anonym to intrigeri
  • QA Check deleted (Info Needed)

After chatting with kytv I managed to read between the lines enough to conclude that the semantics you seem to want is “tor-has-bootstrapped gives info about the current status of Tor, not about whether Tor has ever successfully bootstrapped since the system was started”. Thanks. So this is back on my plate to check where we should update things.

#9 Updated by intrigeri 2015-11-12 08:25:56

IOW the test suite needs this behaviour:

Given a Tails system
And Tor has bootstrapped
When I stop and restart the Tor daemon
Then tor-has-bootstrapped returns false until the currently running Tor daemon has itself bootstrapped

(or it needs to do something around stop/restart so that this becomes true).

#10 Updated by kytv 2015-11-12 09:53:15

  • Assignee changed from intrigeri to anonym
  • QA Check set to Info Needed

The problem that I mentioned in the XMPP chat a few days ago has been resolved in recent commits. :)

calling as root: systemctl stop tor@default.service
call returned: [0, "", ""]
calling as root: rm -f /var/log/tor/log
call returned: [0, "", ""]
calling as root: date -s '@1447349674'
call returned: [0, "Thu Nov 12 17:34:34 UTC 2015\n", ""]
spawning as root: restart-tor
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [0, "", ""]
calling as root: cat /proc/cmdline
call returned: [0, "BOOT_IMAGE=/live/vmlinuz2 initrd=/live/initrd2.img boot=live config live-m
calling as amnesia: gpg --batch --list-keys '10CC5BC7'
call returned: [2, "", "gpg: error reading key: public key not found\n"]
calling as amnesia: gpg --batch  --recv-key '10CC5BC7'
call returned: [0, "", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyserve
    When I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI                           # fe
    Then GnuPG uses the configured keyserver                                              # fe
    And the GnuPG fetch is successful                                                     # fe
calling as amnesia: gpg --batch --list-keys '10CC5BC7'
call returned: [0, "pub   4096R/0x1D84CCF010CC5BC7 2012-08-21 [expires: 2016-08-24]\n      Key
    And the "10CC5BC7" key is in the live user's public keyring after at most 120 seconds # fe
calling as root: awk "/^r/ { print \$6 }" /var/lib/tor/cached-microdesc-consensus

Previously tor-has-bootstrapped reported success immediately after a snapshot was restored. Now I’m “happy” with respect to the problem I mentioned in the XMPP channel. Yay ♥

What anonym mentioned in the description may still be a problem, e.g.

Given the network is plugged
And Tor has started
But Tor's bootstrapping process is stuck for more than $however_long_we_decide_is_reasonable
Then Tor should be restarted this time and every time the bootstrapping doesn't progress
     during $however_long_we_decide_is_reasonable
But if Tor hasn't successfully bootstrapped within 270 seconds (or a reasonable timeframe
     that we decide on)
Then we stop trying to restart Tor
And we notify the user that there's a problem

Like Feature #9516, this isn’t a problem with just the test suite but one that users also experience.

#11 Updated by intrigeri 2015-11-17 09:28:44

  • Assignee changed from anonym to intrigeri
  • QA Check deleted (Info Needed)

kytv wrote:
> The problem that I mentioned in the XMPP chat a few days ago has been resolved in recent commits. :)

Well, I believe that you’ve been lucky:

> Previously tor-has-bootstrapped reported success immediately after a snapshot was restored.

… and now it reports success immediately after the Tor service has started, as long as it has bootstrapped in the past (e.g. when creating the snapshot), even though it hasn’t bootstrapped again. In the case you’re reporting about, Tor (re-)bootstrap time won the race and thus it worked for you. But last time I checked the code, nothing ensured it’ll always win the race.

So my understanding is that I still have to implement what was described in Bug #10537#note-9.

> What anonym mentioned in the description

We had no input from anonym on this ticket so far.

> may still be a problem, e.g.

This is likely. Next step on that front is to express the semantics needed by the code added for Feature #9516. I think it’ll cost me (and probably us) less energy if I do it myself instead of trying to communicate about clear interfaces requirements. And then I’ll port it to feature/jessie as needed, I suppose.

Adding back to my plate.

#12 Updated by intrigeri 2015-11-17 09:59:38

  • related to Feature #9516: Restart Tor if bootstrapping stalls for too long added

#13 Updated by intrigeri 2015-11-17 10:00:00

intrigeri wrote:
> So my understanding is that I still have to implement what was described in Bug #10537#note-9.

There was a misunderstanding when we discussed this with kytv, apparently: on Tails 1.7, as long as Tor has ever bootstrapped, tor-has-bootstrapped always returns true, even if I stop Tor or restart it. So we hadn’t these semantics on 1.7, so we can’t be lacking them on feature/jessie I guess.

Back to Feature #9516 code then.

#14 Updated by intrigeri 2015-11-17 10:18:52

Note (mostly to myself): commit:27edb1a might help debugging this.

#15 Updated by intrigeri 2015-11-17 10:36:54

  • Status changed from Confirmed to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 0 to 100

intrigeri wrote:
> Back to Feature #9516 code then.

I’ve looked at restart-tor and my conclusion is that I don’t see what has changed in feature/jessie that can break this script, so it looks like kytv was totally right that I’ve fixed the problem without knowing in commit:1390e8b, before I had heard about it or understood what it was about.

Closing, please open tickets about specific problems if there are any left :)