Feature #12411

Stop restarting tor if bootstrapping stalls

Added by anonym 2017-03-29 15:06:08 . Updated 2017-04-19 17:25:07 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Target version:
Start date:
2017-03-29
Due date:
% Done:

100%

Feature Branch:
feature/12411-tor-bootstrap-fixes
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

While investigating Bug #10238 preliminary tests indicate that tor might have fixed the issues we used to experience with the bootstrap process stalling and requiring a restart to kickstart it (Feature #9516). Let’s see if we can get rid of this crap!


Subtasks


Related issues

Related to Tails - Bug #10238: Investigate why Tor sometimes is slow at or even fails bootstrapping Rejected 2015-09-23
Related to Tails - Bug #13541: Tor still sometimes fails to bootstrap in the test suite In Progress 2017-07-31

History

#1 Updated by anonym 2017-03-29 15:18:36

  • % Done changed from 0 to 20
  • QA Check set to Ready for QA
  • Feature Branch set to feature/12411-tor-bootstrap-fixes

Let’s see what Jenkins thinks. I intend to test this branch many times before considering it further for inclusion in Tails 2.12.

#2 Updated by anonym 2017-03-29 15:21:59

  • Status changed from Confirmed to In Progress

Applied in changeset commit:729baa2b32b0d4c43c8cecbc5d5fc5a30ba3e40f.

#3 Updated by intrigeri 2017-03-30 10:28:56

anonym wrote:
> Let’s see what Jenkins thinks. I intend to test this branch many times before considering it further for inclusion in Tails 2.12.

Let me know if you want me to run it many times on a poorer Internet connection.

#4 Updated by anonym 2017-03-31 13:42:06

intrigeri wrote:
> anonym wrote:
> > Let’s see what Jenkins thinks. I intend to test this branch many times before considering it further for inclusion in Tails 2.12.
>
> Let me know if you want me to run it many times on a poorer Internet connection.

My branch seem to make /usr/local/sbin/tor-has-bootstrapped always fail, so I have to try to understand how all that systemd state works again.

Until then I’d appreciate if you could build and run the test suite for these branches:

  • test/tor-bootstrap-robustness_tor-0.2.9.x
  • test/tor-bootstrap-robustness_tor-0.3.1.x

#5 Updated by intrigeri 2017-04-01 07:08:09

> My branch seem to make /usr/local/sbin/tor-has-bootstrapped always fail, so I have to try to understand how all that systemd state works again.

I can give a hand if needed (over the week-end, or on Wednesday).

> Until then I’d appreciate if you could build and run the test suite for these branches:

ACK. On which ticket shall I report back?

#6 Updated by anonym 2017-04-01 11:41:23

intrigeri wrote:
> > My branch seem to make /usr/local/sbin/tor-has-bootstrapped always fail, so I have to try to understand how all that systemd state works again.
>
> I can give a hand if needed (over the week-end, or on Wednesday).

I’ve fixed it, so please test the feature branch as well!

> > Until then I’d appreciate if you could build and run the test suite for these branches:
>
> ACK. On which ticket shall I report back?

Here is good, but I’ll treat those results with a grain of salt since they also kill tordate. IMHO they still give good data about tor-specific bootstrapping issues.

#7 Updated by intrigeri 2017-04-01 13:55:00

> I’ve fixed it, so please test the feature branch as well!

OK, I’m on it.

>> > Until then I’d appreciate if you could build and run the test suite for these branches:
>>
>> ACK. On which ticket shall I report back?

> Here is good, but I’ll treat those results with a grain of salt since they also kill tordate. IMHO they still give good data about tor-specific bootstrapping issues.

I’ve run the test suite from both branches and they completed successfully, with a (download) saturated and relatively slow Internet connection.

#8 Updated by intrigeri 2017-04-01 18:32:01

> I’ve fixed it, so please test the feature branch as well!

Passes!

#9 Updated by intrigeri 2017-04-04 05:56:36

Please reassign to me as soon as the branch builds again and is ready for code review (even if you want to run more tests on it).

#10 Updated by anonym 2017-04-06 09:25:14

  • Assignee changed from anonym to intrigeri
  • % Done changed from 20 to 50

So not a single bootstrap issue has been observed by you, me, sid-Jenkins and lizard-Jenkins — please review’n’merge!

#11 Updated by intrigeri 2017-04-06 09:33:04

  • related to Bug #10238: Investigate why Tor sometimes is slow at or even fails bootstrapping added

#12 Updated by intrigeri 2017-04-06 09:59:01

  • Assignee changed from intrigeri to anonym

Code review at commit:4d3ba48037be1f42576c75b7092feda3079c8614:

  • In get_tor_control_socket_path(), I think we should also look in /usr/share/tor/tor-service-defaults-torrc (this is the place where the default socket is configured on Debian), otherwise looking in /etc/tor/torrc is kinda futile.
  • It’s a bit sad to add a dependency on socat, while we already include nc that (iirc) can talk to Unix sockets.
  • I’m curious why tor_bootstrap_progress() doesn’t need to pass -n to echo anymore.
  • I’d rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

IMO none of these are real blockers for merging, but I would appreciate if the 3 first items on this list were fixed during the freeze.

So please make sure that Scenario: Clock with host's time in bridge mode passes (it’s disabled on Jenkins), merge, and reassign to you for these fixes.

What’s “sid-Jenkins”?

#13 Updated by anonym 2017-04-06 11:48:38

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Ready for QA to Info Needed

intrigeri wrote:
> Code review at commit:4d3ba48037be1f42576c75b7092feda3079c8614:
>
> * In get_tor_control_socket_path(), I think we should also look in /usr/share/tor/tor-service-defaults-torrc (this is the place where the default socket is configured on Debian), otherwise looking in /etc/tor/torrc is kinda futile.

Good point! Will fix.

> * It’s a bit sad to add a dependency on socat, while we already include nc that (iirc) can talk to Unix sockets.

We install netcat-traditional, which can’t talk to UNIX sockets. netcat-openbsd can, though. I didn’t want to potentially break other stuff we use nc/@netcat@ by switching implementation, but I realize that was perhaps a bit overly cautious since our only other uses are:

features/step_definitions/tor.rb:    cmd = "echo | netcat #{host} #{port}"
features/step_definitions/tor.rb:    cmd = "echo | netcat -u #{host} #{port}"


So, I could try switching to netcat-openbsd and make sure that the tests affected still pass. Ok?

> * I’m curious why tor_bootstrap_progress() doesn’t need to pass -n to echo anymore.

I’m curious why it was ever needed. Do you have an explanation?

IMHO, in general, if you need to worry about newlines in shell function “return values” (now I’m talking about what they write, not the return status) you are doing something wrong. At least with my current understanding of shell expansion rules. Am I missing something?

> * I’d rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

As someone who learned regexes through via grep and sed without --extended-regexp I could return your argument right back to you, but I get that probably more people than not just get confused by all those escapes. :)

> IMO none of these are real blockers for merging, but I would appreciate if the 3 first items on this list were fixed during the freeze.

I think I’ll try to do all this before merging any way. The amount of work needed to track this will be in the same order as actually fixing them. :)

> So please make sure that Scenario: Clock with host's time in bridge mode passes (it’s disabled on Jenkins), merge, and reassign to you for these fixes.

Yes!

> What’s “sid-Jenkins”?

I meant si*b*-Jenkins. :)

#14 Updated by intrigeri 2017-04-06 12:01:30

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Info Needed to Dev Needed

> intrigeri wrote:
>> * It’s a bit sad to add a dependency on socat, while we already include nc that (iirc) can talk to Unix sockets.

> We install netcat-traditional, which can’t talk to UNIX sockets. netcat-openbsd can, though.

FWIW we have switched to netcat-openbsd on feature/stretch :)

> I didn’t want to potentially break other stuff we use nc/@netcat@ by switching implementation,

Makes sense.

> but I realize that was perhaps a bit overly cautious since our only other uses are:
>

> features/step_definitions/tor.rb:    cmd = "echo | netcat #{host} #{port}"
> features/step_definitions/tor.rb:    cmd = "echo | netcat -u #{host} #{port}"
> 

… and I bet we already have adapted the code to use netcat-openbsd on feature/stretch :)

> So, I could try switching to netcat-openbsd and make sure that the tests affected still pass. Ok?

Yes :)

>> * I’m curious why tor_bootstrap_progress() doesn’t need to pass -n to echo anymore.

> I’m curious why it was ever needed. Do you have an explanation?

I guess someone (possibly me) wanted to ensure that we return an integer, not an integer followed by a newline.

> IMHO, in general, if you need to worry about newlines in shell function “return values” (now I’m talking about what they write, not the return status) you are doing something wrong. At least with my current understanding of shell expansion rules. Am I missing something?

I’m probably missing something wrt. how a function’s stdout is captured by $(). Forget it.

>> * I’d rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

> As someone who learned regexes through via grep and sed without --extended-regexp I could return your argument right back to you, but I get that probably more people than not just get confused by all those escapes. :)

Forget it then (even though all the programming languages we use in Tails default to something like PCRE, so it’s not as if we were going to standardize on old-school regexps any time soon).

> I think I’ll try to do all this before merging any way. The amount of work needed to track this will be in the same order as actually fixing them. :)

OK.

#15 Updated by anonym 2017-04-07 14:56:26

  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 60
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> > intrigeri wrote:
> > but I realize that was perhaps a bit overly cautious since our only other uses are:
> > […]
>
> … and I bet we already have adapted the code to use netcat-openbsd on feature/stretch :)

Good!

> > So, I could try switching to netcat-openbsd and make sure that the tests affected still pass. Ok?
>
> Yes :)

Done, but then I found a bug in tor, see commit:8b950d5902ad1a157a99cca234b5ddd32e661c89. So we’re back to TCP communication again, and I had to disable PrivateNetwork=yes in a systemd unit file that were using the tor.sh library.

> >> * I’d rather see extended regexp:s, because I would prefer not to have to learn another regexp language.
>
> > As someone who learned regexes through via grep and sed without --extended-regexp I could return your argument right back to you, but I get that probably more people than not just get confused by all those escapes. :)
>
> Forget it then (even though all the programming languages we use in Tails default to something like PCRE, so it’s not as if we were going to standardize on old-school regexps any time soon).

Note that I was mostly teasing you! :) I have switched to extended regexes in this file, and will try to remember doing so for any future shell scripting I do.

> > I think I’ll try to do all this before merging any way. The amount of work needed to track this will be in the same order as actually fixing them. :)
>
> OK.

In the end I didn’t, and I wish I had at least tested “bridge mode” and Tor Launcher, cause it’s broken, and will be so in 2.12~rc1. The current state of the feature branch should fix this though => please review’n’re-merge into testing and devel!

#16 Updated by intrigeri 2017-04-09 11:30:43

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Hi!

> Note that I was mostly teasing you! :) I have switched to extended regexes in this file, and will try to remember doing so for any future shell scripting I do.

Thanks! :)

> In the end I didn’t, and I wish I had at least tested “bridge mode” and Tor Launcher, cause it’s broken, and will be so in 2.12~rc1.

Ouch :/

> The current state of the feature branch should fix this though => please review’n’re-merge into testing and devel!

Merged, as I understand this does improve the current state of things.

Some bonus nitpicking (surprise!):

  • control_port="$(tor_rc_lookup ControlPort)" assumes that no flag is set. Perhaps reintroduce | awk '{ print $1 }') that you had added in tor_control_socket_path?
  • In theory, ControlPort accepts unix:path and auto values: Maybe make sure it contains only digits?

Feel free to tell me that you have more important Foundations Team stuff to work on than addressing these, if that’s how you feel :) If that’s the case, mark this ticket as Fix Committed.

#17 Updated by anonym 2017-04-10 08:20:44

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 60 to 100
  • QA Check changed from Dev Needed to Pass

intrigeri wrote:
> > The current state of the feature branch should fix this though => please review’n’re-merge into testing and devel!
>
> Merged, as I understand this does improve the current state of things.

Thanks!

> Some bonus nitpicking (surprise!):
>
> * control_port="$(tor_rc_lookup ControlPort)" assumes that no flag is set. Perhaps reintroduce | awk '{ print $1 }') that you had added in tor_control_socket_path?
> * In theory, ControlPort accepts unix:path and auto values: Maybe make sure it contains only digits?
>
> Feel free to tell me that you have more important Foundations Team stuff to work on than addressing these, if that’s how you feel :) If that’s the case, mark this ticket as Fix Committed.

Yeah, I guess I should focus on other things: my inner compulsively refactoring perfectionist wanted me to do things like these, but I thought “we are not really trying to make a user-friendly general purpose library for others’ consumption, so let’s cheat by making some assumptions that likely will remain true for Tails in order to avoid over-engineering this”. If we are gonna do all this correctly, we should also be able to handle multiple ControlPort directives, ControlPort unix:PATH vs the equiavalent ControlSocket PATH, ControlListenAddress etc. Let’s not bother!

#18 Updated by intrigeri 2017-04-10 09:37:13

> Let’s not bother!

ACK

#19 Updated by anonym 2017-04-19 17:25:07

  • Status changed from Fix committed to Resolved

#20 Updated by bertagaz 2017-07-31 10:41:05

  • related to Bug #13541: Tor still sometimes fails to bootstrap in the test suite added