Feature #9517

Retry connecting to OFTC when it fails

Added by anonym 2015-06-02 14:03:00 . Updated 2015-07-03 03:36:08 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Test suite
Target version:
Start date:
2015-06-02
Due date:
% Done:

100%

Feature Branch:
kytv:test/9517-retry-oftc
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

That is when we get the SSL handshake error, and when the server actually inform us that Tor is block at the moment.

It may be enough to simply force new Tor circuits, e.g. by using Vidalia’s New Identity feature and then click the “Reconnect” button in Pidgin, but I’m not sure. We may have to retry multiple times in a row.


Subtasks


Related issues

Related to Tails - Bug #9632: restart-vidalia appears to occasionally fail to actually restart Vidalia Rejected 2015-06-20
Blocks Tails - Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking Resolved 2015-03-18
Blocks Tails - Feature #9518: Retry with new OpenPGP key server pool member when they misbehave Resolved 2015-06-02

History

#1 Updated by anonym 2015-06-02 14:18:14

  • blocks Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking added

#2 Updated by anonym 2015-06-02 14:19:11

  • blocks #8538 added

#3 Updated by anonym 2015-06-02 14:36:35

  • related to Bug #9478: How to deal with transient network errors in the test suite? added

#4 Updated by intrigeri 2015-06-04 21:25:36

  • blocked by deleted (#8538)

#5 Updated by intrigeri 2015-06-04 21:25:39

  • related to deleted (Bug #9478: How to deal with transient network errors in the test suite?)

#6 Updated by intrigeri 2015-06-04 21:26:34

  • blocked by deleted (Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking)

#7 Updated by intrigeri 2015-06-04 21:30:21

  • blocks Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking added

#8 Updated by intrigeri 2015-06-04 21:30:27

  • blocks #8538 added

#9 Updated by anonym 2015-06-17 02:54:27

anonym wrote:
> That is when we get the SSL handshake error, and when the server actually inform us that Tor is block at the moment.
>
> It may be enough to simply force new Tor circuits, e.g. by using Vidalia’s New Identity feature and then click the “Reconnect” button in Pidgin, but I’m not sure. We may have to retry multiple times in a row.

Now I’ve verified that new circuit => a new pool member is picked on Pidgin reconnect. However, today (2015-06-17) the blocking is particularly severe. I haven’t been able to connect to irc.oftc.net via Tor for three hours while actively issuing a NEWNYM every minute (and manually clicking reconnect every 1-5 minutes, so I’ve done 100+ connections), getting both “Banned” and “No more connections permitted from your host” as the explanation.

So the retry strategy will not solve Bug #9072 but I guess it will slightly improve it when only some pool members refuse connections via Tor. YMMV. It may still be worth implementing,

#10 Updated by anonym 2015-06-18 02:57:07

And now I’ve had 24 hours of continuous blocking…

#11 Updated by intrigeri 2015-06-18 03:09:52

> So the retry strategy will not solve Bug #9072 but I guess it will slightly improve it when only some pool members refuse connections via Tor. YMMV. It may still be worth implementing,

Yep, I think it’ll fix the most common cases.

#12 Updated by kytv 2015-06-20 03:59:21

  • related to Bug #9632: restart-vidalia appears to occasionally fail to actually restart Vidalia added

#13 Updated by kytv 2015-06-21 16:36:22

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 30

intrigeri wrote:
> > So the retry strategy will not solve Bug #9072 but I guess it will slightly improve it when only some pool members refuse connections via Tor. YMMV. It may still be worth implementing,
>
> Yep, I think it’ll fix the most common cases.

Agreed on both accounts.

I’ve got this implemented and am testing it now. It seems to be doing “the right thing” now that the blocking is quite severe. Once OFTC relents I’ll find out if I inadvertently broke anything.

(Note: I have MAX_RETRIES normally set to 5, but I want to take advantage of OFTC’s Tor blocking to test code robustness so I bumped the max retries to 10000)

calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
Forcing new Tor circuit... (attempt #157)
calling as root: pidof -x -o '%PPID' vidalia
...........

#14 Updated by kytv 2015-06-22 08:54:27

  • % Done changed from 30 to 40

I’m quite pleased with how this is going:

calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
Forcing new Tor circuit... (attempt #14)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5777\n", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5777\n", ""]
[log] TYPE "
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
[log] CLICK on (364,509)
calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
    Then Pidgin successfully connects to the "irc.oftc.net" account              # features/step_definitions/pidgin.rb:298
[log] DOUBLE CLICK on (195,236)
calling as amnesia: xdotool search --name '.*.oftc.net$' windowactivate --sync
call returned: [0, "", ""]
[log] Ctrl+TYPE "w"
[log] CLICK on (806,215)
    And I can join the "#tails" channel on "irc.oftc.net"                        # features/step_definitions/pidgin.rb:342
[log] TYPE "/topic"
    When I type "/topic"                                                         # features/step_definitions/common_steps.rb:876
[log] TYPE "
"
    And I press the "ENTER" key                                                  # features/step_definitions/common_steps.rb:881
    Then I see the Tails roadmap URL                                             # features/step_definitions/pidgin.rb:435
[log] CLICK on (451,375)
    When I click on the Tails roadmap URL                                        # features/step_definitions/pidgin.rb:446
    Then the Tor Browser has started and loaded the Tails roadmap                # features/step_definitions/common_steps.rb:372

1 scenario (1 passed)
15 steps (15 passed)
18m25.940s

#15 Updated by kytv 2015-06-22 11:06:09

  • Assignee changed from kytv to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to kytv:test/9517-retry-oftc

Note: I wanted to use waitAny in fc9b3cdbac6d but I ran into the problem in Bug #9633. Perhaps the extra wait won’t so bad in this particular case, especially at those times (like now) when OFTC is relentlessly blocking Tor.

Intrigeri, apologies in advance if I’m wrong to assign this to you. (Hopefully you’re as pleased with this branch as I am.)

#16 Updated by kytv 2015-06-22 12:23:22

Another note: By default this branch will retry with a new Tor circuit 5 times, but this can be overridden. While testing I had it set to a ridiculously high number. After 51 retries it was able to connect.

calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
Forcing new Tor circuit... (attempt #51)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "6989\n", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "6989\n", ""]
[log] TYPE "
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
[log] CLICK on (364,509)
calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
    Then Pidgin successfully connects to the "irc.oftc.net" account
[log] DOUBLE CLICK on (195,236)
calling as amnesia: xdotool search --name '.*.oftc.net$' windowactivate --sync
call returned: [0, "", ""]
[log] Ctrl+TYPE "w"
[log] Ctrl+TYPE "w"
[log] Ctrl+TYPE "w"
[log] CLICK on (806,215)
    And I can join the "#tails" channel on "irc.oftc.net" 

#17 Updated by kytv 2015-06-22 12:46:56

  • blocks Feature #9518: Retry with new OpenPGP key server pool member when they misbehave added

#18 Updated by kytv 2015-06-26 05:38:32

  • Assignee changed from intrigeri to anonym

#19 Updated by anonym 2015-06-27 08:43:31

  • Assignee changed from anonym to kytv
  • QA Check changed from Ready for QA to Dev Needed

Nice! I’ve run it a few times now when OFTC is blocking Tor, and it works as intended.

Review:

> +Then /^(in Vidalia )?I force Tor to use a new circuit$/ do |with_vidalia|

Although grammatically correct, I think it would be a better flow if the (in Vidalia )? part was moved to the end.

> + # Sometimes Sikuli gets confused and recognizes the yellow-colored vidalia systray
> + # icon as the green one. This has been seen when Vidalia needed to be
> + # restarted in the above ‘begin’ block.

Wow, this I find highly surprising and a bit scary.

Just to be clear, is that why we need the try_for in the first place? If so, perhaps the comment can be clarified a bit?

> + try_for(3 * 60) do
> + # Let’s be sure that vidalia is still running. I’d hate to spend up to
> + # three minutes waiting for an icon that isn’t there because Vidalia, for
> + # whatever reason, is no longer running…
> + step ‘process “vidalia” is running’
> + screen.type(Sikuli::Key.ESC) > + screen.wait_and_right_click(‘VidaliaSystrayReady.png’, 10)
> + screen.wait_and_click('VidaliaMenuNewIdentity.png', 10) > + screen.wait(‘VidaliaNewIdentityNotification.png’, 20)
> + end
> + screen.wait('VidaliaNewIdentityNotification.png', 20) > + screen.waitVanish(‘VidaliaNewIdentityNotification.png’, 60)

1. Why do we press escape? A comment explaining why is definitely needed in a case like this. However, I do not believe it’s safe for us to do that at all: upon entry to this step we do nothing to change the window focus so (at least in the first iteration of the try_for) that escape will be sent to whatever application has the focus. If that was a Pidgin conversation window the current tab would be closed, for instance.

2. Why do we wait twice for the notification?

3. Can’t we close the notification instead of waitVanish? That would save a couple of seconds. No big deal, though. (You can definitely skip this if you agree that we only should use Vidalia to force new circuits in a dedicated scenario, see below.)

> + else
> + @vm.execute_successfully(‘restart-tor’)
> + end

Restarting Tor to force new circuits feels a bit drastic. Let’s send a NEWNYM signal instead. AFAICT it’d be easy:

@vm.execute_successfully('. /usr/local/lib/tails-shell-library/tor.sh; tor_control_send "signal NEWNYM"')

> Then /^Pidgin successfully connects to the “([^”]+)" account$/ do |account|
[…]
> + begin
> + @screen.wait(expected_channel_entry, 60)
> + break
> + rescue FindFailed
> + tries += 1
> + STDERR.puts “Forcing new Tor circuit… (attempt ##{tries})” if $config[“DEBUG”]
> + step “in Vidalia I force Tor to use a new circuit”
> […]

On second thought, I think we actually should force new circuits without Vidalia to save some time (due to no GUI interaction). However, we should add a dedicated scenario (in some other feature) for Vidalia’s “New identity” feature so your code is still needed! If you agree, let’s open a ticket for that (no need to do it now IMHO).

> +def focus_pidgin_conversation_window(account)
> + case account
> + when /.*\.oftc\.net$/
> + @vm.focus_window(“.*\.oftc\.net$”)
> + end
> +end

It’s a bit confusing that we in other parts are using @vm.focus_window directly to focus Pidgin’s conversation window. Instead of doing this on a case-by-case basis I think we could:

def focus_pidgin_conversation_window(account)
  @vm.focus_window(".*#{Regexp.escape(account)}.*")
end


and then use it in the other places as well. We could also add a function focus_pidgin_buddy_list.

> +def close_pidgin_conversation_window(account)
> + focus_pidgin_conversation_window(account)
> + @screen.type(Sikuli::Key.F4, Sikuli::KeyModifier.ALT)

I’m afraid this is racy but let’s see how it works out. It looks good so far at least.

I’m gonna look into Bug #9633.

#20 Updated by kytv 2015-06-27 11:31:53

anonym wrote:
> Nice! I’ve run it a few times now when OFTC is blocking Tor, and it works as intended.
>
> Review:
>
> > +Then /^(in Vidalia )?I force Tor to use a new circuit$/ do |with_vidalia|
>
> Although grammatically correct, I think it would be a better flow if the (in Vidalia )? part was moved to the end.

OK.

>
> > + # Sometimes Sikuli gets confused and recognizes the yellow-colored vidalia systray
> > + # icon as the green one. This has been seen when Vidalia needed to be
> > + # restarted in the above ‘begin’ block.
>
> Wow, this I find highly surprising and a bit scary.

Agreed.

> Just to be clear, is that why we need the try_for in the first place? If so, perhaps the comment can be clarified a bit?

Yes, this is exactly why I used try_for there.

> > + try_for(3 * 60) do
> > + # Let’s be sure that vidalia is still running. I’d hate to spend up to
> > + # three minutes waiting for an icon that isn’t there because Vidalia, for
> > + # whatever reason, is no longer running…
> > + step ‘process “vidalia” is running’
> > + screen.type(Sikuli::Key.ESC) > > + screen.wait_and_right_click(‘VidaliaSystrayReady.png’, 10)
> > + screen.wait_and_click('VidaliaMenuNewIdentity.png', 10) > > + screen.wait(‘VidaliaNewIdentityNotification.png’, 20)
> > + end
> > + screen.wait('VidaliaNewIdentityNotification.png', 20) > > + screen.waitVanish(‘VidaliaNewIdentityNotification.png’, 60)
>
> 1. Why do we press escape? A comment explaining why is definitely needed in a case like this. However, I do not believe it’s safe for us to do that at all: upon entry to this step we do nothing to change the window focus so (at least in the first iteration of the try_for) that escape will be sent to whatever application has the focus. If that was a Pidgin conversation window the current tab would be closed, for instance.

It was added for those cases in which Sikuli right-clicked the Vidalia icon and the menu was still displayed from the “misclick”. Without the ESC it would resolve itself when the try_for loop ran again.

>
> 2. Why do we wait twice for the notification?

Good question. To be investigated and maybe removed.

> 3. Can’t we close the notification instead of waitVanish? That would save a couple of seconds. No big deal, though. (You can definitely skip this if you agree that we only should use Vidalia to force new circuits in a dedicated scenario, see below.)
>

Could close it and/or make it a separate Vidalia step.

> > + else
> > + @vm.execute_successfully(‘restart-tor’)
> > + end
>
> Restarting Tor to force new circuits feels a bit drastic. Let’s send a NEWNYM signal instead. AFAICT it’d be easy:
> […]

Sounds good. :)

>
> > Then /^Pidgin successfully connects to the “([^”]+)" account$/ do |account|
> […]
> > + begin
> > + @screen.wait(expected_channel_entry, 60)
> > + break
> > + rescue FindFailed
> > + tries += 1
> > + STDERR.puts “Forcing new Tor circuit… (attempt ##{tries})” if $config[“DEBUG”]
> > + step “in Vidalia I force Tor to use a new circuit”
> > […]
>
> On second thought, I think we actually should force new circuits without Vidalia to save some time (due to no GUI interaction). However, we should add a dedicated scenario (in some other feature) for Vidalia’s “New identity” feature so your code is still needed! If you agree, let’s open a ticket for that (no need to do it now IMHO).

This is fine with me.

>
> > +def focus_pidgin_conversation_window(account)
> > + case account
> > + when /.*\.oftc\.net$/
> > + vm.focus_window(".*\.oftc\.net$") > > + end > > +end > > It's a bit confusing that we in other parts are using vm.focus_window directly to focus Pidgin’s conversation window. Instead of doing this on a case-by-case basis I think we could:
> […]
> and then use it in the other places as well. We could also add a function focus_pidgin_buddy_list.
>
> > +def close_pidgin_conversation_window(account)
> > + focus_pidgin_conversation_window(account)
> > + @screen.type(Sikuli::Key.F4, Sikuli::KeyModifier.ALT)
>
> I’m afraid this is racy but let’s see how it works out. It looks good so far at least.

I think it may need some tweaking but it’ll be easier to test when the nearly relentless OFTC blocking of Tor relents.

> I’m gonna look into Bug #9633.

#21 Updated by kytv 2015-06-27 16:06:21

anonym wrote:

> Restarting Tor to force new circuits feels a bit drastic. Let’s send a NEWNYM signal instead. AFAICT it’d be easy:
> […]

Excellent suggestion.

Forcing new Tor circuit... (attempt #21)
calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_control_send "signal NEWNYM"
call returned: [0, "250 OK\n250 OK\n250 closing connection\n", ""]
[log] CLICK on (364,509)
calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]

I’ve already run it a few times with success. Now to try applying/testing the proposed fix from Bug #9633

#22 Updated by kytv 2015-06-28 09:11:28

  • Assignee changed from kytv to anonym
  • QA Check changed from Dev Needed to Ready for QA

Changes pushed.

I switched to forcing a new circuit without vidalia which works very well. :)

Once 9633 is fixed in the tree I can optimize the tests with waitAny. (I already have that branch ready to go)

#23 Updated by kytv 2015-06-28 09:25:20

  • Assignee changed from anonym to intrigeri

#24 Updated by intrigeri 2015-06-28 23:37:50

  • Target version changed from Tails_1.4.1 to Tails_1.5

#25 Updated by kytv 2015-06-29 00:41:18

Applied in changeset commit:fc9b3cdbac6d05ef02b74b4fe96480c3b97c9c7d.

#26 Updated by kytv 2015-06-29 00:41:18

Applied in changeset commit:5011ea2a460ca4406b6f7197d252b17bcedb291b.

#27 Updated by kytv 2015-06-29 00:41:18

Applied in changeset commit:4098c671103bb17e7a91a06e793d2a1186e64208.

#28 Updated by intrigeri 2015-06-29 00:41:19

  • Status changed from In Progress to Fix committed
  • % Done changed from 40 to 100

Applied in changeset commit:d9800fc7cab8d7ad1c8223b6357326755c44aca3.

#29 Updated by intrigeri 2015-06-29 00:41:54

  • Assignee deleted (intrigeri)
  • Target version changed from Tails_1.5 to Tails_1.4.1
  • QA Check changed from Ready for QA to Pass

#30 Updated by intrigeri 2015-07-03 03:36:08

  • Status changed from Fix committed to Resolved