Feature #9668

Retry whois request on "WHOIS LIMIT EXCEEDED"

Added by intrigeri 2015-07-03 00:40:23 . Updated 2015-08-11 10:49:05 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Test suite
Target version:
Start date:
2015-07-03
Due date:
% Done:

100%

Feature Branch:
kytv:test/9668-whois
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

I’ve seen:

  Scenario: whois(1) should work and go through Tor.                     # features/torified_misc.feature:29
    When I query the whois directory service for "torproject.org"        # features/step_definitions/torified_misc.rb:1
    Then the whois command is successful                                 # features/step_definitions/torified_misc.rb:17
    Then the whois standard output contains "The Tor Project"            # features/step_definitions/torified_misc.rb:27
      The whois standard output does not contain The Tor Project:
      WHOIS LIMIT EXCEEDED - SEE WWW.PIR.ORG/WHOIS FOR DETAILS

      .
      <nil> is not true. (Test::Unit::AssertionFailedError)
      ./features/step_definitions/torified_misc.rb:29:in `/^the (wget|whois) standard output contains "([^"]+)"$/'
      features/torified_misc.feature:32:in `Then the whois standard output contains "The Tor Project"'

I guess that this means that the exit node’s IP has made too many whois requests to that server => we should switch circuits and retry.


Subtasks


Related issues

Blocked by Tails - Feature #9518: Retry with new OpenPGP key server pool member when they misbehave Resolved 2015-06-02

History

#1 Updated by intrigeri 2015-07-03 00:40:39

  • blocks #8538 added

#2 Updated by intrigeri 2015-07-03 00:41:14

Tentatively assigned to kytv for 1.5 like the parent ticket, feel free to triage as you wish.

#3 Updated by kytv 2015-07-03 05:46:00

  • Status changed from Confirmed to In Progress

#4 Updated by kytv 2015-07-03 06:37:41

  • % Done changed from 0 to 30
calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_is_working
call returned: [1, "", ""]
calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_is_working
call returned: [1, "", ""]
calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_is_working
call returned: [0, "", ""]
spawning as root: restart-vidalia
spawn returned: [0, "", ""]
calling as amnesia: whois 'torproject.org'
call returned: [0, "WHOIS LIMIT EXCEEDED - SEE WWW.PIR.ORG/WHOIS FOR DETAILS\n", ""]
Forcing new Tor circuit... (attempt #1)
calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_control_send "signal NEWNYM"
call returned: [0, "250 OK\n250 OK\n250 closing connection\n", ""]
calling as amnesia: whois 'torproject.org'
call returned: [0, "Domain Name:TORPROJECT.ORG\nDomain ID: D130971538-LROR\nCreation Date: 2006-10-17T22:02:50Z\nUpdated Date: 2014-12-27T10:42:24Z\nRegistry Expiry Date: 2015-10-17T22:02:50Z\nSponsoring Registrar:CSL Computer Service Langenbach GmbH d/b/a joker.com a German GmbH (R25-LROR)\nSponsoring Registrar IANA ID: 113\nWHOIS Server: \nReferral URL: \nDomain Status: clientTransferProhibited -- http://www.icann.org/epp#clientTransferProhibited\nRegistrant ID:CORG-167587\nRegistrant Name:- -\nRegistrant Organization:The Tor Project, Inc\nRegistrant Street: 7 Temple Street\nRegistrant Street: Suite A\nRegistrant City:Cambridge\nRegistrant State/Province:MA\nRegistrant Postal Code:02139-2403\nRegistrant Country:US\nRegistrant Phone:+1.7819481982\nRegistrant Phone Ext: \nRegistrant Fax: \nRegistrant Fax Ext: \nRegistrant Email:hostmaster@torproject.org\nAdmin ID:CORG-288059\nAdmin Name:The Tor Project\nAdmin Organization:The Tor Project\nAdmin Street: 7 Temple Street\nAdmin Street: Suite A\nAdmin City:Cambridge\nAdmin State/Province:MA\nAdmin Postal Code:02139-2403\nAdmin Country:US\nAdmin Phone:+1.7819481982\nAdmin Phone Ext: \nAdmin Fax: \nAdmin Fax Ext: \nAdmin Email:hostmaster@torproject.org\nTech ID:CORG-288059\nTech Name:The Tor Project\nTech Organization:The Tor Project\nTech Street: 7 Temple Street\nTech Street: Suite A\nTech City:Cambridge\nTech State/Province:MA\nTech Postal Code:02139-2403\nTech Country:US\nTech Phone:+1.7819481982\nTech Phone Ext: \nTech Fax: \nTech Fax Ext: \nTech Email:hostmaster@torproject.org\nName Server:NS1.TORPROJECT.ORG\nName Server:NS2.TORPROJECT.ORG\nName Server:NS4.TORPROJECT.ORG\nName Server:NS5.TORPROJECT.ORG\nName Server: \nName Server: \nName Server: \nName Server: \nName Server: \nName Server: \nName Server: \nName Server: \nName Server: \nDNSSEC:signedDelegation\nDS Created 1:2014-08-03T06:57:36Z\nDS Key Tag 1:32884\nAlgorithm 1:8\nDigest Type 1:2\nDigest 1:5428ccb7038fee088fc08c1f519142bcb19da7736c0eed5ebeef489b8e4deb79\nDS Maximum Signature Life 1:1814400 seconds\n\nAccess to Public Interest Registry WHOIS information is provided to assist persons in determining the contents of a domain name registration record in the Public Interest Registry registry database. The data in this record is provided by Public Interest Registry for informational purposes only, and Public Interest Registry does not guarantee its accuracy. This service is intended only for query-based access. You agree that you will use this data only for lawful purposes and that, under no circumstances will you use this data to(a) allow, enable, or otherwise support the transmission by e-mail, telephone, or facsimile of mass unsolicited, commercial advertising or solicitations to entities other than the data recipient's own existing customers; or (b) enable high volume, automated, electronic processes that send queries or data to the systems of Registry Operator, a Registrar, or Afilias except as reasonably necessary to register domain names or modify existing registrations. All rights reserved. Public Interest Registry reserves the right to modify these terms at any time. By submitting this query, you agree to abide by this policy. For more information on Whois status codes, please visit https://www.icann.org/resources/pages/epp-status-codes-2014-06-16-en.\n\n\n", ""]
  Scenario: whois(1) should work and go through Tor.              # features/torified_misc.feature:25
    When I query the whois directory service for "torproject.org" # features/step_definitions/torified_misc.rb:1
    Then the whois command is successful                          # features/step_definitions/torified_misc.rb:31
    Then the whois standard output contains "The Tor Project"     # features/step_definitions/torified_misc.rb:41

#5 Updated by kytv 2015-07-03 06:42:23

  • blocked by Feature #9518: Retry with new OpenPGP key server pool member when they misbehave added

#6 Updated by kytv 2015-07-03 06:43:49

This ticket currently depends on 2f25a19 from Feature #9518, the custom exception for execute_successfully.

#7 Updated by kytv 2015-07-05 09:57:17

  • Assignee changed from kytv to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to kytv:test/9668-whois

Now that Feature #9518 has been merged…

#8 Updated by intrigeri 2015-07-05 10:51:36

  • Assignee changed from intrigeri to kytv
  • QA Check changed from Ready for QA to Dev Needed

Thanks! It seems to me that the proposed branch retries for all kinds of whois failures, and not only the ones about IP rate limiting, which means that we’re going to mask failures I’d rather catch — I have to say I don’t like this much, but perhaps I misunderstood the idea?

It also makes the the whois command is successful step useless, since it now uses vm.execute_successfully (for good reasons AFAICT), so the resulting test suite seems inconsistent to me. So, I suggest:

  • merging the two first steps into a single one, called e.g. I successfully query the whois directory service for;
  • only retrying on IP rate limiting errors, that is not catching ExecutionFailedInVM, and specifically looking for “LIMIT EXCEEDED” in stdout instead of looking for the domain name;
  • letting the last and dedicated step check for the expected text (on success) in stdout.

What do you think?

#9 Updated by kytv 2015-07-05 11:40:44

intrigeri wrote:
> Thanks! It seems to me that the proposed branch retries for all kinds of whois failures, and not only the ones about IP rate limiting, which means that we’re going to mask failures I’d rather catch — I have to say I don’t like this much, but perhaps I misunderstood the idea?

Perhaps it was me who misunderstood. I thought retrying whenever it failed would be good because any failure would be due to rate limiting errors or errors communicating with whichever whois server we hit, that is, not errors that ‘we’ can actually fix.

> [….]

> What do you think?

I’ll update this based on the suggestions.

#10 Updated by intrigeri 2015-07-06 01:16:00

> Perhaps it was me who misunderstood. I thought retrying whenever it failed would be good because any failure would be due to rate limiting errors or errors communicating with whichever whois server we hit, that is, not errors that ‘we’ can actually fix.

Well, a test suite should not assume that our side of things is perfect, and any error is caused by the Internet / Tor / the remote server. I’m all for retrying when a specific transient error that we can’t fix on our side is detected, but not with “oh it failed, let’s retry” in the general case.

#11 Updated by kytv 2015-07-06 08:04:07

  • Assignee changed from kytv to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

Suggested changes made & pushed.

#12 Updated by kytv 2015-07-06 10:56:28

This is an example of what I was referring to:

call returned: [0, "", ""]
spawning as root: restart-vidalia
spawn returned: [0, "", ""]
calling as amnesia: whois 'torproject.org'
call returned: [2, "", "Timeout.\n"]
Command failed: whois 'torproject.org'
error code: 2
stderr: Timeout.

<false> is not true.
  Scenario: whois(1) should work and go through Tor.                           # features/torified_misc.feature:83
    When I successfully query the whois directory service for "torproject.org" # features/step_definitions/torified_misc.rb:1
      ExecutionFailedInVM (ExecutionFailedInVM)
      ./features/support/helpers/vm_helper.rb:382:in `rescue in execute_successfully'
      ./features/support/helpers/vm_helper.rb:378:in `execute_successfully'
      ./features/step_definitions/torified_misc.rb:6:in `/^I successfully query the whois directory service for "([^"]+)"$/'
      features/torified_misc.feature:84:in `When I successfully query the whois directory service for "torproject.org"'
    Then the whois standard output contains "The Tor Project"                  # features/step_definitions/torified_misc.rb:28

#13 Updated by intrigeri 2015-07-07 06:20:08

  • Assignee changed from intrigeri to kytv
  • QA Check changed from Ready for QA to Dev Needed

> This is an example of what I was referring to:

Good, this gives an example for “I’m all for retrying when a specific transient error that we can’t fix on our side is detected” => please catch this specific error (looking for Timeout. in stderr) and retry. Either in this ticket/branch, or file a dedicated one and reassign this one to me for QA.

#14 Updated by kytv 2015-07-08 13:10:12

intrigeri wrote:
> > This is an example of what I was referring to:
>
> Good, this gives an example for “I’m all for retrying when a specific transient error that we can’t fix on our side is detected” => please catch this specific error (looking for Timeout. in stderr) and retry. Either in this ticket/branch, or file a dedicated one and reassign this one to me for QA.

On it.

and another exception I’ll allow for:

spawn returned: [0, "", ""]
calling as amnesia: whois 'torproject.org'
call returned: [2, "", "[Jul 08 19:48:00] ERROR torsocks[6046]: Unable to resolve. Status reply: 4 (in socks5_recv_resolve_reply() at socks5.c:657)\ngetaddrinfo(whois.pir.org): Non-recoverable failure in name resolution\n"]
Command failed: whois 'torproject.org'
error code: 2
stderr: [Jul 08 19:48:00] ERROR torsocks[6046]: Unable to resolve. Status reply: 4 (in socks5_recv_resolve_reply() at socks5.c:657)
getaddrinfo(whois.pir.org): Non-recoverable failure in name resolution
. 

#15 Updated by intrigeri 2015-07-08 14:34:33

> On it.

:)

> and another exception I’ll allow for:

Totally makes sense to me!

#16 Updated by kytv 2015-07-08 16:16:24

intrigeri wrote:

> It also makes the the whois command is successful step useless, since it now uses vm.execute_successfully (for good reasons AFAICT), so the resulting test suite seems inconsistent to me.

As I’m going into this a bit further, it seems I’ll have to revert back to using vm.execute since we want to capture stderr.

As I see it:

execute_successfully will return true on success and false on failure. When true, stdout can be captured. When LIMIT EXCEEDED is in the whois results, whois exits 0 since whois did its job by returning what the server sent.

On failure, such as with the Timeout or Unable to resolve errors, as expected whois does not exit 0. Since whois exits non-zero, vm_execute_res will be false, causing vm_execute_res.stderr to be undefined.

spawning as root: restart-vidalia
spawn returned: [0, "", ""]
calling as amnesia: whois 'torproject.org'
call returned: [2, "", "Timeout.\n"]
Command failed: whois 'torproject.org'
error code: 2
stderr: Timeout.

<false> is not true.
  Scenario: whois(1) should work and go through Tor.                           # features/torified_misc.feature:47
    When I successfully query the whois directory service for "torproject.org" # features/step_definitions/torified_misc.rb:1
      undefined method `stderr' for nil:NilClass (NoMethodError)
      ./features/step_definitions/torified_misc.rb:12:in `rescue in block in <top (required)>'
      ./features/step_definitions/torified_misc.rb:5:in `/^I successfully query the whois directory service for "([^"]+)"$/'
      features/torified_misc.feature:48:in `When I successfully query the whois directory service for "torproject.org"'
    Then the whois standard output contains "The Tor Project"                  # features/step_definitions/torified_misc.rb:32

So—unless I’m wrong about this—I’ll revert to using execute for whois at least.

#17 Updated by intrigeri 2015-07-09 09:22:24

> execute_successfully will return true on success and false on failure.

Really? I thought it raised an exception on failure.
In any case, yes, using execute is the way to go whenever you want to analyze stderr, iirc.

#18 Updated by kytv 2015-07-09 11:13:11

intrigeri wrote:
> > execute_successfully will return true on success and false on failure.
>
> Really? I thought it raised an exception on failure.

Yes, that’s exactly right. I commented/posted too late at night.

> In any case, yes, using execute is the way to go whenever you want to analyze stderr, iirc.

Thanks for confirming, that coincides my experience(s).

#19 Updated by kytv 2015-07-10 02:51:08

  • Assignee changed from kytv to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

Changes pushed after lots (and lots) of testing.

#20 Updated by intrigeri 2015-07-11 07:04:14

  • Assignee changed from intrigeri to kytv
  • % Done changed from 30 to 50
  • QA Check changed from Ready for QA to Dev Needed

Great job, one can see that you put a lot of attention to details in this branch \o/

My only remaining comments are design ones, so let me first state that I’m not fully up-to-date wrt. the design patterns you and anonym settled on, and also I lack some of the big picture nowadays, so some of my comments may totally be disputable: I’m not necessarily more expert than you in this area anymore :)

##{@new_circuit_tries})

Why two hash chars? (sorry if my Ruby is poor; hint: it is :)

+def force_new_tor_circuit
+[...]
+  step 'I force Tor to use a new circuit'

I’m not convinced by the soundness of having a function (itself called by steps) calling a step… especially when that step is currently run only from that function. I suggest to move all the logic from the step to the function, and then turn the step into a thin wrapper around the function: this way, we avoid the “calling steps from steps” ugliness in current code (steps can instead call the function directly), but still we keep the possibility open to add the step to a scenario in the future.

+    begin
+      @vm_execute_res = @vm.execute("whois '#{domain}'", LIVE_USER)
+      assert(!@vm_execute_res.stdout['LIMIT EXCEEDED'])
+      raise WhoisLookupFailure unless @vm_execute_res.success?
+      break
+    rescue Test::Unit::AssertionFailedError
+      force_new_tor_circuit
+    rescue WhoisLookupFailure => e

Why not a dedicated exception class for ‘LIMIT EXCEEDED’? It feels inconsistent with the fact we’re introcing one (WhoisLookupFailure) for the fallback kind of failure.

To end with I’m not convinced with the changes to the wget scenarios: one now has only a When step, which feels weird when reading the gherkin text. Also, the others now have only When and And, and in the current state of things the latter should be a Then instead. What benefit do we get from these changes? (except consistency with the whois scenarios, but that’s consistency with suboptimal design — half of the actual test is done in a When step, which is contrary to cucumber best practices — so IMO that wouldn’t be a good reason per-se).

#21 Updated by kytv 2015-07-11 10:44:41

intrigeri wrote:
> Great job, one can see that you put a lot of attention to details in this branch \o/
>

yay

> My only remaining comments are design ones, so let me first state that I’m not fully up-to-date wrt. the design patterns you and anonym settled on, and also I lack some of the big picture nowadays, so some of my comments may totally be disputable: I’m not necessarily more expert than you in this area anymore :)
>
> […]
>
> Why two hash chars? (sorry if my Ruby is poor; hint: it is :)

The second one is there so I can print a literal #, such as

Forcing new Tor circuit... (attempt #4)

>
> […]
>
> I’m not convinced by the soundness of having a function (itself called by steps) calling a step… especially when that step is currently run only from that function. I suggest to move all the logic from the step to the function, and then turn the step into a thin wrapper around the function: this way, we avoid the “calling steps from steps” ugliness in current code (steps can instead call the function directly), but still we keep the possibility open to add the step to a scenario in the future.

I’m glad you agree. :) I considered that after setting to RfQA and I changed it locally but didn’t commit and push yet.

>
> […]
>
> Why not a dedicated exception class for ‘LIMIT EXCEEDED’? It feels inconsistent with the fact we’re introcing one (WhoisLookupFailure) for the fallback kind of failure.

Touché. To be addressed.

>
> To end with I’m not convinced with the changes to the wget scenarios: one now has only a When step, which feels weird when reading the gherkin text. Also, the others now have only When and And, and in the current state of things the latter should be a Then instead. What benefit do we get from these changes? (except consistency with the whois scenarios, but that’s consistency with suboptimal design — half of the actual test is done in a When step, which is contrary to cucumber best practices — so IMO that wouldn’t be a good reason per-se).

That was done for Bug #9715 so that wget can be retried for transient errors.

(As for the suboptimal design..hmm..I’m not yet seeing how else to handle the retrying; my ruby-fu is still fairly weak).

#22 Updated by kytv 2015-07-11 11:48:09

kytv wrote:
> intrigeri wrote:

> > To end with I’m not convinced with the changes to the wget scenarios: one now has only a When step, which feels weird when reading the gherkin text. Also, the others now have only When and And, and in the current state of things the latter should be a Then instead. What benefit do we get from these changes? (except consistency with the whois scenarios, but that’s consistency with suboptimal design — half of the actual test is done in a When step, which is contrary to cucumber best practices — so IMO that wouldn’t be a good reason per-se).
>
> That was done for Bug #9715 so that wget can be retried for transient errors.
>
> (As for the suboptimal design..hmm..I’m not yet seeing how else to handle the retrying; my ruby-foo is still fairly weak).

To be clear: I didn’t know how to do the retrying without combining the steps due to vm_execute_res.success? being False when either whois timeout:s or the domain name fails to resolve.

#23 Updated by intrigeri 2015-07-11 13:50:09

>> (As for the suboptimal design..hmm..I’m not yet seeing how else to handle the retrying; my ruby-foo is still fairly weak).

> To be clear: I didn’t know how to do the retrying without combining the steps due to vm_execute_res.success? being False when either whois timeout:s or the domain name fails to resolve.

Here’s the way I would (try to) do it: the responsibility of the step that runs whois would only be to run whois, as many times needed to hide the specific errors that we have identified as being transient; it would pass as long as whois completes without raising one of those errors, after the specified maximum amount of retries, and regardless of its exit code. And the next steps (Then) would merely handle checking 1. the exit code; 2. the stdout for the expected content… just like they do in current devel. Of course I’m probably missing something that makes this hard, ugly, or even impossible.

#24 Updated by kytv 2015-07-12 00:22:14

intrigeri wrote:
> Here’s the way I would (try to) do it:

This should have been obvious to me it wasn’t right away: It’s not (always) wrong to ignore an exception that you yourself are raising; the next step will check success anyway.

Sigh. Sorry for the obtuseness.

-When /^I successfully query the whois directory service for "([^"]+)"$/ do |domain|
+When /^I query the whois directory service for "([^"]+)"$/ do |domain|
   next if @skip_steps_while_restoring_background
   @new_circuit_tries = 0
   until @new_circuit_tries == $config["MAX_NEW_TOR_CIRCUIT_RETRIES"] do
@@ -11,13 +11,11 @@ When /^I successfully query the whois directory service for "([^"]+)"$/ do |doma
         raise WhoisLookupFailure
       end
       break
-    rescue WhoisLookupFailure => e
+    rescue WhoisLookupFailure
       if @vm_execute_res.stderr['Timeout'] || \
          @vm_execute_res.stderr['Unable to resolve'] || \
          @vm_execute_res.stdout['LIMIT EXCEEDED']
         force_new_tor_circuit
-      else
-        raise e
       end
     end
   end

I think this and reverting my “combining commit” will address this design flaw. Testing.

#25 Updated by kytv 2015-07-12 10:13:28

  • Assignee changed from kytv to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

Changes made & pushed (and tested repeatedly throughout the day). I’m confident the test behaves as it should and the “design” should be much better.

If this is acceptable I’ll push my branch for Bug #9715 which is much of the same but for wget.

#26 Updated by intrigeri 2015-07-13 01:02:57

  • % Done changed from 50 to 60

Code review passes. Now testing.

#27 Updated by kytv 2015-07-13 01:27:35

Applied in changeset commit:5a9700169f7f0700b72acd95bdd574f2188451a6.

#28 Updated by kytv 2015-07-13 01:27:36

Applied in changeset commit:384daf878a62b2d1510dd7b29cd5bf217d9df847.

#29 Updated by intrigeri 2015-07-13 01:27:37

  • Status changed from In Progress to Fix committed
  • % Done changed from 60 to 100

Applied in changeset commit:cacd5a4c737829fa2a3793bc305deab5ab82943a.

#30 Updated by intrigeri 2015-07-13 01:28:25

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

Merged, congrats!

#31 Updated by cypherpunks 2015-08-05 09:38:56

Couldn’t an adversary deliberately cause WHOIS lookups from exit nodes to fail in an attempt to compel the creation of new Tor cicuit?

#32 Updated by intrigeri 2015-08-05 12:08:49

> Couldn’t an adversary deliberately cause WHOIS lookups from exit nodes to fail in an attempt to compel the creation of new Tor cicuit?

Yes, likely some adversaries can do that.
Did you notice that this landed in our test suite?

#33 Updated by BitingBird 2015-08-11 10:49:05

  • Status changed from Fix committed to Resolved