Bug #8941: The remote shell server is racing against Tails Greeter

Bug #8941

The remote shell server is racing against Tails Greeter

Added by anonym 2015-02-22 20:06:54 . Updated 2015-03-23 02:01:24 .

Status:

Resolved

Priority:

High

Assignee:

Category:

Test suite

Target version:

Tails_1.3.1

Start date:

2015-02-22

Due date:

2015-03-06

% Done:

100%

Feature Branch:

bugfix/8941-start-remote-shell-before-greeter

Type of work:

Code

Blueprint:

Starter:

Affected tool:

Deliverable for:

Description

The autotest_remote_shell.py is started after Tails Greeter (/etc/rc2.d/S05rc.local vs /etc/rc2.d/S03gdm3) which is fragile given how we wait for the remote shell to be ready (we “ask” if it’s up as soon as we see that Tails Greeter is running). In particular, in the test/7821-tor branch (for ~~Feature #7821~~) a tiny change to the remote shell server (commit 1a55905e5d51d0f578b5ab3732a8ac0453ac6363 (“Create the environment for remote shell calls more realisitcally.”)) exposes this race.

To eliminate this race we should:

migrate the remote shell startup to a dedicated initscript, that acts as a no-op unless autotest_never_use_this_option is passed on the kernel command-line
set X-Start-Before: $x-display-manager gdm gdm3 in the remote shell initscript
in autotest_remote_shell.py, touch some file after port.open() has succeeded
patch the gdm3 initscript to wait for that file to appear, when autotest_never_use_this_option is passed on the kernel command-line, before it actually starts GDM.

(above adapted from ~~Feature #7821#note-42~~)

Subtasks

Related issues

Related to Tails - ~~Feature #7821~~: Automatic tests for Tor	Resolved	2014-08-26
Related to Tails - ~~Feature #5491~~: Test suite: remote shell instability	Rejected
Related to Tails - ~~Feature #9057~~: Port the new remote shell initscript to systemd	Resolved	2015-03-15

History

#1 Updated by anonym 2015-02-22 20:07:05

related to ~~Feature #7821~~: Automatic tests for Tor added

#2 Updated by anonym 2015-02-22 20:07:19

related to ~~Feature #5491~~: Test suite: remote shell instability added

#3 Updated by intrigeri 2015-02-23 10:38:20

Parent task set to ~~Feature #8539~~

#4 Updated by intrigeri 2015-02-23 10:39:00

blocks #8538 added

#5 Updated by Tails 2015-03-06 04:06:54

Status changed from Confirmed to In Progress

Applied in changeset commit:2adf60611a6129d5dd8f63a61c8568241f323460.

#6 Updated by anonym 2015-03-06 04:27:06

% Done changed from 0 to 30
Feature Branch set to bugfix/8941-start-remote-shell-before-greeter

I’ve built an image from the feature branch and uploaded it to isotester1.lizard, and I’m currently running 200 scenarios like this:

  Scenario: Test
    Given a computer
    And I start the computer
    And the computer boots Tails

If I get no remote shell errors I’ll run the same for the 1.3 image, and hope we’ll get the errors then. :)

#7 Updated by intrigeri 2015-03-06 06:51:41

> If I get no remote shell errors I’ll run the same for the 1.3 image, and hope we’ll get the errors then. :)

Yay!

#8 Updated by anonym 2015-03-06 11:53:20

Assignee changed from anonym to intrigeri
QA Check changed from Dev Needed to Info Needed

200 scenarios (200 passed)
600 steps (600 passed)
212m14.653s

Woo! What do you think intrigeri?

In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.

From the description:
> * patch the gdm3 initscript to wait for that file to appear, when autotest_never_use_this_option is passed on the kernel command-line, before it actually starts GDM.

That was done in commit 2adf606. However, to make things less “spread out all over the place”, perhaps we could put the exact thing we patch into the gdm3 init script into the tails-autotest-remote-shell init script this branch adds? I.e. in do_start(), after we have started the script, then we wait for the state file.

#9 Updated by intrigeri 2015-03-06 12:07:11

> Woo! What do you think intrigeri?

I think: “wow!”

> In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.

Yes, please :)

> However, to make things less “spread out all over the place”, perhaps we could put the exact thing we patch into the gdm3 init script into the tails-autotest-remote-shell init script this branch adds? I.e. in do_start(), after we have started the script, then we wait for the state file.

Good idea. And then, the gdm3 init script must be modified to add a dependency on the tails-autotest-remote-shell service, to ensure proper ordering. This solution will be closer to what we can do with systemd, so less work for porting it to Jessie :)

Also, I suspect that some design doc needs updating wrt. the move of tails-autotest-remote-shell.

#10 Updated by intrigeri 2015-03-06 12:10:34

Assignee changed from intrigeri to anonym
QA Check changed from Info Needed to Dev Needed

#11 Updated by anonym 2015-03-06 15:00:56

Assignee changed from anonym to intrigeri
QA Check changed from Dev Needed to Info Needed

intrigeri wrote:
> > In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.
>
> Yes, please :)

Hrm, I’m currently at 146 passed scenarios for Tails 1.3, i.e. seems I can’t reproduce your problem, but we’ll see…

I can’t recall, what was the scenario that you tested? Something in usb_install.feature? Or did you also see it outside of that one? Do you still have the .feature file you used to spam the “problematic” scenario, if you did? And was it on isotester1 or one of your own nested setups that you saw the problem?

> > However, to make things less “spread out all over the place”, perhaps we could put the exact thing we patch into the gdm3 init script into the tails-autotest-remote-shell init script this branch adds? I.e. in do_start(), after we have started the script, then we wait for the state file.
>
> Good idea. And then, the gdm3 init script must be modified to add a dependency on the tails-autotest-remote-shell service, to ensure proper ordering. This solution will be closer to what we can do with systemd, so less work for porting it to Jessie :)

Is it really necessary to add that dependency to the gdm3 init script? In the remote shell init script we have the reverse dependency that you suggested:

X-Start-Before:    $x-display-manager gdm gdm3

which should be equivalent to the patching you suggest, i.e the following line in the gdm3 init script:

Should-start:      tails-autotest-remote-shell

Right?

> Also, I suspect that some design doc needs updating wrt. the move of tails-autotest-remote-shell.

Actually, we have no design docs on this (at least according to git grep autotest -- wiki). There’s contribute/release_process/test/automated_tests.mdwn which reads as a mixed design doc and introduction to writing tests. There’s a “Remote shell” section which I suppose could mention the paths to the init script and python script, and any path of gdm3’s init script, if needed. Perhaps this document should be split? E.g.

move the design stuff into contribute/design somewhere
move the test writing intro into@contribute/release_process/test/usage.mdwn@
make automated_tests.mdwn only point to the usage and setup instructions.

Or something?

#12 Updated by anonym 2015-03-06 16:00:06

anonym wrote:
> intrigeri wrote:
> > > In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.
> >
> > Yes, please :)
>
> Hrm, I’m currently at 146 passed scenarios for Tails 1.3, i.e. seems I can’t reproduce your problem, but we’ll see…

Damn:

200 scenarios (200 passed)
600 steps (600 passed)
248m25.386s

What am I doing wrong (or right, depending on the perspective :))?

Looking at ~~Feature #7821~~, the scenarios that failed for you were:

Scenario: Booting Tails from a USB drive upgraded from USB with persistence enabled
Scenario: Upgrading an old Tails USB installation from an ISO image, running on the old version

But I see no reason why USB boot should influence this. I conjecture that you’ve seen the boot error in the usb_install.feature scenarios simply because of probability; it’s the feature that does by far the most “real” boots (i.e. boot from scratch), whereas the other scenarios do ~one (thanks to snapshots); in fact usb_install.feature accounts for ~50% of the boots for a full run of the automated test suite. Or am I missing/forgetting something here?

#13 Updated by intrigeri 2015-03-06 18:45:34

> Damn:

Ouch!

The largest test I did was basically 1. install Tails on a virtual USB
drive; 2. boot from this USB drive dozens of times.

> But I see no reason why USB boot should influence this.

I’m absolutely not surprised that different underlying (emulated) storage technology can result in different ordering of services coming up to life.

#14 Updated by intrigeri 2015-03-06 18:47:41

> Or did you also see it outside of that one?

Outside.

> Do you still have the .feature file you used to spam the “problematic” scenario, if you did?

Nope.

> And was it on isotester1 or one of your own nested setups that you saw the problem?

Both, IIRC.

> Is it really necessary to add that dependency to the gdm3 init script? In the remote shell init script we have the reverse dependency that you suggested:

Indeed you’re right.

> There’s contribute/release_process/test/automated_tests.mdwn which reads as
> a mixed design doc and introduction to writing tests. There’s a “Remote shell”
> section which I suppose could mention the paths to the init script and python script,
> and any path of gdm3’s init script, if needed. Perhaps this document should be split?
> E.g.

> * move the design stuff into contribute/design somewhere
> * move the test writing intro into@contribute/release_process/test/usage.mdwn@
> * make automated_tests.mdwn only point to the usage and setup instructions.

If you feel like it, but don’t feel pressured. Adding links to the actual code would be a nice first step.

#15 Updated by intrigeri 2015-03-06 18:50:54

Assignee changed from intrigeri to anonym
QA Check changed from Info Needed to Dev Needed

#16 Updated by anonym 2015-03-07 02:39:53

Assignee changed from anonym to intrigeri
QA Check changed from Dev Needed to Info Needed

intrigeri wrote:
> > Damn:
>
> Ouch!
>
> The largest test I did was basically 1. install Tails on a virtual USB
> drive; 2. boot from this USB drive dozens of times.
>
> > But I see no reason why USB boot should influence this.
>
> I’m absolutely not surprised that different underlying (emulated) storage technology can result in different ordering of services coming up to life.

Well, good point. I’ll try this scenario instead, then:

@product
Feature: blah

  @keep_volumes
  Scenario: bleh
    Given a computer
    And I start Tails from DVD with network unplugged and I login
    And I create a 4 GiB disk named "current"
    And I plug USB drive "current"
    And I "Clone & Install" Tails to USB drive "current"
    Then the running Tails is installed on USB drive "current"
    But there is no persistence partition on USB drive "current"
    And I unplug USB drive "current"

  @keep_volumes
  Scenario: bluh
    Given a computer
    And the computer is set to boot from USB drive "current"
    And I start the computer
    And the computer boots Tails

  [... last one repeated 99 more times ...]

Currently running it on isotester1 with the 1.3 image in hope of seeing the error a convincing amount of times, and will re-run with a freshly built image from this branch afterwards in hope of seeing no errors again.

I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back. Ok?

> > Is it really necessary to add that dependency to the gdm3 init script? In the remote shell init script we have the reverse dependency that you suggested:
>
> Indeed you’re right.

Ok. Then we have what we need in commit ce35080.

> > There’s contribute/release_process/test/automated_tests.mdwn which reads as
> > a mixed design doc and introduction to writing tests. There’s a “Remote shell”
> > section which I suppose could mention the paths to the init script and python script,
> > and any path of gdm3’s init script, if needed. Perhaps this document should be split?
> > E.g.
>
> > * move the design stuff into contribute/design somewhere
> > * move the test writing intro into@contribute/release_process/test/usage.mdwn@
> > * make automated_tests.mdwn only point to the usage and setup instructions.
>
> If you feel like it, but don’t feel pressured. Adding links to the actual code would be a nice first step.

Right, I’ll save that for a rainy day. I at least added the scripts in commit d0f1dab (bonus: I added the requirement about when the remote shell must be started).

#17 Updated by anonym 2015-03-07 02:42:18

anonym wrote:
> I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back. Ok?

To make your life easier, you can currently find the feature as test-8941.feature in my $HOME on isotester1.

#18 Updated by intrigeri 2015-03-07 12:35:32

anonym wrote:
> Well, good point. I’ll try this scenario instead, then:

Looks good.

> Currently running it on isotester1 with the 1.3 image in hope of seeing the error a convincing amount of times, and will re-run with a freshly built image from this branch afterwards in hope of seeing no errors again.

I can’t wait to learn about your results :)

> I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back. Ok?

Sure.

#19 Updated by intrigeri 2015-03-07 12:42:40

Here’s a code review at commit:d0f1dab9aff1de5255dad74d7fcf901f6a1a23b3:

typo in “other applocations”
Why do we check for autotest_never_use_this_option in wait_until_remote_shell_is_listening? The entire initscript is a no-op if that’s not the case anyway.

#20 Updated by intrigeri 2015-03-08 20:08:53

Assignee changed from intrigeri to anonym

anonym wrote:
> I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back.

On bare metal (MacBook Pro):
- 1.3: 2 failures / 200 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 200 tries
In a nested virtualization setup on my laptop:
- 1.3: 0 failure / 100 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries

That’s not entirely convincing, but let’s see if anonym’s results on lizard tend to confirm or not.

#21 Updated by anonym 2015-03-09 12:59:08

intrigeri wrote:
> Here’s a code review at commit:d0f1dab9aff1de5255dad74d7fcf901f6a1a23b3:
>
> * typo in “other applocations”

Fixed in commit:e269386.

> * Why do we check for autotest_never_use_this_option in wait_until_remote_shell_is_listening? The entire initscript is a no-op if that’s not the case anyway.

Fixed in commit:deeb64b. I copy-paster the stuff I wrote in the gdm3 patch and simply forgot about the new context.

> > I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back.
>
> * On bare metal (MacBook Pro):
> 1.3: 2 failures / 200 tries
> bugfix/8941-start-remote-shell-before-greeter: 0 failure / 200 tries
> * In a nested virtualization setup on my laptop:
> 1.3: 0 failure / 100 tries
> bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries

Did you observe the error in a similar nested setup on your laptop before?

> That’s not entirely convincing […]

Clearly there’s a dependency on the hardware (including virtual, if nested) but if we’d assume the same error rate as you see on the MacBook, 0.01, for the nested setup, the probability of you seeing no error for a 100 runs is (1-0.01)^100 ~= 0.366, which is quite big. More re-runs would be needed for some more certainty here.

> […] but let’s see if anonym’s results on lizard tend to confirm or not.

I had unfortunately forgot to run my last attempt on isotester1 with screen, and my SSH connection broke down. Argh! I re-started it (on 1.3 again) and on the 7th boot I’ve already seen the issue for the first time. Nice! :)

#22 Updated by intrigeri 2015-03-09 13:18:12

> Did you observe the error in a similar nested setup on your laptop before?

I can’t remember.

> More re-runs would be needed for some more certainty here.

Running!

#23 Updated by anonym 2015-03-09 16:02:54

anonym wrote:
> intrigeri wrote:
> > […] but let’s see if anonym’s results on lizard tend to confirm or not.
>
> I had unfortunately forgot to run my last attempt on isotester1 with screen, and my SSH connection broke down. Argh! I re-started it (on 1.3 again) and on the 7th boot I’ve already seen the issue for the first time. Nice! :)

Well, that was the only error I got. Running on an image built from the feature branch now.

#24 Updated by intrigeri 2015-03-09 20:59:54

Updated test results (that includes, and replaces, my previous ones):

On bare metal (MacBook Pro):
- 1.3: 6 failures / 600 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 600 tries
In a nested virtualization setup on my laptop:
- 1.3: 0 failure / 100 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries

#25 Updated by intrigeri 2015-03-10 00:18:02

It may come as a little bit off-topic above a pile of test reports, but the proposed branch introduces an initscript — very useful, by the way, apparently! To my great distaste, that would soon become the 2nd Tails-specific SysV initscript we install in feature/jessie. I’ve merged this branch on top of feature/jessie in the feature/jessie-remote-shell-systemd branch, and converted it to systemd using all cool kids technology I could, except socket activation (by the way, I believe that git log 2f96f86..origin/feature/jessie-remote-shell-systemd is a nice systemd showcase, for anyone who wants to learn with actual examples). Untested yet.

#26 Updated by intrigeri 2015-03-10 10:21:56

anonym, with my updated test results in mind, I’m tempted to merge the branch on the grounds that it measurably improves things at least on my bare metal testing environment. Let me know if I should wait for more test results.

#27 Updated by anonym 2015-03-10 11:54:31

Assignee changed from anonym to intrigeri
% Done changed from 30 to 50
QA Check changed from Info Needed to Ready for QA

intrigeri wrote:
> anonym, with my updated test results in mind, I’m tempted to merge the branch on the grounds that it measurably improves things at least on my bare metal testing environment. Let me know if I should wait for more test results.

Well, here are my results:

isotester1.lizard:
- 1.3: 1 failure / 100 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries

Let’s just merge this now. It doesn’t seem to introduce any regressions, and it’s brought a bit more order into how we start the remote shell, which is an improvement on its own. Time will tell if it actually fixed the issue, instead of us burning more cpu cycles just for testing this.

#28 Updated by intrigeri 2015-03-12 13:07:46

Status changed from In Progress to Fix committed
% Done changed from 50 to 100

Applied in changeset commit:c53e8afe444e7b72c0cd3363f921f02a14f60b25.

#29 Updated by intrigeri 2015-03-12 13:10:37

Assignee deleted (~~intrigeri~~)
QA Check changed from Ready for QA to Pass

Congrats!

#30 Updated by intrigeri 2015-03-15 09:18:37

related to ~~Feature #9057~~: Port the new remote shell initscript to systemd added

#31 Updated by BitingBird 2015-03-22 11:53:32

Target version changed from Tails_1.3.2 to Tails_1.3.1

#32 Updated by BitingBird 2015-03-23 02:01:24

Status changed from Fix committed to Resolved