Bug #8941
The remote shell server is racing against Tails Greeter
100%
Description
The autotest_remote_shell.py
is started after Tails Greeter (/etc/rc2.d/S05rc.local
vs /etc/rc2.d/S03gdm3
) which is fragile given how we wait for the remote shell to be ready (we “ask” if it’s up as soon as we see that Tails Greeter is running). In particular, in the test/7821-tor
branch (for Feature #7821) a tiny change to the remote shell server (commit 1a55905e5d51d0f578b5ab3732a8ac0453ac6363 (“Create the environment for remote shell calls more realisitcally.”)) exposes this race.
To eliminate this race we should:
- migrate the remote shell startup to a dedicated initscript, that acts as a no-op unless
autotest_never_use_this_option
is passed on the kernel command-line - set
X-Start-Before: $x-display-manager gdm gdm3
in the remote shell initscript - in
autotest_remote_shell.py
, touch some file afterport.open()
has succeeded - patch the
gdm3
initscript to wait for that file to appear, whenautotest_never_use_this_option
is passed on the kernel command-line, before it actually starts GDM.
(above adapted from Feature #7821#note-42)
Subtasks
Related issues
Related to Tails - |
Resolved | 2014-08-26 | |
Related to Tails - |
Rejected | ||
Related to Tails - |
Resolved | 2015-03-15 |
History
#1 Updated by anonym 2015-02-22 20:07:05
- related to
Feature #7821: Automatic tests for Tor added
#2 Updated by anonym 2015-02-22 20:07:19
- related to
Feature #5491: Test suite: remote shell instability added
#3 Updated by intrigeri 2015-02-23 10:38:20
- Parent task set to
Feature #8539
#4 Updated by intrigeri 2015-02-23 10:39:00
- blocks #8538 added
#5 Updated by Tails 2015-03-06 04:06:54
- Status changed from Confirmed to In Progress
Applied in changeset commit:2adf60611a6129d5dd8f63a61c8568241f323460.
#6 Updated by anonym 2015-03-06 04:27:06
- % Done changed from 0 to 30
- Feature Branch set to bugfix/8941-start-remote-shell-before-greeter
I’ve built an image from the feature branch and uploaded it to isotester1.lizard
, and I’m currently running 200 scenarios like this:
Scenario: Test
Given a computer
And I start the computer
And the computer boots Tails
If I get no remote shell errors I’ll run the same for the 1.3 image, and hope we’ll get the errors then. :)
#7 Updated by intrigeri 2015-03-06 06:51:41
> If I get no remote shell errors I’ll run the same for the 1.3 image, and hope we’ll get the errors then. :)
Yay!
#8 Updated by anonym 2015-03-06 11:53:20
- Assignee changed from anonym to intrigeri
- QA Check changed from Dev Needed to Info Needed
200 scenarios (200 passed)
600 steps (600 passed)
212m14.653s
Woo! What do you think intrigeri?
In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.
From the description:
> * patch the gdm3 initscript to wait for that file to appear, when autotest_never_use_this_option is passed on the kernel command-line, before it actually starts GDM.
That was done in commit 2adf606. However, to make things less “spread out all over the place”, perhaps we could put the exact thing we patch into the gdm3
init script into the tails-autotest-remote-shell
init script this branch adds? I.e. in do_start()
, after we have started the script, then we wait for the state file.
#9 Updated by intrigeri 2015-03-06 12:07:11
> Woo! What do you think intrigeri?
I think: “wow!”
> In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.
Yes, please :)
> However, to make things less “spread out all over the place”, perhaps we could put the exact thing we patch into the gdm3
init script into the tails-autotest-remote-shell
init script this branch adds? I.e. in do_start()
, after we have started the script, then we wait for the state file.
Good idea. And then, the gdm3
init script must be modified to add a dependency on the tails-autotest-remote-shell
service, to ensure proper ordering. This solution will be closer to what we can do with systemd, so less work for porting it to Jessie :)
Also, I suspect that some design doc needs updating wrt. the move of tails-autotest-remote-shell
.
#10 Updated by intrigeri 2015-03-06 12:10:34
- Assignee changed from intrigeri to anonym
- QA Check changed from Info Needed to Dev Needed
#11 Updated by anonym 2015-03-06 15:00:56
- Assignee changed from anonym to intrigeri
- QA Check changed from Dev Needed to Info Needed
intrigeri wrote:
> > In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.
>
> Yes, please :)
Hrm, I’m currently at 146 passed scenarios for Tails 1.3, i.e. seems I can’t reproduce your problem, but we’ll see…
I can’t recall, what was the scenario that you tested? Something in usb_install.feature
? Or did you also see it outside of that one? Do you still have the .feature
file you used to spam the “problematic” scenario, if you did? And was it on isotester1
or one of your own nested setups that you saw the problem?
> > However, to make things less “spread out all over the place”, perhaps we could put the exact thing we patch into the gdm3
init script into the tails-autotest-remote-shell
init script this branch adds? I.e. in do_start()
, after we have started the script, then we wait for the state file.
>
> Good idea. And then, the gdm3
init script must be modified to add a dependency on the tails-autotest-remote-shell
service, to ensure proper ordering. This solution will be closer to what we can do with systemd, so less work for porting it to Jessie :)
Is it really necessary to add that dependency to the gdm3
init script? In the remote shell init script we have the reverse dependency that you suggested:
X-Start-Before: $x-display-manager gdm gdm3
which should be equivalent to the patching you suggest, i.e the following line in the gdm3
init script:
Should-start: tails-autotest-remote-shell
Right?
> Also, I suspect that some design doc needs updating wrt. the move of tails-autotest-remote-shell
.
Actually, we have no design docs on this (at least according to git grep autotest -- wiki
). There’s contribute/release_process/test/automated_tests.mdwn
which reads as a mixed design doc and introduction to writing tests. There’s a “Remote shell” section which I suppose could mention the paths to the init script and python script, and any path of gdm3’s init script, if needed. Perhaps this document should be split? E.g.
- move the design stuff into
contribute/design
somewhere - move the test writing intro into@contribute/release_process/test/usage.mdwn@
- make
automated_tests.mdwn
only point to the usage and setup instructions.
Or something?
#12 Updated by anonym 2015-03-06 16:00:06
anonym wrote:
> intrigeri wrote:
> > > In the meantime I’ll be running 200 scenarios on Tails 1.3, as control.
> >
> > Yes, please :)
>
> Hrm, I’m currently at 146 passed scenarios for Tails 1.3, i.e. seems I can’t reproduce your problem, but we’ll see…
Damn:
200 scenarios (200 passed)
600 steps (600 passed)
248m25.386s
What am I doing wrong (or right, depending on the perspective :))?
Looking at Feature #7821, the scenarios that failed for you were:
- Scenario: Booting Tails from a USB drive upgraded from USB with persistence enabled
- Scenario: Upgrading an old Tails USB installation from an ISO image, running on the old version
But I see no reason why USB boot should influence this. I conjecture that you’ve seen the boot error in the usb_install.feature
scenarios simply because of probability; it’s the feature that does by far the most “real” boots (i.e. boot from scratch), whereas the other scenarios do ~one (thanks to snapshots); in fact usb_install.feature
accounts for ~50% of the boots for a full run of the automated test suite. Or am I missing/forgetting something here?
#13 Updated by intrigeri 2015-03-06 18:45:34
> Damn:
Ouch!
The largest test I did was basically 1. install Tails on a virtual USB
drive; 2. boot from this USB drive dozens of times.
> But I see no reason why USB boot should influence this.
I’m absolutely not surprised that different underlying (emulated) storage technology can result in different ordering of services coming up to life.
#14 Updated by intrigeri 2015-03-06 18:47:41
> Or did you also see it outside of that one?
Outside.
> Do you still have the .feature
file you used to spam the “problematic” scenario, if you did?
Nope.
> And was it on isotester1
or one of your own nested setups that you saw the problem?
Both, IIRC.
> Is it really necessary to add that dependency to the gdm3
init script? In the remote shell init script we have the reverse dependency that you suggested:
Indeed you’re right.
> There’s contribute/release_process/test/automated_tests.mdwn
which reads as
> a mixed design doc and introduction to writing tests. There’s a “Remote shell”
> section which I suppose could mention the paths to the init script and python script,
> and any path of gdm3’s init script, if needed. Perhaps this document should be split?
> E.g.
> * move the design stuff into contribute/design
somewhere
> * move the test writing intro into@contribute/release_process/test/usage.mdwn@
> * make automated_tests.mdwn
only point to the usage and setup instructions.
If you feel like it, but don’t feel pressured. Adding links to the actual code would be a nice first step.
#15 Updated by intrigeri 2015-03-06 18:50:54
- Assignee changed from intrigeri to anonym
- QA Check changed from Info Needed to Dev Needed
#16 Updated by anonym 2015-03-07 02:39:53
- Assignee changed from anonym to intrigeri
- QA Check changed from Dev Needed to Info Needed
intrigeri wrote:
> > Damn:
>
> Ouch!
>
> The largest test I did was basically 1. install Tails on a virtual USB
> drive; 2. boot from this USB drive dozens of times.
>
> > But I see no reason why USB boot should influence this.
>
> I’m absolutely not surprised that different underlying (emulated) storage technology can result in different ordering of services coming up to life.
Well, good point. I’ll try this scenario instead, then:
@product
Feature: blah
@keep_volumes
Scenario: bleh
Given a computer
And I start Tails from DVD with network unplugged and I login
And I create a 4 GiB disk named "current"
And I plug USB drive "current"
And I "Clone & Install" Tails to USB drive "current"
Then the running Tails is installed on USB drive "current"
But there is no persistence partition on USB drive "current"
And I unplug USB drive "current"
@keep_volumes
Scenario: bluh
Given a computer
And the computer is set to boot from USB drive "current"
And I start the computer
And the computer boots Tails
[... last one repeated 99 more times ...]
Currently running it on isotester1
with the 1.3 image in hope of seeing the error a convincing amount of times, and will re-run with a freshly built image from this branch afterwards in hope of seeing no errors again.
I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back. Ok?
> > Is it really necessary to add that dependency to the gdm3
init script? In the remote shell init script we have the reverse dependency that you suggested:
>
> Indeed you’re right.
Ok. Then we have what we need in commit ce35080.
> > There’s contribute/release_process/test/automated_tests.mdwn
which reads as
> > a mixed design doc and introduction to writing tests. There’s a “Remote shell”
> > section which I suppose could mention the paths to the init script and python script,
> > and any path of gdm3’s init script, if needed. Perhaps this document should be split?
> > E.g.
>
> > * move the design stuff into contribute/design
somewhere
> > * move the test writing intro into@contribute/release_process/test/usage.mdwn@
> > * make automated_tests.mdwn
only point to the usage and setup instructions.
>
> If you feel like it, but don’t feel pressured. Adding links to the actual code would be a nice first step.
Right, I’ll save that for a rainy day. I at least added the scripts in commit d0f1dab (bonus: I added the requirement about when the remote shell must be started).
#17 Updated by anonym 2015-03-07 02:42:18
anonym wrote:
> I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back. Ok?
To make your life easier, you can currently find the feature as test-8941.feature
in my $HOME
on isotester1
.
#18 Updated by intrigeri 2015-03-07 12:35:32
anonym wrote:
> Well, good point. I’ll try this scenario instead, then:
Looks good.
> Currently running it on isotester1
with the 1.3 image in hope of seeing the error a convincing amount of times, and will re-run with a freshly built image from this branch afterwards in hope of seeing no errors again.
I can’t wait to learn about your results :)
> I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back. Ok?
Sure.
#19 Updated by intrigeri 2015-03-07 12:42:40
Here’s a code review at commit:d0f1dab9aff1de5255dad74d7fcf901f6a1a23b3:
- typo in “other applocations”
- Why do we check for
autotest_never_use_this_option
inwait_until_remote_shell_is_listening
? The entire initscript is a no-op if that’s not the case anyway.
#20 Updated by intrigeri 2015-03-08 20:08:53
- Assignee changed from intrigeri to anonym
anonym wrote:
> I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back.
- On bare metal (MacBook Pro):
- 1.3: 2 failures / 200 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 200 tries
- In a nested virtualization setup on my laptop:
- 1.3: 0 failure / 100 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries
That’s not entirely convincing, but let’s see if anonym’s results on lizard tend to confirm or not.
#21 Updated by anonym 2015-03-09 12:59:08
intrigeri wrote:
> Here’s a code review at commit:d0f1dab9aff1de5255dad74d7fcf901f6a1a23b3:
>
> * typo in “other applocations”
Fixed in commit:e269386.
> * Why do we check for autotest_never_use_this_option
in wait_until_remote_shell_is_listening
? The entire initscript is a no-op if that’s not the case anyway.
Fixed in commit:deeb64b. I copy-paster the stuff I wrote in the gdm3
patch and simply forgot about the new context.
> > I’d like you to replicate this on the bare metal system you think you had seen it on (including trying to trigger it with the 1.3 image) and report back.
>
> * On bare metal (MacBook Pro):
> 1.3: 2 failures / 200 tries
> bugfix/8941-start-remote-shell-before-greeter: 0 failure / 200 tries
> * In a nested virtualization setup on my laptop:
> 1.3: 0 failure / 100 tries
> bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries
Did you observe the error in a similar nested setup on your laptop before?
> That’s not entirely convincing […]
Clearly there’s a dependency on the hardware (including virtual, if nested) but if we’d assume the same error rate as you see on the MacBook, 0.01, for the nested setup, the probability of you seeing no error for a 100 runs is (1-0.01)^100 ~= 0.366, which is quite big. More re-runs would be needed for some more certainty here.
> […] but let’s see if anonym’s results on lizard tend to confirm or not.
I had unfortunately forgot to run my last attempt on isotester1
with screen
, and my SSH connection broke down. Argh! I re-started it (on 1.3 again) and on the 7th boot I’ve already seen the issue for the first time. Nice! :)
#22 Updated by intrigeri 2015-03-09 13:18:12
> Did you observe the error in a similar nested setup on your laptop before?
I can’t remember.
> More re-runs would be needed for some more certainty here.
Running!
#23 Updated by anonym 2015-03-09 16:02:54
anonym wrote:
> intrigeri wrote:
> > […] but let’s see if anonym’s results on lizard tend to confirm or not.
>
> I had unfortunately forgot to run my last attempt on isotester1
with screen
, and my SSH connection broke down. Argh! I re-started it (on 1.3 again) and on the 7th boot I’ve already seen the issue for the first time. Nice! :)
Well, that was the only error I got. Running on an image built from the feature branch now.
#24 Updated by intrigeri 2015-03-09 20:59:54
Updated test results (that includes, and replaces, my previous ones):
- On bare metal (MacBook Pro):
- 1.3: 6 failures / 600 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 600 tries
- In a nested virtualization setup on my laptop:
- 1.3: 0 failure / 100 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries
#25 Updated by intrigeri 2015-03-10 00:18:02
It may come as a little bit off-topic above a pile of test reports, but the proposed branch introduces an initscript — very useful, by the way, apparently! To my great distaste, that would soon become the 2nd Tails-specific SysV initscript we install in feature/jessie
. I’ve merged this branch on top of feature/jessie
in the feature/jessie-remote-shell-systemd
branch, and converted it to systemd using all cool kids technology I could, except socket activation (by the way, I believe that git log 2f96f86..origin/feature/jessie-remote-shell-systemd
is a nice systemd showcase, for anyone who wants to learn with actual examples). Untested yet.
#26 Updated by intrigeri 2015-03-10 10:21:56
anonym, with my updated test results in mind, I’m tempted to merge the branch on the grounds that it measurably improves things at least on my bare metal testing environment. Let me know if I should wait for more test results.
#27 Updated by anonym 2015-03-10 11:54:31
- Assignee changed from anonym to intrigeri
- % Done changed from 30 to 50
- QA Check changed from Info Needed to Ready for QA
intrigeri wrote:
> anonym, with my updated test results in mind, I’m tempted to merge the branch on the grounds that it measurably improves things at least on my bare metal testing environment. Let me know if I should wait for more test results.
Well, here are my results:
isotester1.lizard
:- 1.3: 1 failure / 100 tries
- bugfix/8941-start-remote-shell-before-greeter: 0 failure / 100 tries
Let’s just merge this now. It doesn’t seem to introduce any regressions, and it’s brought a bit more order into how we start the remote shell, which is an improvement on its own. Time will tell if it actually fixed the issue, instead of us burning more cpu cycles just for testing this.
#28 Updated by intrigeri 2015-03-12 13:07:46
- Status changed from In Progress to Fix committed
- % Done changed from 50 to 100
Applied in changeset commit:c53e8afe444e7b72c0cd3363f921f02a14f60b25.
#29 Updated by intrigeri 2015-03-12 13:10:37
- Assignee deleted (
intrigeri) - QA Check changed from Ready for QA to Pass
Congrats!
#30 Updated by intrigeri 2015-03-15 09:18:37
- related to
Feature #9057: Port the new remote shell initscript to systemd added
#31 Updated by BitingBird 2015-03-22 11:53:32
- Target version changed from Tails_1.3.2 to Tails_1.3.1
#32 Updated by BitingBird 2015-03-23 02:01:24
- Status changed from Fix committed to Resolved