Bug #11464

"all notifications have disappeared" step is fragile when network is unplugged

Added by intrigeri 2016-05-23 15:40:45 . Updated 2017-02-02 15:59:21 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Test suite
Target version:
Start date:
2016-05-23
Due date:
% Done:

100%

Feature Branch:
test/11398-florence-hides-other-windows
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

It relies on the fact that the notification applet is there. When we boot with network plugged, the (persistent) Tor is ready has to be here so we’re fine. However, when booting with network unplugged, we can’t rely on that, so instead we rely on subtle timing between the previous step (“the Tails desktop is ready”, where we do more and more things) and the timeout of the “started in VM” notification. The changes brought to fix Bug #11398 trigger this latent bug, that could hit us any time we add more stuff to “the Tails desktop is ready”.


Files


Subtasks


Related issues

Related to Tails - Bug #11398: Florence sometimes hides other windows, which breaks tests Resolved 2016-05-05

History

#1 Updated by intrigeri 2016-05-23 15:41:54

  • related to Bug #11398: Florence sometimes hides other windows, which breaks tests added

#2 Updated by intrigeri 2016-05-24 13:45:10

There’s one more cause for unreliability here: there’s a TOCTU race condition in:

  next if not(@screen.exists("GnomeNotificationApplet.png"))
  @screen.click("GnomeNotificationApplet.png")
  @screen.wait("GnomeNotificationAppletOpened.png", 10)

… that triggers:

[log] CLICK on (1007,762)
    And I start Tails from DVD with network unplugged and I login                # features/step_definitions/common_steps.rb:193
      FindFailed: can not find GnomeNotificationAppletOpened.png on the screen.
      Line ?, in File ? (RuntimeError)
      ./features/step_definitions/common_steps.rb:210:in `/^I start Tails( from DVD)?( with network unplugged)?( and I login)?$/'
      features/untrusted_partitions.feature:78:in `And I start Tails from DVD with network unplugged and I login'

(video attached).

Maybe we should try clicking the image in a block that catches exceptions, and do next iff any exception is raised? Sounds like this would easily eliminate the race condition, no?

Here again, I can’t really mark scenarios as fragile (yet) since so many are affected, so bumping priority a bit.

#3 Updated by intrigeri 2016-05-25 10:20:41

  • Status changed from Confirmed to In Progress
  • Assignee set to intrigeri
  • Target version set to Tails_2.4
  • % Done changed from 0 to 10
  • Feature Branch set to test/11398-florence-hides-other-windows

Pushed a tentative fix to test/11398-florence-hides-other-windows, since I think that branch makes it more likely to hit such problems.

#4 Updated by intrigeri 2016-05-26 09:03:15

  • File <del>missing: 00_11_51_Tails_will_not_enable_disk_swap.mkv</del> added

My tentative fix has a gross bug, pushed a fixup.

#5 Updated by intrigeri 2016-05-26 09:03:24

  • File deleted (00_11_51_Tails_will_not_enable_disk_swap.mkv)

#6 Updated by intrigeri 2016-05-26 15:29:12

Even with my fix in, this still sometimes fails: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10376-fix-startup-page-roadmap-test-is-fragile/28/. I think this shows that Sikuli’s screen.click is racy, and not reliable when used on a fast-changing UI. I’ll try to thing of a workaround, but really IMO we should try to find another solution (would dogtail be less racy?) to make the whole notifications thing more robust.

#7 Updated by intrigeri 2016-05-26 15:43:09

I think that another problem I’ve just fixed (commit:eab5505) might have made this problem more likely to happen, because previously we were often running “all notifications have disappeared” twice, which at least doubles the chances it fails (I say “at least” because the 2nd call has more chances to be close to the time when the “running in a VM” notification times out, due to how the whole timing of our session initialization vs. test suite works).

#8 Updated by intrigeri 2016-05-27 12:29:13

  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

Seems to be pretty robust now!

#9 Updated by anonym 2016-06-01 19:55:44

  • Status changed from In Progress to Fix committed
  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 100

intrigeri wrote:
> Seems to be pretty robust now!

Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:

761f3df You cannot `return` in a step definition block.


Is there anything we can do to cleanly deal with this (unlikely?) error case? IMHO we can just forget about this and revisit if we see issues — AFAICT the real fix of this branch is the other stuff.

#10 Updated by anonym 2016-06-02 11:04:31

anonym wrote:
> Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:

And:

cc1051e Make try_for less race-prone.

#11 Updated by intrigeri 2016-06-02 13:17:23

> Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:
>

> 761f3df You cannot `return` in a step definition block.
> 

Indeed, right. IIRC I was not convinced either, but copied it from some other place that I can’t find now, so probably I’m just wrong :)

> Is there anything we can do to cleanly deal with this (unlikely?) error case?

Indeed we still have a TOCTOU problem here, but apparently it’s less bad than the previous exists+click. I don’t know. We could assume that a failure screen.wait("GnomeNotificationAppletOpened.png", 10) means that the notification applet has disappeared, and then we need to unclick what we clicked, but it feels potentially fragile & premature optimization, so:

> IMHO we can just forget about this and revisit if we see issues — AFAICT the real fix of this branch is the other stuff.

Yes.

#12 Updated by intrigeri 2016-06-02 13:17:48

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

#13 Updated by intrigeri 2016-06-02 13:19:41

> And:
>

> cc1051e Make try_for less race-prone.
> 

Well, I had lowered the timeout to 2s in commit:7c4bd26 because a 5s timeout was triggering Bug #11464. We’ll see how it fares in practice, but let’s be ready to lower it back.

#14 Updated by anonym 2016-06-08 01:27:51

  • Status changed from Fix committed to Resolved

#15 Updated by bertagaz 2016-07-22 03:04:07

Sadly we’ve seen this bug several time again while referencing the test suite failure in Jenkins for 2016 June (see Bug #11087#note-9). Clearly Sikuli clicks too late on the notification area, which has already disappeared, and then ends up opening the workspace selection applet.

#16 Updated by Anonymous 2016-09-03 13:53:16

  • Assignee set to intrigeri

While checking “Volunteers to handle important tickets flagged for next release, but without assignee” during the current contributor meeting we saw this ticket. Assigning to you, tentatively, because I don’t know what to do with it exactly otherwise.

#17 Updated by intrigeri 2016-09-04 03:55:53

  • Assignee changed from intrigeri to anonym
  • Target version changed from Tails_2.6 to Tails_3.0

Thank you for putting it back onto our radar! Indeed it was a mistake to assign this to 2.6 with no assignee. I think that anonym has some improvements in this area on the feature/stretch branch, so moving it to 3.0.

#18 Updated by anonym 2017-02-02 14:13:47

  • Status changed from Confirmed to Fix committed
  • % Done changed from 0 to 100

Applied in changeset commit:358eed2cd3a690aabc7876761b2e4b3f23718aed.

#19 Updated by anonym 2017-02-02 14:17:10

  • Assignee deleted (anonym)
  • QA Check changed from Dev Needed to Pass

#20 Updated by anonym 2017-02-02 15:59:21

  • Status changed from Fix committed to Resolved