Bug #10288

Fix newly identified issues to make our test suite more robust and faster

Added by anonym 2015-09-27 06:36:53. Updated 2019-11-29 10:55:02.

Status: In Progress
Priority: Elevated
Assignee:
Category: Test suite
Target version:
Start date: 2015-02-26
Due date:
% Done: 72%
Feature Branch:
Type of work: Code
Blueprint:
Starter:
Affected tool:
Deliverable for:

Description

Our initial plan will be to mark any scenario that we ever see fail in the automated test suite run by jenkins (not locally, or anywhere else) with the @fragile tag. On jenkins we will run the test suite with the Cucumber option --tag ~@fragile, which makes it skip these scenarios.

Whenever we find a robustness issue for a Scenario $SCENARIO we do the following:

  1. Add the @fragile tag to $SCENARIO, commit it to a suitable base branch B. Often B will be stable, but it could be devel if not affecting stable, or e.g. feature/jessie (or other long-term integration branches) if only affecting that one. Let’s say this became commit DEADBEE. Then merge B into all base branches where it makes sense (e.g. if B == stable we’d merge into devel, and possibly then merge devel into feature/jessie).
  2. File a ticket (let’s say the number becomes #NNNNN) with subject: “$SCENARIO is fragile” (or similar) and
    1. reference commit DEADBEE in the description.
    2. make it block this ticket, i.e. Bug #10288.
  3. Create a branch test/NNNNN-fix-${SCENARIO} (or a similar, shortened name) from B, and
    1. commit a revert of DEADBEE.
    2. set this branch as the Feature Branch in ticket #NNNNN.
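A minimal sketch of step 3, assuming we want to derive the branch name mechanically from the ticket number and the scenario name. The helper names (fix_branch_name, branch_commands) and the six-word limit are made up for illustration; the Git commands are only built as strings, nothing is executed.

```python
import re


def fix_branch_name(ticket, scenario, max_words=6):
    """Build a name like test/NNNNN-fix-<shortened scenario> (hypothetical helper)."""
    # Lowercase and collapse every run of non-alphanumeric characters into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", scenario.lower()).strip("-")
    # Keep only the first few words so the branch name stays short.
    short = "-".join(slug.split("-")[:max_words])
    return f"test/{ticket}-fix-{short}"


def branch_commands(ticket, scenario, base, tag_commit):
    """Return the Git commands for step 3: branch off B, then revert the tagging commit."""
    branch = fix_branch_name(ticket, scenario)
    return [
        f"git checkout -b {branch} {base}",
        f"git revert --no-edit {tag_commit}",
    ]
```

For example, branch_commands(10378, "Tails OpenPGP keys", "stable", "DEADBEE") gives the checkout and revert commands for a branch named test/10378-fix-tails-openpgp-keys.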

Iterating this process (and merging the base branches into all feature/bugfix/test branches, including those created by this procedure) should make us converge to a state where we have isolated all robustness issues to individual branches, and all base branches should be green. On jenkins.

Note: There may be more than one reason to tag a scenario @fragile, which breaks the above scheme a bit. We do not want to end up with the revert of one branch removing the @fragile tag in the base branches when it’s merged, while there is still at least one unmerged branch with another reason for the scenario to be marked @fragile. I think the best way to track this is on the ticket, and by making a comment around the @fragile tag, listing each ticket tracking the scenario’s fragility. In each branch you remove only its ticket => we get a merge conflict as a “notification” when merging, and we only remove the tag in the base branch when the last ticket is removed from the comment.
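To make the "remove only your ticket" rule concrete, here is a rough sketch. It assumes a comment convention that is not defined anywhere in this ticket: a line starting with "# fragile because of:" that lists ticket numbers, directly followed by a line containing only @fragile. Both the marker text and the drop_ticket helper are hypothetical.

```python
import re

MARKER = "# fragile because of:"  # hypothetical comment convention


def drop_ticket(lines, ticket):
    """Remove `ticket` from each tracking comment in a feature file.

    The @fragile line following a comment is dropped only once that
    comment lists no more tickets.
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip().startswith(MARKER):
            indent = line[: len(line) - len(line.lstrip())]
            tickets = re.findall(r"#(\d+)", line.split(":", 1)[1])
            remaining = [t for t in tickets if t != str(ticket)]
            if remaining:
                # Other tickets still track this scenario: keep the tag.
                out.append(indent + MARKER + " " + " ".join("#" + t for t in remaining))
            elif i + 1 < len(lines) and lines[i + 1].strip() == "@fragile":
                # Last ticket removed: skip the @fragile line as well.
                i += 1
            i += 1
            continue
        out.append(line)
        i += 1
    return out
```

Two branches that each remove a different ticket from the same comment line will conflict on merge, which is the "notification" described above. If @fragile shares a line with other tags, this sketch removes only the comment and leaves the tag line alone.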

Creating a summary of failures

1. Clone the puppet-tails Git repo, get the attached json-analysis script.

2. Get the `jobResults-*.xml` files you want from jenkins.lizard:/var/lib/jenkins/global-build-stats/jobresults/

3. Download all the JSON test result files you’re interested in (you can pass an epoch to ISO-test-suite-runs, which will be the starting point), e.g.:

cd $PUPPET_TAILS_REPO
for url in $(./files/jenkins/master/ISO-test-suite-runs /tmp/jobResults-2015-12.xml) ; do
    dest=$(mktemp --tmpdir=. tailstester-XXXXXXXXXX.json)
    wget -O "$dest" "$url"
    [ -s "$dest" ] || rm -f "$dest"
done

4. Do the analysis on JSON files:

json-analysis --steps *.json
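For readers without the attached script, the kind of per-step breakdown it produces can be approximated as follows. This is only a sketch and not the json-analysis script itself: it assumes the usual Cucumber JSON layout (a list of features, each with "elements" holding "steps" that carry a "result" with a "status"), and the function names are made up.

```python
import json
from collections import Counter, defaultdict


def load_runs(paths):
    """Concatenate the feature lists of several Cucumber JSON result files."""
    features = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            features.extend(json.load(handle))
    return features


def failed_steps(features):
    """Yield (step name, scenario name) for every failed step."""
    for feature in features:
        for element in feature.get("elements", []):
            for step in element.get("steps", []):
                if step.get("result", {}).get("status") == "failed":
                    yield step.get("name", ""), element.get("name", "")


def summarize(features):
    """Format a step failure breakdown, most frequently failing steps first."""
    per_step = defaultdict(Counter)
    for step, scenario in failed_steps(features):
        per_step[step][scenario] += 1
    ordered = sorted(per_step.items(), key=lambda kv: -sum(kv[1].values()))
    total = sum(sum(counts.values()) for counts in per_step.values())
    lines = [f"Step failure breakdown (total: {total}):"]
    for step, scenarios in ordered:
        lines.append(f"* {sum(scenarios.values()):<5} Step: {step}")
        for scenario, count in scenarios.most_common():
            lines.append(f"  - {count:<5} Scenario: {scenario}")
    return "\n".join(lines)
```

Typical use would be print(summarize(load_runs(glob.glob("*.json")))) in the directory holding the downloaded files. Failed background steps are attributed to the background element's own name here, which the real script may handle differently.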

Files


Subtasks

Bug #10375: Increase the number of Tor circuit retries in the test suite Resolved

100

Bug #10376: The "the Tor Browser loads the (startup page|Tails roadmap)" step is fragile Resolved

100

Feature #10379: Check that we do not see any error pages in the "I open the address" step. Rejected

0

Bug #10380: gobby tests are fragile Resolved

0

Bug #10378: The "Tails OpenPGP keys" scenario is fragile Resolved

100

Bug #10381: The "I open the address" steps are fragile Resolved

100

Bug #8961: The automated test suite doesn't fetch Tor relays from unverified-microdesc-consensus.bak Resolved

100

Bug #9654: "IPv4 TCP non-Tor Internet hosts were contacted" during the test suite Resolved

100

Bug #10440: Time syncing scenarios are fragile Resolved

100

Bug #10441: Synaptic test is fragile Resolved

100

Bug #10442: Totem "Watching a WebM video over HTTPS" test never passes on Jenkins Resolved

100

Bug #10444: Git tests are fragile Resolved

100

Bug #10474: Scenario "Connecting to the #i2p IRC channel" is fragile Rejected

30

Bug #10475: Scenario "Using a persistent Electrum configuration" is fragile In Progress

0

Bug #10493: The "I see "WindowsSysTraySound.png"" step is fragile Resolved

100

Bug #10495: The 'the time has synced' step is fragile In Progress

0

Bug #10496: apt-get scenarios are fragile Resolved

100

Bug #10497: wait_until_tor_is_working helper is fragile Resolved

100

Bug #10498: SSH tests are fragile Resolved

100

Bug #10499: The ICMP Tor enforcement test is fragile Resolved hefee

100

Bug #10500: Monitor failure modes of Seahorse Rejected

0

Bug #10501: Step 'the "10CC5BC7" key is in the live user's public keyring' is fragile Resolved

100

Bug #10502: The test suite sometimes cannot connect to the remote shell Resolved

100

Feature #10503: Run erase_memory.feature first to optimize test suite performance Resolved

100

Bug #10504: boot_device method in features/step_definitions/usb.rb is broken Resolved

100

Bug #10523: whois test is fragile Resolved

100

Bug #10718: Lower waiting time for USB installation in the test suite Resolved

100

Bug #10774: MAC address spoofing failure notifications are not always displayed Confirmed

0

Bug #10775: "I can view and print a PDF file stored in /usr/share" scenario is fragile Resolved

100

Bug #10776: Step "I shutdown and wait for Tails to finish wiping the memory" fails when memory wiping causes a freeze Resolved

100

Bug #10777: The test suite machinery sometimes misses the boot splash Resolved

100

Bug #10783: Test that clicks the roadmap URL in Pidgin is fragile Resolved

100

Feature #10900: "I should be able to install a package using Synaptic" step is fragile Resolved

100

Bug #10991: The "I both encrypt and sign the message using my OpenPGP key" step is fragile Confirmed

0

Bug #10992: Fragile test: the OpenPGP applet key selection window is moved partly off-screen instead of selecting a key Confirmed anonym

0

Bug #10994: "I can view and print a PDF file" scenarios are fragile Confirmed

30

Bug #11114: I2P tests are fragile Confirmed

0

Bug #11394: "Symmetric encryption and decryption using OpenPGP Applet" is fragile Confirmed

0

Bug #11398: Florence sometimes hides other windows, which breaks tests Resolved

100

Bug #11400: "I test Torbirdy's proxy settings" test sometimes fails due to missing favicon in "congratulations" tab Rejected anonym

0

Bug #11401: robust_notification_wait sometimes opens the Applications menu which breaks tests Resolved

100

Bug #11409: Deal with the 'Dogtail: warning: application may be hanging' bug Resolved anonym

0

Bug #11413: Test suite: newly added XMPP account is not persisted, and "Pidgin has the expected persistent accounts configured" doesn't notice Resolved

100

Bug #11414: The "Chatting with some friend over XMPP in a multi-user chat" scenario is fragile Resolved hefee

100

Bug #11452: "I2P displays a notice when bootstrapping fails" test is fragile Confirmed

0

Bug #11453: "Chatting with some friend over XMPP" test is fragile Resolved hefee

100

Bug #11457: "I close the Unsafe Browser" step is fragile Resolved hefee

100

Bug #11458: "I see the Unsafe Browser start notification and wait for it to close" step is fragile Rejected

0

Bug #11462: "I2P is running" test is fragile: may fail when the time has not sync'ed yet Confirmed

0

Bug #11463: robust_notification_wait sometimes does not recognize the notification it's looking for Resolved hefee

100

Bug #11464: "all notifications have disappeared" step is fragile when network is unplugged Resolved

100

Bug #11465: focus_window uses select_virtual_desktop in a racy way Confirmed

0

Bug #11479: "the Tails desktop is ready" step is fragile due to buggy display of Florence systray icon Resolved

100

Bug #11508: Firewall leak detector makes bad assumptions about PacketFu parsing Resolved

100

Bug #11521: The check_tor_leaks hook is fragile Resolved

100

Bug #11558: Step a Tails persistence partition exists on USB drive is fragile Resolved

100

Bug #11563: Git over HTTPS scenario is fragile Resolved hefee

100

Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins Resolved

100

Bug #11583: UEFI boot tests fail on Jenkins Resolved

100

Bug #11584: "Using a persistent Pidgin configuration" is fragile Resolved hefee

100

Bug #11585: "Persistent browser bookmarks" is fragile Confirmed

0

Bug #11588: Sometimes fails to boot from USB on Jenkins with I/O errors Resolved

100

Bug #11589: Time syncing over bridge is fragile Confirmed

0

Bug #11591: Step "the Tor Browser shows the [...] error" is fragile Resolved hefee

100

Bug #11592: Step "[...] has loaded in the Tor Browser" is fragile Resolved

100

Bug #11606: "Tor Launcher uses all expected TBB shared libraries" is fragile Resolved hefee

100

Bug #11616: "The emergency shutdown applet can .." scenarios are fragile Resolved

100

Bug #11617: Clicking "Yes" for "More options?" in the Greeter sometimes fails in Jenkins Resolved

100

Bug #11697: Step "Electrum successfully connects to the network" is fragile In Progress

0

Bug #11698: Test suite calls undefined save_pcap_file method in " the network device has its default MAC address configured" Resolved

100

Bug #11711: "The Unsafe Browser can be used in all languages supported in Tails" test is broken for locales that have a translated homepage Resolved

100

Bug #11816: Test suite often freezes after clicking "Login" in the Greeter Resolved

100

Bug #11865: In the test suite we sometimes boot from the isohybrid when we intended to boot from the DVD Confirmed

0

Bug #11890: Checking credentials in Thunderbird autoconfig wizard sometimes fails in the test suite In Progress anonym

30

Bug #11892: Sometimes the remote shell doesn't start because of missing initial Space when modifying the kernel cmdline In Progress anonym

0

Bug #11901: Adjust test suite to take into account that MAT does not clean PDF files anymore Resolved

100

Bug #11906: Icedove "Only the expected addons are installed" scenario fails since "amnesia branding" is not installed Resolved

100

Bug #12040: Test suite cannot sometimes connect to the remote shell: "Dropped out-of-order remote shell response: got id but expected id NNNN" Confirmed anonym

0

Bug #12041: Spurious reboot breaks test suite which cannot connect to the remote shell Confirmed anonym

0

Bug #12042: Thunderbird email sending test sometimes fails due to the Attachment Reminder Confirmed anonym

0

Bug #12043: Test failure in "Fetching OpenPGP keys using Seahorse via the OpenPGP Applet should work and be done over Tor" due to weird interaction with GNOME Shell tiling features Confirmed

0

Bug #12044: Step "only the expected files are present on the persistence partition" sometimes fails: guestfs fails to find partition Confirmed

0

Bug #12045: Step 'I try a "Clone & Upgrade" Tails to USB drive "isohybrid"' sometimes fails: no target drive listed Confirmed

0

Bug #12047: Step 'I temporarily create a 100 MiB disk named "swap"' timeouts Confirmed

0

Bug #12131: Step 'I double-click the Report an Error launcher on the desktop' sometimes fails Resolved

100

Bug #12132: Step 'I shutdown Tails and wait for the computer to power off' sometimes fails by rebooting instead Confirmed

0

Bug #12558: The "Chatting with some friend over XMPP in a multi-user chat" scenario is broken on Riseup MUCs Resolved

0

Bug #12586: Synaptic test is fragile on Stretch Rejected

0

Bug #13458: Step "a screenshot is saved to the live user's Pictures directory" is fragile Confirmed

0

Bug #13459: Scenario "Booting Tails from a USB drive in UEFI mode" is fragile Needs Validation anonym

0

Bug #13460: Virt-viewer fails to start Confirmed

0

Bug #13461: The Desktop icons are sometimes not displayed since the upgrade to Stretch Resolved

20

Bug #13469: Starting applications "via GNOME Activities Overview" step is fragile Resolved

100

Bug #13470: Step "Tails Greeter has applied all settings" is fragile Resolved hefee

100

Bug #13541: Tor still sometimes fails to bootstrap in the test suite In Progress

20

Bug #14770: "Fetching OpenPGP keys" scenarios are fragile: communication failure with keyserver Resolved

0

Bug #14771: Retrying mechanism for the "I open the address" step is buggy in the Unsafe Browser Resolved

100

Bug #15321: "The Report an Error launcher will…" test suite step is fragile Resolved

0

Bug #15514: The "The Tails documentation launcher on the desktop works…" scenarios are fragile Resolved

100


Related issues

Related to Tails - Feature #10287: Set up limited email notification on automatic test failure for the initial deployment Resolved 2015-09-27
Related to Tails - Bug #10096: Fix newly identified issues to make our test suite more robust and faster, phase 2 Rejected 2015-08-26
Related to Tails - Feature #11355: Re-enable Jenkins notifications on ISO build/test failure In Progress 2017-08-28
Related to Tails - Bug #16959: Gather usability data about our current CI In Progress

History

#1 Updated by anonym 2015-09-27 06:37:21

  • Description updated

#2 Updated by intrigeri 2015-09-28 02:02:07

  • blocks #8668 added

#3 Updated by intrigeri 2015-09-28 02:02:44

  • Deliverable for set to 270

#4 Updated by anonym 2015-09-28 04:52:13

  • related to Feature #10287: Set up limited email notification on automatic test failure for the initial deployment added

#5 Updated by anonym 2015-10-04 12:53:19

  • File analysis-summary.txt added (missing)
  • File tailstester1-json.tar.bz2 added (missing)
  • File json-analysis added
  • Assignee changed from anonym to kytv
  • QA Check set to Dev Needed

Edit: Removing comment. So much was wrong with these early isotester1 runs that it’s just confusing the discussion on this ticket.

#6 Updated by anonym 2015-10-12 03:48:23

  • File deleted (analysis-summary.txt)

#7 Updated by anonym 2015-10-12 03:48:30

  • File deleted (tailstester1-json.tar.bz2)

#8 Updated by anonym 2015-10-12 04:51:14

Here’s a summary for the last 12 runs on isotester1 (simply json-analysis *.json):

Step failure breakdown (total: 55):
* 24    Step: the Unsafe Browser works in all supported languages
  - 24    Scenario: The Unsafe Browser can be used in all languages supported in Tails
* 10    Step: Tor is ready
  - 5     Scenario: Clock way in the future
  - 2     Scenario: Using obfs2 pluggable transports
  - 1     Scenario: Clock with host's time
  - 1     Scenario: The tor process should be confined with Seccomp
  - 1     Scenario: I2P is enabled when the "i2p" boot parameter is added
* 6     Step: Pidgin successfully connects to the "irc.oftc.net" account
  - 4     Scenario: Connecting to the #tails IRC channel with the pre-configured account
  - 2     Scenario: Using a persistent Pidgin configuration
* 4     Step: the OpenPGP keys shipped with Tails will be valid for the next 3 months
  - 4     Scenario: The shipped Tails OpenPGP keys are up-to-date
* 2     Step: the Tor Browser has started and loaded the Tails roadmap
  - 2     Scenario: Connecting to the #tails IRC channel with the pre-configured account
* 1     Step: I see "SSHAuthVerification.png" after at most 60 seconds
  - 1     Scenario: SSH is using the default SocksPort
* 1     Step: I configure some Bridge pluggable transports in Tor Launcher
  - 1     Scenario: Clock way in the future in bridge mode
* 1     Step: I create a new bitcoin wallet
  - 1     Scenario: Using a persistent Electrum configuration
* 1     Step: I open "/home/amnesia/Persistent/default-testpage.pdf" with Evince
  - 1     Scenario: I can view and print a PDF file stored in persistent /home/amnesia/Persistent but not /home/amnesia/.gnupg
* 1     Step: I click the blocked video icon
  - 1     Scenario: Watching a WebM video
* 1     Step: I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI without any signatures
  - 1     Scenario: Syncing OpenPGP keys using Seahorse started from the Tails OpenPGP Applet should work and be done over Tor.
* 1     Step: I connect Gobby to "gobby.debian.org"
  - 1     Scenario: Gobby is using the default SocksPort
* 1     Step: the Tor Browser has started and loaded the startup page
  - 1     Scenario: Importing an OpenPGP key from a website
* 1     Step: I see "WindowsStartMenu.png" after at most 10 seconds
  - 1     Scenario: The panel menu should look like Microsoft Windows's start menu

Some more detailed analysis:

* 24    Step: the Unsafe Browser works in all supported languages
  - 24    Scenario: The Unsafe Browser can be used in all languages supported in Tails


This is because isotester1 doesn’t set a UTF-8 locale. See Bug #10359.

Also, we apparently have duplicated the 'The Unsafe Browser can be used in all languages supported in Tails' scenario in both localization.feature and unsafe_browser.feature.

* 10    Step: Tor is ready
  - 5     Scenario: Clock way in the future
  - 2     Scenario: Using obfs2 pluggable transports
  - 1     Scenario: Clock with host's time
  - 1     Scenario: The tor process should be confined with Seccomp
  - 1     Scenario: I2P is enabled when the "i2p" boot parameter is added

7 of these happen in wait_until_tor_is_working indicating that Feature #9516 didn’t solve everything. In fact, all those errors occur in the ‘Tor is ready’ step, implying that none seem to occur when we restore from a snapshot. My impression is that before Feature #9516 the Tor bootstrapping issues happened just as often when restoring from snapshot. On top of that, we run wait_until_tor_is_working after restoring a snapshot a lot more frequently than in the ‘Tor is ready’ step. This seems to indicate that Feature #9516 has issues with the initial bootstrap, e.g. when combined with tordate.

The other failures happened in the (sub) step ‘the time has synced’, and specifically it was htpdate. We should add some improved error logging (e.g. dump contents of /var/log/htpdate.log) and possibly logic for retrying htpdate on failure. Although that should be thought about carefully since these errors affect users as much and hence are real issues.

Anyway, we cannot mark all scenarios using this step (incl. those depending on snapshots using it) @fragile since it would disable all tests using the network, essentially.

* 6     Step: Pidgin successfully connects to the "irc.oftc.net" account
  - 4     Scenario: Connecting to the #tails IRC channel with the pre-configured account
  - 2     Scenario: Using a persistent Pidgin configuration

All are of the type: “The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)”.

OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?

* 4     Step: the OpenPGP keys shipped with Tails will be valid for the next 3 months
  - 4     Scenario: The shipped Tails OpenPGP keys are up-to-date

Expected. We need to be more proactive about updating the key… :)

* 2     Step: the Tor Browser has started and loaded the Tails roadmap
  - 2     Scenario: Connecting to the #tails IRC channel with the pre-configured account

The ‘I see the Tails roadmap URL’ doesn’t use the retrying-magic we have in the 'I open the address ...' step. We should refactor out that code from the latter so it can be used in the former step.

* 1     Step: I see "SSHAuthVerification.png" after at most 60 seconds
  - 1     Scenario: SSH is using the default SocksPort

I’m unsure what’s wrong here. For some reason isotester1 doesn’t have any artifacts except the json log for run 41, so I cannot investigate further.

* 1     Step: I configure some Bridge pluggable transports in Tor Launcher
  - 1     Scenario: Clock way in the future in bridge mode

Known to be very fragile. We should probably mark all the “way in the past/future” scenarios as @fragile right away.

* 1     Step: I create a new bitcoin wallet
  - 1     Scenario: Using a persistent Electrum configuration

This was on run 45, and the error screenshot shows an image of a prompt saying “You are offline”. Interesting.

* 1     Step: I open "/home/amnesia/Persistent/default-testpage.pdf" with Evince
  - 1     Scenario: I can view and print a PDF file stored in persistent /home/amnesia/Persistent but not /home/amnesia/.gnupg

Here we failed to find GnomeTerminalWindow.png. Can’t investigate further due to missing artifacts in run 38.

* 1     Step: I click the blocked video icon
  - 1     Scenario: Watching a WebM video

Run 35. Looking at the trace is interesting:

    And I open the address "https://webm.html5.org/test.webm" in the Tor Browser      # features/step_definitions/common_steps.rb:550
    And I click the blocked video icon                                                # features/step_definitions/common_steps.rb:838
      FindFailed: can not find TorBrowserBlockedVideo.png on the screen.

In the error screenshot I can see that the Tor Browser is showing the “The connection has timed out” page. Something must be wrong with our recent improvements in the ‘I open the address ...’ step since it accepted this page. Possibly the existing condition isn’t enough, but we should also check that we do not see this particular error page, or perhaps any error page. Thoughts?
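One way to express the extra condition, as a sketch only: treat a visible error page as an immediate failure instead of accepting the page as loaded. The two probes are passed in as callables because the real checks are image-based; wait_for_page, ErrorPageShown and the default timeout are hypothetical names and values, not the suite's actual step code.

```python
import time


class ErrorPageShown(Exception):
    """The browser displays an error page instead of the requested one."""


def wait_for_page(loaded, error_shown, timeout=120.0, poll=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until the page has loaded, failing fast if an error page shows up.

    `loaded` and `error_shown` are zero-argument callables that probe
    the browser and return True or False.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        # Check for the error page first: it also counts as "finished loading".
        if error_shown():
            raise ErrorPageShown("the browser shows an error page")
        if loaded():
            return True
        sleep(poll)
    raise TimeoutError("the page did not load in time")
```

A caller that retries with a new Tor circuit could catch ErrorPageShown and reload, instead of letting the next step fail on a missing image.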

* 1     Step: I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI without any signatures
  - 1     Scenario: Syncing OpenPGP keys using Seahorse started from the Tails OpenPGP Applet should work and be done over Tor.

Another “The operation failed (despite forcing 5 new Tor circuits)”. Either we have to bump the retries, or start investigating the OpenPGP server issue again.

* 1     Step: I connect Gobby to "gobby.debian.org"
  - 1     Scenario: Gobby is using the default SocksPort

Run 43. The error screenshot shows that Gobby is still trying to resolve gobby.debian.net. I guess we need Tor retry magic here.

* 1     Step: the Tor Browser has started and loaded the startup page
  - 1     Scenario: Importing an OpenPGP key from a website

Happened in run 39, which lacks artifacts so I cannot investigate. Our recent retry magic should have fixed it, but maybe it’s another instance of the “The connection has timed out” page messing things up, like above?

* 1     Step: I see "WindowsStartMenu.png" after at most 10 seconds
  - 1     Scenario: The panel menu should look like Microsoft Windows's start menu

Happened in run 39, which lacks artifacts so I cannot investigate.

#9 Updated by anonym 2015-10-12 04:55:08

  • Status changed from Confirmed to In Progress
  • Target version changed from Tails_1.8 to Tails_1.7
  • QA Check changed from Dev Needed to Info Needed

Actually, we need to deal with the @fragile tagging and branch creation ASAP since the jenkins deployment is imminent.

kytv, can you please go through my analysis above? Could you create tickets where you agree that there is a problem? I think we should skip those where we didn’t get artifacts (and hence have no clue what the issue is) so they will be re-run and fail again soon with artifacts. I can then do the @fragile tagging + branch creation since that’s easier with commit rights.

#10 Updated by kytv 2015-10-15 05:33:25

anonym wrote:
> Here’s a summary for the last 12 runs on isotester1 (simply json-analysis *.json):
> […]
>
> Some more detailed analysis:
>
> […]
>
> All are of the type: “The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)”.
>
> OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?

I’m fine with that. We could even go higher since it’s a configurable option.

>
> The ‘I see the Tails roadmap URL’ doesn’t use the retrying-magic we have in the 'I open the address ...' step. We should refactor out that code from the latter so it can be used in the former step.
>

+1 for refactoring

>
> This was on run 45, and the error screenshot shows an image of a prompt saying “You are offline”. Interesting.

Hmm. Perhaps something to do with our Electrum being “too old” to connect to the servers? Granted, [ “too old” != “offline” ], but maybe the error is wrong.

> […]
>
> In the error screenshot I can see that the Tor Browser is showing the “The connection has timed out” page. Something must be wrong with our recent improvements in the ‘I open the address ...’ step since it accepted this page. Possibly the existing condition isn’t enough, but we should also check that we do not see this particular error page, or perhaps any error page. Thoughts?

+1 for checking for an error.

>
> Run 43. The error screenshot shows that Gobby is still trying to resolve gobby.debian.net. I guess we need Tor retry magic here.

Agreed.

#11 Updated by anonym 2015-10-15 05:45:36

anonym wrote:
> The other failures happened in the (sub) step ‘the time has synced’, and specifically it was htpdate. We should add some improved error logging (e.g. dump contents of /var/log/htpdate.log) and possibly logic for retrying htpdate on failure. Although that should be thought about carefully since these errors affect users as much and hence are real issues.

I’m seeing this more and more, possibly because I’m seeing fewer and fewer Tor bootstrapping errors. I think we need some retry_tor love for htpdate only in that step. Note: tordate is part of the Tor bootstrapping, which we already have fixed, more or less. So we need a ticket for this, but we cannot do anything about this with the @fragile tag.

Next: another scenario that needs a @fragile tag is ‘Install packages using Synaptic’. I’ve been running the test suite on another, weaker machine that I have, and I’ve seen two instances of:

    And I update APT using Synaptic                           # features/step_definitions/apt.rb:31
[log] Ctrl+TYPE "f"
    Then I should be able to install a package using Synaptic # features/step_definitions/apt.rb:45
      FindFailed: can not find SynapticSearch.png on the screen.


So, again, we’re having issues with keyboard shortcuts. While it could be a window focus issue, or solvable with retrying, I think the proper solution is to make this scenario completely image + mouse click driven (there are more than just Ctrl+f).

#12 Updated by anonym 2015-10-15 05:49:14

kytv wrote:
> anonym wrote:
> > Here’s a summary for the last 12 runs on isotester1 (simply json-analysis *.json):
> > […]
> >
> > Some more detailed analysis:
> >
> > […]
> >
> > All are of the type: “The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)”.
> >
> > OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?
>
> I’m fine with that. We could even go higher since it’s a configurable option.

IMHO it being configurable only makes this easier to test (even without a branch). However, the defaults should be sane, so that’s what we’re aiming to increase here if it will improve the situation.

> > This was on run 45, and the error screenshot shows an image of a prompt saying “You are offline”. Interesting.
>
> Hmm. Perhaps something to do with our Electrum being “too old” to connect to the servers? Granted, [ “too old” != “offline” ], but maybe the error is wrong.

Could be. I’ve never seen that before, so perhaps it’s only worth creating the ticket, but skip the @fragile and branch dance.

#13 Updated by kytv 2015-10-15 06:10:32

anonym wrote:
> kytv wrote:
> > anonym wrote:
> > > Here’s a summary for the last 12 runs on isotester1 (simply json-analysis *.json):
> > > […]
> > >
> > > Some more detailed analysis:
> > >
> > > […]
> > >
> > > All are of the type: “The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)”.
> > >
> > > OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?
> >
> > I’m fine with that. We could even go higher since it’s a configurable option.
>
> IMHO it being configurable only makes this easier to test (even without a branch). However, the defaults should be sane, so that’s what we’re aiming to increase here if it will improve the situation.
>

Of course. :) Since I didn’t know what would be deemed “sane” I went with 5 when I implemented it.

Before Feature #9653 we were doomed to wait a minute for the #tails channel image to appear while a Reconnect button was waiting to be clicked. Retrying 15 times would mean waiting at least 15 minutes but now it could go through all 15 retries in half a minute. Raising it to 10 or 15 seems reasonable.
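The retry logic and the worst-case arithmetic above can be sketched like this. It is an illustration in Python, not the suite's Ruby helper: retry_with_new_circuits, TorFailure as defined here, and worst_case_seconds are hypothetical names, and the default of 10 is just one of the values discussed.

```python
class TorFailure(Exception):
    """Raised by an action that failed for (presumably) Tor-related reasons."""


def retry_with_new_circuits(action, force_new_circuit, max_retries=10):
    """Run `action` until it succeeds, forcing a new Tor circuit after each failure."""
    last_error = None
    for _ in range(max_retries):
        try:
            return action()
        except TorFailure as error:
            last_error = error
            force_new_circuit()
    raise RuntimeError(
        f"The operation failed (despite forcing {max_retries} new Tor circuits)"
    ) from last_error


def worst_case_seconds(retries, seconds_per_attempt):
    """Upper bound on the time spent if every attempt fails."""
    return retries * seconds_per_attempt
```

With a one-minute wait per failed attempt, worst_case_seconds(15, 60) is 900 seconds, i.e. the 15 minutes mentioned above; once a failure is detected quickly, the same 15 attempts cost far less, which is why a higher default becomes affordable.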

#14 Updated by kytv 2015-10-15 06:29:27

  • blocked by Bug #10378: The "Tails OpenPGP keys" scenario is fragile added

#15 Updated by kytv 2015-10-15 06:53:52

  • blocked by Bug #10380: gobby tests are fragile added

#16 Updated by kytv 2015-10-15 07:05:01

  • blocked by Bug #10381: The "I open the address" steps are fragile added

#17 Updated by anonym 2015-10-15 08:45:45

General remark/clarification: in some instances it’s not really our tests that are fragile, but rather our features in Tails that actually are fragile with the tests being correct. Hence I think that @fragile actually means that the test is fragile and/or that the Tails feature being tested is fragile.

Note that there is some relationship with disabling known broken tests, i.e. Bug #7233. However, I think we must distinguish tests that are fragile, i.e. fail with some probability, from tests that always fail. Only the latter is suitable to be marked @known_broken à la Bug #7233, because when we have the proper solution, the tests will not be disabled, but be run and they will be expected to fail (i.e. success/failure are reversed)! IMHO this also supports my reasoning for the extended usage of @fragile, above.
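The difference between the two tags can be spelled out in a few lines. This is a sketch of the intended semantics as described in this comment, with hypothetical function names; it is not how Cucumber or the test suite implements tags.

```python
def should_run(tags, skip_fragile=True):
    """@fragile scenarios are skipped entirely when fragile ones are excluded."""
    return not (skip_fragile and "@fragile" in tags)


def verdict(tags, passed):
    """@known_broken scenarios still run, with success and failure reversed."""
    if "@known_broken" in tags:
        return "unexpected pass" if passed else "ok"
    return "ok" if passed else "failed"
```

So a @fragile scenario produces no result at all on a run that excludes it, while a @known_broken scenario always produces one, and flags the run when it unexpectedly starts passing.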

anonym wrote:
> OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?

Now we have Bug #10375 which might improve the situation, but clearly OFTC is to blame here, not the test suite. Feature #7874 seems like the actual fix, but we’ll see how the retries bump works out. That should be tried in the branch we create for removing the @fragile tag.

> […] Scenario: Clock way in the future in bridge mode
>
> Known to be very fragile. We should probably mark all the “way in the past/future” scenarios as @fragile right away.

Again, the problem is not the tests, but that our time syncing is not robust (especially the “tordate” component), so the real fix is Feature #5774, not anything related to the tests. Marking all these four scenarios as @fragile is completely necessary, though.

#18 Updated by intrigeri 2015-10-16 02:51:42

  • Subject changed from Fix newly identified robustness issues in the automated test suite to Fix newly identified issues to make our test suite more robust and faster, phase 1
  • Deliverable for changed from 270 to 267

anonym, kytv: we actually have a deliverable on this for milestone III, and another for milestone VI. The second one already had a ticket (Bug #10096) so I’m tweaking this one to track the milestone III goals. In practice, of course you’ll be continuously fixing stuff as part of intermediary milestones as well. That’s a big pile of work (whose size is not known yet) to be done over the course of 9.5 months, so it’ll need to be split into smaller batches, so that it’s clear what are the ones you want to fix during the current cycle, etc. I suggest you two draft that in two weeks, when we know more about how large the problem is, but still in time for the next CI team meeting.

#19 Updated by intrigeri 2015-10-16 02:52:02

  • related to Bug #10096: Fix newly identified issues to make our test suite more robust and faster, phase 2 added

#20 Updated by intrigeri 2015-10-16 03:14:06

  • blocks deleted (Bug #10380: gobby tests are fragile)

#21 Updated by intrigeri 2015-10-16 03:14:56

  • blocks deleted (Bug #10378: The "Tails OpenPGP keys" scenario is fragile)

#22 Updated by intrigeri 2015-10-16 03:15:41

  • blocks deleted (Bug #10381: The "I open the address" steps are fragile)

#23 Updated by anonym 2015-10-16 07:45:57

  • Description updated
  • Status changed from In Progress to Confirmed

#24 Updated by anonym 2015-10-16 10:19:16

  • Description updated

#25 Updated by intrigeri 2015-10-31 05:50:01

intrigeri wrote:
> anonym, kytv: we actually have a deliverable on this for milestone III, and another for milestone VI. The second one already had a ticket (Bug #10096) so I’m tweaking this one to track the milestone III goals. In practice, of course you’ll be continuously fixing stuff as part of intermediary milestones as well. That’s a big pile of work (whose size is not known yet) to be done over the course of 9.5 months, so it’ll need to be split into smaller batches, so that it’s clear which ones you want to fix during the current cycle, etc. I suggest you two draft that in two weeks, when we know more about how large the problem is, but still in time for the next CI team meeting.

I think we’ve reached about the time when this should be done (but if you want to do it after the 1.7 release and before the CI meeting, that’s fine with me). Note that since my last comment the target version / reality inconsistency has expanded to new tickets (Bug #10440, Bug #10441, Bug #10442, Bug #10443 and Bug #10444).

#26 Updated by anonym 2015-10-31 10:44:08

intrigeri wrote:
> intrigeri wrote:
> > anonym, kytv: we actually have a deliverable on this for milestone III, and another for milestone VI. The second one already had a ticket (Bug #10096) so I’m tweaking this one to track the milestone III goals. In practice, of course you’ll be continuously fixing stuff as part of intermediary milestones as well. That’s a big pile of work (whose size is not known yet) to be done over the course of 9.5 months, so it’ll need to be split into smaller batches, so that it’s clear which ones you want to fix during the current cycle, etc. I suggest you two draft that in two weeks, when we know more about how large the problem is, but still in time for the next CI team meeting.
>
> I think we’ve reached about the time when this should be done (but if you want to do it after the 1.7 release and before the CI meeting, that’s fine with me). Note that since my last comment the target version / reality inconsistency has expanded to new tickets (Bug #10440, Bug #10441, Bug #10442, Bug #10443 and Bug #10444).

Whatever you expect from this ticket is not what I intended. To me, the idea was never that we’d be able to fix even the first batch of fragile tests in time for milestone III.

I want this to be the master ticket collecting all fragile tests we currently have. So imho we’re still in the first phase (so I don’t get what Bug #10096 is for now). (BTW, the target version I set was only to make sure kytv would see this ticket in his view; sorry for not treating the Redmine state as holy scripture and being pragmatic instead :)). Individual tickets can be assigned specific deliverable milestones, or we can create some tracking tickets (and there we can call them phases) that are blocked by some of these tickets. At least I feel like we’ll have a better overview of what has to be fixed to make jenkins work well enough for us by having this single ticket tracking all the robustness issues.

#27 Updated by intrigeri 2015-11-06 06:03:40

  • Subject changed from Fix newly identified issues to make our test suite more robust and faster, phase 1 to Fix newly identified issues to make our test suite more robust and faster
  • Target version changed from Tails_1.7 to Tails_2.5
  • QA Check deleted (Info Needed)
  • Deliverable for changed from 267 to 270

#28 Updated by intrigeri 2015-12-07 07:08:11

  • File summary-20151201-to-20151207.txt added (file missing)

Here’s the output of json-analysis for the test suite runs since the beginning of December.

#29 Updated by intrigeri 2015-12-07 07:09:20

  • Description updated

(Documenting how to create such a summary without too much by-hand painful operations).

#30 Updated by intrigeri 2015-12-07 07:43:43

  • File deleted (summary-20151201-to-20151207.txt)

#31 Updated by intrigeri 2015-12-07 07:44:23

(Updated summary + doc to include details requested by bertagaz.)

#32 Updated by intrigeri 2016-02-20 14:49:10

  • Subject changed from Fix newly identified issues to make our test suite more robust and faster to Fix newly identified issues to make our test suite more robust and faster
  • Assignee changed from kytv to anonym

#33 Updated by sajolida 2016-06-02 14:40:58

  • blocks Feature #10394: Identify which of the remaining manual tests have the best cost/benefit to automate added

#34 Updated by BitingBird 2016-06-26 10:38:29

  • Status changed from Confirmed to In Progress

#35 Updated by anonym 2016-07-18 06:42:53

  • Target version changed from Tails_2.5 to Tails_2.7

#36 Updated by intrigeri 2016-07-18 06:50:36

  • Deliverable for changed from 270 to SponsorS_Internal

#38 Updated by intrigeri 2016-07-22 05:30:08

(Add a wip/ prefix to the newly created branches, see tails-ci@.)

#39 Updated by intrigeri 2016-08-18 07:33:58

  • Assignee deleted (anonym)
  • Target version deleted (Tails_2.7)
  • Deliverable for deleted (SponsorS_Internal)

#40 Updated by intrigeri 2016-08-18 07:42:48

  • related to Feature #11355: Re-enable Jenkins notifications on ISO build/test failure added

#41 Updated by sajolida 2016-11-13 12:00:46

  • blocked by deleted (Feature #10394: Identify which of the remaining manual tests have the best cost/benefit to automate)

#42 Updated by Anonymous 2017-06-29 08:55:45

  • Assignee set to anonym

Assigning the parent ticket to anonym for tracking.

#43 Updated by Anonymous 2018-01-15 11:36:02

  • blocked by deleted (#8668)

#44 Updated by intrigeri 2019-10-19 16:19:21

  • Assignee deleted (anonym)

(I’m not sure what this ticket is useful for at this point. Either way, presuming that anonym will fix All The Things™ matches neither recent reality, nor current plans.)

#45 Updated by intrigeri 2019-12-21 15:16:17

  • related to Bug #16959: Gather usability data about our current CI added