Bug #7233

How to deal with tests expected to fail?

Added by anonym 2014-05-13 09:26:39 . Updated 2015-12-01 09:23:42 .

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Test suite
Target version:
Start date:
2015-09-26
Due date:
% Done:

100%

Feature Branch:
test/7233-expected-failures
Type of work:
Research
Blueprint:

Starter:
0
Affected tool:
Deliverable for:

Description

Sometimes tests stay broken for extended periods, and we need a way to deal with that.

For an initial discussion, see: https://mailman.boum.org/pipermail/tails-dev/2014-May/005759.html


Subtasks

Bug #10281: Document how we deal with tests expected to fail Resolved

100


History

#1 Updated by intrigeri 2014-05-13 14:34:55

I think “Discuss” would be relevant for this ticket if there was a clear proposal.

Given we’re not there yet, and an actual proposal needs to be built, maybe “Research” would be more appropriate.

#2 Updated by anonym 2014-05-14 03:06:47

  • Target version set to Sustainability_M1
  • Type of work changed from Discuss to Research

Agreed. Also set target version to 2.0 as you proposed on tails-dev@.

#3 Updated by sajolida 2014-07-10 14:46:12

  • Target version changed from Sustainability_M1 to Hardening_M1

#4 Updated by intrigeri 2015-01-06 12:55:25

#5 Updated by sajolida 2015-01-07 12:38:20

  • blocks #8538 added

#6 Updated by anonym 2015-01-09 14:45:29

  • Target version changed from Hardening_M1 to Tails_1.5

#7 Updated by anonym 2015-01-10 17:38:47

  • Assignee set to anonym

#8 Updated by intrigeri 2015-01-14 13:20:08

  • Target version changed from Tails_1.5 to Tails_1.4

Blocks a ticket flagged for 1.4.

#9 Updated by intrigeri 2015-05-09 02:51:37

  • Target version changed from Tails_1.4 to Tails_1.4.1

Postponing. IIRC this one was kinda expected to be postponed when we discussed it last month.

#10 Updated by anonym 2015-06-08 19:21:12

  • Assignee changed from anonym to intrigeri
  • QA Check set to Info Needed
  • Feature Branch set to test/7233-expected-failures

It is fairly easy to monkeypatch cucumber to treat scenarios tagged e.g. @fail as if they actually succeeded. See the test/7233-expected-failures (WIP) branch. However, as the steps fail or are skipped (due to an earlier step failure), formatters will be informed of this. Hence e.g. the JSON formatter will still get the step failures/skips, but in the end cucumber will treat the whole run as a success (e.g. return 0, the pretty formatter will say that all scenarios and steps passed). Perhaps we actually want that? After all, the steps did fail, so whatever statistics jenkins will do for us (via the JSON output) should reflect that, it’s just that we don’t wan the full run to be treated as a failure => trigger notifications and so on. Right? What semantics do we want here, really. I’m assigning this ticket to you, intrigeri, for some input, since you had some pretty clear ideas about this before.

OTOH, I have no idea how jenkins’ cucumber plugin will deal with a run that is treated as a success (returned 0) but where the JSON has steps that are marked as failures. Perhaps it will be fine, perhaps it will blow up. Yay.

It should also be said that if we want some other semantics, e.g. that when a @fail-tagged scenario fails then all steps in it are also treated as successes (even those that we skip), then we’re in trouble because the internal cucumber design/code organization is not appropriate for that from what I can tell.

All in all, introducing something like this feels risky to me, and I currently am not even sure of exactly what we want.

#11 Updated by intrigeri 2015-06-09 19:02:18

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Info Needed to Dev Needed

> It is fairly easy to monkeypatch cucumber to treat scenarios tagged e.g. @fail as if they actually succeeded.

Cool :)

> Perhaps we actually want that? After all, the steps did fail, so whatever statistics jenkins will do for us (via the JSON output) should reflect that, it’s just that we don’t wan the full run to be treated as a failure => trigger notifications and so on.

This seems to totally make sense at first glance.

> I’m assigning this ticket to you, intrigeri, for some input, since you had some pretty clear ideas about this before.

I believe you that I had, but right now, sadly I don’t have the problem space in mind so I’ll be of little help. And ENOTIME right now to re-read the email thread that made us create this ticket :/

> OTOH, I have no idea how jenkins’ cucumber plugin will deal with a run that is treated as a success (returned 0) but where the JSON has steps that are marked as failures. Perhaps it will be fine, perhaps it will blow up.

No idea, but IIRC that plugin only deals with presenting output via the web interface, and it’s not involved in actually running the tests. If I’m correct on this one, then we’ll use a shell “builder” (in Jenkins’ terminology) and then what matters regarding global success/failure is the exit code of ./run_test_suite, so we should be good. This remains to be verified in practice, of course.

> It should also be said that if we want some other semantics, e.g. that when a @fail-tagged scenario fails then all steps in it are also treated as successes (even those that we skip), then we’re in trouble because the internal cucumber design/code organization is not appropriate for that from what I can tell.

I doubt we want to pretend that skipped tests were successful.

> All in all, introducing something like this feels risky to me, and I currently am not even sure of exactly what we want.

=> perhaps first single out a use case or three and reason about them, instead of in theory? At least it would help me, but I’m less a theory person than you :)

#12 Updated by intrigeri 2015-06-29 00:49:17

  • Target version changed from Tails_1.4.1 to Tails_1.5

#13 Updated by anonym 2015-08-03 11:01:57

  • Target version changed from Tails_1.5 to Tails_1.6

#14 Updated by anonym 2015-09-07 13:51:24

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:
> > It is fairly easy to monkeypatch cucumber to treat scenarios tagged e.g. @fail as if they actually succeeded.
>
> Cool :)

Argh! I indeed hope it was “fairly easy” and that I can rediscover it. I do not remember anything about what I did, and lost my test code in this since it apparently was done in a directory that I don’t backup.

> > Perhaps we actually want that? After all, the steps did fail, so whatever statistics jenkins will do for us (via the JSON output) should reflect that, it’s just that we don’t wan the full run to be treated as a failure => trigger notifications and so on.
>
> This seems to totally make sense at first glance.

Ok, then. This is the best I’ve been able to come up with after looking in to this several times over the past year, so I’ll go with it then.

> > I’m assigning this ticket to you, intrigeri, for some input, since you had some pretty clear ideas about this before.
>
> I believe you that I had, but right now, sadly I don’t have the problem space in mind so I’ll be of little help. And ENOTIME right now to re-read the email thread that made us create this ticket :/

I think you answered it adequately. In fact, I found little of interest in that thread, only your description of how it should work:

> In the test frameworks I’m used to, here’s how the
> TODO (known broken) tests work: a test case flagged as such is run,
> and then: if it fails, then it’s result is ignored, and does not
> affect the overall result of the test suite; if it passes, then that’s
> considered (maybe surprisingly) as a failure, which is actually
> logical since it most likely indicates that the test suite is
> outdated, or buggy.

In fact, I think I’d like the tag @known_broken. What do you think?

Back to quoting this ticket:
> > OTOH, I have no idea how jenkins’ cucumber plugin will deal with a run that is treated as a success (returned 0) but where the JSON has steps that are marked as failures. Perhaps it will be fine, perhaps it will blow up.
>
> No idea, but IIRC that plugin only deals with presenting output via the web interface, and it’s not involved in actually running the tests. If I’m correct on this one, then we’ll use a shell “builder” (in Jenkins’ terminology) and then what matters regarding global success/failure is the exit code of ./run_test_suite, so we should be good. This remains to be verified in practice, of course.

Great! Let’s hope you’re right.

> > It should also be said that if we want some other semantics, e.g. that when a @fail-tagged scenario fails then all steps in it are also treated as successes (even those that we skip), then we’re in trouble because the internal cucumber design/code organization is not appropriate for that from what I can tell.
>
> I doubt we want to pretend that skipped tests were successful.

Right. Hopefully it’s only “success” and “fail” that matters in the end.

> > All in all, introducing something like this feels risky to me, and I currently am not even sure of exactly what we want.
>
> => perhaps first single out a use case or three and reason about them, instead of in theory?

Well, let’s look at what we’ve had so far:

  • The tails-dev@ discussion started when I implemented a @skip tag to mark the entire winxp.feature when updating the automated test suite from squeeze to wheezy, in which we the old theme didn’t work, and our plan was to move to an Windows 8-based theme. In retrospect, removing winxp.feature seems like the proper solution. I guess we’re in the same situation now, when migrating to Jessie?
  • There was an extended period of time when the VirtualBox guest modules are available scenario was broken because we had a “temporary” issue building them. Then this seemed like a pretty nice feature to have.

> At least it would help me, but I’m less a theory person than you :)

Hah! Perhaps that is so. :) At least now I have too bad of an imagination to come up with anything reasonable. In general (hah!) it seems like we may need this in the future when doing big migrations, e.g. to new Debian versions, that extends over a long time, and that may break stuff in ways that are not trivial to fix immediately.

So, going a bit more concrete again, look at commit commit:63fa454 (“in” feature/jessie) where some failing steps were removed, and then reintroduced in commit commit:64b9a32 when they were fixed. I think it’d have been better if that scenario was marked @known_broken instead. The revert could easily have been forgotten.

#15 Updated by intrigeri 2015-09-14 10:16:51

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Info Needed to Dev Needed

> I think you answered it adequately. In fact, I found little of interest in that thread, only your description of how it should work:

>> In the test frameworks I’m used to, here’s how the TODO (known broken) tests work: a test case flagged as such is run, and then: if it fails, then it’s result is ignored, and does not affect the overall result of the test suite; if it passes, then that’s considered (maybe surprisingly) as a failure, which is actually logical since it most likely indicates that the test suite is outdated, or buggy.

FTR, I still like it. Also, I seem to remember that there’s something in either “the Cucumber book” or “the rspec book” about this situation. I don’t remember which one, nor the exact suggested solution. Perhaps it’s simply marking steps as “pending”. It could be worth a quick look before jumping to the tag kludge.

> In fact, I think I’d like the tag @known_broken. What do you think?

I like the name.

>> No idea, but IIRC that plugin only deals with presenting output via the web interface, and it’s not involved in actually running the tests. If I’m correct on this one, then we’ll use a shell “builder” (in Jenkins’ terminology) and then what matters regarding global success/failure is the exit code of ./run_test_suite, so we should be good. This remains to be verified in practice, of course.

> Great! Let’s hope you’re right.

Better check on our current Jenkins test deployment, and ask bertagaz if in doubt.

>> > All in all, introducing something like this feels risky to me, and I currently am not even sure of exactly what we want.
>>
>> => perhaps first single out a use case or three and reason about them, instead of in theory?

> Well, let’s look at what we’ve had so far:

> * The tails-dev@ discussion started when I implemented a @skip tag to mark the entire winxp.feature when updating the automated test suite from squeeze to wheezy, in which we the old theme didn’t work, and our plan was to move to an Windows 8-based theme. In retrospect, removing winxp.feature seems like the proper solution. I guess we’re in the same situation now, when migrating to Jessie?

Exactly.

> * There was an extended period of time when the VirtualBox guest modules are available scenario was broken because we had a “temporary” issue building them. Then this seemed like a pretty nice feature to have.

Right.

#16 Updated by anonym 2015-09-15 06:07:57

Ok, so I found the code; luckily I had pushed it to our Git, and actually set the Feature Branch field of this ticket (ha ha!).

Now for some bad news: I absolutely failed to fixup the branch so it’d work on Jessie’s cucumber (I wrote the old one when still on Wheezy). Stuff has changed radically inside cucumber (the cucumber 2.0 rewrite/redesign). I absolutely do not understand the new code any more and would have to spend a substantial amount of time to do so. What’s clear is that the current approach will not work, and several monkey patches would be needed.

I’m more inclined to take this as a lesson: let’s not extend cucumber. The stuff we’ve done with the formatters (to get our extra hooks) is almost fine because it’s based on a stable API, more or less. But this would dig into the fundamentals of cucumber (success/failure status). It will likely break often. I’m not sure where all this puts us.

However, on a more positive note, there’s a new Cucumber feature coming. It would allow us to run all features twice but with different options in the same run; one run with --tags ~known_broken; one run with —wip —tags known_broken. Combined, the whole test suite would be covered, but the kicker would be that [t]he overall result would aggregate results from both suites [read: runs].".

Note: we’ve always been able to solve this problem with two runs, e.g. cucumber --tags ~known_broken ${REST_OF_ARGS} && cucumber —wip —tags known_broken ${REST_OF_ARGS}. But that’d give us two runs in Jenkins, which seemed like an ugly situation.

So, I’d like to close this ticket as resolved, create a new wait-one for integrating this new cucumber feature with our test suite when it arrives.

#18 Updated by anonym 2015-09-15 06:09:28

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Info Needed

#19 Updated by intrigeri 2015-09-15 06:25:41

> But this would dig into the fundamentals of cucumber (success/failure status). It will likely break often. I’m not sure where all this puts us.

Agreed.

> However, on a more positive note, there’s a new Cucumber feature coming. It would allow us to run all features twice but with different options in the same run; one run with --tags ~known_broken; one run with —wip —tags known_broken. Combined, the whole test suite would be covered, but the kicker would be that [t]he overall result would aggregate results from both suites [read: runs].".

Sounds good!

> Note: we’ve always been able to solve this problem with two runs, e.g. cucumber --tags ~known_broken ${REST_OF_ARGS} && cucumber —wip —tags known_broken ${REST_OF_ARGS}. But that’d give us two runs in Jenkins, which seemed like an ugly situation.

Note that we can very well combine these two runs in a single Jenkins job. Not sure how the Cucumber plugin for Jenkins would report about them, perhaps it would misbehave. But perhaps it’ll misbehave anyway with the new Cucumber feature as well?

> So, I’d like to close this ticket as resolved,

I think we should keep this ticket because it tracks a problem we still need to solve. Tracking actual implementation details is a different matter.

> create a new wait-one for integrating this new cucumber feature with our test suite when it arrives.

If that’s really how we want to solve this problem, then creating a wait subtask sounds like a good way to translate it into Redmine semantics.

#20 Updated by intrigeri 2015-09-15 06:30:38

  • Assignee changed from intrigeri to anonym
  • QA Check deleted (Info Needed)

#22 Updated by anonym 2015-09-15 06:41:01

  • Assignee changed from anonym to bertagaz
  • QA Check set to Info Needed

intrigeri wrote:
> > Note: we’ve always been able to solve this problem with two runs, e.g. cucumber --tags ~known_broken ${REST_OF_ARGS} && cucumber —wip —tags known_broken ${REST_OF_ARGS}. But that’d give us two runs in Jenkins, which seemed like an ugly situation.
>
> Note that we can very well combine these two runs in a single Jenkins job. Not sure how the Cucumber plugin for Jenkins would report about them, perhaps it would misbehave. But perhaps it’ll misbehave anyway with the new Cucumber feature as well?

Ok, I’ll bring him into the mix, then. Bertagaz, what do you think? Could you do some silly test to see what would happen? If you need details, or some example, I could cook something up for you.

> > So, I’d like to close this ticket as resolved,
>
> I think we should keep this ticket because it tracks a problem we still need to solve. Tracking actual implementation details is a different matter.

Ok.

> > create a new wait-one for integrating this new cucumber feature with our test suite when it arrives.
>
> If that’s really how we want to solve this problem, then creating a wait subtask sounds like a good way to translate it into Redmine semantics.

Done.

#25 Updated by bertagaz 2015-09-23 01:26:34

  • Target version changed from Tails_1.6 to Tails_1.7

#26 Updated by anonym 2015-09-26 09:09:52

  • Assignee changed from bertagaz to anonym
  • QA Check deleted (Info Needed)

Until the Cucumber “Suites” feature, we’ll use the “remove broken tests but file tickets to restore them” approach when.

Let’s ignore the approach mentioned in Bug #7233#note-22 (so ignore the “Info Needed”, bert) — I think it will cause pain vs. the Cucumber Jenkins plugin,8539 and make stuff more complex on the Jenkins side in general.

#27 Updated by anonym 2015-10-17 06:31:15

  • Target version changed from Tails_1.7 to Tails_1.8
  • Type of work changed from Research to Wait

#28 Updated by intrigeri 2015-10-20 03:23:21

  • Type of work changed from Wait to Research

> Type of work changed from Research to Wait

This is already encoded in the Feature #10198 subtask, and once Feature #10198 is resolved then this change won’t be automatically reverted. Besides, another subtask (Bug #10281) is WIP on our side, so the parent ticket isn’t only about waiting. I dunno what the correct Type of work should be, but definitely not Wait.

I think that the research here (Bug #7233) has been done, and it has two results:

  • long-term is Feature #10198 and should not be a subtask since we have another good enough option, and we’ve no control over when we can do Feature #10198;
  • short-term is Bug #10281 which is correctly a subtask as it’s good enough to address the problem at hand.

#29 Updated by intrigeri 2015-12-01 09:21:14

  • Status changed from Confirmed to Resolved

Applied in changeset commit:a0b6e251c2e786ee564e0f0492fc139c2f8c7cad.

#30 Updated by intrigeri 2015-12-01 09:23:42

  • Assignee deleted (anonym)
  • QA Check set to Pass

intrigeri wrote:
> * long-term is Feature #10198 and should not be a subtask since we have another good enough option, and we’ve no control over when we can do Feature #10198;

I’ve fixed that.