Bug #17551

Test suite has defunct processes after sikuli removal

Added by anonym 2020-03-25 13:47:26 . Updated 2020-04-16 05:50:48 .

Status:
Resolved
Priority:
Elevated
Assignee:
intrigeri
Category:
Test suite
Target version:
Start date:
Due date:
% Done:

100%

Feature Branch:
test/17551-prevent-defunct-processes
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

After merging Bug #15460 we see defunct processes like:

[python3] <defunct>
[import] <defunct>


They are cleaned up when the test suite exits, but this is ugly and could be a memory leak.


Subtasks


Related issues

Related to Tails - Bug #15460: Test suite broken with Java 9+ so we need to replace Sikuli Resolved
Related to Tails - Bug #14948: zombie processes when running the test suite Resolved 2017-11-10
Related to Tails - Bug #17216: Make the test suite clean up after itself even in most tricky failure modes Confirmed
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed

History

#1 Updated by anonym 2020-03-25 14:25:42

I think I get it. See this:

  def screenshot(target)
    FileUtils.rm_f(target)
    p = IO.popen(['import', '-quality', '100%', '-window', 'root', target])
    p.close
    assert($?.success?)
    assert(File.exists?(target))
  end


I think what happens is that we call this method inside a try_for() and sometimes the timeout makes us abort while the IO.popen call is running, so we never run p.close. That explains the [import] <defunct>, and the [python3] <defunct> seems to be from when we call the OpenCV python script, where the same thing can happen.

#2 Updated by anonym 2020-03-25 14:37:08

  • Status changed from Confirmed to In Progress

Applied in changeset commit:tails|5740245e36c27e83fcd6a733b8ea649e934acb7b.

#3 Updated by anonym 2020-03-25 14:47:08

  • % Done changed from 0 to 20
  • Feature Branch set to test/17551-prevent-defunct-processes

This solution works for me, but I’d like to see it run a lot on Jenkins first.

To avoid the code duplication I’ll probably make a safe_popen() that does this clean up for us. Might rewrite the history for that.

@intrigeri, feel free to test if this fixes the issue for you as well!

#4 Updated by intrigeri 2020-03-26 08:43:15

#5 Updated by intrigeri 2020-04-03 08:10:18

#6 Updated by intrigeri 2020-04-04 06:54:04

  • Status changed from In Progress to Needs Validation
  • Assignee changed from anonym to intrigeri

Given we’re now Saturday, it seems that our last chance to get this fixed in 4.5 is that someone takes the last validation steps over from anonym now. I’m on it!

#7 Updated by intrigeri 2020-04-05 05:06:31

  • Status changed from Needs Validation to In Progress
  • Assignee changed from intrigeri to anonym

Running the test suite from this branch on my sid system, I see:

root     2275048  0.0  0.0      0     0 pts/5    Z+   09:50   0:00 [import] <defunct>
root     2278793  0.0  0.0      0     0 pts/5    Z+   09:53   0:00 [import] <defunct>
root     2334461  0.0  0.0      0     0 pts/5    Z+   11:06   0:00 [import] <defunct>
root     2359364  0.0  0.0      0     0 pts/5    Z+   11:33   0:00 [import] <defunct>
root     2359432  0.0  0.0      0     0 pts/5    Z+   11:33   0:00 [import] <defunct>
root     2362910  0.0  0.0      0     0 pts/5    Z+   11:35   0:00 [import] <defunct>
root     2362975  0.0  0.0      0     0 pts/5    Z+   11:36   0:00 [import] <defunct>
root     2363295  0.0  0.0      0     0 pts/5    Z+   11:36   0:00 [import] <defunct>
root     2368181  0.0  0.0      0     0 pts/5    Z+   11:40   0:00 [import] <defunct>
root     2368247  0.0  0.0      0     0 pts/5    Z+   11:40   0:00 [import] <defunct>
root     2412542  0.0  0.0      0     0 pts/5    Z+   12:29   0:00 [import] <defunct>
root     2414777  0.0  0.0      0     0 pts/5    Z+   12:32   0:00 [import] <defunct>
root     2416469  0.0  0.0      0     0 pts/5    Z+   12:33   0:00 [import] <defunct>
root     2423381  0.0  0.0      0     0 pts/5    Z+   12:39   0:00 [import] <defunct>
root     2423479  0.0  0.0      0     0 pts/5    Z+   12:39   0:00 [import] <defunct>
root     2423572  0.0  0.0      0     0 pts/5    Z+   12:39   0:00 [import] <defunct>
root     2424010  0.0  0.0      0     0 pts/5    Z+   12:39   0:00 [import] <defunct>
root     2451345  0.1  0.0      0     0 pts/5    Z+   13:32   0:00 [ruby] <defunct>

So:

  • I’d rather not merge changes that try, and fail, to fix the import problem: that would lead to misleading code and make things harder to reason about.
  • I would be happy to merge the change that fixes the python3 problem, unless the import problem remaining makes you suspect that what I got today is merely anecdotal evidence, and the python3 problem is still there potentially.
  • I don’t know what to think about the ruby process. It can’t say for sure that it’s a regression.

#8 Updated by anonym 2020-04-07 15:09:23

I’m wondering if I did something wrong in my verification when I concluded that “[t]his solution works for me”, because now I see the same result as you (python fixed, import still there, ruby as a possible regression).

I think the next step is to create a reliable reproducer, both to verify the hypothesis of why it happens, and to have something to test potential solutions against. Actually, making the test suite able to detect these zombies itself would be nice!

#9 Updated by CyrilBrulebois 2020-04-07 17:05:11

  • Target version changed from Tails_4.5 to Tails_4.6

#10 Updated by intrigeri 2020-04-08 05:59:10

> Actually, making the test suite able to detect these zombies itself would be nice!

See Bug #17216 and in particular Bug #11295#note-38 :)

#11 Updated by intrigeri 2020-04-12 13:19:33

  • Priority changed from Normal to Elevated
  • Parent task deleted (Bug #15460)

Unparenting so we can close Bug #15460.
Bumping priority as this is a possible regression introduced in 4.5.

#12 Updated by intrigeri 2020-04-12 13:19:59

  • related to Bug #15460: Test suite broken with Java 9+ so we need to replace Sikuli added

#13 Updated by anonym 2020-04-14 10:57:42

  • % Done changed from 20 to 40

Woo, last three runs (since commit:b29c07b9952abde2a63ef5e9dd0af3987a55a073) on Jenkins looks good! The test suite cannot detect that it created any zombies. Just gonna refactor away some repeated code…

#14 Updated by anonym 2020-04-14 13:42:59

  • % Done changed from 40 to 50

Done! So just one more run on Jenkins…

#15 Updated by anonym 2020-04-15 08:41:12

  • Status changed from In Progress to Needs Validation
  • Assignee deleted (anonym)

anonym wrote:
> Done! So just one more run on Jenkins…

The two runs on commit:643c6ac21e1b8501b32301840cf3f4e5b5771e47 looks good! I’ve reverted the debugging stuff and pushed — please review and merge!

#16 Updated by intrigeri 2020-04-15 10:23:33

  • Assignee set to intrigeri

#17 Updated by intrigeri 2020-04-15 10:30:40

  • related to Bug #14948: zombie processes when running the test suite added

#18 Updated by intrigeri 2020-04-15 10:31:31

  • related to Bug #17216: Make the test suite clean up after itself even in most tricky failure modes added

#19 Updated by intrigeri 2020-04-15 10:45:48

A quick code review passes. Now building 4.5 so I can run the test suite locally against it, with instrumentation in place to spot defunct processes.

#20 Updated by intrigeri 2020-04-16 05:50:48

  • Status changed from Needs Validation to Resolved
  • % Done changed from 50 to 100

Applied in changeset commit:tails|cf9e208256acc3db931c51cd4df1066452c856c9.