Bug #17551
Test suite has defunct processes after sikuli removal
100%
Description
After merging Bug #15460 we see defunct processes like:
[python3] <defunct>
[import] <defunct>
They are cleaned up when the test suite exits, but this is ugly and could be a memory leak.
Subtasks
Related issues
Related to Tails - |
Resolved | ||
Related to Tails - |
Resolved | 2017-11-10 | |
Related to Tails - Bug #17216: Make the test suite clean up after itself even in most tricky failure modes | Confirmed | ||
Blocks Tails - Feature #16209: Core work: Foundations Team | Confirmed |
History
#1 Updated by anonym 2020-03-25 14:25:42
I think I get it. See this:
def screenshot(target)
FileUtils.rm_f(target)
p = IO.popen(['import', '-quality', '100%', '-window', 'root', target])
p.close
assert($?.success?)
assert(File.exists?(target))
end
I think what happens is that we call this method inside a try_for()
and sometimes the timeout makes us abort while the IO.popen
call is running, so we never run p.close
. That explains the [import] <defunct>
, and the [python3] <defunct>
seems to be from when we call the OpenCV python script, where the same thing can happen.
#2 Updated by anonym 2020-03-25 14:37:08
- Status changed from Confirmed to In Progress
Applied in changeset commit:tails|5740245e36c27e83fcd6a733b8ea649e934acb7b.
#3 Updated by anonym 2020-03-25 14:47:08
- % Done changed from 0 to 20
- Feature Branch set to test/17551-prevent-defunct-processes
This solution works for me, but I’d like to see it run a lot on Jenkins first.
To avoid the code duplication I’ll probably make a safe_popen()
that does this clean up for us. Might rewrite the history for that.
@intrigeri, feel free to test if this fixes the issue for you as well!
#4 Updated by intrigeri 2020-03-26 08:43:15
- Parent task set to
Bug #15460
#5 Updated by intrigeri 2020-04-03 08:10:18
- blocks Feature #16209: Core work: Foundations Team added
#6 Updated by intrigeri 2020-04-04 06:54:04
- Status changed from In Progress to Needs Validation
- Assignee changed from anonym to intrigeri
Given we’re now Saturday, it seems that our last chance to get this fixed in 4.5 is that someone takes the last validation steps over from anonym now. I’m on it!
#7 Updated by intrigeri 2020-04-05 05:06:31
- Status changed from Needs Validation to In Progress
- Assignee changed from intrigeri to anonym
Running the test suite from this branch on my sid system, I see:
root 2275048 0.0 0.0 0 0 pts/5 Z+ 09:50 0:00 [import] <defunct>
root 2278793 0.0 0.0 0 0 pts/5 Z+ 09:53 0:00 [import] <defunct>
root 2334461 0.0 0.0 0 0 pts/5 Z+ 11:06 0:00 [import] <defunct>
root 2359364 0.0 0.0 0 0 pts/5 Z+ 11:33 0:00 [import] <defunct>
root 2359432 0.0 0.0 0 0 pts/5 Z+ 11:33 0:00 [import] <defunct>
root 2362910 0.0 0.0 0 0 pts/5 Z+ 11:35 0:00 [import] <defunct>
root 2362975 0.0 0.0 0 0 pts/5 Z+ 11:36 0:00 [import] <defunct>
root 2363295 0.0 0.0 0 0 pts/5 Z+ 11:36 0:00 [import] <defunct>
root 2368181 0.0 0.0 0 0 pts/5 Z+ 11:40 0:00 [import] <defunct>
root 2368247 0.0 0.0 0 0 pts/5 Z+ 11:40 0:00 [import] <defunct>
root 2412542 0.0 0.0 0 0 pts/5 Z+ 12:29 0:00 [import] <defunct>
root 2414777 0.0 0.0 0 0 pts/5 Z+ 12:32 0:00 [import] <defunct>
root 2416469 0.0 0.0 0 0 pts/5 Z+ 12:33 0:00 [import] <defunct>
root 2423381 0.0 0.0 0 0 pts/5 Z+ 12:39 0:00 [import] <defunct>
root 2423479 0.0 0.0 0 0 pts/5 Z+ 12:39 0:00 [import] <defunct>
root 2423572 0.0 0.0 0 0 pts/5 Z+ 12:39 0:00 [import] <defunct>
root 2424010 0.0 0.0 0 0 pts/5 Z+ 12:39 0:00 [import] <defunct>
root 2451345 0.1 0.0 0 0 pts/5 Z+ 13:32 0:00 [ruby] <defunct>
So:
- I’d rather not merge changes that try, and fail, to fix the
import
problem: that would lead to misleading code and make things harder to reason about. - I would be happy to merge the change that fixes the
python3
problem, unless theimport
problem remaining makes you suspect that what I got today is merely anecdotal evidence, and thepython3
problem is still there potentially. - I don’t know what to think about the
ruby
process. It can’t say for sure that it’s a regression.
#8 Updated by anonym 2020-04-07 15:09:23
I’m wondering if I did something wrong in my verification when I concluded that “[t]his solution works for me”, because now I see the same result as you (python fixed, import still there, ruby as a possible regression).
I think the next step is to create a reliable reproducer, both to verify the hypothesis of why it happens, and to have something to test potential solutions against. Actually, making the test suite able to detect these zombies itself would be nice!
#9 Updated by CyrilBrulebois 2020-04-07 17:05:11
- Target version changed from Tails_4.5 to Tails_4.6
#10 Updated by intrigeri 2020-04-08 05:59:10
> Actually, making the test suite able to detect these zombies itself would be nice!
See Bug #17216 and in particular Bug #11295#note-38 :)
#11 Updated by intrigeri 2020-04-12 13:19:33
- Priority changed from Normal to Elevated
- Parent task deleted (
)Bug #15460
Unparenting so we can close Bug #15460.
Bumping priority as this is a possible regression introduced in 4.5.
#12 Updated by intrigeri 2020-04-12 13:19:59
- related to
Bug #15460: Test suite broken with Java 9+ so we need to replace Sikuli added
#13 Updated by anonym 2020-04-14 10:57:42
- % Done changed from 20 to 40
Woo, last three runs (since commit:b29c07b9952abde2a63ef5e9dd0af3987a55a073) on Jenkins looks good! The test suite cannot detect that it created any zombies. Just gonna refactor away some repeated code…
#14 Updated by anonym 2020-04-14 13:42:59
- % Done changed from 40 to 50
Done! So just one more run on Jenkins…
#15 Updated by anonym 2020-04-15 08:41:12
- Status changed from In Progress to Needs Validation
- Assignee deleted (
anonym)
anonym wrote:
> Done! So just one more run on Jenkins…
The two runs on commit:643c6ac21e1b8501b32301840cf3f4e5b5771e47 looks good! I’ve reverted the debugging stuff and pushed — please review and merge!
#16 Updated by intrigeri 2020-04-15 10:23:33
- Assignee set to intrigeri
#17 Updated by intrigeri 2020-04-15 10:30:40
- related to
Bug #14948: zombie processes when running the test suite added
#18 Updated by intrigeri 2020-04-15 10:31:31
- related to Bug #17216: Make the test suite clean up after itself even in most tricky failure modes added
#19 Updated by intrigeri 2020-04-15 10:45:48
A quick code review passes. Now building 4.5 so I can run the test suite locally against it, with instrumentation in place to spot defunct processes.
#20 Updated by intrigeri 2020-04-16 05:50:48
- Status changed from Needs Validation to Resolved
- % Done changed from 50 to 100
Applied in changeset commit:tails|cf9e208256acc3db931c51cd4df1066452c856c9.