Bug #8707

The automated test suite does not clean up when it's finished

Added by kytv 2015-01-16 10:44:09. Updated 2015-05-12 18:44:57.

Status:
Resolved
Priority:
Elevated
Assignee:
Category:
Test suite
Target version:
Start date:
2015-01-16
Due date:
% Done:

100%

Feature Branch:
bugfix/8707-properly-clean-up-xvfb
Type of work:
Code
Blueprint:

Starter:
0
Affected tool:
Deliverable for:

Description

When the test suite is run, a socket is created in /tmp/.X11-unix/X[0-9], but this socket is not cleaned up upon the feature’s completion. The leftover sockets accumulate, which prevents the test suite from starting on the 10th run. Stracing the process shows:

wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13072
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13073
stat("/tmp/.X10-lock", 0x7fffac0e62b0)  = -1 ENOENT (No such file or directory)
stat("/tmp/.X11-unix/X10", 0x7fffac0e62b0) = -1 ENOENT (No such file or directory)
stat("/usr/local/sbin/sleep", 0x7fffac0e6630) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sleep", 0x7fffac0e6630) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sleep", 0x7fffac0e6630) = -1 ENOENT (No such file or directory)
stat("/usr/bin/sleep", 0x7fffac0e6630)  = -1 ENOENT (No such file or directory)
stat("/sbin/sleep", 0x7fffac0e6630)     = -1 ENOENT (No such file or directory)
stat("/bin/sleep", {st_mode=S_IFREG|0755, st_size=31136, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13074
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 13073
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = 13073
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13074
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = 13074
stat("/tmp/.X10-lock", 0x7fffac0e62b0)  = -1 ENOENT (No such file or directory)
stat("/tmp/.X11-unix/X10", 0x7fffac0e62b0) = -1 ENOENT (No such file or directory)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13075
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13075
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = 13075
stat("/tmp/.X10-lock", 0x7fffac0e62b0)  = -1 ENOENT (No such file or directory)
stat("/tmp/.X11-unix/X10", 0x7fffac0e62b0) = -1 ENOENT (No such file or directory)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13076
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13076
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = 13076
stat("/tmp/.X10-lock", 0x7fffac0e62b0)  = -1 ENOENT (No such file or directory)
stat("/tmp/.X11-unix/X10", 0x7fffac0e62b0) = -1 ENOENT (No such file or directory)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13077
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13077
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = 13077
stat("/tmp/.X10-lock", 0x7fffac0e62b0)  = -1 ENOENT (No such file or directory)
stat("/tmp/.X11-unix/X10", 0x7fffac0e62b0) = -1 ENOENT (No such file or directory)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13078
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13078
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = 13078
stat("/tmp/.X10-lock", 0x7fffac0e62b0)  = -1 ENOENT (No such file or directory)
stat("/tmp/.X11-unix/X10", 0x7fffac0e62b0) = -1 ENOENT (No such file or directory)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcf606a9d0) = 13079
# ls -l /tmp/.X11-unix/
total 0
srwxrwxrwx 1 root root 0 Jan 15 17:56 X0
srwxrwxrwx 1 root root 0 Jan 15 20:30 X1
srwxrwxrwx 1 root root 0 Jan 15 20:49 X2
srwxrwxrwx 1 root root 0 Jan 15 21:04 X3
srwxrwxrwx 1 root root 0 Jan 15 21:26 X4
srwxrwxrwx 1 root root 0 Jan 15 23:43 X5
srwxrwxrwx 1 root root 0 Jan 16 00:36 X6
srwxrwxrwx 1 root root 0 Jan 16 02:30 X7
srwxrwxrwx 1 root root 0 Jan 16 08:33 X8
srwxrwxrwx 1 root root 0 Jan 16 10:02 X9

For now, I can work around this by clearing out /tmp/.X11-unix/ in my test VM.
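That workaround could be sketched roughly as follows. The helper name is hypothetical, and the socket directory is taken as a parameter so the sketch can be tried safely outside /tmp; on a real system it would be /tmp/.X11-unix, and removing entries there usually needs root:

```shell
#!/bin/sh
# Sketch of the manual workaround: clear stale X display sockets left
# behind by previous Xvfb runs. X0 is kept, since that is usually the
# machine's real display.
clear_stale_x_sockets() {
    dir="$1"
    for sock in "$dir"/X*; do
        [ -e "$sock" ] || continue
        [ "${sock##*/}" = "X0" ] && continue
        rm -f "$sock"
    done
}
```

Note this is a blunt instrument: it assumes no live X server other than display :0 is using those sockets, which holds in a dedicated test VM but not in general.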


Subtasks


Related issues

Related to Tails - Feature #8947: Use the headless Ruby gem in the automated test suite Rejected 2015-02-24
Related to Tails - Bug #9139: Test suite stalled, lots of virt-viewer defunct processes Resolved 2015-03-31

History

#1 Updated by intrigeri 2015-01-16 14:36:12

  • Status changed from New to Confirmed
  • Priority changed from Normal to Elevated
  • Target version set to Tails_1.4
  • Parent task set to Feature #8539

#2 Updated by intrigeri 2015-01-16 14:36:49

This kind of problem will be problematic once we run the test suite on lizard, hence I made it a child ticket of Feature #8539.

#3 Updated by intrigeri 2015-02-06 17:48:22

  • blocks #8538 added

#4 Updated by intrigeri 2015-02-23 10:36:08

anonym, kytv: this one needs an assignee since it’s a child ticket of Feature #8539.

#5 Updated by anonym 2015-02-23 12:00:35

I’ll take it. Interestingly, on my system I do not seem to have this limit of 10.

$ find /tmp/.X11-unix/* | wc -l
146


:S

#6 Updated by intrigeri 2015-02-23 13:04:55

  • Assignee set to anonym

#7 Updated by intrigeri 2015-02-23 13:06:53

> I’ll take it. Interestingly, on my system I do not seem to have this limit of 10.

Would be interesting to find out what you did to bump this limit. Possibly your DM is raising it?

#8 Updated by intrigeri 2015-02-24 15:17:50

  • related to Feature #8947: Use the headless Ruby gem in the automated test suite added

#9 Updated by intrigeri 2015-03-31 12:33:23

  • related to Bug #9139: Test suite stalled, lots of virt-viewer defunct processes added

#10 Updated by kytv 2015-04-12 21:37:45

anonym wrote:
> I’ll take it. Interestingly, on my system I do not seem to have this limit of 10.
> […]
> :S

I also do not have this limitation “on bare metal”, only in the nested VM set-up. I DID, however, run into this on bare metal at the sprint.

#11 Updated by anonym 2015-04-17 14:58:34

  • Status changed from Confirmed to In Progress

Applied in changeset commit:9d1ac96689bef63c6c2e6f29c39862b361ddb43d.

#12 Updated by anonym 2015-04-17 15:01:33

  • Assignee changed from anonym to kytv
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/8707-properly-clean-up-xvfb

intrigeri wrote:
> > I’ll take it. Interestingly, on my system I do not seem to have this limit of 10.
>
> Would be interesting to find out what you did to bump this limit. Possibly your DM is raising it?

I use kdm, for the record. In my Jessie VM, which doesn’t have any DM, I do have the 10 limit, though. I couldn’t come up with any good search terms for this, so I’m giving up. Proper clean-up is the real solution, anyway.

So, to ensure that the Xvfb process is stopped and cleaned up, we issue a trap

    trap "kill -0 ${XVFB_PID} 2>/dev/null && kill -9 ${XVFB_PID}; \
          rm -f /tmp/.X${TARGET_DISPLAY#:}-lock" EXIT


Whoops, kill -9 is generally a bad idea, and in this instance it kills Xvfb without letting it clean up its locks itself. The manual cleanup we do with rm afterwards only removes /tmp/.X${number}-lock, but a socket is also created in /tmp/.X11-unix/X${number}, which makes us step through display numbers quickly. I think the proper way to do this is simply:

    trap "kill -0 ${XVFB_PID} 2>/dev/null && kill ${XVFB_PID}" EXIT


as implemented in the feature branch. Please review and merge!
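For context, the whole start-up/clean-up pattern might look like the sketch below. The Xvfb invocation flags and the variable names follow the snippets in this comment, not necessarily the actual Tails wrapper script:

```shell
#!/bin/sh
# Sketch: start Xvfb on a fixed display and guarantee it is stopped on exit.
TARGET_DISPLAY=:10

Xvfb "$TARGET_DISPLAY" -screen 0 1024x768x24 >/dev/null 2>&1 &
XVFB_PID=$!

# On exit, send the default SIGTERM (not SIGKILL) if Xvfb is still alive,
# so the server gets the chance to remove both /tmp/.X10-lock and
# /tmp/.X11-unix/X10 itself.
trap 'kill -0 "$XVFB_PID" 2>/dev/null && kill "$XVFB_PID"' EXIT

# ... run the tests against $TARGET_DISPLAY here ...
```

The key point is letting Xvfb receive a catchable signal: on SIGTERM it unlinks its own lock file and socket, which is exactly the clean-up that kill -9 was preventing.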

#13 Updated by kytv 2015-04-17 17:10:30

  • Assignee changed from kytv to intrigeri

The changes handle this “the right way”, and as a result the clean-up takes place both when the suite terminates normally and when it is interrupted with CTRL+C. Perfect.

#14 Updated by intrigeri 2015-04-17 19:12:31

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

Applied in changeset commit:df3537e10060a717a88eacb52ea4192afec1f915.

#15 Updated by intrigeri 2015-04-17 19:14:21

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

Merged into stable, devel, etc. Thanks a lot for fixing that painful bug!

#16 Updated by BitingBird 2015-05-12 18:44:57

  • Status changed from Fix committed to Resolved