Feature #10503
Run erase_memory.feature first to optimize test suite performance
% Done: 100%
Description
According to Feature #9401#note-4, the test suite’s peak disk usage in /tmp/TailsToaster should be around 15 GB these days. This matters because we mount an ext4 filesystem on that directory and rely on the Linux kernel’s caching to write as little of that data to disk as possible. At the moment of peak usage, the system under test is allocated 8 GB of RAM, so to avoid any writes to disk we would need at the very least 15 + 8 = 23 GB of RAM on each isotesterN. We “only” give them 20 GB each, so inevitably a few GB have to be written to disk at some point while running erase_memory.feature (and possibly re-read from disk into cache soon after). This adds I/O load to lizard that we could happily do without (we run a bunch of other services there), and on less powerful systems reading and writing that many GB can have a significant impact on the test suite’s runtime.
According to Feature #9401#note-4 again, peak disk usage could be lowered to 8 GB simply by running erase_memory.feature first, which would magically fix all the aforementioned problems. It seems to be such low-hanging fruit with such obvious advantages that, exceptionally, perhaps we can allow ourselves some premature optimization and just do it. It should be enough to verify that this change indeed has the intended effect on peak disk usage and doesn’t make anything worse (the test suite’s run time should be a good enough metric).
Moreover, for Feature #9264 (deadline: end of 2015) we may want to run the test suite with absolutely everything in RAM, so decreasing the maximum RAM usage for such use cases by 30% could impact the hardware specs quite a bit.
Related issues
Related to Tails - Feature #9264: Consider buying more server hardware to run our automated test suite | Resolved | 2015-12-15
Related to Tails - Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins | Resolved | 2016-07-21
History
#1 Updated by intrigeri 2015-11-06 11:40:20
- blocks #8668 added
#2 Updated by anonym 2015-11-06 12:16:38
- Target version set to Tails_1.8
#3 Updated by intrigeri 2015-12-05 05:44:01
- Related to Feature #9264: Consider buying more server hardware to run our automated test suite added
#4 Updated by anonym 2015-12-16 14:05:58
- Target version changed from Tails_1.8 to Tails_2.0
#5 Updated by intrigeri 2016-01-06 13:57:22
- Target version changed from Tails_2.0 to Tails_2.2
#6 Updated by intrigeri 2016-02-05 14:51:26
FYI we’re in the process of updating/refining the hardware config used to run the test suite via Jenkins, so it would be helpful to reach a conclusion here.
#7 Updated by anonym 2016-02-05 16:17:49
- Status changed from Confirmed to In Progress
Applied in changeset commit:e6aefe9200b585cd10c81676c2b850331a69f2ae.
#8 Updated by anonym 2016-02-05 16:20:22
- Assignee changed from anonym to intrigeri
- % Done changed from 0 to 50
- QA Check set to Ready for QA
- Feature Branch set to test/10503-reorder-feature-execution
I believe the current situation yields the highest possible peak disk usage. The feature branch will, I believe, bring it to the smallest possible peak disk usage. :)
#9 Updated by anonym 2016-02-05 18:28:19
I ran a custom feature that runs all the non-temporary snapshot restoring steps, and it came in at around 7.5 GiB. So, when only considering the obvious offenders of disk space usage (excluding logs, videos, screenshots and such optional artifacts) the peak disk usage is:
max($biggest_ram_dump, $biggest_non_snapshot_disk, $internal_snapshot_disk)
= max(8 GiB, $tails_iso_size, 7.5 GiB)
= 8 GiB.
(Note: for $biggest_non_snapshot_disk = $tails_iso_size, see the “Tails can boot from live systems stored on hard drives” scenario in untrusted_partitions.feature.)
Earlier I think it must have been
8 GiB + $internal_snapshot_disk = 8 + 7.5 GiB = 15.5 GiB,
so this is quite a nice improvement. :)
#10 Updated by anonym 2016-02-05 18:46:19
Ah, actually, usb_install.feature will both need all the heavy snapshots while also creating some big disks, so the above reasoning is probably wrong.
#11 Updated by intrigeri 2016-02-05 19:25:39
> Ah, actually, usb_install.feature will both need all the heavy snapshots while also creating some big disks, so the above reasoning is probably wrong.
The reasoning in the ticket description, or the latest one you posted?
#12 Updated by anonym 2016-02-05 19:36:42
anonym wrote:
> Ah, actually, usb_install.feature will both need all the heavy snapshots while also creating some big disks, so the above reasoning is probably wrong.
Running usb_install.feature while all other snapshots exist brings us to 8.6 GiB. Pushed commit:f35824c to deal with this.
#13 Updated by anonym 2016-02-05 19:39:18
intrigeri wrote:
> > Ah, actually, usb_install.feature will both need all the heavy snapshots while also creating some big disks, so the above reasoning is probably wrong.
>
> The reasoning in the ticket description, or the latest one you posted?
The latest one. So much has changed regarding snapshots that the old reasoning isn’t completely valid any more, so I’m mostly disregarding it.
#14 Updated by intrigeri 2016-02-09 18:46:56
- Assignee changed from intrigeri to anonym
- QA Check changed from Ready for QA to Info Needed
Code review passes, but I wonder why we need the if not intersection.empty? conditional: the 3 statements it guards would be no-ops anyway if the condition were false, no?
I’ll now run some tests. Do you already have some measurements for the peak disk usage during a complete run? (Don’t bother if you don’t have this info handy.)
#15 Updated by intrigeri 2016-02-11 00:46:09
I see peak disk usage is 12553240 here.
#16 Updated by intrigeri 2016-02-11 00:56:31
intrigeri wrote:
> I see peak disk usage is 12553240 here.
On Bug #10396#note-10 we measured that it was “almost 12G” on lizard early in December. I expected that the reordering would lower this number. Maybe the @fragile tagging on lizard explains why we don’t save much (because I’m comparing a full run today, against a partial run on lizard 2 months ago).
Anyway: regardless of peak disk space usage, just running memory erasure first should decrease the maximum of (RAM allocated to the system under test + temporary disk space we would like to keep in the memory cache), which is what this ticket is actually about IIRC. We haven’t measured that yet IIRC, but if I’m happy with this branch and merge it, I’ll measure how much RAM an isotester needs these days to run the test suite without hitting disk. I’ll need this for Feature #11011 anyway.
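For what it’s worth, a minimal sketch of one way to watch for disk pressure during such a run (this assumes only the standard /proc/meminfo counters and is not the actual tooling used on the isotesters):

while true; do
    # Log a timestamp, the RAM still available, and the amount of dirty data;
    # a non-zero Writeback value means the kernel is actively flushing to disk.
    printf '%s ' "$(date +%s)" >> /tmp/meminfo-monitor
    grep -E '^(MemAvailable|Dirty|Writeback):' /proc/meminfo | \
        tr '\n' ' ' >> /tmp/meminfo-monitor
    echo >> /tmp/meminfo-monitor
    sleep 5
done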
#17 Updated by intrigeri 2016-02-11 01:24:03
- % Done changed from 50 to 90
- QA Check changed from Info Needed to Pass
Tests pass. Feel free to merge with or without the minor style change I suggested above.
#18 Updated by intrigeri 2016-02-11 01:24:21
- Subject changed from Consider running erase_memory.feature first to optimize test suite performance to Run erase_memory.feature first to optimize test suite performance
- Type of work changed from Discuss to Code
#19 Updated by anonym 2016-02-11 21:06:05
- Status changed from In Progress to Fix committed
- % Done changed from 90 to 100
Applied in changeset commit:c701b57d503318168bea5a0bdc62ddd848a6d0df.
#20 Updated by anonym 2016-02-11 21:07:01
- Assignee changed from anonym to intrigeri
- QA Check changed from Pass to Info Needed
intrigeri wrote:
> Code review passes, but I wonder why we need the if not intersection.empty? conditional: the 3 statements it guards would be no-ops anyway if the condition were false, no?
That is what I initially did, but I found it a bit clearer and less “clever” (in the bad sense) to keep the guard. I think I’ll just leave it as-is.
> I’ll now run some tests. Do you already have some measurements for the peak disk usage during a complete run? (Don’t bother if you don’t have this info handy.)
Yeah, I did a full run whose results I apparently forgot to post here, sorry. I ran this:
while true; do
    stat -c%s "${TMP}"/*.memstate "${TMP}"/TailsToasterStorage/* | \
        sum-column >> /tmp/sizes-monitor
    sleep 5
done
where sum-column is just a simple script that does what you’d expect.
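For illustration, a minimal sum-column along those lines could be as small as the following (a sketch of plausible behaviour, not the actual script):

#!/bin/sh
# Sum the byte counts read on stdin (one per line, as printed by stat -c%s)
# and print the total in MiB.
awk '{ total += $1 } END { printf "%.1f MiB\n", total / 1024 / 1024 }'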
So I only monitored the storage and snapshots and I got a peak of 9104.2 MiB, but the other artifacts shouldn’t add up to much (<20 MiB at any point, typically).
> I see peak disk usage is 12553240 here.
I’m surprised you see over 3 GiB more than me. Did you run with --keep-snapshots or something? Then the temporary snapshots will be kept, and there are five of them, which I believe would add up to that number. Since we don’t run the automated test suite with --keep-snapshots on Jenkins, I don’t think your test runs should use it either, if you did.
> Anyway: regardless of peak disk space usage, just running memory erasure first should decrease the maximum of (RAM allocated to the system under test + temporary disk space we would like to keep in the memory cache), which is what this ticket is actually about IIRC. We haven’t measured that yet IIRC, but if I’m happy with this branch and merge it, I’ll measure how much RAM an isotester needs these days to run the test suite without hitting disk. I’ll need this for Feature #11011 anyway.
ACK. I merged it into devel so you’re not blocked any more. I’m leaving the ticket as Info Needed, though, until you’ve answered my question above (I’d like to open another ticket to investigate this if you did not run with --keep-snapshots). Please close this ticket properly once you’ve answered!
#21 Updated by intrigeri 2016-02-13 01:37:40
> That is what I initially did, but I found it a bit clearer and less “clever” (in the bad sense) to keep the guard. I think I’ll just leave it as-is.
OK.
> Yeah, I did a full run which I apparently forgot to post here, sorry. I ran this:
> […]
> So I only monitored the storage and snapshots and I got a peak of 9104.2 MiB, but the other artifacts shouldn’t add up to much (<20 MiB at any point, typically).
>> I see peak disk usage is 12553240 here.
> I’m surprised you see over 3 GiB more than me.
So am I :)
> Did you run with --keep-snapshots or something?
I didn’t. But I had a test.feature lying around, not sure what was in there, nor if it matters.
I’ll re-run in a clean environment (because again, I need these results for Feature #11011).
> Please close this ticket properly once you’ve answered!
It’s Fix committed and will stay that way until 2.2 is out, so I think I should not close it.
Anyway, I’ll follow up about this on Feature #11011 or Feature #11113.
#22 Updated by intrigeri 2016-02-14 13:29:35
- Assignee deleted (intrigeri)
- QA Check changed from Info Needed to Pass
#23 Updated by anonym 2016-03-08 19:03:16
- Status changed from Fix committed to Resolved
#24 Updated by intrigeri 2016-07-21 02:57:11
- Related to Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins added