Bug #9705
Update the memory erasure automated tests for Jessie
100%
Description
I see two of its scenarios fail due to insufficient pattern coverage after filling RAM with a known pattern. We should first check that we correctly measure this (e.g. I’ve fixed some of it with commit:505297c already, and there may be more to update). If we don’t, then fix it. If we do, then assess whether it’s expected that fewer patterns are found on Jessie.
Subtasks
Related issues
Related to Tails - Bug #10487: Improve VM and OOM settings for erasing memory | Resolved | 2015-11-05
Blocked by Tails - Bug #8317: fillram isn't effective for filling the memory in Jessie | Resolved | 2014-11-26
History
#1 Updated by intrigeri 2015-07-08 04:07:29
- Parent task set to Feature #7563
#2 Updated by intrigeri 2015-07-08 04:09:57
- Tracker changed from Feature to Bug
#3 Updated by intrigeri 2015-08-02 09:06:17
- Due date set to 2016-01-15
#4 Updated by intrigeri 2015-08-02 09:06:35
- blocks #8668 added
#5 Updated by intrigeri 2015-08-26 05:55:44
- Deliverable for set to 268
#6 Updated by anonym 2015-10-20 09:07:59
- Status changed from Confirmed to In Progress
Applied in changeset commit:e901c7f92f605d38de3deea8b126ae7721d541f1.
#7 Updated by anonym 2015-10-20 09:08:38
- Status changed from In Progress to Confirmed
- Assignee deleted (anonym)
- % Done changed from 0 to 50
- QA Check set to Ready for QA
- Feature Branch set to feature/jessie
intrigeri wrote:
> I see two of its scenarios fail due to insufficient pattern coverage after filling RAM with a known pattern.
For context, I get:
Scenario: Anti-test: no memory erasure on an old computer
[...]
Then I find many patterns in the guest's memory
Pattern coverage: 59.701% (2118 MiB)
59.701% of the memory is filled with the pattern, but more than 70.000% was expected.
and
Scenario: Memory erasure on an old computer
[...]
When I fill the guest's memory with a known pattern
Pattern coverage: 60.590% (2149 MiB)
60.590% of the memory is filled with the pattern, but more than 65.593% was expected.
My guess is that we get this simply because GNOME etc. in Jessie require more RAM than in Wheezy, and our calculations fail to take this into account, despite attempting to use ratios.
> We should first check that we correctly measure this (e.g. I’ve fixed some of it with commit:505297c already, and there may be more to update).
Indeed, what we do looks crazy…
> If we don’t, then fix it. If we do, then assess whether it’s expected that fewer patterns are found on Jessie.
I think it is a combination of both. I’ve pushed commit:e901c7f, which I believe makes a lot more sense than what we did previously, on multiple fronts. I hope the added comments and commit message speak for themselves.
That commit probably would make sense in our wheezy-based branches too, but there these tests work adequately so perhaps we can live with it. Otherwise I guess we can cherry-pick this commit. Thoughts?
#8 Updated by anonym 2015-10-20 09:09:00
- Status changed from Confirmed to In Progress
- Assignee set to intrigeri
#9 Updated by intrigeri 2015-11-02 02:07:08
- Assignee changed from intrigeri to anonym
- % Done changed from 50 to 70
- QA Check changed from Ready for QA to Dev Needed
anonym wrote:
> I think it is a combination of both. I’ve pushed commit:e901c7f, which I believe makes a lot more sense than what we did previously, on multiple fronts. I hope the added comments and commit message speak for themselves.
Excellent work! I’m going to nitpick on a lot of things (because that’s a complex part of the test suite, and it’d better be extra clear) but I like the general idea a lot. I’ve run this test suite feature once, and seen it pass.
The reasoning for setting oom_kill_allocating_task to 0 should apply to local-includes/usr/share/initramfs-tools/scripts/init-premount/sdmem (where we set it to 1) as well, no? Same question for setting /proc/sys/vm/overcommit_memory to 0 (we set it to 1 in the initramfs hook).
In passing, oom_score_adj is inherited:
# echo $$
29977
# echo -1000 > /proc/$$/oom_score_adj
# cat /proc/$$/oom_score_adj
-1000
# bash
# echo $$
30075
# cat /proc/$$/oom_score_adj
-1000
… so there’s a race between the time we start the fillram processes (they inherit -1000 from the remote shell) and the time we set their oom_score_adj to 1000, during which the kernel will prefer killing anything but the fillram processes, no? Assuming there’s no way to “exec that command with that oom_score_adj”, perhaps fillram should wait until its oom_score_adj is 1000 before it starts actually filling memory?
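A minimal sketch of that wait-until-promoted idea (an assumed design, not actual test-suite code): the fillram side spins until its own oom_score_adj reads 1000 before it starts allocating. To keep the demo self-contained, it raises the value itself, which unprivileged processes are allowed to do (raising, never lowering):

```shell
#!/bin/sh
# Hypothetical sketch: wait until our oom_score_adj has been set to
# 1000 (normally done by the controlling side) before filling memory.
target=1000

# Demo only: bump our own score. Unprivileged processes may raise
# (but not lower) their own oom_score_adj.
echo "$target" > /proc/$$/oom_score_adj

# The wait loop fillram could run before allocating anything:
while [ "$(cat /proc/$$/oom_score_adj)" -ne "$target" ]; do
    sleep 0.1
done
echo "oom_score_adj=$(cat /proc/$$/oom_score_adj), safe to start filling"
```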
Code style: your love of state (variables ;) shows again with kernel_mem_settings. Generally I would look the other way, but I particularly dislike that one, not only because I really don’t see the point (I’m used to dealing with it, and this time it’s debatable, kind of), but also because the variable name doesn’t correctly express what it is actually storing (a list of command lines). So if you feel the need for a variable, keep it, but please rename it :)
Did you try lowering the amount of memory reserved for the kernel (64 MB) and for admin tasks (128 MB)? I’m just wondering if perhaps you set arbitrarily high values when you started working on this, and then forgot to come back to them later.
The commit message reads:
Now we do what we intended more explicitly, that is: measure the
pattern coverage in the free RAM at the time we start filling it.
… while the commit does:
- coverage = patterns_b.to_f/convert_to_bytes(@detected_ram_m.to_f, 'MiB')
- puts "Pattern coverage: #{"%.3f" % (coverage*100)}% (#{patterns_m} MiB)"
+ coverage = patterns_b.to_f/@free_mem_after_fill_b
+ puts "Pattern coverage: #{"%.3f" % (coverage*100)}% (#{patterns_m} MiB " +
+ "out of #{free_mem_after_fill_m} MiB initial free memory)"
At first glance it felt like they’re totally contradicting each other (and the new displayed message doesn’t reflect reality). A closer look at the code showed me that they are actually not contradicting each other, but rather free_mem_after_fill_m is very badly named. Or did I get it wrong?
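To make the difference between the two formulas concrete, here is an illustrative comparison with made-up numbers (all values below are assumptions for the sake of the example, not measurements from these test runs; the variable names follow the diff above):

```ruby
# Old formula: coverage measured against the whole detected RAM.
# New formula: coverage measured against the RAM that was free when
# we started filling (stored in the confusingly-named
# free_mem_after_fill_b). All numbers here are invented.
patterns_b            = 2118 * 2**20  # bytes of pattern found
detected_ram_b        = 3548 * 2**20  # total detected RAM (assumed)
free_mem_after_fill_b = 2500 * 2**20  # free RAM at fill start (assumed)

old_coverage = patterns_b.to_f / detected_ram_b
new_coverage = patterns_b.to_f / free_mem_after_fill_b

puts "old: %.3f%%, new: %.3f%%" % [old_coverage * 100, new_coverage * 100]
```

With the same amount of pattern found, the new denominator yields a higher, more meaningful percentage, since memory the system was already using could never have held the pattern.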
Shall we change the “Memory fill progress” string? It’s the percentage of used RAM, rather than the percentage of filling we plan to do, right?
> That commit probably would make sense in our wheezy-based branches too, but there these tests work adequately so perhaps we can live with it.
Let’s focus on Tails 2.0 and not waste time backporting stuff without a good reason.
#10 Updated by anonym 2015-11-02 08:02:40
- Assignee changed from anonym to intrigeri
- QA Check changed from Dev Needed to Ready for QA
intrigeri wrote:
> anonym wrote:
> > I think it is a combination of both. I’ve pushed commit:e901c7f, which I believe makes a lot more sense than what we did previously, on multiple fronts. I hope the added comments and commit message speak for themselves.
>
> Excellent work! I’m going to nitpick on a lot of things (because that’s a complex part of the test suite, and it’d better be extra clear) but I like the general idea a lot. I’ve run this test suite feature once, and seen it pass.
Thanks!
> The reasoning for setting oom_kill_allocating_task to 0 should apply to local-includes/usr/share/initramfs-tools/scripts/init-premount/sdmem (where we set it to 1) as well, no? Same question for setting /proc/sys/vm/overcommit_memory to 0 (we set it to 1 in the initramfs hook).
Sure, but note that all the kernel settings we do there are only for the memory filling part, to prevent the VM from freezing while doing it, essentially. All settings we do here are lost when we kexec into the memwipe kernel, after all. What you ask for are unrelated bugfixes in Tails.
> In passing, oom_score_adj is inherited:
>
> […]
>
> … so there’s a race between the time we start the fillram processes (they inherit -1000 from the remote shell) and the time we set their oom_score_adj to 1000, during which the kernel will prefer killing anything but the fillram processes, no? Assuming there’s no way to “exec that command with that oom_score_adj”, perhaps fillram should wait until its oom_score_adj is 1000 before it starts actually filling memory?
Indeed. While yours is a clever solution, I think the way I’ve now implemented this in commit:be2d54f is clearer and more local.
> Code style: your love of state (variables ;) shows again with kernel_mem_settings.
The state is irrelevant. It’s just an attempt to write easier-to-read code. I greatly prefer introducing variables with good names over a comment explaining some magic value or how some long chain of function calls transforms into something usable. It’s also a good way to split code up into easier-to-read blocks, which makes intermediary steps easier to follow and understand. Feel free to present arguments against this practice, because I intend to keep doing it unless otherwise convinced. :)
> Generally I would look the other way, but I particularly dislike that one, not only because I really don’t see the point (I’m used to dealing with it, and this time it’s debatable, kind of), but also because the variable name doesn’t correctly express what it is actually storing (a list of command lines). So if you feel the need for a variable, keep it, but please rename it :)
I agree that the name was poor in the state you reviewed. Background: I introduced that variable while experimenting; I originally had a set of default options that I altered a bit when the 64-bit kernel was used, hence the need for the variable to avoid duplication. It turned out that the kernel bitness didn’t matter in the end. Furthermore, I had switched to key-value pairs for sysctl, and then the name made a lot more sense. The situation got a bit confusing after I squashed the history when my experiments were done, apparently. I’ve fixed this in commit:e987ef9.
Just to explain a bit why I think introducing explanatory variables is a good thing, let’s look at how this would have worked without that variable:
[
  [key0, val0],
  ...
  [keyn, valn],
].each { |key, val| $vm.execute_successfully("sysctl #{key}=#{val}") }
Since I read from top to bottom, when I encounter this list I don’t know what it is. I have to jump to the end, where I see what we do with it (sysctl key-value pairs), and then deduce what the list is from what happens in the body (and possibly from the names of the temporary loop variables). A comment could fix this, but context-switching from Ruby to English and then back to Ruby just feels unnecessary, and it’s not very DRY.
Furthermore, Ruby with its idiomatic list.each do |element| ... convention for loops makes this extra apparent, since in most other languages we’d have for element in list, which would help to explain this at the top. In fact, in this case we would then probably be more explanatory, like for memory_setting, value in ... or something, which makes it clear what the list that follows will be about. But what actually happened here? Yeah, we were forced to introduce variables for the for-loop, and of course we picked good names. That made any comment about what the list is unnecessary.
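For contrast, here is a sketch of the variant with an explanatory variable. The sysctl keys are the ones discussed in this thread, but the values are illustrative, and the real code would run the commands through the remote shell (e.g. via $vm.execute_successfully); this self-contained demo just prints them:

```ruby
# Explanatory-variable style: the name documents what the list is,
# so no comment is needed at the loop. Values are illustrative.
sysctl_settings_for_mem_fill = [
  ['vm.oom_kill_allocating_task', 0],
  ['vm.overcommit_memory',        0],
]

sysctl_settings_for_mem_fill.each do |key, val|
  # Real code would be: $vm.execute_successfully("sysctl #{key}=#{val}")
  puts "sysctl #{key}=#{val}"
end
```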
Another example from the same .rb file:
instances = (@detected_ram_m.to_f/(2**10)).ceil
instances.times do [...]
IMHO this is a lot easier to understand than
(@detected_ram_m.to_f/(2**10)).ceil.times do [...]
although nr_instances admittedly is a better name (commit:44545b2). If we try to save the latter approach with a comment like “Run one instance per GiB of RAM” just to make it clear what the code does, then we break the DRY principle, and we also have a clear indication that the code is not very readable. One shouldn’t have to add a comment stating what some code does, only why it’s done, like the existing comment for the above (which incidentally contains that sentence, but only as part of the explanation why).
Perhaps I have just misunderstood you completely, because I find your criticism of my variable usage quite strange. (?)
> Did you try lowering the amount of memory reserved for the kernel (64MB) and for admin tasks (128MB)? I’m just wondering if perhaps you set arbitrarily high values when starting working on it, and then forgot to come back to it later.
Yes, these values were picked after hours of testing different combinations. Sorry, I thought I mentioned this in the commit message.
> At first glance it felt like they’re totally contradicting each other (and the new displayed message doesn’t reflect reality). A closer look at the code showed me that they are actually not contradicting each other, but rather free_mem_after_fill_m is very badly named. Or did I get it wrong?
Wow, mind fart! Fixed in commit:082ea26.
> Shall we change the “Memory fill progress” string? It’s the percentage of used RAM, rather than the percentage of filling we plan to do, right?
Well, the goal of the memory fill, and indeed what we aim at, is that all RAM should be filled, right?
> > That commit probably would make sense in our wheezy-based branches too, but there these tests work adequately so perhaps we can live with it.
>
> Let’s focus on Tails 2.0 and not waste time backporting stuff without a good reason.
Exactly.
#11 Updated by intrigeri 2015-11-05 02:34:33
- blocked by Bug #8317: fillram isn't effective for filling the memory in Jessie added
#12 Updated by intrigeri 2015-11-05 03:29:57
- related to Bug #10487: Improve VM and OOM settings for erasing memory added
#13 Updated by intrigeri 2015-11-05 03:47:46
- Status changed from In Progress to Resolved
- QA Check changed from Ready for QA to Pass
(Ouch, it seems I managed to annoy/stress you quite a bit on this one. Sorry.)
>> The reasoning for setting oom_kill_allocating_task to 0 should apply to local-includes/usr/share/initramfs-tools/scripts/init-premount/sdmem (where we set it to 1) as well, no? Same question for setting /proc/sys/vm/overcommit_memory to 0 (we set it to 1 in the initramfs hook).
> Sure
Thanks for confirming, this allowed me to file Bug #10487.
> but note that all the kernel settings we do there are only for the memory filling part, to prevent the VM from freezing while doing it, essentially. All settings we do here are lost when we kexec into the memwipe kernel, after all.
Sure, I got this part :)
> What you ask for are unrelated bugfixes in Tails.
To be clear, I did not mean to suggest that you should fix these bugs yourself/now: I merely wanted to know if indeed they were bugs in your opinion. Sorry for being unclear and raising your stress levels needlessly :/
> Indeed. While yours is a clever solution, I think the way I’ve now implemented this in commit:be2d54f is clearer and more local.
Excellent, I like it!
>> Code style: your love of state (variables ;) shows again with kernel_mem_settings.
> The state is irrelevant. It’s just an attempt to write easier-to-read code. I greatly prefer introducing variables with good names over a comment explaining some magic value or how some long chain of function calls transforms into something usable. It’s also a good way to split code up into easier-to-read blocks, which makes intermediary steps easier to follow and understand. Feel free to present arguments against this practice, because I intend to keep doing it unless otherwise convinced. :)
ACK wrt. naming things. But that’s a matter of style, and I shouldn’t have started this debate here (let’s keep it for face-to-face pair programming or similar), and should have instead stuck to the narrower buggy variable name issue.
> The situation got a bit confusing after I squashed the history when my experiments were done, apparently. I’ve fixed this in commit:e987ef9.
Excellent, thanks :)
> Just to explain a bit why I think introducing explanatory variables is a good thing, […]
Thanks for the explanation. I agree all these problems need solutions; I just happen to disagree about the single-use-variable solution, but again that’s a matter of taste. FWIW, for such a use case (that is, a variable that’s used once), replacing it with a function would address my concerns about state / procedural code, and (I think) solve the same problems you’re introducing variables for. Now, I’m not saying that my own code is as nice (to my own taste) as I’m suggesting yours could/should be, and I’m definitely not authoritative in this area.
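To make that suggestion concrete, here is a hedged sketch of what the function-based alternative could look like, using the one-instance-per-GiB computation quoted earlier in this thread (the function name is invented for illustration; only the arithmetic comes from the code under review):

```ruby
# Hypothetical function replacing the single-use `instances`/`nr_instances`
# variable: the name carries the explanation, and no state outlives the call.
def nr_fillram_instances(detected_ram_m)
  # One fillram instance per GiB of detected RAM, rounded up.
  (detected_ram_m.to_f / 2**10).ceil
end

puts nr_fillram_instances(2048)  # 2048 MiB -> 2 instances
puts nr_fillram_instances(3548)  # 3548 MiB -> 4 instances
```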
>> Shall we change the “Memory fill progress” string? It’s the percentage of used RAM, rather than the percentage of filling we plan to do, right?
> Well, the goal of the memory fill, and indeed what we aim at, is that all RAM should be filled, right?
Fair enough.
#14 Updated by intrigeri 2015-11-05 03:48:34
- Assignee deleted (intrigeri)
- % Done changed from 70 to 100