Feature #9264

Consider buying more server hardware to run our automated test suite

Added by intrigeri 2015-04-19 16:09:15 . Updated 2016-01-27 17:38:42 .

Status:
Resolved
Priority:
High
Assignee:
Category:
Infrastructure
Target version:
Start date:
2015-12-15
Due date:
% Done:

100%

Feature Branch:
Type of work:
Research
Starter:
Affected tool:
Deliverable for:

Description

Once:

  • we have actual data wrt. how lizard v2 copes with running our automated test suite
  • some test suite optimizations are done, that will speed it up
  • some more tests are written, that slow down the test suite

… then we’ll have a better idea whether lizard v2 can cope with its duties.

It’ll then be time to consider buying more server hardware (e.g. a dedicated one to run the test suite with only one level of virtualization) for faster feedback to developers from the test suite runs.


Subtasks

Feature #10764: Check if we can host another machine at SeaCCP Resolved 100%
Feature #10971: Try giving 6 vcpus to each isotester on lizard Resolved 100%
Feature #10996: Try running more isotester:s on lizard Resolved 100%


Related issues

Related to Tails - Bug #10396: Sort out overallocated storage situation on isotesterN.lizard Resolved 2015-10-20
Related to Tails - Feature #7631: Get a server able to run our automated test suite Resolved 2015-01-01
Related to Tails - Feature #10503: Run erase_memory.feature first to optimize test suite performance Resolved 2015-11-06
Related to Tails - Bug #9157: ISO testers (level-1) VMs crash when running the test suite with Jessie's kernel Resolved 2015-04-04
Related to Tails - Feature #10851: Give lizard enough free storage to host our freezable APT repository Resolved 2016-01-04
Related to Tails - Bug #10999: Parallelize our ISO building workload on more builders Resolved 2016-01-26
Related to Tails - Feature #11009: Improve ISO building and testing throughput and latency Resolved 2016-01-26

History

#1 Updated by intrigeri 2015-04-19 16:09:46

  • related to Feature #7631: Get a server able to run our automated test suite added

#2 Updated by intrigeri 2015-04-19 16:10:25

  • blocks #8538 added

#4 Updated by intrigeri 2015-06-11 14:00:08

  • related to deleted (Feature #7631: Get a server able to run our automated test suite)

#5 Updated by intrigeri 2015-06-11 14:00:13

  • blocked by Feature #5288: Run the test suite automatically on autobuilt ISOs added

#6 Updated by intrigeri 2015-09-02 10:04:54

  • Target version changed from Tails_1.6 to Tails_1.7

Late October should be early enough to make a decision about purchasing that hardware by the end of the year, and by then we’ll have actual data regarding how things go on lizard with the new VM snapshot feature.

#7 Updated by intrigeri 2015-10-05 13:35:13

  • Priority changed from Normal to Elevated

#8 Updated by intrigeri 2015-10-20 11:10:40

  • related to Bug #10396: Sort out overallocated storage situation on isotesterN.lizard added

#10 Updated by intrigeri 2015-10-21 00:52:07

  • Assignee changed from intrigeri to bertagaz
  • QA Check set to Info Needed

#12 Updated by intrigeri 2015-11-02 04:20:38

  • Assignee changed from bertagaz to intrigeri
  • Target version changed from Tails_1.7 to Tails_1.8
  • QA Check deleted (Info Needed)

#14 Updated by intrigeri 2015-11-06 02:44:09

  • Assignee changed from intrigeri to bertagaz
  • QA Check set to Info Needed

bertagaz, can we easily get usage and performance data about the test suite in Jenkins? What would be useful for making sensible decisions here is data, over some period of time, about:

  • number of runs; the ones that were aborted fast (e.g. the ones with the Tao commit) should be counted separately: on the one hand these are runs we would have liked to do, OTOH they didn’t impact resource usage
  • waiting time in queue: mean and median would be a good start, but once we have the raw data we could also do finer statistics such as “X% of the test suite runs didn’t have to wait more than 30 minutes, Y% had to wait between 30 and 90 minutes, etc.”
  • isotesterN usage, that is percentage of time during which they’re busy/idling

… and then we’ll see if we have a problem to solve, and what exactly it is.
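
For instance, assuming the raw queue waiting times can be exported somewhere (one value in minutes per line; the file name and thresholds below are only an illustration), the kind of breakdown I have in mind could be computed with something like:

    # wait-times.txt: one queue waiting time in minutes per line (hypothetical export)
    awk '{ total++
           if ($1 <= 30)      fast++
           else if ($1 <= 90) medium++
           else               slow++ }
         END { printf "<=30 min: %.0f%%   30-90 min: %.0f%%   >90 min: %.0f%%\n",
                      100*fast/total, 100*medium/total, 100*slow/total }' wait-times.txt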

I’ve looked at what we have already in the web interface. https://jenkins.tails.boum.org/load-statistics?type=hour is about all slaves, aggregated together, and doesn’t seem to give stats about isotesterN only. It looks like https://jenkins.tails.boum.org/plugin/global-build-stats/ could perhaps help answer some of these questions.

Perhaps we are already storing some raw data that these web interfaces don’t give me access to, but it may also be that we’re not gathering the raw data needed to extract these stats. So I’m putting this on your radar: if possible, please quickly set up whatever is needed so that we have at least some of the data we need when we come back to this in a week or so.

Ideally, we would find a way to answer such questions easily in the future (as opposed to manually collecting info from web pages and computing stats on them in a non-automated way), but well, no need to overengineer the first attempt at data gathering and stats computing :)

#15 Updated by intrigeri 2015-11-06 02:53:54

We already have some per-slave idle time graphs (but no actual stats).

For the other data we need:

=> perhaps installing the Cluster Statistics plugin now, and coming back to look at its results in a few days, would be the low-hanging fruit I was hoping for?

#16 Updated by intrigeri 2015-11-06 07:33:22

  • blocks deleted (Feature #5288: Run the test suite automatically on autobuilt ISOs)

#17 Updated by intrigeri 2015-11-14 02:25:35

  • Priority changed from Elevated to High

#19 Updated by bertagaz 2015-11-18 03:54:20

  • Status changed from Confirmed to In Progress
  • Assignee changed from bertagaz to intrigeri

I have not found where the node statistics are stored in Jenkins, nor how to extract data covering a longer period than the executor statistics URIs you pasted. Even playing with the Jenkins JSON API didn’t give more, so I suspect Jenkins doesn’t store more than 1 day of node executor statistics.

I’ve done the maths another way around, because we have another kind of useful information to estimate our needs in isotesters: the statistics about the number of automatic ISO builds.

In theory, 1 isobuilder can build around 1000 ISO/month if the build takes 40 minutes on average (((24 * 60) / 40) * 30). So with 2 isobuilders, we can build at most 2000 ISO/month.

I think October was quite a representative month for the Tails development pace, and we built 935 ISOs.

This matches my experience with the isobuilders: if builds are queued while the isobuilders are busy, they still get done quickly, and the isobuilders are also idle at times. Two isobuilders seem to be enough for our current development rhythm.

Their (short) stats (https://jenkins.tails.boum.org/computer/isobuilder2/load-statistics?type=hour and https://jenkins.tails.boum.org/computer/isobuilder1/load-statistics?type=hour) seem to say the same thing: on average we’re at about half of their capacity. I’ve looked at them quite regularly and they seem to stick to this neighborhood.

So I think we can assume we need enough isotesters to test at least 1000 ISO/month, to cope with our most intense development months.

Of course, this includes branches based on stable, so the decision on Feature #10492 may reduce this number.

This also sounds like way more than what we calculated at first, and it’s quite obvious when one looks at https://jenkins.tails.boum.org/load-statistics?type=hour (note that I rebooted all the Jenkins VMs, including the master, in the night between Nov. 16 and Nov. 17).

We actually have 4 executors on the Jenkins master node and 1 on each isobuilder and isotester.

So it means we have 6 real ISO build/test executors. As we can see on this last graph, the number of busy executors is always close to 5, which seems to validate the hypothesis that we always have 4 isotesters busy, and half of the isobuilders too most of the time.

So now, one isotester is able to test something like 120 ISO/month, if we assume a slightly over-estimated test time of 6 hours (so 4 test runs a day).

That means we’d need something like at least 8 isotesters if we settle on 1000 ISO/month.
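
For reference, here is the same arithmetic as a small shell sketch (using the rounded averages above: 40-minute builds, 6-hour test runs, 30-day months):

    # Rough capacity estimates, using the averages assumed in this comment.
    builds_per_isobuilder=$(( (24 * 60 / 40) * 30 ))   # => 1080, i.e. ~1000 ISO/month per isobuilder
    runs_per_isotester=$(( (24 / 6) * 30 ))            # => 120 test suite runs/month per isotester
    target=1000                                        # ISO/month we want to be able to test
    echo $(( target / runs_per_isotester ))            # => 8, hence "at least 8 isotesters"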

What’s your opinion on these maths? Do they make sense? If they do, do you think we should be a bit more careful, and try to have as many isotesters as we’d need to test all the ISOs that would be built with all isobuilders running at full capacity: 2000 ISO/month (which means 16 isotesters)? Maybe the bill for that kind of hardware would help us decide?

#20 Updated by intrigeri 2015-11-23 01:43:56

  • related to Feature #7631: Get a server able to run our automated test suite added

#21 Updated by intrigeri 2015-11-23 02:59:11

  • Assignee changed from intrigeri to bertagaz

Hi!

> I’ve done the maths another way around, because we have another kind of useful information to estimate our needs in isotesters: the statistics about the number of automatic ISO builds.

Fair enough.

Do you mind if I install the Cluster Statistics plugin, so that next time we need it we have actual usage and performance data for the last N months?

> So I think we can assume we need enough isotesters to test at least 1000 ISO/month, to cope with our most intense development months.

RUNS=1000 is just 7% more than the busiest month we’ve observed between July and October. In 2014 and 2013 respectively, in terms of Git commits, the top month of the year was 7% and 32% more active than max(July..October). So I think that a 7% safety margin is a bit short, and that we need to choose between:

  • take a bigger safety margin, e.g. one that would have worked over the last few years: 30% => RUNS=1350;
  • estimate more precisely the reasons that make us think we’ll have to run the test suite less often (e.g. Feature #10492).

I say let’s go with RUNS=1350 for now, and if the corresponding hardware is too expensive, then we can revisit (and perhaps the conclusion is that what we already have is good enough in the end; I think it may be worth getting more hardware to be able to run the test suite against all autobuilt ISOs without any need for clever selection, but I doubt it’s worth getting more hardware to be able to run it just a bit more often, if in the end the needed cleverness could allow us to just stick with lizard v2 and be done with it…)

> This also sounds like way more than what we calculated at first,

https://tails.boum.org/blueprint/automated_builds_and_tests/automated_tests_specs/ says 10/20 autobuilt ISOs/day minimum (=300-600/month), and unknown maximum, so it sounds like our estimates were quite good :)

> As we can see on this last graph, the number of busy executors is always close to 5, which seems to validate the hypothesis that we always have 4 isotesters busy, and half of the isobuilders too most of the time.

This is the point at which it’s clear to me that we lack actual historical data, but it’ll have to do.

> So now, one isotester is able to test something like 120 ISO/month, if we assume a slightly over-estimated test time of 6 hours (so 4 test runs a day).

Yep, this looks correct for the isotesters we currently have.

> That means we’d need something like at least 8 isotesters if we settle on 1000 ISO/month.

This conclusion implicitly relies on the assumption that we’ll purchase hardware that gives us isotesters with the same performance as the ones we currently have. I doubt this will be the case, and actually I secretly hope it won’t be.

Let me make some of my own assumptions explicit.

  • The test suite run time, assuming constant hardware, is rather stable: we’ll have more tests, but they will be compensated for by optimizations.
  • What lizard keeps on doing: we’ll keep most of the isotesters we currently have on lizard. OTOH lizard’s CPU and memory are starting to be scarce, e.g. we have needs we hadn’t planned for (the freezable APT repo will need more of both, we’ll want a VM for weblate, etc.), so removing one of its isotesters would give us plenty of available resources => let’s say we keep 3 isotesters on lizard => they can take care of 3*120 = 360 test suite runs / month.

So we need additional hardware that can run the test suite about (RUNS - 360) times a month.
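
Plugging in RUNS=1350 and the ~120 runs/month per current-style isotester from the estimate above, a quick sketch:

    RUNS=1350                          # test suite runs/month we want to cover
    lizard_runs=$(( 3 * 120 ))         # 3 isotesters kept on lizard
    echo $(( RUNS - lizard_runs ))     # => 990 runs/month left for the new hardware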

And the question becomes: how do we do that best? Given we can not give lizard more CPU cores (right?), we need a second box, and then I see two main options:

  • do use nested virtualization on the new box, which implies:
    • we need hardware that can do nested KVM efficiently
    • we waste some CPU power (compared to if we were not using nested KVM)
    • we can reuse our current platform as-is, including e.g. the isotester reboot system
  • don’t use nested virtualization on the new box, which implies:
    • the test suite runs faster => we can run it more often on the same hardware (compared to if we were using nested KVM)
    • we need to adapt the test suite and the platform that runs it, to make sure that:
      • we can run several instances concurrently;
      • we can clean up stuff well enough between runs to avoid the need to reboot the host.

I don’t know if the savings on hardware costs the no-nested-KVM option gives us are worth the additional work. And currently we lack the expertise needed to use technology that’s adequate to implement the needed clean up reliably (systemd, cgroups, you name it). So I’m tempted to take the “just do nested KVM as usual” shortcut, even if it restricts our choice of hardware quite a bit.

Thoughts?

I’d like to see a quick initial quote of hardware that can do 1000 runs/month, just to give us an idea, and see if we’d rather put time in selecting more carefully what autobuilt ISOs we want to run the test suite on.

#22 Updated by intrigeri 2015-11-25 03:14:48

FTR I had to cancel ~15 test jobs to get the backlog accumulated due to Bug #10601 down to something that gives us acceptable latency between push and test results. bertagaz: it would be useful to know whether you’ve occasionally been doing similar manual prioritization.

#23 Updated by intrigeri 2015-12-02 03:11:36

intrigeri wrote:
> FTR I had to cancel ~15 test jobs to get the backlog accumulated due to Bug #10601 down to something that gives us acceptable latency between push and test results. bertagaz: it would be useful to know whether you’ve occasionally been doing similar manual prioritization.

For the past week I have not done any such manual queue handling, and each time I looked, the build queue was at a reasonable level (confirmed by https://jenkins.tails.boum.org/load-statistics?type=hour).

#24 Updated by intrigeri 2015-12-05 05:44:01

  • related to Feature #10503: Run erase_memory.feature first to optimize test suite performance added

#25 Updated by bertagaz 2015-12-05 07:32:32

  • Assignee changed from bertagaz to intrigeri
  • Blueprint set to https://tails.boum.org/blueprint/hardware_for_automated_tests_take2/

intrigeri wrote:
> I’d like to see a quick initial quote of hardware that can do 1000 runs/month, just to give us an idea, and see if we’d rather put time in selecting more carefully what autobuilt ISOs we want to run the test suite on.

Here’s the output of my research about this in a blueprint. Feedback welcome. Will reply to other questions a bit later.

#27 Updated by intrigeri 2015-12-06 15:22:43

  • related to Bug #9157: ISO testers (level-1) VMs crash when running the test suite with Jessie's kernel added

#28 Updated by intrigeri 2015-12-07 04:32:54

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 0 to 10

Thanks!

This first cost estimate is a good basis. I don’t think it’s worth fine-tuning it before my other questions are answered, and before the maths take into account that once we have Feature #10707 the number of runs should go down by about 75/523 =~ 14% (initially I found 25% but this figure was buggy since it ignored aborted jobs).

So:

> Will reply to other questions a bit later.

Yes, please do so before we dive too deep into refining hardware specs :)

But I believe there’s a bug in the estimates, and fixing it has the potential to change the invoice substantially. I think the analysis of CPU needs is buggy: on lizard we allocate 3 CPU threads as vcpus to each isotester, not 3 real CPU cores. So if we do the same on the new box, we need 8*3 = 24 CPU threads = 12 CPU cores, instead of 24 CPU cores (or even fewer, if this allows us to buy a CPU with twice the clock rate). This potentially changes a lot of things:

  • if we stick with your estimates and give each isotester 3 real CPU cores, maybe the test suite will run much faster, which would shorten the feedback loop; and then we need fewer isotesters, so fewer CPU cores and less RAM; I don’t know where the sweet spot between more concurrency and more per-isotester power is in terms of performance; it would be good to quickly benchmark giving 3 real CPU cores to an isotester on lizard (this implies shutting down one isotester to free some CPU threads)
  • perhaps we can do with one CPU only => we can look at those that have a “1” instead of a “2” as the first digit of their model number, a single-socket motherboard would do, and for the same power consumption we can go for a faster (non-low-power) and/or cheaper CPU. But perhaps there’s a blocker: can we have 256GB of RAM with only one CPU? And what about 128GB of RAM as suggested on the blueprint?
  • or, if we stick with 2 CPUs (e.g. because it’s a must to support that much RAM) we could get some that have fewer cores but a faster clock rate, e.g. 2618Lv3, 2628Lv3, 2630v3, 2640v3, 2650v3; this can be cheaper and/or faster.

FYI I’ve added a “Ways to lower the price a bit” section to the blueprint for later.

In passing, our previous stats for how many times we did run the test suite are flawed: we only considered non-aborted test runs for some reason I don’t get, and I realized yesterday that a lot of jobs abort while they should really be failing (Bug #10718, Bug #10717). E.g. in November, if I stop ignoring the aborted jobs (commit b73b81f in puppet-tails), instead of 314 runs I see 523 runs. Thankfully the maths on this ticket are based on the number of built ISOs so we’re good :)

#29 Updated by bertagaz 2015-12-15 03:38:01

  • Target version changed from Tails_1.8 to Tails_2.0

Postponing

#30 Updated by intrigeri 2016-01-04 15:04:49

  • related to Feature #10851: Give lizard enough free storage to host our freezable APT repository added

#32 Updated by intrigeri 2016-01-11 18:16:05

intrigeri wrote:
> Do you mind if I install the Cluster Statistics plugin, so that next time we need it we have actual usage and performance data for the last N months?

Ping? (Of course it’s now too late to get stats for the two months that have passed since I asked your opinion on this matter, but well, let’s think about the future.)

#33 Updated by intrigeri 2016-01-11 21:15:50

intrigeri wrote:
> And the question becomes: how do we do that best? Given we can not give lizard more CPU cores (right?), we need a second box, and then I see two main options:
>
> * do use nested virtualization on the new box, which implies:
>   * we need hardware that can do nested KVM efficiently
>   * we waste some CPU power (compared to if we were not using nested KVM)
>   * we can reuse our current platform as-is, including e.g. the isotester reboot system
> * don’t use nested virtualization on the new box, which implies:
>   * the test suite runs faster => we can run it more often on the same hardware (compared to if we were using nested KVM)
>   * we need to adapt the test suite and the platform that runs it, to make sure that:
>     * we can run several instances concurrently;
>     * we can clean up stuff well enough between runs to avoid the need to reboot the host.
>
> I don’t know if the savings on hardware costs the no-nested-KVM option gives us are worth the additional work. And currently we lack the expertise needed to use technology that’s adequate to implement the needed clean up reliably (systemd, cgroups, you name it). So I’m tempted to take the “just do nested KVM as usual” shortcut, even if it restricts our choice of hardware quite a bit.

I’ve thought a little bit about it, and I wonder if there might be a solution to run multiple concurrent instances of the test suite in a way that’s cleaned up reliably. Say that for each test suite runner we use a dedicated user, with its own qemu:///session libvirt instance, and reuse essentially the same logic we have for “rebooting isotesterN before running a test”, but with tools applicable to user existence + sessions, that is: first do something like loginctl terminate-user $user && deluser --remove-home $user, then re-create the user anew before running the tests. Maybe loginctl would be reliable enough, without introducing overly complex new tools? And if we need to run stuff in each of these users’ sessions, we can use the systemd user session to manage it.
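
Something like this rough sketch, where the user name and re-creation details are just placeholders:

    # Hypothetical per-run cleanup for one dedicated test suite runner user.
    user=isotester-runner1                           # placeholder name
    loginctl terminate-user "$user" || true          # kill the session and its qemu:///session VMs
    deluser --remove-home "$user"                    # drop any leftover state
    adduser --disabled-password --gecos "" "$user"   # re-create the user anew before the next run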

Anyway, this was just in passing. In practice, the ability to do nested KVM doesn’t impact our choice of hardware much, and we can start with nested KVM and switch to non-nested next time we want to get more test suite throughput and better latency.

#34 Updated by intrigeri 2016-01-18 16:47:48

  • Assignee changed from bertagaz to intrigeri
  • QA Check deleted (Info Needed)

Discussed with bertagaz, blueprint updated, we have 2 options there. Next steps:

  1. ask taggart if option C is technically doable, otherwise go for option D
  2. wait for results from Feature #10971 that will allow us to fine tune the chosen option a bit
  3. some of the needed work (1 puppetmaster for multiple machines, untrusted network) is shared with the monitoring system setup that is being done in January-February, and then building on top of it:
  4. order and set up this new ISO testing machine in March

#35 Updated by intrigeri 2016-01-26 12:10:41

  • Target version changed from Tails_2.0 to Tails_2.2

intrigeri wrote:
> # ask taggart if option C is technically doable, otherwise go for option D

Done over email yesterday.

> # wait for results from Feature #10971 that will allow us to fine tune the chosen option a bit

Done, next step is Feature #10996.

And then, this is still valid:

> # some of the needed work (1 puppetmaster for multiple machines, untrusted network) is shared with the monitoring system setup that is being done in January-February, and then building on top of it:
> # order and set up this new ISO testing machine in March

#36 Updated by intrigeri 2016-01-26 15:29:45

To clarify, the general idea behind what I’m trying in the subtasks is: it seems that we are seriously underusing lizard v2’s CPUs; there may be tricks (e.g. running more isotesters, which may require giving the machine more RAM) that allow us to use them better. If this effort succeeds, then:

  • we’ll be using more of our currently available computing power, which would be satisfying in itself for various reasons;
  • we’ll be able to run the test suite more often on lizard => the additional hardware we buy will need to run it less => it can be cheaper;
  • we’ll have learnt lots of things about how to set up and optimize systems to cope with our ISO testing workload; for example, it should become clearer how much RAM per CPU core we want.

#37 Updated by intrigeri 2016-01-26 16:26:35

  • related to Bug #10999: Parallelize our ISO building workload on more builders added

#38 Updated by intrigeri 2016-01-27 17:28:19

  • related to Feature #11009: Improve ISO building and testing throughput and latency added

#39 Updated by intrigeri 2016-01-27 17:38:42

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)

I’m calling this research done. My conclusions are on the blueprint, and my plan is Feature #11009: the first step is Feature #11010, and later we’ll get a 2nd machine.