Bug #16161
Optimise PV placement for I/O performance
% Done: 100%
Description
LVs should be spread across the different RAID arrays for optimal I/O performance.
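For reference, the current LV-to-PV layout and the remaining free space per array can be inspected with the standard LVM reporting commands; a minimal sketch (the column selection is just one reasonable choice):

    # List each LV together with the PV(s) its extents are allocated on
    lvs -o lv_name,vg_name,lv_size,devices --units g
    # Show how much space is left on each PV (i.e. each RAID array)
    pvs -o pv_name,vg_name,pv_size,pv_free --units g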
Related issues
Related to Tails - Feature #16041: Replace rotating drives with new SSDs on lizard | Resolved | 2018-10-11
History
#1 Updated by groente 2018-11-28 11:30:42
- Related to Feature #16041: Replace rotating drives with new SSDs on lizard added
#2 Updated by intrigeri 2018-11-28 12:36:38
- Target version set to Tails_3.12
- % Done changed from 0 to 20
I’m done with the initial attempt. I’ll check Munin in a week or two and will adjust things as needed.
#3 Updated by intrigeri 2018-11-29 13:23:47
OK, no need to wait a week to adjust: after 24h it’s clear that md4 still gets way more I/O than md3 and md5, and all the metrics tell the same story (IOPS, latency, throughput). That’s not surprising: my initial plan only covered the top I/O consumers and almost everything else is still on md4. I’ll move some more of it off md4.
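(For the record, besides Munin, a quick spot check of per-array I/O can be done directly on the host with something like the following; the interval and device list are illustrative:)

    # Extended per-device statistics every 60s for the three arrays
    iostat -dxm md3 md4 md5 60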
#4 Updated by intrigeri 2018-11-29 13:31:12
Moved root, jenkins-system and bittorrent-system from md4 to md5.
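A sketch of that kind of move, assuming the LVs live in a VG named lizard (the actual VG name is not stated in this ticket) and that md4 and md5 are the underlying PVs:

    # Confirm which PV the LV currently sits on
    lvs -o lv_name,devices lizard/jenkins-system
    # Move only this LV's extents from md4 to md5; runs online, can take a while
    pvmove -n jenkins-system /dev/md4 /dev/md5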
#5 Updated by intrigeri 2018-12-08 08:01:20
Over the last 7 days:
- read IOPS:
  - the averages are fairly well balanced, though md4 is somewhat lower than the other arrays
  - the max is much higher on md5 than elsewhere
- write IOPS:
  - the average is much higher on md4 than on the other arrays
  - the max is very low on md3, and 72% higher on md4 than on md5
- disk latency:
  - the average is good on md3 and md5 but 16 times higher on md4
  - the max is OK on md3 and md5 but much higher (2.13s) on md4, which is somewhat concerning
- throughput, utilization: nothing remarkable
#6 Updated by intrigeri 2018-12-08 08:24:16
Zooming in, only the md4 write IOPS and latency are still concerning. They are fully correlated with jenkins-data, and they increased substantially since we started generating USB images on all branches (Feature #16154), i.e. just after my initial analysis (Feature #16041#note-13) of the per-LV I/O consumption… too bad, but we knew this would have to be an incremental process anyway. Apart from jenkins-data, the biggest consumer of write I/O on md4 is isobuilder3-data, so I’m moving it to md3, which is relatively underloaded in terms of write I/O. This does not leave much free space on md3 (30GB) but it should improve things until we actually need more space there.
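For reference, a way to double-check the space remaining on md3 after such a move (PV path assumed to be /dev/md3):

    # Report size and remaining free space on the md3 PV, in GiB
    pvs -o pv_name,pv_size,pv_free --units g /dev/md3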
#7 Updated by intrigeri 2018-12-16 11:30:15
- Status changed from In Progress to Resolved
- Assignee deleted (intrigeri)
- % Done changed from 20 to 100
md4’s average latency (73ms) is now “only” about 7.5 times higher than on the other arrays, i.e. about half of what it was before my last changes. md4’s average write IOPS is now “only” about twice that of the other arrays. I think this is good enough. Other metrics are OK.
The only way I see to balance this even better would be to spread jenkins-data over multiple arrays. Such an allocation scheme would be more difficult to maintain: we already have a hard time ensuring LVs remain on “their” PV when we grow them as part of day-to-day system operations, and I’d rather not make this even harder => IMO the cost/benefit of spending more time on this ticket would not be worth it.
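As an illustration of that constraint (names and sizes are hypothetical; the point is only that the target PV has to be named explicitly when growing an LV so the new extents stay on the intended array):

    # Grow the LV by 50 GiB, allocating the new extents only from md5
    lvextend -L +50G lizard/jenkins-data /dev/md5
    # Then grow the filesystem; ext4 assumed here
    resize2fs /dev/lizard/jenkins-data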