Bug #10396

Sort out overallocated storage situation on isotesterN.lizard

Added by intrigeri 2015-10-20 11:08:08. Updated 2015-12-20 10:15:21.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Target version:
Start date: 2015-10-20
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:
Deliverable for: 267

Description

On lizard’s VG we currently have 405.62 GiB (= 435.53 GB) available, which is clearly not enough to cover our planned needs: for example, there’s not enough space left to allocate the planned space to the freezable APT repo (which I’d like to do within 24-48 hours). I suspect I should not have ignored the GiB vs. GB subtleties when doing Feature #9400… and/or some space was allocated somewhere that I had not taken into account when we did the planning.
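
For reference, LVM reports sizes with a lowercase suffix in binary units (GiB), while the second figure is decimal GB; a quick sanity check of the conversion above (a sketch using bc):

echo '405.62 * 1073741824 / 1000000000' | bc -l
# => 435.53…, i.e. 405.62 GiB ≈ 435.53 GB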

Anyway, it seems that jenkins-data is using half of the space we’ve allocated for it, so in the short term I could just steal some space there. In the longer term we might need to go back to the drawing board for Feature #9399; note that this is related to Feature #9264 (if we get another box, it’ll need disks).
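
A sketch of how to double-check both claims from a shell (assuming the VG is named "lizard", as in the device paths used later on this ticket; the jenkins-data mount point is hypothetical):

sudo vgs -o vg_name,vg_size,vg_free lizard
sudo lvs -o lv_name,lv_size lizard/jenkins-data
df -h /var/lib/jenkins   # hypothetical mount point, from within the jenkins VM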


Subtasks


Related issues

Related to Tails - Feature #9399: Extend lizard's storage capacity Resolved 2015-05-14
Related to Tails - Feature #9264: Consider buying more server hardware to run our automated test suite Resolved 2015-12-15

History

#1 Updated by intrigeri 2015-10-20 11:08:52

intrigeri wrote:
> […] and/or some space was allocated somewhere that I had not taken into account when we did the planning.

bertagaz, if you have any clue on this aspect, please let me know :)

#2 Updated by intrigeri 2015-10-20 11:10:31

  • related to Feature #9399: Extend lizard's storage capacity added

#3 Updated by intrigeri 2015-10-20 11:10:40

  • related to Feature #9264: Consider buying more server hardware to run our automated test suite added

#4 Updated by bertagaz 2015-10-21 02:55:44

  • Assignee changed from intrigeri to bertagaz

intrigeri wrote:
> intrigeri wrote:
> > […] and/or some space was allocated somewhere that I had not taken into account when we did the planning.
>
> bertagaz, if you have any clue on this aspect, please let me know :)

I’ll have a look.

#5 Updated by intrigeri 2015-10-27 13:12:33

bertagaz wrote:
> intrigeri wrote:
> > intrigeri wrote:
> > > […] and/or some space was allocated somewhere that I had not taken into account when we did the planning.
> >
> > bertagaz, if you have any clue on this aspect, please let me know :)
>
> I’ll have a look.

Any news on this front? If you don’t think you can do this by tomorrow night, please simply reassign it to me and I’ll deal with it somehow.

#6 Updated by intrigeri 2015-11-02 04:23:06

  • Status changed from Confirmed to Rejected
  • Assignee deleted (bertagaz)

Too late, I’ve found other ways to work on Feature #5926.

#7 Updated by intrigeri 2015-11-06 11:17:42

  • Subject changed from Sort out storage situation on lizard to Sort out overallocated storage situation on isotesterN.lizard
  • Status changed from Rejected to Confirmed
  • Assignee set to bertagaz
  • Priority changed from High to Normal
  • Target version changed from Tails_1.7 to Tails_1.8
  • Parent task set to Feature #5288
  • Deliverable for set to 267

We had planned to allocate 20GB for each isotesterN (see details on Feature #9400#note-6), but they each got 41GB, except isotester1, which has 51GB for some reason. So we’re wasting 3*21 + 31 = 94GB here. I think our estimates were slightly off, and these VMs probably need a bit more than we initially thought, but the overallocated space is simply too much of a waste to ignore.
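
For the record, a sketch to list the current allocations (assuming the volumes follow the isotesterN-tmp naming and live in the "lizard" VG, as elsewhere on this ticket):

for i in $(seq 1 4) ; do
    sudo lvs --noheadings -o lv_name,lv_size lizard/isotester${i}-tmp
done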

#8 Updated by bertagaz 2015-11-18 05:29:49

  • Assignee changed from bertagaz to intrigeri
  • QA Check set to Info Needed

intrigeri wrote:
> We had planned to allocate 20GB for each isotesterN (see details on Feature #9400#note-6), but they each got 41GB, except isotester1, which has 51GB for some reason. So we’re wasting 3*21 + 31 = 94GB here. I think our estimates were slightly off, and these VMs probably need a bit more than we initially thought, but the overallocated space is simply too much of a waste to ignore.

Right, I think I messed something up when resizing the isotesterN-tmp LVM volumes. I’m not sure how much is over-allocated, though.

A look at our Munin disk-usage graphs seems to show that we had spikes at 98% or so on these tmp partitions. It might just be that some bugs filled the partition at some points, but looking at the daily graphs, this high percentage seems to happen often.

So I wonder whether, in the end, this over-allocation didn’t save our ass at some point. Could you have a look at these graphs and tell me what you think?

#9 Updated by intrigeri 2015-11-21 02:49:25

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Dev Needed

> A look at our Munin disk-usage graphs seems to show that we had spikes at 98% or so on these tmp partitions

On our Munin I see no filesystem space usage stats for the FS’es we’re talking about, so I don’t know what you’re referring to (in case it’s a source of confusion: diskstats_utilization is not about disk space utilisation; the diskstats plugin is about how the device behaves at the block-device layer, and “utilization” means “how busy the device is”). If we have such stats somewhere already, then please point me precisely (with unambiguous language and pointers) to where I can find them, so I can confirm your findings.

Meanwhile, to save one roundtrip and some data-gathering time, I’m running:

while true ; do
    for i in $(seq 1 4) ; do
        ssh isotester$i.lizard df /tmp/TailsToaster | tail -n1 | awk '{print $3}' >> ~/tmp/isotester$i-tailstoaster-used
    done
    sleep 60
done

… so when you’re back to this ticket, you can easily check the data (in my $HOME) and fix storage allocation based on these numbers.
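
For instance, the peak usage per isotester can later be read back with something like (a sketch, same file naming as above; df’s Used column is in 1K blocks):

for i in $(seq 1 4) ; do
    sort -n ~/tmp/isotester$i-tailstoaster-used | tail -n1
done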

#10 Updated by bertagaz 2015-12-05 07:36:41

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:
> On our Munin I see no filesystem space usage stats for the FS’es we’re talking about, so I don’t know what you’re referring to (in case it’s a source of confusion: diskstats_utilization is not about disk space utilisation; the diskstats plugin is about how the device behaves at the block-device layer, and “utilization” means “how busy the device is”). If we have such stats somewhere already, then please point me precisely (with unambiguous language and pointers) to where I can find them, so I can confirm your findings.

Hmm right, thanks for the explanations.

> Meanwhile, to save one roundtrip and some data-gathering time, I’m running:
>
> […]
>
> … so when you’re back to this ticket, you can easily check the data (in my $HOME) and fix storage allocation based on these numbers.

The biggest number out of these stats is almost 12G.

So we could reduce this tmp partition to 15G for a bit of safety, or 20G if we expect it to grow at some point (which I doubt). What’s your opinion?

#11 Updated by intrigeri 2015-12-05 10:20:29

  • Status changed from Confirmed to In Progress
  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 0 to 20
  • QA Check changed from Info Needed to Dev Needed

bertagaz wrote:
> The biggest number out of these stats is almost 12G.
>
> So we could reduce this tmp partition to 15G for a bit of safety,

According to what I’m quoting on Feature #10503, we could quite easily get this number down by a few GB, so I think that adding a 25% safety margin is over-enthusiastic => I suggest a 10% safety margin instead. Beware of the units, by the way ;)
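
For what it’s worth, 12 GiB plus a 10% margin comes out, in bytes, to:

echo $(( 12 * 1024 ** 3 * 11 / 10 ))
# => 14173392076, which matches the size later passed to virsh vol-create-as in note #13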

#12 Updated by bertagaz 2015-12-15 03:26:43

  • Target version changed from Tails_1.8 to Tails_2.0

Postponing.

#13 Updated by intrigeri 2015-12-19 10:46:10

  • % Done changed from 20 to 50

Done on isotester2 and isotester3 with:

VM=isotester3
virsh shutdown "$VM" \
    && virsh vol-delete "${VM}-tmp" --pool lvm \
    && virsh vol-create-as lvm "${VM}-tmp" 14173392076 \
    && sudo mkfs.ext4 -m 0 "/dev/lizard/${VM}-tmp" \
    && virsh start "$VM"
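
A possible way to verify the result afterwards (a sketch, not part of the original one-liner):

virsh vol-info --pool lvm "${VM}-tmp"         # new volume size
ssh "${VM}.lizard" df -h /tmp/TailsToaster    # filesystem size inside the guest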

#14 Updated by bertagaz 2015-12-20 07:32:43

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 50 to 80
  • QA Check changed from Dev Needed to Ready for QA

Done for isotester1 and isotester4, almost following your guidelines: I shut down the VMs by hand before starting to play with the -tmp volumes, just to be sure they were down for real.

#15 Updated by intrigeri 2015-12-20 10:15:21

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 80 to 100
  • QA Check changed from Ready for QA to Pass

Looks good!