Feature #16041

Replace rotating drives with new SSDs on lizard

Added by intrigeri 2018-10-11 11:45:13. Updated 2018-11-28 11:31:55.

Status:
Resolved
Priority:
Normal
Assignee:
intrigeri
Category:
Infrastructure
Target version:
Start date:
2018-10-11
Due date:
% Done:

100%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

On the sysadmin side I disabled the old rotating drives:

# Remove the LVM volume group and physical volume that lived on the old drives
sudo vgremove spinninglizard
sudo pvremove /dev/mapper/md2_crypt
# Close the dm-crypt mapping, then stop the underlying RAID array
sudo cryptdisks_stop md2_crypt
sudo mdadm --stop /dev/md2
# Drop the stale entries from crypttab and mdadm.conf, then refresh the initramfs
sudo sed -i --regexp-extended '/^md2_crypt/ d' /etc/crypttab
sudo sed -i --regexp-extended '/^ARRAY \/dev\/md\/2 / d' /etc/mdadm/mdadm.conf
sudo update-initramfs -u
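
A few sanity checks one could run after such a teardown (a sketch using only standard mdadm/LVM tooling, nothing lizard-specific):

    # the stopped array should no longer appear in the kernel's RAID state
    grep md2 /proc/mdstat || echo "md2 is gone"
    # no LVM physical volume should reference the old dm-crypt mapping any more
    sudo pvs
    # and no stale references should remain in the config files
    grep -E 'md2_crypt|md/2' /etc/crypttab /etc/mdadm/mdadm.conf || echo "configs clean"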

groente, can you please check if I forgot something? Then reassign to me so I can handle the next steps.


Subtasks


Related issues

Related to Tails - Bug #16131: Broken Samsung SSD 850 EVO 1TB on lizard Resolved 2018-11-17
Related to Tails - Bug #16161: optimise pv placement for io-performance Resolved 2018-11-28
Blocks Tails - Feature #13242: Core work: Sysadmin (Maintain our already existing services) Confirmed 2017-06-29
Blocks Tails - Bug #16155: increase jenkins and iso-archive diskspace Resolved 2018-11-27

History

#1 Updated by intrigeri 2018-10-11 11:46:03

  • related to #15779 added

#2 Updated by intrigeri 2018-10-11 11:46:26

  • blocks Feature #13242: Core work: Sysadmin (Maintain our already existing services) added

#3 Updated by groente 2018-10-11 13:11:02

  • Assignee changed from groente to intrigeri
  • QA Check changed from Ready for QA to Dev Needed

intrigeri wrote:
> On the sysadmin side I disabled the old rotating drives:
>
> […]
>
> groente, can you please check if I forgot something? Then reassign to me so I can handle the next steps.

Apart from the systemd services that tried to bring md2_crypt back up again, which I already mentioned on XMPP, I think that pretty much covers it.

Just to be safe, I would recommend running grub-install again on the remaining disks (sda to sdf); it should already be there, but given the occasional ‘grub not found’ during lizard reboots, it’s better to be safe than sorry before pulling disks out.

#4 Updated by intrigeri 2018-10-12 07:44:22

  • Target version changed from Tails_3.10.1 to Tails_3.11

#5 Updated by intrigeri 2018-10-17 09:22:27

groente wrote:
> Apart from the systemd services that tried to bring md2_crypt back up again, which I already mentioned on XMPP, I think that pretty much covers it.

FTR that was fixed.

> Just to be safe, I would recommend running grub-install again on the remaining disks (sda to sdf); it should already be there, but given the occasional ‘grub not found’ during lizard reboots, it’s better to be safe than sorry before pulling disks out.

Good idea! I ran sudo dpkg-reconfigure grub-pc, selected /dev/sd[c-f], and let it install GRUB on those drives. Note that we can’t install GRUB on /dev/sd[ab] because there’s simply no room for it (fully encrypted, no partition table nor filesystem that GRUB can use).
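
The non-interactive equivalent would be a grub-install loop over the same drives; a minimal sketch, assuming Debian's stock grub-pc tooling:

    # reinstall GRUB to the MBR of each remaining drive that can carry it
    for dev in /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
        sudo grub-install "$dev"
    done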

#6 Updated by intrigeri 2018-11-06 08:53:14

  • Status changed from Confirmed to In Progress

#7 Updated by intrigeri 2018-11-17 09:43:53

  • related to Bug #16131: Broken Samsung SSD 850 EVO 1TB on lizard added

#8 Updated by intrigeri 2018-11-17 09:45:52

Our BIOS was still configured to boot from the rotating drives. I’ve fixed that.

Pinged taggart on IRC today.

#9 Updated by groente 2018-11-17 19:11:45

due to md1 being degraded (see Bug #16131), the following LVs will be moved from md1 to md4:

    root
    puppet-git-system   *
    apt-system
    apt-data
    rsync-system
    bittorrent-system
    apt-proxy-system
    apt-proxy-data
    whisperback-system
    bitcoin-data        **
    jenkins-system
    bridge-system
    www-system
    misc-system
    puppet-git-data
    bitcoin-system
    bitcoin-swap
    isos-www            **
    isotester1-system
    im-system
    monitor-system
    isotester2-system
    isotester3-system
    isotester4-system
    isotester4-data
    apt-snapshots       **
    isotester5-system
    isotester5-data
    isotester6-system
    isotester6-data
    translate-system    **
    isobuilder1-system
    isobuilder4-system
    isobuilder3-system
    isobuilder3-data    **
    isobuilder2-system 
    isobuilder2-libvirt **
    isobuilder3-libvirt **
    isobuilder4-libvirt **
    isobuilder1-libvirt **
    apt-proxy-swap

LVs marked * also have a foot in md3; only the parts on md1 will be moved.
LVs marked ** were already partially on md4.
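
For reference, such a migration is typically done one LV at a time with pvmove; a minimal sketch, assuming the PVs sit on dm-crypt mappings named like the one removed above (the actual PV names aren't shown in this ticket):

    # hypothetical source and destination PV names
    SRC=/dev/mapper/md1_crypt
    DST=/dev/mapper/md4_crypt
    # move only this LV's extents from the source PV to the destination PV;
    # pvmove works online and picks up where it left off if interrupted
    sudo pvmove --name root "$SRC" "$DST"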

#10 Updated by intrigeri 2018-11-20 19:42:18

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 0 to 10

Old drives pulled out, new drives plugged in. Please do the basic setup of the new drives (or ask me to do it) and reassign to me so I can do the next steps. See the ML for required timing & technical details. Thanks in advance!

#11 Updated by groente 2018-11-27 10:49:34

  • Assignee changed from bertagaz to groente

stealing this ticket because we need the disk space for the sprint

#12 Updated by groente 2018-11-27 11:34:06

  • blocks Bug #16155: increase jenkins and iso-archive diskspace added

#13 Updated by intrigeri 2018-11-27 18:30:23

Regarding spreading the I/O load again across PVs, i.e. RAID arrays:

  • this much seems obvious: spread the ISO builders & testers over at least 2 arrays; they don’t use that much I/O, though (we’ve set things up & allocated memory to minimize I/O needs here)
  • top IOPS consumers (average IOPS over a week, max of read & write): jenkins-data (147.91), apt-snapshots (36.66), translate-system (10.90), apt-proxy-data (7.44), puppet-git-system (6.81), isos (4.86), bitcoin-data (3.80)
  • ISO builders & testers, when busy, make other volumes busy (mainly jenkins-data, apt-snapshots, apt-proxy-data); let’s separate them if we can

So let’s try this:

  • md3 (old, 500GB): translate-system, apt-proxy-data, puppet-git-system, bitcoin-data, half of Jenkins workers (isobuilders 1-2, isotesters 1-3)
  • md4 (old, 2TB): jenkins-data, isos, 1/4 of ISO builders & testers (isobuilder3, isotester4)
  • md5 (new, 4TB): apt-snapshots, 1/4 of Jenkins workers (isobuilder4, isotesters 5-6)

I’ll do this once the lower part of the stack is ready, and a week or two later I’ll check latency and IOPS per PV, which should tell me how good or bad this first iteration was.
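
For that follow-up measurement, iostat from the sysstat package would be one way to sample per-array IOPS and latency; a sketch, assuming a recent sysstat:

    # extended per-device stats every 60 seconds, for all block devices
    # (including the md arrays and dm-crypt mappings); r/s and w/s give IOPS,
    # r_await and w_await give average latency in milliseconds
    iostat -dxm 60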

#14 Updated by groente 2018-11-27 18:37:40

  • Assignee changed from groente to intrigeri
  • QA Check changed from Dev Needed to Pass

go for it, once that’s done I think this ticket can be closed \o/

#15 Updated by intrigeri 2018-11-28 07:50:36

  • % Done changed from 10 to 50
  • QA Check deleted (Pass)

#16 Updated by intrigeri 2018-11-28 11:25:05

Amending the plan: md3 would be too full if we did exactly that, so I’ll move the isobuilder2 stuff to md5 instead.
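
For context, the remaining room per PV is what pvs reports; a sketch (standard LVM tooling, no lizard-specific names):

    # show total size and free space for each physical volume
    sudo pvs -o pv_name,vg_name,pv_size,pv_free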

#17 Updated by groente 2018-11-28 11:30:42

  • related to Bug #16161: optimise pv placement for io-performance added

#18 Updated by groente 2018-11-28 11:31:55

  • Status changed from In Progress to Resolved
  • Target version deleted (Tails_3.11)
  • % Done changed from 50 to 100

all done with the disk replacement, created a new ticket for the pv-switcheroo