Bug #13425

Bug #11680: Upgrade server hardware (2017 edition)

Upgrade lizard's storage (2017 edition)

Added by intrigeri 3 months ago. Updated 10 days ago.

Status: Resolved
Priority: Elevated
Assignee: -
Category: Infrastructure
Target version:
Start date: 07/05/2017
Due date:
% Done: 100%
QA Check:
Feature Branch:
Type of work: Sysadmin
Blueprint:
Easy:
Affected tool:

Description

Given what #12002 and #11806 taught us, and the painful situation we're in already wrt. storage, we need to upgrade lizard's storage ASAP as a first step towards #11680, without figuring out the rest of #11680 yet. Whatever storage we add can be reused when we do #11680 anyway :)

Setting target version = 3.1 because such things take time, so better start early if we want to get it done within a reasonable time frame.


Related issues

Related to Tails - Bug #13526: apt-snapshots partition lacks disk space Resolved 07/27/2017
Related to Tails - Bug #14732: add diskspace to isobuilder1-4 Resolved 09/28/2017
Related to Tails - Feature #14797: Decide what LVs to host on lizard rotating drives In Progress 10/07/2017
Blocked by Tails - Feature #11806: Update server storage planning needs for at least 2017 Resolved 09/19/2016
Blocks Tails - Bug #12595: Not enough space in /var/lib/jenkins on isobuilders Resolved 05/25/2017

History

#1 Updated by intrigeri 3 months ago

  • Blocked by Feature #11806: Update server storage planning needs for at least 2017 added

#2 Updated by bertagaz 3 months ago

intrigeri wrote:

Given what #12002 and #11806 taught us, and the painful situation we're in already wrt. storage, we need to upgrade lizard's storage ASAP as a first step towards #11680, without figuring out the rest of #11680 yet. Whatever storage we add can be reused when we do #11680 anyway :)

Setting target version = 3.1 because such things take time, so better start early if we want to get it done within a reasonable time frame.

So I guess first step is to get in touch with our fellows at the colo to confirm we can add two drives, and to get their availability in the coming time, so that we can plan this. I'll send an email for that.

#3 Updated by bertagaz 3 months ago

  • Blocks Bug #12595: Not enough space in /var/lib/jenkins on isobuilders added

#4 Updated by intrigeri 3 months ago

So I guess first step is to get in touch with our fellows at the colo to confirm we can add two drives, and to get their availability in the coming time, so that we can plan this. I'll send an email for that.

I'm glad you're moving this forward.

Perhaps we should first look into our options:

  • PCIe (M.2, NVM and friends) drives
    • Can we plug such things? We have a bunch of PCI Express ports that are probably unused, and (according to our Git) a Supermicro RSC-RR1U-E8 riser card (no idea if it's used). Perhaps we can plug such things with riser cards, ideally cards that accept at least 2 M.2/NVM flash devices.
    • advantage: fast! leaves room in the case if we ever need it
    • potential drawbacks:
      • more expensive?
      • available in the kind of size we need?
  • adding two SSDs (assuming we can still plug 2 more, there can be other limitations than the physical space in the case):
    • advantages:
      • simple, easy
      • cheapest solution
    • drawback: retiring the rotating drives later will require more coordination with, and more work for, the local colo admins (replace 1 rotating drive with a SSD, boot, wait for us to pvmove stuff around, shutdown, replace the 2nd rotating drive with the 2nd SSD); IOW this is easier on the short term, but creates technical debt for later, which I dislike quite a bit
  • replacing two rotating drives with SSDs
    • advantages:
      • simple, easy
      • straightforward for the local colo admins: plug the new SSDs, come back whenever they want to unplug the old drives
      • it feels safer to replace 5-years-old rotating drives now
      • we still have 2 empty slots that can be used for the next upgrade
    • drawbacks: probably more expensive than the "adding two SSDs" option as we'll need bigger SSDs (perhaps we can sell the rotating drives but that won't affect the big picture much)

So the questions I'd like to see answered are:

  • Are PCIe (M.2, NVM) drives a realistic option?
  • What kind of SSDs are available for the "replacing two rotating drives with SSDs" option? We'll need 1.7+0.6 = 2.3 TB. Are there any suitable 2.5 TB or 3 TB SSDs or do we have to jump to 4 TB? How much of our hardware upgrade budget would it eat, compared to the other options?

#5 Updated by bertagaz 3 months ago

  • Status changed from Confirmed to In Progress
  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 0 to 20
  • QA Check set to Info Needed

intrigeri wrote:

So I guess first step is to get in touch with our fellows at the colo to confirm we can add two drives, and to get their availability in the coming time, so that we can plan this. I'll send an email for that.

I'm glad you're moving this forward.

Great! :)

Perhaps we should first look into our options:

Good idea.

So the questions I'd like to see answered are:

  • Are PCIe (M.2, NVM) drives a realistic option?

I've done a bit of research on this. So it seems we do have PCIe ports, but no M.2 or NVMe ports. M.2 drives are at most 1G fat anyway, so they don't fit. PCIe SSDs are super expensive.

From the specs of the riser card and our mobo, it seems the riser card does not provide M.2 or NVMe ports either. Also this riser card is a way to put a drive horizontally so that it fits in a 1U case. But if you do that I think you can only put one of them, as the drive will cover all the other PCIe ports of the mobo. So we could only put one PCIe drive.

I've found traces of PCIe (x8) <-> NVMe adapters, but I'm not sure how they would fit in our 1U case. Maybe combined with the riser card they could. But NVMe SSDs are also super expensive anyway.

So in the end, my answer would be "no", I don't think that's a realistic option. It's still worth asking the colo people what they think, just in case.

  • What kind of SSDs are available for the "replacing two rotating drives with SSDs" option? We'll need 1.7+0.6 = 2.3 TB. Are there any suitable 2.5 TB or 3 TB SSDs or do we have to jump to 4 TB? How much of our hardware upgrade budget would it eat, compared to the other options?

There are 3T samsung EVO SSDs, available at around 800$ each. This seems much more affordable than the M.2/NVMe options. Samsung EVO 4T are at 1700$ each.

Now I wonder: we're supposed to replace/upgrade Lizard at the end of 2019. Do we assume that our storage upgrade plan will be enough to survive until then, or should we think about adding a bit more space so that we can hold on the storage side until then (which could mean going to the much more expensive 4T option)? My take would be to bet it will be fine, that we've been good at planning the storage until end of 2018 and we can just assume we may need to upgrade Lizard a bit sooner than expected in 2019. Not sure it's worth the additional $ of the 4T option.

#6 Updated by intrigeri 3 months ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Dev Needed
  • Are PCIe (M.2, NVM) drives a realistic option?

I've made a bit of research on this side. So it seems we do have PCIe ports, but no M.2 or NVMe ports. M.2 drives are at most 1G fat anyway, so they don't fit.

Did you really mean 1 G? I have some of those: SSD 128GB M.2 Intel SSDPEKKW128G7X1. If you meant 1TB, then if we can plug 4 * 1TB (= 2TB usable) then we're good if we keep our rotating drives for a bit longer.

PCIe SSDs are super expensive.

Wow, OMG.

From the specs of the riser card and our mobo, it seems the riser card does not provide M.2 or NVMe ports neither. Also this riser card is a way to put a drive horizontally so that it fits in a 1U case. But if you do that I think you can only put one of them, as the drive will cover every other PCIe ports of the mobo. So we could only put one PCIe drive.

OK, too bad. I thought I had seen riser cards that accept 4 NVMe drives somewhere, but I guess you checked.

But NVMe SSDs are also super expensive anyway.

Indeed.

So in the end, my answer would be "no", I don't think that's a realistic option.

Agreed.

It's still worth asking to the colo people what they think about just in case.

Yes, it's worth telling them what our needs are. They might come up with a creative idea neither of us has thought of :)

  • What kind of SSDs are available for the "replacing two rotating drives with SSDs" option? We'll need 1.7+0.6 = 2.3 TB. Are there any suitable 2.5 TB or 3 TB SSDs or do we have to jump to 4 TB? How much of our hardware upgrade budget would it eat, compared to the other options?

There are 3T samsung EVO SSDs, available at around 800$ each. This seems much more affordable than the M.2/NVMe options. Samsung EVO 4T are at 1700$ each.

What exact model is that? I would just like to check they're good enough for our needs.

Anyway, good to know this 3TB option is available :) It fits into our budget forecasting for storage upgrades (and we thought we would only get 2*2TB for this price), so if NVMe is not a realistic option, then I say let's go with the "replacing two rotating drives with SSDs" option and 2*3TB SSDs.

Now I wonder: we're supposed to replace/upgrade Lizard at the end of 2019.

Err, the parent ticket (#11680) is on our sysadmin roadmap for 2017Q2→2019Q1, and I would like to get it done this year anyway; perhaps you missed #11680#note-5?
Did I miss something about this "end of 2019" thing?

#7 Updated by bertagaz 3 months ago

intrigeri wrote:

  • Are PCIe (M.2, NVM) drives a realistic option?

I've made a bit of research on this side. So it seems we do have PCIe ports, but no M.2 or NVMe ports. M.2 drives are at most 1G fat anyway, so they don't fit.

Did you really mean 1 G? I have some of those: SSD 128GB M.2 Intel SSDPEKKW128G7X1. If you meant 1TB, then if we can plug 4 * 1TB (= 2TB usable) then we're good if we keep our rotating drives for a bit longer.

I really meant 1TB max, M.2 seems to be a limited (for now) form factor.

PCIe SSDs are super expensive.

Wow, OMG.

Yeah, amazing. :)

From what I've seen they don't fit in a 1U case anyway.

From the specs of the riser card and our mobo, it seems the riser card does not provide M.2 or NVMe ports neither. Also this riser card is a way to put a drive horizontally so that it fits in a 1U case. But if you do that I think you can only put one of them, as the drive will cover every other PCIe ports of the mobo. So we could only put one PCIe drive.

OK, too bad. I thought I had seen riser cards that accept 4 NVMe drives somewhere, but I guess you checked.

Did not stumble upon that while searching. I've seen "x4" several times though (like on the colo hardware collection web page), but rather related to the "PCIe gen". I'll dig a bit there.

The adapters I've seen allowed only 1 disk (with different possible form factors), but maybe two risers + two adapters + 2 NVMe disks would fit in the case. They are still really expensive though. Not sure right now it's worth it, compared to the price of 3T SATA SSDs nowadays.

When we upgrade lizard's hardware, I think we should really consider having a mobo with M.2/NVMe slots anyway, we could really benefit from this I/O bandwidth boost. It all depends on the availability and affordability of this kind of thing, but given the huge bump in I/O they provide, I guess it will settle down fast.

So in the end, my answer would be "no", I don't think that's a realistic option.

Agreed.

It's still worth asking to the colo people what they think about just in case.

Yes, it's worth telling them what our needs are. They might come up with a creative idea neither of us has thought of :)

ACK. I'll write an email summing up our outcomes so far and pointing to this ticket. If they have the same conclusion, we'll go on with the samsung EVO alternative.

What exact model is that? I would just like to check they're good enough for our needs.

Right, the EVO line is not that uniform. Taken from the same seller as for the other hardware links here, they are "Samsung 850 EVO, 3TB 2.5 SATA3 SSD".

Anyway, good to know this 3TB option is available :) It fits into our budget forecasting for storage upgrades (and we thought we would only get 2*2TB for this price), so if NVMe is not a realistic option, then I say let's go with the "replacing two rotating drives with SSDs" option and 2*3TB SSDs.

+1

Now I wonder: we're supposed to replace/upgrade Lizard at the end of 2019.

Err, the parent ticket (#11680) is on our sysadmin roadmap for 2017Q2→2019Q1, and would like to get it done this year anyway; perhaps you missed #11680#note-5?

Ooops, did not think about finding a ticket about that.

Did I miss something about this "end of 2019" thing?

The blueprint still states that. But maybe it means "start replacing lizard hardware", and the blueprint states the deadline for such a thing to be alive.

I think that's also why I prefer the "3T SATA" option: it leaves us 700G to handle the storage needs of this 2019 transition.

#8 Updated by bertagaz 3 months ago

  • % Done changed from 20 to 30

intrigeri wrote:

bertagaz wrote:

It's still worth asking to the colo people what they think about just in case.

Yes, it's worth telling them what our needs are. They might come up with a creative idea neither of us has thought of :)

Sent the first emails, pointing here and explaining where we were/what we need. We'll see when people at the colo are available to go forward.

#9 Updated by intrigeri 3 months ago

  • Related to Bug #13526: apt-snapshots partition lacks disk space added

#10 Updated by intrigeri 3 months ago

Raised one concern on the email thread wrt. the 3TB 850 EVO.

#11 Updated by bertagaz 3 months ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Info Needed

bertagaz wrote:

intrigeri wrote:

Perhaps we should first look into our options:

  • What kind of SSDs are available for the "replacing two rotating drives with SSDs" option? We'll need 1.7+0.6 = 2.3 TB. Are there any suitable 2.5 TB or 3 TB SSDs or do we have to jump to 4 TB? How much of our hardware upgrade budget would it eat, compared to the other options?

There are 3T samsung EVO SSDs, available at around 800$ each. This seems much more affordable than the M.2/NVMe options. Samsung EVO 4T are at 1700$ each.

So this option may be compromised, if this kind of SSD doesn't in fact exist. No news yet from interpromicro, but in case it's confirmed, we have two options:

  • Take 2x2T SATA SSDs, and keep the old rotating drives until the later Lizard hardware upgrade. $: 850 x 2
  • Jump on 2x4T SATA SSDs, which leaves room but is much more expensive. $: 1700 x 2

It seems unlikely we'll go the M.2/PCIe/NVMe way, so which of the two shall we take? I think the latter 4T option is probably too expensive to be worth it.
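
For reference, a minimal sketch of the arithmetic behind these options. It assumes the two SSDs are mirrored, so usable space equals the size of one drive (an assumption, not confirmed in this ticket), and it reuses the rough per-drive prices and the 1.7+0.6 = 2.3 TB need quoted above. The function name option_summary is made up for illustration.

```shell
#!/bin/sh
# Assumption: the two SSDs form a mirror, so usable space is one drive's size.
# Prices are the approximate per-drive figures quoted in this ticket.
NEEDED_GB=$((1700 + 600))   # 1.7 TB + 0.6 TB = 2.3 TB

# usage: option_summary LABEL DRIVE_SIZE_GB PRICE_PER_DRIVE_USD
option_summary() {
    label=$1 size_gb=$2 price=$3
    total=$((2 * price))
    headroom=$((size_gb - NEEDED_GB))
    echo "$label: total \$$total, headroom ${headroom} GB"
}

option_summary "2x2T" 2000 850
option_summary "2x3T" 3000 800
option_summary "2x4T" 4000 1700
```

A negative headroom for the 2T option is another way of saying the rotating drives have to stay in for now.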

#12 Updated by intrigeri 3 months ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Dev Needed

I think the latter 4T option is probably too expensive to be worth it.

Agreed, although not merely because it would be "too expensive" in absolute terms: e.g. if it was clear already that we would need these drives as part of #11680, then I would want us to consider this option seriously. But that's not clear to me, because one of the leading options outsources a substantial amount of our disk space needs. And then at this point I would not be able to justify this expense as a worthy long-term investment. I prefer to keep the extra money aside to have more flexibility when making plans on #11680.

#13 Updated by bertagaz 2 months ago

  • Target version changed from Tails_3.1 to Tails_3.2

#14 Updated by groente 2 months ago

  • Related to Bug #13583: bitcoin is running out of disk space added

#15 Updated by groente 2 months ago

  • Related to deleted (Bug #13583: bitcoin is running out of disk space)

#16 Updated by groente 2 months ago

  • Blocks Bug #13583: bitcoin is running out of disk space added

#17 Updated by intrigeri about 2 months ago

  • Blocks deleted (Bug #13583: bitcoin is running out of disk space)

#18 Updated by anonym 18 days ago

  • Target version changed from Tails_3.2 to Tails_3.3

#19 Updated by groente 18 days ago

So, since the new disks are in place, what do you think about the following approach:

- create a new /dev/md4 with /dev/sdg and /dev/sdh
- add a bit of luks to create /dev/mapper/md4_crypt
- use vgextend to add /dev/mapper/md4_crypt to the lizard volume group
- run: for i in `pvdisplay m /dev/mapper/md1_crypt |grep 'Logical volume' |awk '{print $3}'`; do pvmove -n $i /dev/mapper/md1_crypt /dev/mapper/md4_crypt;done
now all the logical volumes that were on the spinning disks should be on ssd, so we can
- vgreduce a
which should remove /dev/mapper/md1_crypt from the volume group
- and if so desired we can then create a new volume group spinninglizard with /dev/mapper/md1_crypt as sole pv.

If I'm not mistaken, this should allow us to move everything onto SSD without downtime and avoid future confusion whether a VM is on spinning disk or SSD.

Optionally, we could first partition /dev/sdg and sdh, then make a small raid array to replace the md0 boot partition, so we can remove the spinning disks altogether and feel good about that half a watt of energy consumption saved.

#20 Updated by groente 18 days ago

well, that oneliner came out all wrong, should've been:

for i in `pvdisplay -m /dev/mapper/md1_crypt |grep 'Logical volume' |awk '{print $3}'`; do pvmove -n $i /dev/mapper/md1_crypt /dev/mapper/md4_crypt;done

followed by:

vgreduce -a
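
To make it easier to see what the grep/awk part of that one-liner extracts, here is a small sketch that runs the same filter over a trimmed, made-up sample of `pvdisplay -m` output. The LV names and extent ranges are hypothetical, and the `sort -u` is an addition (not in the one-liner) so that an LV spanning several physical segments is listed only once.

```shell
#!/bin/sh
# Extract the logical volumes listed in `pvdisplay -m` output read on stdin.
# `sort -u` collapses LVs that appear in several physical segments.
lvs_from_pvdisplay() {
    grep 'Logical volume' | awk '{print $3}' | sort -u
}

# Illustrative sample only: hypothetical LV names, trimmed output.
sample_output() {
    cat <<'EOF'
  --- Physical Segments ---
  Physical extent 0 to 2559:
    Logical volume  /dev/lizard/example-a
    Logical extents 0 to 2559
  Physical extent 2560 to 5119:
    Logical volume  /dev/lizard/example-b
    Logical extents 0 to 2559
  Physical extent 5120 to 7679:
    Logical volume  /dev/lizard/example-a
    Logical extents 2560 to 5119
  Physical extent 7680 to 10239:
    FREE
EOF
}

sample_output | lvs_from_pvdisplay
```

On the real host the input would come from `pvdisplay -m /dev/mapper/md1_crypt | lvs_from_pvdisplay` instead of the sample.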

#21 Updated by groente 18 days ago

  • Related to Bug #14732: add diskspace to isobuilder1-4 added

#22 Updated by intrigeri 18 days ago

- create a new /dev/md4 with /dev/sdg and /dev/sdh
- add a bit of luks to create /dev/mapper/md4_crypt

+ pvcreate /dev/mapper/md4_crypt

- use vgextend to add /dev/md4_crypt to the lizard volume group
- run: for i in `pvdisplay -m /dev/mapper/md1_crypt |grep 'Logical volume' |awk '{print $3}'`; do pvmove -n $i /dev/mapper/md1_crypt /dev/mapper/md4_crypt;done

I have strong reservations about looping over pvmove: when a pvmove command is aborted for whatever reason, running it again resumes the operation… unless one has run another pvmove command already. This has bitten me in the past and since then my personal policy is to never script around multiple calls to pvmove.

- now all the logical volumes that were on the spinning disks should be on ssd, so we can
- vgreduce a
which should remove /dev/mapper/md1_crypt from the volume group
- and if so desired we can then create a new volume group spinninglizard with /dev/mapper/md1_crypt as sole pv.

If I'm not mistaken, this should allow us to move everything onto SSD without downtime and avoid future confusion whether a VM is on spinning disk or SSD.

Sounds good!

I think there's an initramfs update missing near the end.

Optionally, we could first partition /dev/sdg and sdh, then make a small raid array to replace the md0 boot partition, so we can remove the spinning disks altogether and feel good about that half a watt of energy consumption saved.

Right now, if I'm not mistaken /boot is not backed by rotating drives at all, but I suspect we might still be relying on their boot sector.

But in principle, extending /boot redundancy across more of our SSDs seems right. Same for installing GRUB on more boot sectors.

Wrt. removing the rotating drives: IIRC our updated storage plan relies on the fact we'll keep them in for a while more, and postpone their unplugging to #11680, but I totally might be misremembering. So please someone double check and add a reference here before we unplug them :)

Meta: this work is supposed to be done (by the end of October) as part of a sponsor deliverable that's on bertagaz' plate. So if you want to work further on this, please check with bertagaz if he's fine with sharing some of his allocated budget with you, and find an agreement together wrt. how exactly the money shall be split (be it based on clocking or whatever, that's your business). Now, if this is not done by mid-October and there's no clear+realistic ETA, I say feel free to take it over, clock your time, and "steal" from his budget. Obviously that's the worst case situation, let's try hard to avoid it with improved planning & communication (on Oct 2 we have a repro builds meeting and a CI team one, and bertagaz is supposed to attend both, so I'll check with him what's his current realistic ETA… taking into account the previous one was not realistic since this just got postponed to 3.3).

#23 Updated by bertagaz 13 days ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

- create a new /dev/md4 with /dev/sdg and /dev/sdh
- add a bit of luks to create /dev/mapper/md4_crypt

+ pvcreate /dev/mapper/md4_crypt

- use vgextend to add /dev/md4_crypt to the lizard volume group
- run: for i in `pvdisplay -m /dev/mapper/md1_crypt |grep 'Logical volume' |awk '{print $3}'`; do pvmove -n $i /dev/mapper/md1_crypt /dev/mapper/md4_crypt;done

I have strong reservations about looping over pvmove: when a pvmove command is aborted for whatever reason, running it again resumes the operation… unless one has run another pvmove command already. This has bitten me in the past and since then my personal policy is to never script around multiple calls to pvmove.

Yeah, I was a bit afraid reading that too. I'll do that by hand rather.

I think there's an initramfs update missing near the end.

Right.

Optionally, we could first partition /dev/sdg and sdh, then make a small raid array to replace the md0 boot partition, so we can remove the spinning disks altogether and feel good about that half a watt of energy consumption saved.

Right now, if I'm not mistaken /boot is not backed by rotating drives at all, but I suspect we might still be relying on their boot sector.

Nope, /boot/ is not on the rotating drives.

But in principle, extending /boot redundancy across more of our SSDs seems right. Same for installing GRUB on more boot sectors.

So I guess I should partition the new SSDs to have /boot partitions on RAID too? Should I add these partitions as spares for the md0 RAID volume, or create a new RAID? IIRC RAID1 cannot have more than two partitions, but I may be wrong.

Wrt. removing the rotating drives: IIRC our updated storage plan relies on the fact we'll keep them in for a while more, and postpone their unplugging to #11680, but I totally might be misremembering. So please someone double check and add a reference here before we unplug them :)

Nope, given we only bought 2 TB disks, we need to keep them until the next Lizard upgrade.

#24 Updated by intrigeri 13 days ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Dev Needed

But in principle, extending /boot redundancy across more of our SSDs seems right. Same for installing GRUB on more boot sectors.

So I guess I should partition the new SSDs to have /boot partitions on RAID too?

I'm not sure what you mean exactly: "have" is ambiguous in this context. If you want my input, please clarify what you're proposing.

Should I add these partitions as spares for the md0 RAID volume,

I've suggested we improve redundancy, which is not addressed by adding spares, so no.

or create a new RAID?

I don't understand what we would do with this new RAID array.

IIRC RAID1 cannot have more than two partitions,

I think it can. Thankfully it takes a few minutes to check locally with file-backed or tmpfs throw-away storage.

If possible, I would add one active device per newly added disk to the existing /dev/md0 RAID1, which will give us redundancy.
If not possible, I would replace one of the currently active devices with one backed by one of the new disks (to limit the consequences of identical drives failing at the same time), turn the demoted active device into a spare, and add a 2nd spare backed by the other new disk, i.e. in the end we would have 2 active devices and 2 spares.
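
The throw-away check mentioned above could look roughly like this. It is only a sketch: /dev/md100, /dev/loop1 to /dev/loop3 and the image directory are hypothetical names assumed to be unused on the test machine, and by default the script only prints the commands (set APPLY=1, as root, to really run them).

```shell
#!/bin/sh
# Dry run by default: print commands instead of running them.
# Set APPLY=1 (as root) to execute for real.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical names: the image directory, /dev/loop1..3 and /dev/md100
# are assumed to be free on the test machine.
WORKDIR=${WORKDIR:-/tmp/raid1-test}

raid1_three_way_check() {
    run mkdir -p "$WORKDIR"
    for i in 1 2 3; do
        run truncate -s 64M "$WORKDIR/disk$i.img"
        run losetup "/dev/loop$i" "$WORKDIR/disk$i.img"
    done
    # The actual question: does RAID1 accept more than two active devices?
    run mdadm --create /dev/md100 --level=1 --raid-devices=3 \
        /dev/loop1 /dev/loop2 /dev/loop3
    run mdadm --detail /dev/md100
    # Clean up the throw-away array and the loop devices.
    run mdadm --stop /dev/md100
    for i in 1 2 3; do
        run losetup -d "/dev/loop$i"
    done
}

raid1_three_way_check
```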

#25 Updated by bertagaz 12 days ago

intrigeri wrote:

I think it can. Thankfully it takes a few minutes to check locally with file-backed or tmpfs throw-away storage.

After a bit of research, I confirm it can. So I just added the first partitions of the two new drives to the md0 array. I still need to install grub on their boot sectors.
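
For the record, a hedged sketch of what adding the two partitions as active devices (rather than spares) and installing GRUB might look like. The partition names /dev/sdg1 and /dev/sdh1 follow from "first partitions of the new drives"; the target of 4 active devices assumes md0 had 2 before, which this ticket suggests but does not state outright. The script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: print commands instead of running them.
# Set APPLY=1 (as root, after review) to execute for real.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Assumption: md0 had 2 active devices, so 2 more makes 4.
NEW_COUNT=${NEW_COUNT:-4}

extend_boot_mirror() {
    # Newly added devices start out as spares...
    run mdadm /dev/md0 --add /dev/sdg1 /dev/sdh1
    # ...and become active mirrors once the array is grown.
    run mdadm --grow /dev/md0 --raid-devices="$NEW_COUNT"
    # Make each new SSD bootable on its own.
    run grub-install /dev/sdg
    run grub-install /dev/sdh
}

extend_boot_mirror
```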

#26 Updated by bertagaz 12 days ago

groente wrote:

So, since the new disks are in place, what do you think about the following approach:

- create a new /dev/md4 with /dev/sdg and /dev/sdh

In progress. I forgot to add the option that skips the initial resync, so I'm waiting for the resync to finish (an hour left).
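
The option alluded to here is presumably mdadm's --assume-clean at array creation time (an assumption, the comment does not name it). While a resync runs, its progress can be read from /proc/mdstat; the sketch below extracts the percentage from a made-up sample in that format.

```shell
#!/bin/sh
# Print the resync percentage found in /proc/mdstat-style text on stdin.
resync_progress() {
    sed -n 's/.*resync *= *\([0-9.]*%\).*/\1/p'
}

# Illustrative sample only; the figures are made up.
sample_mdstat() {
    cat <<'EOF'
md4 : active raid1 sdh[1] sdg[0]
      1000000 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.5% (125000/1000000) finish=58.3min speed=250K/sec
EOF
}

sample_mdstat | resync_progress
```

On a live system the equivalent would be `resync_progress < /proc/mdstat`.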

#27 Updated by intrigeri 11 days ago

groente wrote:

well, that oneliner came out all wrong, should've been:

for i in `pvdisplay -m /dev/mapper/md1_crypt |grep 'Logical volume' |awk '{print $3}'`; do pvmove -n $i /dev/mapper/md1_crypt /dev/mapper/md4_crypt;done

It is my understanding that pvmove /dev/mapper/mdX_crypt /dev/mapper/mdY_crypt does the same in one simple command, without the risks of the loop.
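
Putting the pieces of this thread together, the plan with a single pvmove could be sketched as follows. This is a dry run that only prints the commands unless APPLY=1 is set. Device names follow the plan as written above (in practice partitions may be used rather than whole disks), RAID level 1 is an assumption, and /etc/crypttab would presumably need an md4_crypt entry before the initramfs update.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 (as root, after review) to really run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

migrate_plan() {
    # RAID level 1 is an assumption; the ticket does not state it.
    run mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdg /dev/sdh
    run cryptsetup luksFormat /dev/md4
    run cryptsetup open /dev/md4 md4_crypt
    run pvcreate /dev/mapper/md4_crypt
    run vgextend lizard /dev/mapper/md4_crypt
    # One pvmove for the whole PV, instead of one call per LV.
    run pvmove /dev/mapper/md1_crypt /dev/mapper/md4_crypt
    run vgreduce lizard /dev/mapper/md1_crypt
    run vgcreate spinninglizard /dev/mapper/md1_crypt
    # Assumption: /etc/crypttab has been updated for md4_crypt first.
    run update-initramfs -u
}

migrate_plan
```

If the single pvmove gets interrupted, running pvmove again with no arguments should resume it, which is the property the per-LV loop puts at risk.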

#28 Updated by bertagaz 10 days ago

  • Related to Feature #14797: Decide what LVs to host on lizard rotating drives added

#29 Updated by bertagaz 10 days ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (bertagaz)
  • % Done changed from 30 to 100
  • QA Check deleted (Dev Needed)

bertagaz wrote:

In progress. I forgot to add the option so there's no resync, so I'm waiting for it to be over (an hour left).

Done. There's now a new empty VG called spinninglizard for the rotating drives. I've created #14797 to decide what we'll put on it. I've also installed grub on the new SSDs' MBRs, so this ticket is over now.
