The data on the RAID sans the VMs is also backed up to CrashPlan's cloud. The RAID is not my data backup :)
A couple of weeks ago I noticed that the hard drives in the RAID had been powered on for more than 27000 hours (over 3 years of runtime). They're just consumer-level SATA drives, so it was time to swap them out for newer ones. Here's the process that I followed:
- Buy a new 3TB WD Red drive
- Plug it into the home server
- Apply high LCC fix
- Partition the new drive
- Add the new drive to the RAID set
- Resync to the new drive
- Buy a second new 3TB WD Red drive
- Remove both old drives from the RAID set
- Plug the second new drive into the home server, remove the two old drives
- Apply high LCC fix again
- Partition the second new drive
- Add the second new drive to the RAID set
- Grow the RAID set array size to the full size of the new disks
- Grow the filesystem to the full size of the new RAID set
- Wipe data on both old drives
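The power-on figure that prompted the swap is the kind of value SMART reports as attribute 9. A minimal sketch of reading it and converting it to years, assuming smartmontools is installed (the post doesn't say which tool was used) and using `/dev/sdX` as a placeholder device name:

```shell
# Read `smartctl -A` output on stdin and print the raw Power_On_Hours
# value (column 10 of the attribute table) with an approximate age in years.
power_on_years() {
    awk '$2 == "Power_On_Hours" { printf "%d hours = %.1f years\n", $10, $10 / (24 * 365) }'
}

# Real use (as root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | power_on_years
# Made-up sample attribute line using the 27000-hour figure from the text:
echo '  9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       27000' | power_on_years
# prints: 27000 hours = 3.1 years
```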
Buy a new 3TB WD Red drive
I chose WD Reds because they're designed for a NAS environment (not too different from how I use the drives): they're meant to be switched on all the time. They also have a higher rated MTBF than typical desktop drives.
Plug it into the home server
I plugged the drive into one of the 6G SATA ports on my server's motherboard and booted up the home server.
Apply high LCC fix
There's a known issue with some models of WD Reds where the load cycle count (LCC) climbs quickly because the drive parks its heads too frequently. WD has a fix for this available here. The fix needs to be applied while the disk is not in use, so I decided to do it before partitioning. I downloaded the 64-bit Linux build (wd5741x64) and applied it to the drive. The tool reported that no firmware upgrade was necessary.
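One way to see whether a drive is affected is to watch SMART attribute 193 (Load_Cycle_Count) over a few days. This is only a sketch, assuming smartmontools is installed; `/dev/sdX` is a placeholder and the sample line is made up:

```shell
# Print the raw value of one named SMART attribute from `smartctl -A`
# output supplied on stdin.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Real use (as root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | smart_raw Load_Cycle_Count
# Made-up sample attribute line, for illustration only:
echo '193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12' | smart_raw Load_Cycle_Count
# prints: 12
```

A count that grows by hundreds per day would suggest the drive is affected; a slowly growing one would not.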
Partition the new drive
Booting into Linux, the drive wasn't initialised so I used parted (available from the EPEL repo) to write a GPT signature to the drive, create a partition and allocate all space to it, and then mark it as a raid partition:
# parted -a optimal /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart
Partition type? primary/extended? primary
File system type? [ext2]?
Start? 2048s
End? -1
(parted) set 1 raid on
Run an alignment check to make sure the partition starts on a physical sector boundary:
(parted) align-check
alignment type(min/opt) optimal/[minimal]? optimal
Partition number? 1
(parted)
It seems a bit strange that the align-check command doesn't return anything if the alignment check succeeded, but it looks like that's intended behaviour - I tried creating a partition on a non-aligned sector boundary later on and it returned a warning message.
Now check that everything looks okay:
(parted) print
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sda: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3001GB  3001GB                     raid

(parted) quit
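The alignment can also be checked by hand from the numbers in the session: a start of 2048 sectors at 512 bytes each is 1048576 bytes (shown by parted as 1049kB), which is a multiple of the 4096-byte physical sector size. A small sketch of that arithmetic; the function name `check_alignment` is made up for illustration:

```shell
# Arguments: start sector, logical sector size, physical sector size (bytes).
# Prints the byte offset of the partition start and whether it falls on a
# physical sector boundary.
check_alignment() {
    offset=$(( $1 * $2 ))
    if [ $(( offset % $3 )) -eq 0 ]; then
        echo "$offset aligned"
    else
        echo "$offset misaligned"
    fi
}

check_alignment 2048 512 4096   # prints: 1048576 aligned
check_alignment 63 512 4096     # old-style start sector; prints: 32256 misaligned
```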
Add the new drive to the RAID set
My RAID set is /dev/md0 so I ran:
mdadm --add /dev/md0 /dev/sda1
mdadm adds the drive as a hot spare, so the next step is to tell it to use the new drive as an active device:
mdadm --grow --raid-devices=3 /dev/md0
Resync to the new drive
This took approximately 5 hours. You can do a cat /proc/mdstat to see progress (the text below is just an example):
[root@nyamo crypt]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[3]
      2930133512 blocks super 1.2 [2/2] [UU]
      [================>....]  resync = 84.7% (2482257528/2930133512) finish=138.7min speed=53785K/sec

unused devices: <none>
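The finish= estimate can be reproduced from the other numbers on the progress line: remaining blocks divided by the speed in K/sec. A rough sketch that pulls those fields out of mdstat-style text; the function name `md_progress` is made up, and the parsing assumes the line layout shown in the example:

```shell
# Read /proc/mdstat-style text on stdin. For each resync/recovery line, print
# the percentage done and the remaining time computed as
# (total blocks - done blocks) / speed, converted to minutes.
md_progress() {
    awk '/resync =|recovery =/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%$/) pct = $i
            if ($i ~ /^\([0-9]+\/[0-9]+\)$/) { v = $i; gsub(/[()]/, "", v); split(v, b, "/") }
            if ($i ~ /^speed=/) { s = $i; sub(/^speed=/, "", s); sub(/K\/sec$/, "", s) }
        }
        printf "%s done, about %.1f min left\n", pct, (b[2] - b[1]) / s / 60
    }'
}

# Real use: md_progress < /proc/mdstat
# With the example progress line from the text:
echo '[================>....] resync = 84.7% (2482257528/2930133512) finish=138.7min speed=53785K/sec' | md_progress
# prints: 84.7% done, about 138.8 min left
```

The result lands within a tenth of a minute of the kernel's own finish= value for the same line.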
Buy a second new 3TB WD Red drive
A week later I went to the local computer store and bought a second drive. The idea is to get the same model of drive, but from a different batch. This decreases the likelihood of both drives failing at the same time, since drives from one batch can share the same manufacturing defect.
Remove both old drives from the RAID set
I reduced the RAID size back to 2 and took the old drive, which was no longer an active device, out of the RAID set:
# mdadm --grow /dev/md0 --raid-devices=2
# mdadm --remove /dev/md0 /dev/sdb1
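As far as I know, mdadm refuses to hot-remove a member that is still active, so a member normally has to be marked as failed before it can be removed. The sketch below only prints the commands for one member rather than running them; the function name and `/dev/sdX1` are placeholders, not taken from the session above:

```shell
# Print (do not run) the mdadm manage-mode commands to retire one array member.
# Arguments: array device, member partition.
retire_member_cmds() {
    echo "mdadm $1 --fail $2"
    echo "mdadm $1 --remove $2"
}

retire_member_cmds /dev/md0 /dev/sdX1
# prints:
#   mdadm /dev/md0 --fail /dev/sdX1
#   mdadm /dev/md0 --remove /dev/sdX1
```

Printing first makes it easy to double-check the device names before anything destructive is run.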
Plug the second new drive into the home server, remove the two old drives
I powered off my home server, removed the two old hard drives and plugged the second new WD Red into another 6G SATA port.
Apply high LCC fix again
After booting up the home server again, I repeated the same steps as outlined previously, and the tool reported that no firmware upgrade was necessary.
Partition the second new drive
Same as before, but on a different drive:
# parted -a optimal /dev/sdb
GNU Parted 2.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart
Partition type? primary/extended? primary
File system type? [ext2]?
Start? 2048s
End? -1
(parted) set 1 raid on
(parted) align-check
alignment type(min/opt) optimal/[minimal]? optimal
Partition number? 1
(parted) print
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3001GB  3001GB                     raid

(parted) quit
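Since the partitioning steps are identical for both drives, they could be collapsed into one scripted parted call. This is a sketch, not what was run above: it assumes parted's script mode (`-s`), swaps the interactive `-1` end for `100%`, and has not been checked against Parted 2.1 specifically. The function only prints the command, and `/dev/sdX` is a placeholder:

```shell
# Print (do not run) a one-line parted command that writes a GPT label,
# creates one partition from sector 2048 to the end of the disk, and sets
# the raid flag on it. Argument: target disk.
parted_cmd() {
    echo "parted -s -a optimal $1 mklabel gpt mkpart primary 2048s 100% set 1 raid on"
}

parted_cmd /dev/sdX
# prints: parted -s -a optimal /dev/sdX mklabel gpt mkpart primary 2048s 100% set 1 raid on
```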
Add the second new drive to the RAID set
The RAID set has started up and reports that there's one active device and the RAID is degraded. Adding the new drive to the RAID set triggers a resync:
# mdadm --add /dev/md0 /dev/sdb1
The resync then takes hours to complete. Doing a cat /proc/mdstat shows progress:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[3]
      2930133512 blocks super 1.2 [2/2] [UU]
      [=================>...]  resync = 85.4% (2502889848/2930133512) finish=148.3min speed=47998K/sec

unused devices: <none>
Grow the RAID set array size to the full size of the new disks
My old drives were 2TB drives and the new ones are 3TB, so to use all the available space the RAID set needs to be grown (I ran this inside screen and detached, as it takes some time):
# mdadm --grow /dev/md0 --size=max
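After the grow, the block count reported in /proc/mdstat should reflect the new size. mdstat counts 1 KiB blocks, so the conversion to decimal gigabytes is simple arithmetic. A sketch, with the made-up function name `blocks_to_gb`:

```shell
# Convert an mdstat block count (1 KiB blocks) to whole decimal gigabytes.
blocks_to_gb() {
    echo $(( $1 * 1024 / 1000000000 ))
}

blocks_to_gb 2930133512   # block count from the mdstat example; prints: 3000
```

That is just under the 3001GB parted reports for the whole disk, which is what you'd expect once the partition offset and the RAID metadata are accounted for.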
Grow the filesystem to the full size of the new RAID set
Now the filesystem needs to be grown to use all of the available space in the RAID set (again run inside screen; it takes longer than the previous command):
# resize2fs /dev/md0
Wipe data on both old drives
This is on my todo list :)
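For when that day comes, one possible approach is to clear the md metadata from the old RAID partition and then overwrite the whole disk. This is only a sketch of that approach, not something that was run here; the function prints the commands instead of executing them, and `/dev/sdX1` and `/dev/sdX` are placeholders:

```shell
# Print (do not run) commands to clear the md superblock from an old RAID
# partition and overwrite the whole disk with one pass of random data.
# Arguments: old RAID partition, whole-disk device.
wipe_cmds() {
    echo "mdadm --zero-superblock $1"
    echo "shred -v -n 1 $2"
}

wipe_cmds /dev/sdX1 /dev/sdX
# prints:
#   mdadm --zero-superblock /dev/sdX1
#   shred -v -n 1 /dev/sdX
```

Both commands are destructive, so the device names are worth triple-checking against the drives that are still in the array.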