Friday, 16 May 2014

Linux software mirrored raid - swap out drives

I have a home server running CentOS that acts as a VM host and file server. It has a simple raid 1 (mirrored) configuration for the file system which holds some VM images and other important data. It's pretty handy to have this at home as I can just spin up a VM (Windows or Linux) and try out things without having to affect my desktop machine.

The data on the RAID sans the VMs is also backed up to CrashPlan's cloud. The RAID is not my data backup :)

A couple of weeks ago I noticed that the hard drives in the raid had been powered on for more than 27,000 hours (over 3 years of runtime). These hard drives are just consumer-level SATA drives, so it was time to swap them out for some newer ones. Here's the process that I followed:

  • Buy a new 3TB WD Red drive
  • Plug it into the home server
  • Apply high LCC fix
  • Partition the new drive
  • Add the new drive to the RAID set
  • Resync to the new drive
  • Buy a second new 3TB WD Red drive
  • Remove both old drives from the RAID set
  • Plug the second new drive into the home server, remove the two old drives
  • Apply high LCC fix again
  • Partition the second new drive
  • Add the second new drive to the RAID set
  • Grow the RAID set array size to the full size of the new disks
  • Grow the filesystem to the full size of the new RAID set
  • Wipe data on both old drives

Buy a new 3TB WD Red drive
I chose to use WD Reds because they're designed for a NAS environment (not too different to my usage of the drives) - the drives are meant to be switched on all the time. They also have a higher MTBF.

Plug it into the home server
I plugged the drive into one of the 6G SATA ports on my server's motherboard and booted up the home server.

Apply high LCC fix
There's a known issue with some models of WD Reds where their load cycle count increases because the drive spins down too frequently. WD has a fix for this available here. This fix needs to be applied while the disk is not in use, so I decided to do it before partitioning. I downloaded the 64 bit linux build (wd5741x64) and applied it to the drive. The tool reported that no firmware upgrade was necessary.
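For a quick look at whether a drive is affected, the load cycle count can be read from its SMART data. This is a minimal sketch, assuming smartmontools is installed (it isn't mentioned above) and using a hypothetical helper name, load_cycle_count. It pulls the raw value of the Load_Cycle_Count attribute out of "smartctl -A" output:

```shell
# Print the raw Load_Cycle_Count value from `smartctl -A` output read on stdin.
# In the SMART attribute table the attribute name is the 2nd column and the
# raw value is the 10th.
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Usage (needs root and smartmontools; /dev/sda is the new drive here):
#   smartctl -A /dev/sda | load_cycle_count
```

Running it a few hours apart gives a rough idea of how quickly the count is climbing.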

Partition the new drive
Once Linux had booted, the new drive was still uninitialised, so I used parted (available from the EPEL repo) to write a GPT label to the drive, create a single partition spanning all of the space, and mark it as a RAID partition:

# parted -a optimal /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart
Partition type?  primary/extended? primary
File system type?  [ext2]?
Start? 2048s
End? -1 
(parted) set 1 raid on

Run an alignment check to make sure the partition starts on a physical sector boundary:

(parted) align-check
alignment type(min/opt)  optimal/[minimal]? optimal
Partition number? 1
(parted) 

It seems a bit strange that the align-check command doesn't return anything if the alignment check succeeded, but it looks like that's intended behaviour - I tried creating a partition on a non-aligned sector boundary later on and it returned a warning message.
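The physical sector boundary part can also be checked by hand: a start of 2048 logical sectors of 512 bytes is 1048576 bytes, which divides evenly by the 4096 byte physical sector size. A small sketch of that arithmetic, using a hypothetical helper called is_aligned (it only checks the sector boundary, not whatever else parted considers "optimal"):

```shell
# Succeed if a partition starting at the given logical sector begins on a
# physical sector boundary.
#   $1 = start sector, $2 = logical sector size, $3 = physical sector size
is_aligned() {
    start_sector="$1"
    logical="$2"
    physical="$3"
    [ $(( (start_sector * logical) % physical )) -eq 0 ]
}

# Usage:
#   is_aligned 2048 512 4096 && echo "aligned"
#   is_aligned 63 512 4096 || echo "not aligned"
```

Sector 63, the old MS-DOS default start, fails the check because 63 * 512 = 32256 is not a multiple of 4096.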

Now check that everything looks okay:

(parted) print
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sda: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3001GB  3001GB                     raid

(parted) quit
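The same partitioning can be done without the interactive prompts. This is a minimal sketch rather than what I ran: partition_for_raid is a hypothetical helper, and it uses 100% as the end position instead of -1, since a leading dash on the command line could be mistaken for an option. It only prints the command unless RUN=1 is set, because mklabel destroys the existing partition table:

```shell
# Print (or, with RUN=1, execute) a scripted parted call that writes a GPT
# label, creates one partition from sector 2048 to the end of the disk and
# sets the raid flag on it.
#   $1 = target device, e.g. /dev/sda
partition_for_raid() {
    dev="$1"
    cmd="parted -s -a optimal $dev mklabel gpt mkpart primary 2048s 100% set 1 raid on"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

# Usage:
#   partition_for_raid /dev/sda          # dry run, prints the command
#   RUN=1 partition_for_raid /dev/sda    # really repartitions the drive
```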

Add the new drive to the RAID set
My raid set is /dev/md0 so I ran:

mdadm --add /dev/md0 /dev/sda1

mdadm adds the drive as a hot spare, so the next step is to tell it to use the new drive as an active device:

mdadm --grow --raid-devices=3 /dev/md0

Resync to the new drive
This took approximately 5 hours. You can do a cat /proc/mdstat to see progress (the text below is just an example):

[root@nyamo crypt]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[3]
      2930133512 blocks super 1.2 [2/2] [UU]
      [================>....]  resync = 84.7% (2482257528/2930133512) finish=138.7min speed=53785K/sec

unused devices: <none>
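If you just want the percentage, for example to log it from a script, it can be pulled out of that output. A minimal sketch with a hypothetical helper name, mdstat_progress, assuming the progress line looks like the one in the example above:

```shell
# Print the resync/recovery percentage found in /proc/mdstat style text on
# stdin. Prints nothing if no resync or recovery is in progress.
mdstat_progress() {
    sed -nE 's/.*(resync|recovery) *= *([0-9.]+)%.*/\2/p'
}

# Usage:
#   mdstat_progress < /proc/mdstat
```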

Buy a second new 3TB WD Red drive
A week later I went to the local computer store and bought a second drive - the idea is that you get the same model of drive, but from a different batch. This decreases the likelihood of both drives simultaneously failing since the drives will be sufficiently different.

Remove both old drives from the RAID set
I reduced the number of RAID devices back to 2 and removed the old drive, which was no longer an active device, from the RAID set:

# mdadm --grow /dev/md0 --raid-devices=2
# mdadm --remove /dev/md0 /dev/sdb1
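mdadm only removes members that are failed or spare, so an old drive that was still active would need to be marked as failed first. A minimal sketch of that variant, which is an assumption about a slightly different situation rather than what I ran: retire_member is a hypothetical helper, /dev/sdX1 is a placeholder, and the commands are only printed unless RUN=1 is set:

```shell
# Print (or, with RUN=1, execute) the commands to fail and remove one member
# and then shrink the array to the remaining number of devices.
#   $1 = array, $2 = member partition to retire, $3 = remaining device count
retire_member() {
    array="$1"
    member="$2"
    remaining="$3"
    for cmd in \
        "mdadm $array --fail $member --remove $member" \
        "mdadm --grow $array --raid-devices=$remaining"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}

# Usage (dry run):
#   retire_member /dev/md0 /dev/sdX1 2
```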

Plug the second new drive into the home server, remove the two old drives
I powered off my home server, removed the two old hard drives and plugged the second new WD Red into another 6G SATA port.

Apply high LCC fix again
After booting up the home server again, I repeated the same steps as outlined previously, and the tool again reported that no firmware upgrade was necessary.

Partition the second new drive
Same as before, but on a different drive:

# parted -a optimal /dev/sdb
GNU Parted 2.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart
Partition type?  primary/extended? primary
File system type?  [ext2]?
Start? 2048s
End? -1 
(parted) set 1 raid on
(parted) align-check
alignment type(min/opt)  optimal/[minimal]? optimal
Partition number? 1
(parted) print
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3001GB  3001GB                     raid
(parted) quit

Add the second new drive to the RAID set
The RAID set has started up and reports that there's one active device and the RAID is degraded. Adding the new drive to the RAID set triggers a resync:

# mdadm --add /dev/md0 /dev/sdb1

The resync then takes hours to complete. Doing a cat /proc/mdstat shows progress:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[3]
      2930133512 blocks super 1.2 [2/2] [UU]
      [=================>...]  resync = 85.4% (2502889848/2930133512) finish=148.3min speed=47998K/sec

unused devices: <none>
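Before moving on it's worth confirming that the array is no longer degraded. In /proc/mdstat a healthy two-disk mirror shows [UU], while a missing member shows up as an underscore, e.g. [U_]. A minimal sketch with a hypothetical helper name, md_degraded:

```shell
# Succeed if the /proc/mdstat style text on stdin contains a member status
# block with at least one missing slot, such as [U_] or [_U].
md_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]'
}

# Usage:
#   if md_degraded < /proc/mdstat; then echo "array is degraded"; fi
```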

Grow the RAID set array size to the full size of the new disks
My old drives were 2TB drives and the new ones are 3TB, so to use all the available space the RAID set needs to be grown (I ran this using screen and detached as it takes some time):

# mdadm --grow /dev/md0 --size=max

Grow the filesystem to the full size of the new RAID set
Now the file system needs to be grown to use all of the available space in the RAID set (I ran this in screen as well, as it takes longer than the previous command):

# resize2fs /dev/md0

Wipe data on both old drives
This is on my todo list :)

Thursday, 1 May 2014

Enabling the watchdog on a Raspberry Pi (Raspbian)

I have a Raspberry Pi that's connected to a PICAXE which listens for sensor data that's being broadcast from a couple of nodes in my house (one measures mains power consumption, two others measure ambient temperature). In summer the Raspberry Pi gets a bit warm and randomly locks up, so I've enabled the hardware watchdog to automatically reboot it if the system locks up.

A hardware watchdog is usually a watchdog timer: a register hooked up to a clock, which triggers a hardware reset when it either overflows or hits zero. Your system writes to the register periodically to stop it from triggering the reset. If the writes stop for some reason, something has gone wrong and the system resets.
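To make the idea concrete, here is a toy simulation of a countdown watchdog. It's only an illustration of the behaviour described above, not how the Raspberry Pi hardware is programmed. The helper name watchdog_sim and the tick string format are made up for this example:

```shell
# Simulate a countdown watchdog.
#   $1 = timeout in ticks
#   $2 = one character per clock tick: "k" means the system wrote to the
#        watchdog on that tick, anything else means it did not
# Prints "ok" if the counter never reaches zero, otherwise the tick at which
# the reset would have fired.
watchdog_sim() {
    timeout="$1"
    ticks="$2"
    counter="$timeout"
    n=0
    while [ -n "$ticks" ]; do
        n=$((n + 1))
        rest="${ticks#?}"          # everything after the first character
        tick="${ticks%"$rest"}"    # the first character
        ticks="$rest"
        if [ "$tick" = "k" ]; then
            counter="$timeout"     # a write reloads the counter
        else
            counter=$((counter - 1))
        fi
        if [ "$counter" -le 0 ]; then
            echo "reset at tick $n"
            return 1
        fi
    done
    echo "ok"
}

# Usage:
#   watchdog_sim 3 "k..k..k"    # writes arrive in time
#   watchdog_sim 3 "k...k"      # the system hangs after the first write
```

In the second call the counter runs out three ticks after the last write, so the late write at the end never gets a chance to happen.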

Here are some short notes on how to enable the onboard watchdog on the Raspberry Pi:

Run the following command to attempt loading the watchdog kernel module:
sudo modprobe bcm2708_wdog

Now run "lsmod" and look for the bcm2708_wdog line:

Module Size Used by
bcm2708_wdog 3537 0


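This verifies that the watchdog module was loaded successfully. Now add bcm2708_wdog to /etc/modules so that the module loads on boot. The append has to go through sudo tee, because with a plain >> redirect the file is opened by your unprivileged shell rather than by sudo:
echo bcm2708_wdog | sudo tee -a /etc/modules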
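Running the append more than once leaves duplicate lines in /etc/modules. A small guard avoids that. This is a minimal sketch using a hypothetical helper, append_once, and it assumes it's run from a root shell (for example after sudo -s) so that the file is writable:

```shell
# Append a line to a file only if the file doesn't already contain that
# exact line.
#   $1 = line to add, $2 = file
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once bcm2708_wdog /etc/modules
```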

Install the watchdog package:
sudo apt-get install watchdog

Configure starting the watchdog daemon on boot:
sudo update-rc.d watchdog defaults
sudo chkconfig watchdog on

Edit the watchdog daemon's configuration file (/etc/watchdog.conf) and tell it to use the watchdog device. Change the line:
#watchdog-device = /dev/watchdog
to
watchdog-device = /dev/watchdog
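The same change can be scripted instead of made in an editor. A minimal sketch, assuming the commented-out line starts at the beginning of the line exactly as shown above; enable_watchdog_device is a hypothetical helper name:

```shell
# Uncomment the watchdog-device line in the given configuration file.
#   $1 = path to the watchdog configuration file
enable_watchdog_device() {
    conf="$1"
    # Write to a temporary file first, then replace the original.
    sed 's|^#watchdog-device|watchdog-device|' "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
}

# Usage (as root):
#   enable_watchdog_device /etc/watchdog.conf
```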

Start the watchdog:
sudo /etc/init.d/watchdog start


Logging isn't always free

No matter how fancy the interface / facade on top of it looks, a remote procedure call is still a remote procedure call.

I'm currently working on a performance and maintainability improvement project at work: we have a system that has substantial technical debt and it's going into production in a few months. It has been a hard slog but things are starting to look better!

Earlier in the week we ran into an interesting problem: one of our leads had done a great job of improving performance by caching some entities which were constantly being pulled out of the database (sometimes a single page load was pulling the same row out of the database tens to hundreds of times). We could see that pages were loading faster when running on our local development machines.

I deployed the web app to a standalone webserver and loaded up the app... and it started chugging along.. really slowly...

It turned out that the extra caching was doing a lot of extra logging. Our logger sends data out to a few sinks and one of them was a remote procedure call based system. This RPC was taking about 100 ms to complete, and it was running synchronously! Filtering down the amount of logging sent to that sink resulted in the app picking up its pace again.
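The arithmetic shows why this hurts so much. Because the calls are synchronous, the delays simply add up. A back-of-the-envelope sketch, where the figure of 50 log calls per page is a made-up number for illustration (only the roughly 100 ms per call comes from what we measured) and logging_overhead_ms is a hypothetical helper:

```shell
# Total time in milliseconds a page spends waiting on a synchronous log sink.
#   $1 = number of log calls during the page load
#   $2 = latency of one call in milliseconds
logging_overhead_ms() {
    echo $(( $1 * $2 ))
}

# Usage:
#   logging_overhead_ms 50 100    # 50 calls at 100 ms each
```

With those numbers the page spends 5000 ms, i.e. 5 seconds, doing nothing but waiting for the logger.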

Fallacies of Distributed Computing - why do you haunt me so often!