Last week, I started getting emails from my server: SMART, the disk hardware's early failure detection system, was spotting errors on a disk in the server that powers this domain (and a few others). I sort of plan for disk failures, but I hate … hate … hate them. It's not just the obvious time it takes to replace the drive. Even with backups, I have NEVER in 24 years of using Linux succeeded in completely cloning an old drive, copying it, and spinning the copy up on new hardware.
This time I did it. But it was NOT without drama. Here is a short how-to covering:
- Cloning a physical disk to a disk image file;
- Writing the disk image to new hardware and dealing with the consequences;
- Making the drive bootable again (my failed disk was the server's boot disk);
- Swapping the old and new drives.
Selecting the Correct Disk Hardware
The failing disk was a 1TB drive (Western Digital Black series). The first step in cloning a disk is to make sure the replacement hardware has AT LEAST AS MANY BLOCKS as the old drive. Two "1TB" drives can still vary slightly in the total number of blocks that can actually be written to them, and blocks are the currency of storage. For instance, imagine two 1 XB disks, where XB is a make-believe unit of disk size. One may have 10000001 blocks and the other 10000000 blocks. If you try to clone the 10000001-block disk onto the 10000000-block disk, it won't fit. There are ways to work around this, but it's something to be aware of.
How do you find out how many blocks your drives and their partitions have in Linux? One quick way is:
8 16 976762584 sdb
8 0 976762584 sda
Here, “sda” was my failing disk and “sdb” was my replacement. Same number of blocks: good to go. In fact, the partition I needed to save on “sda” was smaller than the whole drive, so everything looked good.
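The two output lines above have the same layout as /proc/partitions: major number, minor number, size in 1 KiB blocks, and device name. As a minimal sketch of the "at least as many blocks" check, the helpers below read that file; `blocks_of` and `check_fits` are hypothetical names, and the `PARTITIONS_FILE` override exists only so the logic can be tried against sample text.

```shell
#!/usr/bin/env bash
# Sketch: compare device sizes using the /proc/partitions layout
# (major, minor, #blocks in 1 KiB units, name).
# blocks_of and check_fits are hypothetical helper names.

PARTITIONS_FILE="${PARTITIONS_FILE:-/proc/partitions}"

# Print the 1 KiB block count for a device name such as "sda" or "sdb1".
# Exits non-zero if the name is not listed.
blocks_of() {
    awk -v dev="$1" '$4 == dev { print $3; found = 1 } END { exit !found }' "$PARTITIONS_FILE"
}

# Succeed only if DEST has at least as many blocks as SRC.
check_fits() {
    local src="${1:-}" dest="${2:-}" src_blocks dest_blocks
    src_blocks=$(blocks_of "$src") || { echo "unknown device: $src" >&2; return 2; }
    dest_blocks=$(blocks_of "$dest") || { echo "unknown device: $dest" >&2; return 2; }
    if (( dest_blocks >= src_blocks )); then
        echo "OK: $dest ($dest_blocks) >= $src ($src_blocks)"
    else
        echo "TOO SMALL: $dest ($dest_blocks) < $src ($src_blocks)"
        return 1
    fi
}
```

For example, `check_fits sda1 sdb` checks that the partition being rescued will fit on the replacement disk. The sizes are in 1 KiB units; `blockdev --getsize64` reports the same information in bytes if you prefer that.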
Cloning the Disk
This is where things got very stressful for me. My first instinct was to shut down the server and remove the failing drive. I put it into an external disk enclosure that plugs into any other computer over USB. I had only one machine in the house with enough disk space to hold the 1TB partition clone. That enclosure is about 10 years old and turned out to be the villain of my story, but let me explain the cloning process first so you have the recipe. Then I'll tell you about my woes.
I used ddrescue to do the cloning. The ddrescue manual has excellent examples of how to use it, so I'll just quote from those. I wanted to clone only /dev/sda1 from the failing drive, so once the drive was attached to the computer I ran:
ddrescue -f -n /dev/sda1 /path/to/output/sda1_clone.img /path/to/output/sda_clone1.log
ddrescue -f -r3 /dev/sda1 /path/to/output/sda1_clone.img /path/to/output/sda_clone1.log
The first command runs with the “no scrape” option (“-n”), which skips the slow scraping phase and is gentler on the disk. This matters if the hardware is dangerously close to total failure. The second command reads the log, skips everything already rescued, and goes back only to the problem areas, scraping them and making up to three retry passes over the bad sectors (“-r3”). The log file (called a “mapfile” in newer versions of ddrescue) is ESSENTIAL to this process: it records the progress of each step, so if you have to restart the command, ddrescue knows to pick up where it left off (or to copy only the problematic sections of the disk).
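Because both passes must use the same image and log file, it can help to wrap them in one function. This is a sketch assuming GNU ddrescue; `rescue_partition` is a hypothetical name, and the `DDRESCUE` variable exists only so a stand-in command can be substituted when trying the script out.

```shell
#!/usr/bin/env bash
# Sketch: two-pass GNU ddrescue rescue, mirroring the two commands above.
# rescue_partition is a hypothetical helper; DDRESCUE can name a stand-in
# command (for example a function that just prints its arguments).

rescue_partition() {
    local src="${1:-}" image="${2:-}" mapfile="${3:-}"
    local ddrescue="${DDRESCUE:-ddrescue}"

    if [[ -z "$src" || -z "$image" || -z "$mapfile" ]]; then
        echo "usage: rescue_partition SOURCE IMAGE MAPFILE" >&2
        return 2
    fi

    # Pass 1: copy everything that reads cleanly; -n skips scraping so the
    # failing drive is stressed as little as possible.
    "$ddrescue" -f -n "$src" "$image" "$mapfile"

    # Pass 2 runs regardless of pass 1's exit status: it reads the mapfile,
    # skips what is already rescued, and retries the bad areas (3 passes).
    "$ddrescue" -f -r3 "$src" "$image" "$mapfile"
}
```

Usage would look like `rescue_partition /dev/sda1 /path/to/output/sda1_clone.img /path/to/output/sda_clone1.log`. If the run is interrupted, calling it again with the same three arguments resumes from the mapfile.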
You now have a copy of the partition saved to disk. It's important to touch the failing hardware as little as possible. In fact, to be truly safe (if you have enough disk space), you should COPY the COPY. That way, if you have to repair the disk image and the repairs go badly, you have an archive copy of the image to restore and try again from, without copying the disk hardware again. (For example, if the disk is so damaged that parts of it cannot be rescued, you might run a file recovery program on the disk image to try to recover damaged files; this might do more harm than good, so you would want to start again from the unrepaired image and try a different program.)
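A copy is only useful as an archive if it really matches the original. A minimal sketch of "copy the copy" with a checksum comparison, assuming GNU coreutils; `archive_copy` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: make an archive copy of a rescued image and verify it.
# archive_copy is a hypothetical helper name; assumes GNU coreutils.

archive_copy() {
    local image="${1:-}" copy="${2:-}"
    [[ -f "$image" && -n "$copy" ]] || { echo "usage: archive_copy IMAGE COPY" >&2; return 2; }

    # --sparse=always stores runs of zero bytes as holes, which can save
    # space when much of the partition was empty.
    cp --sparse=always -- "$image" "$copy" || return 1

    # Compare SHA-256 checksums of the original and the copy.
    local a b
    a=$(sha256sum < "$image" | cut -d' ' -f1)
    b=$(sha256sum < "$copy" | cut -d' ' -f1)
    if [[ "$a" == "$b" ]]; then
        echo "verified: $a"
    else
        echo "checksum mismatch" >&2
        return 1
    fi
}
```

Keeping the printed checksum alongside the archive lets you confirm later that the archive itself has not changed.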
Remember how I said earlier that you have to be careful when the new hardware may not have enough space for the disk image? What if you have to save a 1TB drive, but only have a 0.5TB drive lying around to replace it? In my case, the disk holds 1TB, but I am only using about 25% of it. Tools like ddrescue copy every block in the partition, even empty ones! One way around this is to shrink the partition first, then copy it. On Linux, you can do this with the GParted graphical disk management tool.
In principle, you can also do this on the disk image itself: associate the image with a loopback device, then run GParted on the loopback device. For example:
losetup -f /path/to/output/sda1_clone_copy1.img
losetup -l # prints all loopback devices - find your image in the list!
You can then shrink the partition(s) in the image and apply the changes. Note that shrinking the filesystem does not by itself make the image file smaller: the file still has to be truncated to the new size before it will fit on a smaller disk. I didn't do this, but I see no reason it shouldn't work.
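As a command-line alternative to the GParted route (a swapped-in technique, not what I did), e2fsprogs can shrink an ext2/3/4 filesystem image directly, after which the file is cut down to match. This sketch assumes the image holds one bare ext filesystem (a partition image like sda1_clone_copy1.img, not a whole disk with a partition table); `shrink_ext_image` is a hypothetical name. Run it on the archive COPY, never on the only copy.

```shell
#!/usr/bin/env bash
# Sketch: shrink an ext2/3/4 filesystem image and truncate the file to match.
# Assumes a bare filesystem image (no partition table).
# shrink_ext_image is a hypothetical helper name.

shrink_ext_image() {
    local image="${1:-}"
    [[ -f "$image" ]] || { echo "usage: shrink_ext_image IMAGE" >&2; return 2; }

    # resize2fs wants a freshly checked filesystem. e2fsck exit codes 0-3
    # mean clean or fixed; 4 and above mean problems remain.
    local rc=0
    e2fsck -f -y "$image" || rc=$?
    (( rc < 4 )) || { echo "e2fsck left errors (status $rc)" >&2; return 1; }

    # -M shrinks the filesystem to its minimum size.
    resize2fs -M "$image" || return 1

    # The file is still full length: read the new size and truncate to it.
    local header block_count block_size
    header=$(dumpe2fs -h "$image" 2>/dev/null) || return 1
    block_count=$(awk -F: '/^Block count:/ { gsub(/[[:space:]]/, "", $2); print $2 }' <<< "$header")
    block_size=$(awk -F: '/^Block size:/ { gsub(/[[:space:]]/, "", $2); print $2 }' <<< "$header")
    [[ -n "$block_count" && -n "$block_size" ]] || return 1
    truncate -s "$(( block_count * block_size ))" "$image" || return 1
    echo "$image is now $(( block_count * block_size )) bytes"
}
```

Once written to the new disk, the filesystem can be grown back to fill the partition, as in the GParted steps later in this post.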
Side Story: The Meddlesome USB Disk Enclosure
The main villain in the story turned out to be a piece of hardware that WASN'T the failing drive, but rather an old USB external disk enclosure. A SATA drive plugs into the enclosure with a pair of cables (one for power, one for data), and the enclosure connects to a computer over USB. The first time I tried to clone the drive, I pulled it from the server and plugged it into this enclosure. I then did the rescue work on a different machine. The cloning of the partition appeared to go fine, but when I tried to inspect or mount the partition image, I repeatedly got errors about “bad magic numbers” and “bad superblocks” in the partition.
At first, I freaked out. I assumed the disk had become completely corrupted. But then I put it back into the server and booted the machine … and the disk came up fine! Since I had access to the disk at that moment, I ran ddrescue on the drive while the system was running … which is risky, but I was desperate. Using SSHFS, I mounted the storage area on the remote computer where I was keeping the disk images, and ddrescue’d the boot drive straight across the network onto it. This took about 6 hours, but it was done by the time I woke up the next morning.
I then used the same enclosure to attach the new drive to the machine holding the disk image. I waited 7 hours for the image to be written to the drive, then moved the drive back into the server. The server again complained that the disk partitions were corrupted or damaged. It was at this point that I took stock of the common denominator in every failure: the enclosure. I set it aside and borrowed a newer external disk enclosure from my desktop PC, where it holds scratch storage for projects. I used THAT enclosure to hold the new disk, wrote the disk image again (7 hours), and then moved the disk back to the server. This time, the server had NO trouble reading the partition information. All was well.
The lesson: mind your hardware – it, too, can be a weak link in the rescue chain.
Writing the Image to New Hardware
It’s time to set up the new disk so it’s ready to receive the disk image. I began by running GParted and partitioning the new disk with one large 1TB primary partition, formatted as EXT4. The formatting step is probably unnecessary, since writing the image overwrites it, but I did it just to be thorough.
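If you prefer the command line to GParted, GNU parted can do the same partitioning. This is a sketch, not what I ran: `prepare_disk` is a hypothetical name, it DESTROYS everything on the target disk, and it assumes an MBR ("msdos") partition table like the one on my boot disk (disks over 2TB would need GPT instead).

```shell
#!/usr/bin/env bash
# Sketch: command-line equivalent of the GParted partitioning step.
# WARNING: wipes the partition table of DISK. prepare_disk is hypothetical.

prepare_disk() {
    local disk="${1:-}" part="${2:-}"
    [[ -n "$disk" ]] || { echo "usage: prepare_disk DISK [PARTITION]" >&2; return 2; }

    # New MBR label and one primary partition from 1 MiB (for alignment)
    # to the end of the disk.
    parted -s "$disk" mklabel msdos mkpart primary ext4 1MiB 100% || return 1

    # Optional: format the partition as EXT4, as in the text. ddrescue will
    # overwrite it anyway, so this can be skipped.
    if [[ -n "$part" ]]; then
        # Wait for udev to create the partition's device node, if udev exists.
        command -v udevadm >/dev/null && udevadm settle
        mkfs.ext4 -F "$part" || return 1
    fi

    # Show the resulting layout.
    parted -s "$disk" unit MiB print
}
```

With the placeholder names used below, that would be `prepare_disk /dev/sde /dev/sde1`. Double-check the device name first; parted will not ask.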
Now it was time to write the disk image. Let’s say the new disk is /dev/sde and its blank partition is /dev/sde1. To write the image:
ddrescue -f /path/to/output/sda1_clone_copy.img /dev/sde1 /path/to/output/sda1_write.log
Once it’s written, you should check the file system (there are always issues at this stage that are easily fixed). This is again straight from the example in the ddrescue manual:
e2fsck -f -v /dev/sde1
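e2fsck reports what it did through its exit status, which is a bit mask documented in its manual page. As a small sketch, the hypothetical helper `explain_fsck_status` turns that number into words.

```shell
#!/usr/bin/env bash
# Sketch: decode the e2fsck exit status, a bit mask documented in the
# e2fsck manual page. explain_fsck_status is a hypothetical helper name.

explain_fsck_status() {
    local status="${1:-0}"
    if (( status == 0 )); then
        echo "no errors"
        return 0
    fi
    (( status & 1 ))   && echo "errors corrected"
    (( status & 2 ))   && echo "errors corrected, system should be rebooted"
    (( status & 4 ))   && echo "errors left uncorrected"
    (( status & 8 ))   && echo "operational error"
    (( status & 16 ))  && echo "usage or syntax error"
    (( status & 32 ))  && echo "cancelled by user request"
    (( status & 128 )) && echo "shared library error"
    # Succeed only when nothing is left to fix (status below 4).
    (( status < 4 ))
}
```

For example, run `e2fsck -f -v /dev/sde1; explain_fsck_status $?`. A status of 1 is the normal outcome when fixes were applied; 4 or more means it is worth running e2fsck again before going further.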
Do all the fixes it suggests. At this point, you should reopen the disk in GParted and do the following things:
- Make sure the data fills the disk. I found that even though the disk image was 1TB, after the write the partition THINKS it’s only as big as the amount of data stored in the image. I expanded the partition to fill the available space on the new disk.
- Make sure the boot flag is set for this partition, if you intend it to be bootable. This was my boot disk, so this was essential.
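The same two fixes can be done from the command line. This sketch assumes that, as in the partitioning step earlier, the partition already spans the whole disk, so it is the ext filesystem inside it that needs to grow. If your partition does not reach the end of the disk, you would enlarge it first (for example with parted's `resizepart`). `grow_and_flag` is a hypothetical name, and `RUN=echo` previews the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: grow the filesystem to fill its partition and set the boot flag.
# Assumes an ext2/3/4 filesystem and a partition that already spans the disk.
# grow_and_flag is a hypothetical helper; RUN=echo prints instead of running.

grow_and_flag() {
    local disk="${1:-}" partnum="${2:-}" part="${3:-}"
    local run="${RUN:-}"
    [[ -n "$disk" && -n "$partnum" && -n "$part" ]] ||
        { echo "usage: grow_and_flag DISK PARTNUM PARTITION" >&2; return 2; }

    # With no size argument, resize2fs grows the filesystem to fill the
    # partition. It expects the e2fsck -f run from the previous step.
    $run resize2fs "$part" || return 1

    # Set the MBR "boot" (active) flag on the partition.
    $run parted -s "$disk" set "$partnum" boot on
}
```

With the names used in this post: `grow_and_flag /dev/sde 1 /dev/sde1`.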
Now it’s time to give this new disk partition a unique UUID. Mine had the EXACT SAME disk UUID as the original disk partition from which it was cloned. Since I needed these to be truly distinct disks that could both coexist on the same server for a period of time (while I made sure all data was truly copied over and intact), my new disk needed a new UUID:
tune2fs -U random /dev/sde1
Take note of the UUID that is created and stored on the partition. If it doesn’t print out, you can look it up using the blkid command, e.g. blkid /dev/sde1.
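To avoid misreading the new UUID, it can be set and read back in one go. A sketch for an unmounted ext2/3/4 filesystem; `fs_uuid` and `new_fs_uuid` are hypothetical names. The UUID is read from `tune2fs -l` here, and `blkid` would show the same value. tune2fs may refuse if the filesystem has not been freshly checked, so run e2fsck -f first (as above).

```shell
#!/usr/bin/env bash
# Sketch: give an unmounted ext2/3/4 filesystem a new random UUID and
# print the old and new values. fs_uuid and new_fs_uuid are hypothetical.

# Print the filesystem UUID as reported by tune2fs -l.
fs_uuid() {
    tune2fs -l "$1" | awk -F': *' '/^Filesystem UUID/ { print $2 }'
}

new_fs_uuid() {
    local dev="${1:-}" old new
    [[ -n "$dev" ]] || { echo "usage: new_fs_uuid DEVICE" >&2; return 2; }
    old=$(fs_uuid "$dev")
    tune2fs -U random "$dev" >/dev/null || return 1
    new=$(fs_uuid "$dev")
    [[ -n "$new" && "$new" != "$old" ]] || { echo "UUID did not change" >&2; return 1; }
    echo "old: $old"
    echo "new: $new"
}
```

Running `new_fs_uuid /dev/sde1` prints both values, which are exactly what the /etc/fstab and grub.cfg edits need.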
Now it’s time to update some of the data on the new disk. In particular, I needed to edit the /etc/fstab file on the new disk so that the root partition (the one mounted at “/”) uses the UUID I just created. Otherwise, the system would try to mount the old, broken disk, whose UUID was still listed in /etc/fstab.
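The fstab edit (and the grub.cfg search-and-replace that comes later) is a plain text substitution, so it can be scripted. A sketch with a hypothetical `swap_uuid` helper, assuming the new disk is mounted somewhere (for example at /mnt) so its files are reachable; it keeps a `.bak` backup of the file it edits.

```shell
#!/usr/bin/env bash
# Sketch: replace one UUID with another in a config file such as the new
# disk's /etc/fstab, keeping FILE.bak. swap_uuid is a hypothetical helper.

swap_uuid() {
    local old="${1:-}" new="${2:-}" file="${3:-}"
    # UUIDs are hex digits and dashes; reject anything else so no sed
    # metacharacters can sneak into the substitution.
    local re='^[0-9A-Fa-f-]+$'
    [[ "$old" =~ $re && "$new" =~ $re ]] || { echo "not a UUID" >&2; return 2; }
    grep -q -- "$old" "$file" || { echo "$old not found in $file" >&2; return 1; }

    # Edit in place; the original is kept as FILE.bak.
    sed -i.bak "s/$old/$new/g" "$file" || return 1

    # Report how many lines now use the new UUID.
    grep -c -- "$new" "$file"
}
```

For example, `swap_uuid OLD-UUID NEW-UUID /mnt/etc/fstab`, where the two UUIDs are the old and new values from tune2fs or blkid.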
Finally, it’s time to make this disk actually bootable again. This involves reinstalling GRUB into the master boot record, so the drive knows what to execute when offered the chance to boot the computer. For this step, you need to run the GRUB reinstall from INSIDE the new disk partition.
Begin by mounting the new disk partition on your machine:
mount /dev/sde1 /mnt
Next, temporarily bind-mount this machine’s /proc, /sys, and /dev onto the corresponding directories under /mnt, so that programs run inside the new disk’s file system can see the hardware:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
Now use “chroot” to change the root directory to be the file system mounted at /mnt:
chroot /mnt /bin/bash
Before reinstalling GRUB, fix the GRUB configuration so it boots the correct disk. The problem is that the old /boot/grub/grub.cfg file still records the original disk’s UUID; edit that file and search-and-replace the old UUID with the new one.
Once you have done that, it’s time to reinstall GRUB:
grub-install --recheck /dev/sde
Exit the chroot (type “exit” and hit Enter), then unmount /mnt/proc, /mnt/sys, and /mnt/dev (we don’t want to mess up this computer by leaving its /proc, /sys, and /dev bound into the new partition!). Then unmount the partition from /mnt. The disk is ready to go back into the server.
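The mount, bind, chroot, grub-install, and unmount sequence can be collected into one function that always cleans up in reverse order, even if grub-install fails. This is a sketch assuming a BIOS/MBR system, as in the text, and assuming the grub.cfg edit described above has already been made; `reinstall_grub` is a hypothetical name, and `RUN=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: mount the new root, bind /dev, /proc and /sys, run grub-install
# inside a chroot, then undo everything in reverse order.
# reinstall_grub is a hypothetical helper; RUN=echo prints the commands.

reinstall_grub() {
    local part="${1:-}" disk="${2:-}" root="${3:-/mnt}"
    local run="${RUN:-}" rc=0 d
    [[ -n "$part" && -n "$disk" ]] ||
        { echo "usage: reinstall_grub PARTITION DISK [MOUNTPOINT]" >&2; return 2; }

    $run mount "$part" "$root" || return 1
    for d in dev proc sys; do
        $run mount --bind "/$d" "$root/$d" || { rc=1; break; }
    done

    # Only run grub-install if every bind mount succeeded.
    if (( rc == 0 )); then
        $run chroot "$root" grub-install --recheck "$disk" || rc=$?
    fi

    # Undo the bind mounts in reverse order, then unmount the partition.
    for d in sys proc dev; do
        $run umount "$root/$d" 2>/dev/null
    done
    $run umount "$root"
    return "$rc"
}
```

With the names from this post: `reinstall_grub /dev/sde1 /dev/sde /mnt`. Running it once with `RUN=echo` first is a cheap way to check the device names.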
Restarting the Original Server
I moved the new hard drive into the SATA bay once occupied by the old, failing drive, and moved the failing drive into an external USB disk enclosure (a good one!). I turned on the server and it booted! After that, I checked that any files updated while I was busy writing the disk image to the new hardware were properly copied over from the old disk to the new one (and if anything goes wrong, you always have the disk image to start again from!). So far, so good. The new disk is now the main boot disk, and the old disk is parked in a USB enclosure in case something needs to be pulled from it.
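One way to find the files that changed after the clone is to compare modification times against a reference file. A sketch assuming GNU find and the old disk mounted at some directory; `changed_since` is a hypothetical name. The image file's own timestamp records when ddrescue last wrote to it, so files changed during a copy that took hours could be missed. A reference file touched just before the copy started is safer.

```shell
#!/usr/bin/env bash
# Sketch: list regular files on the old disk modified after a reference
# file. changed_since is a hypothetical helper; assumes GNU find.

changed_since() {
    local old_root="${1:-}" ref="${2:-}"
    [[ -d "$old_root" && -e "$ref" ]] ||
        { echo "usage: changed_since OLD_ROOT REFERENCE_FILE" >&2; return 2; }
    # -xdev stays on the old disk's filesystem; -newer compares against the
    # reference file's modification time.
    find "$old_root" -xdev -type f -newer "$ref" -print
}
```

This only finds changed or new files, not deletions. `rsync -a --dry-run --itemize-changes` between the two mounted disks is another way to see the differences.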
This was deeply satisfying.