10 July 2014

Oh no not again

So, I was doing a routine upgrade of my (very) old laptop the other day. It no longer has a working battery and doesn't quite have the power I want for modern day-to-day stuff, but it's served me very well as an SSH gateway, Subversion server and a place to keep my IRC session idling. Old laptops can make really nice servers since they're typically quiet, draw little power, and come with their own keyboard and monitor.

But I digress.

I was upgrading the packages, and one of those was a kernel update, so I rebooted for the first time in months and...

Remember how I had hard drive problems recently? Yeah.

The CF card I've been using had IO errors. This caused filesystem corruption. This, in turn, meant GRUB couldn't locate the kernel to boot from. I had to open it up and pull the flash card out to see what was going on. And it wasn't pretty.

I can't really expand on the advice I posted in Rescue Me! about backups, because I've been procrastinating on fixing the other terrible hard drive fails I had in April and haven't got that machine restored yet; never mind finally finishing my Ultimate Backup Final Solution. Instead, let's just briefly follow the immediate steps I'm taking to rescue what I can, and hope that nothing else breaks while I'm typing this paragraph. Seriously, what is with hardware I own?! Is my house experiencing unusually large neutrino flux? Was it built on an ancient UNIX burial ground?

Making an image

At first, the filesystem mounted directly from the card without complaint, and I ran a btrfs scrub. In hindsight, the very first step should have been to make an image, because things only failed further later on.

I install ddrescue (via the gddrescue package on Ubuntu). Be aware that its command-line arguments aren't identical to the venerable old dd; it takes an input file, output file, and optional 'log' file in that order.
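To make the difference in argument order concrete, here is a small sketch that only prints the ddrescue command it would run. The function name rescue_cmd and the /dev/sdX placeholder are made up purely for illustration; check the printed line before running anything for real.

```shell
# dd names its operands, so their order does not matter:
#   dd if=/dev/sdX of=disk.img
# ddrescue is positional: input, output, then the optional log file.

# rescue_cmd is a hypothetical helper: it prints the command, it does not run it.
rescue_cmd() {
    device=$1
    image=$2
    printf 'ddrescue %s %s %s.ddrescuelog\n' "$device" "$image" "$image"
}

rescue_cmd /dev/sdX ./disk.img
```

Keeping the log file next to the image is worth the extra argument: if the copy is interrupted, rerunning ddrescue with the same three arguments picks up where it left off instead of starting again.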

james@yang(): /mnt/touro1/@image
$ sudo ddrescue /dev/sdd ./sumomo-cf32-20140710.img ./sumomo-cf32-20140710.img.ddrescuelog

GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued:    17241 MB,  errsize:  16781 kB,  current rate:        0 B/s
   ipos:    17258 MB,   errors:       1,    average rate:    5538 kB/s
   opos:    17258 MB,     time since last successful read:    13.5 m
Copying non-tried blocks...
Interrupted by user

I interrupted it because, just over halfway through making the image, it hit hard IO errors that the kernel decided were unrecoverable, and the device got removed. What now? Well, if we're lucky, I copied enough that we can still access the 2nd partition of the three that were on there, and get at my data. But how to do that? We've made an image of the entire device, MBR and partitions and everything. Is there a way to read it without going the obvious route of writing it to some USB flash drive first?

Of course there is. This is Linux, and an image file is not really much different from a device file like /dev/sdd; we just need to point our tools at the file we created.

james@yang(): /mnt/touro1/@image
$ sudo gparted ./sumomo-cf32-20140710.img.fucked.test
[sudo] password for james: 
libparted : 2.3
Cannot have a partition outside the disk!

Well, okay, maybe some of our tools are assuming that the data we are feeding them has at least some internal consistency. GParted is a nice tool, but evidently it doesn't like only having half a drive to work with. Similarly, I try cfdisk, and it doesn't like it either. What else can we do?

Epic Flying Mount

A bit of research later, and I can construct a mount option that gets us to the partition we want.

james@yang(): /mnt/touro1/@image
$ sfdisk -l -uS ./sumomo-cf32-20140710.img.fucked.test
Disk ./sumomo-cf32-20140710.img.fucked.test: cannot get geometry

Disk ./sumomo-cf32-20140710.img.fucked.test: 2105 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
./sumomo-cf32-20140710.img.fucked.test1          2048   1998847    1996800  83  Linux
./sumomo-cf32-20140710.img.fucked.test2   *   1998848  32718847   30720000  83  Linux
./sumomo-cf32-20140710.img.fucked.test3      32718848  63438847   30720000  83  Linux
./sumomo-cf32-20140710.img.fucked.test4             0         -          0   0  Empty

sfdisk is a lower-level tool that can get us the partition info we need from the MBR. This version doesn't work on GPT, so you'll have to wait until I (inevitably) have a failure on a GPT disk for me to post info on doing that! The -l option lists the partitions, and -uS sets the units to 'sectors', 512 bytes each. The 2nd partition, hopefully still present on this image, starts at sector 1998848. Multiplying that by 512 gets us:-

$ echo $((512 * 1998848))
1023410176

Heh, did you know you could do simple maths right there in the shell? Anyway, we can use this byte offset as an option to the mount command! Since I need my btrfs mount to work despite its second half not being available, I include the 'degraded' flag.

james@yang(): /mnt/touro1/@image
$ sudo mount -t btrfs -o degraded,ro,loop,offset=1023410176 ./sumomo-cf32-20140710.img.fucked.test /mnt/tmp1/
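If you'd rather not copy sector numbers by hand, the lookup and the multiplication can be scripted. This is only a sketch: start_sector and byte_offset are made-up names, and the awk assumes the old-style `sfdisk -l -uS` listing shown above, with 512-byte sectors.

```shell
# Hypothetical helpers; they assume the sfdisk listing format shown above.

# Read an `sfdisk -l -uS` listing on stdin and print the start sector of
# the partition whose device name ends in the given number.
start_sector() {
    awk -v n="$1" '
        $1 ~ ("[^0-9]" n "$") {
            sub(/\*/, "")   # drop the boot flag so the columns line up
            print $2
            exit
        }'
}

# Convert a sector number to a byte offset, assuming 512-byte sectors.
byte_offset() {
    echo $(( $1 * 512 ))
}

byte_offset 1998848   # prints 1023410176
```

Piping the real listing through it would look like `sfdisk -l -uS image.img | start_sector 2`, with the result handed to byte_offset to produce the value for the offset= mount option.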

Well, did it work...?

james@yang(): /mnt/tmp1
$ ls
@/  fstab.gud  @home/  isos/  @sumomo/

james@yang(): /mnt/tmp1
$ ls @/
bin/   btrfs/  dev/  home/       isos/  machine  mnt/  proc/  run/   selinux/  sshfs/   sys/       tmp/  var/
boot/  cifs/   etc/  initrd.img  lib/   media/   opt/  root/  sbin/  srv/      sumomo/  thanatos/  usr/  vmlinuz

Oh thank fuck. Some of the files are in fact corrupt; I know this because btrfs keeps checksums of all the data. Happily, keeping an eye on /var/log/syslog while copying files out of the loop mount didn't show any errors for the really important stuff.
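Whether partition 2 sat inside the copied region can also be estimated from numbers already on this page. The sketch below makes two assumptions: that ddrescue's MB are 10^6 bytes, and that its ipos value of 17258 MB is roughly how far the copy got. partition_fits is a hypothetical name, not a real tool.

```shell
# Hypothetical helper: does a partition end inside the first N bytes of an image?
# Assumes 512-byte sectors, as in the sfdisk listing.
partition_fits() {
    end_sector=$1
    copied_bytes=$2
    end_byte=$(( (end_sector + 1) * 512 ))
    if [ "$end_byte" -le "$copied_bytes" ]; then
        echo "fits: partition ends at byte $end_byte"
    else
        echo "truncated: partition ends at byte $end_byte"
    fi
}

# End sector of partition 2, and roughly 17258 MB taken from ddrescue's ipos.
partition_fits 32718847 17258000000
```

By that estimate partition 2 ends a little before the point where the device vanished, which would fit with most files surviving. The 16781 kB that ddrescue reported as errsize could still land somewhere inside it, so this is a sanity check rather than a guarantee.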

I am, however, one more machine down. Is there a finite amount of brokenness that I have to maintain, else more hardware will randomly break on me? Who knows! Hopefully future posts will be able to focus more on prevention than scrambling to pull data out of a burning building, but we'll see.
