So yesterday – inspired by Chris Gerhard – I installed my own homebrew take-a-snapshot-every-5-minutes script on my shiny new home NAS box. Everything had been going swimmingly, so I decided to cut over from my old striped-external-disk solution on the iMac to the new 2.5TB RAID-Z on the Solaris box.
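(For the curious, such a script is little more than a cron-driven wrapper around zfs snapshot. Here is a minimal sketch – hedged heavily: the dataset name and the "quick." prefix are assumptions inferred from the snapshot names in the logs below, and the real, unreleased script differs:)

```shell
#!/bin/sh
# Minimal sketch of a take-a-snapshot-every-5-minutes script.
# The dataset name and the "quick." prefix are assumptions taken from the
# snapshot names in the logs below; the author's actual script is unreleased.
# Run from cron every five minutes, e.g.:
#   0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/quicksnap.sh

FS="${1:-tank/local/alecm}"       # filesystem to snapshot (assumed name)
STAMP=$(date +%Y%m%d.%H%M%S)      # e.g. 20070913.053500
SNAP="${FS}@quick.${STAMP}"

# Echoed here so the sketch is safe to dry-run anywhere;
# drop the echo on a real Solaris/ZFS box.
echo zfs snapshot "$SNAP"
```

A real version would also want to prune old quick.* snapshots, or the pool fills up with them.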
After all, what could possibly go wrong? So: yesterday I copied over the entire iMac data store – nearly 500GB – to the AMD box's ZFS filesystem and snapshotted it for good measure, independent of the script.[0] Then – and this is where I really goofed – I erased and reformatted the old iMac RAID array. So it was karma this morning when I awoke to find a moderately wedged Solaris box, and skimmed through the /var/adm/messages file to find (heavily edited for repetitititition):

Sep 13 05:33:15 suzi scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d,1/ide@0 (ata4):
Sep 13 05:33:16 suzi scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@1 (ata3):
Sep 13 05:33:15 suzi timeout: abort request, target=0 lun=0
Sep 13 05:33:16 suzi timeout: abort request, target=0 lun=0
Sep 13 05:33:16 suzi scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@0 (ata2):
Sep 13 05:33:16 suzi timeout: abort request, target=0 lun=0
Sep 13 05:33:16 suzi gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d,1/ide@0/cmdk@0,0 (Disk2):
Sep 13 05:33:16 suzi Error for command 'write sector' Error Level: Informational
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Sense Key: aborted command
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
Sep 13 05:33:16 suzi gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@1/cmdk@0,0 (Disk1):
Sep 13 05:33:16 suzi Error for command 'write sector' Error Level: Informational
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Sense Key: aborted command
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
Sep 13 05:33:16 suzi gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@0/cmdk@0,0 (Disk4):
Sep 13 05:33:16 suzi Error for command 'write sector' Error Level: Informational
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Sense Key: aborted command
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
Sep 13 05:40:06 suzi fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Sep 13 05:40:06 suzi EVENT-TIME: Thu Sep 13 05:40:06 BST 2007
Sep 13 05:40:06 suzi PLATFORM: System Product Name, CSN: System Serial Number, HOSTNAME: suzi
Sep 13 05:40:06 suzi SOURCE: zfs-diagnosis, REV: 1.0
Sep 13 05:40:06 suzi EVENT-ID: 5ad39b52-c2e7-6d53-b937-d43694ed2568
Sep 13 05:40:06 suzi DESC: The number of checksum errors associated with a ZFS device
Sep 13 05:40:06 suzi exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Sep 13 05:40:06 suzi AUTO-RESPONSE: The device has been marked as degraded. An attempt
Sep 13 05:40:06 suzi will be made to activate a hot spare if available.
Sep 13 05:40:06 suzi IMPACT: Fault tolerance of the pool may be compromised.
Sep 13 05:40:06 suzi REC-ACTION: Run 'zpool status -x' and replace the bad device.
…which was deeply scary. Scarier still was the output of zpool status:
pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
[…] errors: Permanent errors have been detected in the following files:
<metadata>:<0x305>
tank/local/alecm@quick.20070913.053500:<0x0>
Let me put it to you like this: 15 years of my life are on those spindles – and for that moment they contained the only copy. There was no backup from which to restore.
I didn’t panic – I’m not given to panic in such circumstances – but I downed the machine and thought about it very hard. And swore, rather a lot. Especially about the lack of coffee. Zpool was calling the tank unmountable, because three drives were missing and two “degraded”, and there were not enough “replicas” to get the pool back up. Start with the basics: I swapped cables around, checked the SATA power-leads were properly seated (I had had one loosen, before). Cutting a long story short, the next few hours of research were deeply fraught, with multiple reboots, testing hypotheses that the motherboard’s SATA controller might be fried, or that a particular disk might be fried because it was making ticking noises (it’s useful, having a stethoscope) – and so forth. Particularly worrying was the “cascade” failure, that three disks seem to have died at the same time, although only one of them was ticking curiously, and what really confused was that the names of faulty disks in /var/adm/messages seemed to bear no relationship to the ones which zpool was reporting at fault. In the end I bet each-way on two possibilities, and fixed both:- that the daisychain powerlead driving three disks in a bank, was at fault
- that the SATA data leads were too close/tied together, and crosstalk under heavy load was causing ‘sense errors’
…at least this gave me hope that the data could be recovered without loss before I had to rebuild the pool.
So once zpool could see all the disks again, I ran zpool clear tank and rebooted. Eventually the machine would hang. So I rebooted again. And it hung. And again. And again. This was overkill, but in between each reboot I had time to rsync a little more data onto the reformatted iMac RAID system; and gradually, miracle of miracles, I built up a copy of all the unreplicated data.

So now I had a corrupt zpool and nothing to lose, which meant I could afford to experiment. Noting that the corrupt metadata pertained to an (inherently read-only) snapshot:

tank/local/alecm@quick.20070913.053500:<0x0>

– I destroyed all the snapshots on the machine, leaving only the filesystems. Then I iterated:
- zpool scrub tank
- zpool clear tank
- lather, rinse, repeat…
22:14:43 suzi:~ $ zpool status -x
all pools are healthy
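For anyone stuck in the same hole, the whole workaround loop can be sketched in shell – hedged heavily: the pool name is from my setup, every command is echoed rather than executed so you can dry-run it safely, and destroying snapshots is irreversible, so think hard first:

```shell
#!/bin/sh
# Sketch of the recovery loop described above: destroy all snapshots
# (keeping the filesystems), then scrub/clear until the pool reports healthy.
# All commands are echoed, not executed -- remove the echos to run for real.
POOL="tank"   # pool name from this machine; substitute your own

# 1. Destroy every snapshot in the pool, leaving the filesystems intact.
#    This step is irreversible on a real system.
zfs list -H -r -t snapshot -o name "$POOL" 2>/dev/null |
while read -r snap; do
    echo zfs destroy "$snap"
done

# 2. Scrub, then clear the error counters; lather, rinse, repeat
#    until "zpool status -x" reports the pool healthy.
echo zpool scrub "$POOL"
echo zpool clear "$POOL"
echo zpool status -x "$POOL"
```

Note this only worked for me because the permanent errors were confined to snapshot metadata; if the corruption had been in live filesystem data, destroying snapshots would not have helped.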
I am still gonna rebuild the pool – not taking chances – but the data is safe, and I can take time to hammer the pool thoroughly before committing my archive data to it as a primary store. The first thing I shall do is reinstate the snapshot script.

So why do I love ZFS? Well, I can guarantee I have not lost any data to corruption, or indeed any data at all. No files missing, none with bad blocks. If there were, it would have told me. And I managed to salvage the situation.

So why hate it? Well, if the documentation – and the tool itself – tells you "action: Destroy the pool and restore from backup." then you will pretty obviously assume you are screwed. Not merely is that defeatist, it is (potentially) wrong – so if you get stuck in this situation, I recommend you consider having a go at working around it. It worked for me.

—
[0] Which I am beta-testing and will release once I am happy.
[1] Chatting with Chris Gerhard, he confirmed with some research: SATA sucks for crosstalk – the cables are unshielded – so under no account tie them together "to improve airflow", because it'll bugger your system. Keep the data leads away from the power leads, too…