FileVault at Fault

Well, that was an entertaining four hours.

I reorganised my e-mail filestore (I use the NMH mail handler as a backend to EXMH) and, at a conservative estimate, linked, renamed and unlinked some 30,000 files, making a total of about 90,000 file operations.

On Solaris – or on MacOS’s BSD filesystem (apparently deprecated, and lacking crypto) – this would have been fine.

However: at some point in these operations my HFS+ FileVault volume’s journal got borked, with the result that when I next logged out and back in I got the following in the log:

Nov 11 15:29:36 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594b4c; s 0x167200 e 0x167200)

…and poof! – my home filestore had vanished in its entirety; no desktop, no backdrop motorbike picture, no nothing. Nada.

This is the tale of how I fixed it.

For those not in the know, FileVault stores your encrypted home directory in a volume/file named username.sparseimage in your real homedir; this can be accessed as a plain file when you are logged out.

What I did not appreciate is the MacOS mechanism for “hiding” those files when you log in; what appears to happen is (for user fred):

  1. /Users/fred is renamed to /Users/.fred
  2. the /Users/.fred/fred.sparseimage device is opened
  3. the volume “fred” on that device is mounted as /Users/fred
  4. the login process proceeds as normal.

…and logout simply reverses this process.
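
As a toy model of the sequence above (my reading of the behaviour, not Apple’s actual code), the following Python sketch replays the rename-and-mount steps inside a scratch directory. Nothing is really mounted: creating a directory with a marker file stands in for mounting the volume, and the function names, the MOUNTED marker and the mount_ok flag are all invented for illustration.

```python
import os
import shutil


def login(users_dir, user, mount_ok=True):
    """Replay the observed login steps; return True if the pretend mount worked."""
    home = os.path.join(users_dir, user)
    hidden = os.path.join(users_dir, "." + user)
    image = os.path.join(hidden, user + ".sparseimage")

    os.rename(home, hidden)   # step 1: /Users/fred -> /Users/.fred
    os.mkdir(home)            # mount point, or the blank homedir on failure
    if not (mount_ok and os.path.isfile(image)):
        return False          # steps 2-3 failed: the hidden directory is left behind
    # steps 2-3 (stand-in only): pretend the volume is now mounted on /Users/fred
    with open(os.path.join(home, "MOUNTED"), "w") as marker:
        marker.write("volume %s\n" % user)
    return True


def logout(users_dir, user):
    """Reverse the login steps: pretend to unmount, then un-hide the real homedir."""
    home = os.path.join(users_dir, user)
    hidden = os.path.join(users_dir, "." + user)
    shutil.rmtree(home)       # stand-in for unmounting the volume
    os.rename(hidden, home)   # /Users/.fred -> /Users/fred
```

In this model a failed login leaves both a blank /Users/fred and the hidden /Users/.fred in place, with the sparseimage still sitting inside the hidden one.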

It appears that what had happened was that the volume (ie: filesystem, not the encrypted block device) got corrupted and would not mount upon login. In this situation MacOS simply gave up on the sparseimage, created me a whole new, blank home directory and dropped me into it, leaving the half-baked /Users/.alecm intact.

Clicking on the sparseimage file failed to open it – I got the worrying message that it “contained no mountable filesystems”; DiskUtility told a different story, and (after verification) mounted it quite happily, although no amount of Verify-and-Repair cycles seemed to create something that was mountable via Finder.

The journal would just not repair.

This is something I’ve noticed from previous filesystem corruption under MacOS: having journalling enabled seems to defeat all ability to check and repair, and you seem to need to switch it off to get the volumes repaired, after which you can re-enable it.

I messed with the situation for ages – tried putting the “repaired” sparseimage file back, but it was not “repaired” enough:

Nov 17 16:18:48 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594c88; s 0x312e00 e 0x312e00)
Nov 17 16:30:03 localhost kernel: jnl: write_journal_header: error writing the journal header!
Nov 17 16:30:45 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594c88; s 0x312e00 e 0x312e00)

…and so I eventually gave up on that approach.
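
The journal messages are easy to lose among everything else the kernel logs, so a small filter helps when repeating the experiment. The Python sketch below pulls the “jnl:” lines out of a list of log lines and extracts the start and end values from the reset messages. The regular expression is written against the exact lines quoted above and is an assumption about the format beyond them; the function name is my own.

```python
import re

# Matches the reset message exactly as it appears in the quoted log lines.
RESET_RE = re.compile(
    r"jnl: journal start/end pointers reset! "
    r"\(jnl (0x[0-9a-f]+); s (0x[0-9a-f]+) e (0x[0-9a-f]+)\)"
)


def scan_journal_lines(lines):
    """Return (resets, other): (start, end) pairs, plus any other jnl messages."""
    resets = []
    other = []
    for line in lines:
        if "kernel: jnl:" not in line:
            continue                      # not a journal message at all
        match = RESET_RE.search(line)
        if match:
            start = int(match.group(2), 16)
            end = int(match.group(3), 16)
            resets.append((start, end))
        else:
            other.append(line.strip())    # e.g. the write_journal_header error
    return resets, other
```

Fed the three lines from Nov 17, it reports two resets, each with identical start and end values, plus the one write_journal_header error.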

The final solution I hit upon was to:

  • plug-in a firewire hard-drive
  • copy the old alecm.sparseimage file to it – drag’n’drop 17GB, copied in 17 minutes
  • erase the big file from my iBook’s harddisk
  • remove the spurious /Users/.alecm directory that had been left while diagnosing the problem
  • disable and then re-enable FileVault for my account – a rapid process given the empty homedir with which I was working
  • use DiskUtility to mount the firewire copy of the sparseimage file
  • drag’n’drop all the contents of that image into a new subdirectory of my new FileVault’ed homedir – 3.5hrs copying
  • shuffle the files back into their original locations.

…and it’s worked nicely. No files seem to have been lost, and the system is running much faster, I suspect due to the defragmentation effects inherent to the process.

I may do this in future as a matter of course.
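
As for “no files seem to have been lost”: one way to check rather than hope is to checksum both trees and compare them, as in the Python sketch below. The function names are placeholders of my own, and it only compares the data fork of regular files, so HFS+ resource forks and other metadata are not covered.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue                  # skip symlinks; they may dangle
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(source_root, copy_root):
    """Return (missing, changed): files absent from the copy, and files that differ."""
    source = tree_digests(source_root)
    copy = tree_digests(copy_root)
    missing = sorted(set(source) - set(copy))
    changed = sorted(p for p in source if p in copy and source[p] != copy[p])
    return missing, changed
```

Calling compare_trees("/Volumes/alecm", "/Users/alecm/restored") (both paths hypothetical) returns two lists; if both are empty, every file in the mounted image has an identical twin in the copy.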

Overall I am surprised at the apparent amateurishness of the approach. I suppose that merely renaming directories, rather than using an overlay mount, permits installation of separate online and offline “.forward” files and so forth, and that anything which chdir’s into that directory whilst I am logged out is unlikely to be confused when I log in.

Incidentally, when I write above about MacOS “giving up”, I mean it quite literally:

Nov 17 15:33:09 localhost loginwindow[4432]: Error mounting home directory disk image (99).
Nov 17 15:59:59 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 1
Nov 17 16:00:00 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 2
Nov 17 16:00:01 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 3

Nov 17 16:00:17 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 19
Nov 17 16:00:18 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 20
Nov 17 16:10:09 localhost loginwindow[4549]: [Login.m:2931]: Could not move /Users/.alecm to /Users/alecm

Polling a mount?!? Argh! Come back automounter, all is forgiven.
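
For the curious, the retry behaviour that the log implies looks roughly like the loop below. This is my reconstruction from the log lines only: the 20-attempt limit and 1-second delay are read off the messages above, and the function name and its parameters are hypothetical, not anything taken from loginwindow itself.

```python
import os
import time


def wait_for_homedir(path, attempts=20, delay=1.0, sleep=time.sleep):
    """Poll for `path`; return the attempt number that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if os.path.isdir(path):
            return attempt
        # Roughly the shape of the logged message; the limit of 20 is assumed.
        print("Failed to stat homedir: sleeping %g second: Attempt %d"
              % (delay, attempt))
        sleep(delay)
    return None                           # gave up, as loginwindow appears to
```

The sleep parameter exists only so that the loop can be exercised without actually waiting 20 seconds.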

Comments

6 responses to “FileVault at Fault”

  1. alecm
    re: FileVault at Fault

    ps: it does not appear to be possible to switch off journalling in the sparseimage; DiskUtility greys out the button.

  2. Manuel The Hammer
    re: FileVault at Fault

    I GOT IT! Same problem, different fix: nothing was recoverable via the method suggested here, so I mounted the damaged sparse image via Disk Utility, used Data Rescue X (a very nice suite for data recovery) and lost only 8 of over 15k files! I’m in Heaven!

  3. Michael Willis
    re: FileVault at Fault

    I have tried your rescue plan, but the ‘use DiskUtility to mount the firewire copy of the sparseimage file’ step didn’t work . . . Tried logging in with the same username, and as a completely different ‘Administrator’. I wasn’t sure about the ‘remove the spurious /Users/.alecm directory that had been left while diagnosing the problem’ step either: where is it located? Any advice please? The experience did prompt me to get a new drive and backup software though! regards Michael

  4. alecm
    re: FileVault at Fault

    Disk Utility did barf the first time I tried mounting it, but worked the second time. As for the “.alecm” directory: I just want to check that you read the above properly, realised that “alecm” is my username, and that the “.alecm” directory refers to the “hiding” mechanism that I describe above?

  5. themacjedi
    re: FileVault at Fault

    FileVault being turned on will also hose any running program’s preference/log files that are stored within the sparse image if a crash or power outage occurs. Having Maya, Safari and Adium open when the power went out killed my Maya project, not to mention all the settings (and hotkeys) I had spent hours making work, EVEN though I had saved the damn prefs and project before the crash. FileVault is a joke, as Apple does not take the smart route and load the user’s currently in-use files into RAM or another location on disk for later reference first, so that at least the files left on the sparse image are not hosed should the system die.

    What they should do is load preferences/project files for currently opening programs from the original file into RAM or to some other place, leaving the original untouched. When the program quits or some preference is changed, the new preferences are written back to disk from RAM, and the original file is marked as obsolete or as “former state” (AND NOT OVERWRITTEN). If a crash or power outage happens, and the last written file/save is corrupted, at least its last good state is still intact and can be re-loaded. The OS should never keep anything in a volatile state! That’s obviously what it’s doing now and this is seriously stupid. Lucky for me I keep backups of my files myself, but the same cannot be easily done for preferences that must reside in the user’s home folder (unless they are made aliases on a non-encrypted disk?). The same would apply when you hit the save button. Microsoft Office does something similar and keeps all save states intact for files, so when something blows up a dialogue can come up and ask if you want the previous state restored. “FileVault has found corrupted file data because we at Apple are idiots and can’t handle your priceless data correctly on disk XXXXX, would you like this repaired?” Click the “hell yes” button and this completely transparent process benefits the end user by keeping their stuff safe. Apple needs to get on the ball and make it happen.

  6. ApolloX
    FOUND A SOLUTION

    **********FOUND A SOLUTION**********

    I was able to revive the broken FileVault volume even after Apple’s Disk Utility failed entirely.

    The trick was to use Data Rescue II twice! Once to recover the image from the nearly dead drive, and then a second time on the recovered image file itself. What I had missed is that using Data Rescue II like this isn’t exactly the most obvious feature. First you need to enter “Expert” mode. Once in Expert mode you need to check the button “Show Virtual (.dmg file devices)”. For whatever reason*, my image was in the list and Data Rescue was able to read files in it and recover them!!!! I LOVE DATA RESCUE II!!!!!

    * I’m not sure why the image was visible in the list. I’d guess it was either because I had tried to read the image with Disk Utility and it made the disk (but not the image) available to other applications, or because I had tried opening the image in Data Rescue II (not a useful feature since the data was encrypted, but this may have saved the info about the image).
