Well, that was an entertaining four hours.
I re-orged my e-mail filestore – I use the NMH mail handler as a backend to EXMH – and, at a conservative estimate, linked, renamed and unlinked some 30,000 files, making a total of 90,000 file operations.
On Solaris – or on MacOS’s BSD filesystem (apparently deprecated, and lacking crypto) – this would have been fine.
However: at some point in these operations my HFS+ FileVault volume’s journal got borked, with the result that when I next logged out-and-in I got the following in the log:
Nov 11 15:29:36 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594b4c; s 0x167200 e 0x167200)
…and poof! – my home filestore had vanished in its entirety; no desktop, no backdrop motorbike picture, no nothing. Nada.
This is the tale of how I fixed it.
For those not in the know, FileVault stores your encrypted home directory in a volume/file named username.sparseimage in your real homedir; this can be accessed as a plain file when you are logged out. What I did not appreciate is the MacOS mechanism for “hiding” those files when you log in; what appears to happen is (for user fred): …and logout simply reverses this process.
It appears that the volume (i.e. the filesystem, not the encrypted block device) got corrupted and would not mount upon login. In this situation MacOS simply gave up on the sparseimage, created a whole new, blank home directory and dropped me into it, leaving the half-baked /Users/.alecm intact.
Clicking on the sparseimage file failed to open it – I got the worrying message that it “contained no mountable filesystems”; DiskUtility told a different story, and (after verification) mounted it quite happily, although no amount of Verify-and-Repair cycles seemed to create something that was mountable via Finder. The journal would just not repair. This is something I’ve noticed from previous filesystem corruption under MacOS – having journalling enabled seems to override all ability to check and repair, and you seem to need to switch it off to get the volumes repaired, after which you can re-enable it.
I messed with the situation for ages – tried putting the “repaired” sparseimage file back, but it was not “repaired” enough: …and so I eventually gave up on that approach.
The final solution I hit upon was to: …and it’s worked nicely. No files seem to have been lost, and the system is running much faster, I suspect due to the defragmentation effects inherent to the process. I may do this in future as a matter of course.
Overall I am surprised at the apparent amateurishness of the approach; I suppose that not using an overlay mount but merely renaming directories would permit installation of both online and offline “.forward” files and so forth, and anything that chdir’s into that directory whilst I am logged out is likely not to get confused by my logging in.
Incidentally, when I write above about MacOS “giving up”, I mean it quite literally: the loginwindow entries in the log below show it retrying a chdir into my home directory once a second, 20 times over. Polling a mount?!? Argh! Come back automounter, all is forgiven.
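For the curious, the polling that the loginwindow log lines suggest can be sketched roughly as follows. This is only a guess at the shape of the loop, written in Python for illustration; the function name `wait_for_homedir` and its parameters are hypothetical, and the defaults (20 attempts, one second apart) are simply read off the log, not taken from any Apple source.

```python
import os
import time


def wait_for_homedir(path, attempts=20, delay=1.0, sleep=time.sleep):
    """Poll until `path` can be entered, or give up after `attempts` tries.

    Returns the number of the attempt that succeeded, or None if all failed.
    The defaults mirror what the log shows; they are assumptions.
    """
    original = os.getcwd()
    for attempt in range(1, attempts + 1):
        try:
            os.chdir(path)  # the probe: can we enter the home directory yet?
        except OSError:
            print(f"Failed to enter {path}: sleeping {delay:g}s: Attempt {attempt}")
            sleep(delay)  # fixed delay, no backoff, no mount notification
        else:
            os.chdir(original)  # restore the working directory after probing
            return attempt
    return None
```

Calling `wait_for_homedir("/Users/alecm")` on a box where the mount never appears would block for about 20 seconds and then return None, which is more or less the "giving up" described above.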
Nov 17 16:18:48 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594c88; s 0x312e00 e 0x312e00)
Nov 17 16:30:03 localhost kernel: jnl: write_journal_header: error writing the journal header!
Nov 17 16:30:45 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594c88; s 0x312e00 e 0x312e00)
Nov 17 15:33:09 localhost loginwindow[4432]: Error mounting home directory disk image (99).
Nov 17 15:59:59 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 1
Nov 17 16:00:00 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 2
Nov 17 16:00:01 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 3
…
Nov 17 16:00:17 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 19
Nov 17 16:00:18 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 20
Nov 17 16:10:09 localhost loginwindow[4549]: [Login.m:2931]: Could not move /Users/.alecm to /Users/alecm
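If you want to check your own system log for the same symptoms, something like the following Python sketch would pull out the two interesting kinds of line: the kernel journal resets (note the start and end pointers are identical) and the loginwindow retries. The regular expressions are inferred purely from the lines quoted above, not from any documented log format, and the names `summarise`, `RETRY_RE` and `JNL_RE` are made up for this example.

```python
import re

# Pattern inferred from the loginwindow retry lines quoted above (assumption).
RETRY_RE = re.compile(
    r"loginwindow\[(?P<pid>\d+)\]: Failed to stat homedir: .* "
    r"for (?P<path>\S+): Attempt (?P<attempt>\d+)$"
)
# Pattern inferred from the kernel journal-reset lines quoted above (assumption).
JNL_RE = re.compile(
    r"kernel: jnl: journal start/end pointers reset! "
    r"\(jnl 0x[0-9a-f]+; s 0x(?P<start>[0-9a-f]+) e 0x(?P<end>[0-9a-f]+)\)"
)


def summarise(lines):
    """Return (highest retry attempt seen per path, list of (start, end) pointers)."""
    retries = {}
    resets = []
    for line in lines:
        line = line.strip()
        m = RETRY_RE.search(line)
        if m:
            path = m.group("path")
            retries[path] = max(retries.get(path, 0), int(m.group("attempt")))
            continue
        m = JNL_RE.search(line)
        if m:
            # Journal pointers are logged in hex; convert them to integers.
            resets.append((int(m.group("start"), 16), int(m.group("end"), 16)))
    return retries, resets


sample = [
    "Nov 17 16:18:48 localhost kernel: jnl: journal start/end pointers reset! (jnl 0x1594c88; s 0x312e00 e 0x312e00)",
    "Nov 17 16:00:17 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 19",
    "Nov 17 16:00:18 localhost loginwindow[4549]: Failed to stat homedir: sleeping 1 second: chdir returns -1 for /Users/alecm: Attempt 20",
]
retries, resets = summarise(sample)
print(retries)
print(resets)
```

To run it against a real log, pass an open file instead of `sample`, e.g. `summarise(open(path_to_log))`, where the path is whatever your system uses.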