Bug: MacOS NFS Client to NFS/ZFS/GZIP Solaris Server, Error -36

I have a 2.5TB Solaris (nv71x) box, and the five disks that make up the primary pool are joined together in RAID-Z. On that pool I have created five filesystems:

  • default – just a vanilla filesystem
  • copies-2 – set copies=2 for data dittoblocks
  • copies-3 – copies=3
  • compress-gzip – use gzip compression
  • compress-lzjb – use lzjb compression

…and I have exported them all over my gigabit subnet. On the G5 iMac (MacOS 10.4.10) I mount the filesystems by means of the following script:


NFSROOT=/NFS
test -d $NFSROOT || mkdir $NFSROOT || exit 1
chmod 755 $NFSROOT
NFSOPTS="-o soft,intr,nfsv3,resvport,tcp,rwsize=32768,readahead=8"
while read NFSMOUNT
do
    MOUNTPOINT="$NFSROOT/`basename $NFSMOUNT`"
    test -d $MOUNTPOINT || mkdir $MOUNTPOINT || exit 1
    chmod 755 $MOUNTPOINT
    mount_nfs $NFSOPTS $NFSMOUNT $MOUNTPOINT || exit 1
done <<EOF
192.168.1.4:/export/compress-gzip
192.168.1.4:/export/compress-lzjb
192.168.1.4:/export/copies-2
192.168.1.4:/export/copies-3
192.168.1.4:/export/default
EOF

…and then opening “/NFS” on my Mac gives me a nice series of volume/mountpoints on my desktop.
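For completeness, the server side amounts to something like the following – the pool name, device names and exact command ordering here are illustrative rather than a paste from the real box:

# one RAID-Z pool, five filesystems with differing properties
zpool create export raidz c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
zfs create export/default
zfs create export/copies-2      ; zfs set copies=2 export/copies-2
zfs create export/copies-3      ; zfs set copies=3 export/copies-3
zfs create export/compress-gzip ; zfs set compression=gzip export/compress-gzip
zfs create export/compress-lzjb ; zfs set compression=lzjb export/compress-lzjb
zfs set sharenfs=rw export      # the five children inherit the NFS share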

In order to try and improve performance, I have edited /System/Library/StartupItems/NFS/NFS and increased the number of nfsiod from 4 to 6, but that is all I have done so far.
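Roughly speaking, that edit boils down to bumping the thread count handed to nfsiod when the startup item launches it – the line below is a sketch of the idea, not an exact diff of Apple's script:

# in /System/Library/StartupItems/NFS/NFS – start the NFS client's async I/O daemons
nfsiod -n 6     # previously: nfsiod -n 4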

Here’s the thing: taking a sizeable test file (a 700MB Ubuntu “.iso” file) from my desktop and dragging it to each of the volumes in turn, I get differing results:


[Screenshot: Finder dialog reporting “Error Code -36” (error-36.png)]


filesystem       success?   error
default          yes
compress-gzip    no         aborts after 24MB; Finder gives “Error Code -36”
compress-lzjb    yes
copies-2         yes
copies-3         yes


I shall post some speed-tests later, but for the moment I was wondering whether anyone knows what this particular error message might mean. A scan of the Apple website turns up a pile of stuff to do with SMB authentication, but I am presuming that some manner of asynchronous I/O failure is a better bet?

In the meantime it looks like I should avoid GZIP mountpoints; presumably either the delays are too long, or there is something different showing up in the I/O semantics to a GZIP filesystem?

The funny thing is: the Ubuntu “.iso” file is not really very compressible, so in theory ZFS should have given up and just copied the darned thing…
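(If you want to check that hunch, the server will report how much each dataset actually compressed; the dataset names here follow the illustrative pool layout sketched above:)

# on the Solaris box – a ratio of 1.00x means the blocks went down essentially uncompressed
zfs get compressratio export/compress-gzip export/compress-lzjb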

Update: I just got 70-odd megabytes of the .iso file to copy to the GZIP partition by doing it whilst a copy to another filesystem was ongoing; when I manually aborted the latter, the former died with “Error -36” again a few seconds later.

This smells like a time-race.

Comments


  1. According to http://www.appleerrorcodes.com/ “Error code -36” is actually an IoError.

    Might be worth doing a packet dump and looking at it with Wireshark, see if anything is going astray there or if it’s just something broken in OSX.
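    Something along these lines on the Mac would capture the relevant traffic (the interface name and capture file are just placeholders):

    # grab the NFS conversation with the server for later inspection in Wireshark
    sudo tcpdump -i en0 -s 0 -w nfs-error36.pcap host 192.168.1.4 and port 2049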

  2. Clive

    Hmm.

    I have a friend who’s just built himself a 7.5TB server out of 15 0.5TB discs in a tower case and is happily using OpenSolaris and RAID-Z on it.

    I have another friend who said this weekend that he abhors the idea of conflating partition management, RAID and filesystem because, although it simplifies matters, when it breaks it’ll *really* break. He also noted ZFS is complex and immature and suggested coming back in two years’ time.

    I was tending towards the former view; now I’m not so sure. (-8

  3. I can understand those fears, and of the two the only thing which worries me is the “immaturity” argument.

    From a simplistic perspective a ZFS filesystem (not “pool”!) is little more than a “unix directory on hyper-steroids”, so I don’t see them as conflating what we have already; I similarly don’t see the problem with partitioning – not least since I turn over entire raw spindles to ZFS for optimisation reasons, so one could say I am actually making life simpler than requiring magical little partitions to be spattered hither and thither in order to hold metadata.

    With that in mind it can be said that I am merely RAIDing entire spindles, and only some god-awful problem in the filesystem bookkeeping is likely to bite me – which is where the maturity argument comes in.

    And I am happy enough to take that risk, and manage it with change control and backups of any seriously critical data.

  4. The only real problem I have with ZFS are the relatively slow directory read/write and file creation/deletion speeds and the fact that it doesn’t have a real (read usable) quota system.

    The ZFS “quota” system is broken by design. Any “quota” system which is merely a total-usage limit on the filesystem, and which requires the limit to be removed before you can delete files to clear the problem, is unusable and pointless in a live, multi-user system.

  5. Ya Steve – that (the deletion bug) got fixed.

    How old is your information?

    As for the quota/reservation system, what are you looking for? Hard and soft limits?

  6. Well, it was still there in the fully patched production version of Solaris 10 x86 3 months ago when I was doing tests on our new servers.

    As for the quotas, to make it manageable for a large number of users it needs to be per UID. (Sorry, making lots of new filesystems doesn’t cut it: it’s unmanageable if there are hundreds of users, and it can make NFS “sharing” a pain. Oh, and it makes “df” almost impossible to use on the server.) Soft limits would be nice but not essential.

    Did the ZFS team actually consult sysadmins about the way filesystems are actually used in the real world before they made their decision about quotas? (Their “let them eat cake” approach to the criticism doesn’t help either.)

  7. Well it’s probably not to your taste, but filesystems per user is exactly how I would address it; share management is easy in bulk by using hierarchies and inherited sharing:

    /home/staff/steve

    /home/students/droid

    zfs set sharenfs=rw=@stafftargets pool/home/staff # steve will inherit

    zfs rename pool/home/students/droid pool/home/staff/droid # on getting a job

    (…etc) – the bigger challenge is that people are not thinking like this, and this is different and therefore it is “a pain”.

    Matters will shift over time.
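    To make the inheritance point concrete, the whole layout could be built roughly like this (pool and netgroup names are illustrative):

    zfs create pool/home
    zfs create pool/home/staff
    zfs create pool/home/students
    zfs set sharenfs=rw=@stafftargets pool/home/staff   # every staff home below this inherits the share
    zfs create pool/home/staff/steve
    zfs create pool/home/students/droid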

  8. Of course, the biggest maintenance headache in this case isn’t the initial creation on a day-to-day basis, it’s the complete system restore (when things go terribly wrong, which they will someday) or data migration scenarios.

    With the “monolithic” filesystems, a complete restore (including quota data) would entail merely re-writing the data back onto a clean filesystem. All the ownership, permissions and quota would just re-appear. For the one-filesystem-per-user paradigm this becomes:

    for i in `cat username-data`
    do
        create-zfs-fs $i
        set-zfs-userquota $i
        restore-data $i
    done

    And this assumes that you have a table, held on separate storage, recording what the quota for each user was.

  9. P.S. Of course, if there were a ZFS aware version of tar(1) or some other back-up system which can store and retrieve all this information and build the filesystems then this problem would go away.

    I’ve not seen any yet.

  10. >P.S. Of course, if there were a ZFS aware version of tar(1) or some other back-up system which can store and retrieve all this information and build the filesystems then this problem would go away.

    man zfs(1M) – see the section on “zfs send”; yes, this is a slowly moving target, but this is the direction in which things are going, and I know people who are using it.
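    As a rough sketch of the sort of thing I mean (dataset, snapshot and host names are illustrative):

    # serialise one user's filesystem and recreate it on another pool or server
    zfs snapshot pool/home/staff/steve@migrate
    zfs send pool/home/staff/steve@migrate | ssh newserver zfs receive tank/home/staff/steve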

    Your suggestion for “tar” is a good one, though. I’ll ask some folk about storage of attributes…

  11. I’ve been doing more experimentation…

    I’ve found a bit of a show stopper, at least for my use of ZFS:

    Because you have nested filesystems which are all individually exported, recreating the full filesystem structure on a client requires that the client be able to mount NFS filesystems on top of mount points within another NFS filesystem.

    Solaris can and will do this, even under automounter (which is the only sane way of managing the huge numbers of potential mounts). Linux, however, will not. The automounter will mount the top-level filesystem but will see no sub-directories and cannot mount any of the subsidiary filesystems.

    I couldn’t see a way in the documentation to tell the ZFS NFS sharing system to treat the whole ZFS filesystem tree as one virtual share. Is it actually possible?

  12. Prod me about it; I will try it on Ubuntu later, if I can find the time. This makes no sense to me, since the parent NFS mount should contain the mountpoints for the sub-mountpoints.

  13. Actually, I’ve just tried manually mounting rather than using /net and it works (rough commands at the end of this comment). However, it still doesn’t help with automounter maps (even for Solaris) as there’s no way to specify nested maps, i.e. maps for NFS mounts under an automounted NFS mount, as far as I can tell.

    e.g. auto_home (or the NIS map auto.home) contains:

    users    -rw,intr    zfs-server:/disks/home/users    # base ZFS filesystem

    You can’t then have further maps such as this, or can you?

    users/*  -rw,intr    zfs-server:/disks/home/users/&
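    For reference, by “manually mounting” I mean something along these lines (Linux syntax; the per-user filesystem name is illustrative):

    mount -t nfs zfs-server:/disks/home/users /mnt/users
    mount -t nfs zfs-server:/disks/home/users/steve /mnt/users/steve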
