April 24, 2007 at 8:59 am

The State of rsync on Tiger

10.4.9 brings some welcome improvements to rsync on OS X, but they might not be everything that you're looking for.

An exhaustive look at what is and what doesn't seem to be working with rsync through various iterations of OS X 10.4. Read on for the gory details…

(Ed. Note: It would appear that both versions of the 2007-004 Security Update re-break much of the rsync progress. This is corrected by the 2007-005 Security Update, so be sure to test those patches!)

 

Mac OS X 10.4.9 comes with a new version of rsync, the third one to be officially released by Apple since 10.4.
The release gives us a good opportunity to check whether the previous trend towards reliability and usability has been maintained, as it would be bad to be left forever with an incomplete/unreliable out-of-the-box way to preserve our servers' precious data… [1]

To make rsync HFS+ aware, Apple introduced as few changes to rsync's code as possible.
Basically, they are centered around the building (by the sender) and the interpretation (by the receiver) of the list of files to be transferred, and on the ability to pack all "non conventional" attributes of a file in an additional regular file that may be transferred over the wire, then unpacked on the other side.

This way, rsync's protocol is preserved: invoking it without its -E flag allows it to work with any regular rsync server, provided the protocol versions are compatible of course.
Moreover, when invoked with the -E flag, it should prove consistent with what would have been achieved without the flag, just with the addition of all the other OS X bits being carried across.
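
The compatibility rule above can be sketched as a tiny helper. This is only an illustration: the function name and the idea of passing the kind of peer as an argument are assumptions, not something Apple's rsync provides, and a stock rsync does not give -E Apple's meaning.

```sh
#!/bin/sh
# Hypothetical helper: pick the rsync options for a transfer.
# $1 is "apple" when the other end is known to run Apple's patched rsync,
# anything else for a regular rsync server.
rsync_flags() {
    case "$1" in
        apple) echo "-aE" ;;   # HFS+ extras travel as synthetic files
        *)     echo "-a"  ;;   # plain protocol, no OS X specific metadata
    esac
}

# Example (not executed here):
#   rsync $(rsync_flags apple) A/ B/
```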

The aforementioned packing/unpacking capability is provided by a library function, copyfile(), which also allows for rsync to find out about the various pieces related to a given file.
That function in fact is the building block for the HFS+ awareness of other unix file tools too. Having a well-behaved rsync should thus also increase the odds that the other tools will behave correctly as well.

Some small illustrative pieces of code are provided hereafter; they are supposed to be run under the following environment:
1. Volume "Data", which is not the boot volume, is a Journaled HFS+ formatted volume.
2. Unless otherwise stated, ACLs are not enabled on the boot volume nor on volume "Data"; when needed, they are explicitly enabled in the sample pieces of code.
3. A copy of SimpleText, the pre-OSX application with an empty data fork and everything in the resource fork, is available at the top level of volume "Data".
4. A copy of command line tool "xattr" is available at the top level of volume "Data", and the Developer tools are installed.
5. The working directory is /Volumes/Data.

10.4 and rsync

The Darwin projects for 10.4.0:
    Libc-391
    rsync-20 (based on rsync 2.6.3)

Apple's changes to rsync's original code are conveniently summarized in two .diff files, EA.diff being the one related to the handling of extended attributes.

The truth is that rsync was almost unusable with the -E flag: when applied to a sufficiently large set of files, the likelihood of a crash was nearing 100%.

Moreover, even with smaller transfers, the results proved rather erratic, and it was often necessary to run rsync two or three times on the same data to have all files (or all of their parts) transferred.

As a result, rsync could only be reasonably considered without its -E flag, and thus be used for providing the same functionality as in pre-Tiger times.

10.4.6 and rsync

The projects have evolved to:
    Libc-391.2.5
    rsync-24 (based on rsync 2.6.3)
for the PPC version, and to
    Libc-391.4.2
    rsync-24 (based on rsync 2.6.3)
for the Intel version.

Note that the code for copyfile() seems to be virtually the same for both architectures, as the differences are confined to the handling of byte-ordering and data alignment; this seems to be the case in subsequent versions as well.

With rsync-24, Apple addressed the main problems encountered under 10.4: rsync wouldn't crash anymore while trying to insert an entry for a "synthetic file" in the file list; the same way, a spurious file unlinking, which could lead to some of the aforementioned mysterious failures, has been removed.

In other words, with 10.4.6, rsync had begun to be a candidate for large file transfers.

Some surprising, not to say inconsistent, behaviors from the previous version were still present, however.

For example:
    mkdir B
    rsync -aE SimpleText B/
systematically yielded a copy whose modification date was set to the time of the transfer; on the other hand, the creation date was preserved.

By contrast, in such a case:
    mkdir A B
    touch A/empty
    osascript -e 'tell app "Finder" to set label index of file "Data:A:empty" to 2'
    rsync -aE A/empty B/
the copy had both its creation and modification dates set to the modification date of the original file… [2]

But the biggest problems became apparent as soon as one enabled ACLs on the source volume; for example, with the following:
    mkdir A B
    sudo fsaclctl -p . -e
    rsync -aE SimpleText B/
    touch A/empty
    chmod +a "www allow write" A/empty
    rsync -aE A/empty B/
you would just get those infamous error messages related to "vanished" files, while neither the resource fork nor the ACL were transferred.

It was possible to have rsync working again by enabling ACLs on the boot volume. This was clearly erroneous, as the creation of the synthetic files in "/tmp" obviously doesn't require any ACL capability at all: they are just regular files.

Now, rsync's code shows that the --temp-dir (-T) option has been repurposed by Apple to also define the temporary directory used on the sender's side (it still governs the receiver's temporary location, as in rsync's original code).
So, instead of enabling ACLs on the boot volume, one had a partial workaround, as this worked for file "empty":
    mkdir A B C
    sudo fsaclctl -p . -e
    touch A/empty
    chmod +a "www allow write" A/empty
    rsync -aE --temp-dir=/Volumes/Data/C A/empty B/
but still not for SimpleText…

In fact, to have a file with a resource fork rsynced from an ACL-enabled volume, one had to somehow "trigger" that capability, for example by creating an ACL for it, even an empty one.
But this in any case required either running FixupResourceForks afterwards or running rsync as root, as the POSIX permissions weren't correctly transferred.

But that was not the whole story:
    mkdir A B C
    sudo fsaclctl -p / -e
    sudo fsaclctl -p . -e
    touch A/empty
    chmod +a "www allow write" A/empty
    rsync -aE A/empty B/
    rsync -aE B/empty C/
The second rsync command just crashed.

As a general rule, it appeared that files that had been rsynced once from an ACL-enabled volume were not rsyncable anymore (the mess introduced in those files being a very persistent one).
Rather annoying, isn't it?

So, running rsync with its -E flag on "regular" volumes started to be feasible, and even rather reliable.
But running it on ACL-enabled volumes was still kind of suicidal…

10.4.9 and rsync

For the PPC architecture, the project versions are now:
    Libc-391.2.9
    rsync-24.1 (based on rsync 2.6.3)
while, on Intel, they are:
    Libc-391.5.21
    rsync-24.1 (based on rsync 2.6.3)

As far as copyfile() is concerned, return values of some calls are now correctly handled: they do not cancel each other out anymore and thus, for example, make a success appear as an error. Moreover, previously overlooked cases or error conditions are now taken into account.
Rsync's code shows very few changes, mostly related to some data initialization/release, as well as to the correct handling of the -E flag during the cleanup of synthetic files.

Consistency has been reintroduced with regard to the handling of creation dates; now,
    mkdir A B
    cp -p SimpleText A/
    touch A/SimpleText
    sleep 10
    rsync -aE A/SimpleText B/
yields a copy whose creation and modification dates are both set to the modification date of the source file.
OK, perhaps not the most wanted move, but it's at least consistent with every other transfer done through rsync.

But the most striking improvements that came with 10.4.9 are related to ACL-enabled volumes.

It isn't required to have ACLs enabled on the sender's boot volume anymore (nor to resort to the --temp-dir trick).
This is a very nice improvement, as the requirement was logically flawed, and very likely revealed underlying problems anyway.

Moreover, provided rsync runs with sufficient privileges, a file located on an ACL-enabled volume and with only a non empty resource fork as an "extended attribute" may now be rsynced with the -E option.

Last but not least, an rsynced file may be rsynced again.

As a result, rsync may now realistically be applied on ACL-enabled volumes, as shown by this sample code:

    # Let's create various test cases.
    sudo fsaclctl -p . -e
    mkdir A B C
    echo "1234567890" >A/rawfile
    echo "1234567890" >A/rawfile-ACL
    chmod +a "www allow write" A/rawfile-ACL
    echo "1234567890" >A/rawfile-ACL-ATTR
    chmod +a "www allow write" A/rawfile-ACL-ATTR
    ./xattr --set 'color' 'blue' A/rawfile-ACL-ATTR
    echo "1234567890" >A/rawfile-ATTR
    ./xattr --set 'color' 'blue' A/rawfile-ATTR
    echo "1234567890" >A/rawfile-ATTR-XFI
    ./xattr --set 'color' 'blue' A/rawfile-ATTR-XFI
    osascript -e 'tell app "Finder" to set label index of file "Data:A:rawfile-ATTR-XFI" to 2'
    echo "1234567890" >A/rawfile-ACL-ATTR-XFI
    chmod +a "www allow write" A/rawfile-ACL-ATTR-XFI
    ./xattr --set 'color' 'blue' A/rawfile-ACL-ATTR-XFI
    osascript -e 'tell app "Finder" to set label index of file "Data:A:rawfile-ACL-ATTR-XFI" to 2'
    echo "1234567890" >A/rawfile-ACL-ATTR-XFI-FI
    chmod +a "www allow write" A/rawfile-ACL-ATTR-XFI-FI
    ./xattr --set 'color' 'blue' A/rawfile-ACL-ATTR-XFI-FI
    osascript -e 'tell app "Finder" to set label index of file "Data:A:rawfile-ACL-ATTR-XFI-FI" to 2'
    /Developer/Tools/SetFile -a V A/rawfile-ACL-ATTR-XFI-FI
    cp -p SimpleText A/
    cp -p SimpleText A/SimpleText-ACL
    chmod +a "www allow write" A/SimpleText-ACL
    cp -p SimpleText A/SimpleText-ACL-ATTR
    chmod +a "www allow write" A/SimpleText-ACL-ATTR
    ./xattr --set 'color' 'blue' A/SimpleText-ACL-ATTR
    cp -p SimpleText A/SimpleText-ATTR
    ./xattr --set 'color' 'blue' A/SimpleText-ATTR
    
    # Let's enjoy the relief to be able to rsync rsynced files.
    rsync -aE A/ B/
    rsync -aE B/ C/
    
    # Let's fetch some info about the original files…
    ls -ales A
    ./xattr --get color A/*
    for f in A/*; do /Developer/Tools/GetFileInfo "$f"; done
    # … as well as about their final copies.
    ls -ales C
    ./xattr --get color C/*
    for f in C/*; do /Developer/Tools/GetFileInfo "$f"; done
    # Looking at them in the Finder will tell about their visibility and labels.

It appears that, with the above code, both sets of file info are identical (tested on a PPC box and an Intel one).
Of course, a similar conclusion holds when volume "Data" is not ACL-enabled (just remove the parts of the sample code devoted to the creation of ACLs).
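
The comparison of the two sets of file info can be partly automated. What follows is a minimal sketch, assuming that plain `ls -ld` output is enough for a first pass; the function names are hypothetical. On Tiger one would presumably use `ls -lde` instead and add the xattr and GetFileInfo calls from the sample above.

```sh
#!/bin/sh
# Print one line of metadata per entry of directory $1, using names
# relative to that directory so that two folders can be compared.
snapshot() {
    ( cd "$1" && for f in *; do
          ls -ld -- "$f"
      done )
}

# Succeed only if both folders yield exactly the same listing
# (mode, link count, owner, group, size, modification date, name).
same_metadata() {
    a=$(snapshot "$1") && b=$(snapshot "$2") && [ "$a" = "$b" ]
}

# Example (not executed here):
#   same_metadata A C && echo "A and C look identical"
```

Such a check says nothing about resource forks, labels or attributes; it only catches the coarse differences quickly.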

So, looks like 10.4.9 came with a much more capable rsync than ever.
Locally:
    – it sustains large file lists,
    – it may reliably be used with its -E flag on "regular" volumes
    – as well as on ACL-enabled volumes

So far, so good.
But then, what about current rsync's capabilities as a server?

Let's configure on each of our test boxes (PPC and Intel) an rsync server:
    – with a daemon running as root
    – with a module named "test"
    – that module being accessible by an rsync user named "test" too
    – the module's directory being /Volumes/Data/rsync_test
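
Such a setup might look like the following sketch, which writes a minimal rsyncd.conf. The function name and the location of the secrets file are assumptions; the secrets file has to exist separately, with one "user:password" line for user "test".

```sh
#!/bin/sh
# Write a minimal rsyncd.conf for the "test" module described above.
# $1: path of the configuration file to create.
# /etc/rsyncd.secrets is an assumed location, not taken from the article.
write_rsyncd_conf() {
    cat > "$1" <<'EOF'
uid = root

[test]
    path = /Volumes/Data/rsync_test
    read only = no
    auth users = test
    secrets file = /etc/rsyncd.secrets
EOF
}

# Example (not executed here):
#   write_rsyncd_conf /etc/rsyncd.conf
#   sudo rsync --daemon --config=/etc/rsyncd.conf
```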

Assuming folder "A" still contains the 11 files created with the above code, running this command on the Intel box:
    rsync -aE A/ [email protected]::test/
fills the module's directory with the expected files (i.e. the same ones as those created by the local rsync into folder "C").

But trying the same to the remote server:
    rsync -aE A/ [email protected]::test/
tends to show that some old demons are still alive…
Indeed, on the remote server, running:
    ls -Ales rsync_test
shows this:
    240 -r--------   1 luttgens  admin  121847 Jun  3  1999 ._SimpleText-ACL-ATTR
      8 -r--------   1 luttgens  admin     245 Apr 23 19:26 ._rawfile-ACL-ATTR
      8 -r--------   1 luttgens  admin     245 Apr 23 19:26 ._rawfile-ACL-ATTR-XFI
      8 -r--------   1 luttgens  admin     245 Apr 23 19:26 ._rawfile-ACL-ATTR-XFI-FI
    240 -rw-rwxr--   1 luttgens  admin       0 Jun  3  1999 SimpleText
    240 -rw-rwxr-- + 1 luttgens  admin       0 Jun  3  1999 SimpleText-ACL
     0: user:www allow write
      0 -rw-rwxr--   1 luttgens  admin       0 Jun  3  1999 SimpleText-ACL-ATTR
    240 -rw-rwxr--   1 luttgens  admin       0 Jun  3  1999 SimpleText-ATTR
      8 -rw-r--r--   1 luttgens  admin      11 Apr 23 19:26 rawfile
      8 -rw-r--r-- + 1 luttgens  admin      11 Apr 23 19:26 rawfile-ACL
     0: user:www allow write
      8 -rw-r--r--   1 luttgens  admin      11 Apr 23 19:26 rawfile-ACL-ATTR
      8 -rw-r--r--   1 luttgens  admin      11 Apr 23 19:26 rawfile-ACL-ATTR-XFI
      8 -rw-r--r--   1 luttgens  admin      11 Apr 23 19:26 rawfile-ACL-ATTR-XFI-FI
      8 -rw-r--r--   1 luttgens  admin      11 Apr 23 19:26 rawfile-ATTR
      8 -rw-r--r--   1 luttgens  admin      11 Apr 23 19:26 rawfile-ATTR-XFI
That is, with the sample files under consideration, for each file having both an ACL and an attribute, the remote receiver "forgot" to merge the synthetic file back with the data file.
And this time, as could have been expected, running FixupResourceForks is only a partial workaround: while it restores resource forks and Finder info, both the ACLs and the attributes are lost on the "repaired" files.
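
Before attempting any repair, it may help to list the synthetic files that the receiver failed to merge. A minimal sketch, assuming that an unmerged synthetic file always shows up as "._name" next to its data file "name", as in the listing above; the function name is hypothetical.

```sh
#!/bin/sh
# List every "._name" file under directory $1 whose data file "name"
# exists in the same folder, i.e. the pairs that were left unmerged.
unmerged_synthetic() {
    find "$1" -type f -name '._*' | sort | while IFS= read -r s; do
        d=$(dirname "$s")
        n=$(basename "$s")
        # Strip the leading "._" to get the name of the data file.
        if [ -e "$d/${n#._}" ]; then
            echo "$s"
        fi
    done
    return 0
}

# Example (not executed here):
#   unmerged_synthetic /Volumes/Data/rsync_test
```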

Fortunately (?), exactly the same behavior may be observed on the PPC box too; the problem thus doesn't seem to be architecture-dependent. [3]
Moreover, I couldn't find any other combination triggering such a behavior.
And, let's be abysmally optimistic, that behavior is a rather benign one, in the sense that it doesn't abruptly crash the process as would probably have been the case with earlier versions.

So, what do we preserve (lose) with rsync -E?

rsync's current incarnation has a lot going for it, as:

    – it doesn't fear large file transfers anymore [4]
    – it now reliably works with ACL-enabled volumes
    – apart from one very special case, it behaves well as a server too.

This of course raises the question: what's the price, dude?

I sure don't have a full answer, but here are some thoughts (any addition and, of course, amendments, are welcome!).

File types for which it makes sense to keep a copy seem to be correctly handled by rsync:

    regular files
    directories
    aliases
    symbolic links
    hard links

As far as I can tell, the following information seems to be consistently preserved under any circumstances:

    data fork
    resource fork
    owner/group
    permissions (ie rwx, suid, sgid, sticky)
    POSIX dates
    basic Finder information
    label
    custom icon
    ACLs – but see below
    attributes – but see below

And now the less pleasant news:

    creation date: always set to the source file's modification date
    Spotlight comments (formerly known as Finder comments): always lost
    icon position: always lost
    locked flag: always lost [5]
    ACLs: irrecoverable after remote transfers when combined with attributes
    attributes: irrecoverable after remote transfers when combined with ACLs

[1] Of course, there are many other ways to preserve that data (tar, asr,…); but amongst available tools, its versatility makes rsync a rather atypical and potentially extremely useful one.
[2] Such discrepancies at least tell us one thing: as far as their handling is concerned, a resource fork is not Finder info, nor is it a POSIX filemode…; until the code of copyfile() is stabilized, it is probably safer to check with each new version for possible regressions.
[3] It is fortunate in the sense that the bug should be rather easy to locate, very likely in rsync's receiver code, not in a deeply buried architecture-dependent system library.
[4] To be sure, I just tested this again on about 1,300,000 files, for a full copy as well as a differential one, in both cases locally and remotely.
[5] Which is also the uchg flag from a POSIX point of view; note that when transferred, locked files leave synthetic files in the sender's temporary directory. A persistent old bug.

Comments

  • This one is about the ACL-ATTR combination during a remote transfer, which still leaves me rather amazed.

    It could well be that the problem is architecture-related, after all.
    If yes, the conclusions should read:
    – ACLs: irrecoverable after remote transfers between PPC-Intel when combined with attributes
    – attributes: irrecoverable after remote transfers between PPC-Intel when combined with ACLs
    which also means that remote transfers between similar architectures should be OK.

    It looks like the problem isn’t related to rsync’s receiver code, but rather to copyfile() – or even something deeper, in which case my third footnote really starts to be outdated…

    In fact, that combination ACL-ATTR seems to end with "synthetic files" which are not architecture-agnostic anymore.

    In a word, when created on a PPC (Intel) box, they should be understood on a PPC (Intel) box, but raise an error on an Intel (PPC) box.

    Because of stupid medical reasons, I’m currently stuck here with two boxes: an Intel one, and a PPC one. I thus can’t perform PPC-PPC or Intel-Intel remote transfers; as I could only try PPC <-> Intel, I would be pleased to hear about PPC <-> PPC and Intel <-> Intel remote trials.

    Moreover, compiling something like this:

    #include <sys/param.h>   /* MAXPATHLEN */
    #include <sys/stat.h>

    #include <err.h>
    #include <errno.h>
    #include <fts.h>
    #include <limits.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #include <libc.h>

    #include <stdio.h>
    #include <libgen.h>
    #include <copyfile.h>

    /* Frankly didn’t bother about optimizing the includes 😉 */

    int main (int argc, const char * argv[]) {

        const char * src_path;
        struct stat src_info;
        char dst_path[MAXPATHLEN];

        /* Raise copyfile()'s debug level through its environment variable. */
        setenv("COPYFILE_DEBUG", "5", 1);

        /* Some really poor args checking… */

        /* errx(), not err(): errno is meaningless for a usage error. */
        if (argc != 2)
            errx(65, "Needs one argument.");

        src_path = argv[1];
        if (stat(src_path, &src_info) != 0)
            err(65, "Bad path: %s", src_path);

        /* The packed (synthetic) file is written next to the source file. */
        if (snprintf(dst_path, sizeof dst_path, "%s-pck", src_path) >= (int)sizeof dst_path)
            errx(65, "Path too long: %s", src_path);

        /* Same bit as the original (1<<31), written unsigned to avoid
           shifting into the sign bit. */
        if (copyfile(src_path, dst_path, NULL,
                     COPYFILE_PACK | COPYFILE_METADATA | COPYFILE_VERBOSE | (1U << 31)) != 0)
            err(70, "Seems to have failed while creating %s", dst_path);

        return 0;
    }

    should provide a command-line tool much more verbose than copyfile() as called by rsync, and also completely independent of rsync.
    Such a tool should make it possible to see how "somefile" becomes "somefile-pck", which may then be analyzed through hexdump or a similar tool.
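
As a first step of such an analysis, one can check that the packed file at least starts like an AppleDouble file. A minimal sketch, with a hypothetical function name; it only looks at the 4-byte magic number (0x00051607) and does not validate the attribute entries that follow.

```sh
#!/bin/sh
# Succeed if file $1 starts with the AppleDouble magic number 0x00051607.
is_appledouble() {
    # Dump the first four bytes as hex, then drop spaces and newlines.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "00051607" ]
}

# Example (not executed here):
#   is_appledouble somefile-pck && echo "looks like AppleDouble"
```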

    Any feedback?

  • OK, the bug is in copyfile() on Intel, more exactly in a helper function swap_attrhdr().
    The latter treats attribute entries (the ones of type attr_entry_t) as if they were of fixed size; as a result, as soon as there is more than one such entry, the helper function will swap the wrong pieces of data.
    A synthetic file with more than one attribute entry created on an Intel won’t comply with the AppleDouble format anymore; but, as swap_attrhdr() is also used when reading a synthetic file, an Intel box will just undo the mess it has introduced in the file.
    Because of that bug, a synthetic file with more than one attribute entry and created on an Intel can’t be exploited on a PPC, and vice-versa.

  • Thanks for the article 🙂
    Further valuable input can be found re: discussion of the same, on Apple’s OS X Server mailing-list

    A few typos and/or awkward translations still in the article. “Loose” is not “lose,” a common error.

  • Looks like Security Update 2007-005 corrects the problems that came with 2007-004 v1.0 and v1.1.

    • Thanks a lot for your kind comment.
      But I had still overlooked a big problem when Apple’s rsync binary is applied against items bearing ACLs (see my latest comment).
      May I be forgiven by just saying that nobody is perfect? 🙁
      And that the main conclusions are still valid?

      Axel

  • It is only rather recently that I performed really heavy testing with Apple’s rsync against files/folders with ACLs.

    Please accept my apologies for having missed the rather huge memory leaks occurring in such a case: not only are those leaks really noticeable during the building of the file list, but they also lead to increasing memory consumption during the file transfer.
    As a result, even if the file list gets successfully built, problems are likely to happen anyway during the transfer of numerous items bearing ACLs (lots of error messages related to malloc, with rather random transfers of metadata in a broad sense).

    An update of the article should appear on the site.
    In the meantime, should you mainly have items with ACLs to rsync, a good rule of thumb would be to split your operations into chunks of about 100,000-200,000 items.
    Apple has acknowledged the problem, mainly located in copyfile’s code, so that one may expect an update.
    But for those interested in an immediate solution, a home-brewed patched binary is available (just ask me); it doesn’t touch the overall logic built into Apple’s binary, it just tries to minimize the memory leaks.
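
One way to apply that rule of thumb is to build the list of items first, cut it into pieces, and run one rsync per piece. The sketch below is an untested assumption rather than the author's method: the function names and paths are hypothetical, and whether Apple's -E behaves well together with --files-from would have to be checked.

```sh
#!/bin/sh
# Split the list of paths in file $1 into chunks of at most $2 lines,
# written to $3/chunk.aa, $3/chunk.ab, ...
split_file_list() {
    mkdir -p "$3" && split -l "$2" "$1" "$3/chunk."
}

# Run one rsync per chunk.
# $1: source directory, $2: destination, $3: directory holding the chunks.
sync_chunks() {
    for c in "$3"/chunk.*; do
        rsync -aE --files-from="$c" "$1" "$2" || return 1
    done
}

# Example (not executed here):
#   ( cd A && find . ) > /tmp/list
#   split_file_list /tmp/list 100000 /tmp/chunks
#   sync_chunks A/ B/ /tmp/chunks
```

Each rsync run then builds a file list of bounded size, which should keep the leaks from accumulating over the whole transfer.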

    HTH,
    Axel
