Race Condition when Moving NSDocuments to iCloud

Update: I just re-read a cocoa-dev post from last month in which Kevin Perry from Apple states the following:

I don’t see the nested uses of performSynchronousFileAccessUsingBlock: you mentioned in that code, but that’s not a problem anyway, since file access is recursive, as long as it happens synchronously within the outer-most file access block (a fact that admittedly may not be documented well anywhere).

That would certainly change things. Perhaps the reason that -relinquishPresentedItemToWriter: was deadlocking on me is because it calls -performActivityWithSynchronousWaiting:, not just -performSynchronousFileAccess? I’ll have to experiment.


Now that iCloud is out, I can finally talk about this in a public forum. Unfortunately, the cocoa-dev mailing list is down. But I really need to get this question out there, or at least written down, because it involves one of the most confusing new APIs in Lion: -[NSDocument performSynchronousFileAccess:] (or -performAsynchronousFileAccess: if you really hate your sanity).

Say I’ve got a document open on my local machine that I’d like to move to iCloud. The API for moving documents to iCloud is -[NSFileManager setUbiquitous:itemAtURL:destinationURL:error:]. Since this API actually moves the file on disk, it performs a coordinated write of the source and destination URLs. Of course the NSDocument itself is a registered NSFilePresenter for the source URL, so NSFileCoordinator will ask it to save the document’s contents by invoking -relinquishPresentedItemToWriter: prior to beginning the move.

This is the genesis of the deadlock warning in the -setUbiquitous:… documentation. If you call -setUbiquitous:… on the main thread, and any of the file presenters registered for the source URL try to do any work on the main thread prior to invoking the writer, you have a deadlock scenario on your hands.

Of course, NSDocument does need to do work on the main thread as part of its saving. In Lion, NSDocument gained a whole slew of methods for dealing with asynchronous autosaving operations and their need to perform some pieces of work on the main thread. These are -performActvitityWithSynchronousWaiting:usingBlock:, -continueActivityUsingBlock:, -continueAsynchronousWorkOnMainThreadUsingBlock:, and -perform(A)SynchronousFileAccessUsingBlock:.

Their documentation (and identical headerdoc) is immensely confusing, and doesn’t do the best job of describing the actual use and behavior of these methods. After weeks of trial, error, and inquiry, I’ve finally wrapped my head around what these methods actually do. I blame overload of the word “block” for much of this confusion. It is used with three completely separate meanings, often within the same paragraph or even the same sentence! It is used to refer to: the traditional multithreading sense of a thread that is “blocked” on a synchronization primitive such as a semaphore; a “code block” of the caret-and-bracket form; and an entirely new hybrid sense of “an activity that has been enqueued and cannot continue until another activity completes.”

It is these “activities” that -performActivityWithSynchronousWaiting:usingBlock: deals with. An activity is a portion of the program’s execution beginning with the invocation of the activity block and ending with the invocation of the activityCompletionHandler that was passed to that block. (Those of you who are better at computer science than I am probably have a much better word to describe this concept.)

Activities exist to prevent attempts to display multiple document-modal UI sessions at the same time. The documentation does a fairly decent job of explaining how that situation can arise: as one example, a background autosave could fail and try to present a sheet alerting the user that their data is not safe while the user is looking at some other sheet.

NSDocument only allows one activity to be executing at a time; it conceptually maintains a serial queue of activities. One activity can only begin after the previous activity in the queue has called its activityCompletionHandler. When you call -performActivityWithSynchronousWaiting:usingBlock:, you are enqueuing an activity on that queue. If you pass YES for the first argument, -performActivityWithSynchronousWaiting: will block the current thread (the main thread, since this is a main-thread-only API) until the activity is dequeued and its block argument has been invoked. (See how two completely different meanings of the word “block” can appear in the same sentence?)

But what of -perform(A)SynchronousFileAccessUsingBlock:? As the documentation describes, this method exists to separate the “file-access” portion of the activity from the other things an activity might do, like presenting a sheet to gather necessary inputs or alert the user of an error.

In effect, -perform(A)SynchronousFileAccessUsingBlock: maintains another queue of “file accesses” that is a direct analogue to the queue of activities maintained by -performActivityWithSynchronousWaiting:! Conceptually, -performSynchronousFileAccessUsingBlock: maps to -performActivityWithSynchronousWaiting:YES, while -performAsynchronousFileAccessUsingBlock: maps to -performActivityWithSynchronousWaiting:NO. A “file access” begins when the file access block is invoked, and ends when either the file access block returns (when using -performSynchronousFileAccessUsingBlock:) or the fileAccessCompletionHandler is invoked (when using -performAsynchronousFileAccessUsingBlock:).

There is a further restriction that you cannot begin an activity from within a file access, but you can begin a file access from within an activity. You can also begin a file access from outside of any activity. Because file accesses are serialized, you can be assured that while within a file access, methods like -fileURL and -fileType will be consistent with each other; a background save operation will not be changing those values on another thread while you’re trying to use them to make decisions on the main thread.

So now, to my dilemma. As mentioned at the beginning of this post, -[NSFileManager setUbiquitous:itemAtURL:destinationURL:error:] performs a coordinated move from the source to the destination URL. That means I have to provide it a source URL: the document’s current -fileURL. In order to avoid being tripped up by race condition with a background save, it’s only safe to ask for the -fileURL from within a file-access block executed by -performAsynchronousFileAccessUsingBlock:. But I must call the fileAccessCompletionHandler before I call -[NSFileManager setUbiquitous:…], because otherwise NSDocument’s implementation of -relinquishPresentedItemToWriter: will call -performAsynchronousFileAccessUsingBlock:, which will enqueue a file access that will never be dequeued because I’m in the middle of performing the file-access that called -setUbiquitous: in the first place! And then -setUbiquitous: will never actually be able to perform its coordinated move because NSFileCoordinator will be sitting forever waiting for -[NSDocument relinquishPresentedItemToWriter:] to invoke its writer block argument.

Here’s what the code winds up looking like:

- (void)moveToCloud
{
  [self performActivityWithSynchronousWaiting:YES usingBlock:^(void (^activityCompletionHandler)(void)) {
    [self performAsynchronousFileAccessUsingBlock:^(void (^fileAccessCompletionHandler)(void)) {
      NSURL *localURL = [self fileURL];
      NSURL *cloudURL = CloudURLForDocument();

      fileAccessCompletionHandler();

      NSError *error;
      BOOL success = [[NSFileManager defaultManager] setUbiquitous:YES itemAtURL:localURL destinationURL:cloudURL error:&error];
      if (!success)
        [self presentError:error];

      activityCompletionHandler();
    }];
  }];
}

Because I must call the fileAccessCompletionHandler before calling -[NSFileManager setUbiquitous:…], I have now introduced a race condition. It’s possible for some other file-access block to execute in between my rescinding file access by calling the fileAccessCompletionHandler and calling -setUbiquitous:…. This file-access block could change the document’s -fileURL, making the argument that I pass to -setUbiquitous:… out of date and incorrect.

In the worst case scenario, some other app could attempt to move my document aside and replace it with a different file while my file-access block is executing. The other app is using NSFileCoordinator, which will call -relinquishPresentedItemToWriter: on my NSDocument, which will enqueue its own file-access block, causing the other application to block. My file-access block calls -fileURL to determine the document’s current URL, then calls its fileAccessCompletionHandler. This will cause -relinquishPresentedItemToWriter:’s file-access block to execute, which will then wake up the other application and allow it to move my file to another URL and write its own file in its place. Then my code resumes execution using the out-of-date URL, and moves the wrong file to iCloud. Some time later, my document gets a -presentedItemDidMoveToURL: to inform it that the other app has moved the document to a different location. Now my document window correctly reflects that it showing a file on the local disk, rather than the file the user thought they just moved to iCloud.

NSDocument has a method for dealing with reentrancy for activities: if an activity needs to call another method that might try to perform its own work within an activity, it can wrap the call to that method in a block passed to -continueActivityUsingBlock:. Then the inner invocation of -performActivityWithSynchronousWaiting: will simply execute its block argument rather than enqueue it as an activity. There is no equivalent reentrancy method for file access. I can’t wrap my call to -[NSFileManager setUbiquitous:…] in a hypothetical -continueFileAccessUsingBlock: that would allow the resulting invocation of -relinquishPresentedItemToWriter: to perform its work immediately.

Except that -continueFileAccessUsingBlock: does exist. It’s undocumented, private API on NSDocument. Since it’s undocumented, I’m not sure it would solve my problem. I also don’t know if using it would open up a different race condition whereby the autosave timer fires and attempts to enqueue its own file-access but instead unwittingly just marches right through and starts mucking with -fileURL and such on a background thread.

I don’t see a way out of this scenario with our current API. I don’t even know what API enhancements would be necessary to make it work. -[NSFileManager setUbiquitous:itemAtURL:…] is a black box; it (or more accurately NSFileCoordinator) would need to somehow inform the NSDocument file presenter methods that they are being called as a result of the document wanting to move to iCloud, not as a result of a separate app trying to perform operations on the file.

In the meantime, I think we’re going to have to ship our apps with a small but glaring race condition.