Why can't we use a dispatch_sync on the current queue?

32,254

Solution 1

I found this in the documentation (last chapter):

Do not call the dispatch_sync function from a task that is executing on the same queue that you pass to your function call. Doing so will deadlock the queue. If you need to dispatch to the current queue, do so asynchronously using the dispatch_async function.

Also, I followed the link that you provided and in the description of dispatch_sync I read this:

Calling this function and targeting the current queue results in deadlock.

So I don't think it's a problem with GCD, I think the only sensible approach is the one you invented after discovering the problem.

Solution 2

dispatch_sync does two things:

  1. queue a block
  2. blocks the current thread until the block has finished running

Given that the main thread is a serial queue (which means it uses only one thread), if you run the following statement on the main queue:

dispatch_sync(dispatch_get_main_queue(), ^(){/*...*/});

the following events will happen:

  1. dispatch_sync queues the block in the main queue.
  2. dispatch_sync blocks the thread of the main queue until the block finishes executing.
  3. dispatch_sync waits forever because the thread where the block is supposed to run is blocked.

The key to understanding this issue is that dispatch_sync does not execute blocks, it only queues them. Execution will happen on a future iteration of the run loop.

The following approach:

if (queueA == dispatch_get_current_queue()){
    block();
} else {
    dispatch_sync(queueA, block);
}

is perfectly fine, but be aware that it won't protect you from complex scenarios involving a hierarchy of queues. In such case, the current queue may be different than a previously blocked queue where you are trying to send your block. Example:

dispatch_sync(queueA, ^{
    dispatch_sync(queueB, ^{
        // dispatch_get_current_queue() is B, but A is blocked, 
        // so a dispatch_sync(A,b) will deadlock.
        dispatch_sync(queueA, ^{
            // some task
        });
    });
});

For complex cases, read/write key-value data in the dispatch queue:

dispatch_queue_t workerQ = dispatch_queue_create("com.meh.sometask", NULL);
dispatch_queue_t funnelQ = dispatch_queue_create("com.meh.funnel", NULL);
dispatch_set_target_queue(workerQ,funnelQ);

static int kKey;
 
// saves string "funnel" in funnelQ
CFStringRef tag = CFSTR("funnel");
dispatch_queue_set_specific(funnelQ, 
                            &kKey,
                            (void*)tag,
                            (dispatch_function_t)CFRelease);

dispatch_sync(workerQ, ^{
    // is funnelQ in the hierarchy of workerQ?
    CFStringRef tag = dispatch_get_specific(&kKey);
    if (tag){
        dispatch_sync(funnelQ, ^{
            // some task
        });
    } else {
        // some task
    }
});

Explanation:

  • I create a workerQ queue that points to a funnelQ queue. In real code this is useful if you have several “worker” queues and you want to resume/suspend all at once (which is achieved by resuming/updating their target funnelQ queue).
  • I may funnel my worker queues at any point in time, so to know if they are funneled or not, I tag funnelQ with the word "funnel".
  • Down the road I dispatch_sync something to workerQ, and for whatever reason I want to dispatch_sync to funnelQ, but avoiding a dispatch_sync to the current queue, so I check for the tag and act accordingly. Because the get walks up the hierarchy, the value won't be found in workerQ but it will be found in funnelQ. This is a way of finding out if any queue in the hierarchy is the one where we stored the value. And therefore, to prevent a dispatch_sync to the current queue.

If you are wondering about the functions that read/write context data, there are three:

  • dispatch_queue_set_specific: Write to a queue.
  • dispatch_queue_get_specific: Read from a queue.
  • dispatch_get_specific: Convenience function to read from the current queue.

The key is compared by pointer, and never dereferenced. The last parameter in the setter is a destructor to release the key.

If you are wondering about “pointing one queue to another”, it means exactly that. For example, I can point a queue A to the main queue, and it will cause all blocks in the queue A to run in the main queue (usually this is done for UI updates).

Solution 3

I know where your confusion comes from:

As an optimization, this function invokes the block on the current thread when possible.

Careful, it says current thread.

Thread != Queue

A queue doesn't own a thread and a thread is not bound to a queue. There are threads and there are queues. Whenever a queue wants to run a block, it needs a thread but that won't always be the same thread. It just needs any thread for it (this may be a different one each time) and when it's done running blocks (for the moment), the same thread can now be used by a different queue.

The optimization this sentence talks about is about threads, not about queues. E.g. consider you have two serial queues, QueueA and QueueB and now you do the following:

dispatch_async(QueueA, ^{
    someFunctionA(...);
    dispatch_sync(QueueB, ^{
        someFunctionB(...);
    });
});

When QueueA runs the block, it will temporarily own a thread, any thread. someFunctionA(...) will execute on that thread. Now while doing the synchronous dispatch, QueueA cannot do anything else, it has to wait for the dispatch to finish. QueueB on the other hand, will also need a thread to run its block and execute someFunctionB(...). So either QueueA temporarily suspends its thread and QueueB uses some other thread to run the block or QueueA hands its thread over to QueueB (after all it won't need it anyway until the synchronous dispatch has finished) and QueueB directly uses the current thread of QueueA.

Needless to say that the last option is much faster as no thread switch is required. And this is the optimization the sentence talks about. So a dispatch_sync() to a different queue may not always cause a thread switch (different queue, maybe same thread).

But a dispatch_sync() still cannot happen to the same queue (same thread, yes, same queue, no). That's because a queue will execute block after block and when it currently executes a block, it won't execute another one until the currently executed is done. So it executes BlockA and BlockA does a dispatch_sync() of BlockB on the same queue. The queue won't run BlockB as long as it still runs BlockA, but running BlockA won't continue until BlockB has ran. See the problem? It's a classical deadlock.

Solution 4

The documentation clearly states that passing the current queue will cause a deadlock.

Now they don’t say why they designed things that way (except that it would actually take extra code to make it work), but I suspect the reason for doing things this way is because in this special case, blocks would be “jumping” the queue, i.e. in normal cases your block ends up running after all the other blocks on the queue have run but in this case it would run before.

This problem arises when you are trying to use GCD as a mutual exclusion mechanism, and this particular case is equivalent to using a recursive mutex. I don’t want to get into the argument about whether it’s better to use GCD or a traditional mutual exclusion API such as pthreads mutexes, or even whether it’s a good idea to use recursive mutexes; I’ll let others argue about that, but there is certainly a demand for this, particularly when it’s the main queue that you’re dealing with.

Personally, I think that dispatch_sync would be more useful if it supported this or if there was another function that provided the alternate behaviour. I would urge others that think so to file a bug report with Apple (as I have done, ID: 12668073).

You can write your own function to do the same, but it’s a bit of a hack:

// Like dispatch_sync but works on current queue
static inline void dispatch_synchronized (dispatch_queue_t queue,
                                          dispatch_block_t block)
{
  dispatch_queue_set_specific (queue, queue, (void *)1, NULL);
  if (dispatch_get_specific (queue))
    block ();
  else
    dispatch_sync (queue, block);
}

N.B. Previously, I had an example that used dispatch_get_current_queue() but that has now been deprecated.

Solution 5

Both dispatch_async and dispatch_sync perform push their action onto the desired queue. The action does not happen immediately; it happens on some future iteration of the run loop of the queue. The difference between dispatch_async and dispatch_sync is that dispatch_sync blocks the current queue until the action finishes.

Think about what happens when you execute something asynchronously on the current queue. Again, it does not happen immediately; it puts it in a FIFO queue, and it has to wait until after the current iteration of the run loop is done (and possibly also wait for other actions that were in the queue before you put this new action on).

Now you might ask, when performing an action on the current queue asynchronously, why not always just call the function directly, instead of wait until some future time. The answer is that there is a big difference between the two. A lot of times, you need to perform an action, but it needs to be performed after whatever side effects are performed by functions up the stack in the current iteration of the run loop; or you need to perform your action after some animation action that is already scheduled on the run loop, etc. That's why a lot of times you will see the code [obj performSelector:selector withObject:foo afterDelay:0] (yes, it's different from [obj performSelector:selector withObject:foo]).

As we said before, dispatch_sync is the same as dispatch_async, except that it blocks until the action is completed. So it's obvious why it would deadlock -- the block cannot execute until at least after the current iteration of the run loop is finished; but we are waiting for it to finish before continuing.

In theory it would be possible to make a special case for dispatch_sync for when it is the current thread, to execute it immediately. (Such a special case exists for performSelector:onThread:withObject:waitUntilDone:, when the thread is the current thread and waitUntilDone: is YES, it executes it immediately.) However, I guess Apple decided that it was better to have consistent behavior here regardless of queue.

Share:
32,254
Richard J. Ross III
Author by

Richard J. Ross III

I am Richard J. Ross III, and am an experienced iOS, android, and .NET programmer, having written apps, open-source libraries, and tools.

Updated on December 08, 2020

Comments

  • Richard J. Ross III
    Richard J. Ross III over 3 years

    I ran into a scenario where I had a delegate callback which could occur on either the main thread or another thread, and I wouldn't know which until runtime (using StoreKit.framework).

    I also had UI code that I needed to update in that callback which needed to happen before the function executed, so my initial thought was to have a function like this:

    -(void) someDelegateCallback:(id) sender
    {
        dispatch_sync(dispatch_get_main_queue(), ^{
            // ui update code here
        });
    
        // code here that depends upon the UI getting updated
    }
    

    That works great, when it is executed on the background thread. However, when executed on the main thread, the program comes to a deadlock.

    That alone seems interesting to me, if I read the docs for dispatch_sync right, then I would expect it to just execute the block outright, not worrying about scheduling it into the runloop, as said here:

    As an optimization, this function invokes the block on the current thread when possible.

    But, that's not too big of a deal, it simply means a bit more typing, which lead me to this approach:

    -(void) someDelegateCallBack:(id) sender
    {
        dispatch_block_t onMain = ^{
            // update UI code here
        };
    
        if (dispatch_get_current_queue() == dispatch_get_main_queue())
           onMain();
        else
           dispatch_sync(dispatch_get_main_queue(), onMain);
    }
    

    However, this seems a bit backwards. Was this a bug in the making of GCD, or is there something that I am missing in the docs?