dispatch_sync vs. dispatch_async on main queue

Solution 1

This is a common issue related to disk I/O and GCD. Basically, GCD is probably spawning one thread for each file, and at a certain point you've got too many threads for the system to service in a reasonable amount of time.

Every time you call dispatch_async() and attempt any I/O inside that block (for example, it looks like you're reading some files here), the thread executing that block will likely block (get paused by the OS) while it waits for the data to be read from the filesystem. When GCD sees that one of its worker threads is blocked on I/O and you're still asking it to do more work concurrently, it simply spawns a new worker thread. So if you try to open 50 files on a concurrent queue, you'll likely cause GCD to spawn ~50 threads.

This is too many threads for the system to meaningfully service, and you end up starving your main thread for CPU.

The way to fix this is to use a serial queue instead of a concurrent queue to do your file-based operations. It's easy to do. You'll want to create a serial queue and store it as an ivar in your object so you don't end up creating multiple serial queues. So remove this call:

dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

Add this in your init method:

taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);

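Add this in your dealloc method (needed under manual reference counting; under ARC with a deployment target of iOS 6 / OS X 10.8 or later, dispatch queues are managed automatically and this call must be omitted):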

dispatch_release(taskQ);

And add this as an ivar in your class declaration:

dispatch_queue_t taskQ;
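
Put together, the pieces above might look like the following minimal sketch. The class name LPImportScanner and the method scanFileAtPath: are hypothetical and only there to make the example self-contained; the ivar, the queue label, and the dispatch_queue_create() / dispatch_release() calls are the ones from this answer. The dealloc body is guarded so the same file builds with or without ARC, assuming a deployment target where ARC manages dispatch objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class; only taskQ and the queue label come from the answer.
@interface LPImportScanner : NSObject
{
    dispatch_queue_t taskQ; // one serial queue per scanner instance
}
- (void)scanFileAtPath:(NSString *)path;
@end

@implementation LPImportScanner

- (id)init
{
    self = [super init];
    if (self) {
        // Created once, so every file read is funneled through the same queue.
        taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel",
                                      DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)dealloc
{
#if !__has_feature(objc_arc)
    // Manual reference counting: release the queue ourselves.
    dispatch_release(taskQ);
    [super dealloc];
#endif
    // Under ARC (iOS 6 / OS X 10.8 or later) the queue is released automatically.
}

// Hypothetical helper: reads one file on the serial queue.
- (void)scanFileAtPath:(NSString *)path
{
    dispatch_async(taskQ, ^{
        // Blocks on a serial queue run one at a time, so this queue never has
        // more than one block waiting on the disk.
        NSString *text = [NSString stringWithContentsOfFile:path
                                                   encoding:NSUTF8StringEncoding
                                                      error:NULL];
        NSLog(@"Read %lu characters from %@", (unsigned long)[text length], path);
    });
}

@end
```

Calling scanFileAtPath: in a loop over many paths queues the reads one after another instead of fanning them out across a concurrent queue.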

Solution 2

I believe Ryan is on the right path: there are simply too many threads being spawned when a project has 1,500 files (the number I decided to test with).

So, I refactored the code above to work like this:

- (void)establishImportLinksForFilesInProject:(LPProject *)aProject
{
    dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // Capture the project's objectID on the calling thread; the managed object
    // itself should not be touched from the background thread.
    NSManagedObjectID *projectID = [aProject objectID];

    dispatch_async(taskQ, ^{
        // Create a new Core Data context on this thread using the same persistent store
        // as the main thread (creation omitted here; it is called 'backgroundContext' below).
        // Use projectID to access the managed object for that project in this thread's context.
        LPProject *project = (LPProject *)[backgroundContext objectWithID:projectID];

        for (LPFile *fileToCheck in [project memberFiles])
        {
            if (/* some condition is met */)
            {
                // Here, we do the scanning for @import statements.
                // When we find a valid one, we put the whole path to the
                // imported file into an array called 'verifiedImports'.

                // Pass this ID to the main thread in the dispatch call below to access
                // the same file in the main thread's context.
                NSManagedObjectID *fileID = [fileToCheck objectID];

                // Go back to the main thread and update the model
                // (Core Data is not thread-safe).
                dispatch_async(dispatch_get_main_queue(), ^{
                    for (NSString *import in verifiedImports)
                    {
                        LPFile *targetFile = (LPFile *)[mainContext objectWithID:fileID];
                        // Add the relationship to targetFile.
                    }
                }); // end main-queue block
            }
        }
        // Easy way to tell when we're done processing all files.
        // Could add a dispatch_async(main_queue) call here to do something like UI updates, etc.
    }); // end background block
}

So, basically, we're now spawning one thread that reads all the files instead of one-thread-per-file. Also, it turns out that calling dispatch_async() on the main_queue is the correct approach: the worker thread will dispatch that block to the main thread and NOT wait for it to return before proceeding to scan the next file.
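
To make the difference between the two calls concrete, here is a small command-line sketch. It uses a private serial queue (the hypothetical label com.example.model) as a stand-in for the main queue so it can run without an application run loop; the sleep is only there to simulate a slow model update.

```objc
#import <Foundation/Foundation.h>

int main(void)
{
    @autoreleasepool {
        // Hypothetical stand-in for the main queue (assumption for this sketch).
        dispatch_queue_t modelQ = dispatch_queue_create("com.example.model",
                                                        DISPATCH_QUEUE_SERIAL);

        // dispatch_async returns immediately; the caller does not wait for the block.
        dispatch_async(modelQ, ^{
            [NSThread sleepForTimeInterval:0.5]; // pretend the model update is slow
            NSLog(@"async block finished");
        });
        NSLog(@"after dispatch_async (logged without waiting)");

        // dispatch_sync returns only after its block has run. Because modelQ is
        // serial, that also means waiting for the async block queued before it.
        dispatch_sync(modelQ, ^{
            NSLog(@"sync block finished");
        });
        NSLog(@"after dispatch_sync (logged only once the block has run)");
    }
    return 0;
}
```

With dispatch_sync() the calling worker thread stays parked until the target queue gets around to running the block, which is exactly the waiting that the dispatch_async() version avoids.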

This implementation essentially sets up a "serial" queue as Ryan suggested (the for loop is the serial part of it), but with one advantage: when the for loop ends, we're done processing all the files and we can just stick a dispatch_async(main_queue) block there to do whatever we want. It's a very nice way to tell when the concurrent processing task is finished and that didn't exist in my old version.
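
The "done" notification relies on the main queue being a serial, first-in-first-out queue: a block submitted after the loop runs only after every per-file block submitted before it from the same thread. A minimal sketch of that ordering, with an integer loop standing in for the files (an assumption for illustration) and dispatch_main() standing in for an application's run loop:

```objc
#import <Foundation/Foundation.h>

int main(void)
{
    dispatch_queue_t taskQ =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    dispatch_async(taskQ, ^{
        for (int i = 0; i < 5; i++) {
            // Stand-in for scanning one file, then handing the result
            // to the main queue without waiting for it.
            dispatch_async(dispatch_get_main_queue(), ^{
                NSLog(@"Updated model for file %d", i);
            });
        }

        // Submitted last from this thread, so it runs after all the updates above.
        dispatch_async(dispatch_get_main_queue(), ^{
            NSLog(@"All files processed");
            exit(0);
        });
    });

    // Parks the main thread and lets GCD drain the main queue.
    // In a Cocoa app, the main run loop does this job instead.
    dispatch_main();
}
```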

The disadvantage here is that it's a bit more complicated to work with Core Data on multiple threads. But this approach seems to be bulletproof for projects with 5,000 files (the largest I've tested).

Author by

Bryan

A Mac developer with a finance MBA who flies airplanes upside down competitively.

Updated on July 09, 2022

Comments

  • Bryan
    Bryan almost 2 years

    Bear with me, this is going to take some explaining. I have a function that looks like the one below.

    Context: "aProject" is a Core Data entity named LPProject with an array named 'memberFiles' that contains instances of another Core Data entity called LPFile. Each LPFile represents a file on disk and what we want to do is open each of those files and parse its text, looking for @import statements that point to OTHER files. If we find @import statements, we want to locate the file they point to and then 'link' that file to this one by adding a relationship to the core data entity that represents the first file. Since all of that can take some time on large files, we'll do it off the main thread using GCD.

    - (void)establishImportLinksForFilesInProject:(LPProject *)aProject {
        dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        for (LPFile *fileToCheck in aProject.memberFiles) {
            if (/* some condition is met */) {
                dispatch_async(taskQ, ^{
                    // Here, we do the scanning for @import statements.
                    // When we find a valid one, we put the whole path to the imported file into an array called 'verifiedImports'.

                    // Go back to the main thread and update the model (Core Data is not thread-safe).
                    dispatch_sync(dispatch_get_main_queue(), ^{

                        NSLog(@"Got to main thread.");

                        for (NSString *import in verifiedImports) {
                            // Add the relationship to Core Data LPFile entity.
                        }
                    }); // end main-queue block
                }); // end background block
            }
        }
    }

    Now, here's where things get weird:

    This code works, but I'm seeing an odd problem. If I run it on an LPProject that has a few files (about 20), it runs perfectly. However, if I run it on an LPProject that has more files (say, 60-70), it does NOT run correctly. We never get back to the main thread, the NSLog(@"Got to main thread."); output never appears, and the app hangs. BUT (and this is where things get REALLY weird) if I run the code on the small project FIRST and THEN run it on the large project, everything works perfectly. It's ONLY when I run the code on the large project first that the trouble shows up.

    And here's the kicker: if I change the second dispatch line to this:

    dispatch_async(dispatch_get_main_queue(), ^{
    

    (That is, use async instead of sync to dispatch the block to the main queue), everything works all the time. Perfectly. Regardless of the number of files in a project!

    I'm at a loss to explain this behavior. Any help or tips on what to test next would be appreciated.

    • Bryan
      Bryan almost 13 years
      Note: I've redacted the "scanning" and "Core Data entry" code fragments for brevity. I'm almost certain they aren't the culprits, however, because they work perfectly if I put everything on a single thread AND they work perfectly in the multi-thread situations described above ("warming up" everything by running a small project first and/or using dispatch_async() on the main queue instead of dispatch_sync()).
    • Dave DeLong
      Dave DeLong almost 13 years
      Sounds like you're hitting a deadlock issue
    • FrancesSun
      FrancesSun almost 13 years
      You should run sample or instruments against your application when it is in this state to see what the other threads are all doing. If they're deadlocked, what's happening should be much more apparent.
    • ImHuntingWabbits
      ImHuntingWabbits almost 13 years
      Where is NSManagedObjectContext -save called? Do you have an observer of that notification that is forcing its response to the main thread using performSelectorOnMainThread?
    • Ryan
      Ryan almost 13 years
      This question should be edited to indicate where individual file I/O is happening vs. where CoreData queries are happening. As it stands, it is misleading.
    • Bryan
      Bryan almost 13 years
      @wabbits: -save is called before I enter this method. Pushing any changes in the main thread's managedObjectContext to the persistent store before we create a new context in a background thread means the background context is an exact replica of the main thread's context. I never change any managedObjects in the background context, so I never need to push any changes back to the main thread's context. I simply go back to the main thread and make the changes to THAT context.
  • jscs
    jscs almost 13 years
    Mike Ash also has an excellent writeup about this problem: mikeash.com/pyblog/friday-qa-2009-09-25-gcd-practicum.html
  • Bryan
    Bryan almost 13 years
    @Ryan - Thanks for the input. This had occurred to me as well, but if the problem was too many concurrent threads, we would expect the large project to fail EVERY time. In this case, it WORKS, as long as I run the code on a smaller project first. (Note that the two projects are completely separate files, so no files are cached, etc.)
  • Ryan
    Ryan almost 13 years
    Yeah, your original question didn't make clear that your "files" were really CoreData objects. That is a completely different type of issue. My response dealt with actual file I/O. At this point I realize I can't actually tell what you're doing without seeing a full source code listing. I'm not sure when you're doing file I/O or reading data out of CoreData. Feel free to list the source for what you're doing if you'd like more input.
  • byJeevan
    byJeevan over 9 years
    What if automatic reference counting is enabled?