Re: New LLVM compiler introduces hang using gcd

Question

Created Jun ’15

Replies 7

Participants 4

I originally posted this problem in the old devforums, and now have a reduced test case.

I still don't understand what the fundamental issue is here, but the new compiler somehow seems to allow a block to be added to a GCD queue twice in some situations. In the following code, the 10 worker blocks each do their work and the parent block waits until they're all done. This used to work really well (through Xcode 5.1.1), but now, occasionally, one of the worker blocks gets added to the queue *twice*.

Jeffrey

#import <Foundation/Foundation.h>

// Each worker is a block that takes no arguments and returns nothing.
typedef void (^workerType)(void);

int main(int argc, const char * argv[]) {
    @autoreleasepool {

        dispatch_group_t group = dispatch_group_create();
        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);
        NSUInteger totalWorkers = 10;

        // Build the workers; each loop iteration gets its own __block counter.
        NSMutableArray *workers = [[NSMutableArray alloc] init];
        for (NSUInteger iWorker=0; iWorker<totalWorkers; iWorker++) {
            __block int numThreads = 0;
            workerType worker = ^{
                numThreads++;
                if (numThreads>1) {
                    NSLog(@"Whoops! Called by two threads");
                }
                // Busy work standing in for the real computation.
                int y = 0;
                for (int i=0; i<1000; i++) {
                    y += 1;
                }
                dispatch_group_leave(group);   // tell the parent this worker is done
                numThreads--;
            };

            [workers addObject: worker];
        }

        // Enter the group once per worker, dispatch every worker, then wait for all of them.
        void (^parent)(void) = ^{
            for (NSUInteger i=0; i<totalWorkers; i++) {
                dispatch_group_enter(group);
            }

            for ( workerType aWorker in workers ) {
                dispatch_async( queue, ^{
                    aWorker();
                });
            }

            dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
        };

        // Run the parent forever, logging progress every 10,000 iterations.
        NSUInteger n = 0;
        while (1) {
            parent();
            n++;
            if (n%10000==0) {
                NSLog(@"Looped %lu", (unsigned long)n);
            }
        }

    }
    return 0;
}

Answer 1

Apple

Jun ’15

Why would you expect that two of them wouldn't run at the same time? (That's exactly what you've asked GCD to do: run them all concurrently.) I'm actually surprised that it only happens "occasionally". You also have no synchronization around the numThreads variable, so it is not being accessed in a thread-safe manner.

If you really don't want them to execute concurrently, use a serial queue instead of one of the global (concurrent) queues.

Answer 2

JeffreyEarly OP

Jun ’15

You misunderstand what the code is doing.

The ten worker threads are (and should be) executing concurrently. That's why they are added with dispatch_async to a concurrent queue.

What is happening (and should not be) is that the *same* block (the same worker) is being added to the queue twice in that for-loop.

Note that each block gets its own copy of the __block int numThreads variable, because the variable is declared inside the loop and its enclosing block is copied to the heap. So you should never see numThreads above 1.

Answer 3

JeffreyEarly OP

Jun ’15

I submitted this as rdar://21439069

Answer 4

CC-Dog

Jun ’15

I suggest trying dispatch_apply as a workaround. It works perfectly for me.

Answer 5

JeffreyEarly OP

Jun ’15

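The fundamental issue appears to be that numThreads-- needs to be placed before the call to dispatch_group_leave(). This should have been obvious: dispatch_group_leave() is what tells the group that the 'work' is complete. Once the last worker leaves, dispatch_group_wait() returns, and the next call to parent() can dispatch the same worker block again before the previous invocation has reached numThreads--. Thanks to an Apple engineer for figuring this out.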

Unfortunately, this means I did not manage to create a reduced test case for my issue. I'm still getting inexplicable hangs in dispatch_async() starting with Xcode 6.

Jeffrey

Answer 6

JeffreyEarly OP

Jun ’15

With my production code I can reproduce the hang fairly regularly. It always hangs on the first line of this code:

dispatch_async( globalQueue, ^{
  anExecutionBlock( dataBuffers );
  });

And it can be the only thread running apart from the main thread and an NSOperation thread. The backtrace of the hung thread looks like this:

* thread #7: tid = 0x19de87, 0x00000001000322cc libBacktraceRecording.dylib`gcd_queue_item_enqueue_hook + 563, queue = 'com.earlyinnovations.OperationOptimizer'
    frame #0: 0x00000001000322cc libBacktraceRecording.dylib`gcd_queue_item_enqueue_hook + 563
    frame #1: 0x000000010072f75f libdispatch.dylib`_dispatch_introspection_queue_item_enqueue_hook + 46
    frame #2: 0x000000010071101e libdispatch.dylib`_dispatch_async_f_redirect + 791
  * frame #3: 0x000000010007be33 GLNumericalModelingKit.dylib`__90-[GLOperationOptimizer createExecutionBlockFromOperation:forTopVariables:bottomVariables:]_block_invoke(.block_descriptor=0x0000000115bebe50, dataBuffers=0x0000000115e2c760) + 531 at GLOperationOptimizer.m:754
etc.

And gcd_queue_item_enqueue_hook is just stuck in an infinite loop on this:

->  0x1000322c3 <+554>: incl   %ebx
    0x1000322c5 <+556>: movq   0x1078(%rax), %rax
    0x1000322cc <+563>: testq  %rax, %rax
    0x1000322cf <+566>: jne    0x1000322c3               ; <+554>

which I don't understand.

If I ask the queue to describe itself, I get this:

<OS_dispatch_queue: com.earlyinnovations.OperationOptimizer[0x115bd8800] = { xrefcnt = 0x8f, refcnt = 0x19, suspend_cnt = 0x0, locked = 0, target = com.apple.root.default-qos[0x100744d80], width = 0x7fff, running = 0x4, barrier = 0 }>

And so yeah, there my code sits completely hung.

I'm completely stumped on this one.

Answer 7

Systems Engineer

Apple

Jul ’15

The occurrence of libBacktraceRecording.dylib in the backtrace indicates that the hang is related to the Xcode queue debugging feature, which injects this library into the process and uses a different version of libdispatch.dylib that has additional introspection hooks.

If you run your app outside of Xcode (or attach to it after it is started from Finder) this code will not be present and the hang should go away.

Please make sure to file a new radar with the specific backtraces that you see during this hang, and if possible your production code that reproduces it.
