Hi there, I have a thread-blocking issue when using ParallelRenderCommandEncoder.
General description
I found that I can't allocate more than 64 encoders (including parallel encoders and normal encoders). Once the limit is reached, the allocating thread blocks waiting on a semaphore.
So I wonder whether anyone knows the mechanism behind this behavior. Is there any workaround for allocating more encoders?
Intended usage and a simpler sample
I use the parallel encoder to encode commands from multiple threads, which is what it is designed for. Modern games can have multiple render passes in one frame, and a parallel encoder has to be allocated per pass, so in general I need m * n + m encoders: m passes times n job threads, plus m parallel encoders. For example, with 8 job threads for parallel encoding, my game can have at most 7 passes (7 * 8 + 7 = 63 encoders).
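To make the arithmetic concrete, here is a small sketch of the encoder budget in plain C. The function names and the `encoder_limit` parameter are mine, purely for illustration, and 64 is only the limit I observed, not a documented value.

```c
/* Encoders needed for one frame: every pass uses one parallel encoder
   plus one sub-encoder per job thread, i.e. m * n + m. */
static int encoders_per_frame(int passes, int job_threads) {
    return passes * job_threads + passes;
}

/* Largest number of passes whose encoders still fit within the limit.
   Each pass costs (job_threads + 1) encoders, so this is an integer division.
   encoder_limit is an assumed, observed value (64 in my tests). */
static int max_passes(int encoder_limit, int job_threads) {
    return encoder_limit / (job_threads + 1);
}
```

With a limit of 64 and 8 job threads, `max_passes(64, 8)` is 7 and `encoders_per_frame(7, 8)` is 63, while an eighth pass would need 72.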
I initially assumed that once I end encoding after each pass, the encoders would be recycled, and I have never read about any limit on encoder allocation in any documentation. But my program ends up blocked on the encoder allocation semaphore after 64 encoders, even if I end all encoders after each pass. The count also seems to accumulate across two or three frames, which is really bad news. In my test, even if I allocate only 50 encoders per frame, it still gets stuck during the second frame. It only works with a very limited encoder count per frame.
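If I assume that nothing is returned to the pool between frames (this is only my guess at a model, not something I have confirmed), the frame in which I get stuck follows from a simple division. The helper name below is hypothetical.

```c
/* 1-based index of the first frame in which an allocation would block,
   ASSUMING no encoder slot is released between frames (an unconfirmed model).
   Blocking happens once the cumulative count exceeds encoder_limit.
   Returns 0 if per_frame is not positive, meaning it never blocks. */
static int first_blocking_frame(int encoder_limit, int per_frame) {
    if (per_frame <= 0) {
        return 0;
    }
    /* Smallest k such that k * per_frame > encoder_limit. */
    return encoder_limit / per_frame + 1;
}
```

For a limit of 64 and 50 encoders per frame this gives frame 2, which matches what I see.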
To better illustrate the problem, I narrowed the sample down to the simplest case: I added the following code to the official Metal sample project, and the problem still occurs.
```objc
// desc: the render pass descriptor already set up by the sample project.
id<MTLCommandBuffer> cb = [_commandQueue commandBuffer];
id<MTLParallelRenderCommandEncoder> parallelEncoder = [cb parallelRenderCommandEncoderWithDescriptor:desc];
// Allocate 64 sub-encoders from one parallel encoder, ending each one immediately.
for (uint i = 0; i < 64; i++)
{
    id<MTLRenderCommandEncoder> encoder = [parallelEncoder renderCommandEncoder];
    [encoder endEncoding];
}
[parallelEncoder endEncoding];
[cb commit];
```
This blocks my program. If I pause it in Xcode and inspect the thread's call stack, I see it blocked on a semaphore:
```
#0  0x00007fff6d8d9e36 in semaphore_wait_trap ()
#1  0x00000001002dbe1e in _dispatch_sema4_wait ()
#2  0x00000001002dc2f0 in _dispatch_semaphore_wait_slow ()
#3  0x00007fff38be1d85 in -[_MTLCommandBuffer initWithQueue:retainedReferences:synchronousDebugMode:] ()
#4  0x00007fff38be1ba7 in -[MTLIOAccelCommandBuffer initWithQueue:retainedReferences:synchronousDebugMode:] ()
#5  0x00007fff269c3f0f in -[BronzeMtlCmdBuffer initWithQueue:retainedReferences:synchronousDebugMode:] ()
#6  0x00007fff269c3ece in -[BronzeMtlCmdBuffer initWithQueue:retainedReferences:] ()
#7  0x00007fff269e912c in -[BronzeMtlCmdQueue commandBuffer] ()
#8  0x00007fff38c30d3c in -[_MTLParallelRenderCommandEncoder _renderCommandEncoderCommon] ()
#9  0x00007fff38c30e9c in -[_MTLParallelRenderCommandEncoder renderCommandEncoder] ()
#10 0x00007fff269e0a69 in -[BronzeMtlParallelRenderCmdEncoder renderCommandEncoder] ()
#11 0x00007fff5a2a16bc in -[MTLDebugParallelRenderCommandEncoder renderCommandEncoder] ()
#12 0x00007fff6a5397ed in _lldb_unnamed_symbol1149$$libMTLCapture.dylib ()
#13 0x0000000100005ece in -[Renderer drawInMTKView:] at /Users/panda/Desktop/test/test/Renderer.m:299
#14 0x00007fff38ca4849 in -[MTKView draw] ()
#15 0x00007fff38ca4728 in 23-[MTKView initCommon]_block_invoke ()
#16 0x00000001002db826 in _dispatch_client_callout ()
#17 0x00000001002de67d in _dispatch_continuation_pop ()
#18 0x00000001002f4635 in _dispatch_source_invoke ()
#19 0x00000001002eb275 in _dispatch_main_queue_callback_4CF ()
#20 0x00007fff33720e81 in CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE ()
#21 0x00007fff336e0c87 in CFRunLoopRun ()
#22 0x00007fff336dfe3e in CFRunLoopRunSpecific ()
#23 0x00007fff3230cabd in RunCurrentEventLoopInMode ()
#24 0x00007fff3230c7d5 in ReceiveNextEventCommon ()
#25 0x00007fff3230c579 in _BlockUntilNextEventMatchingListInModeWithFilter ()
#26 0x00007fff30952039 in _DPSNextEvent ()
#27 0x00007fff30950880 in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] ()
#28 0x00007fff3094258e in -[NSApplication run] ()
#29 0x00007fff30914396 in NSApplicationMain ()
#30 0x00000001000042cf in main at /Users/panda/Desktop/test/test/main.m:14
```
Other notes
Normal command encoders don't seem to be affected: if I keep my parallel encoder and its sub-encoders below 64, I can still allocate hundreds of normal command encoders from the command buffer in a for loop without any problem.
Does anyone have a clue why this is happening and how I should solve it?
Thanks for your help!