Metal Frame Capture Debugging Tools shader time isn't consistent with CommandBuffer's GPUTime

I am profiling a Metal shader using both the Frame Capture debugging tools and the command buffer's addCompletedHandler: method.

The frame capture tool reports that my shader takes about 2.7 ms (it varies between 2.5 ms and 3 ms across multiple runs).

For the command buffer, I use the following code:

[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    // GPUStartTime and GPUEndTime are in seconds
    CFTimeInterval executionDuration = cb.GPUEndTime - cb.GPUStartTime;
    // print executionDuration to console
    NSLog(@"GPU time: %.3f ms", executionDuration * 1000.0);
}];

executionDuration comes out at about 5 ms. What accounts for the difference between these two durations?



Is this on iOS? The frame capture times also measure the GPU command buffer execution time. However, the frame capture tools will lock power states so that you get consistent measurements and reliably see how changes affect performance.

The time calculated with (GPUEndTime - GPUStartTime) should only match or beat these times when the device is in its highest power state.
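Because a single (GPUEndTime - GPUStartTime) reading depends on whatever power state the device happens to be in, it can help to look at many readings rather than one. Here is a minimal sketch of that idea. GPUTimeSampler and its method names are hypothetical helpers made up for this example, not part of Metal; the only Metal calls it relies on are addCompletedHandler:, GPUStartTime and GPUEndTime.

```objective-c
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical helper (not a Metal class): collects per-command-buffer
// GPU durations in milliseconds and logs a simple summary.
@interface GPUTimeSampler : NSObject
- (void)trackCommandBuffer:(id<MTLCommandBuffer>)commandBuffer;
- (void)logSummary;
@end

@implementation GPUTimeSampler {
    NSMutableArray<NSNumber *> *_samplesMs;
}

- (instancetype)init {
    if ((self = [super init])) {
        _samplesMs = [NSMutableArray array];
    }
    return self;
}

// Call this before -commit; handlers cannot be added after committing.
- (void)trackCommandBuffer:(id<MTLCommandBuffer>)commandBuffer {
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
        double ms = (cb.GPUEndTime - cb.GPUStartTime) * 1000.0;
        // Handlers run on a thread chosen by Metal, so guard the array.
        @synchronized (self) {
            [self->_samplesMs addObject:@(ms)];
        }
    }];
}

// Logs min / median / max of everything recorded so far.
- (void)logSummary {
    NSArray<NSNumber *> *sorted;
    @synchronized (self) {
        sorted = [_samplesMs sortedArrayUsingSelector:@selector(compare:)];
    }
    if (sorted.count == 0) {
        return; // nothing recorded yet
    }
    NSLog(@"GPU time over %lu buffers: min %.2f ms, median %.2f ms, max %.2f ms",
          (unsigned long)sorted.count,
          sorted.firstObject.doubleValue,
          sorted[sorted.count / 2].doubleValue,
          sorted.lastObject.doubleValue);
}
@end
```

Typical use would be to call trackCommandBuffer: on each frame's command buffer and logSummary every few hundred frames. If the minimum sits near the frame capture figure while the median is well above it, that would be consistent with the device spending most of its time in a lower power state.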
@Graphics and Games Engineer 
I am using frame capture on iOS (iPhone XR), and I use it to capture command data programmatically.

the frame capture tools will lock power states

I don't understand what "lock power states" means. Do you mean that frame capture locks the GPU in its highest power state? Is there any programmatic way to put the iPhone's GPU in its highest power state?

I also have another hypothesis: that iOS limits a single app's Metal usage to a certain percentage of GPU time (maybe 50%?).
Accepted Answer
Xcode does not lock the GPU to the highest thermal (aka power) state, but it does lock it to a constant power state when analyzing performance, and to the same one from run to run.

There is intentionally no API to lock any part of the system to a particular thermal state, as that would likely be misused and damage user devices.

iOS does not limit an app's GPU utilization to any particular percentage. However, there are many factors beyond an app's control that can yield inconsistent performance. Variability of device thermals is often the culprit. There is some discussion on dealing with this variability at the beginning of this WWDC presentation from 2019.
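One way to see whether thermals line up with the timing swings is to log the system thermal state next to the GPU timings. This is only a rough sketch: NSProcessInfo's thermalState is a coarse, system-wide indicator and does not expose the GPU's actual power state or clock, so treat it as a hint rather than a measurement. ThermalStateName and StartLoggingThermalState are hypothetical helper names made up for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: readable name for a thermal state value.
static NSString *ThermalStateName(NSProcessInfoThermalState state) {
    switch (state) {
        case NSProcessInfoThermalStateNominal:  return @"nominal";
        case NSProcessInfoThermalStateFair:     return @"fair";
        case NSProcessInfoThermalStateSerious:  return @"serious";
        case NSProcessInfoThermalStateCritical: return @"critical";
    }
    return @"unknown";
}

// Hypothetical helper: logs the current thermal state and every later change.
// Keep the returned token alive, and pass it to -removeObserver: to stop.
static id<NSObject> StartLoggingThermalState(void) {
    NSLog(@"Initial thermal state: %@",
          ThermalStateName(NSProcessInfo.processInfo.thermalState));
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:NSProcessInfoThermalStateDidChangeNotification
                    object:nil
                     queue:[NSOperationQueue mainQueue]
                usingBlock:^(NSNotification *note) {
        NSLog(@"Thermal state changed: %@",
              ThermalStateName(NSProcessInfo.processInfo.thermalState));
    }];
}
```

If the longer GPU times cluster around periods where the state has moved away from nominal, thermals are a likely contributor. If the state never changes, the difference may simply come from the GPU running at a lower power state than the one the capture tools lock it to.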