Hi, I want to query the GPU device's available memory in code, so that I can calculate the sizes of the buffers and textures I will create with Metal. So far I have only found MTLDevice's currentAllocatedSize property, which shows the memory already allocated by this process.
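A minimal sketch of what I have so far (assuming the default system device; recommendedMaxWorkingSetSize is the closest-sounding property I found, but I am not sure it reflects truly available memory, and it may not exist on older OS versions):

id<MTLDevice> device = MTLCreateSystemDefaultDevice();
// Memory currently allocated on the device by this process.
NSUInteger allocated = device.currentAllocatedSize;
// The upper bound Metal recommends for this app's working set --
// not the same thing as free device memory, as far as I can tell.
uint64_t recommended = device.recommendedMaxWorkingSetSize;
NSLog(@"allocated: %lu bytes, recommended max working set: %llu bytes",
      (unsigned long)allocated, recommended);

Is there a property or API that reports the memory actually available to allocate?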
I profiled a Metal shader using both the Frame Capture debugging tools and the command buffer's addCompletedHandler method.
The frame capture tool shows that my shader takes about 2.7 ms (varying between 2.5 ms and 3 ms across multiple runs).
With the command buffer I use the following code:
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
		// GPUEndTime - GPUStartTime spans the whole command buffer's GPU execution.
		CFTimeInterval executionDuration = cb.GPUEndTime - cb.GPUStartTime;
		NSLog(@"GPU execution duration: %f s", executionDuration);
}];
The executionDuration is about 5 ms. What is the difference between these two durations?
I have a model containing convolution layers, LeakyReLU activations, and one concat layer. I converted it to .mlmodel format for inference on iPhone 12. When the model runs on the GPU, the result differs slightly from the CPU result but is still acceptable. When it uses the ANE (Apple Neural Engine), the result differs significantly from both the CPU and GPU results.
I have tried replacing LeakyReLU with ReLU, but the problem persists. The only difference between my CPU, GPU, and ANE inference runs is switching the compute units between MLComputeCPUOnly, MLComputeCPUAndGPU, and MLComputeAll.
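For reference, this is roughly how I switch compute units (a sketch; "MyModel" is a placeholder for the class Xcode generates from my .mlmodel):

MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
// Switched between MLComputeUnitsCPUOnly, MLComputeUnitsCPUAndGPU,
// and MLComputeUnitsAll for the three runs.
config.computeUnits = MLComputeUnitsAll;
NSError *error = nil;
// "MyModel" stands in for the generated model class.
MyModel *model = [[MyModel alloc] initWithConfiguration:config error:&error];

Is the ANE expected to produce such different results, and is there a way to control its precision?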