Execution time profiling of Metal compute kernels

In our use case, a background Mac app (running on an M1 Mac) receives data from a companion iOS app over a WebSocket connection (Apple's Swift API on the client side, Vapor 4 on the server side) and performs computations using the Metal compute APIs and our own custom kernels.

To optimize these compute kernels, we are looking for a way to profile their execution time, i.e. how much combined GPU time (compute and memory accesses) each kernel instance takes. Our primary focus is not the time spent waiting in the scheduling queues before execution begins, although that would be a helpful extra.

We are not sure whether Instruments in Xcode will help in this scenario, since the work is split across an iOS app, a third-party WebSocket library, and a background (command-line) Mac app. Also, does Metal frame capture depend on the Metal graphics APIs, so that it will not work for a background app? Can we get the information we need from GPU counter sample buffers, or are we looking in the wrong places?

Any help with measuring Metal compute kernel execution times in a background Mac app would be greatly appreciated.
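A minimal sketch of one way to get GPU execution time without Instruments or frame capture, assuming macOS 10.15 or later and one dispatch per command buffer: read `gpuStartTime` and `gpuEndTime` from the completed `MTLCommandBuffer`. The kernel `scale_kernel` and the helper `timeDispatch` are hypothetical placeholders for the app's own kernels and code. Nothing here uses a window, drawable, or render pipeline, so it should run in a background or command-line process.

```swift
import Foundation
import Metal

// Hypothetical stand-in for one of the custom compute kernels: doubles each element.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void scale_kernel(device float *data [[buffer(0)]],
                         uint gid [[thread_position_in_grid]])
{
    data[gid] = data[gid] * 2.0f;
}
"""

/// Hypothetical helper: runs one dispatch in its own command buffer and returns
/// - gpuMs:  gpuEndTime - gpuStartTime (GPU execution time of the whole command buffer)
/// - waitMs: gpuStartTime - kernelEndTime (rough estimate of time spent queued after
///           the CPU finished scheduling the command buffer)
func timeDispatch(queue: MTLCommandQueue,
                  pipeline: MTLComputePipelineState,
                  buffer: MTLBuffer,
                  count: Int) -> (gpuMs: Double, waitMs: Double)? {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return nil }
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    let groupWidth = min(pipeline.maxTotalThreadsPerThreadgroup, count)
    encoder.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                            threadsPerThreadgroup: MTLSize(width: groupWidth, height: 1, depth: 1))
    encoder.endEncoding()
    commandBuffer.commit()
    // The timestamp properties are only meaningful once the command buffer has completed.
    commandBuffer.waitUntilCompleted()
    guard commandBuffer.status == .completed else { return nil }
    // These properties are host times in seconds (CFTimeInterval).
    let gpuSeconds = commandBuffer.gpuEndTime - commandBuffer.gpuStartTime
    let waitSeconds = commandBuffer.gpuStartTime - commandBuffer.kernelEndTime
    return (gpuMs: gpuSeconds * 1000, waitMs: waitSeconds * 1000)
}

// Fall back to MTLCopyAllDevices() in case MTLCreateSystemDefaultDevice()
// returns nil in a command-line/background context.
guard let device = MTLCreateSystemDefaultDevice() ?? MTLCopyAllDevices().first,
      let queue = device.makeCommandQueue() else {
    fatalError("No Metal device available")
}

let library = try device.makeLibrary(source: kernelSource, options: nil)
guard let function = library.makeFunction(name: "scale_kernel") else {
    fatalError("scale_kernel not found")
}
let pipeline = try device.makeComputePipelineState(function: function)

let count = 1 << 20
let input = [Float](repeating: 1, count: count)
guard let buffer = device.makeBuffer(bytes: input,
                                     length: count * MemoryLayout<Float>.stride,
                                     options: .storageModeShared) else {
    fatalError("Buffer allocation failed")
}

// The first run may include one-time warm-up costs, so time several runs.
for run in 1...5 {
    if let t = timeDispatch(queue: queue, pipeline: pipeline, buffer: buffer, count: count) {
        print(String(format: "run %ld: GPU %.3f ms, queued ~%.3f ms", run, t.gpuMs, t.waitMs))
    }
}
```

Because `gpuStartTime`/`gpuEndTime` span the whole command buffer, this isolates a single kernel only when each dispatch gets its own command buffer, which adds some submission overhead. For finer granularity, an `MTLCounterSampleBuffer` using the timestamp counter set can be attached to a compute pass through `MTLComputePassDescriptor.sampleBufferAttachments` (`startOfEncoderSampleIndex`/`endOfEncoderSampleIndex`). As far as we know, Apple silicon GPUs support sampling only at stage boundaries (`MTLCounterSamplingPoint.atStageBoundary`), not per dispatch, so check `device.supportsCounterSampling(_:)` first.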