Does MetalFX utilize the NPU?

I heard that MetalFX TAA utilizes the NPU.

I wonder whether spatial scaling also utilizes the NPU.

Accepted Reply

I highly doubt that MetalFX utilizes the ANE. More information is in https://developer.apple.com/forums/thread/707667. The reason is that switching contexts between accelerators incurs a lot of overhead, and the latency might be several milliseconds. Even if the Neural Engine has higher throughput, it's harder to access and less programmable. Furthermore, Apple GPUs, starting with the Apple7 generation, have hardware acceleration for matrix multiplication. It's called simdgroup_matrix and is documented in the MSL specification. It increases ALU utilization from 25% to 80%. The fact that MetalFX is limited to Apple7 and Apple8 GPUs, the only GPUs with simdgroup_matrix, further supports this hypothesis.

To show how powerful simdgroup_matrix is: the M1 Max has a GPU with 10 TFLOPS F32. Double that equals 20 TFLOPS F16, and 80% of that is 16 TFLOPS F16. This is more processing power than the A14/M1's ANE, which is 11 TFLOPS F16. This could explain why Apple currently limits MetalFX to high-end Macs, where the GPU is more powerful than the ANE. On an A14/A15, it might be more power-efficient to run an image upscaling CoreML model on the ANE.
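The arithmetic above can be checked with a short script. This is only a back-of-the-envelope sketch: every number comes from the reply itself, and the 2x F16 multiplier is the reply's assumption, not a measured figure. The variable and function names are made up for illustration. Note that the resulting 16 TFLOPS figure is in F16 terms, which is what makes it comparable to the ANE number.

```python
# Back-of-the-envelope comparison using only the figures quoted in the reply.
GPU_F32_TFLOPS = 10.0          # M1 Max GPU, F32 (quoted figure)
F16_MULTIPLIER = 2.0           # reply's assumption: F16 peak is double the F32 peak
SIMDGROUP_UTILIZATION = 0.80   # ALU utilization quoted with simdgroup_matrix
BASELINE_UTILIZATION = 0.25    # ALU utilization quoted without it
ANE_F16_TFLOPS = 11.0          # A14/M1 ANE, F16 (quoted figure)


def effective_tflops(peak_tflops: float, utilization: float) -> float:
    """Throughput actually achieved at a given ALU utilization."""
    return peak_tflops * utilization


gpu_f16_peak = GPU_F32_TFLOPS * F16_MULTIPLIER
gpu_with_simdgroup = effective_tflops(gpu_f16_peak, SIMDGROUP_UTILIZATION)
gpu_without_simdgroup = effective_tflops(gpu_f16_peak, BASELINE_UTILIZATION)

print(f"GPU F16 peak:              {gpu_f16_peak:.1f} TFLOPS")
print(f"GPU with simdgroup_matrix: {gpu_with_simdgroup:.1f} TFLOPS")
print(f"GPU without it:            {gpu_without_simdgroup:.1f} TFLOPS")
print(f"ANE:                       {ANE_F16_TFLOPS:.1f} TFLOPS")
print(f"GPU beats ANE only with simdgroup_matrix: "
      f"{gpu_without_simdgroup < ANE_F16_TFLOPS < gpu_with_simdgroup}")
```

Under these assumptions, the GPU only overtakes the ANE once the utilization gain from simdgroup_matrix is counted, which is the core of the argument.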

Replies

MetalFX TAA actually does use the ANE. I was ray tracing at 120 Hz, and the GPU was drawing only 2 W (frame time was 4 milliseconds out of 8.3). It was running at the lower 300-500 MHz clock speeds to decrease power consumption beyond what you'd think is possible. However, the ANE was also drawing 80 mW. The ANE was at 0 watts when MetalFX was off, and jumped to 80 mW at exactly the moment MetalFX turned on. I also got some error messages about the ANE in the Xcode console whenever I used Metal Frame Capture.
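To put those measurements in proportion, here is a minimal sketch that derives the frame budget and the ANE's share of power from the numbers quoted above. It assumes the 2 W and 80 mW readings are averages over the same interval and it ignores CPU and DRAM power, which the post does not report. The variable names are illustrative only.

```python
# Figures quoted in the post above.
REFRESH_HZ = 120       # target frame rate
GPU_BUSY_MS = 4.0      # GPU frame time per frame
GPU_POWER_W = 2.0      # GPU power draw
ANE_POWER_W = 0.080    # ANE power draw with MetalFX on (80 mW)

# Time available per frame at the target refresh rate.
frame_budget_ms = 1000.0 / REFRESH_HZ

# Fraction of each frame during which the GPU is busy.
gpu_busy_fraction = GPU_BUSY_MS / frame_budget_ms

# ANE share of the combined GPU + ANE power (assumption: both are averages).
ane_power_share = ANE_POWER_W / (GPU_POWER_W + ANE_POWER_W)

print(f"Frame budget:      {frame_budget_ms:.2f} ms")
print(f"GPU busy fraction: {gpu_busy_fraction:.0%}")
print(f"ANE power share:   {ane_power_share:.1%}")
```

The GPU is idle for roughly half of each frame, and the ANE accounts for only a few percent of the combined draw, so whatever work lands on the ANE appears to be small relative to the GPU's.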

This could explain why, in Apple's MetalFX video, they stress giving you the ability to overlap work from different frames. I imagine the ANE has very high access latency, or some peculiarities in how it's accessed. MetalFX has a pipeline that runs and automatically finishes in time for your next frame submission. It executes work sporadically throughout the entire frame, presumably to hide some kind of latency. It might be shuffling work back and forth between the GPU and ANE.

MetalFX spatial upscaling probably does not use the ANE, because it is compatible with Intel Macs.

  • Hi, Philip. Were you doing the ray tracing calculations at 120 Hz on the Apple Neural Engine, using the ANE the way Nvidia uses RT Cores? I've heard that the Neural Engine can theoretically be used for ray tracing calculations, because it is quite similar to Nvidia's dedicated RT Cores. If so, could Apple bring this feature to current Apple Silicon in a Metal 3.1 update?

  • Thank you for the insightful analysis. One thing that strikes me as curious is the incredibly low power usage of the ANE. For instance, when I run super-resolution upscaling in Pixelmator Pro on my M1, I can observe in ASITOP that the ANE's power consumption spikes to over 5W. However, when playing the No Man's Sky game with MetalFX TAAU turned on, as you mentioned, the power consumption remains below 0.2W.
