How to force CoreML to use only a single thread for inference on macOS

Is there any way to set the number of threads used during CoreML inference? My model is relatively small, and the overhead of launching new threads is too expensive. When using the TensorFlow C API, forcing a single thread results in a significant decrease in CPU usage. (So far, CoreML with multiple threads has three times the CPU usage compared to TensorFlow with a single thread.)
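For anyone looking for a partial workaround: CoreML does not expose a public thread-count setting, but restricting inference to the CPU via `MLModelConfiguration` avoids dispatching work to the GPU or Neural Engine, which can reduce scheduling overhead for small models. A minimal sketch, where `MyModel` stands in for the Xcode-generated model class (a hypothetical name):

```swift
import CoreML

// Restrict CoreML to the CPU only. There is no public knob for the
// number of CPU threads, but this avoids GPU/ANE dispatch overhead.
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

// "MyModel" is a placeholder for the Xcode-generated model class.
let model = try MyModel(configuration: config)

// Per-prediction alternative on older SDKs (deprecated on newer ones):
let options = MLPredictionOptions()
options.usesCPUOnly = true
```

This does not guarantee single-threaded execution; it only removes the non-CPU compute units from consideration.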

Also, I'm wondering whether anyone has compared the performance of the TensorFlow C API with CoreML?

Replies

Hello Brianyan, thank you for the question. I am an engineer on the CoreML team and would like to get a radar (bug report) with a sysdiagnose captured right after the inference, an Instruments trace (the Time Profiler template is a good start), and the model file.

CoreML doesn't explicitly create threads; instead it uses dispatch queues, which may pick up a new thread from a pool. Generally speaking, this part of the architecture is rarely the cause of performance problems.
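To illustrate the point above: dispatch queues don't own dedicated threads; work submitted to a queue is serviced by threads drawn from a system-managed pool on demand. A small standalone sketch:

```swift
import Dispatch
import Foundation

// A concurrent queue does not create its own threads; GCD assigns
// submitted work items to threads from a process-wide pool.
let queue = DispatchQueue(label: "demo", attributes: .concurrent)
let group = DispatchGroup()

for i in 0..<4 {
    queue.async(group: group) {
        // Which pool thread services this block is up to the system.
        print("item \(i) on \(Thread.current)")
    }
}
group.wait()
```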

  • I realize I didn't comment on your response directly, just FYI. Please see below. Thanks.


Hi, thanks for the response. Unfortunately, I cannot share the files for confidentiality reasons. Could you point me to what I should be looking for in the sysdiagnose and trace files?

Also, for everyone's reference: I was able to reduce the CPU usage by handcrafting the model and copying the weights instead of using coremltools to auto-convert. The CPU usage is still 50% higher than with the TensorFlow C API (compared to 300% higher previously). Inference time for CoreML and TensorFlow is on par (only a fraction of a millisecond apart).
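For anyone reproducing this kind of comparison: a simple wall-clock loop over repeated predictions, with a warm-up run excluded, keeps the latency measurement comparable across frameworks. A sketch, assuming `model` is an already-loaded `MLModel` and `input` a matching `MLFeatureProvider` (both hypothetical here):

```swift
import CoreML
import Foundation

// Measures average per-inference latency in seconds.
// "model" and "input" are assumed to be set up by the caller.
func averageLatency(model: MLModel, input: MLFeatureProvider,
                    iterations: Int = 1000) throws -> Double {
    // Warm up once so one-time setup cost isn't counted.
    _ = try model.prediction(from: input)

    let start = CFAbsoluteTimeGetCurrent()
    for _ in 0..<iterations {
        _ = try model.prediction(from: input)
    }
    let elapsed = CFAbsoluteTimeGetCurrent() - start
    return elapsed / Double(iterations)
}
```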