How to bind threads to performance (P) or efficiency (E) cores?

For some simulation workloads I have, I would like to use the system to its full potential and therefore use both P and E cores. Splitting the workload into independent tasks is not easily possible (the threads communicate with each other and run in semi-lockstep). I can, however, allocate smaller portions of the domain to the E cores (and iteratively adjust this so they take the same amount of time as the P cores).

But in order for this to work well, I need to ensure that a given thread (with its associated workload) is bound to the right type of core: *either* the performance (doing larger chunks of the domain) or the efficiency (doing smaller chunks of the domain) cores.

What's the best way to do this? So far, I don't think thread-to-core affinity has been selectable on macOS.

The documentation mentions the QoS classes, but which class(es) (or relative priorities) should I pick?

```c
#include <pthread/qos.h>

pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0); /* request Utility QoS for the calling thread */
```

The existing classes don't really map well: the work is user-initiated (i.e. the user launched a console application), but it isn't a GUI program. Would I use 4 threads with QOS_CLASS_UTILITY and 4 with QOS_CLASS_BACKGROUND? Or would I use QOS_CLASS_UTILITY for all of them, with relative priorities distinguishing performance from efficiency cores?

Replies

Were you able to solve this? I'm looking into research on related topics, and this is what I'd need to make that research feasible.
Nope, but I hope to do some experiments on real hardware once my DTK replacement arrives.

I've been investigating a similar issue with my codebase.

In particular, as this Stack Overflow question points out, when my threads are distributed across all cores with equal workloads, the efficiency cores become a significant bottleneck: https://stackoverflow.com/questions/66348801/how-to-utilize-the-high-performance-cores-on-apple-silicon

I have been searching high and low for a mechanism to deal with this. It seems like disabling efficiency cores for my application would be better than the current situation.

(EDIT: For future readers, there seems to be an open ticket in this thread: https://developer.apple.com/forums/thread/703361)

Yeah, I've been running as many threads as there are P cores for my simulations and making sure never to yield. Unfortunately, that isn't always easy or possible, so I would still appreciate either a proper core affinity API or at least a way to opt out of E cores.

The core asymmetry can cause quite annoying situations with OpenMP, for example, where work is distributed equally across threads.