I'm trying to hint to the scheduler that some threads should be scheduled together, using the thread_policy_set
API with THREAD_AFFINITY_POLICY (since there is no "real" thread-to-core affinity API).
All the examples set the policy after the thread is created but before it starts executing. Unfortunately, I'm not creating these threads (OpenMP is), and when I try to use the API on an already running thread, I get a return value of KERN_INVALID_ARGUMENT (= 4):
thread_affinity_policy_data_t policy = { 1 };
auto r = thread_policy_set(mach_task_self(), THREAD_AFFINITY_POLICY, (thread_policy_t)&policy, THREAD_AFFINITY_POLICY_COUNT);
When I replace mach_task_self() with pthread_mach_thread_np(pthread_self()), I get a KERN_NOT_SUPPORTED error instead (= 46, "Empty thread activation (No thread linked to it)").
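For reference, here is a self-contained sketch of the call I'm attempting on the current thread (the affinity tag value is arbitrary, and the #ifdef guard is only there so the snippet compiles on non-macOS platforms):

```cpp
#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/thread_policy.h>
#include <pthread.h>
#endif

// Tag the *calling* thread with affinity set `tag`. thread_policy_set()
// expects a thread port, not a task port, so the Mach port of the current
// pthread is looked up first; passing the task port (mach_task_self())
// is presumably what triggers KERN_INVALID_ARGUMENT.
// Returns the kern_return_t value (0 == KERN_SUCCESS), or 0 off macOS.
static int set_affinity_tag(int tag) {
#ifdef __APPLE__
    thread_affinity_policy_data_t policy = { tag };
    mach_port_t thread = pthread_mach_thread_np(pthread_self());
    return (int)thread_policy_set(thread, THREAD_AFFINITY_POLICY,
                                  (thread_policy_t)&policy,
                                  THREAD_AFFINITY_POLICY_COUNT);
#else
    (void)tag;
    return 0;  // no Mach APIs on this platform
#endif
}
```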
Has anyone used these APIs successfully on an already running thread?
Background: The code I'm working on divides a problem set into a small number of roughly equal-sized pieces (e.g. 8 or 16; this is an input parameter derived from the number of cores to be utilized). The pieces are not entirely independent and must be processed in lock-step, as data from neighboring pieces is occasionally accessed.
Sometimes, when a neighboring piece isn't ready for a fairly long time, we call std::this_thread::yield(), which unfortunately seems to signal to the scheduler that this thread should move to the efficiency cores. That wreaks havoc with the assumption that each piece takes roughly the same amount of time to compute, which is what keeps all threads in lock-step. :(
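One workaround I'm experimenting with (a sketch, not a verified fix): raise the thread's QoS class via pthread_set_qos_class_self_np() so the scheduler has less reason to demote it to efficiency cores, and only yield after a bounded spin. wait_for_neighbor() and neighbor_ready are stand-ins for whatever readiness flag the real code checks:

```cpp
#include <atomic>
#include <thread>
#ifdef __APPLE__
#include <pthread/qos.h>
#endif

// Busy-wait briefly before yielding, so that a short wait does not look
// like idleness to the scheduler. On macOS, also request a high QoS class
// in the hope that the thread stays on the performance cores.
void wait_for_neighbor(std::atomic<bool>& neighbor_ready) {
#ifdef __APPLE__
    // 0 = no relative priority offset within the QoS class.
    pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
#endif
    int spins = 0;
    while (!neighbor_ready.load(std::memory_order_acquire)) {
        if (++spins >= 1024) {          // bounded spin before giving up the slice
            std::this_thread::yield();
            spins = 0;
        }
    }
}
```

Whether the QoS hint is honored across yields is exactly what I'm unsure about.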
A similar (?) problem seems to occur with OpenMP barriers, which have terrible performance (at least on the M1 Ultra) unless KMP_USE_YIELD=0
is set (for the OpenMP runtime from LLVM). Can this automatic migration (note: not the relinquishing of the remaining time slice) be prevented?
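Since the LLVM OpenMP runtime appears to read its environment variables once at initialization, KMP_USE_YIELD can also be set programmatically before the first parallel region instead of in the shell. A sketch (the env-var name is real; setting it this early being sufficient is my assumption):

```cpp
#include <cstdlib>

// Set KMP_USE_YIELD=0 before the LLVM OpenMP runtime initializes,
// i.e. before the first #pragma omp parallel region executes.
// Returns true if the variable was set successfully.
static bool disable_kmp_yield() {
#ifdef _WIN32
    return _putenv_s("KMP_USE_YIELD", "0") == 0;
#else
    return setenv("KMP_USE_YIELD", "0", /*overwrite=*/1) == 0;
#endif
}
```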