Hi, I'm trying to implement a Barnes-Hut N-body simulation in Metal. The algorithm requires building a tree, and the CUDA implementation uses locks to guard the insertion of new nodes into the tree.
I've tried emulating a lock with an array of atomic ints in a simple binning test case, but it doesn't seem to work:
kernel void binning_compute_function(
    device MyArgument *arg1 [[ buffer(0) ]],
    constant float *ranarr [[ buffer(1) ]],
    device volatile atomic_int *flagArr [[ buffer(2) ]],
    device int *bins [[ buffer(3) ]],
    uint index [[ thread_position_in_grid ]])
{
    int expected = 0;
    int ibin = (ranarr[index] * arg1->nbins);
    for (int i = 0; i < 100000000; i++) {
        // Lock
        expected = 0;
        bool test = !atomic_compare_exchange_weak_explicit(&flagArr[ibin], &expected, 1, memory_order_relaxed, memory_order_relaxed);
        if (test) {
            bins[ibin] += 1;
            atomic_store_explicit(&flagArr[ibin], 0, memory_order_relaxed);
            break;
        }
    }
}
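For this particular binning test, one lock-free option might be to make the bins themselves atomic and increment them directly, so no flag array is needed. The sketch below is untested and rests on several assumptions: the kernel name binning_atomic_add is made up, the MyArgument struct is a hypothetical stand-in with only an nbins field, bins has been rebound as atomic_int at buffer(2), and the values in ranarr lie in [0, 1). It only covers the counter case and does not solve the general tree-insertion locking problem.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical stand-in for the MyArgument struct used above;
// only the nbins field is assumed here.
struct MyArgument {
    int nbins;
};

// Assumed variant of the test kernel: the bins are atomic, so each thread
// increments its bin directly instead of taking a per-bin lock.
kernel void binning_atomic_add(
    device const MyArgument *arg1 [[ buffer(0) ]],
    constant float *ranarr [[ buffer(1) ]],
    device atomic_int *bins [[ buffer(2) ]],
    uint index [[ thread_position_in_grid ]])
{
    // Map the random value to a bin index (assumes ranarr[index] is in [0, 1)).
    int ibin = int(ranarr[index] * arg1->nbins);

    // Guard against a value of exactly 1.0 landing one past the last bin.
    ibin = min(ibin, arg1->nbins - 1);

    // Atomic read-modify-write; no separate lock and unlock steps are needed.
    atomic_fetch_add_explicit(&bins[ibin], 1, memory_order_relaxed);
}
```

On the host side the bins buffer would just need to be zero-initialised before dispatch; the plain int layout should be unchanged, though I haven't verified that.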
Does anyone have other suggestions? The alternative is to build the tree on the CPU, but it seems a shame to miss out on the processing power of the GPU.
Thank you,
Colin