Any way to implement a lock in Metal Shader Language?

Question

Created Jul ’22

Hi, I'm trying to implement a Barnes-Hut N-body simulation code in Metal. The code requires construction of a tree. The CUDA implementation uses locks to allow insertion of new nodes into the tree.

I've tried using an array of atomic ints in a test case, but this doesn't seem to work:

kernel void binning_compute_function(
        device MyArgument *arg1 [[ buffer(0)]],
        constant float *ranarr      [[ buffer(1) ]],
        device volatile atomic_int *flagArr [[ buffer(2) ]],
        device int *bins [[buffer(3)]],
         uint   index [[ thread_position_in_grid ]]) {

    int expected=0;
    int ibin = (ranarr[index] * arg1->nbins);

    for (int i = 0; i < 100000000; i++) {
        // Lock
        expected = 0;
        bool test = !atomic_compare_exchange_weak_explicit(&flagArr[ibin],&expected,1,memory_order_relaxed,memory_order_relaxed);

        if (test) {
            bins[ibin] += 1;
            atomic_store_explicit(&flagArr[ibin], 0, memory_order_relaxed);
            break;
        }
    }
}

Any other suggestions? The alternative is to use the CPU for this, but it seems a shame to miss out on the processing power of the GPU.

Thank you,

Colin

Answer 1

endecotp

Jul ’22

atomic_compare....() returns true if the location was 0 and has been updated to 1.

Your atomic_store...() stores 0.

Why do you have the ! in bool test = !atomic_compare...()? (A better name than "test" would help.)

Maybe I'm missing something, but I think you want:

expected = 0;  // We hope it is unlocked
bool succeeded_in_locking = atomic_compare_exchange....(flag, &expected, 1, ....);   // Try to lock by storing a 1 in place of the 0.
if (succeeded_in_locking) {
  // OK, we have the lock; this is where we would do the work protected by it.
  // Now we want to unlock.
  // Just write 0:
  atomic_store...(flag, 0, ....);
} else {
  // We didn't lock - presumably some other thread has it.
  // Try again.
}

Anyway... I'm not convinced that a memory_order_relaxed atomic can be used to implement a lock, as there are no constraints on the ordering of those atomic operations relative to the operations that you are trying to protect.

But I know nothing about Metal.
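
To make the ordering point above concrete, here is a minimal sketch of the same lock pattern in standard host-side C++ (not Metal Shading Language), using acquire ordering on the lock and release ordering on the unlock. The names SpinLock and locked_count are hypothetical and exist only for this illustration. Whether Metal offers equivalent orderings inside a kernel is exactly the open question in this thread, so this shows what the constraint looks like, not a drop-in fix. CPU threads are also scheduled independently, which a spin loop relies on; GPU threads that run in lockstep may not give the lock holder a chance to make progress.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the thread): a spinlock built on one atomic flag.
struct SpinLock {
    std::atomic<int> flag{0};  // 0 = unlocked, 1 = locked

    void lock() {
        int expected = 0;
        // Acquire on success: protected reads/writes cannot be reordered before this.
        while (!flag.compare_exchange_weak(expected, 1,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed)) {
            expected = 0;  // a failed exchange overwrites `expected`, so reset it
        }
    }

    void unlock() {
        // Release: protected writes are visible before the flag is seen as 0.
        flag.store(0, std::memory_order_release);
    }
};

// Hypothetical driver: every thread increments one plain int under the lock.
int locked_count(int num_threads, int increments_per_thread) {
    assert(num_threads > 0);
    SpinLock lock;
    int counter = 0;  // deliberately non-atomic; only the lock protects it
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                counter += 1;
                lock.unlock();
            }
        });
    }
    for (auto &w : workers) w.join();
    return counter;
}
```

Calling locked_count(4, 5000) should return 20000 because no increment is lost. With memory_order_relaxed on both operations, the C++ memory model would no longer guarantee that result.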

Answer 2

mcgillca OP

Jul ’22

Thank you very much for your reply.

You are right on both counts. I originally had the ! because the lock was implemented as a "while" loop, and I forgot to remove it when I changed to a "for" loop.

As you predicted, however, it still doesn't work - I still have a race condition on the threads where the locks appear to be set, and some threads never get to set the lock.

I can't see how to do this inside Metal - fences seem to work between encoders, rather than inside a kernel.

Is this a current limitation of Metal or are there techniques for solving this?

Thank you again,

Colin
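
For the binning test case specifically, one way to sidestep the lock is to make the bin counters themselves atomic and increment them directly. The sketch below shows that idea in host-side C++ (not Metal Shading Language). The function atomic_binning and its worker count are hypothetical, and the input values are assumed to lie in [0, 1]. In a Metal kernel the corresponding call would be atomic_fetch_add_explicit with memory_order_relaxed on a device atomic_int buffer. This only covers simple counters; it does not by itself solve tree-node insertion.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the kernel: each element adds 1 to its bin
// with a single atomic add, so no lock is needed.
std::vector<int> atomic_binning(const std::vector<float> &ranarr, int nbins) {
    assert(nbins > 0);
    std::vector<std::atomic<int>> bins(nbins);
    for (auto &b : bins) b.store(0, std::memory_order_relaxed);

    const int num_workers = 4;  // assumption: arbitrary small worker count
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([&, w] {
            // Each worker handles a strided subset of the input.
            for (std::size_t i = w; i < ranarr.size(); i += num_workers) {
                int ibin = static_cast<int>(ranarr[i] * nbins);
                if (ibin >= nbins) ibin = nbins - 1;  // guard for a value of exactly 1.0
                // Relaxed is enough: only the counter itself is shared.
                bins[ibin].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto &t : workers) t.join();

    // Copy the counters out after all workers have joined.
    std::vector<int> result(nbins);
    for (int b = 0; b < nbins; ++b) result[b] = bins[b].load(std::memory_order_relaxed);
    return result;
}
```

Relaxed ordering is sufficient here because the atomic add is the whole critical section; there is no separate protected data that has to be ordered relative to it.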

Answer 3

mcgillca OP

Jul ’22

I found that these symbols appear in Xcode:

METAL_MEMORY_ORDER_ACQUIRE METAL_MEMORY_ORDER_RELEASE

Does anyone have any idea how to use these?

I couldn't find them in any of the Metal include files, though METAL_MEMORY_ORDER_RELAXED was defined in metal_types.

Thank you,

Colin
