Is there an implicit ordering to thread group execution?

In a compute ("kernel") function, is there an implicit ordering to threadgroup execution?


Say I have a 32xN buffer. Take the following trivial kernel (ignoring boundary conditions):

kernel void example(device uint *buffer [[ buffer(0) ]],
                    const uint2 pos [[ thread_position_in_grid ]]) {
     // N is the row stride in elements, assumed to be defined elsewhere.
     uint index = pos.y * N + pos.x;
     buffer[index - 1] = buffer[index];
     buffer[index + 1] = 0;
}


If we dispatch that kernel as N 32x1 threadgroups, is there any way to ensure that the threadgroups are executed left to right, i.e. that the resulting buffer is all 0 (once again, ignoring the boundary conditions)?


Thanks

To my knowledge, you can't assume an order of execution; it is implementation dependent.


You can insert threadgroup barriers inside a kernel to ensure that all threads in that one threadgroup have executed up to the barrier.

This can be important if you are initializing the threadgroup's own memory.
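
To make the barrier point concrete, here is a minimal sketch, not taken from the original question. The kernel name shift_left_in_row and the ROW_WIDTH constant are hypothetical, and it assumes the grid is dispatched as 32x1 threadgroups over a buffer whose rows are 32 elements wide. Each thread stages one element in threadgroup memory, waits at the barrier, and only then reads its neighbour's staged value. Note that the barrier orders threads within a single threadgroup only; it says nothing about other threadgroups.

```metal
#include <metal_stdlib>
using namespace metal;

// Assumption: row width equals the threadgroup width (32x1 threadgroups).
#define ROW_WIDTH 32u

// Hypothetical kernel: shifts each row left by one element, using
// threadgroup memory as a staging area.
kernel void shift_left_in_row(device uint *buffer [[ buffer(0) ]],
                              const uint2 pos [[ thread_position_in_grid ]],
                              const uint2 tid [[ thread_position_in_threadgroup ]]) {
    threadgroup uint row[ROW_WIDTH];

    uint index = pos.y * ROW_WIDTH + pos.x;

    // Each thread initialises one slot of the threadgroup's own memory.
    row[tid.x] = buffer[index];

    // Every thread in this threadgroup must reach this point before any of
    // them continues, and the writes to row[] are visible afterwards.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Safe to read a neighbour's slot now. The last thread in the row has
    // no right-hand neighbour inside this threadgroup, so it writes 0.
    buffer[index] = (tid.x + 1 < ROW_WIDTH) ? row[tid.x + 1] : 0;
}
```

Without the barrier, a thread could read row[tid.x + 1] before its neighbour has written it.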

Correct, any number of threadgroups can be launched in any order. If you want one threadgroup to run before another, you'll need to put them into different grids.
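
One way to read "different grids" is one dispatch per row. The sketch below is an illustration under assumptions, not the original poster's code: the kernel name example_row, the ROW_WIDTH constant, and the row index passed in buffer(1) are all hypothetical. The host would encode one 32x1 dispatch per row, in the order it wants the rows processed, setting the row index before each dispatch. It also assumes the dispatches are encoded so that each one observes the previous one's writes (for example, a serially dispatching compute encoder with hazard-tracked resources).

```metal
#include <metal_stdlib>
using namespace metal;

// Assumption: each row is 32 elements wide and handled by one 32x1 grid.
#define ROW_WIDTH 32u

// Hypothetical per-row kernel. Each dispatch covers exactly one row, so
// the ordering between rows comes from the order of the dispatches, not
// from any ordering of threadgroups inside a grid.
kernel void example_row(device uint *buffer [[ buffer(0) ]],
                        constant uint &row  [[ buffer(1) ]],
                        const uint x        [[ thread_position_in_grid ]]) {
    uint index = row * ROW_WIDTH + x;

    if (row == 0) {
        // First row has no predecessor; seed it.
        buffer[index] = 0;
    } else {
        // Depends on the row written by the previous dispatch.
        buffer[index] = buffer[index - ROW_WIDTH] + 1;
    }
}
```

The trade-off is one dispatch per row instead of a single large grid, which costs some parallelism in exchange for a defined order.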
