Loop unrolling

How can I unroll loops in a Metal compute kernel? I've tried #pragma unroll(n), but the compiler ignores it.


#pragma unroll(16)
for (int i=0; i<16; i++) {
 // how to unroll contents of this loop?
}

I'd just move the loop body into a macro, call it 16 times, and compare performance. Not the most elegant solution, I know. You could then move the body into a separate function and compare performance again.
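For illustration, here is a minimal sketch of the macro idea in plain C++ rather than Metal. The names BODY, REPEAT_4, REPEAT_16 and unrolled_by_macro are hypothetical, and the loop body is a placeholder. Chaining small repeat macros avoids writing the call 16 times by hand, though whether this actually improves performance in a Metal kernel would still need to be measured.

```cpp
// Hypothetical loop body, written as a macro that takes the loop index.
#define BODY(i) total += (i) * 2;

// Hypothetical helper macros that repeat BODY with consecutive indices.
#define REPEAT_4(base) \
    BODY((base) + 0) BODY((base) + 1) BODY((base) + 2) BODY((base) + 3)
#define REPEAT_16(base) \
    REPEAT_4((base) + 0) REPEAT_4((base) + 4) REPEAT_4((base) + 8) REPEAT_4((base) + 12)

// Equivalent to: for (int i = 0; i < 16; i++) total += i * 2;
// After preprocessing, the function contains 16 statements and no loop.
inline int unrolled_by_macro() {
    int total = 0;
    REPEAT_16(0)
    return total;
}
```

Changing the iteration count means adding or combining repeat macros, so the approach stays awkward for counts such as 108 or for nested loops.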

I've actually tried this approach (a 3x performance boost for a loop iterating 108 times), but calling a macro 108 times is, as you wrote, messy, especially if I want to change the number of times it's called or fiddle with the inner code, not to mention nested loops.

That's why I'd really prefer the compiler to unroll it for me.

Have you thought about using Duff's device? It isn't a compiler feature and it's an ugly hack in general, but perhaps it would help?

Haven't heard about Duff's device – I'm always learning something new thanks to you 😁.

If I understand Duff's device correctly, I believe it wouldn't benefit me, because it still performs a do-while loop. Moreover, I use 2 nested loops and I need to access the loops' indices.
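To make that point concrete, here is a minimal sketch of Duff's device in plain C++ (not Metal, and not tested against the Metal compiler). The function name duff_sum_indices and its body are hypothetical. It shows that the body is only partially unrolled: a do-while loop remains, and the index has to be tracked by hand.

```cpp
// Sums 0 + 1 + ... + (count - 1), doing up to four additions per do-while pass.
inline int duff_sum_indices(int count) {
    if (count <= 0) return 0;
    int total = 0;
    int i = 0;                     // loop index, maintained manually
    int passes = (count + 3) / 4;  // number of trips through the do-while
    switch (count % 4) {           // jump into the body to handle the remainder first
    case 0: do { total += i++;
    case 3:      total += i++;
    case 2:      total += i++;
    case 1:      total += i++;
            } while (--passes > 0);  // the loop itself is still present
    }
    return total;
}
```

For a count of 108 this still executes 27 loop iterations, so it reduces branch overhead but does not remove the loop from the compiled code.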

Essentially, I'm looking for a way by which the compiler could fully unroll my loop, so there'd be no loops at all in the compiled code.

Accepted Answer

Hah, I used to work down the hall from Tom Duff of Duff's device fame :-)

I suggested a template metaprogramming approach over on Stack Overflow. I'm assuming this is your question:


http://stackoverflow.com/questions/41249758/loop-unrolling-in-metal-kernels/41476975#41476975
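As a rough idea of what a template-based unroll can look like, here is a minimal sketch in plain C++. It is not necessarily what the linked answer contains. The names Unroll, SumIndices and sum_first_16 are hypothetical, and the body is a placeholder. It uses a functor struct rather than a lambda on the assumption that this ports more easily to a Metal kernel, where address space qualifiers may also be needed on the reference parameter.

```cpp
// Hypothetical helper: Unroll<N>::run(f) expands at compile time into
// f(0); f(1); ... f(N-1); with no loop in the source.
template <int N>
struct Unroll {
    template <typename F>
    static inline void run(F &f) {
        Unroll<N - 1>::run(f);  // emit the calls for indices 0 .. N-2 first
        f(N - 1);               // then the call for the last index
    }
};

// Base case: nothing left to emit.
template <>
struct Unroll<0> {
    template <typename F>
    static inline void run(F &) {}
};

// Hypothetical loop body: accumulates the indices it is called with.
struct SumIndices {
    int total = 0;
    void operator()(int i) { total += i; }
};

// Equivalent to: for (int i = 0; i < 16; i++) total += i;
inline int sum_first_16() {
    SumIndices body;
    Unroll<16>::run(body);
    return body.total;
}
```

Changing the iteration count only means changing the template argument, and nested loops can be expressed by calling Unroll from inside another functor. The compiler still decides whether to inline the generated calls, so the result is worth checking in the compiled output.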
