Hi,
I'm currently working on a GPU driven pipeline and am in need of setting compute pipeline state from within the compute shader which fills in the ICB. (I've narrowed down the problem to how the commands get encoded in the ICB - the pipelines which I'm encoding into the ICB all work individually).
For some reason, if I encode more than one compute_command into my ICB, the Metal GPU Replayer acts up (the presentable is all purple and none of the resources are inspectable) which makes me feel like I'm doing something very illegal in this compute shader which fills the ICB but am completely oblivious as to what it could be.
This is the compute shader which fills the ICB:
struct ExclusivePrefixSumICB
{
command_buffer exclusivePrefixSumCommands [[id(0)]];
command_buffer exclusivePrefixSumAccCommands [[id(1)]];
compute_pipeline_state exclusivePrefixSumPipeline[[id(2)]];
compute_pipeline_state exclusivePrefixSumAccPipeline[[id(3)]];
compute_pipeline_state increaseCommandNumber[[id(4)]];
};
template <typename T, uint kBlockSize, uint kGrainSize>
kernel void kernelExclusivePrefixSumDispatch(device T *inputData [[buffer(0)]],
device T *outputData [[buffer(1)]],
device ExclusivePrefixSumInfo *info [[buffer(2)]],
device ExclusivePrefixSumICB &icb [[buffer(3)]],
device PrefixSumNBuffer *nBuffer [[buffer(4)]],
uint lid [[thread_position_in_threadgroup]])
{
uint numValuesInInput = info->numElements;
uint offset = 0;
uint offsets[MAX_COMMANDS_IN_PREFIX_SUM][2] = {};
uint offsetsCount = 0;
uint iterCount = ceil(log2((float)numValuesInInput) / log2((float)PREFIX_SUM_NUM_VALUES_PER_THREADGROUP));
iterCount = 1;
uint commandCount = 0;
for (uint i = 0; i < iterCount; ++i)
{
{ /* Prefix scan pipeline. */
compute_command cmd(icb.exclusivePrefixSumCommands, i*2);
offsets[offsetsCount][0] = offset;
offsets[offsetsCount][1] = offset+numValuesInInput;
nBuffer[commandCount].commandNumber = 0;
nBuffer[commandCount].n = numValuesInInput;
nBuffer[commandCount].inputDataOffset = offset;
nBuffer[commandCount].outputDataOffset = offset;
nBuffer[commandCount].partialSumOffset = offset + numValuesInInput;
cmd.set_compute_pipeline_state(icb.exclusivePrefixSumPipeline);
cmd.set_kernel_buffer(outputData, 0);
cmd.set_kernel_buffer(inputData, 1);
cmd.set_kernel_buffer(nBuffer, 2);
uint3 threadgroupGridSize = uint3(divideRoundUp((uint)numValuesInInput, (uint)PREFIX_SUM_NUM_VALUES_PER_THREADGROUP), 1, 1);
uint3 threadgroupSize = uint3(PREFIX_SUM_NUM_THREADS_PER_THREADGROUP, 1 ,1);
cmd.concurrent_dispatch_threadgroups(threadgroupGridSize, threadgroupSize);
cmd.set_barrier();
offset += numValuesInInput;
numValuesInInput = divideRoundUp((uint)numValuesInInput, (uint)PREFIX_SUM_NUM_VALUES_PER_THREADGROUP);
}
{ /* Increase command number: */
compute_command increaseCmd(icb.exclusivePrefixSumCommands, i*2+1);
increaseCmd.set_compute_pipeline_state(icb.increaseCommandNumber);
increaseCmd.set_kernel_buffer(nBuffer, 0);
increaseCmd.concurrent_dispatch_threadgroups(uint3(1, 1, 1), uint3(1, 1, 1));
increaseCmd.set_barrier();
}
++offsetsCount;
++commandCount;
}
}
Is there anything obviously wrong with the way I am encoding the commands into the ICB?
Furthermore, the GPU replayed complains that the pipeline exclusivePrefixSumPipeline is missing all its buffer bindings even though I'm setting them right there in the shader. Furthermore, the replayer warns that "The ICB doesn’t inherit the parent encoder’s buffer arguments; specify the command’s buffer parameters when you encode commands into the ICB." which I'm guessing it's telling me because it thinks that this means that the pipelines are missing their buffer bindings (I intentionally made the ICB not inherit pipelines/buffers so I can set them in the shader).
Sorry for the blurt of stuff, but I am at a complete loss of how to do this and there are no examples of compute commands being encoded into ICBs on the GPU.
I would really appreciate any support / tips to figure out what is wrong.
(P.S. all these issues would go away if there is a way to get the command index (passed into the constructor of compute_command) from the shader which that command invokes so if there is a way to do that, please let me know. All these issues stem from the fact that I need each invocation of the exclusivePrefixSumPipeline to be able to access data that is individual to each invocation. Equivalently if there is an equivalent of setBytes or something that would be exactly what I need).