With the new MLCompute framework in Objective-C, I built a simple MLCInferenceGraph for testing. It runs just one layer, a batched MatMul.
It works fine except for one thing.

With both input MLCTensors and the single output MLCTensor allocated on the CPU, every repeated call (e.g. with new input data) to

[MLCInferenceGraph executeWithInputsData:batchSize:options:completionHandler:]

allocates fresh memory for the output data. I want to avoid that, because I have already pre-allocated, properly cache-aligned CPU memory for all the input and output buffers, i.e. no new allocations are necessary. And since everything is CPU-only, there are no device synchronization needs.
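For concreteness, here is a minimal sketch of what I mean (the buffer names, the 4x4 float32 shape, the 64-byte alignment, and the @"A"/@"B" input keys are mine for illustration; building and compiling the one-MatMul-layer graph into `inferenceGraph` is elided):

```objc
#import <MLCompute/MLCompute.h>
#include <stdlib.h>

// Pre-allocated, cache-aligned CPU buffers (illustrative 4x4 float32 matrices).
const size_t kBytes = 4 * 4 * sizeof(float);
float *inBufA = NULL, *inBufB = NULL, *outBuf = NULL;
posix_memalign((void **)&inBufA, 64, kBytes);
posix_memalign((void **)&inBufB, 64, kBytes);
posix_memalign((void **)&outBuf, 64, kBytes);

// Wrap the raw buffers without copying.
MLCTensorData *inDataA = [MLCTensorData dataWithBytesNoCopy:inBufA length:kBytes];
MLCTensorData *inDataB = [MLCTensorData dataWithBytesNoCopy:inBufB length:kBytes];

// ... build the graph (one batched MLCMatMulLayer), addInputs:, compile
//     for the CPU device, yielding `inferenceGraph` ...

// Repeated inference; ideally resultTensor would end up backed by outBuf.
[inferenceGraph executeWithInputsData:@{ @"A" : inDataA, @"B" : inDataB }
                            batchSize:1
                              options:MLCExecutionOptionsNone
                    completionHandler:^(MLCTensor *resultTensor,
                                        NSError *error,
                                        NSTimeInterval executionTime) {
        // [resultTensor data] does not point at outBuf here.
    }];
```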
First, I tried binding the output tensor via addOutputs: on the MLCInferenceGraph (just like the inputs) [1, 2], but the resultTensor data in the completionHandler is still newly allocated every time.
Second, I tried the variant that takes outputsData: [1, 3], passing my pre-allocated output data, and it is still newly allocated on every invocation:
[MLCInferenceGraph executeWithInputsData:outputsData:batchSize:options:completionHandler:]
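That second attempt looks roughly like this (again a sketch: inDataA/inDataB/outData are MLCTensorData objects created with dataWithBytesNoCopy:length: over my pre-allocated, cache-aligned buffers, and the dictionary keys are illustrative):

```objc
// outData wraps the pre-allocated output buffer; the hope is that the
// framework writes the MatMul result directly into it.
[inferenceGraph executeWithInputsData:@{ @"A" : inDataA, @"B" : inDataB }
                          outputsData:@{ @"result" : outData }
                            batchSize:1
                              options:MLCExecutionOptionsNone
                    completionHandler:^(MLCTensor *resultTensor,
                                        NSError *error,
                                        NSTimeInterval executionTime) {
        // Expected: resultTensor backed by the pre-allocated buffer.
        // Observed: freshly allocated memory on every call.
    }];
```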
Question: How can we, as users, avoid the new allocation of output data on every invocation of

[MLCInferenceGraph executeWithInputsData:batchSize:options:completionHandler:] ?
(Via crash backtraces I can see that an internal call to

[MLCDeviceCPU(MLCEngineDispatch) dispatchForwardMatMulLayer:sourceTensor:secondarySourceTensor:resultTensor:resultTensorIsTemporary:resultTensorAllocate:] + 204

is involved; the resultTensorIsTemporary: and resultTensorAllocate: arguments suggest the allocation decision is made internally.)
PS: I check [resultTensor data] in the completionHandler to verify whether I get my pre-allocated tensor/data buffers back or not.
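The check itself is just a pointer comparison inside the completionHandler (outBuf being my pre-allocated output buffer):

```objc
// Compare the backing bytes of the result tensor against my own buffer.
const void *resultBytes = resultTensor.data.bytes;
if (resultBytes == outBuf) {
    NSLog(@"result reuses my pre-allocated buffer");
} else {
    NSLog(@"result was newly allocated: %p (mine is %p)", resultBytes, outBuf);
}
```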
What am I missing? :) Any solutions?
[1] https://developer.apple.com/documentation/mlcompute/mlcinferencegraph?language=objc
[2] https://developer.apple.com/documentation/mlcompute/mlcinferencegraph/3579690-addoutputs?language=objc
[3] https://developer.apple.com/documentation/mlcompute/mlcinferencegraph/3579696-executewithinputsdata?language=objc