Error when training a Convolutional Neural Network (MPS)

I'm trying to train a network in MPS that consists of a series of CNN blocks (conv layer, activation and max pooling) and a couple of fully connected layers at the end, but each time I try to train it I receive an error, while inference works fine.


Error seen below.


/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-121.1.1/MPSCore/Utility/MPSLibrary.mm:218: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function cnnConv_Update_32x32.

Compiler encountered an internal error: (null)


I have managed to train a simple network which consists of a single convolutional layer (node), but any time I stack multiple convolutional blocks I get the above error - cnnConv_Update_32x32.


Curious to know if anyone has come across (and, ideally, resolved) it, or has some inkling as to what might be causing this error.


Any ideas much appreciated,

Josh

Some further details:


As mentioned previously, no errors are thrown when performing inference (a straightforward forward pass), only when training (during backpropagation), specifically when it reaches a convolutional node. I suspect it has something to do with the convolution transpose, i.e. upsampling from a pooling layer; in addition, setting the MPSGraph's input node to a node prior to a pooling and convolution layer avoids the error. This points to an issue with the padding, possibly how it handles the edges, but what is suspicious is that reducing the input size from 128 to 64 (or smaller) also seems to silence the error.


As before, I'd appreciate any suggestions or thoughts as to what the issue may be, or references/sample code where someone has successfully implemented a CNN in MPS. Thanks.

I don't think many people have used this part of MPS yet, and so it wouldn't surprise me if you ran into a bug. The only sample code I know of is Turi Create, which uses MPS for GPU training on the Mac. It's open source at github.com/apple/turicreate, so I suggest looking there.

Thanks for the suggestion (a good one). Unfortunately, after browsing through their code (Turi Create) I cannot see an obvious entry point into MPS. I have also submitted a query to Apple in case it is a bug.


Thanks again for your reply.

Josh

It's here: https://github.com/apple/turicreate/blob/master/src/unity/toolkits/neural_net/mps_layers.mm

I managed to resolve this (sort of): instead of using a pooling layer to reduce the image, I used the stride of a convolutional layer to get it down to a size that MPS 'would support'. I'm a little unsure how/why, but it appears there are some issues using default padding, i.e. the gradient pass 'tripped up' on pooling layers operating on larger inputs. I painstakingly printed the shape of each input/output of the graph but couldn't find any obvious issues. Something I want to explore is building up the graph without MPSNNGraph.
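
For anyone following along, here is a rough sketch of the arithmetic behind that workaround. It is plain Python rather than MPS code, and it assumes a 'same'-style padding policy where the output size along each axis is ceil(input / stride); the thread does not confirm which padding policy was actually in use, and the function names are made up for illustration. Under that assumption, a stride-2 convolution shrinks the image by the same factor as a stride-1 convolution followed by 2x2 max pooling with stride 2, so swapping one for the other should not change the shapes seen by later layers.

```python
import math
from typing import Callable, List


def same_padding_output(size: int, stride: int) -> int:
    """Output length along one axis, ASSUMING 'same'-style padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def block_with_pooling(size: int) -> int:
    """Hypothetical block: stride-1 conv followed by 2x2 max pooling with stride 2."""
    after_conv = same_padding_output(size, stride=1)  # conv keeps the spatial size
    return same_padding_output(after_conv, stride=2)  # pooling halves it


def block_with_strided_conv(size: int) -> int:
    """Hypothetical block: a single stride-2 conv that does the downsampling itself."""
    return same_padding_output(size, stride=2)


def trace(size: int, blocks: int, block: Callable[[int], int]) -> List[int]:
    """Spatial size at the input and after each of `blocks` stacked blocks."""
    sizes = [size]
    for _ in range(blocks):
        sizes.append(block(sizes[-1]))
    return sizes


if __name__ == "__main__":
    print("pooling:     ", trace(128, 3, block_with_pooling))
    print("strided conv:", trace(128, 3, block_with_strided_conv))
```

Both traces give the same sequence of sizes for a 128-pixel input, which fits with the shapes looking fine when printed: whatever goes wrong in the gradient pass does not show up in this bookkeeping.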

Hi Josh,

Can you add reproducible code, the iOS version and details of the device being used to the radar you filed?

Thanks.

Inam.
