Matrix Math + GPU

Does anyone know where I should start looking to perform matrix-based calculations that are GPU accelerated? I've lately gotten into neural networks, which require a huge amount of matrix math. I've been able to train neural networks on servers/computers, and there are cases where I want to run the network on iOS devices.

I've successfully ported some neural networks, such as one for handwriting recognition, to Swift.


However, it quickly slows down as I increase the complexity. For example, it analyzes a 20x20 image just fine, but when I increase the resolution to 56x56 or more, the number of calculations grows steeply (at least in proportion to the pixel count, and faster if the layers grow with it). It is a perfect scenario for a GPU.
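
To put rough numbers on that growth, here is a small sketch that counts multiply-accumulate operations for a single layer at both resolutions. The layer shapes (a 100-unit fully connected layer and a 5x5 convolution with 16 output channels) are made-up values for illustration, not the sizes of any network mentioned in this thread.

```swift
// Multiply-accumulate (MAC) counts for one layer.
// All layer sizes below are hypothetical, chosen only for illustration.

/// MACs for a fully connected layer: every input feeds every output.
func denseMACs(inputs: Int, outputs: Int) -> Int {
    return inputs * outputs
}

/// MACs for a stride-1 convolution whose output has the same
/// width and height as its square input ("same" padding).
func convMACs(side: Int, kernel: Int, inChannels: Int, outChannels: Int) -> Int {
    return side * side * kernel * kernel * inChannels * outChannels
}

// Fully connected layer with 100 units on the raw pixels.
let denseSmall = denseMACs(inputs: 20 * 20, outputs: 100)   // 40_000
let denseLarge = denseMACs(inputs: 56 * 56, outputs: 100)   // 313_600

// 5x5 convolution, 1 input channel, 16 output channels.
let convSmall = convMACs(side: 20, kernel: 5, inChannels: 1, outChannels: 16)   // 160_000
let convLarge = convMACs(side: 56, kernel: 5, inChannels: 1, outChannels: 16)   // 1_254_400

// Growth factor when going from 20x20 to 56x56.
let denseGrowth = Double(denseLarge) / Double(denseSmall)
let convGrowth = Double(convLarge) / Double(convSmall)
print("dense growth:", denseGrowth, "conv growth:", convGrowth)
```

With the layer width held fixed, the cost of one layer scales with the pixel count (3136 / 400, a little under 8x here). If the hidden layers are widened along with the input, the cost grows roughly with the square of the pixel count instead.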

Are there any good matrix libraries/frameworks that let you create variable sized matrices and perform calculations with them, that are GPU accelerated?

Thanks!

Do you have to use Metal for this? Frankly, I'd think of CUDA first and OpenCL second, with Metal far down the list. Don't get me wrong, Metal is lovely; it is just that it is available on a limited range of devices right now, and not very powerful ones at that. Plus, it is pretty new, so not much code has been written for it yet.

I'm using CUDA to train the networks. But on an iOS device, for applications like computer vision, it's fastest to run it natively.

What I mean is: all the really intensive training happens on my desktop/AWS. But once the network is trained, I need to run it on iOS. I don't need to do any intensive backpropagation on iOS, but even a forward pass on a single image is pretty intensive in some cases. Multithreading + SIMD can only do so much.
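
For readers following along, a forward pass through one fully connected layer boils down to a matrix-vector product plus an activation. The sketch below is a plain CPU version in Swift; the `DenseLayer` type, the sigmoid activation, and the tiny weights are assumptions made up for illustration, not code from the network described above.

```swift
import Foundation

/// One fully connected layer computing y = sigmoid(W * x + b).
/// `weights` is stored row-major with shape (outputs x inputs).
/// Hypothetical type for illustration only.
struct DenseLayer {
    let inputs: Int
    let outputs: Int
    let weights: [Float]
    let biases: [Float]

    func forward(_ x: [Float]) -> [Float] {
        precondition(x.count == inputs, "input size mismatch")
        precondition(weights.count == inputs * outputs, "weight size mismatch")
        precondition(biases.count == outputs, "bias size mismatch")

        var y = [Float](repeating: 0, count: outputs)
        for row in 0..<outputs {
            // Dot product of one weight row with the input vector.
            var sum = biases[row]
            let base = row * inputs
            for col in 0..<inputs {
                sum += weights[base + col] * x[col]
            }
            // Sigmoid activation.
            y[row] = 1 / (1 + exp(-sum))
        }
        return y
    }
}

// Tiny made-up example: 3 inputs, 2 outputs.
let layer = DenseLayer(inputs: 3, outputs: 2,
                       weights: [1.0, 0.0, -1.0,
                                 0.5, 0.5, 0.5],
                       biases: [0.0, -1.5])
let out = layer.forward([1, 2, 1])
print(out)
```

The nested loop is the part worth offloading: it is the matrix-vector product that a BLAS routine such as `cblas_sgemv` in Accelerate, or a GPU kernel, would replace, and its cost is `inputs * outputs` multiply-adds per layer.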

Accepted Answer

Please take a look at the Metal Performance Shaders framework in iOS 10, which adds support for accelerating convolutional neural networks (CNNs) on iOS GPUs. You should be able to use the APIs provided by MetalPerformanceShaders.framework to run your inference network on iOS. There is also an example available (https://developer.apple.com/library/prerelease/content/samplecode/MetalImageRecognition/Introduction/Intro.html) that shows the InceptionV3 network from TensorFlow ported to use these CNN APIs in the Metal Performance Shaders framework. I also recommend the What's New in Metal, Part II talk from WWDC this year, where we describe how to build CNNs on iOS using the MPS framework. Let us know if you have any questions about these APIs.

That's awesome! Thanks amunshi!
