I'm sitting on a working CUDA codebase that I would like to run on hardware other than Nvidia GPUs. On macOS, OpenCL and Metal are the obvious choices, and since Apple has not updated its OpenCL implementation in ages, I'm reading the Metal documentation right now.
One thing stuck out: the Metal device capabilities table lists "Maximum buffer length: 256MB". Does that mean I cannot allocate a memory buffer larger than that in Metal? If so, that already disqualifies Metal for my use cases, since I'd like to work on buffers of 1GB and more. Or am I misreading this?
What is the API of choice for GPU compute applications on macOS then?
Well, despite the limit being documented as 512MB in one place and 256MB in another, an exception is thrown on OS X 10.11.6 when trying to allocate buffers larger than 256MB.
Example code:
Tested with Xcode Version 8.0 beta 6 (8S201h), deployment target: OS X 10.11
NSArray <id<MTLDevice>> *devices = MTLCopyAllDevices();
id<MTLDevice> device = devices[0]; // in my case: "NVIDIA GeForce GT 750M"
id<MTLCommandQueue> commandQueue = [device newCommandQueue];
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
id<MTLBuffer> buffer = [device newBufferWithLength:(256*1024*1024) options:MTLResourceStorageModeShared]; // works
id<MTLBuffer> buffer2 = [device newBufferWithLength:(512*1024*1024) options:MTLResourceStorageModeShared]; // throws exception
[encoder endEncoding]; // an encoder must be ended before its command buffer is committed
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
Exception thrown by line 7 of the code (the buffer2 allocation):
/Library/Caches/com.apple.xbs/Sources/Metal/Metal-56.6.1/ToolsLayers/Debug/MTLDebugDevice.mm:89: failed assertion `newBufferWith*:length 0x20000000 must not exceed 256 MB.'
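If the 256 MB cap in the assertion above holds on your device, one possible workaround is to split a large logical allocation across several smaller buffers. The following is only a minimal sketch of the size arithmetic in plain C (which also compiles as Objective-C); it is not part of the tested code above. The names `MAX_BUFFER_BYTES`, `chunk_count`, and `chunk_length` are hypothetical, and the cap value is an assumption taken from the assertion message, not something queried from the device.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: per-buffer cap copied from the assertion message (256 MB). */
#define MAX_BUFFER_BYTES ((uint64_t)256 * 1024 * 1024)

/* Number of buffers needed to hold total_bytes when each buffer
   may be at most max_chunk bytes long. */
uint64_t chunk_count(uint64_t total_bytes, uint64_t max_chunk)
{
    assert(max_chunk > 0);
    return (total_bytes + max_chunk - 1) / max_chunk; /* ceiling division */
}

/* Length in bytes of chunk `index`. Every chunk is full-sized
   except possibly the last one. */
uint64_t chunk_length(uint64_t total_bytes, uint64_t max_chunk, uint64_t index)
{
    assert(index < chunk_count(total_bytes, max_chunk));
    uint64_t offset = index * max_chunk;
    uint64_t remaining = total_bytes - offset;
    return remaining < max_chunk ? remaining : max_chunk;
}
```

Each value returned by `chunk_length` could then be passed to `newBufferWithLength:options:` in a loop, with the compute kernel dispatched once per chunk. Whether that is practical depends on whether your CUDA kernels can work on independent slices of the data.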