Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

Normally distributed MPSMatrixRandom number generation generates NaN
When generating large arrays of random numbers, NaNs show up. They also show up at the same indices when using the same seed, leading me to believe that this is a bug with MPSMatrixRandom's normally distributed Float32 random number distribution. Happens with both Philox and MTGP32. Is this intentional and how do I work around this? See the original post for a MWE in Swift and Julia: https://github.com/JuliaGPU/Metal.jl/issues/474
1
1
637
Dec ’24
Metal and NVIDIA graphic driver
Hi, A user sent us a crash report that indicates an error occurring just after loading the default Metal library of our app. Application Specific Information: Crashing on exception: *** -[__NSArrayM objectAtIndex:]: index 0 beyond bounds for empty array The report pointed me to these (simplified) lines of codes in the library setup: _vertexFunctions = [[NSMutableArray alloc] init]; _fragmentFunctions = [[NSMutableArray alloc] init]; id<MTLLibrary> library = [device newDefaultLibrary]; 2 vertex shaders and 5 fragment shaders are then loaded and stored in these two arrays using this method: -(BOOL) addShaderNamed:(NSString *)name library:(id<MTLLibrary>)library isFragment:(BOOL)isFragment { id shader = [library newFunctionWithName:name]; if (!shader) { ALOG(@"Error : Unable to find the shader named : “%@”", name); return NO; } [(isFragment ? _fragmentFunctions : _vertexFunctions) addObject:shader]; return YES; } As you can see, the arrays are not filled if the method fails... however, a few lines later, they are used without checking if they are really filled, and that causes the crash... But this coding error doesn't explain why no shader of a certain type (or both types) have been added to the array, meaning: why -newFunctionWithName: returned nil for all given names (since the implied array appears completely empty)? Clue This error has only be detected once by a user running the app on macOS 10.13 with a NVIDIA Web Driver instead of the default macOS graphic driver. Moreover, it wasn't possible to reproduce the problem on the same OS using the native macOS driver. So my question is: is it some known conflicts between NVIDIA drivers and the use of Metal libraries? Or does this case would require some specific options in the Metal implementation? Any help appreciated, thanks!
1
0
643
Dec ’24
Can't link metal-cpp to Modern Rendering With Metal sample
There is a sample project from Apple here. It has a scene of a city at night and you can move in it. It basically has 2 parts: application code written in what looks like Objective-C (I am more familiar with C++), which inherits from things like NSObject, MTKView, NSViewController and so on - it processes input and all app-related and window-related stuff. rendering code that also looks like Objective-C. Btw both parts are mostly in .mm files (Obj-C++ AFAIK). The application part directly uses only one class from the rendering part - AAPLRenderer. I want to move the rendering part to C++ using metal-cpp. For that I need to link metal-cpp to the project. I did it successfully with blank projects several times before using this tutorial. But with this sample project Xcode can't find Foundation/Foundation.hpp (and other metal-cpp headers). The error says this: Did not find header 'Foundation.hpp' in framework 'Foundation' (loaded from '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk/System/Library/Frameworks') Pls help
2
0
822
Feb ’25
Metal runtime shader library compilation and linking issue
In my project I need to do the following: In runtime create metal Dynamic library from source. In runtime create metal Executable library from source and Link it with my previous created Dynamic library. Create compute pipeline using those two libraries created above. But I get the following error at the third step: Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel " UserInfo={NSLocalizedDescription=Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel } import Foundation import Metal class MetalShaderCompiler { let device = MTLCreateSystemDefaultDevice()! var pipeline: MTLComputePipelineState! func compileDylib() -> MTLDynamicLibrary { let source = """ #include <metal_stdlib> using namespace metal; half3 noise() { return half3(1, 0, 1); } """ let option = MTLCompileOptions() option.libraryType = .dynamic option.installName = "@executable_path/libFoundation.metallib" let library = try! device.makeLibrary(source: source, options: option) let dylib = try! device.makeDynamicLibrary(library: library) return dylib } func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary { let source = """ #include <metal_stdlib> using namespace metal; extern half3 noise(); kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]], texture2d<half, access::write> dst [[texture(1)]], ushort2 gid [[thread_position_in_grid]]) { half4 rgba = src.read(gid); rgba.rgb += noise(); dst.write(rgba, gid); } """ let option = MTLCompileOptions() option.libraryType = .executable option.libraries = [dylib] let library = try! self.device.makeLibrary(source: source, options: option) return library } func runtime() { let dylib = self.compileDylib() let exlib = self.compileExlib(dylib: dylib) let pipelineDescriptor = MTLComputePipelineDescriptor() pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel") pipelineDescriptor.preloadedLibraries = [dylib] pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil) } }
4
0
855
Feb ’25
In Metal compute kernels, when do thread variables get spilled into the device memory?
How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself? Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve maximum possible performance, I do not want to the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space. To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory? I searched in the Metal feature set tables, but I could not find any details.
1
0
573
Mar ’25
How to tell when Metal is "done" with an id<MTLTexture>, so we can release it properly.
We have a production Metal app with a complex multithreaded Metal pipeline. When everything is operating smoothly, it works great. Even when extremely overloaded, it does not crash for days at a time. This isn't good enough for our users. Unfortunately, when I have zero visibility into id, I have no way of knowing when metal is "done" with an id. When overloaded, stale metal render passes need to be 'aborted', which results in metal callbacks not being called. for example, these callbacks may not be called after an aborted pass: id<MTLCommandBuffer> m_cmdbuf; [m_cmdbuf addScheduledHandler:^(id <MTLCommandBuffer> cb) { cpr->scheduled = MachAbsoluteTime(); }]; [m_cmdbuf addCompletedHandler:^(id <MTLCommandBuffer> cb) { cpr->completed = MachAbsoluteTime(); }]; For the moment, our workaround is a system which waits a few seconds after we "think" a rendering pass should be done with all its (aborted) resources before releasing buffers. This is not ideal, to say the least. So, in summary, my question is, it would be nice to be able to 'query' an id to know when metal is done with it, so that we know that its safe to release it along with our own internal resources. Is there any such (undocumented) mechanism? I have exhaustively read all existing Metal documentation many times. An idea that I've been toying with... it would be nice to have something akin to Zombie detection running all the time for id only. In OpenGL, it was OK to use a released texture... you may display a bad frame, but not CRASH!. Is there any similar option for id?
1
0
616
Dec ’24
Metal and Swift Concurrency
Hi, Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool. GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks. My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance? To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work? Thanks!
5
0
1k
Apr ’25
Sample Project for WWDC24 10092 Metal with Passthrough?
It’s great that we’ll be able to use Metal custom renderers in passthrough mode on visionOS. https://developer.apple.com/wwdc24/10092 This is a lot of complicated set-up, however. It’s also unclear how occlusion and custom algorithms / raytracing will work in tandem with scene understanding. May we have a project template and/or sample? Preferably with the C api and not just swift. This would be much-appreciated and helpful to everyone who wants this set-up. I’d like to see the whole process. Thank you for introducing this feature!
3
1
1.1k
Nov ’24
Using the same texture for both input & output of a fragment shader
Hello, This exact question was already asked in this forum (8 years ago) but I can't find a definitive answer: Does Metal allow using the same color texture as both an input and output (color attachment) of a fragment shader? Is the behavior defined somewhere? I believe this results in undefined behavior under both DirectX and OpenGL, so I'd assume the same for Metal, but then why doesn't Metal warn me about this as it does on some many other "misconfigurations"? It also seems to work correctly in my case, as I found out by accident. Would love to get a clarification! Thanks ahead!
8
1
1.2k
Nov ’24
Metal texture allocated size versus actual image data size
Hello. In the iOS app i'm working on we are very tight on memory budget and I was looking at ways to reduce our texture memory usage. However I noticed that comparing ASTC8x8 to ASTC12x12, there is no actual difference in allocated memory for most of our textures despite ASTC12x12 having less than half the bpp of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with astc12x12 for most textures there is over a 100mb difference in astc size on disk versus when loaded, so I would love to be able to recover even a portion of that memory. Here is some test code with some measurements i've taken using an iphone 11: for(int i = 0; i &lt; 11; i++) { MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init]; texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR; int dim = 12; int n = 2 &lt;&lt; i; int mips = i+1; texDesc.width = n; texDesc.height = n; texDesc.mipmapLevelCount = mips; texDesc.resourceOptions = MTLResourceStorageModeShared; texDesc.usage = MTLTextureUsageShaderRead; // Calculate the equivalent astc texture size int blocks = 0; if(mips == 1) { blocks = n/dim + (n%dim&gt;0? 1 : 0); blocks *= blocks; } else { for(int j = 0; j &lt; mips; j++) { int a = 2 &lt;&lt; j; int cur = a/dim + (a%dim&gt;0? 1 : 0); blocks += cur*cur; } } auto tex = [objCObj newTextureWithDescriptor:texDesc]; printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize); } MTLPixelFormatASTC_12x12_LDR 128x128, mips 7, Astc: 2768, Metal: 6016 256x256, mips 8, Astc: 10512, Metal: 32768 512x512, mips 9, Astc: 40096, Metal: 98304 1024x1024, mips 10, Astc: 158432, Metal: 262144 128x128, mips 1, Astc: 1936, Metal: 4096 256x256, mips 1, Astc: 7744, Metal: 16384 512x512, mips 1, Astc: 29584, Metal: 65536 1024x1024, mips 1, Astc: 118336, Metal: 147456 MTLPixelFormatASTC_8x8_LDR 128x128, mips 7, Astc: 5488, Metal: 6016 256x256, mips 8, Astc: 21872, Metal: 32768 512x512, mips 9, Astc: 87408, Metal: 98304 1024x1024, mips 10, Astc: 349552, Metal: 360448 128x128, mips 1, Astc: 4096, Metal: 4096 256x256, mips 1, Astc: 16384, Metal: 16384 512x512, mips 1, Astc: 65536, Metal: 65536 1024x1024, mips 1, Astc: 262144, Metal: 262144 I also tried using MTLHeaps (placement and automatic) hoping they might be better, but saw nearly the same numbers. Is there any way to have metal allocate these textures in a more compact way to save on memory?
8
0
2.7k
Mar ’25