Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

M1 GPU violates atomic_thread_fence across threadgroups
I have an M1 Pro with a 16-core GPU. When I run a shader with 8193 threads, atomic_thread_fence is violated across the boundary between thread 8191 (the last thread in the 7th threadgroup) and 8192 (the first thread in the 9th threadgroup). I've attached the Metal and Swift files, but I'll repost the relevant kernel here. It's a function that launches N threads to iterate through a binary tree from the leaves, where the first thread to reach the parent terminates and the second one populates it with the sum of the nodes two children. // clang-format off void sum(device const int& size, device const int* __restrict__ in, device int* __restrict__ out, device atomic_int* visited, uint i [[thread_position_in_grid]]) { // clang-format on int val = in[i]; uint cur = (size + i - 1); out[cur] = val; atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst); cur = (cur - 1) / 2; int proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed); while (proceed == 1) { uint left = 2 * cur + 1; uint right = 2 * cur + 2; uint val_left = out[left]; uint val_right = out[right]; uint val_cur = val_left + val_right; out[cur] = val_cur; if (cur == 0) { break; } cur = (cur - 1) / 2; atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst); proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed); } } What I'm observing is that thread 8192 hits the atomic_fetch_add first and terminates, while thread 8191 hits it second (observes that thread 8192 had incremented it by 1) and proceeds into the loop. Thread 8191 reads out[16383] (which it populated with 8191) and out[16384] (which thread 8192 populated with 8192 prior to the atomic_thread_fence). Instead of reading 8192 from out[16384] though, it reads 0. Maybe I'm missing something but this seems like a pretty clear violation of the atomic_thread_fence which (I thought) was supposed to guarantee that the write from thread 8192 to out[16384] would be visible to any thread observing the effects of the following atomic_fetch_add. Is atomic_fetch_add not a store operation? Modifying it to an atomic_store or atomic_exchange still results in the bug. Adding another atomic_thread_fence between the atomic_fetch_add and reading of out also doesn't change anything. I only begin to observe this on grid sizes of 8193 and upwards. That's 9 threadgroups per grid, which I assume could be related to my M1 Pro GPU having 16 cores. Running the same example on an A17 Pro GPU doesn't show any of this behavior up through a tested grid size of 4194303 (2^22-1), at which point testing larger grid sizes starts to run into other issues so I can't test anything larger. Removing the atomic_thread_fences on both the M1 and A17 cause the test to fail at much smaller grid sizes, as expected. sum.metal main.swift
2
0
526
Dec ’24
Jurassic World Evolution 2 Likely Fails Due to Missing Tiled Resources Support
I’ve been trying to run Jurassic World Evolution 2 using the Game Porting Toolkit on macOS, but the game doesn’t launch and crashes immediately. Based on the error and research, it seems the issue is related to missing support for D3D12_TILED_RESOURCES_TIER_2 in the Metal API. If this is the case, does anyone know if support for tiled resources is planned for future updates of the toolkit? Or are there any potential workarounds for bypassing this limitation?
1
0
721
Dec ’24
MDLAsset loads texture in usdz file loaded with wrong colorspace
I have a very basic usdz file from this repo I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object I can tell it is assigning a colorspace of linear rgb instead of srgb although the image file in the usdz is srgb. This causes the textures to ultimately render as over saturated. In the code I later convert the MDLTexture to MTLTexture via MTKTextureLoader but if I set the srgb option it seems to ignore it. This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something? Thanks!
3
3
841
Dec ’24
Normally distributed MPSMatrixRandom number generation generates NaN
When generating large arrays of random numbers, NaNs show up. They also show up at the same indices when using the same seed, leading me to believe that this is a bug with MPSMatrixRandom's normally distributed Float32 random number distribution. Happens with both Philox and MTGP32. Is this intentional and how do I work around this? See the original post for a MWE in Swift and Julia: https://github.com/JuliaGPU/Metal.jl/issues/474
1
1
647
Dec ’24
How to tell when Metal is "done" with an id<MTLTexture>, so we can release it properly.
We have a production Metal app with a complex multithreaded Metal pipeline. When everything is operating smoothly, it works great. Even when extremely overloaded, it does not crash for days at a time. This isn't good enough for our users. Unfortunately, when I have zero visibility into id, I have no way of knowing when metal is "done" with an id. When overloaded, stale metal render passes need to be 'aborted', which results in metal callbacks not being called. for example, these callbacks may not be called after an aborted pass: id<MTLCommandBuffer> m_cmdbuf; [m_cmdbuf addScheduledHandler:^(id <MTLCommandBuffer> cb) { cpr->scheduled = MachAbsoluteTime(); }]; [m_cmdbuf addCompletedHandler:^(id <MTLCommandBuffer> cb) { cpr->completed = MachAbsoluteTime(); }]; For the moment, our workaround is a system which waits a few seconds after we "think" a rendering pass should be done with all its (aborted) resources before releasing buffers. This is not ideal, to say the least. So, in summary, my question is, it would be nice to be able to 'query' an id to know when metal is done with it, so that we know that its safe to release it along with our own internal resources. Is there any such (undocumented) mechanism? I have exhaustively read all existing Metal documentation many times. An idea that I've been toying with... it would be nice to have something akin to Zombie detection running all the time for id only. In OpenGL, it was OK to use a released texture... you may display a bad frame, but not CRASH!. Is there any similar option for id?
1
0
632
Dec ’24
Mixing a lot of shaders.
Project: I have some data wich could be transformed by shader, result may be kept in rgb channels of image. Great. But now to mix dozens of those results? Not one by one, image after image, but all at once. Something like „complicated average” color of particular pixel from all delivered images. Is it possible?
1
0
635
Dec ’24
MTKView draw method causes EXC_BAD_ACCESS crash
Hello, I am using MTKView to display: camera preview & video playback. I am testing on iPhone 16. App crashes at a random moment whenever MTKView is rendering CIImage. MetalView: public enum MetalActionType { case image(CIImage) case buffer(CVPixelBuffer) } public struct MetalView: UIViewRepresentable { let mtkView = MTKView() public let actionPublisher: any Publisher<MetalActionType, Never> public func makeCoordinator() -> Coordinator { Coordinator(self) } public func makeUIView(context: UIViewRepresentableContext<MetalView>) -> MTKView { guard let metalDevice = MTLCreateSystemDefaultDevice() else { return mtkView } mtkView.device = metalDevice mtkView.framebufferOnly = false mtkView.clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0) mtkView.drawableSize = mtkView.frame.size mtkView.delegate = context.coordinator mtkView.isPaused = true mtkView.enableSetNeedsDisplay = true mtkView.preferredFramesPerSecond = 60 context.coordinator.ciContext = CIContext( mtlDevice: metalDevice, options: [.priorityRequestLow: true, .highQualityDownsample: false]) context.coordinator.metalCommandQueue = metalDevice.makeCommandQueue() context.coordinator.actionSubscriber = actionPublisher.sink { type in switch type { case .buffer(let pixelBuffer): context.coordinator.updateCIImage(pixelBuffer) break case .image(let image): context.coordinator.updateCIImage(image) break } } return mtkView } public func updateUIView(_ nsView: MTKView, context: UIViewRepresentableContext<MetalView>) { } public class Coordinator: NSObject, MTKViewDelegate { var parent: MetalView var metalCommandQueue: MTLCommandQueue! var ciContext: CIContext! private var image: CIImage? { didSet { Task { @MainActor in self.parent.mtkView.setNeedsDisplay() //<--- call Draw method } } } var actionSubscriber: (any Combine.Cancellable)? private let operationQueue = OperationQueue() init(_ parent: MetalView) { self.parent = parent operationQueue.qualityOfService = .background super.init() } public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) { } public func draw(in view: MTKView) { guard let drawable = view.currentDrawable, let ciImage = image, let commandBuffer = metalCommandQueue.makeCommandBuffer(), let ci = ciContext else { return } //making sure nothing is nil, now we can add the current frame to the operationQueue for processing operationQueue.addOperation( MetalOperation( drawable: drawable, drawableSize: view.drawableSize, ciImage: ciImage, commandBuffer: commandBuffer, pixelFormat: view.colorPixelFormat, ciContext: ci)) } //consumed by Subscriber func updateCIImage(_ img: CIImage) { image = img } //consumed by Subscriber func updateCIImage(_ buffer: CVPixelBuffer) { image = CIImage(cvPixelBuffer: buffer) } } } now the MetalOperation class: private class MetalOperation: Operation, @unchecked Sendable { let drawable: CAMetalDrawable let drawableSize: CGSize let ciImage: CIImage let commandBuffer: MTLCommandBuffer let pixelFormat: MTLPixelFormat let ciContext: CIContext init( drawable: CAMetalDrawable, drawableSize: CGSize, ciImage: CIImage, commandBuffer: MTLCommandBuffer, pixelFormat: MTLPixelFormat, ciContext: CIContext ) { self.drawable = drawable self.drawableSize = drawableSize self.ciImage = ciImage self.commandBuffer = commandBuffer self.pixelFormat = pixelFormat self.ciContext = ciContext } override func main() { let width = Int(drawableSize.width) let height = Int(drawableSize.height) let ciWidth = Int(ciImage.extent.width) //<-- Thread 22: EXC_BAD_ACCESS (code=1, address=0x5e71f5490) A bad access to memory terminated the process. let ciHeight = Int(ciImage.extent.height) let destination = CIRenderDestination( width: width, height: height, pixelFormat: pixelFormat, commandBuffer: commandBuffer, mtlTextureProvider: { [self] () -> MTLTexture in return drawable.texture }) let transform = CGAffineTransform( scaleX: CGFloat(width) / CGFloat(ciWidth), y: CGFloat(height) / CGFloat(ciHeight)) do { try ciContext.startTask(toClear: destination) try ciContext.startTask(toRender: ciImage.transformed(by: transform), to: destination) } catch { } commandBuffer.present(drawable) commandBuffer.commit() commandBuffer.waitUntilCompleted() } } Now I am no Metal expert, but I believe it's a very simple execution that shouldn't cause memory leak especially after we have already checked for whether CIImage is nil or not. I have also tried running this code without OperationQueue and also tried with @autoreleasepool but none of them has solved this problem. Am I missing something?
1
0
769
Dec ’24