Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

How to use imageblock_slice
Is there a working example of imageblock_slice with implicit layout somewhere? I get a compilation error when i write this: imageblock_slilce color_slice = img_blk.slice(frag->color); Error: No matching member function for call to 'slice' candidate template ignored: couldn't infer template argument 'E' candidate function template not viable: requires 2 arguments, but 1 was provided Too few template arguments for class template 'imageblock_slice' It seems the syntax has changed since the Imageblocks presentation https://developer.apple.com/videos/play/tech-talks/603/ I tried supplying the struct type of the image block between <> but it still does not work.
1
0
635
Dec ’24
Xcode Metal geometry inspector uses wrong NDC space?
When inspecting the geometry in Xcode's metal debugger, I noticed that the shown "frustrum box" didn't make sense. Since Metal uses depth range 0,1 in NDC space, I would expect a vertex that is projected to z:0 to be on the front clipping plane of the frustrum shown in the geometry inspector. This is however not the case. A vertex with ndc z:0 is shown halfway inside the frustrum. Vertices with ndc z less than 0 are correctly culled during rendering, while the geometry inspector's frustrum shows that the vertex is stil inside the frustrum. The image shows vertices that are visually in the middle of the frustrum on z axis, but at the same time the out position shows that they are projected to z:0. How is this possible, unless there's a bug in the geometry inspector?
1
0
498
Feb ’25
Why is depth/stencil buffer loaded/stored twice in xcode gpu capture?
I used xcode gpu capture to profile render pipeline's bandwidth of my game.Then i found depth buffer and stencil buffer use the same buffer whitch it's format is Depth32Float_Stencil8. But why in a single pass of pipeline, this buffer was loaded twice, and the Load Attachment Size of Encoder Statistics was double. Is there any bug with xcode gpu capture?Or the pass really loaded the buffer twice times?
1
0
343
Mar ’25
Implementing Scalable Order-Independent Transparency (OIT) in Metal
Hi, Apple’s documentation on Order-Independent Transparency (OIT) describes an approach using image blocks, where an array of size 4 is allocated per fragment to store depth and color in a tile shading compute pass. However, when increasing the scene’s depth complexity by adding more overlapping quads, the OIT implementation fails due to the fixed array size. Is there a way to dynamically allocate storage for fragments based on actual depth complexity encountered during rasterization, rather than using a fixed-size array? Specifically, can an adaptive array of fragments be maintained and sorted by depth, where the size grows as needed instead of being limited to 4 entries? Any insights or alternative approaches would be greatly appreciated. Thank you!
1
0
534
Mar ’25
Is Metal usable from Swift 6?
Hello ladies and gentlemen, I'm writing a simple renderer on the main actor using Metal and Swift 6. I am at the stage now where I want to create a render pipeline state using asynchronous API: @MainActor class Renderer { let opaqueMeshRPS: MTLRenderPipelineState init(/*...*/) async throws { let descriptor = MTLRenderPipelineDescriptor() // ... opaqueMeshRPS = try await device.makeRenderPipelineState(descriptor: descriptor) } } I get a compilation error if try to use the asynchronous version of the makeRenderPipelineState method: Non-sendable type 'any MTLRenderPipelineState' returned by implicitly asynchronous call to nonisolated function cannot cross actor boundary Which is understandable, since MTLRenderPipelineState is not Sendable. But it looks like no matter where or how I try to access this method, I just can't do it - you have this API, but you can't use it, you can only use the synchronous versions. Am I missing something or is Metal just not usable with Swift 6 right now?
1
0
579
Mar ’25
3D Skeletal animation in metal-cpp?
Hey all! I'm got my hands on a refurbished mac mini m1 and already diving into metal. At the moment, i'm currently studying graphics programming with opengl and got to a point where I can almost create a 3d cube. However, I noticed there aren't many tutorials for metal cpp but rather demos. One thing I love about graphic programming, is skinning/skeletal animation. At the moment, I can't find any sources or tutorials on how to load skeletal animations into metal-cpp. So, if I create my character in blender and had all types of animations all loaded into a .FBX or maybe .DAE and load this into metal api with metal-cpp, how can I go on about how this works?
1
0
346
Mar ’25
why GLDContextRec::flushContextInternal() leads to abort
The flushContextInternal function in glr_sync.mm:262 called abort internally. What caused this? Was it due to high device temperature or some other reason? Date/Time: 2024-08-29 09:20:09.3102 +0800 Launch Time: 2024-08-29 08:53:11.3878 +0800 OS Version: iPhone OS 16.7.10 (20H350) Release Type: User Baseband Version: 8.50.04 Report Version: 104 Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Triggered by Thread: 0 Thread 0 name: Thread 0 Crashed: 0 libsystem_kernel.dylib 0x00000001ed053198 __pthread_kill + 8 (:-1) 1 libsystem_pthread.dylib 0x00000001fc5e25f8 pthread_kill + 208 (pthread.c:1670) 2 libsystem_c.dylib 0x00000001b869c4b8 abort + 124 (abort.c:118) 3 AppleMetalGLRenderer 0x00000002349f574c GLDContextRec::flushContextInternal() + 700 (glr_sync.mm:262) 4 DiSpecialDriver 0x000000010824b07c Di::RHI::onRenderFrameEnd() + 184 (RHIDevice.cpp:118) 5 DiSpecialDriver 0x00000001081b85f8 Di::Client::drawFrame() + 120 (Client.cpp:155) 2024-08-27_14-44-10.8104_+0800-07d9de9207ce4c73289507e608e5de4320d02ccf.crash
1
0
97
Mar ’25
How can I get pixel coordinates in the fragment tile function?
In this video, tile fragment shading is recommended for image processing. In this example, the unpack function takes two arguments, one of which is RasterizerData. As I understand it, this is the data passed to us from the previous stage (Vertex) of the graphics pipeline. However, the properties of MTLTileRenderPipelineDescriptor do not include an option for specifying a Vertex function. Therefore, in this render pass, a mix of commands is used: first, a draw command is executed to obtain UV coordinates, and then threads are dispatched. My question is: without using a draw command, only dispatch, how can I get pixel coordinates in the fragment tile function? For the kernel tile function, everything is clear. typedef struct { float4 OPTexture [[ color(0) ]]; float4 IntermediateTex [[ color(1) ]]; } FragmentIO; fragment FragmentIO Unpack(RasterizerData in [[ stage_in ]], texture2d<float, access::sample> srcImageTexture [[texture(0)]]) { FragmentIO out; //... // Run necessary per-pixel operations out.OPTexture = // assign computed value; out.IntermediateTex = // assign computed value; return out; }
1
0
139
Mar ’25
Metal triangle strips uniform opacity.
I have this drawing app that I have been working on for the past few years when I have free time. I recently rebuilt the app in Metal to build out other brushes and improve performance, need to render 10000s of lines in realtime. I’m running into this issue trying to create a uniform opacity per path. I have a solution but do not love it - as this is a realtime app and the solution could have some bottlenecks. If I just generate a triangle strip from touch points and do my best to smooth, resample, and handle miters I will always get some overlaps. See: To create a uniform opacity I render to an offscreen texture with blending disabled. I then pre-multiply the color and draw that texture to a composite texture with blending on (I do this per path). This works but gets tricky when you introduce a textured brush, the edges of the texture in the frag shader cut out the line. Pasted Graphic 1.png Solution: I discard below a threshold fragment float4 fragment_line(VertexOut in [[stage_in]], texture2d<float> texture [[ texture(0) ]]) { constexpr sampler s(coord::normalized, address::mirrored_repeat, filter::linear); float2 texCoord = in.texCoord; float4 texColor = texture.sample(s, texCoord); if (texColor.a < 0.01) discard_fragment(); // may be slow (from what I read) return in.color * texColor; } Better but still not perfect. Question: I'm looking for better ways to create a uniform opacity per path. I tried .max blending but that will cause no blending of other paths. Any tips, ideas, much appreciated. If this is too detailed of a question just achieve.
1
0
70
Mar ’25
Threadgroup memory for fragment shader
Hello I am trying to get thread group memory access in fragment shader. In essence, I would like to have all the fragments in a tile to bitwiseOR some value. My idea was to use simd_or across the SIMD group, then make each SIMD group thread 0 to atomic or the value into thread group memory. Finally very first thread of the tile would be tasked with writing the value down to texture with write access. Now, I can allocate the thread group memory argument to the fragment function all right. MTLRenderEncoder has setThreadgroupMemoryLength call, which I am using the following way [renderEncoder setThreagroupMemoryLength: 16 offset: 0 atIndex:0] Unfortunately, all I am getting is the following error (runtime assertion) -[MTLDebugRenderCommandEncoder setThreadgroupMemoryLength:offset:atIndex:]:3487: failed assertion Set Threadgroup Memory Length Validation offset + length(16) must be <= threadgroupMemoryLength(0).` What I am doing wrong? How I can get thread group memory in the fragment shader? I know I could use tile shading and compute function but the problem is that here I really like to use fragment stuff. Will be grateful for help.
1
0
83
Apr ’25
iOS Metal system delayed one Vsync period to really display the frame on the screen
View Layout Add the following views in a view controller: Label View A, with a subview of the same size: MTKView A View B, with a subview of the same size: MTKView B Refresh Rates of Each View The label view refreshes at 60fps (driven by CADisplayLink). MTKView A and B refresh at 15fps. MTKView Implementation Details The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering. The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame. self.metalView.enableSetNeedsDisplay = NO; self.metalView.paused = YES; A new high-priority queue is created for drawing, instead of handling it on the main queue. MTKView Latency Tracking The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView. Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism. Observation from Instruments From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details. Questions According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer? The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens? Explanation of the Reasoning Behind Some MTKView Code Details Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
1
0
98
Apr ’25
Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions
Hi, seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example.. useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU.. also useful for MoltenVK to support that extensions.. thanks..
1
0
97
Apr ’25
CMake unable to generate the Xcode file described in this tutorial
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command: cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ I keep getting an error: CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
1
0
120
May ’25
Sparse Texture Writes
Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps: Set up a sparse heap and create a texture from it Map the whole area of the sparse texture using updateTextureMapping(..) Overwrite every value with the value "4" in a compute shader Blit the texture to a shared buffer Assert that the values in the buffer are "4". I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated. import Metal func sparseTexture64x64Demo() throws { // ── Metal objects guard let device = MTLCreateSystemDefaultDevice() else { throw NSError(domain: "SparseNotSupported", code: -1) } let queue = device.makeCommandQueue()! let lib = device.makeDefaultLibrary()! let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!) // ── Texture descriptor let width = 64, height = 64 let format: MTLPixelFormat = .r32Uint // 4 B per texel let desc = MTLTextureDescriptor() desc.textureType = .type2D desc.pixelFormat = format desc.width = width desc.height = height desc.storageMode = .private desc.usage = [.shaderWrite, .shaderRead] // ── Sparse heap let bytesPerTile = device.sparseTileSizeInBytes let meta = device.heapTextureSizeAndAlign(descriptor: desc) let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile let heapDesc = MTLHeapDescriptor() heapDesc.type = .sparse heapDesc.storageMode = .private heapDesc.size = heapBytes let heap = device.makeHeap(descriptor: heapDesc)! let tex = heap.makeTexture(descriptor: desc)! // ── CPU buffers let bytesPerPixel = MemoryLayout<UInt32>.stride let rowStride = width * bytesPerPixel let totalBytes = rowStride * height let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)! let cb = queue.makeCommandBuffer()! let fence = device.makeFence()! // 2. Map the sparse tile, then signal the fence let rse = cb.makeResourceStateCommandEncoder()! rse.updateTextureMapping( tex, mode: .map, region: MTLRegionMake2D(0, 0, width, height), mipLevel: 0, slice: 0) rse.update(fence) // ← capture all work so far rse.endEncoding() let ce = cb.makeComputeCommandEncoder()! ce.waitForFence(fence) ce.setComputePipelineState(pipeline) ce.setTexture(tex, index: 0) let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1) let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1) ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG) ce.updateFence(fence) ce.endEncoding() // Blit texture into shared buffer let blit = cb.makeBlitCommandEncoder()! blit.waitForFence(fence) blit.copy( from: tex, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: MTLSize(width: width, height: height, depth: 1), to: dstBuf, destinationOffset: 0, destinationBytesPerRow: rowStride, destinationBytesPerImage: totalBytes) blit.endEncoding() cb.commit() cb.waitUntilCompleted() assert(cb.error == nil, "GPU error: \(String(describing: cb.error))") // ── Verify a few texels let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height) print("first three texels:", out[0], out[1], out[width]) // 0 1 64 assert(out[0] == 4 && out[1] == 4 && out[width] == 4) } Metal shader: #include <metal_stdlib> using namespace metal; kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]], uint2 gid [[thread_position_in_grid]]) { tex.write(4, gid); }
1
0
98
May ’25
Distortion Artifacts on VisionOS When Rendering Opaque/Alpha Clipped Foliage in URP (Unity 6.0, Metal)
I'm running into a persistent visual issue while deploying a floral corridor scene to Apple Vision Pro using Unity 6.0 with URP and Metal. The issue only appears on the Vision Pro device — everything looks fine in the Unity Editor. Issue Description When the frame rate drops to around 60–70 FPS, noticeable distortion artifacts appear around the edges of foliage models. It seems like the background meshes (behind the plants) get warped and leak through the edges of the foliage. Although this is most visible around the leaves, even solid objects like standard URP wall or box models show distorted edges when the issue occurs. All the foliage uses Opaque or Alpha Clipping materials. Things I've Tried Changing the foliage materials to Transparent mode —distortion around edges disappears, but using Transparent for a large number of foliage assets is not ideal for performance or sorting complexity. Reducing the number of foliage objects — with only a few plants in the scene and the frame rate staying around 100 FPS, the distortion disappears. However, this isn’t a practical solution for a full environment. Possible Cause? I came across this note in the Unity documentation: "Ensure depth-buffer for each pixel is non-zero - on visionOS, the depth buffer is used for reprojection. To ensure visual effects like skyboxes and shaders are displayed beautifully, ensure that some value is written to the depth for each pixel." Could this be related to the issue? Is it possible that Alpha Clipping with low pixel coverage leads to some pixels not writing to the depth buffer, which then causes problems during Vision Pro’s reprojection or foveated rendering? However, even when I disable Alpha Clipping entirely, the distortion issue still persists, so it may not be solely caused by clipping itself. Project Setup Unity 6.0 (URP) Depth Texture: Enable Using Metal as the graphics backend Running on real Vision Pro hardware (not simulator) Any advice on how to avoid these distortion issues on Vision Pro would be greatly appreciated. Thanks!
1
0
93
Jul ’25
打开显示HUD图形后,应用崩溃
hi everyone, 我们发现了一个和Metal相关崩溃。应用中使用了Metal相关的接口,在进行性能测试时,打开了设置-开发者-显示HUD图形。运行应用后,正常展示HUD,但应用很快发生了崩溃,日志主要信息如下: Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd Hardware Model: iPhone14,3 Process: waimai [16584] Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai Identifier: Version: 61488 (8.53.0) Code Type: ARM-64 Parent Process: ? [1] Date/Time: 2025-06-12 14:41:45.296 +0800 OS Version: iOS 18.0 (22A3354) Report Version: 104 Monitor Type: Mach Exception Exception Type: EXC_BAD_ACCESS (SIGBUS) Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00 Crashed Thread: 57 Thread 57 Crashed: 0 libMTLHud.dylib esfm_GenerateTriangesForString + 408 1 libMTLHud.dylib esfm_GenerateTriangesForString + 92 2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204 3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656 4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72 5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56 6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104 7 Metal MTLDispatchListApply + 52 8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312 9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136 10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64 11 libdispatch.dylib _dispatch_client_callout4 + 20 12 libdispatch.dylib _dispatch_mach_msg_invoke + 464 13 libdispatch.dylib _dispatch_lane_serial_drain + 368 14 libdispatch.dylib _dispatch_mach_invoke + 456 15 libdispatch.dylib _dispatch_lane_serial_drain + 368 16 libdispatch.dylib _dispatch_lane_invoke + 432 17 libdispatch.dylib _dispatch_lane_serial_drain + 368 18 libdispatch.dylib _dispatch_lane_invoke + 380 19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288 20 libdispatch.dylib _dispatch_workloop_worker_thread + 540 21 libsystem_pthread.dylib _pthread_wqthread + 288 我们测试了几个不同的机型,只有iPhone 13 Pro Max会发生崩溃。 Q1:为什么会发生这个崩溃? Q2:相同的逻辑,为什么仅在iPhone 13 Pro Max机型上出现崩溃? 期待您的解答。
1
0
106
Jul ’25
Combining render encoders
When I take a frame capture of my application in Xcode, it shows a warning that reads "Your application created separate command encoders which can be combined into a single encoder. By combining these encoders you may reduce your application's load/store bandwidth usage." In the minimal reproduction case I've identified for this warning, I have two render pipeline states: The first writes to the current drawable, the depth buffer, and a secondary color buffer. The second writes only to the current drawable. Because these are writing to a different set of outputs, I was initially creating two separate render command encoders to handle the draws under each of these states. My understanding is that Xcode is telling me I could only create one, however when I try to do that, I get runtime asserts when attempting to apply the second render pipeline state since it doesn't have a matching attachment configured for the second color buffer or for the depth buffer, so I can't just combine the encoders. Is the only solution here to detect and propagate forward the color/depth attachments from the first state into the creation of the second state? Is there any way to suppress this specific warning in Xcode?
1
0
289
Jul ’25
Compute kernel fails to compile when calling texture.read()
If I compile a compute kernel with a call to texture.read(), it fails with the following error: "Error Domain=AGXMetalG13X Code=3 "Encountered unlowered function call to air.get_read_sampler" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.get_read_sampler}." This error occurs on both macOS and iOS 26 Beta 5, but not when running on a simulator or in a playground. It does not occur on a macOS Sequoia VM. It occurs whether I use the old metal 3 or new metal 4 compilation method. A workaround would be to use a sampler, but according to the feature tables, all platforms support reading from textures of all formats. Below is a minimal example which produces the error: let device = MTLCreateSystemDefaultDevice()! let library = device.makeDefaultLibrary()! let computeFunction = library.makeFunction(name: "compute_test")! do { let pipeline = try device.makeComputePipelineState(function: computeFunction) debugPrint(pipeline) } catch { debugPrint("Metal 3 failed with error:\n\(error)") } #import <metal_stdlib> using namespace metal; kernel void compute_test(uint2 gid [[thread_position_in_grid]], texture2d<float, access::read> in [[texture(0)]], texture2d<float, access::write> out [[texture(1)]]) { out.write(in.read(gid), gid); } I filed feedback FB19530049.
1
0
146
Aug ’25