Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions
Hi, seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example.. useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU.. also useful for MoltenVK to support that extensions.. thanks..
1
0
112
Apr ’25
Metal and Swift PM
I have run into an issue where I am trying to use atomic_float in a swift package but I cannot get things to compile because it appears that the Swift Package Manager doesn't support Metal 3 (atomic_float is Metal 3 functionality). Is there any way around this? I am using // swift-tools-version: 6.1 and my Metal code includes: #include <metal_stdlib> #include <metal_geometric> #include <metal_math> #include <metal_atomic> using namespace metal; kernel void test(device atomic_float* imageBuffer [[buffer(1)]], uint id [[ thread_position_in_grid ]]) { } But I get an error on the definition of atomic_float . Any help, one more importantly, where I could have found this information about this limitation, would be helpful. -RadBobby
0
0
93
Apr ’25
CMake unable to generate the Xcode file described in this tutorial
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command: cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ I keep getting an error: CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
1
0
141
May ’25
Query GPU metrics
Hello! I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow) there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what would be the recommendation from Apple's engineering team to get this data via a public API. Requirements: Access GPU utilization, temperature, memory usage, power usage C/C++ based API for querying the metrics so I can expose the data to JavaScript via Node Addon No need to compatibile with Intel-based Macs, as Apple silicon will be fine for now Plugin GitHub Thank you! Noah
0
0
118
May ’25
vsync, drawable present, instrument gui
hi When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
0
0
122
May ’25
Sparse Texture Writes
Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps: Set up a sparse heap and create a texture from it Map the whole area of the sparse texture using updateTextureMapping(..) Overwrite every value with the value "4" in a compute shader Blit the texture to a shared buffer Assert that the values in the buffer are "4". I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated. import Metal func sparseTexture64x64Demo() throws { // ── Metal objects guard let device = MTLCreateSystemDefaultDevice() else { throw NSError(domain: "SparseNotSupported", code: -1) } let queue = device.makeCommandQueue()! let lib = device.makeDefaultLibrary()! let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!) // ── Texture descriptor let width = 64, height = 64 let format: MTLPixelFormat = .r32Uint // 4 B per texel let desc = MTLTextureDescriptor() desc.textureType = .type2D desc.pixelFormat = format desc.width = width desc.height = height desc.storageMode = .private desc.usage = [.shaderWrite, .shaderRead] // ── Sparse heap let bytesPerTile = device.sparseTileSizeInBytes let meta = device.heapTextureSizeAndAlign(descriptor: desc) let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile let heapDesc = MTLHeapDescriptor() heapDesc.type = .sparse heapDesc.storageMode = .private heapDesc.size = heapBytes let heap = device.makeHeap(descriptor: heapDesc)! let tex = heap.makeTexture(descriptor: desc)! // ── CPU buffers let bytesPerPixel = MemoryLayout<UInt32>.stride let rowStride = width * bytesPerPixel let totalBytes = rowStride * height let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)! let cb = queue.makeCommandBuffer()! let fence = device.makeFence()! // 2. Map the sparse tile, then signal the fence let rse = cb.makeResourceStateCommandEncoder()! rse.updateTextureMapping( tex, mode: .map, region: MTLRegionMake2D(0, 0, width, height), mipLevel: 0, slice: 0) rse.update(fence) // ← capture all work so far rse.endEncoding() let ce = cb.makeComputeCommandEncoder()! ce.waitForFence(fence) ce.setComputePipelineState(pipeline) ce.setTexture(tex, index: 0) let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1) let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1) ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG) ce.updateFence(fence) ce.endEncoding() // Blit texture into shared buffer let blit = cb.makeBlitCommandEncoder()! blit.waitForFence(fence) blit.copy( from: tex, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: MTLSize(width: width, height: height, depth: 1), to: dstBuf, destinationOffset: 0, destinationBytesPerRow: rowStride, destinationBytesPerImage: totalBytes) blit.endEncoding() cb.commit() cb.waitUntilCompleted() assert(cb.error == nil, "GPU error: \(String(describing: cb.error))") // ── Verify a few texels let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height) print("first three texels:", out[0], out[1], out[width]) // 0 1 64 assert(out[0] == 4 && out[1] == 4 && out[width] == 4) } Metal shader: #include <metal_stdlib> using namespace metal; kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]], uint2 gid [[thread_position_in_grid]]) { tex.write(4, gid); }
1
0
117
May ’25
Distortion Artifacts on VisionOS When Rendering Opaque/Alpha Clipped Foliage in URP (Unity 6.0, Metal)
I'm running into a persistent visual issue while deploying a floral corridor scene to Apple Vision Pro using Unity 6.0 with URP and Metal. The issue only appears on the Vision Pro device — everything looks fine in the Unity Editor. Issue Description When the frame rate drops to around 60–70 FPS, noticeable distortion artifacts appear around the edges of foliage models. It seems like the background meshes (behind the plants) get warped and leak through the edges of the foliage. Although this is most visible around the leaves, even solid objects like standard URP wall or box models show distorted edges when the issue occurs. All the foliage uses Opaque or Alpha Clipping materials. Things I've Tried Changing the foliage materials to Transparent mode —distortion around edges disappears, but using Transparent for a large number of foliage assets is not ideal for performance or sorting complexity. Reducing the number of foliage objects — with only a few plants in the scene and the frame rate staying around 100 FPS, the distortion disappears. However, this isn’t a practical solution for a full environment. Possible Cause? I came across this note in the Unity documentation: "Ensure depth-buffer for each pixel is non-zero - on visionOS, the depth buffer is used for reprojection. To ensure visual effects like skyboxes and shaders are displayed beautifully, ensure that some value is written to the depth for each pixel." Could this be related to the issue? Is it possible that Alpha Clipping with low pixel coverage leads to some pixels not writing to the depth buffer, which then causes problems during Vision Pro’s reprojection or foveated rendering? However, even when I disable Alpha Clipping entirely, the distortion issue still persists, so it may not be solely caused by clipping itself. Project Setup Unity 6.0 (URP) Depth Texture: Enable Using Metal as the graphics backend Running on real Vision Pro hardware (not simulator) Any advice on how to avoid these distortion issues on Vision Pro would be greatly appreciated. Thanks!
1
0
109
Jul ’25
打开显示HUD图形后,应用崩溃
hi everyone, 我们发现了一个和Metal相关崩溃。应用中使用了Metal相关的接口,在进行性能测试时,打开了设置-开发者-显示HUD图形。运行应用后,正常展示HUD,但应用很快发生了崩溃,日志主要信息如下: Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd Hardware Model: iPhone14,3 Process: waimai [16584] Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai Identifier: Version: 61488 (8.53.0) Code Type: ARM-64 Parent Process: ? [1] Date/Time: 2025-06-12 14:41:45.296 +0800 OS Version: iOS 18.0 (22A3354) Report Version: 104 Monitor Type: Mach Exception Exception Type: EXC_BAD_ACCESS (SIGBUS) Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00 Crashed Thread: 57 Thread 57 Crashed: 0 libMTLHud.dylib esfm_GenerateTriangesForString + 408 1 libMTLHud.dylib esfm_GenerateTriangesForString + 92 2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204 3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656 4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72 5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56 6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104 7 Metal MTLDispatchListApply + 52 8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312 9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136 10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64 11 libdispatch.dylib _dispatch_client_callout4 + 20 12 libdispatch.dylib _dispatch_mach_msg_invoke + 464 13 libdispatch.dylib _dispatch_lane_serial_drain + 368 14 libdispatch.dylib _dispatch_mach_invoke + 456 15 libdispatch.dylib _dispatch_lane_serial_drain + 368 16 libdispatch.dylib _dispatch_lane_invoke + 432 17 libdispatch.dylib _dispatch_lane_serial_drain + 368 18 libdispatch.dylib _dispatch_lane_invoke + 380 19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288 20 libdispatch.dylib _dispatch_workloop_worker_thread + 540 21 libsystem_pthread.dylib _pthread_wqthread + 288 我们测试了几个不同的机型,只有iPhone 13 Pro Max会发生崩溃。 Q1:为什么会发生这个崩溃? Q2:相同的逻辑,为什么仅在iPhone 13 Pro Max机型上出现崩溃? 期待您的解答。
1
0
146
Jul ’25
Metal useResource vs. MTLFence
Hello, I'm tracking down a bug where useResource doesn't seem to apply proper synchronization when a resource is produced by the render pass then consumed by the compute pass, but when I use MTLFence between the to signal and wait between the render/compute encoders, the artifact goes away. The resource is created with MTLHazardTrackingModeTracked and useResource is called on the compute encoder after the render pass. Metal API Validation doesn't report any warnings/errors. Am I misunderstanding the difference between the two APIs? I dug through the Metal documentation and it looks like useResource should handle synchronization given the resource has MTLHazardTrackingModeTracked but on the other hand, MTLFence should be used to ensure proper synchronization between command encoders. Can someone can clarify the difference between the two APIs and when to use them.
3
0
141
Jul ’25
Metal 4 & Acceleration Structures
I have really enjoyed looking through the code and videos related to Metal 4. Currently, my interest is to update a ReSTIR Project and take advantage of more robust ways to refit acceleration Structures and more powerful ways to access resources. I am working in Swift and have encountered a couple of puzzles: What is the 'accepted' way to create a MTL4BufferRange to store indices and vertices? How do I properly rewrite Swift code to build and compact an Acceleration Structure? I do realize that this is all in Beta and will happily look through Code Samples this Fall. If other guidance is available earlier, that would be fabulous! Thank you
4
0
609
Sep ’25
Float8 and Float16 "Reserved_Name__Do_not_use"
I am developing a macOS terminal app, running on an M4 Pro, and using Metal. I am not able use float8 or float16, both reporting Variable has incomplete type 'float16' (aka '__Reserved_Name__Do_not_use_float16'). Based on the system I should be able to use these. Either it is because it is also compiling to Intel, which they are not allowed, or something else. Either way I have not been able to figure out how to get past this. IIs there a compiler setting I need to set to make this work? if so which one and what setting do I need? I only want to run this on M processes, on the latest version of OS so not interested in Intel version or backward compatibility.
Topic: Graphics & Games SubTopic: Metal Tags:
4
0
172
Aug ’25
MTLCaptureManager.sharedCaptureManager generates corrupted .gputrace files (0KB, invalid internal structure)
Hello, I am experiencing an issue with programmatically capturing a GPU trace using MTLCaptureManager. The .gputrace file that is generated appears to be corrupted, and I'm looking for guidance or a solution. Description of the Problem: I am using MTLCaptureManager.sharedCaptureManager to capture a Metal frame and save it to disk. The generated .gputrace file is consistently reported as 0 bytes in size by the file system. Crucially, when I compress this 0-byte .gputrace file into a .zip archive, the resulting archive contains the full, expected data. After unzipping, the file can be opened and viewed correctly in Xcode. However,When inspecting the file's contents using NSFileManager in Objective-C (treating it as a directory), the internal structure is different from a .gputrace file captured directly from Xcode's Metal Debugger. capture in xcode capture in file Finally,When capturing multiple frames programmatically, the first captured frame contains valid buffer data. However, for subsequent frames (starting from the second frame), the corresponding buffer contents are all zero-filled. Frame 1: All MTLBuffer data is correctly captured and populated. Frame 2 and onward: The same MTLBuffer objects are present in the trace, but their contents are entirely 0 (i.e., the data is not captured or is corrupted). In this case, the on-screen display is normal, but the captured frame is incorrect. The frame captured directly in Xcode is also correct. Only the frame captured to a file is abnormal.
1
0
471
Aug ’25
Metal recommendedMaxWorkingSetSize vs actual RAM on iPhone (LLM load fails)
Context I’m deploying large language models on iPhone using llama.cpp. A new iPhone Air (12 GB RAM) reports a Metal MTLDevice.recommendedMaxWorkingSetSize of 8,192 MB, and my attempt to load Llama-2-13B Q4_K (~7.32 GB weights) fails during model initialization. Environment Device: iPhone Air (12 GB RAM) iOS: 26 Xcode: 26.0.1 Build: Metal backend enabled llama.cpp App runs on device (not Simulator) What I’m seeing MTLCreateSystemDefaultDevice().recommendedMaxWorkingSetSize == 8192 MiB Loading Llama-2-13B Q4_K (7.32 GB) fails to complete. Logs indicate memory pressure / allocation issues consistent with the 8 GB working-set guidance. Smaller models (e.g., 7B/8B with similar quantization) load and run (8B Q4_K provide around 9 tokens/second decoding speed). Questions Is 8,192 MB an expected recommendedMaxWorkingSetSize on a 12 GB iPhone? What values should I expect on other 2025 devices including iPhone 17 (8 GB RAM) and iPhone 17 Pro (12 GB RAM) Is it strictly enforced by Metal allocations (heaps/buffers), or advisory for best performance/eviction behavior? Can a process practically exceed this for long-lived buffers without immediate Jetsam risk? Any guidance for LLM scenarios near the limit?
0
0
392
Oct ’25
MPSMatrixRandom SEGFAULTs when ran in an async context
The following minimal snippet SEGFAULTS with SDK 26.0 and 26.1. Won't crash if I remove async from the enclosing function signature - but it's impractical in a real project. import Metal import MetalPerformanceShaders let SEED = UInt64(0x0) typealias T = Float16 /* Why ran in async context? Because global GPU object, and async makeMTLFunction, and async makeMTLComputePipelineState. Nevertheless, can trigger the bug without using global @MainActor let myGPU = MyGPU() */ @main struct CMDLine { static func main() async { let ptr = UnsafeMutablePointer<T>.allocate(capacity: 0) async let future: Void = randomFillOnGPU(ptr, count: 0) print("Main thread is playing around") await future print("Successfully reached the end.") } static func randomFillOnGPU(_ buf: UnsafeMutablePointer<T>, count destbufcount: Int) async { // let (device, queue) = await (myGPU.device, myGPU.commandqueue) let myGPU = MyGPU() let (device, queue) = (myGPU.device, myGPU.commandqueue) // Init MTLBuffer, async let makeFunction, makeComputePipelineState, etc. let tempDataType = MPSDataType.uInt32 let randfiller = MPSMatrixRandomMTGP32(device: device, destinationDataType: tempDataType, seed: Int(bitPattern:UInt(SEED))) print("randomFillOnGPU: successfully created MPSMatrixRandom.") // try await computePipelineState // ^ Crashes before this could return // Or in this minimal case, after randomFillOnGPU() returns // make encoder, set pso, dispatch, commit... } } actor MyGPU { let device : MTLDevice let commandqueue : MTLCommandQueue init() { guard let dev: MTLDevice = MPSGetPreferredDevice(.skipRemovable), let cq = dev.makeCommandQueue(), dev.supportsFamily(.apple6) || dev.supportsFamily(.mac2) else { print("Unable to get Metal Device! Exiting"); exit(EX_UNAVAILABLE) } print("Selected device: \(String(format: "%llX", dev.registryID))") self.device = dev self.commandqueue = cq print("myGPU: initialization complete.") } } See FB20916929. Apparently objc autorelease pool is releasing the wrong address during context switch (across suspension points). I wonder why such obvious case has not been caught before.
0
0
58
Nov ’25
Cannot load .mtlpackage to MTLLibrary
After watching WWDC 2025 session "Combine Metal 4 machine learning and graphics", I have decided to give it a shot to integrate the latest MTL4MachineLearningCommandEncoder to my existing render pipeline. After a lot of trial and errors, I managed to set up the pipeline and have the app compiled. However, I am now stuck on creating a MTLLibrary with .mtlpackage. Here is the code I have to create a MTLLibrary according the WWDC session https://developer.apple.com/videos/play/wwdc2025/262/?time=550: let coreMLFilePath = bundle.path(forResource: "my_model", ofType: "mtlpackage")! let coreMLURL = URL(string: coreMLFilePath)! do { metalDevice.makeLibrary(URL: coreMLURL) } catch { print("error: \(error)") } With the above code, I am getting error: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid metal package" UserInfo={NSLocalizedDescription=Invalid metal package} What is the correct way to create a MTLLibrary with .mtlpackage? Do I see this error because the .mtlpackage I am using is incorrect? How should I go with debugging this? I'd really appreciate if I could get some help on this as I have been stuck with it for some time now. Thanks in advance!
0
0
156
4w
MetalFX for Unity 2022.3.62f3?
Hi, I’m testing Unity’s Spaceship HDRP demo on iPhone 17 Pro Max and iPad Pro M4 (iOS 26.1). Everything renders correctly, and my custom MetalFX Spatial plugin initializes successfully — it briefly reports active scaling (e.g. 1434×660 → 2868×1320 at 50% scaling), then reverts to native rendering a few frames later. Setup: Xcode 16.1 (targeting iOS 18) Unity 2022.3.62f3 (HDRP) Metal backend Dynamic Resolution enabled in HDRP assets and cameras Relevant Xcode console excerpt: [MetalFXPlugin] MetalFX_Enable(True) called. [SpaceshipOptions] MetalFX enabled with HDRP dynamic resolution integration. [SpaceshipOptions] Disabled TAA for MetalFX Spatial. [SpaceshipOptions] Created runtime RenderTexture: 1434x660 [MetalFX] Spatial scaler created (1434x660 → 2868x1320). [MetalFX] Processed frame with scaler. [MetalFXPlugin] Sent RenderTexture (1434x660) to MetalFX. Output target 2868x1320. [SpaceshipOptions] MetalFX target set: 1434x660 [SpaceshipOptions] Camera targetTexture cleared after MetalFX handoff. It looks like HDRP clears the camera’s target texture right after MetalFX submits the frame, which causes it to revert to native rendering. Is there a recommended way to persist or rebind the MetalFX output texture when using HDRP on iOS? Unity doesn’t appear to support MetalFX in the Editor either: Thanks!
0
0
110
3w
Metal: Intersection results unstable when reusing Instance Acceleration Structures
Hi all, I'm encountering an issue with Metal raytracing on my M5 MacBook Pro regarding Instance Acceleration Structure (IAS). Intersection tests suddenly stop working after a certain point in the sampling loop. Situation I implemented an offline GPU path tracer that runs the same kernel multiple times per pixel (sampleCount) using metal::raytracing. Intersection tests are performed using an IAS. Since this is an offline path tracer, geometries inside the IAS never changes across samples (no transforms or updates). As sampleCount increases, there comes a point where the number of intersections drops to zero, and remains zero for all subsequent samples. Here's a code sketch: let sampleCount: UInt16 = 1024 for sampleIndex: UInt16 in 0..<sampleCount { // ... do { let commandBuffer = commandQueue.makeCommandBuffer() // Dispatch the intersection kernel. await commandBuffer.completed() } do { let commandBuffer = commandQueue.makeCommandBuffer() // Use the intersection test results from the previous command buffer. await commandBuffer.completed() } // ... } kernel void intersectAlongRay( const metal::uint32_t threadIndex [[thread_position_in_grid]], // ... const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]], // ... ) { // ... const auto result = intersector.intersect(ray, accelerationStructure); switch (result.type) { case metal::raytracing::intersection_type::triangle: { // Write intersection result to device buffers. break; } default: break; } Observations Encoding both the intersection kernel and the subsequent result usage in the same command buffer does not resolve the problem. Switching from IAS to Primitive Acceleration Structure (PAS) fixes the problem. Rebuilding the IAS for each sample also resolves the issue. Intersections produce inconsistent results even though the IAS and rays are identical — Image 1 shows a hit, while Image 2 shows a miss. Questions Am I misusing IAS in some way ? Could this be a Metal bug ? Any guidance or confirmation would be greatly appreciated.
0
0
239
2w
How can I assign priorities to my app’s GPU workloads?
My app has a number of heterogeneous GPU workloads that all run concurrently. Some of these should be executed with the highest priority because the app’s responsiveness depends on them, while others are triggered by file imports and the like which should have a low priority. If this was running on the CPU I’d assign the former User Interactive QoS and the latter Utility QoS. Is there an equivalent to this for GPU work?
0
0
261
1w