Hi,
It seems MSL is missing support for a clock() shader instruction, which is available in other graphics APIs such as Vulkan or OpenGL.
It would be useful for measuring the cost, in clock cycles, of some code inside a shader with much finer granularity than launching a micro-kernel with the same instructions and measuring the cost from the CPU.
It would also be useful for MoltenVK, which needs it to support the corresponding Vulkan extension.
Thanks.
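For reference, the coarser CPU-side timing mentioned above might look like the following Swift sketch, using MTLCommandBuffer's gpuStartTime/gpuEndTime (the queue and pipeline are assumed to exist already; this only yields whole-command-buffer seconds, not per-instruction cycle counts):

import Metal

// Minimal sketch: time a micro-kernel from the CPU side.
// `queue` and `pipeline` are assumed to be set up elsewhere.
func timeKernel(queue: MTLCommandQueue, pipeline: MTLComputePipelineState) {
    let cb = queue.makeCommandBuffer()!
    let enc = cb.makeeComputeCommandEncoder()
    let encoder = cb.makeComputeCommandEncoder()!
    encoder.setComputePipelineState(pipeline)
    encoder.dispatchThreadgroups(MTLSize(width: 1, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: 32, height: 1, depth: 1))
    encoder.endEncoding()
    cb.commit()
    cb.waitUntilCompleted()
    // GPU execution time of the whole command buffer, in seconds.
    print("GPU time: \(cb.gpuEndTime - cb.gpuStartTime) s")
}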
I have run into an issue where I am trying to use atomic_float in a Swift package, but I cannot get things to compile because it appears that the Swift Package Manager doesn't support Metal 3 (atomic_float is Metal 3 functionality). Is there any way around this? I am using
// swift-tools-version: 6.1
and my Metal code includes:
#include <metal_stdlib>
#include <metal_geometric>
#include <metal_math>
#include <metal_atomic>
using namespace metal;
kernel void test(device atomic_float* imageBuffer [[buffer(1)]],
                 uint id [[ thread_position_in_grid ]]) {
}
But I get an error on the definition of atomic_float.
Any help, or more importantly, a pointer to where I could have found information about this limitation, would be appreciated.
-RadBobby
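As an aside, one possible workaround sketch is to compile the MSL source at runtime with an explicit Metal 3 language version via MTLCompileOptions, bypassing whatever standard the package build selects (assuming runtime compilation is acceptable; shaderSource is a placeholder for the kernel shown above):

import Metal

// Sketch: compile MSL at runtime with an explicit Metal 3 language version.
// `shaderSource` is assumed to contain the kernel shown above as a string.
func makeMetal3Library(device: MTLDevice, shaderSource: String) throws -> MTLLibrary {
    let options = MTLCompileOptions()
    options.languageVersion = .version3_0   // atomic_float requires Metal 3
    return try device.makeLibrary(source: shaderSource, options: options)
}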
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command:
cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/
I keep getting an error:
CMake Error at CMakeLists.txt:5 (include):
include could not find requested file:
/Users/macuser/USDInstall/bin/pxrConfig.cmake
I've tried following the instructions in the README.md file included with the project files at least five times, as well as moving the pxrConfig.cmake file around and copying it into different folders before re-running the command, but I was still unable to generate the files needed to compile and render the HydraPlayer renderer. How do I get CMake to generate the Xcode project for the HydraPlayer renderer?
Hello!
I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows, but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow), there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what Apple's engineering team would recommend for getting this data via a public API.
Requirements:
Access GPU utilization, temperature, memory usage, power usage
C/C++ based API for querying the metrics so I can expose the data to JavaScript via Node Addon
No need to be compatible with Intel-based Macs; Apple silicon is fine for now
Plugin GitHub
Thank you!
Noah
Hi,
When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of "Drawable Present" seems to be when the CPU side calls present on the command buffer, rather than when the actual encoding is completed on the GPU. Also, what does "Drawable Presented" specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been rendered is not displayed. Why is this happening?
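For what it's worth, the moment a drawable is actually shown on screen can also be observed from code with a presented handler, which may be what the "Drawable Presented" marker corresponds to (a sketch; the drawable and command buffer are assumed to come from the app's render loop):

import Metal
import QuartzCore

// Sketch: observe when a drawable is actually presented on screen.
// `drawable` and `commandBuffer` are assumed to come from the render loop.
func present(_ drawable: CAMetalDrawable, with commandBuffer: MTLCommandBuffer) {
    drawable.addPresentedHandler { presented in
        // presentedTime is the host time at which the drawable was shown on screen.
        print("presented at host time \(presented.presentedTime)")
    }
    commandBuffer.present(drawable)
    commandBuffer.commit()
}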
Hey, I've been struggling with this for some days now.
I am trying to write to a sparse texture in a compute shader. I'm performing the following steps:
Set up a sparse heap and create a texture from it
Map the whole area of the sparse texture using updateTextureMapping(..)
Overwrite every value with the value "4" in a compute shader
Blit the texture to a shared buffer
Assert that the values in the buffer are "4".
I have a minimal example (which is still pretty long unfortunately).
It works perfectly when removing the line heapDesc.type = .sparse.
What am I missing? I could not find any information saying that writes to sparse textures are unsupported. Any help would be greatly appreciated.
import Metal
func sparseTexture64x64Demo() throws {
    // ── Metal objects
    guard let device = MTLCreateSystemDefaultDevice()
    else { throw NSError(domain: "SparseNotSupported", code: -1) }
    let queue = device.makeCommandQueue()!
    let lib = device.makeDefaultLibrary()!
    let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!)

    // ── Texture descriptor
    let width = 64, height = 64
    let format: MTLPixelFormat = .r32Uint // 4 B per texel
    let desc = MTLTextureDescriptor()
    desc.textureType = .type2D
    desc.pixelFormat = format
    desc.width = width
    desc.height = height
    desc.storageMode = .private
    desc.usage = [.shaderWrite, .shaderRead]

    // ── Sparse heap
    let bytesPerTile = device.sparseTileSizeInBytes
    let meta = device.heapTextureSizeAndAlign(descriptor: desc)
    let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile
    let heapDesc = MTLHeapDescriptor()
    heapDesc.type = .sparse
    heapDesc.storageMode = .private
    heapDesc.size = heapBytes
    let heap = device.makeHeap(descriptor: heapDesc)!
    let tex = heap.makeTexture(descriptor: desc)!

    // ── CPU buffers
    let bytesPerPixel = MemoryLayout<UInt32>.stride
    let rowStride = width * bytesPerPixel
    let totalBytes = rowStride * height
    let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)!

    let cb = queue.makeCommandBuffer()!
    let fence = device.makeFence()!

    // 2. Map the sparse tile, then signal the fence
    let rse = cb.makeResourceStateCommandEncoder()!
    rse.updateTextureMapping(
        tex,
        mode: .map,
        region: MTLRegionMake2D(0, 0, width, height),
        mipLevel: 0,
        slice: 0)
    rse.update(fence) // ← capture all work so far
    rse.endEncoding()

    let ce = cb.makeComputeCommandEncoder()!
    ce.waitForFence(fence)
    ce.setComputePipelineState(pipeline)
    ce.setTexture(tex, index: 0)
    let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1)
    let tgCount = MTLSize(width: (width + 7) / 8,
                          height: (height + 7) / 8,
                          depth: 1)
    ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG)
    ce.updateFence(fence)
    ce.endEncoding()

    // Blit texture into shared buffer
    let blit = cb.makeBlitCommandEncoder()!
    blit.waitForFence(fence)
    blit.copy(
        from: tex,
        sourceSlice: 0,
        sourceLevel: 0,
        sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
        sourceSize: MTLSize(width: width, height: height, depth: 1),
        to: dstBuf,
        destinationOffset: 0,
        destinationBytesPerRow: rowStride,
        destinationBytesPerImage: totalBytes)
    blit.endEncoding()

    cb.commit()
    cb.waitUntilCompleted()
    assert(cb.error == nil, "GPU error: \(String(describing: cb.error))")

    // ── Verify a few texels
    let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height)
    print("first three texels:", out[0], out[1], out[width]) // 0 1 64
    assert(out[0] == 4 && out[1] == 4 && out[width] == 4)
}
Metal shader:
#include <metal_stdlib>
using namespace metal;
kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]],
                   uint2 gid [[thread_position_in_grid]])
{
    tex.write(4, gid);
}
I'm running into a persistent visual issue while deploying a floral corridor scene to Apple Vision Pro using Unity 6.0 with URP and Metal. The issue only appears on the Vision Pro device — everything looks fine in the Unity Editor.
Issue Description
When the frame rate drops to around 60–70 FPS, noticeable distortion artifacts appear around the edges of foliage models. It seems like the background meshes (behind the plants) get warped and leak through the edges of the foliage. Although this is most visible around the leaves, even solid objects like standard URP wall or box models show distorted edges when the issue occurs.
All the foliage uses Opaque or Alpha Clipping materials.
Things I've Tried
Changing the foliage materials to Transparent mode removes the distortion around the edges, but using Transparent for a large number of foliage assets is not ideal for performance or sorting complexity.
Reducing the number of foliage objects — with only a few plants in the scene and the frame rate staying around 100 FPS, the distortion disappears. However, this isn’t a practical solution for a full environment.
Possible Cause?
I came across this note in the Unity documentation:
"Ensure depth-buffer for each pixel is non-zero - on visionOS, the depth buffer is used for reprojection. To ensure visual effects like skyboxes and shaders are displayed beautifully, ensure that some value is written to the depth for each pixel."
Could this be related to the issue? Is it possible that Alpha Clipping with low pixel coverage leads to some pixels not writing to the depth buffer, which then causes problems during Vision Pro’s reprojection or foveated rendering? However, even when I disable Alpha Clipping entirely, the distortion issue still persists, so it may not be solely caused by clipping itself.
Project Setup
Unity 6.0 (URP)
Depth Texture: Enabled
Using Metal as the graphics backend
Running on real Vision Pro hardware (not simulator)
Any advice on how to avoid these distortion issues on Vision Pro would be greatly appreciated.
Thanks!
Hi everyone,
We have encountered a Metal-related crash. The app uses Metal APIs, and during performance testing we enabled Settings > Developer > Show Graphics HUD. After launching the app, the HUD displayed normally, but the app crashed shortly afterwards. The main information from the crash log is as follows:
Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B
CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd
Hardware Model: iPhone14,3
Process: waimai [16584]
Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai
Identifier:
Version: 61488 (8.53.0)
Code Type: ARM-64
Parent Process: ? [1]
Date/Time: 2025-06-12 14:41:45.296 +0800
OS Version: iOS 18.0 (22A3354)
Report Version: 104
Monitor Type: Mach Exception
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00
Crashed Thread: 57
Thread 57 Crashed:
0 libMTLHud.dylib esfm_GenerateTriangesForString + 408
1 libMTLHud.dylib esfm_GenerateTriangesForString + 92
2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204
3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656
4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72
5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56
6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104
7 Metal MTLDispatchListApply + 52
8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312
9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136
10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
11 libdispatch.dylib _dispatch_client_callout4 + 20
12 libdispatch.dylib _dispatch_mach_msg_invoke + 464
13 libdispatch.dylib _dispatch_lane_serial_drain + 368
14 libdispatch.dylib _dispatch_mach_invoke + 456
15 libdispatch.dylib _dispatch_lane_serial_drain + 368
16 libdispatch.dylib _dispatch_lane_invoke + 432
17 libdispatch.dylib _dispatch_lane_serial_drain + 368
18 libdispatch.dylib _dispatch_lane_invoke + 380
19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288
20 libdispatch.dylib _dispatch_workloop_worker_thread + 540
21 libsystem_pthread.dylib _pthread_wqthread + 288
We tested several different device models, and the crash only occurs on the iPhone 13 Pro Max.
Q1: Why does this crash happen?
Q2: With the same logic, why does the crash occur only on the iPhone 13 Pro Max?
Looking forward to your answers.
Hi there,
I'm wondering if it's possible under the iOS 26 developer beta to enable the MetalFX scaling info in the Metal HUD ({"MTL_HUD_ENABLED": "1"}) for my app.
This information has been added on the Mac, but it looks to be absent on iPhone / iPad.
Hello, I'm tracking down a bug where useResource doesn't seem to apply proper synchronization when a resource is produced by a render pass and then consumed by a compute pass, but when I use an MTLFence to signal and wait between the render and compute encoders, the artifact goes away.
The resource is created with MTLHazardTrackingModeTracked, and useResource is called on the compute encoder after the render pass. Metal API Validation doesn't report any warnings or errors.
Am I misunderstanding the difference between the two APIs? I dug through the Metal documentation, and it looks like useResource should handle synchronization given that the resource has MTLHazardTrackingModeTracked; on the other hand, MTLFence should be used to ensure proper synchronization between command encoders. Can someone clarify the difference between the two APIs and when to use each?
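For context, the fence-based workaround described above looks roughly like this (a sketch; the encoder, fence, and texture names are placeholders, not the actual project code):

import Metal

// Sketch of the MTLFence workaround: signal after the render pass that writes
// the resource, wait before the compute pass that reads it.
func synchronize(renderEncoder: MTLRenderCommandEncoder,
                 computeEncoder: MTLComputeCommandEncoder,
                 fence: MTLFence,
                 texture: MTLTexture) {
    // Producer: the render pass writes `texture`, then signals the fence after the fragment stage.
    renderEncoder.updateFence(fence, after: .fragment)
    renderEncoder.endEncoding()

    // Consumer: the compute pass waits on the fence before reading `texture`.
    computeEncoder.waitForFence(fence)
    computeEncoder.useResource(texture, usage: .read)
    // ... set the pipeline state, bind `texture`, dispatch ...
}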
I have really enjoyed looking through the code and videos related to Metal 4. Currently, my interest is in updating a ReSTIR project to take advantage of the more robust ways to refit acceleration structures and the more powerful ways to access resources.
I am working in Swift and have encountered a couple of puzzles:
What is the 'accepted' way to create a MTL4BufferRange to store indices and vertices?
How do I properly rewrite Swift code to build and compact an acceleration structure? (The current Metal 3 flow is sketched below for reference.)
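For reference, the pre-Metal 4 build-and-compact flow being ported looks roughly like this in Swift (a sketch of the existing Metal 3 API, not the new Metal 4 path the question asks about; the device, queue, and descriptor are assumed to exist):

import Metal

// Sketch of the Metal 3 build-and-compact flow.
// `descriptor` is an already-configured MTLAccelerationStructureDescriptor.
func buildCompacted(device: MTLDevice,
                    queue: MTLCommandQueue,
                    descriptor: MTLAccelerationStructureDescriptor) -> MTLAccelerationStructure {
    let sizes = device.accelerationStructureSizes(descriptor: descriptor)
    let scratch = device.makeBuffer(length: sizes.buildScratchBufferSize, options: .storageModePrivate)!
    let accel = device.makeAccelerationStructure(size: sizes.accelerationStructureSize)!
    let compactedSizeBuf = device.makeBuffer(length: MemoryLayout<UInt32>.size, options: .storageModeShared)!

    // Pass 1: build the structure and ask for its compacted size.
    var cb = queue.makeCommandBuffer()!
    var enc = cb.makeAccelerationStructureCommandEncoder()!
    enc.build(accelerationStructure: accel, descriptor: descriptor,
              scratchBuffer: scratch, scratchBufferOffset: 0)
    enc.writeCompactedSize(accelerationStructure: accel, buffer: compactedSizeBuf, offset: 0)
    enc.endEncoding()
    cb.commit()
    cb.waitUntilCompleted()

    // Pass 2: copy into a right-sized structure.
    let compactedSize = Int(compactedSizeBuf.contents().load(as: UInt32.self))
    let compacted = device.makeAccelerationStructure(size: compactedSize)!
    cb = queue.makeCommandBuffer()!
    enc = cb.makeAccelerationStructureCommandEncoder()!
    enc.copyAndCompact(sourceAccelerationStructure: accel, destinationAccelerationStructure: compacted)
    enc.endEncoding()
    cb.commit()
    cb.waitUntilCompleted()
    return compacted
}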
I do realize that this is all in Beta and will happily look through Code Samples this Fall. If other guidance is available earlier, that would be fabulous!
Thank you
I am developing a macOS terminal app, running on an M4 Pro, and using Metal.
I am not able to use float8 or float16; both report Variable has incomplete type 'float16' (aka '__Reserved_Name__Do_not_use_float16').
Based on the system, I should be able to use these. Either it is because the code is also being compiled for Intel, where they are not allowed, or it is something else; either way, I have not been able to figure out how to get past this.
Is there a compiler setting I need to change to make this work? If so, which one, and what value does it need? I only want to run this on M-series processors on the latest version of the OS, so I'm not interested in an Intel build or backward compatibility.
Hello,
I am experiencing an issue with programmatically capturing a GPU trace using MTLCaptureManager. The .gputrace file that is generated appears to be corrupted, and I'm looking for guidance or a solution.
Description of the Problem:
I am using MTLCaptureManager.sharedCaptureManager to capture a Metal frame and save it to disk.
The generated .gputrace file is consistently reported as 0 bytes in size by the file system.
Crucially, when I compress this 0-byte .gputrace file into a .zip archive, the resulting archive contains the full, expected data. After unzipping, the file can be opened and viewed correctly in Xcode.
However, when inspecting the file's contents using NSFileManager in Objective-C (treating it as a directory), the internal structure is different from that of a .gputrace file captured directly from Xcode's Metal debugger.
(Screenshots: capture in Xcode vs. capture to file.)
Finally, when capturing multiple frames programmatically, the first captured frame contains valid buffer data. However, for subsequent frames (starting from the second frame), the corresponding buffer contents are all zero-filled.
Frame 1: All MTLBuffer data is correctly captured and populated.
Frame 2 and onward: The same MTLBuffer objects are present in the trace, but their contents are entirely 0 (i.e., the data is not captured or is corrupted).
In this case, the on-screen display is normal, but the captured frame is incorrect. The frame captured directly in Xcode is also correct. Only the frame captured to a file is abnormal.
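For reference, the programmatic capture path being described is roughly the following (a sketch; the output path and the point at which the frame is rendered are placeholders). One detail that may matter: a .gputrace document is a package (a directory on disk), so a plain file-size check can report 0 bytes even when the capture contains data.

import Metal

// Sketch: programmatic GPU capture to a .gputrace document.
func captureOneFrame(device: MTLDevice, renderFrame: () -> Void) throws {
    let manager = MTLCaptureManager.shared()
    let descriptor = MTLCaptureDescriptor()
    descriptor.captureObject = device
    descriptor.destination = .gpuTraceDocument
    descriptor.outputURL = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("frame.gputrace")

    try manager.startCapture(with: descriptor)
    renderFrame()          // encode and commit this frame's command buffers
    manager.stopCapture()  // the .gputrace package is finalized here
}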
I mean…I want to use defaults rather than launching apps via open with the saved environment variables.
This is pretty easy on iOS and other platforms. So what about in macOS?
Context
I’m deploying large language models on iPhone using llama.cpp. A new iPhone Air (12 GB RAM) reports a Metal MTLDevice.recommendedMaxWorkingSetSize of 8,192 MB, and my attempt to load Llama-2-13B Q4_K (~7.32 GB weights) fails during model initialization.
Environment
Device: iPhone Air (12 GB RAM)
iOS: 26
Xcode: 26.0.1
Build: Metal backend enabled llama.cpp
App runs on device (not Simulator)
What I’m seeing
MTLCreateSystemDefaultDevice().recommendedMaxWorkingSetSize == 8192 MiB
Loading Llama-2-13B Q4_K (7.32 GB) fails to complete. Logs indicate memory pressure / allocation issues consistent with the 8 GB working-set guidance.
Smaller models (e.g., 7B/8B with similar quantization) load and run (8B Q4_K provides around 9 tokens/second decoding speed).
Questions
Is 8,192 MB an expected recommendedMaxWorkingSetSize on a 12 GB iPhone?
What values should I expect on other 2025 devices, including iPhone 17 (8 GB RAM) and iPhone 17 Pro (12 GB RAM)?
Is it strictly enforced for Metal allocations (heaps/buffers), or is it advisory for best performance/eviction behavior?
Can a process practically exceed this for long-lived buffers without immediate Jetsam risk?
Any guidance for LLM scenarios near the limit?
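For anyone comparing devices, the values in question can be read directly from the device object (a small sketch):

import Metal

// Sketch: print the memory-related limits discussed above.
if let device = MTLCreateSystemDefaultDevice() {
    let mib = 1024.0 * 1024.0
    print("recommendedMaxWorkingSetSize:", Double(device.recommendedMaxWorkingSetSize) / mib, "MiB")
    print("maxBufferLength:             ", Double(device.maxBufferLength) / mib, "MiB")
    print("hasUnifiedMemory:            ", device.hasUnifiedMemory)
    print("currentAllocatedSize:        ", Double(device.currentAllocatedSize) / mib, "MiB")
}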
The following minimal snippet segfaults with SDK 26.0 and 26.1. It won't crash if I remove async from the enclosing function signature, but that's impractical in a real project.
import Metal
import MetalPerformanceShaders
let SEED = UInt64(0x0)
typealias T = Float16

/* Why run in an async context? Because of a global GPU object,
   and async makeMTLFunction,
   and async makeMTLComputePipelineState.
   Nevertheless, the bug can be triggered without using the global.
@MainActor
let myGPU = MyGPU()
*/

@main
struct CMDLine {
    static func main() async {
        let ptr = UnsafeMutablePointer<T>.allocate(capacity: 0)
        async let future: Void = randomFillOnGPU(ptr, count: 0)
        print("Main thread is playing around")
        await future
        print("Successfully reached the end.")
    }

    static func randomFillOnGPU(_ buf: UnsafeMutablePointer<T>, count destbufcount: Int) async {
        // let (device, queue) = await (myGPU.device, myGPU.commandqueue)
        let myGPU = MyGPU()
        let (device, queue) = (myGPU.device, myGPU.commandqueue)
        // Init MTLBuffer, async let makeFunction, makeComputePipelineState, etc.
        let tempDataType = MPSDataType.uInt32
        let randfiller = MPSMatrixRandomMTGP32(device: device, destinationDataType: tempDataType, seed: Int(bitPattern: UInt(SEED)))
        print("randomFillOnGPU: successfully created MPSMatrixRandom.")
        // try await computePipelineState
        // ^ Crashes before this could return
        // Or in this minimal case, after randomFillOnGPU() returns
        // make encoder, set pso, dispatch, commit...
    }
}

actor MyGPU {
    let device: MTLDevice
    let commandqueue: MTLCommandQueue

    init() {
        guard let dev: MTLDevice = MPSGetPreferredDevice(.skipRemovable),
              let cq = dev.makeCommandQueue(),
              dev.supportsFamily(.apple6) || dev.supportsFamily(.mac2)
        else { print("Unable to get Metal Device! Exiting"); exit(EX_UNAVAILABLE) }
        print("Selected device: \(String(format: "%llX", dev.registryID))")
        self.device = dev
        self.commandqueue = cq
        print("myGPU: initialization complete.")
    }
}
See FB20916929. Apparently the Objective-C autorelease pool is releasing the wrong address during a context switch (across suspension points). I wonder why such an obvious case has not been caught before.
After watching the WWDC 2025 session "Combine Metal 4 machine learning and graphics", I decided to give it a shot and integrate the new MTL4MachineLearningCommandEncoder into my existing render pipeline. After a lot of trial and error, I managed to set up the pipeline and get the app to compile.
However, I am now stuck on creating an MTLLibrary from an .mtlpackage.
Here is the code I have to create the MTLLibrary, following the WWDC session (https://developer.apple.com/videos/play/wwdc2025/262/?time=550):
let coreMLFilePath = bundle.path(forResource: "my_model", ofType: "mtlpackage")!
let coreMLURL = URL(string: coreMLFilePath)!
do {
    try metalDevice.makeLibrary(URL: coreMLURL)
} catch {
    print("error: \(error)")
}
With the above code, I am getting the error:
Error Domain=MTLLibraryErrorDomain Code=1 "Invalid metal package" UserInfo={NSLocalizedDescription=Invalid metal package}
What is the correct way to create an MTLLibrary from an .mtlpackage? Do I see this error because the .mtlpackage I am using is incorrect? How should I go about debugging this?
I'd really appreciate if I could get some help on this as I have been stuck with it for some time now. Thanks in advance!
Hi,
I’m testing Unity’s Spaceship HDRP demo on iPhone 17 Pro Max and iPad Pro M4 (iOS 26.1).
Everything renders correctly, and my custom MetalFX Spatial plugin initializes successfully — it briefly reports active scaling (e.g. 1434×660 → 2868×1320 at 50% scaling), then reverts to native rendering a few frames later.
Setup:
Xcode 16.1 (targeting iOS 18)
Unity 2022.3.62f3 (HDRP)
Metal backend
Dynamic Resolution enabled in HDRP assets and cameras
Relevant Xcode console excerpt:
[MetalFXPlugin] MetalFX_Enable(True) called.
[SpaceshipOptions] MetalFX enabled with HDRP dynamic resolution integration.
[SpaceshipOptions] Disabled TAA for MetalFX Spatial.
[SpaceshipOptions] Created runtime RenderTexture: 1434x660
[MetalFX] Spatial scaler created (1434x660 → 2868x1320).
[MetalFX] Processed frame with scaler.
[MetalFXPlugin] Sent RenderTexture (1434x660) to MetalFX. Output target 2868x1320.
[SpaceshipOptions] MetalFX target set: 1434x660
[SpaceshipOptions] Camera targetTexture cleared after MetalFX handoff.
It looks like HDRP clears the camera’s target texture right after MetalFX submits the frame, which causes it to revert to native rendering.
Is there a recommended way to persist or rebind the MetalFX output texture when using HDRP on iOS?
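For context, the Metal side of a spatial-scaling plugin like this is roughly the following (a sketch; the textures, formats, and sizes are placeholders, not the plugin's actual code):

import Metal
import MetalFX

// Sketch: create a spatial scaler and encode one upscale pass.
// `input` and `output` are assumed to be correctly sized/formatted textures;
// in practice the scaler would be created once and reused across frames.
func encodeSpatialUpscale(device: MTLDevice,
                          commandBuffer: MTLCommandBuffer,
                          input: MTLTexture,
                          output: MTLTexture) {
    let desc = MTLFXSpatialScalerDescriptor()
    desc.inputWidth = input.width          // e.g. 1434
    desc.inputHeight = input.height        // e.g. 660
    desc.outputWidth = output.width        // e.g. 2868
    desc.outputHeight = output.height      // e.g. 1320
    desc.colorTextureFormat = input.pixelFormat
    desc.outputTextureFormat = output.pixelFormat

    guard let scaler = desc.makeSpatialScaler(device: device) else { return }
    scaler.colorTexture = input
    scaler.outputTexture = output
    scaler.encode(commandBuffer: commandBuffer)
}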
Unity doesn’t appear to support MetalFX in the Editor either.
Thanks!
Hi all,
I'm encountering an issue with Metal raytracing on my M5 MacBook Pro regarding Instance Acceleration Structure (IAS).
Intersection tests suddenly stop working after a certain point in the sampling loop.
Situation
I implemented an offline GPU path tracer that runs the same kernel multiple times per pixel (sampleCount) using metal::raytracing.
Intersection tests are performed using an IAS.
Since this is an offline path tracer, the geometries inside the IAS never change across samples (no transforms or updates).
As sampleCount increases, there comes a point where the number of intersections drops to zero, and remains zero for all subsequent samples.
Here's a code sketch:
let sampleCount: UInt16 = 1024
for sampleIndex: UInt16 in 0..<sampleCount {
    // ...
    do {
        let commandBuffer = commandQueue.makeCommandBuffer()
        // Dispatch the intersection kernel.
        await commandBuffer.completed()
    }
    do {
        let commandBuffer = commandQueue.makeCommandBuffer()
        // Use the intersection test results from the previous command buffer.
        await commandBuffer.completed()
    }
    // ...
}
kernel void intersectAlongRay(
    const metal::uint32_t threadIndex [[thread_position_in_grid]],
    // ...
    const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]],
    // ...
)
{
    // ...
    const auto result = intersector.intersect(ray, accelerationStructure);
    switch (result.type) {
        case metal::raytracing::intersection_type::triangle: {
            // Write intersection result to device buffers.
            break;
        }
        default:
            break;
    }
}
Observations
Encoding both the intersection kernel and the subsequent result usage in the same command buffer does not resolve the problem.
Switching from IAS to Primitive Acceleration Structure (PAS) fixes the problem.
Rebuilding the IAS for each sample also resolves the issue.
Intersections produce inconsistent results even though the IAS and rays are identical — Image 1 shows a hit, while Image 2 shows a miss.
Questions
Am I misusing the IAS in some way?
Could this be a Metal bug?
Any guidance or confirmation would be greatly appreciated.
My app has a number of heterogeneous GPU workloads that all run concurrently. Some of these should be executed with the highest priority because the app’s responsiveness depends on them, while others are triggered by file imports and the like which should have a low priority. If this was running on the CPU I’d assign the former User Interactive QoS and the latter Utility QoS. Is there an equivalent to this for GPU work?