Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks.
My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance?
To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work?
Thanks!
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I'm implementing optimized matmul on metal: https://github.com/crynux-ai/metal-matmul/blob/main/metal/1_shared_mem.metal
I notice that performance is significantly different with different threadgroup memory set in
[computeEncoder setThreadgroupMemoryLength]
All other lines are exactly same, the only difference is this parameter.
Matmul performance is roughly 250 GFLops if I set 32768 (max bytes allowed on this M1 Max),
but 400 GFLops if I set 8192.
Why does this happen? How can I optimize it?
Topic:
Graphics & Games
SubTopic:
Metal
The game physics work as expected using GTPK 2.0 using Crossover 24 or Whisky. However, using GPTK 2.1 with Crossover 25, the player and camera physics misbehave. See https://www.reddit.com/r/WWEGames/comments/1jx9mph/the_siamese_elbow/ and https://www.reddit.com/r/WWEGames/comments/1jx9ow4/camera_glitch/
Full video also linked in the Reddit post.
I have also submitted this bug via the feedback assistant.
Hi,
seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example..
useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU..
also useful for MoltenVK to support that extensions..
thanks..
I have run into an issue where I am trying to use atomic_float in a swift package but I cannot get things to compile because it appears that the Swift Package Manager doesn't support Metal 3 (atomic_float is Metal 3 functionality). Is there any way around this? I am using
// swift-tools-version: 6.1
and my Metal code includes:
#include <metal_stdlib>
#include <metal_geometric>
#include <metal_math>
#include <metal_atomic>
using namespace metal;
kernel void test(device atomic_float* imageBuffer [[buffer(1)]],
uint id [[ thread_position_in_grid ]]) {
}
But I get an error on the definition of atomic_float .
Any help, one more importantly, where I could have found this information about this limitation, would be helpful.
-RadBobby
Topic:
Graphics & Games
SubTopic:
Metal
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command:
cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/
I keep getting an error:
CMake Error at CMakeLists.txt:5 (include):
include could not find requested file:
/Users/macuser/USDInstall/bin/pxrConfig.cmake
I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
Hello!
I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow) there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what would be the recommendation from Apple's engineering team to get this data via a public API.
Requirements:
Access GPU utilization, temperature, memory usage, power usage
C/C++ based API for querying the metrics so I can expose the data to JavaScript via Node Addon
No need to compatibile with Intel-based Macs, as Apple silicon will be fine for now
Plugin GitHub
Thank you!
Noah
hi
When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU.
However it seems that the way in which I am using the GPU, even when my app is fully compute-limited, the OS never increases the power/performance state. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time.
I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away.
For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well.
I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds!
I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.
I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work.
Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up?
Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area.
Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong.
UIImage *tempImg = [UIImage imageWithData:data];
CGImageRef cgRef = tempImg.CGImage;
MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device];
id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
NSData *tempData = UIImagePNGRepresentation(tempImg);
id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
}] resume];
Hey, I've been struggling with this for some days now.
I am trying to write to a sparse texture in a compute shader. I'm performing the following steps:
Set up a sparse heap and create a texture from it
Map the whole area of the sparse texture using updateTextureMapping(..)
Overwrite every value with the value "4" in a compute shader
Blit the texture to a shared buffer
Assert that the values in the buffer are "4".
I have a minimal example (which is still pretty long unfortunately).
It works perfectly when removing the line heapDesc.type = .sparse.
What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated.
import Metal
func sparseTexture64x64Demo() throws {
// ── Metal objects
guard let device = MTLCreateSystemDefaultDevice()
else { throw NSError(domain: "SparseNotSupported", code: -1) }
let queue = device.makeCommandQueue()!
let lib = device.makeDefaultLibrary()!
let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!)
// ── Texture descriptor
let width = 64, height = 64
let format: MTLPixelFormat = .r32Uint // 4 B per texel
let desc = MTLTextureDescriptor()
desc.textureType = .type2D
desc.pixelFormat = format
desc.width = width
desc.height = height
desc.storageMode = .private
desc.usage = [.shaderWrite, .shaderRead]
// ── Sparse heap
let bytesPerTile = device.sparseTileSizeInBytes
let meta = device.heapTextureSizeAndAlign(descriptor: desc)
let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile
let heapDesc = MTLHeapDescriptor()
heapDesc.type = .sparse
heapDesc.storageMode = .private
heapDesc.size = heapBytes
let heap = device.makeHeap(descriptor: heapDesc)!
let tex = heap.makeTexture(descriptor: desc)!
// ── CPU buffers
let bytesPerPixel = MemoryLayout<UInt32>.stride
let rowStride = width * bytesPerPixel
let totalBytes = rowStride * height
let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)!
let cb = queue.makeCommandBuffer()!
let fence = device.makeFence()!
// 2. Map the sparse tile, then signal the fence
let rse = cb.makeResourceStateCommandEncoder()!
rse.updateTextureMapping(
tex,
mode: .map,
region: MTLRegionMake2D(0, 0, width, height),
mipLevel: 0,
slice: 0)
rse.update(fence) // ← capture all work so far
rse.endEncoding()
let ce = cb.makeComputeCommandEncoder()!
ce.waitForFence(fence)
ce.setComputePipelineState(pipeline)
ce.setTexture(tex, index: 0)
let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1)
let tgCount = MTLSize(width: (width + 7) / 8,
height: (height + 7) / 8,
depth: 1)
ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG)
ce.updateFence(fence)
ce.endEncoding()
// Blit texture into shared buffer
let blit = cb.makeBlitCommandEncoder()!
blit.waitForFence(fence)
blit.copy(
from: tex,
sourceSlice: 0,
sourceLevel: 0,
sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
sourceSize: MTLSize(width: width, height: height, depth: 1),
to: dstBuf,
destinationOffset: 0,
destinationBytesPerRow: rowStride,
destinationBytesPerImage: totalBytes)
blit.endEncoding()
cb.commit()
cb.waitUntilCompleted()
assert(cb.error == nil, "GPU error: \(String(describing: cb.error))")
// ── Verify a few texels
let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height)
print("first three texels:", out[0], out[1], out[width]) // 0 1 64
assert(out[0] == 4 && out[1] == 4 && out[width] == 4)
}
Metal shader:
#include <metal_stdlib>
using namespace metal;
kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]],
uint2 gid [[thread_position_in_grid]])
{
tex.write(4, gid);
}
Hi there,
I'm wondering if it's possible under iOS 28 developer beta to enable MetalFX scaling info with '{"MTL_HUD_ENABLED": "1" for my App.
This information has been added to Mac, but looks to be absent on iPhone / iPad
Im new in the Mac area but for sure not UE. Windows is a long process to packaging but it could be done. All the documentation for Epic and from the internet is basically non existent with exactly how to package a project within UE. I have Xcode installed which makes sense, agreed to terms and install for MacOS, I've been able to make a project for several weeks now and want to package for a test run for my friends to play on Windows. Now I just get this in the log:
UATHelper: Packaging (Mac): ERROR: Failed to finalize the .app with Xcode. Check the log for more information
UATHelper: Packaging (Mac): Trace written to file /Users/rileysleger/Library/Logs/Unreal Engine/LocalBuildLogs/UBA-ProjectNightTerror-Mac-Development.uba with size 12.6kb
UATHelper: Packaging (Mac): Total time in Unreal Build Accelerator local executor: 8.12 seconds
UATHelper: Packaging (Mac): Result: Failed (OtherCompilationError)
UATHelper: Packaging (Mac): Total execution time: 9.71 seconds
PackagingResults: Error: Failed to finalize the .app with Xcode. Check the log for more information
UATHelper: Packaging (Mac): Took 9.77s to run dotnet, ExitCode=6
UATHelper: Packaging (Mac): UnrealBuildTool failed. See log for more details. (/Users/rileysleger/Library/Logs/Unreal Engine/LocalBuildLogs/UBA-ProjectNightTerror-Mac-Development.txt)
UATHelper: Packaging (Mac): AutomationTool executed for 0h 0m 10s
UATHelper: Packaging (Mac): AutomationTool exiting with ExitCode=6 (6)
UATHelper: Packaging (Mac): RunUAT ERROR: AutomationTool was unable to run successfully. Exited with code: 6
PackagingResults: Error: AutomationTool was unable to run successfully. Exited with code: 6
PackagingResults: Error: Unknown Error
This absolutely makes no sense to me. Anyone have ideas?
The sample code here, has code like:
// Create a display link capable of being used with all active displays
cvReturn = CVDisplayLinkCreateWithActiveCGDisplays(&_displayLink);
But that function's doc says it's deprecated and to use NSView/NSWindow/NSScreen displayLink instead. That returns CADisplayLink, not CVDisplayLink.
Also the documentation for that displayLink method is completely empty. I'm not sure if I'm supposed to add it to run loop, or what, after I get it.
It would be nice to get an updated version of this sample project and/or have some documentation in NSView.displayLink
Topic:
Graphics & Games
SubTopic:
Metal
When I take a frame capture of my application in Xcode, it shows a warning that reads "Your application created separate command encoders which can be combined into a single encoder. By combining these encoders you may reduce your application's load/store bandwidth usage."
In the minimal reproduction case I've identified for this warning, I have two render pipeline states: The first writes to the current drawable, the depth buffer, and a secondary color buffer. The second writes only to the current drawable.
Because these are writing to a different set of outputs, I was initially creating two separate render command encoders to handle the draws under each of these states.
My understanding is that Xcode is telling me I could only create one, however when I try to do that, I get runtime asserts when attempting to apply the second render pipeline state since it doesn't have a matching attachment configured for the second color buffer or for the depth buffer, so I can't just combine the encoders.
Is the only solution here to detect and propagate forward the color/depth attachments from the first state into the creation of the second state?
Is there any way to suppress this specific warning in Xcode?
Topic:
Graphics & Games
SubTopic:
Metal
Description:
In the official visionOS 26 Hover Effect sample code project , I encountered an issue where the event.trackingAreaIdentifier returned by onSpatialEvent does not reset as expected.
Steps to Reproduce:
Select an object with trackingAreaID = 6 in the sample app.
Look at a blank space (outside any tracking area) and perform a pinch gesture .
Expected Behavior:
The event.trackingAreaIdentifier should return 0 when interacting with a non-tracking area.
Actual Behavior:
The event.trackingAreaIdentifier still returns 6, even after restarting the app or killing the process. This persists regardless of where the pinch gesture is performed
hi everyone,
我们发现了一个和Metal相关崩溃。应用中使用了Metal相关的接口,在进行性能测试时,打开了设置-开发者-显示HUD图形。运行应用后,正常展示HUD,但应用很快发生了崩溃,日志主要信息如下:
Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B
CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd
Hardware Model: iPhone14,3
Process: waimai [16584]
Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai
Identifier:
Version: 61488 (8.53.0)
Code Type: ARM-64
Parent Process: ? [1]
Date/Time: 2025-06-12 14:41:45.296 +0800
OS Version: iOS 18.0 (22A3354)
Report Version: 104
Monitor Type: Mach Exception
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00
Crashed Thread: 57
Thread 57 Crashed:
0 libMTLHud.dylib esfm_GenerateTriangesForString + 408
1 libMTLHud.dylib esfm_GenerateTriangesForString + 92
2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204
3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656
4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72
5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56
6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104
7 Metal MTLDispatchListApply + 52
8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312
9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136
10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
11 libdispatch.dylib _dispatch_client_callout4 + 20
12 libdispatch.dylib _dispatch_mach_msg_invoke + 464
13 libdispatch.dylib _dispatch_lane_serial_drain + 368
14 libdispatch.dylib _dispatch_mach_invoke + 456
15 libdispatch.dylib _dispatch_lane_serial_drain + 368
16 libdispatch.dylib _dispatch_lane_invoke + 432
17 libdispatch.dylib _dispatch_lane_serial_drain + 368
18 libdispatch.dylib _dispatch_lane_invoke + 380
19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288
20 libdispatch.dylib _dispatch_workloop_worker_thread + 540
21 libsystem_pthread.dylib _pthread_wqthread + 288
我们测试了几个不同的机型,只有iPhone 13 Pro Max会发生崩溃。
Q1:为什么会发生这个崩溃?
Q2:相同的逻辑,为什么仅在iPhone 13 Pro Max机型上出现崩溃?
期待您的解答。
Hello,
Thank you for attending today’s Metal & game technologies group lab at WWDC25!
We were delighted to answer many questions from developers and energized by the community engagement.
We hope you enjoyed it and welcome your feedback.
We invite you to carry on the conversation here, particularly if your question appeared in Slido and we were unable to answer it during the lab.
If your question received feedback let us know if you need clarification.
You may want to ask your question again in a different lab e.g. visionOS tomorrow.
(We realize that this can be confusing when frameworks interoperate)
We have a lot to learn from each other so let’s get to Q&A and make the best of WWDC25! 😃
Looking forward to your questions posted in new threads.
Hi ,
My application meet below crash backtrace at very low repro rate from the public users, i do not see it relate to a specific iOS version or iPhone model. The last code line from my application is calling CAMetalLayer nextDrawable API.
I did some basic studying, suppose it may relate to the wrong CAMetaLayer configuration, like
frame property w or h <= 0.0
bounds property w or h <= 0.0
drawableSize w or h <= 0.0 or w or h > max value (like 16384)
Not sure my above thinking is right or not? Will the UIView which my CAMetaLayer attached will cause such nextDrawable crash or not ?
Thanks a lot
Main Thread - Crashed
libsystem_kernel.dylib
__pthread_kill
libsystem_c.dylib
abort
libsystem_c.dylib
__assert_rtn
Metal
MTLReportFailure.cold.1
Metal
MTLReportFailure
Metal
_MTLMessageContextEnd
Metal
-[MTLTextureDescriptorInternal validateWithDevice:]
AGXMetalA13
0x245b1a000 + 4522096
QuartzCore
allocate_drawable_texture(id<MTLDevice>, __IOSurface*, unsigned int, unsigned int, MTLPixelFormat, unsigned long long, CAMetalLayerRotation, bool, NSString*, unsigned long)
QuartzCore
get_unused_drawable(_CAMetalLayerPrivate*, CAMetalLayerRotation, bool, bool)
QuartzCore
CAMetalLayerPrivateNextDrawableLocked(CAMetalLayer*, CAMetalDrawable**, unsigned long*)
QuartzCore
-[CAMetalLayer nextDrawable]
SpaceApp
-[MetalRender renderFrame:] MetalRenderer.mm:167
SpaceApp
-[FrameBuffer acceptFrame:] VideoRender.mm:173
QuartzCore
CA::Display::DisplayLinkItem::dispatch_(CA::SignPost::Interval<(CA::SignPost::CAEventCode)835322056>&)
QuartzCore
CA::Display::DisplayLink::dispatch_items(unsigned long long, unsigned long long, unsigned long long)
QuartzCore
CA::Display::DisplayLink::dispatch_deferred_display_links(unsigned int)
UIKitCore
_UIUpdateSequenceRun
UIKitCore
schedulerStepScheduledMainSection
UIKitCore
runloopSourceCallback
CoreFoundation
__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__
CoreFoundation
__CFRunLoopDoSource0
CoreFoundation
__CFRunLoopDoSources0
CoreFoundation
__CFRunLoopRun
CoreFoundation
CFRunLoopRunSpecific
GraphicsServices
GSEventRunModal
UIKitCore
-[UIApplication _run]
UIKitCore
UIApplicationMain
I am using Unreal Engine 5.6 on a MacBook Pro with an M3 chip and macOS 15.5. I’ve installed Xcode and accepted the license, but Unreal is not detecting the latest Metal Shader Standard (Metal v3.0). The maximum version Unreal sees is Metal v2.4, even though the hardware and OS should support Metal 3.0. I’ve also run sudo xcode-select -s /Applications/Xcode.app and accepted the license via Terminal. Is there anything in Xcode settings, SDK availability, or system permissions that could be preventing access to Metal 3.0 features?"