[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400)
Security Classification
This issue constitutes a resource exhaustion vulnerability (CWE-400):
Aspect
Details
Type
Uncontrolled Resource Consumption
CWE
CWE-400
Vector
Local (any Metal application)
Impact
System instability, denial of service
User Control
None - no mitigation available
Recovery
Requires application restart
Summary
Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation.
This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level.
Environment
OS: macOS Tahoe 26.2
Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3)
API: Metal
Reproduction Steps
Run any Metal application that allocates and deallocates GPU buffers via Metal heaps
Open Activity Monitor and observe the application's memory usage
Let the application run idle (no user interaction required)
Observe memory growing continuously at ~1-2 MB per second
Memory never plateaus or stabilizes
Eventually system becomes unstable
For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1)
Observed Behavior
Memory Analysis
Using Unreal's memreport -full command, two reports taken 86 seconds apart:
Metric
Report 1 (183s)
Report 2 (269s)
Delta
Process Physical
4373.64 MB
4463.39 MB
+89.75 MB
Metal Heap Buffer
7168 MB
8192 MB
+1024 MB
Unused Heap
3453 MB
4477 MB
+1024 MB
Object Count
73,840
73,840
0 (no change)
Key Finding
Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates:
Metal is allocating new heap blocks in ~1 GB increments
Previously allocated heap memory becomes "unused" but is never released
The unused memory accumulates indefinitely
No application-level objects are leaking (count remains constant)
Memory Growth Pattern
Continuous growth while idle (no user interaction)
Growth rate: approximately 1-2 MB per second
No plateau or stabilization occurs
Metal allocates new 1 GB heap blocks rather than reusing freed space
Eventually leads to system instability and crash
What is NOT Causing This
We verified the following are NOT the source:
Application objects - Object count remains constant
Application code - Blank project with no code reproduces the issue
Texture streaming - Disabling texture streaming had no effect
CPU garbage collection - Running GC has no effect (this is GPU memory)
Mitigations Attempted (None Worked)
setPurgeableState
Setting resources to purgeable state before release:
[buffer setPurgeableState:MTLPurgeableStateEmpty];
Result: Metal ignores this hint and does not reclaim heap memory.
Avoiding Heap Pooling
Forcing individual buffer allocations instead of heap-based pooling.
Result: Leak persists - Metal still manages underlying allocations.
Aggressive Buffer Compaction
Attempting to compact/defragment buffers within heaps every frame.
Result: Only moves data between existing heaps. Does NOT release heaps back to OS.
Reducing Pool Sizes
Minimizing all buffer pool sizes to force more frequent reuse.
Result: Slightly slows the leak rate but does not stop it.
Root Cause Analysis
How Metal Heap Allocation Appears to Work
Metal allocates GPU heap blocks in large chunks (~1 GB observed)
Application requests buffers from these heaps
When application releases buffers, memory becomes "unused" within the heap
Metal does NOT release heap blocks back to macOS, even when entirely unused
When fragmentation prevents reuse, Metal allocates new heap blocks
Result: Continuous memory growth with no upper bound
The Core Problem
There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application.
Expected Behavior
Metal should:
Release unused heaps - When a heap block is entirely unused, release it back to macOS
Respect purgeable hints - Honor setPurgeableState calls from applications
Compact allocations - Defragment heap allocations to reduce fragmentation
Provide control APIs - Allow applications to request heap compaction or release
Enforce limits - Have configurable maximum heap memory consumption
Security Implications
Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications
Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance
No Upper Bound - Memory consumption continues until system failure
Unmitigable - End users have no way to prevent or limit the leak
Affects All Metal Apps - Any application using Metal heaps is potentially affected
Impact
Applications become unstable after extended use
System-wide performance degrades as memory pressure increases
Users must periodically restart applications
Developers cannot work around this at the application level
Long-running applications (games, creative tools, servers) are particularly affected
Request
Investigate Metal heap memory management behavior
Implement heap release when blocks become entirely unused
Honor setPurgeableState hints from applications
Consider providing an API for applications to request heap compaction
Document any intended behavior or workarounds
Additional Notes
This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible.
The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level.
Tested: January 2026
Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I noticed that when the render command encoder adds no draw calls an apps memory usage seems to grow unboundedly. Using a super simple MTKView-based drawing with the following delegate (code at end).
If I add the simplest of draw calls, e.g., a single vertex, the app's memory usage is normal, around 100-ish MBs.
I am attaching a couple screenshot, one from Xcode and one from Instruments.
What's going on here? Is this an illegal program? If yes, why does it not crash, such as if the encode or command buffer weren't ended.
Or is there some race condition at play here due to the lack of draws?
class Renderer: NSObject, MTKViewDelegate {
var device: MTLDevice
var commandQueue: MTL4CommandQueue
var commandBuffer: MTL4CommandBuffer
var allocator: MTL4CommandAllocator
override init() {
guard let d = MTLCreateSystemDefaultDevice(),
let queue = d.makeMTL4CommandQueue(),
let cmdBuffer = d.makeCommandBuffer(),
let alloc = d.makeCommandAllocator()
else {
fatalError("unable to create metal 4 objects")
}
self.device = d
self.commandQueue = queue
self.commandBuffer = cmdBuffer
self.allocator = alloc
super.init()
}
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}
func draw(in view: MTKView) {
guard let drawable = view.currentDrawable else { return }
commandBuffer.beginCommandBuffer(allocator: allocator)
guard let descriptor = view.currentMTL4RenderPassDescriptor,
let encoder = commandBuffer.makeRenderCommandEncoder(
descriptor: descriptor
)
else {
fatalError("unable to create encoder")
}
encoder.endEncoding()
commandBuffer.endCommandBuffer()
commandQueue.waitForDrawable(drawable)
commandQueue.commit([commandBuffer])
commandQueue.signalDrawable(drawable)
drawable.present()
}
}
Topic:
Graphics & Games
SubTopic:
Metal
My goal is to print a debug message from a shader. I follow the guide that orders to set -fmetal-enable-logging metal compiler flag and following environment variables:
MTL_LOG_LEVEL=MTLLogLevelDebug
MTL_LOG_BUFFER_SIZE=2048
MTL_LOG_TO_STDERR=1
However there's an issue with the guide, it's only covers Xcode project setup, however I'm working on a Swift Package. It has a Metal-only target that's included into main target like this:
targets: [
// A separate target for shaders.
.target(
name: "MetalShaders",
resources: [
.process("Metal")
],
plugins: [
// https://github.com/schwa/MetalCompilerPlugin
.plugin(name: "MetalCompilerPlugin", package: "MetalCompilerPlugin")
]
),
// Main target
.target(
name: "MegApp",
dependencies: ["MetalShaders"]
),
.testTarget(
name: "MegAppTests",
dependencies: [
"MegApp",
"MetalShaders",
]
]
So to apply compiler flag I use MetalCompilerPlugin which emits debug.metallib, it also allows to define DEBUG macro for shaders. This code compiles:
#ifdef DEBUG
logger.log_error("Hello There!");
os_log_default.log_debug("Hello thread: %d", gid);
// this proves that code exectutes
result.flag = true;
#endif
Environment is set via .xctestplan and valideted to work with ProcessInfo. However, nothing is printed to Xcode console nor to Console app.
In attempt to fix it I'm trying to setup a MTLLogState, however the makeLogState(descriptor:) fails with error:
if #available(iOS 18.0, *) {
let logDescriptor = MTLLogStateDescriptor()
logDescriptor.level = .debug
logDescriptor.bufferSize = 2048
// Error Domain=MTLLogStateErrorDomain Code=2 "Cannot create residency set for MTLLogState: (null)" UserInfo={NSLocalizedDescription=Cannot create residency set for MTLLogState: (null)}
let logState = try! device.makeLogState(descriptor: logDescriptor)
commandBufferDescriptor.logState = logState
}
Some LLMs suggested that this is connected with Simulator, and truly, I run the tests on simulator. However tests don't want to run on iPhone... I found solution running them on My Mac (Mac Catalyst). Surprisingly descriptor log works there, even without MTLLogState. But the Simulator behaviour seems like a bug...
On MacBook Pro M3 14" I can profile the Metal App performance by running it, then clicking on the M icon and choosing profile after replay.
On Mac Studio M2 Ultra I cannot: the profiler starts and crashes. I have tried everything including reinstalling the OS, Xcode, the Metal SDK, you name it.
The app uses the Metal 4 API. The content of the replayer errorinfo report is shown at the end.
Any ideas what is going on here and/or what else I can do do root cause this and fix it?
FWIW, it was worse on 26.1 (Xcode just reported Metal 4 profiling not available). In 26.2 Xcode attempts to profile and invariably crashes.
=== Error summary: ===
1x DYErrorDomain (512) - guest app crashed (512)
1x com.apple.gputools.MTLReplayer (100) - Abort trap: 6
=== First Error ===
Domain: DYErrorDomain
Error code: 512
Description: guest app crashed (512)
GTErrorKeyPID: 26913
GTErrorKeyProcessName: GPUToolsReplayService
GTErrorKeyCrashDate: 2026-01-09 19:22:52 +0000
=== Underlying Error #1 ===
Domain: com.apple.gputools.MTLReplayer
Error code: 100
Description: Abort trap: 6
Call stack:
0 GPUToolsReplay 0x0000000249c25850 MakeNSError + 284
1 GPUToolsReplay 0x0000000249c26428 HandleCrashSignal + 252
2 libsystem_platform.dylib 0x00000001856c7744 _sigtramp + 56
3 libsystem_pthread.dylib 0x00000001856bd888 pthread_kill + 296
4 libsystem_c.dylib 0x00000001855c2850 abort + 124
5 libsystem_c.dylib 0x00000001855c1a84 err + 0
6 IOGPU 0x00000001a9ea60a8 -[IOGPUMetal4CommandQueue _commit:count:commitFeedback:].cold.1 + 0
7 IOGPU 0x00000001a9ea0df8 __77-[IOGPUMetal4CommandQueue commitFillArgs:count:args:argsSize:commitFeedback:]_block_invoke + 0
8 IOGPU 0x00000001a9ea1004 -[IOGPUMetal4CommandQueue _commit:count:commitFeedback:] + 148
9 AGXMetalG14X 0x00000001158a2c98 -[AGXG14XFamilyCommandQueue_mtlnext noMergeCommit:count:options:commitFeedback:error:] + 116
10 AGXMetalG14X 0x0000000115a45c14 +[AGXG14XFamilyRenderContext_mtlnext mergeRenderEncoders:count:options:commitFeedback:queue:error:] + 4740
11 AGXMetalG14X 0x00000001158a2b34 -[AGXG14XFamilyCommandQueue_mtlnext commit:count:options:] + 96
12 GPUToolsReplay 0x0000000249bf0644 GTMTLReplayController_defaultDispatchFunction_noPinning + 2744
13 GPUToolsReplay 0x0000000249befb10 GTMTLReplayController_defaultDispatchFunction + 1368
14 GPUToolsReplay 0x0000000249b7a61c _ZL16DispatchFunctionP21GTMTLReplayControllerPK11GTTraceFuncRb + 476
15 GPUToolsReplay 0x0000000249b8603c ___ZN35GTUSCSamplingStreamingManagerHelper19StreamFrameTimeDataEv_block_invoke + 456
16 Foundation 0x0000000186f6c878 __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 24
17 Foundation 0x0000000186f6c740 -[NSBlockOperation main] + 96
18 Foundation 0x0000000186f6c6d8 __NSOPERATION_IS_INVOKING_MAIN__ + 16
19 Foundation 0x0000000186f6c308 -[NSOperation start] + 640
20 Foundation 0x0000000186f6c080 __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 16
21 Foundation 0x0000000186f6bf70 __NSOQSchedule_f + 164
22 libdispatch.dylib 0x00000001855104d0 _dispatch_block_async_invoke2 + 148
23 libdispatch.dylib 0x000000018551aad4 _dispatch_client_callout + 16
24 libdispatch.dylib 0x00000001855056e4 _dispatch_continuation_pop + 596
25 libdispatch.dylib 0x0000000185504d58 _dispatch_async_redirect_invoke + 580
26 libdispatch.dylib 0x0000000185512fc8 _dispatch_root_queue_drain + 364
27 libdispatch.dylib 0x0000000185513784 _dispatch_worker_thread2 + 180
28 libsystem_pthread.dylib 0x00000001856b9e10 _pthread_wqthread + 232
29 libsystem_pthread.dylib 0x00000001856b8b9c start_wqthread + 8
Replayer breadcrumbs:
[
]
GTErrorKeyProcessSignal: SIGABRT
=== Setup ===
Capture device: star.localdomain (Mac14,14) - macOS 26.2 (25C56) - 0BA10D1D-D340-5F2E-934B-536675AF9BA1
Metal version: 370.64.2
Supported graphics APIs:
Metal device: Apple M2 Ultra
Supported GPU families: Apple1 Apple2 Apple3 Apple4 Apple5 Apple6 Apple7 Apple8 Mac1 Mac2 Common1 Common2 Common3 Metal3 Metal4
Replay device: star (Mac14,14) - macOS 26.2 (25C56) - 0BA10D1D-D340-5F2E-934B-536675AF9BA1
Metal version: 370.64.2
Supported graphics APIs:
Metal device: Apple M2 Ultra
Supported GPU families: Apple1 Apple2 Apple3 Apple4 Apple5 Apple6 Apple7 Apple8 Mac1 Mac2 Common1 Common2 Common3 Metal3 Metal4
Host: Mac14,14 - macOS 26.2 (25C56)
Tool: Xcode (17C52)
Known SDKs:
Topic:
Graphics & Games
SubTopic:
Metal
I have something like this drawing in an MTKView (see at bottom).
I am finding it difficult to figure out when can the Swift-land resources used in making the MTLBuffer(s) be released? Below, for example, is it ok if args goes out of scope (or is otherwise deallocated) at point 1, 2, or 3? Or perhaps even earlier, as soon as argsBuffer has been created?
I have been reading through various articles such as
Setting resource storage modes
Choosing a resource storage mode for Apple GPUs
Copying data to a private resource
but it's a lot to absorb and I haven't been really able to find an authoritative description of the required lifetime of the resources in CPU land.
I should mention that this is Metal 4 code. In previous versions of Metal, the MTLCommandBuffer had the ability to add a completion handler to be called by the GPU after it has finished running the commands in the buffer but in Metal 4 there is no such thing (it it were even needed for the purpose I am interested in).
Any advice and/or pointers to the definitive literature will be appreciated.
guard let argsBuffer = device.makeBuffer(bytes: &args,...
argumentTable.setAddress(argsBuffer.gpuAddress, ...
encoder.setArgumentTable(argumentTable, stages: .vertex)
// encode drawing
renderEncoder.draw...
...
encoder.endEncoding() // 1
commandBuffer.endCommandBuffer() // 2
commandQueue.waitForDrawable(drawable)
commandQueue.commit([commandBuffer]) // 3
commandQueue.signalDrawable(drawable)
drawable.present()
Topic:
Graphics & Games
SubTopic:
Metal
My app has a number of heterogeneous GPU workloads that all run concurrently. Some of these should be executed with the highest priority because the app’s responsiveness depends on them, while others are triggered by file imports and the like which should have a low priority. If this was running on the CPU I’d assign the former User Interactive QoS and the latter Utility QoS. Is there an equivalent to this for GPU work?
I am puzzled by the setAddress(_:attributeStride:index:) of MTL4ArgumentTable. Can anyone please explain what the attributeStride parameter is for? The doc says that it is "The stride between attributes in the buffer." but why?
Who uses this for what? On the C++ side in the shaders the stride is determined by the C++ type, as far as I know. What am I missing here?
Thanks!
Hi, I am using xCode26.x. But my Metal4 classes are not compiling. I downloaded the sample code from Apple's website - https://developer.apple.com/documentation/Metal/processing-a-texture-in-a-compute-function. For example, I am getting errors like "Cannot find protocol declaration for 'MTL4CommandQueue';
I have hit a deadline. Any recommendations are very welcome.
I have downloaded the Metal Tool chain. When I run the following commands on the terminal - xcodebuild -showComponent metalToolchain ; xcrun -f metal ; xcrun metal --version
I get the following response -
Asset Path: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/86fbaf7b114a899754307896c0bfd52ffbf4fded.asset/AssetData
Build Version: 17A321
Status: installed
Toolchain Identifier: com.apple.dt.toolchain.Metal.32023
Toolchain Search Path: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded
/Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/bin/metal
Apple metal version 32023.830 (metalfe-32023.830.2)
Target: air64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/metal/current/bin
I noticed that MTLPixelFormat has this cases:
case r32Float = 55
case rg32Float = 105
case rgba32Float = 125
But no case rgb32Float. What's the reason for such a discrimination?
Hi,
I would like clarification on whether the new hover effects feature introduced in vision os 26 supported pinch gestures through the psvr 2 controllers.
In your sample application, I was not able to confirm that this was working. Only pinch clicking with my hands worked. Pulling the trigger on the controller whilst looking at a 3d object did not activate the hover effect spatial event in the sample application. (The object is showing the highlight though)
This is inconsistent with hover effect behavior with psvr2 controllers on swift ui views, where the trigger press does count as a button click.
The sample I used was this one:
https://developer.apple.com/documentation/compositorservices/rendering_hover_effects_in_metal_immersive_apps
Hey I'm using the CIDepthBlurEffect Core Image Filter in my app. It seems to work ok but I get these errors in the console when calling the class.
CoreImage Metal library does not contain function for name: sparserendering_xhlrb_scan
CoreImage Metal library does not contain function for name: sparserendering_xhlrb_diffuse
CoreImage Metal library does not contain function for name: sparserendering_xhlrb_copy_back
CoreImage Metal library does not contain function for name: plain_or_sRGB_copy
Am I missing some sort of import to gain these Metal functions? I am using my own custom shaders but I assume you'd be able to use them along side the built in ones.
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size.
Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right?
Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)?
func onWindowResized(newSize: CGSize) {
// Called on the Main thread
metalLayer.drawableSize = newSize
}
func onVsync(drawable: MTLDrawable) {
// Called on a background rendering thread
renderer.renderInto(drawable: drawable)
}
We set the CVDisplayLink on macOS to 0 or 120, and get the following. This then clamps maximum refresh to 60Hz on the 120Hz ProMotion display on a MBP M2 Max laptop. How is this not fixed in 4 macOS releases?
CoreVideo: currentVBLDelta returned 200000 for display 1 -- ignoring unreasonable value
CoreVideo: [0x7fe2fb816020] Bad CurrentVBLDelta for display 1 is zero. defaulting to 60Hz.
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR.
This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible.
Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
Hi all,
I'm encountering an issue with Metal raytracing on my M5 MacBook Pro regarding Instance Acceleration Structure (IAS).
Intersection tests suddenly stop working after a certain point in the sampling loop.
Situation
I implemented an offline GPU path tracer that runs the same kernel multiple times per pixel (sampleCount) using metal::raytracing.
Intersection tests are performed using an IAS.
Since this is an offline path tracer, geometries inside the IAS never changes across samples (no transforms or updates).
As sampleCount increases, there comes a point where the number of intersections drops to zero, and remains zero for all subsequent samples.
Here's a code sketch:
let sampleCount: UInt16 = 1024
for sampleIndex: UInt16 in 0..<sampleCount {
// ...
do {
let commandBuffer = commandQueue.makeCommandBuffer()
// Dispatch the intersection kernel.
await commandBuffer.completed()
}
do {
let commandBuffer = commandQueue.makeCommandBuffer()
// Use the intersection test results from the previous command buffer.
await commandBuffer.completed()
}
// ...
}
kernel void intersectAlongRay(
const metal::uint32_t threadIndex [[thread_position_in_grid]],
// ...
const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]],
// ...
)
{
// ...
const auto result = intersector.intersect(ray, accelerationStructure);
switch (result.type) {
case metal::raytracing::intersection_type::triangle: {
// Write intersection result to device buffers.
break;
}
default:
break;
}
Observations
Encoding both the intersection kernel and the subsequent result usage in the same command buffer does not resolve the problem.
Switching from IAS to Primitive Acceleration Structure (PAS) fixes the problem.
Rebuilding the IAS for each sample also resolves the issue.
Intersections produce inconsistent results even though the IAS and rays are identical — Image 1 shows a hit, while Image 2 shows a miss.
Questions
Am I misusing IAS in some way ?
Could this be a Metal bug ?
Any guidance or confirmation would be greatly appreciated.
I work on a Qt/QML app that uses Esri Maps SDK for Qt and that is deployed to both Windows and iPads. With a recent iPad OS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users.
I was able to reproduce this and grabbed the following error messages when the freeze happens:
IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)
Environment:
Qt 6.5.4 (Qt for iOS)
Esri Maps SDK for Qt 200.3
iPadOS 26.1
Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set hte target graphics api):
QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL)
Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple.
I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update?
Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now.
Appreciate any thoughts/insights....
View Layout
Add the following views in a view controller:
Label
View A, with a subview of the same size: MTKView A
View B, with a subview of the same size: MTKView B
Refresh Rates of Each View
The label view refreshes at 60fps (driven by CADisplayLink).
MTKView A and B refresh at 15fps.
MTKView Implementation Details
The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering.
The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame.
self.metalView.enableSetNeedsDisplay = NO;
self.metalView.paused = YES;
A new high-priority queue is created for drawing, instead of handling it on the main queue.
MTKView Latency Tracking
The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer.
The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView.
Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that.
I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism.
Observation from Instruments
From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details.
Questions
According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer?
The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens?
Explanation of the Reasoning Behind Some MTKView Code Details
Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering.
Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics
Hello,
I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and complementary indicators to TestU01).
To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle to test the methodology).
During these evaluations, I observed something interesting on Apple Silicon (Mac M1): • bit-exact reproducibility between M1 ARM CPU and M1 Metal GPU, • full BigCrush pass on both CPU and Metal backends, • excellent p-values, • stable behaviour across multiple seeds and runs.
This was not the intended objective, the goal was mainly to validate the diagnostic concepts, but these results raised some questions about deterministic compute behaviour in Metal.
My question: Is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically:
• Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads?
• Are there recommended patterns or best practices to ensure reproducibility across GPU generations (M1 → M2 → M3 → M4)? • Are there known Metal features that can introduce non-determinism?
I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful.
Thank you for any insight, very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon.
Pascal ENINCA Consulting
Topic:
Graphics & Games
SubTopic:
Metal
Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics
Hello,
I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and complementary indicators to TestU01).
To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle to test the methodology).
During these evaluations, I observed something interesting on Apple Silicon (Mac M1):
• bit-exact reproducibility between M1 ARM CPU and M1 Metal GPU,
• full BigCrush pass on both CPU and Metal backends,
• excellent p-values,
• stable behaviour across multiple seeds and runs.
This was not the intended objective, the goal was mainly to validate the diagnostic concepts, but these results raised some questions about deterministic compute behaviour in Metal.
My question: Is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically:
• Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads?
• Are there recommended patterns or best practices to ensure reproducibility across GPU generations (M1 → M2 → M3 → M4)?
• Are there known Metal features that can introduce non-determinism?
I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful.
Thank you for any insight, very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon.
Pascal ENINCA Consulting
Topic:
Graphics & Games
SubTopic:
Metal
Tags:
ML Compute
Metal
Metal Performance Shaders
Apple Silicon
Hi,
I’m testing Unity’s Spaceship HDRP demo on iPhone 17 Pro Max and iPad Pro M4 (iOS 26.1).
Everything renders correctly, and my custom MetalFX Spatial plugin initializes successfully — it briefly reports active scaling (e.g. 1434×660 → 2868×1320 at 50% scaling), then reverts to native rendering a few frames later.
Setup:
Xcode 16.1 (targeting iOS 18)
Unity 2022.3.62f3 (HDRP)
Metal backend
Dynamic Resolution enabled in HDRP assets and cameras
Relevant Xcode console excerpt:
[MetalFXPlugin] MetalFX_Enable(True) called.
[SpaceshipOptions] MetalFX enabled with HDRP dynamic resolution integration.
[SpaceshipOptions] Disabled TAA for MetalFX Spatial.
[SpaceshipOptions] Created runtime RenderTexture: 1434x660
[MetalFX] Spatial scaler created (1434x660 → 2868x1320).
[MetalFX] Processed frame with scaler.
[MetalFXPlugin] Sent RenderTexture (1434x660) to MetalFX. Output target 2868x1320.
[SpaceshipOptions] MetalFX target set: 1434x660
[SpaceshipOptions] Camera targetTexture cleared after MetalFX handoff.
It looks like HDRP clears the camera’s target texture right after MetalFX submits the frame, which causes it to revert to native rendering.
Is there a recommended way to persist or rebind the MetalFX output texture when using HDRP on iOS?
Unity doesn’t appear to support MetalFX in the Editor either:
Thanks!