I have a Core Image filter in my app that uses Metal. I cannot compile it because it complains that the executable tool metal is not available, but I have installed it in Xcode.
If I go to the "Components" section of Xcode Settings, it shows it as downloaded. And if I run the suggested command, it also shows it as installed. Any advice?
Xcode Version
Version 26.0 beta (17A5241e)
Build Output
Showing All Errors Only
Build target Lessons of project StudyJapanese with configuration Light
RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese')
cd /Users/chris/Code/SerpentiSei/StudyJapanese
/bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"'
'
error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
/Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
Build failed 6/9/25, 8:31 PM 27.1 seconds
Result of xcodebuild -downloadComponent MetalToolchain (after switching to Xcode-beta.app with xcode-select)
xcodebuild -downloadComponent MetalToolchain
Beginning asset download...
Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg
Done downloading: Metal Toolchain (17A5241c).
Screenshots from Xcode
Result of "Copy Information"
Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)
Hi, developers,
I maintain a shipped app that uses string concatenation to construct Metal shaders and compiles them on-device. Beta 4 seems to have disabled the __asm keyword, resulting in compilation failures.
The error is:
v2/GEMMKernel.cpp:229: error: program_source:23:9: error: illegal string literal in 'asm'
__asm("air.simdgroup_async_copy_1d.p3i8.p1i8");
The relevant code is available at https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mfa/v2/GEMMHeaders.cpp#L30 although any __asm will trip this.
Please give us guidance on whether this is a regression or something that will be enforced in the 26 release. Personally, I would consider this a bug, given it doesn't affect already-"compiled" shaders at all.
Thanks for your patience reading this!
So I've been trying out GPTK with the game Elite Dangerous Horizons, and from what I can tell, the VRAM usage keeps going up until it goes over the limit, at which point the FPS drops to 1-3 and the game crashes. From the Performance HUD I can see that when using GPTK the VRAM usage just keeps climbing, and I never saw it drop at all. I did some limited testing, and from that I think I can conclude that it is probably not a VRAM leak, but caching instead: I noticed that if I went back to an area I had already visited, the VRAM usage didn't increase.
So either there is something wrong with freeing VRAM, or GPTK might not be reporting the right amount of VRAM available to use, which would explain why it keeps allocating VRAM until it runs out of memory and the game crashes.
Just to test, I tried running the game with the DXVK+MoltenVK combo, and it works just fine: VRAM is freed when it's no longer used.
Is this a known issue in some games?
Hello,
I recently watched the WWDC2025 session titled “Combine Metal 4 machine learning and graphics” (https://developer.apple.com/videos/play/wwdc2025/262/), and I’m very excited about the new Metal 4 features that integrate machine learning with graphics—such as neural ambient occlusion, shader-based ML inference, and the use of MTLTensor and MTL4MachineLearningCommandEncoder.
While the session includes helpful code snippets and a compelling debug demo (e.g., the neural ambient occlusion example), the implementation details are not fully shown, and I haven’t been able to find a complete, runnable sample project that demonstrates end-to-end integration of ML and rendering in Metal 4.
Would Apple be able to provide a full, working example—such as an Xcode project—that shows how to:
Export a model to an .mlpackage,
Convert it to an .mtlpackage,
Use MTL4MachineLearningCommandEncoder alongside render passes,
Or embed small neural networks directly in shaders using Shader ML?
Having such a sample would greatly help developers like me adopt these powerful new capabilities correctly and efficiently.
Thank you very much for your time and support!
Best regards,
My goal is to print a debug message from a shader. I followed the guide, which says to set the -fmetal-enable-logging Metal compiler flag and the following environment variables:
MTL_LOG_LEVEL=MTLLogLevelDebug
MTL_LOG_BUFFER_SIZE=2048
MTL_LOG_TO_STDERR=1
However, there's an issue with the guide: it only covers Xcode project setup, whereas I'm working on a Swift package. It has a Metal-only target that's included in the main target like this:
targets: [
    // A separate target for shaders.
    .target(
        name: "MetalShaders",
        resources: [
            .process("Metal")
        ],
        plugins: [
            // https://github.com/schwa/MetalCompilerPlugin
            .plugin(name: "MetalCompilerPlugin", package: "MetalCompilerPlugin")
        ]
    ),
    // Main target
    .target(
        name: "MegApp",
        dependencies: ["MetalShaders"]
    ),
    .testTarget(
        name: "MegAppTests",
        dependencies: [
            "MegApp",
            "MetalShaders",
        ]
    )
]
So to apply the compiler flag I use MetalCompilerPlugin, which emits debug.metallib; it also allows defining a DEBUG macro for shaders. This code compiles:
#ifdef DEBUG
logger.log_error("Hello There!");
os_log_default.log_debug("Hello thread: %d", gid);
// this proves that the code executes
result.flag = true;
#endif
The environment is set via the .xctestplan and validated with ProcessInfo. However, nothing is printed to the Xcode console nor to the Console app.
In an attempt to fix it, I'm trying to set up an MTLLogState; however, makeLogState(descriptor:) fails with an error:
if #available(iOS 18.0, *) {
    let logDescriptor = MTLLogStateDescriptor()
    logDescriptor.level = .debug
    logDescriptor.bufferSize = 2048
    // Error Domain=MTLLogStateErrorDomain Code=2 "Cannot create residency set for MTLLogState: (null)" UserInfo={NSLocalizedDescription=Cannot create residency set for MTLLogState: (null)}
    let logState = try! device.makeLogState(descriptor: logDescriptor)
    commandBufferDescriptor.logState = logState
}
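If makeLogState ever succeeds, my plan is to also attach a log handler so shader messages arrive in a callback rather than only on stderr. A minimal sketch, reusing the names from the snippet above (assuming iOS 18+):
if #available(iOS 18.0, *) {
    let logDescriptor = MTLLogStateDescriptor()
    logDescriptor.level = .debug
    logDescriptor.bufferSize = 2048
    if let logState = try? device.makeLogState(descriptor: logDescriptor) {
        // Deliver shader log messages to a callback instead of relying on stderr.
        logState.addLogHandler { _, _, _, message in
            print("[shader log] \(message)")
        }
        commandBufferDescriptor.logState = logState
    }
}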
Some LLMs suggested that this is connected to the Simulator, and indeed, I run the tests on the Simulator. However, the tests don't want to run on my iPhone... I found a workaround by running them on My Mac (Mac Catalyst). Surprisingly, the logging works there, even without an MTLLogState. But the Simulator behaviour seems like a bug...
[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400)
Security Classification
This issue constitutes a resource exhaustion vulnerability (CWE-400):
Aspect       | Details
Type         | Uncontrolled Resource Consumption
CWE          | CWE-400
Vector       | Local (any Metal application)
Impact       | System instability, denial of service
User Control | None - no mitigation available
Recovery     | Requires application restart
Summary
Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation.
This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level.
Environment
OS: macOS Tahoe 26.2
Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3)
API: Metal
Reproduction Steps
Run any Metal application that allocates and deallocates GPU buffers via Metal heaps
Open Activity Monitor and observe the application's memory usage
Let the application run idle (no user interaction required)
Observe memory growing continuously at ~1-2 MB per second
Memory never plateaus or stabilizes
Eventually system becomes unstable
For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1)
Observed Behavior
Memory Analysis
Using Unreal's memreport -full command, two reports taken 86 seconds apart:
Metric            | Report 1 (183s) | Report 2 (269s) | Delta
Process Physical  | 4373.64 MB      | 4463.39 MB      | +89.75 MB
Metal Heap Buffer | 7168 MB         | 8192 MB         | +1024 MB
Unused Heap       | 3453 MB         | 4477 MB         | +1024 MB
Object Count      | 73,840          | 73,840          | 0 (no change)
Key Finding
Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates:
Metal is allocating new heap blocks in ~1 GB increments
Previously allocated heap memory becomes "unused" but is never released
The unused memory accumulates indefinitely
No application-level objects are leaking (count remains constant)
Memory Growth Pattern
Continuous growth while idle (no user interaction)
Growth rate: approximately 1-2 MB per second
No plateau or stabilization occurs
Metal allocates new 1 GB heap blocks rather than reusing freed space
Eventually leads to system instability and crash
What is NOT Causing This
We verified the following are NOT the source:
Application objects - Object count remains constant
Application code - Blank project with no code reproduces the issue
Texture streaming - Disabling texture streaming had no effect
CPU garbage collection - Running GC has no effect (this is GPU memory)
Mitigations Attempted (None Worked)
setPurgeableState
Setting resources to purgeable state before release:
[buffer setPurgeableState:MTLPurgeableStateEmpty];
Result: Metal ignores this hint and does not reclaim heap memory.
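For reference, a minimal Swift sketch of this mitigation attempt (the heap, buffer size, and storage mode are illustrative; this mirrors the idea, not a verified fix):
import Foundation
import Metal

// Sketch: allocate a buffer from a private-storage heap, mark its contents
// discardable once work using it has completed, then drop the last reference.
// In our testing the heap's backing memory was still not returned to the OS.
func exercisePurgeableHint(on heap: MTLHeap) {
    autoreleasepool {
        guard let buffer = heap.makeBuffer(length: 64 << 20,
                                           options: .storageModePrivate) else { return }
        // ... encode GPU work that uses `buffer` and wait for it to finish ...
        _ = buffer.setPurgeableState(.empty) // hint: contents no longer needed
    } // last reference to `buffer` is released here
}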
Avoiding Heap Pooling
Forcing individual buffer allocations instead of heap-based pooling.
Result: Leak persists - Metal still manages underlying allocations.
Aggressive Buffer Compaction
Attempting to compact/defragment buffers within heaps every frame.
Result: Only moves data between existing heaps. Does NOT release heaps back to OS.
Reducing Pool Sizes
Minimizing all buffer pool sizes to force more frequent reuse.
Result: Slightly slows the leak rate but does not stop it.
Root Cause Analysis
How Metal Heap Allocation Appears to Work
Metal allocates GPU heap blocks in large chunks (~1 GB observed)
Application requests buffers from these heaps
When application releases buffers, memory becomes "unused" within the heap
Metal does NOT release heap blocks back to macOS, even when entirely unused
When fragmentation prevents reuse, Metal allocates new heap blocks
Result: Continuous memory growth with no upper bound
The Core Problem
There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application.
Expected Behavior
Metal should:
Release unused heaps - When a heap block is entirely unused, release it back to macOS
Respect purgeable hints - Honor setPurgeableState calls from applications
Compact allocations - Defragment heap allocations to reduce fragmentation
Provide control APIs - Allow applications to request heap compaction or release
Enforce limits - Have configurable maximum heap memory consumption
Security Implications
Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications
Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance
No Upper Bound - Memory consumption continues until system failure
Unmitigable - End users have no way to prevent or limit the leak
Affects All Metal Apps - Any application using Metal heaps is potentially affected
Impact
Applications become unstable after extended use
System-wide performance degrades as memory pressure increases
Users must periodically restart applications
Developers cannot work around this at the application level
Long-running applications (games, creative tools, servers) are particularly affected
Request
Investigate Metal heap memory management behavior
Implement heap release when blocks become entirely unused
Honor setPurgeableState hints from applications
Consider providing an API for applications to request heap compaction
Document any intended behavior or workarounds
Additional Notes
This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible.
The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level.
Tested: January 2026
Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)
I noticed that when the render command encoder encodes no draw calls, an app's memory usage seems to grow without bound. This is a super simple MTKView-based setup with the following delegate (code at end).
If I add the simplest of draw calls, e.g., a single vertex, the app's memory usage is normal, around 100-ish MBs.
I am attaching a couple of screenshots, one from Xcode and one from Instruments.
What's going on here? Is this an illegal program? If so, why does it not crash, as it would if the encoder or command buffer weren't ended?
Or is there some race condition at play here due to the lack of draws?
class Renderer: NSObject, MTKViewDelegate {
    var device: MTLDevice
    var commandQueue: MTL4CommandQueue
    var commandBuffer: MTL4CommandBuffer
    var allocator: MTL4CommandAllocator

    override init() {
        guard let d = MTLCreateSystemDefaultDevice(),
              let queue = d.makeMTL4CommandQueue(),
              let cmdBuffer = d.makeCommandBuffer(),
              let alloc = d.makeCommandAllocator()
        else {
            fatalError("unable to create metal 4 objects")
        }
        self.device = d
        self.commandQueue = queue
        self.commandBuffer = cmdBuffer
        self.allocator = alloc
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let drawable = view.currentDrawable else { return }
        commandBuffer.beginCommandBuffer(allocator: allocator)
        guard let descriptor = view.currentMTL4RenderPassDescriptor,
              let encoder = commandBuffer.makeRenderCommandEncoder(
                  descriptor: descriptor
              )
        else {
            fatalError("unable to create encoder")
        }
        encoder.endEncoding()
        commandBuffer.endCommandBuffer()
        commandQueue.waitForDrawable(drawable)
        commandQueue.commit([commandBuffer])
        commandQueue.signalDrawable(drawable)
        drawable.present()
    }
}
The title is self-explanatory. I wasn't able to find CAMetalDisplayLink in the most recent metal-cpp release (metal-cpp_macOS15_iOS18-beta). Are there any plans to include it in the next release?
I'm trying to use MTLBinaryArchive. I collected a BinaryArchive from one device and used metal-tt to translate it for all supported iPhone devices, ranging from iPhone 7 Plus to iPhone 16.
However, this BinaryArchive is quite large, around 1.5GB uncompressed, and about 500MB compressed in the IPA. I'm wondering how to address the size issue.
I watched the WWDC 2022 video, which mentioned that the operating system or app installation process would handle compatibility. Does this compatibility support different GPU chips? I tried installing an IPA with a BinaryArchive collected only from an iPhone 12 on an iPhone 13, but the BinaryArchive didn't take effect.
I also saw that Apple supports App Thinning. However, it seems that resources in the Asset Catalog cannot be accessed via URL, and creating an MTLBinaryArchive requires a URL. Is it possible for MTLBinaryArchive to be distributed through App Thinning?
The WWDC 2022 video also mentioned using the -Os optimization flag to reduce size. Is there an estimate of how much size reduction that would achieve? Are there any methods to solve the BinaryArchive size issue without impacting performance?
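For context, my understanding of why the URL requirement matters: a binary archive is opened through MTLBinaryArchiveDescriptor's url, so the archive has to exist as a plain file on disk (e.g. in the app bundle) rather than inside an Asset Catalog. A minimal sketch, with an illustrative resource name:
import Foundation
import Metal

func loadBinaryArchive(device: MTLDevice) throws -> MTLBinaryArchive {
    let descriptor = MTLBinaryArchiveDescriptor()
    // "PipelineArchive" is an illustrative name; the point is that the archive
    // must be reachable through a file URL, which Asset Catalog entries are not.
    descriptor.url = Bundle.main.url(forResource: "PipelineArchive",
                                     withExtension: "metallib")
    return try device.makeBinaryArchive(descriptor: descriptor)
}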
Hi,
I am working with a large project. We compile each material to its own .metallib. They all include many common headers full of inline functions, and at the end we link everything together with a single big pathtrace kernel. Everything works as expected; however, the compile times have gotten completely out of hand, and it takes multiple minutes to compile to native code at runtime. I have gathered that I can do this offline by using metal-tt; however, I am wondering whether there is a way to reduce the compile times in such a scenario, and how to investigate the root cause of the problem. I suspect it could have to do with the fact that every material's metallib contains duplicates of all the inline functions. Any ideas on how to profile and debug this?
Thanks,
Rasmus
Hi everyone,
This project uses PyTorch on an Apple Silicon Mac (M1/M2/etc.), and the goal is to use the MPS backend for GPU acceleration. However, the workflow depends on Float64 (double-precision) floating-point numbers for certain computations.
The error "Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead" has been encountered. It seems that the MPS backend doesn't currently support Float64 for direct GPU computation.
Questions for the community:
Are there any known workarounds or best practices for handling Float64-dependent operations when using the MPS backend with PyTorch?
For those working with high-precision tasks on Apple Silicon, what strategies are being used to balance performance with the need for Float64?
Offloading to the CPU is an option, and it's of interest to know if there are any specific techniques or libraries within the Apple ecosystem that could streamline this process while aiming for optimal performance.
Any insights, tips, or experiences would be appreciated.
Thanks in advance,
Jonaid
MacBook Pro M3 Max
Just wondering if anyone knows what it will take to hit greater than 60 Hz when targeting iPhone. If I set the preferredFramesPerSecond of an MTKView to 120, it works on the iPad, but on iPhone it never goes over 60 Hz, even with a simple hello-triangle sample app... is this a limitation of targeting iPhone?
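From what I've read so far, only ProMotion iPhones can go above 60 Hz at all, and the app apparently also has to opt in via the CADisableMinimumFrameDurationOnPhone Info.plist key before a preferredFramesPerSecond above 60 takes effect on iPhone. The view-side part is just this (sketch):
import MetalKit

// Request 120 Hz. On iPhone this only takes effect on ProMotion hardware,
// and (per Apple's ProMotion guidance) the app's Info.plist must also set
// CADisableMinimumFrameDurationOnPhone to YES.
func configureFrameRate(for view: MTKView) {
    view.preferredFramesPerSecond = 120
}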
I am trying to learn the new Metal Performance Primitives APIs. I have added the MetalPerformancePrimitives framework and included the header in my shader code as per the documentation:
#include <MetalPerformancePrimitives/MetalPerformancePrimitives.h>
Unfortunately, Xcode complains that the header cannot be found. How do I include it properly?
I am using Xcode 26 on Tahoe. The MetalPerformancePrimitives framework is present on my machine and I can inspect the headers in the filesystem.
I'm updating our app to support Metal 4, but the Metal 4 types don't seem to be recognized when targeting the Simulator. Is it known whether Metal 4 will be supported there in the near future, or am I setting up the app wrong?
subj
And how, in that case, are the beautiful system dials with smoke effects and other particles made?
Hi, I'm a beginner with Metal 4 and Model I/O 🥺.
I can render simple models with just one mesh, but when I try to render models with submeshes, nothing shows up on screen.
Can anyone help me figure out how to properly render models with multiple submeshes? I think I'm not iterating through them correctly, or maybe I'm missing some buffer setup.
Here's what I have so far:
https://www.icloud.com.cn/iclouddrive/0a6x_NLwlWy-herPocExZ8g3Q#LoadModel
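For reference, this is the pattern I'm trying to follow: bind each mesh's vertex buffers once, then issue one indexed draw per submesh. This sketch uses the classic MTLRenderCommandEncoder API (I haven't verified the exact Metal 4 encoder signatures), so treat it as the shape of the loop rather than drop-in code:
import MetalKit

// Draw every submesh of every MTKMesh: vertex buffers are shared per mesh,
// while each submesh carries its own index buffer and primitive type.
func draw(meshes: [MTKMesh], with encoder: MTLRenderCommandEncoder) {
    for mesh in meshes {
        for (index, vertexBuffer) in mesh.vertexBuffers.enumerated() {
            encoder.setVertexBuffer(vertexBuffer.buffer,
                                    offset: vertexBuffer.offset,
                                    index: index)
        }
        for submesh in mesh.submeshes {
            encoder.drawIndexedPrimitives(type: submesh.primitiveType,
                                          indexCount: submesh.indexCount,
                                          indexType: submesh.indexType,
                                          indexBuffer: submesh.indexBuffer.buffer,
                                          indexBufferOffset: submesh.indexBuffer.offset)
        }
    }
}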
Hi all,
I'm encountering an issue with Metal raytracing on my M5 MacBook Pro regarding Instance Acceleration Structure (IAS).
Intersection tests suddenly stop working after a certain point in the sampling loop.
Situation
I implemented an offline GPU path tracer that runs the same kernel multiple times per pixel (sampleCount) using metal::raytracing.
Intersection tests are performed using an IAS.
Since this is an offline path tracer, the geometries inside the IAS never change across samples (no transforms or updates).
As sampleCount increases, there comes a point where the number of intersections drops to zero, and remains zero for all subsequent samples.
Here's a code sketch:
let sampleCount: UInt16 = 1024
for sampleIndex: UInt16 in 0..<sampleCount {
    // ...
    do {
        let commandBuffer = commandQueue.makeCommandBuffer()
        // Dispatch the intersection kernel.
        await commandBuffer.completed()
    }
    do {
        let commandBuffer = commandQueue.makeCommandBuffer()
        // Use the intersection test results from the previous command buffer.
        await commandBuffer.completed()
    }
    // ...
}
kernel void intersectAlongRay(
    const metal::uint32_t threadIndex [[thread_position_in_grid]],
    // ...
    const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]],
    // ...
)
{
    // ...
    const auto result = intersector.intersect(ray, accelerationStructure);
    switch (result.type) {
        case metal::raytracing::intersection_type::triangle: {
            // Write intersection result to device buffers.
            break;
        }
        default:
            break;
    }
}
Observations
Encoding both the intersection kernel and the subsequent result usage in the same command buffer does not resolve the problem.
Switching from IAS to Primitive Acceleration Structure (PAS) fixes the problem.
Rebuilding the IAS for each sample also resolves the issue.
Intersections produce inconsistent results even though the IAS and rays are identical — Image 1 shows a hit, while Image 2 shows a miss.
Questions
Am I misusing the IAS in some way?
Could this be a Metal bug?
Any guidance or confirmation would be greatly appreciated.
I work on a Qt/QML app that uses the Esri Maps SDK for Qt and is deployed to both Windows and iPads. With the recent iPadOS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users.
I was able to reproduce this and grabbed the following error messages when the freeze happens:
IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)
Environment:
Qt 6.5.4 (Qt for iOS)
Esri Maps SDK for Qt 200.3
iPadOS 26.1
Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set the target graphics API):
QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL)
Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple.
I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update?
Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now.
Appreciate any thoughts/insights....
I have something like this drawing in an MTKView (see at bottom).
I am finding it difficult to figure out when the Swift-land resources used in making the MTLBuffer(s) can be released. Below, for example, is it OK if args goes out of scope (or is otherwise deallocated) at point 1, 2, or 3? Or perhaps even earlier, as soon as argsBuffer has been created?
I have been reading through various articles such as
Setting resource storage modes
Choosing a resource storage mode for Apple GPUs
Copying data to a private resource
but it's a lot to absorb and I haven't been really able to find an authoritative description of the required lifetime of the resources in CPU land.
I should mention that this is Metal 4 code. In previous versions of Metal, the MTLCommandBuffer had the ability to add a completion handler to be called after the GPU has finished running the commands in the buffer, but in Metal 4 there is no such thing (if it were even needed for the purpose I am interested in).
Any advice and/or pointers to the definitive literature will be appreciated.
guard let argsBuffer = device.makeBuffer(bytes: &args,...
argumentTable.setAddress(argsBuffer.gpuAddress, ...
encoder.setArgumentTable(argumentTable, stages: .vertex)
// encode drawing
renderEncoder.draw...
...
encoder.endEncoding() // 1
commandBuffer.endCommandBuffer() // 2
commandQueue.waitForDrawable(drawable)
commandQueue.commit([commandBuffer]) // 3
commandQueue.signalDrawable(drawable)
drawable.present()
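One piece of this I believe is settled by the documentation: makeBuffer(bytes:length:options:) copies the source bytes into the buffer's own storage at call time, so the Swift-side args is no longer needed once that call returns; my remaining question is really only about the MTLBuffer itself and the GPU. A sketch under that assumption (MyArgs is a hypothetical struct standing in for my real arguments):
// Hypothetical argument struct for illustration.
struct MyArgs { var scale: Float = 1.0 }

func makeArgsBuffer(device: MTLDevice) -> MTLBuffer? {
    var args = MyArgs()
    // makeBuffer(bytes:length:options:) copies `args` into the buffer's own
    // storage, so `args` can go out of scope as soon as this call returns.
    return device.makeBuffer(bytes: &args,
                             length: MemoryLayout<MyArgs>.stride,
                             options: .storageModeShared)
}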
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size.
Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right?
Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)?
func onWindowResized(newSize: CGSize) {
    // Called on the Main thread
    metalLayer.drawableSize = newSize
}

func onVsync(drawable: MTLDrawable) {
    // Called on a background rendering thread
    renderer.renderInto(drawable: drawable)
}
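Not an answer to the synchronization guarantees themselves, but one way to sidestep the question would be to funnel the drawableSize update onto the same thread/queue that calls nextDrawable() and present(), so the two never race. A sketch, assuming a serial render queue owns all CAMetalLayer access (the queue and type names are illustrative):
import Foundation
import QuartzCore

final class LayerResizeForwarder {
    let metalLayer: CAMetalLayer
    // Illustrative serial queue that the rendering code also runs on.
    let renderQueue = DispatchQueue(label: "com.example.render")

    init(metalLayer: CAMetalLayer) {
        self.metalLayer = metalLayer
    }

    func onWindowResized(newSize: CGSize) {
        // Called on the main thread; hop to the render queue so the size change
        // is serialized with nextDrawable()/present() calls.
        renderQueue.async { [metalLayer] in
            metalLayer.drawableSize = newSize
        }
    }
}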