I have a Core Image filter in my app that uses Metal. I cannot compile it because it complains that the executable tool metal is not available, but I have installed it in Xcode.
If I go to the "Components" section of Xcode Settings, it shows it as downloaded. And if I run the suggested command, it also shows it as installed. Any advice?
Xcode Version
Version 26.0 beta (17A5241e)
Build Output
Showing All Errors Only
Build target Lessons of project StudyJapanese with configuration Light
RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese')
cd /Users/chris/Code/SerpentiSei/StudyJapanese
/bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"'
'
error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
/Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
Build failed 6/9/25, 8:31 PM 27.1 seconds
Result of xcodebuild -downloadComponent MetalToolchain (after switching Xcode-beta.app with xcode-select)
xcodebuild -downloadComponent MetalToolchain
Beginning asset download...
Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg
Done downloading: Metal Toolchain (17A5241c).
Screenshots from Xcode
Result of "Copy Information"
Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi, developers,
I maintain a shipped app that uses string concatenation to construct Metal shader and compile on-device. Beta 4 seems disabled __asm keyword, resulting the compilation failure.
The error is:
v2/GEMMKernel.cpp:229: error: program_source:23:9: error: illegal string literal in 'asm'
__asm("air.simdgroup_async_copy_1d.p3i8.p1i8");
The relevant code is available at https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mfa/v2/GEMMHeaders.cpp#L30 although any __asm will trip this.
Please give us guidance on whether this is a regression or this will be something enforced in 26 release. Personally, I would consider this as a bug given it won't impact anything "compiled" shaders.
Thanks for your patience reading this!
I have a very basic usdz file from this repo
I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object I can tell it is assigning a colorspace of linear rgb instead of srgb although the image file in the usdz is srgb.
This causes the textures to ultimately render as over saturated.
In the code I later convert the MDLTexture to MTLTexture via MTKTextureLoader but if I set the srgb option it seems to ignore it.
This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something?
Thanks!
So I've been trying out GPTK with Elite Dangerous Horizons game and it looks like from what I can tell. The VRAM keeps going up until it goes over the limit where it drops the FPS to 1-3 FPS and then crashes the game. From the Performance HUD I can see that it looks like when using GPTK, the VRAM usage just keeps climbing and I never saw it drop down at all. I did some limited testing, and from that I think I can conclude that it is probably not a VRAM leak, but it might be caching it. The reason for this is because I noticed that if I went back to the area that I've been before. It won't increase the VRAM usage.
So either there is something wrong with the freeing VRAM memory part, or it could be that GPTK might not be reporting the right amount of VRAM available to use? So maybe that's why it keeps allocating VRAM until it went out of memory and crashed the game.
Just to test, I did try running the game with DXVK+MoltenVK combo, and I can see that it works just fine. VRAM is being freed up when it's no longer used.
Is this a known issue in some games?
Hello,
I recently watched the WWDC2025 session titled “Combine Metal 4 machine learning and graphics” (https://developer.apple.com/videos/play/wwdc2025/262/ ), and I’m very excited about the new Metal 4 features that integrate machine learning with graphics—such as neural ambient occlusion, shader-based ML inference, and the use of MTLTensor and MTL4MachineLearningCommandEncoder.
While the session includes helpful code snippets and a compelling debug demo (e.g., the neural ambient occlusion example), the implementation details are not fully shown, and I haven’t been able to find a complete, runnable sample project that demonstrates end-to-end integration of ML and rendering in Metal 4.
Would Apple be able to provide a full, working example—such as an Xcode project—that shows how to:
Export a model to an .mlpackage,
Convert it to an .mtlpackage,
Use MTL4MachineLearningCommandEncoder alongside render passes,
Or embed small neural networks directly in shaders using Shader ML?
Having such a sample would greatly help developers like me adopt these powerful new capabilities correctly and efficiently.
Thank you very much for your time and support!
Best regards,
The title is self-exploratory. I wasn't able to find the CAMetalDisplayLink on the most recent metal-cpp release (metal-cpp_macOS15_iOS18-beta). Are there any plans to include it in the next release?
When generating large arrays of random numbers, NaNs show up. They also show up at the same indices when using the same seed, leading me to believe that this is a bug with MPSMatrixRandom's normally distributed Float32 random number distribution.
Happens with both Philox and MTGP32.
Is this intentional and how do I work around this?
See the original post for a MWE in Swift and Julia: https://github.com/JuliaGPU/Metal.jl/issues/474
I'm trying to use MTLBinaryArchive. I collected a BinaryArchive from one device and used metal-tt to translate it for all supported iPhone devices, ranging from iPhone 7 Plus to iPhone 16.
However, this BinaryArchive is quite large, around 1.5GB uncompressed, and about 500MB compressed in the IPA. I'm wondering how to address the size issue.
I watched the WWDC 2022 video, which mentioned that the operating system or app installation process would handle compatibility. Does this compatibility support different GPU chips? I tried installing an IPA with a BinaryArchive collected only from an iPhone 12 on an iPhone 13, but the BinaryArchive didn't take effect.
I also saw that Apple supports App Thinning. However, it seems that resources in the Asset Catalog cannot be accessed via URL, and creating an MTLBinaryArchive requires a URL. Is it possible for MTLBinaryArchive to be distributed through App Thinning?
The WWDC 2022 video also mentioned using the -Os optimization flag to reduce size. Can this give an estimate of how much compression it would achieve? Are there any methods to solve the BinaryArchive size issue without impacting performance?
Topic:
Graphics & Games
SubTopic:
Metal
Hi,
I am working with a large project. We are compiling each material to its own .metallib. They all include many common files full of inline functions. Finally we link it all together at the end with a single big pathtrace kernel. Everything works as expected, however the compile times have gotten completely out of hand and it takes multiple minutes to compile at runtime (to native code). I have gathered that I can do this offline by using metal-tt however if I am wondering if there is a way to reduce the compile times in such a scenario, and how to investigate what the root cause of the problem is. I suspect it could have to do with the fact that every materials metallib contains duplications of all the inline functions. Any ideas on how to profile and debug this?
Thanks,
Rasmus
Just wondering if anyone knows what it will take to hit greater than 60hz when targeting iPhone. If I set the preferredFramesPerSecond of an MTKView to 120, it works on the iPad, but on iPhone it never goes over 60hz, even with a simple hello triangle sample app... is this a limitation of targeting iPhone?
Topic:
Graphics & Games
SubTopic:
Metal
I am trying to learn the new Metal Peformance Primitives APIs. I have added the MetalPeformancePrimitives framework and included the header in my shader code as per documentation
#include <MetalPeformancePrimitives/MetalPeformancePrimitives.h>
Unfortunately, Xcode complains that the header cannot be found. How do I include it properly?
I am using Xcode 26 on Tahoe. The MetalPeformancePrimitives framework is present on my machine and I can inspect the headers in the filesystem.
Topic:
Graphics & Games
SubTopic:
Metal
I'm updating our app to support metal 4, but the metal 4 types don't seem to get recognized when targeting simulator. Is it known if metal 4 will be supported in the near future, or am I setting up the app wrong?
subj
And how in this case are beautiful system dials made with smoke effects and other particles?
Topic:
Graphics & Games
SubTopic:
Metal
Hi, I'm Beginner with Metal 4 and Model I/O 🥺.
I can render simple models with just one mesh, but when I try to render models with SubMeshes, nothing shows up on screen.
Can anyone help me figure out how to properly render models with multiple submeshes? I think I'm not iterating through them correctly or maybe missing some buffers setup.
Here's what I have so far:
https://www.icloud.com.cn/iclouddrive/0a6x_NLwlWy-herPocExZ8g3Q#LoadModel
Now the examples of metal-cpp are target on desktop and using AppKit which is not supported on iOS. Is there any tips for developing with metal-cpp on mobile device?
We have a production Metal app with a complex multithreaded Metal pipeline.
When everything is operating smoothly, it works great.
Even when extremely overloaded, it does not crash for days at a time.
This isn't good enough for our users.
Unfortunately, when I have zero visibility into id, I have no way of knowing when metal is "done" with an id.
When overloaded, stale metal render passes need to be 'aborted', which results in metal callbacks not being called.
for example, these callbacks may not be called after an aborted pass:
id<MTLCommandBuffer> m_cmdbuf;
[m_cmdbuf addScheduledHandler:^(id <MTLCommandBuffer> cb) {
cpr->scheduled = MachAbsoluteTime();
}];
[m_cmdbuf addCompletedHandler:^(id <MTLCommandBuffer> cb) {
cpr->completed = MachAbsoluteTime();
}];
For the moment, our workaround is a system which waits a few seconds after we "think" a rendering pass should be done with all its (aborted) resources before releasing buffers. This is not ideal, to say the least.
So, in summary, my question is, it would be nice to be able to 'query' an id to know when metal is done with it, so that we know that its safe to release it along with our own internal resources.
Is there any such (undocumented) mechanism? I have exhaustively read all existing Metal documentation many times.
An idea that I've been toying with... it would be nice to have something akin to Zombie detection running all the time for id only.
In OpenGL, it was OK to use a released texture... you may display a bad frame, but not CRASH!. Is there any similar option for id?
Topic:
Graphics & Games
SubTopic:
Metal
How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself?
Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve maximum possible performance, I do not want to the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space.
To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory?
I searched in the Metal feature set tables, but I could not find any details.
In my project I need to do the following:
In runtime create metal Dynamic library from source.
In runtime create metal Executable library from source and Link it with my previous created Dynamic library.
Create compute pipeline using those two libraries created above.
But I get the following error at the third step:
Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols:
_Z5noisev, referenced from: OnTheFlyKernel
" UserInfo={NSLocalizedDescription=Undefined symbols:
_Z5noisev, referenced from: OnTheFlyKernel
}
import Foundation
import Metal
class MetalShaderCompiler {
let device = MTLCreateSystemDefaultDevice()!
var pipeline: MTLComputePipelineState!
func compileDylib() -> MTLDynamicLibrary {
let source = """
#include <metal_stdlib>
using namespace metal;
half3 noise() {
return half3(1, 0, 1);
}
"""
let option = MTLCompileOptions()
option.libraryType = .dynamic
option.installName = "@executable_path/libFoundation.metallib"
let library = try! device.makeLibrary(source: source, options: option)
let dylib = try! device.makeDynamicLibrary(library: library)
return dylib
}
func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary {
let source = """
#include <metal_stdlib>
using namespace metal;
extern half3 noise();
kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]],
texture2d<half, access::write> dst [[texture(1)]],
ushort2 gid [[thread_position_in_grid]]) {
half4 rgba = src.read(gid);
rgba.rgb += noise();
dst.write(rgba, gid);
}
"""
let option = MTLCompileOptions()
option.libraryType = .executable
option.libraries = [dylib]
let library = try! self.device.makeLibrary(source: source, options: option)
return library
}
func runtime() {
let dylib = self.compileDylib()
let exlib = self.compileExlib(dylib: dylib)
let pipelineDescriptor = MTLComputePipelineDescriptor()
pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel")
pipelineDescriptor.preloadedLibraries = [dylib]
pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil)
}
}
There is a sample project from Apple here. It has a scene of a city at night and you can move in it.
It basically has 2 parts:
application code written in what looks like Objective-C (I am more familiar with C++), which inherits from things like NSObject, MTKView, NSViewController and so on - it processes input and all app-related and window-related stuff.
rendering code that also looks like Objective-C. Btw both parts are mostly in .mm files (Obj-C++ AFAIK). The application part directly uses only one class from the rendering part - AAPLRenderer.
I want to move the rendering part to C++ using metal-cpp. For that I need to link metal-cpp to the project. I did it successfully with blank projects several times before using this tutorial. But with this sample project Xcode can't find Foundation/Foundation.hpp (and other metal-cpp headers). The error says this:
Did not find header 'Foundation.hpp' in framework 'Foundation' (loaded from '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk/System/Library/Frameworks')
Pls help
Hi,
A user sent us a crash report that indicates an error occurring just after loading the default Metal library of our app.
Application Specific Information:
Crashing on exception: *** -[__NSArrayM objectAtIndex:]: index 0 beyond bounds for empty array
The report pointed me to these (simplified) lines of codes in the library setup:
_vertexFunctions = [[NSMutableArray alloc] init];
_fragmentFunctions = [[NSMutableArray alloc] init];
id<MTLLibrary> library = [device newDefaultLibrary];
2 vertex shaders and 5 fragment shaders are then loaded and stored in these two arrays using this method:
-(BOOL) addShaderNamed:(NSString *)name library:(id<MTLLibrary>)library isFragment:(BOOL)isFragment {
id shader = [library newFunctionWithName:name];
if (!shader) {
ALOG(@"Error : Unable to find the shader named : “%@”", name);
return NO;
}
[(isFragment ? _fragmentFunctions : _vertexFunctions) addObject:shader];
return YES;
}
As you can see, the arrays are not filled if the method fails... however, a few lines later, they are used without checking if they are really filled, and that causes the crash...
But this coding error doesn't explain why no shader of a certain type (or both types) have been added to the array, meaning: why -newFunctionWithName: returned nil for all given names (since the implied array appears completely empty)?
Clue
This error has only be detected once by a user running the app on macOS 10.13 with a NVIDIA Web Driver instead of the default macOS graphic driver. Moreover, it wasn't possible to reproduce the problem on the same OS using the native macOS driver.
So my question is: is it some known conflicts between NVIDIA drivers and the use of Metal libraries? Or does this case would require some specific options in the Metal implementation?
Any help appreciated, thanks!