Render advanced 3D graphics and perform data-parallel computations on graphics processors with Metal.

Metal Documentation

Posts under the Metal subtopic. Each post lists its reply, boost, and view counts and the date of the most recent activity.

Unable to compile Core Image filter on Xcode 26 due to missing Metal toolchain
I have a Core Image filter in my app that uses Metal. I cannot compile it because the build complains that the executable tool metal is not available, even though I have installed it in Xcode. The "Components" section of Xcode Settings shows it as downloaded, and if I run the suggested command, it also reports it as installed. Any advice?

Xcode version: Version 26.0 beta (17A5241e)

Build output (showing all errors only):

    Build target Lessons of project StudyJapanese with configuration Light
    RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese')
        cd /Users/chris/Code/SerpentiSei/StudyJapanese
        /bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"' '
    error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
    /Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
    Build failed    6/9/25, 8:31 PM    27.1 seconds

Result of xcodebuild -downloadComponent MetalToolchain (after switching to Xcode-beta.app with xcode-select):

    xcodebuild -downloadComponent MetalToolchain
    Beginning asset download...
    Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg
    Done downloading: Metal Toolchain (17A5241c).

Result of "Copy Information" in Xcode's Components settings (screenshots omitted):

    Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)
Replies: 25 · Boosts: 0 · Views: 2.9k · Oct ’25
VRAM not freeing in Elite Dangerous
I've been trying out GPTK with the Elite Dangerous Horizons game, and from what I can tell, VRAM usage keeps going up until it goes over the limit, at which point the frame rate drops to 1-3 FPS and the game crashes. From the Performance HUD I can see that under GPTK the VRAM usage just keeps climbing; I never saw it drop at all. From some limited testing, I think I can conclude that it is probably not a VRAM leak but caching: I noticed that if I went back to an area I had already visited, the VRAM usage did not increase. So either something is wrong with the freeing of VRAM, or GPTK might not be reporting the right amount of VRAM available to use, which would explain why it keeps allocating VRAM until it runs out of memory and the game crashes. Just to test, I ran the game with the DXVK+MoltenVK combo, and there it works just fine: VRAM is freed when it's no longer used. Is this a known issue in some games?
Replies: 12 · Boosts: 3 · Views: 859 · Apr ’25
Metal texture allocated size versus actual image data size
Hello. In the iOS app I'm working on we are very tight on memory budget, so I was looking at ways to reduce our texture memory usage. However, I noticed that comparing ASTC8x8 to ASTC12x12, there is no actual difference in allocated memory for most of our textures, despite ASTC12x12 having less than half the bpp of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with ASTC12x12 for most textures, there is over a 100 MB difference between the ASTC size on disk and the size when loaded, so I would love to be able to recover even a portion of that memory.

Here is some test code, with measurements taken on an iPhone 11:

    for(int i = 0; i < 11; i++)
    {
        MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
        texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
        int dim = 12;
        int n = 2 << i;
        int mips = i+1;
        texDesc.width = n;
        texDesc.height = n;
        texDesc.mipmapLevelCount = mips;
        texDesc.resourceOptions = MTLResourceStorageModeShared;
        texDesc.usage = MTLTextureUsageShaderRead;

        // Calculate the equivalent astc texture size
        int blocks = 0;
        if(mips == 1)
        {
            blocks = n/dim + (n%dim>0? 1 : 0);
            blocks *= blocks;
        }
        else
        {
            for(int j = 0; j < mips; j++)
            {
                int a = 2 << j;
                int cur = a/dim + (a%dim>0? 1 : 0);
                blocks += cur*cur;
            }
        }

        auto tex = [objCObj newTextureWithDescriptor:texDesc];
        printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize);
    }

Results:

    MTLPixelFormatASTC_12x12_LDR
    128x128,   mips 7,  Astc: 2768,    Metal: 6016
    256x256,   mips 8,  Astc: 10512,   Metal: 32768
    512x512,   mips 9,  Astc: 40096,   Metal: 98304
    1024x1024, mips 10, Astc: 158432,  Metal: 262144
    128x128,   mips 1,  Astc: 1936,    Metal: 4096
    256x256,   mips 1,  Astc: 7744,    Metal: 16384
    512x512,   mips 1,  Astc: 29584,   Metal: 65536
    1024x1024, mips 1,  Astc: 118336,  Metal: 147456

    MTLPixelFormatASTC_8x8_LDR
    128x128,   mips 7,  Astc: 5488,    Metal: 6016
    256x256,   mips 8,  Astc: 21872,   Metal: 32768
    512x512,   mips 9,  Astc: 87408,   Metal: 98304
    1024x1024, mips 10, Astc: 349552,  Metal: 360448
    128x128,   mips 1,  Astc: 4096,    Metal: 4096
    256x256,   mips 1,  Astc: 16384,   Metal: 16384
    512x512,   mips 1,  Astc: 65536,   Metal: 65536
    1024x1024, mips 1,  Astc: 262144,  Metal: 262144

I also tried using MTLHeaps (placement and automatic), hoping they might be better, but saw nearly the same numbers. Is there any way to have Metal allocate these textures in a more compact way to save memory?
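One way to quantify how much of the gap is placement granularity (a diagnostic sketch, not a fix, under the assumption that the overhead is heap-style alignment padding) is to compare what MTLDevice's heapTextureSizeAndAlign(descriptor:) reserves for a descriptor against the texture's allocatedSize:

    import Metal

    // Diagnostic sketch: compare the size/alignment Metal reserves for a
    // texture descriptor against the allocated size of a standalone texture.
    let device = MTLCreateSystemDefaultDevice()!

    let desc = MTLTextureDescriptor()
    desc.pixelFormat = .astc_12x12_ldr   // ASTC formats require an Apple GPU
    desc.width = 1024
    desc.height = 1024
    desc.mipmapLevelCount = 10
    desc.resourceOptions = .storageModeShared
    desc.usage = .shaderRead

    // Size and alignment Metal would use when placing this texture in a heap.
    let sizeAlign = device.heapTextureSizeAndAlign(descriptor: desc)
    print("heap size: \(sizeAlign.size), align: \(sizeAlign.align)")

    // Allocated size of a standalone (non-heap) texture for comparison.
    if let tex = device.makeTexture(descriptor: desc) {
        print("allocatedSize: \(tex.allocatedSize)")
    }

If the reported alignment is large (for example, a page size or more per mip chain), that would account for small mips rounding up well past their raw ASTC payload.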
Replies: 8 · Boosts: 0 · Views: 2.7k · Mar ’25
OS choosing performance state poorly for GPU use case
I am building a macOS desktop app (https://anukari.com) that uses Metal compute for real-time audio/DSP processing, because the problem is highly parallelizable and too computationally expensive for the CPU. However, given the way I am using the GPU, the OS never increases the power/performance state, even when my app is fully compute-limited. Because this is a real-time audio synthesis application, not being able to take advantage of the full clock speeds the GPU is capable of is a huge problem: the app can't keep up with real-time.

I discovered this issue while profiling the app using Instruments' Metal tracing (and Game tracing) modes. In the profiling configuration, under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away. For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation takes around 2x as long to complete, meaning the app performs half as well.

I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that the more efficient I make my code, the less it seems to signal to the OS that I want high GPU clock speeds! I think part of why the OS is confused is that in most use cases my computation can be done using only a small number of Metal threadgroups. I'm guessing the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.

I'm not sure what to do here; I'm in a bit of a bind. One possibility is to intentionally schedule busy work: spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work. Is there any other (better) way for my app to signal to the OS that it is doing real-time, latency-sensitive computation on the GPU and needs the clock speeds scaled up? Note that Game Mode is not really an option, as my app also runs as an AU plugin inside hosts like GarageBand, so it can't be made fullscreen, etc.
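For reference, the roughly 2x gap can be measured without Instruments (a diagnostic sketch only; the command buffer here stands in for whatever the app actually submits): MTLCommandBuffer exposes GPU-side timestamps, so per-buffer execution time can be logged in both configurations and compared.

    import Metal

    // Diagnostic sketch: log GPU-side execution time for a command buffer.
    // gpuStartTime/gpuEndTime bracket GPU execution, so the delta reflects
    // how fast the work ran at the current performance state.
    func commitAndLogGPUTime(_ commandBuffer: MTLCommandBuffer) {
        commandBuffer.addCompletedHandler { buffer in
            let seconds = buffer.gpuEndTime - buffer.gpuStartTime
            print(String(format: "GPU time: %.3f ms", seconds * 1000))
        }
        commandBuffer.commit()
    }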
Replies: 6 · Boosts: 0 · Views: 895 · May ’25
Metal and Swift Concurrency
Hi. Introducing Swift Concurrency to my Metal app has been a bit challenging, because Swift Concurrency is limited by the cooperative thread pool. GPU work is obviously not CPU-bound and can block forward progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work, this has the potential of underutilizing the CPU and even creating deadlocks.

My question is: what is the Metal team's general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way to run Metal-bound tasks for maximum performance. To integrate with Swift Concurrency, my idea is to use continuations that kick off render jobs via Dispatch or queues. Would this be the best way to bridge async tasks with Metal work? Thanks!
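A minimal sketch of the continuation idea (my own reading of how the bridge might look, not an established recommendation): instead of blocking a cooperative-pool thread with waitUntilCompleted, suspend the task until the command buffer's completion handler fires.

    import Metal

    // Await GPU completion without blocking the cooperative thread pool.
    // The completion handler runs on a Metal-owned thread, so the Swift
    // task suspends instead of tying up a pool thread the way
    // waitUntilCompleted would.
    func run(_ commandBuffer: MTLCommandBuffer) async {
        await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
            commandBuffer.addCompletedHandler { _ in
                continuation.resume()
            }
            commandBuffer.commit()
        }
    }

Note that the handler must be registered before commit(), and the CPU-side encoding itself still runs synchronously wherever the function is called.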
Replies: 5 · Boosts: 0 · Views: 1.1k · Apr ’25
How to use MTKTextureLoader to load PNG data
I am trying to load some PNG data with MTKTextureLoader newTextureWithData:, but the result shows wrong in the alpha areas. Here is the code. I have an image URL; after it downloads successfully, I try to use the data or UIImagePNGRepresentation(image), and they all show wrong. (The snippet is excerpted from inside a URL session completion handler.)

    UIImage *tempImg = [UIImage imageWithData:data];
    CGImageRef cgRef = tempImg.CGImage;
    MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device];
    id<MTLTexture> temp1 = [loader newTextureWithData:data
                                               options:@{MTKTextureLoaderOptionSRGB: @(NO),
                                                         MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead),
                                                         MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)}
                                                 error:nil];
    NSData *tempData = UIImagePNGRepresentation(tempImg);
    id<MTLTexture> temp2 = [loader newTextureWithData:tempData
                                               options:@{MTKTextureLoaderOptionSRGB: @(NO),
                                                         MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead),
                                                         MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)}
                                                 error:nil];
    id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef
                                                 options:@{MTKTextureLoaderOptionSRGB: @(NO),
                                                           MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead),
                                                           MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)}
                                                   error:nil];
    }] resume];
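A common cause of artifacts confined to alpha regions is premultiplied alpha: CGImage-backed PNG decodes are typically premultiplied, and a shader expecting straight alpha will show dark fringes in translucent areas. A quick check (a diagnostic sketch, assuming the UIImage decode path above) is to inspect the decoded image's alpha info:

    import UIKit

    // Diagnostic sketch: report whether a decoded image carries
    // premultiplied alpha, which a shader expecting straight
    // (unpremultiplied) alpha will render incorrectly.
    func logAlphaInfo(_ image: UIImage) {
        guard let cg = image.cgImage else { return }
        switch cg.alphaInfo {
        case .premultipliedFirst, .premultipliedLast:
            print("premultiplied alpha")
        case .first, .last:
            print("straight alpha")
        default:
            print("alpha info: \(cg.alphaInfo.rawValue)")
        }
    }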
Replies: 5 · Boosts: 0 · Views: 614 · May ’25
Tile Shaders performance when writing to tile texture vs. resolve texture
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:

1. the resolve texture of the color attachment
2. a read-write tile shader texture set via [renderEncoder setTileTexture: myResolvedTexture]

Option 2 is more than twice as slow as option 1. Our compute shader writes to 4 UAVs, so using only the resolve texture entry is not possible. Why is there such a difference, given that no more data is being written? Can option 2 be made as fast as option 1? I can demonstrate the issue in a modified version of the Multisample code sample.
Replies: 5 · Boosts: 0 · Views: 556 · Feb ’25
Metal runtime shader library compilation and linking issue
In my project I need to do the following:

1. At runtime, create a Metal dynamic library from source.
2. At runtime, create a Metal executable library from source and link it with the dynamic library created in step 1.
3. Create a compute pipeline using the two libraries created above.

But I get the following error at the third step:

    Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel" UserInfo={NSLocalizedDescription=Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel}

    import Foundation
    import Metal

    class MetalShaderCompiler {
        let device = MTLCreateSystemDefaultDevice()!
        var pipeline: MTLComputePipelineState!

        func compileDylib() -> MTLDynamicLibrary {
            let source = """
            #include <metal_stdlib>
            using namespace metal;

            half3 noise() {
                return half3(1, 0, 1);
            }
            """
            let option = MTLCompileOptions()
            option.libraryType = .dynamic
            option.installName = "@executable_path/libFoundation.metallib"
            let library = try! device.makeLibrary(source: source, options: option)
            let dylib = try! device.makeDynamicLibrary(library: library)
            return dylib
        }

        func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary {
            let source = """
            #include <metal_stdlib>
            using namespace metal;

            extern half3 noise();

            kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]],
                                       texture2d<half, access::write> dst [[texture(1)]],
                                       ushort2 gid [[thread_position_in_grid]]) {
                half4 rgba = src.read(gid);
                rgba.rgb += noise();
                dst.write(rgba, gid);
            }
            """
            let option = MTLCompileOptions()
            option.libraryType = .executable
            option.libraries = [dylib]
            let library = try! self.device.makeLibrary(source: source, options: option)
            return library
        }

        func runtime() {
            let dylib = self.compileDylib()
            let exlib = self.compileExlib(dylib: dylib)

            let pipelineDescriptor = MTLComputePipelineDescriptor()
            pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel")
            pipelineDescriptor.preloadedLibraries = [dylib]

            pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor,
                                                            options: .bindingInfo,
                                                            reflection: nil)
        }
    }
Replies: 4 · Boosts: 0 · Views: 877 · Feb ’25
Xcode Vulkan is opening two windows instead of one.
I'm a newbie at Vulkan and Xcode. My project is on GitHub at https://github.com/flocela/OrangeSpider/. Whenever I run it, two windows open instead of only one. I added testing, which means I have an OrangeSpider.xctestplan in the OrangeSpider/TestsOrangeSpider/ folder. This is my first time adding testing to an Xcode project, so I think this may be where the problem is. I also get this error message:

    ViewBridge to RemoteViewService Terminated: Error Domain=com.apple.ViewBridge Code=18 "(null)" UserInfo={com.apple.ViewBridge.error.hint=this process disconnected remote view controller -- benign unless unexpected, com.apple.ViewBridge.error.description=NSViewBridgeErrorCanceled}
Replies: 4 · Boosts: 0 · Views: 150 · Jul ’25
Background GPU Access availability
I would love to use Background GPU Access to do some video processing in the background. However, the documentation for BGContinuedProcessingTaskRequest.Resources.gpu clearly states: "Not all devices support background GPU use. For more information, see Performing long-running tasks on iOS and iPadOS."

Is there a list of currently released devices that do (or don't) support background GPU usage? That would help us understand what part of our user base can use this feature, and what hardware we need to test on as developers. For example, it seems to be unsupported on an iPad Pro M1 with the current iOS 26 beta, and the simulators also don't seem to support the background GPU resource. It would be great to understand what hardware is capable of using this feature!
Replies: 4 · Boosts: 0 · Views: 881 · Jul ’25
macOS Tahoe Beta 4 disabled __asm keyword for Metal
Hi, developers. I maintain a shipped app that uses string concatenation to construct a Metal shader and compile it on-device. Beta 4 seems to have disabled the __asm keyword, resulting in a compilation failure. The error is:

    v2/GEMMKernel.cpp:229: error: program_source:23:9: error: illegal string literal in 'asm'
    __asm("air.simdgroup_async_copy_1d.p3i8.p1i8");

The relevant code is available at https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mfa/v2/GEMMHeaders.cpp#L30, although any __asm will trip this. Please give us guidance on whether this is a regression or something that will be enforced in the 26 release. Personally, I would consider this a bug, given that it won't impact already-compiled shaders. Thanks for your patience in reading this!
Replies: 4 · Boosts: 6 · Views: 870 · Jul ’25
Metal 4 & Acceleration Structures
I have really enjoyed looking through the code and videos related to Metal 4. Currently, my interest is in updating a ReSTIR project to take advantage of more robust ways to refit acceleration structures and more powerful ways to access resources. I am working in Swift and have encountered a couple of puzzles:

1. What is the accepted way to create a MTL4BufferRange to store indices and vertices?
2. How do I properly rewrite Swift code to build and compact an acceleration structure?

I do realize that this is all in beta and will happily look through code samples this fall. If other guidance is available earlier, that would be fabulous! Thank you.
Replies: 4 · Boosts: 0 · Views: 609 · Sep ’25
Float8 and Float16 "Reserved_Name__Do_not_use"
I am developing a macOS terminal app, running on an M4 Pro, and using Metal. I am not able to use float8 or float16; both report "Variable has incomplete type 'float16' (aka '__Reserved_Name__Do_not_use_float16')". Based on the system, I should be able to use these. Either it is because the code is also compiling for Intel, where they are not allowed, or it is something else; either way, I have not been able to figure out how to get past this. Is there a compiler setting I need to change to make this work? If so, which one, and what value does it need? I only want to run this on M-series processors on the latest version of the OS, so I am not interested in an Intel version or backward compatibility.
Replies: 4 · Boosts: 0 · Views: 172 · Aug ’25
Are there complete code examples available for “Combine Metal 4 machine learning and graphics”?
Hello, I recently watched the WWDC2025 session titled "Combine Metal 4 machine learning and graphics" (https://developer.apple.com/videos/play/wwdc2025/262/), and I'm very excited about the new Metal 4 features that integrate machine learning with graphics, such as neural ambient occlusion, shader-based ML inference, and the use of MTLTensor and MTL4MachineLearningCommandEncoder.

While the session includes helpful code snippets and a compelling debug demo (e.g., the neural ambient occlusion example), the implementation details are not fully shown, and I haven't been able to find a complete, runnable sample project that demonstrates end-to-end integration of ML and rendering in Metal 4. Would Apple be able to provide a full, working example, such as an Xcode project, that shows how to:

- export a model to an .mlpackage,
- convert it to an .mtlpackage,
- use MTL4MachineLearningCommandEncoder alongside render passes, or
- embed small neural networks directly in shaders using Shader ML?

Having such a sample would greatly help developers like me adopt these powerful new capabilities correctly and efficiently. Thank you very much for your time and support! Best regards,
Replies: 4 · Boosts: 2 · Views: 846 · 4w
MDLAsset loads texture in usdz file with the wrong colorspace
I have a very basic usdz file from this repo. I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object, I can tell it is assigned a colorspace of linear RGB instead of sRGB, although the image file in the usdz is sRGB. This causes the textures to ultimately render oversaturated. Later in the code I convert the MDLTexture to MTLTexture via MTKTextureLoader, but if I set the sRGB option it seems to be ignored. This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something? Thanks!
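One possible workaround (a sketch under the assumption that the pixel data itself is intact and only the colorspace interpretation is wrong; the .rgba8Unorm format here is illustrative) is to reinterpret the loaded texture through an sRGB texture view:

    import Metal

    // Sketch: reinterpret a texture whose contents are sRGB-encoded but
    // whose pixel format was tagged as linear. The view shares storage
    // with the original texture; only the format interpretation changes.
    func srgbView(of texture: MTLTexture) -> MTLTexture? {
        guard texture.pixelFormat == .rgba8Unorm else { return nil }
        return texture.makeTextureView(pixelFormat: .rgba8Unorm_srgb)
    }

Sampling the view applies sRGB-to-linear decoding in the shader, which should remove the oversaturation if the underlying bytes are sRGB.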
Replies: 3 · Boosts: 3 · Views: 841 · Dec ’24
After updating CAMetalLayer.drawableSize, [CAMetalLayer nextDrawable:] frequently takes ~1s
I have a bare-bones Metal app setup where I attach a CAMetalLayer to a window that inherits from NSWindow, with a custom delegate. Everything else is vanilla. I'm also using metal-cpp and Metal shader converter. I'm running into an issue where the application runs fine in the beginning, but once I resize the window, it starts hitching. It turns out that [CAMetalLayer nextDrawable:] frequently (but not always) takes around a full second (plus or minus a few milliseconds) to return once drawableSize has been updated.

I've tried setting allowsNextDrawableTimeout to false, which doesn't help; it returns a valid drawable after a second instead of nil. Setting displaySyncEnabled to false reduces the likelihood of this happening to around 50% from 90%+, but does not eliminate it. Setting maximumDrawableCount to 2 or 3 does not seem to make a difference.

By dumping the resource IDs of the returned textures, I've noticed something interesting: before resizing, the layer seems to shuffle between 2 textures, or at least 2 resource IDs, but after resizing it starts to create a new texture for each returned drawable. Occasionally it seems to reuse a previous resource ID, but that does not seem to correlate with whether the method returns quickly or not.

Why does this happen, and how can I fix it? Should I create a new CAMetalLayer when resizing the window instead of updating drawableSize?
Replies: 3 · Boosts: 0 · Views: 740 · Jan ’25
Xcode Playground - The LLDB RPC server has crashed.
I am trying to learn Metal development on my MacBook Pro M1 Pro (Sequoia 15.3.1) in an Xcode playground, but when I write these two lines of code:

    import Metal
    let device = MTLCreateSystemDefaultDevice()!

I get the error "The LLDB RPC server has crashed." Any ideas as to what I can do to solve this? I have rebooted the machine and reinstalled Xcode...
Replies: 3 · Boosts: 0 · Views: 485 · Mar ’25