Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Posts under Metal tag

121 Posts

Post

Replies

Boosts

Views

Activity

_FusedMatMul with [BiasAdd, Relu] produces incorrect results in graph mode on Metal GPU
When running a tf.function-traced graph on the Metal GPU, any operation that combines MatMul → BiasAdd → Relu (the fused pattern emitted by tf.keras.layers.Dense(activation='relu')) produces numerically incorrect output — errors on the order of tens of units, not floating-point noise. Eager mode on the same Metal GPU is correct. Graph mode forced to CPU (tf.config.set_visible_devices([], 'GPU')) is also correct. The bug is deterministic and data-independent (reproduces with random weights). the three-op combination of MatMul + BiasAdd + Relu trigger the error. Specifically: relu(tf.nn.bias_add(tf.matmul(x, W), b)) in graph mode on Metal is wrong, while relu(tf.matmul(x, W) + b) (using AddV2 instead of BiasAdd) is correct. Removing the Relu also makes the result correct — tf.nn.bias_add(tf.matmul(x, W), b) without a following Relu produces correct output at every shape tested. This points to the Metal plugin's fused _FusedMatMul kernel with fused_ops=[BiasAdd, Relu] as the culprit. Disabling the TF core grappler remapping pass (tf.config.optimizer.set_experimental_options({'remapping': False})) does not fix the issue, confirming that the fusion decision is made inside the Metal plugin's own kernel selection, below the TF core graph optimizer. The bug reproduces across all shapes tested (batch 4–200, inner dimension K 512–8192, output 128–2048) and is not specific to any particular weight values. A minimal reproducer: import tensorflow as tf import numpy as np # Any shape works; larger K makes the error more obvious M, K, N = 64, 2048, 1024 W = tf.Variable(tf.random.normal([K, N])) b = tf.Variable(tf.random.normal([N])) x = tf.random.normal([M, K]) @tf.function def graph_fused(x): return tf.nn.relu(tf.nn.bias_add(tf.matmul(x, W), b)) @tf.function def graph_safe(x): return tf.nn.relu(tf.matmul(x, W) + b) # AddV2 instead of BiasAdd eager_ref = tf.nn.relu(tf.nn.bias_add(tf.matmul(x, W), b)) # eager = correct fused_out = graph_fused(x) # Metal graph mode = WRONG safe_out = graph_safe(x) # Metal graph mode = correct print(f"eager vs graph_fused (BiasAdd): {tf.reduce_max(tf.abs(eager_ref - fused_out)).numpy():.1f}") # ^ typically 30–80+ (WRONG) print(f"eager vs graph_safe (AddV2): {tf.reduce_max(tf.abs(eager_ref - safe_out)).numpy():.2e}") # ^ typically ~1e-5 (correct) Environment: TensorFlow 2.18.1, Keras 3.11.2, tensorflow-metal (latest as of 2026-05-26), Apple Silicon Mac. Impact: This breaks any Keras model that uses Dense(activation='relu') when called inside a tf.function or via SavedModel serving on the Metal GPU. Eager-mode inference is unaffected.
0
0
1k
1w
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets Summary The Metal driver AGXMetalG17X 351.2 on macOS 26.5 (25F71) for the M5 Pro chip crashes with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) when running LLM inference workloads with working sets as small as ~1.5GB, despite 24GB of unified memory being available and Apple Diagnostics confirming the hardware is fully functional. This affects multiple tools: MLX, llama.cpp (Metal backend), and native apps using Metal for inference. System Component Value Model MacBook Pro (Mac17,9) Chip Apple M5 Pro (applegpu_g17s) GPU Cores 16 RAM 24 GB LPDDR5 macOS 26.5 (25F71) Metal Metal 4 GPU Driver AGXMetalG17X 351.2 Xcode 26.5 (17F42) Reproduction MLX (Python) pip install mlx mlx-lm python -m mlx_lm.generate \ --model mlx-community/Qwen2.5-3B-Instruct-4bit \ --max-tokens 10 \ --prompt "Hello" Expected: Normal text generation Actual: Crash with: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) llama.cpp brew install llama.cpp llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99 Expected: Fast GPU generation Actual: Process hangs indefinitely Test Results Tool Model Peak Memory Result MLX Qwen2.5-0.5B-4bit 0.36 GB ✅ Works MLX Qwen2.5-1.5B-4bit 0.98 GB ✅ Works MLX Qwen3-1.7B-4bit 1.01 GB ✅ Works MLX Qwen2.5-3B-4bit ~1.5 GB ❌ Metal OOM crash MLX Qwen3-4B-4bit ~2.1 GB ❌ Metal OOM crash MLX Qwen3-8B-4bit ~4.5 GB ❌ Metal OOM crash llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ❌ Hangs with GPU llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ✅ Works with CPU only Key Evidence Hardware is healthy — Apple Diagnostics passed all tests Basic Metal works — matmul, array ops work fine CPU inference works — llama.cpp with -ngl 0 runs correctly The error is NOT about actual memory exhaustion — kIOGPUCommandBufferCallbackErrorOutOfMemory means the kernel rejects the Metal memory commit, not that physical memory is full. The system reports 17.76GB available for Metal working set. Crash Log Extract Thread 31 Crashed: 0 libsystem_kernel.dylib __pthread_kill + 8 1 libsystem_pthread.dylib pthread_kill + 296 2 libsystem_c.dylib abort + 148 3 Metal MTLReportFailure.cold.1 + 48 4 Metal MTLReportFailure + 576 5 Metal -[_MTLCommandBuffer addCompletedHandler:] + 104 ... Exception Type: EXC_CRASH (SIGABRT) Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6 Related Issues ml-explore/mlx#3586 — Metal compiler regression on macOS 26.5 ml-explore/mlx#3534 — M5 float32 precision issue ml-explore/mlx#3568 — M5 random divergence ml-explore/mlx#3539 — Metal residency OOM (M4 Max) Request Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available). Happy to provide full crash logs, sysdiagnose archives, or run additional tests.
0
0
73
1w
MetalToolchain and auto updates...
Hello, I can understand why you do not ship the MetalToolchain with the default Xcode installation any more due to the relatively low usage and high download size. That said, every time Xcode runs an auto update it wipes MetalToolchain and breaks my local development build. It would be nice if the updates would be smart enough to honor the fact that. I have already run: "xcodebuild -downloadComponent MetalToolchain" and include that in the update, rather than deleting the module. Thanks, Chris
1
0
187
1w
Inexplicable Metal crash ever since iOS 26.5 beta 4
Hi all, I'm working on updating my audio visualizer app. I'm adding new visualizers based on Metal 4 compute shaders. They worked in iOS 26.4 and iOS 26.5 up until beta 3. However, after that, the visualizers started crashing the phone and forcing a restart. On the latest version of iOS 26.5, the crash is still there. I submitted feedback, but haven't heard anything back just yet. I was wondering if others have faced this same issue, and if there are any workarounds. Here is my repo if you want to look at the code (forgive me if it's sloppy, I'm quite new to graphics programming and Metal): https://github.com/aabagdi/VisualMan/tree/main Thank you!
4
0
1.3k
1w
XPC Communication between Editor app and user-compiled code
Hello! I'm trying to implement an editor app (macOS) that allows the user to write code, which will be compiled and executed, showing the result in the editor window. Imagine it like SwiftUI previews, but the graphic output is created with Metal, not SwiftUI. I found that IOSurface can be used to share that kind of data over XPC, so I would not have to rely on the private NSRemoteView. However, I'm confused if it is, at all, possible for my editor app to connect to an XPC Service, that was NOT bundled with it (but compiled by it at runtime). I succeeded to launch an XPC service defined as: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>Label</key> <string>com.myteam.myproject.service</string> <key>MachServices</key> <dict> <key>com.myteam.myproject.service</key> <true/> </dict> <key>Program</key> <string>/Path/to/service/run_my_service.sh</string> </dict> </plist> But the call to let connection = NSXPCConnection(machServiceName: "com.myteam.myproject.service") let proxy = connection.remoteObjectProxyWithErrorHandler { error in continuation.resume(throwing: error) } as? MyServiceProtocol fails with "The connection to service named com.myteam.myproject.service was invalidated: Connection init failed at lookup with error 3 - No such process." I have added <key>com.apple.security.temporary-exception.mach-lookup.global-name</key> <array> <string>com.myteam.myproject.service</string> </array> to my entitlements. Since the tutorials I followed are quite old, I'm wondering if support for something like this was dropped at some point. Thanks for any advice!
6
0
563
2w
Possibilities of Overclocking Apple Silicon
I've been testing Apple Silicon devices in their desktop configurations on the Mac Studio and now retired Mac Pro and it seems like they're greatly bottlenecked by their clock speeds. For reference here's my testing results. Testing Results: Mac Studio M2 Max • 32GBs RAM • 30 core GPU • 1TB Storage CPU Utilization • 60% • 20W CPU Temperature • 47ºC GPU Utilization • 100% • 20W GPU Temperature • 55ºC Fan Speed • 50% Workload Duration • 2hrs Another point is that the clock speed on the M2 Max's CPU is 3.5 GHz and on the GPU it is 1.44 GHz at max performance. Which the Mac Studio has no trouble pushing. My question is how do I push those clock speeds higher? Cause 1.44 GHz at 55ºC is evidence for extensive headroom. I'm sure there are tools internally for testing the upper limits of the silicon, but it makes no sense why it would be set so low the Mac Studio is at no worries of melting. Is there any way to push the performance of my Mac Studio? FB22713867 - Possibilities of Overclocking Apple Silicon
1
0
279
3w
Metal, Vulkan, OpenGL & Godot
Greetings! I'm preparing to publish an app in Apple Store. It's a 2D Audio app made in Godot, already published in Google Store.. As we know, OpenGL is considered deprecated since iOS 12 / 2018 .. However given the current state of Metal, or Vulkan integration in Godot, and with the idea of bringing the Best possible experience on iOS.. I'm not completely sure what will be the best API to use as primary option.. -As good as Metal, or even Vulkan work in Godot; the fact of the matter is, each API has its strong and weak points.. -Metal: Native on iOS, fully compliant and supported. However it has two weak points: Initial Compilation Freeze - +5 sec. Performance Hit, (although negligible for final user) app uses 25% more CPU (on my iPhone 12). Battery drain? -Vulkan: In godot, Vulkan > MoltenVk > Metal More complex translation layer, but interestingly gives slightly better Performance than Metal.. Initial Compilation doesn't cause Freeze, because is lazy/delayed and performed while the app is starting. Uses 25% less CPU than Metal and gives slightly more stable Framerate. (iPhone 12) However, given the extra complexity it could be more prone to error, or Compatibility Problems, which are known and have been reported with older iOS devices (iPads come to mind..) Right? -OpenGL: No Initial Compilation Needed Max Performance, No CPU munch Universally supported, (in theory?) works Perfectly on my iPhone 12 with iOS 26.3 and 26.4.2 And all in all, gives the best Performance and user experience. -And that's pretty much the situation! Since the graphics API of choice, will have an effect and directly translate to User experience... what's then the best one? -This will be the first app I Publish on Apple Store, so as you can imagine I want to Comply with Apple as much as possible; and bring iOS users the best possible experience. However each one of the APIs seem to have a negative aspect.. Metal: 5sec Compilation Freeze Vulkan: Compatibility Problems? OpenGL: "Deprecated" In practical terms, right now, OpenGL gives the best Performance, and the best User Experience.. So what to do? -The Android version is published in Google Store in OpenGL Compat mode. Works perfectly. Even tho OpenGL has been Deprecated on iOS for 7+ years, it has survived all along, with no announced removal date from Apple. And it seems to work perfectly and be fully operational up to the latest iOS 26 version.. right? Maybe Apple is maintaining it for stability and compatibility reasons, even if they're no longer actively developing it? Butthee "deprecated" label sounds alarming, as if support could drop any day.. So what will be the best choice in this situation? -Will an app built primarily for OpenGL, (with Metal fallback) be Rejected right away in Apple Store? -Otoh Vulkan (via MoltenVK) could be a middle term solution, second best Performance, no Compilation Freeze.. But yeah, the Compatibility aspect is important; and while considerable improvements have been made in Godot's implementation, the current status or possible outcome is harder to assess.. Both Metal and OpenGL seem safer options in that sense..
5
0
1k
Apr ’26
LowLevelInstanceData & animation
AppleOS 26 introduces LowLevelInstanceData that can reduce CPU draw calls significantly by instancing. However, I have noticed trouble with animating each individual instance. As I wanted low-level control, I'm using a custom system and LowLevelInstanceData.replace(using:) to update the transform each frame. The update closure itself is extremely efficient (Xcode Instruments reports nearly no cost). But I noticed extremely high runloop time, reach around 20ms. Time Profiler shows that the CPU is blocked by kernel.release.t6401. I think it is caused by synchronization between CPU and GPU, however, as I am already using a MTLCommandBuffer to coordinate it, I don't understand why I am still seeing large CPU time.
3
0
896
Apr ’26
SCNTechnique clearColor Always Shows sceneBackground When Passes Share Depth Buffer
Problem Description I'm encountering an issue with SCNTechnique where the clearColor setting is being ignored when multiple passes share the same depth buffer. The clear color always appears as the scene background, regardless of what value I set. The minimal project for reproducing the issue: https://www.dropbox.com/scl/fi/30mx06xunh75wgl3t4sbd/SCNTechniqueCustomSymbols.zip?rlkey=yuehjtk7xh2pmdbetv2r8t2lx&st=b9uobpkp&dl=0 Problem Details In my SCNTechnique configuration, I have two passes that need to share the same depth buffer for proper occlusion handling: "passes": [ "box1_pass": [ "draw": "DRAW_SCENE", "includeCategoryMask": 1, "colorStates": [ "clear": true, "clearColor": "0 0 0 0" // Expecting transparent black ], "depthStates": [ "clear": true, "enableWrite": true ], "outputs": [ "depth": "box1_depth", "color": "box1_color" ], ], "box2_pass": [ "draw": "DRAW_SCENE", "includeCategoryMask": 2, "colorStates": [ "clear": true, "clearColor": "0 0 0 0" // Also expecting transparent black ], "depthStates": [ "clear": false, "enableWrite": false ], "outputs": [ "depth": "box1_depth", // Sharing the same depth buffer "color": "box2_color", ], ], "final_quad": [ "draw": "DRAW_QUAD", "metalVertexShader": "myVertexShader", "metalFragmentShader": "myFragmentShader", "inputs": [ "box1_color": "box1_color", "box2_color": "box2_color", ], "outputs": [ "color": "COLOR" ] ] ] And the metal shader used to display box1_color and box2_color with splitting: fragment half4 myFragmentShader(VertexOut in [[stage_in]], texture2d<half, access::sample> box1_color [[texture(0)]], texture2d<half, access::sample> box2_color [[texture(1)]]) { half4 color1 = box1_color.sample(s, in.texcoord); half4 color2 = box2_color.sample(s, in.texcoord); if (in.texcoord.x < 0.5) { return color1; } return color2; }; Expected Behavior Both passes should clear their color targets to transparent black (0, 0, 0, 0) The depth buffer should be shared between passes for proper occlusion Actual Behavior Both box1_color and box2_color targets contain the scene background instead of being cleared to transparent (see attached image) This happens even when I explicitly set clearColor: "0 0 0 0" for both passes Setting scene.background.contents = UIColor.clear makes the clearColor work as expected, but I need to keep the scene background for other purposes What I've Tried Setting different clearColor values - all are ignored when sharing depth buffer Using DRAW_NODE instead of DRAW_SCENE - didn't solve the issue Creating a separate pass to capture the background - the background still appears in the other passes Various combinations of clear flags and render orders Environment iOS/macOS, running with "My Mac (Designed for iPad)" Xcode 16.2 Question Is this a known limitation of SceneKit when passes share a depth buffer? Is there a workaround to achieve truly transparent clear colors while maintaining a shared depth buffer for occlusion testing? The core issue seems to be that SceneKit automatically renders the scene background in every DRAW_SCENE pass when a shared depth buffer is detected, overriding any clearColor settings. Any insights or workarounds would be greatly appreciated. Thank you!
1
0
907
Apr ’26
Cannot load .mtlpackage to MTLLibrary
After watching WWDC 2025 session "Combine Metal 4 machine learning and graphics", I have decided to give it a shot to integrate the latest MTL4MachineLearningCommandEncoder to my existing render pipeline. After a lot of trial and errors, I managed to set up the pipeline and have the app compiled. However, I am now stuck on creating a MTLLibrary with .mtlpackage. Here is the code I have to create a MTLLibrary according the WWDC session https://developer.apple.com/videos/play/wwdc2025/262/?time=550: let coreMLFilePath = bundle.path(forResource: "my_model", ofType: "mtlpackage")! let coreMLURL = URL(string: coreMLFilePath)! do { metalDevice.makeLibrary(URL: coreMLURL) } catch { print("error: \(error)") } With the above code, I am getting error: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid metal package" UserInfo={NSLocalizedDescription=Invalid metal package} What is the correct way to create a MTLLibrary with .mtlpackage? Do I see this error because the .mtlpackage I am using is incorrect? How should I go with debugging this? I'd really appreciate if I could get some help on this as I have been stuck with it for some time now. Thanks in advance!
1
0
544
Apr ’26
Can a compute pipeline be as efficient as a render pipeline for rasterization?
I'm new to graphics and game design and I just wanted to know if a compute pipeline could be as efficient as a render pipeline for rasterization and an explanation on how and why. Also is it possible to manually perform rasterization with a render pipeline as in manipulate individual pixel data in a metal texture yourself but do it with a render pipeline?
1
0
580
Apr ’26
GPTK 3 and D3DMetal issue with Modern Pipeline Creation
Death Stranding 2: On the Beach (v1.0.48.0, Steam) crashes during rendering initialization when running through CrossOver 26 with D3DMetal 3.0 on an Apple M2 Max Mac Studio running macOS Sequoia. The game successfully initializes Streamline, NVAPI, DLSS (Result::eOk), DLSSG (Result::eOk), Reflex, and XeSS — all subsystems report success. The crash occurs immediately after, during rendering pipeline creation, before the game reaches NXStorage initialization or window creation. Minidump analysis confirms the crash is an access violation (0xc0000005) at DS2.exe+0x67233d, writing to address 0x0. RAX=0x0 (null pointer being dereferenced), R12=0xFFFFFFFFFFFFFFFF (error/invalid handle return). The game appears to call a D3D12 API — likely CheckFeatureSupport or a pipeline state creation function — that D3DMetal acknowledges as supported but returns null or invalid data for. The game trusts the response and dereferences the null pointer. Two other Nixxes titles using the same engine and D3DMetal setup run without issue: Spider-Man 2 (~50 FPS) and Horizon Zero Dawn Remastered (~34 FPS). DS2 uses newer technology versions (DLSS 4, FSR 4, XeSS 2) and a newer DirectX 12 Agility SDK, which likely queries D3D12 features that D3DMetal does not yet fully implement. The crash also reproduces when D3DMetal reports as AMD vendor (1002) instead of NVIDIA (10de), crashing at the same executable offset, confirming it is a D3D12 feature reporting gap in D3DMetal rather than a vendor-specific issue. How To Reproduce Install Crossover 26+ on MacOS 26.4 Install Steam and download Death Stranding 2 Run Death Stranding 2 and check logs after crash in Documents\DEATH STRANDING 2 ON THE BEACH Feedback Requests FB22285513 — Game Porting Toolkit 3 issue with Modern Pipeline Creation
1
4
821
Apr ’26
Xcode26 Replay frame broken
Got a broken frame when using Xcode to capture a frame and replay it from a Unity game. It seems like the vertex buffer is broken; I see a bunch of "nan"s in the vertex buffer. However, the game displays correct when running, and it only happend when I upgrade my Xcode and iphone to Xcode26 and IOS26 ios26
1
0
423
Apr ’26
Missing DirectX Calls for Tearing and Depth Bound Test in D3DMetal and GPTK 3
I want to address the missing or incomplete DirectX calls from D3DMetal and Game Porting Toolkit 3. These missing calls have in part caused issue with our porting process and we are reconsidering. Missing or Incomplete Calls DXGI_FEATURE_PRESENT_ALLOW_TEARING — IDXGIFactory5::CheckFeatureSupport — this calls has to do with how VSync is handled and some modern games require it to initialize. Currently D3DMetal return 0 maybe by design but most likely because it’s not integrated. Adding a stub that returns 1 can fix this. I’m my use case I simply Noped the check and forced it to continue. D3D12_FEATURE_D3D12_OPTIONS2.DepthBoundsTestSupported — this call is also not present. Which causes games to not initialize rendering. Thankfully this was fixed by once again skipping the check. But this is essential for water rendering. This could be one reason currently water is not rendering in our game. IDXGIOutput6::GetDesc1().ColorSpace — returns DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709 (SDR) on external HDR compatible displays. We were able to fix this by forcing HDR to be enabled. It should return HDR support. These calls may exist but they need to be updated to return the correct values. Specifically for depth bound test you can reference MoltenVK which sets it up on top of Metal since it’s not a native feature. The water issue could be also an issue with how the shaders are compiled. But I’m unable to check because of the closed source nature of GPTK and its debuggers. What is a better way we can debug our game to see why the water isn’t rendering. Does D3DMetal have some debug options or something similar? Feedback Number FB22330617 - Missing DirectX Calls for Tearing and Depth Bound Test in D3DMetal and GPTK 3 We hope these issues are resolved quickly because we were thinking of a simultaneous release with our Windows version, but we can't ship with such large bugs.
6
3
551
Apr ’26
Xcode Metal Capture crash when using MTLSamplerState
The sample code just draw a triangle and sample texture. both sample code can draw a correct triangle and sample texture as expected. there are no error message from terminal. Sample code using constexpr Sampler can capture and replay well. Sample code using a argumentTable to bind a MTLSamplerState was crashed when using Metal capture and replay on Xcode. Here are sample codes. Sample Code Test Environment: M1 Pro MacOS 26.3 (25D125) Xcode Version 26.2 (17C52) Feedback ID: FB22031701
2
0
443
Apr ’26
D3DMetal Extreme Over Synchronization Issues
Explanation Currently, D3DMetal’s GPU synchronization approach introduces significant compute overhead on the CPU. This specifically affects D3D12 games that use modern rendering pipelines on Apple Silicon. Specifically, I’ve tested Death Stranding 2 On the Beach for how it handles its rendering. And the results are extreme: frame times are suffering from a 42% decrease from synchronization. Although there are obviously other effects at play, such as the overhead introduced by Rosetta and Wine, both of them don’t introduce as much overhead as D3DMetal. This issue isn’t just specific to Death Stranding 2 On the Beach; most games running through D3DMetal suffer from this. Most games still seem to force synchronization to ~30 ms to reach the 30 fps amount. But it could be better with better synchronization, such as how DXMT handles it. Instead of doubling the work, it allows Metal to single-handedly track resource dependencies internally. This is in part due to the unfortunate bad mapping of D3D12 calls onto shared logic between D3D11 and D3D12. System M2 Max Mac Studio — 32 GBs — 30-core GPU macOS 26.4 Tahoe CrossOver 26.1 RC Death Stranding 2 On the Beach — Steam Assassin’s Creed Valhalla — Steam & Ubisoft Connect Thank you for your commitment. Another game that I recommend testing to really see this swell is Assassin’s Creed Valhalla. Feedback FB22426600 - D3DMetal Extereme Over Syncranization Issues
1
1
594
Apr ’26
Xcode_26 not compiling Metal project
Hello Xcode 26.0.1 (17A400) Missing some Metal components When building a program using Metal, it induces an unexpected error : “error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain Command CompileMetalFile failed with a nonzero exit code” Which terminates the build The fix given “xcodebuild -downloadComponent MetalToolchain” using sudo does not work Did someone find a work around or could resolve the issue? Many thanks Jean MacBook Air M4; macOS 26.0.1; Xcode 26.0.1
5
2
497
Mar ’26
How to load and draw texture with opacity in Metal
The background I'm finally working to convert my very old Mac kaleidoscope application, ScopeWorks, which was written in OpenGL and Objective-C, to a Multiplatform app in SwiftUI and Metal. I'm using the MetalKit MTKView class, wrapped for SwiftUI as an NSViewRepresentable or UIViewRepresentable. I then provide an MTKViewDelegate that provides a draw method. The draw method fetches the current render pass descriptor, creates a command buffer, sets up a render pipeline, and does its drawing. My renderer's makePipeline method looks like this: func makePipeline() { let library = device.makeDefaultLibrary() let pipelineDesc = MTLRenderPipelineDescriptor() pipelineDesc.vertexFunction = library?.makeFunction(name: "vertex_main") pipelineDesc.fragmentFunction = library?.makeFunction(name: "fragment_main") pipelineDesc.colorAttachments[0].pixelFormat = .bgra8Unorm pipeline = try! device.makeRenderPipelineState(descriptor: pipelineDesc) } And my shaders look like this: struct VertexOut { float4 position [[position]]; float2 texCoord; }; vertex VertexOut vertex_main(const device float2* position [[buffer(0)]], uint vid [[vertex_id]]) { VertexOut out; float2 pos = position[vid]; out.position = float4(pos, 0, 1); out.texCoord = pos * 0.5 + 0.5; // basic mapping return out; } fragment float4 fragment_main(VertexOut in [[stage_in]], texture2d<float> tex [[texture(0)]], constant float4& color [[buffer(1)]]) { constexpr sampler s(address::repeat, filter::linear); // float4 texColor = tex.sample(s, in.texCoord); // return texColor * color; float4 textureColor = {1, 2, 3, 4}; if (all(color == textureColor)) { return tex.sample(s, in.texCoord); } else { return color; } // Sample the texture directly — no color tint applied return tex.sample(s, in.texCoord); } The first part of my MTKViewDelegate's draw method looks like this: func draw(in view: MTKView) { guard let drawable = view.currentDrawable, let descriptor = view.currentRenderPassDescriptor, let pipeline = pipeline, let texture = texture else { return } let commandBuffer = commandQueue.makeCommandBuffer()! let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)! encoder.setRenderPipelineState(pipeline) encoder.setFragmentTexture(texture, index: 0) descriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0.0, green: 0, blue: 0, alpha: 1.0) // Draw six equilateral triangles forming the hexagon let radius: Float = 0.6 for i in 0..<6 { let angle = Float(i) * (.pi / 3) let cosA = cos(angle) let sinA = sin(angle) let nextA = Float(i+1) * (.pi / 3) let cosB = cos(nextA) let sinB = sin(nextA) let verts: [simd_float2] = [ simd_float2(0, 0), simd_float2(radius * cosA, radius * sinA), simd_float2(radius * cosB, radius * sinB) ] encoder.setVertexBytes(verts, length: MemoryLayout<simd_float2>.stride * 3, index: 0) // Tell the fragment shader to use the texture color. var textureColor: simd_float4 = simd_float4(1, 2, 3, 4) encoder.setFragmentBytes(&textureColor, length: MemoryLayout<SIMD4<Float>>.stride, index: 1) encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3) One of the things the existing app does is load PNG or TIFF images with an alpha channel, and then overlay parts of the image on top of themselves flipped, so you get interesting Moiré patterns in the lines in the resulting kaleidoscope. For now I'm working on a single sample image, loading it into a texture in Metal, and just rendering it as a hexagon and drawing lines for the triangles that make up the hexagon. (For now I'm using the vertex coordinates as the texture coordinates, so I get a hexagonal part of my texture rather than a single triangular part tessellated into a hexagon. I'll fix that later.) In both iOS and OS I set the clear color to black at the beginning of the draw function. The issue: The source image is mostly transparent, but with a lot of partly transparent pixels. Here's what it looks like in Photoshop, where you can see the transparent parts as a checkerboard pattern: (I tried to crop the original image to show the approximate part that I'm rendering in a hexagon, but it's not exact. Look for the same shapes in the different images to compare them.) When I render my hexagon in the Metal view in the iOS version of the app, it looks like it's forcing each pixel to fully opaque or fully transparent: And in the macOS version of the app, it seems to force ALL the pixels to opaque: I haven't shown all the setup code, because it's' a lot. Is there some rendering mode setup I'm missing in order to get it to draw the pixels into the output based on their opacity, including partial opacity?
2
0
1k
Mar ’26
CoreML regression between macOS 26.0.1 and macOS 26.1 Beta causing scrambled tensor outputs
We’ve encountered what appears to be a CoreML regression between macOS 26.0.1 and macOS 26.1 Beta. In macOS 26.0.1, CoreML models run and produce correct results. However, in macOS 26.1 Beta, the same models produce scrambled or corrupted outputs, suggesting that tensor memory is being read or written incorrectly. The behavior is consistent with a low-level stride or pointer arithmetic issue — for example, using 16-bit strides on 32-bit data or other mismatches in tensor layout handling. Reproduction Install ON1 Photo RAW 2026 or ON1 Resize 2026 on macOS 26.0.1. Use the newest Highest Quality resize model, which is Stable Diffusion–based and runs through CoreML. Observe correct, high-quality results. Upgrade to macOS 26.1 Beta and run the same operation again. The output becomes visually scrambled or corrupted. We are also seeing similar issues with another Stable Diffusion UNet model that previously worked correctly on macOS 26.0.1. This suggests the regression may affect multiple diffusion-style architectures, likely due to a change in CoreML’s tensor stride, layout computation, or memory alignment between these versions. Notes The affected models are exported using standard CoreML conversion pipelines. No custom operators or third-party CoreML runtime layers are used. The issue reproduces consistently across multiple machines. It would be helpful to know if there were changes to CoreML’s tensor layout, precision handling, or MLCompute backend between macOS 26.0.1 and 26.1 Beta, or if this is a known regression in the current beta.
8
4
2.4k
Mar ’26
Cannot find devices in RemoteImmersiveSpace
Hi, I'm running the Spatial Rendering App sample on a Macbook Pro running 26.4 Beta and the Vision Pro running visionOS 26.3.1. Handoff and SharePlay are on, both devices are on the same Apple ID and network, and SharePlay screen sharing works fine between the two devices. However, when calling openImmersiveSpace, the device picker fails to present and no devices are found. Errors from console: ((processConfiguration != nil && configuration != nil) || (processConfiguration == nil && configuration == nil)) - .../ExtensionKit/Source/HostViewController/Internal/EXHostSessionDriver.m:80: `processConfiguration` and `configuration` must be both non-nil or both nil Unable to obtain a task name port right for pid 638: (os/kern) failure (0x5) Unable to present an ImmersiveSpace for Scene id 'Compositor Services' Is this a known bug or I'm I missing something? Thanks!
2
1
1.6k
Mar ’26
_FusedMatMul with [BiasAdd, Relu] produces incorrect results in graph mode on Metal GPU
When running a tf.function-traced graph on the Metal GPU, any operation that combines MatMul → BiasAdd → Relu (the fused pattern emitted by tf.keras.layers.Dense(activation='relu')) produces numerically incorrect output — errors on the order of tens of units, not floating-point noise. Eager mode on the same Metal GPU is correct. Graph mode forced to CPU (tf.config.set_visible_devices([], 'GPU')) is also correct. The bug is deterministic and data-independent (reproduces with random weights). the three-op combination of MatMul + BiasAdd + Relu trigger the error. Specifically: relu(tf.nn.bias_add(tf.matmul(x, W), b)) in graph mode on Metal is wrong, while relu(tf.matmul(x, W) + b) (using AddV2 instead of BiasAdd) is correct. Removing the Relu also makes the result correct — tf.nn.bias_add(tf.matmul(x, W), b) without a following Relu produces correct output at every shape tested. This points to the Metal plugin's fused _FusedMatMul kernel with fused_ops=[BiasAdd, Relu] as the culprit. Disabling the TF core grappler remapping pass (tf.config.optimizer.set_experimental_options({'remapping': False})) does not fix the issue, confirming that the fusion decision is made inside the Metal plugin's own kernel selection, below the TF core graph optimizer. The bug reproduces across all shapes tested (batch 4–200, inner dimension K 512–8192, output 128–2048) and is not specific to any particular weight values. A minimal reproducer: import tensorflow as tf import numpy as np # Any shape works; larger K makes the error more obvious M, K, N = 64, 2048, 1024 W = tf.Variable(tf.random.normal([K, N])) b = tf.Variable(tf.random.normal([N])) x = tf.random.normal([M, K]) @tf.function def graph_fused(x): return tf.nn.relu(tf.nn.bias_add(tf.matmul(x, W), b)) @tf.function def graph_safe(x): return tf.nn.relu(tf.matmul(x, W) + b) # AddV2 instead of BiasAdd eager_ref = tf.nn.relu(tf.nn.bias_add(tf.matmul(x, W), b)) # eager = correct fused_out = graph_fused(x) # Metal graph mode = WRONG safe_out = graph_safe(x) # Metal graph mode = correct print(f"eager vs graph_fused (BiasAdd): {tf.reduce_max(tf.abs(eager_ref - fused_out)).numpy():.1f}") # ^ typically 30–80+ (WRONG) print(f"eager vs graph_safe (AddV2): {tf.reduce_max(tf.abs(eager_ref - safe_out)).numpy():.2e}") # ^ typically ~1e-5 (correct) Environment: TensorFlow 2.18.1, Keras 3.11.2, tensorflow-metal (latest as of 2026-05-26), Apple Silicon Mac. Impact: This breaks any Keras model that uses Dense(activation='relu') when called inside a tf.function or via SavedModel serving on the Metal GPU. Eager-mode inference is unaffected.
Replies
0
Boosts
0
Views
1k
Activity
1w
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets Summary The Metal driver AGXMetalG17X 351.2 on macOS 26.5 (25F71) for the M5 Pro chip crashes with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) when running LLM inference workloads with working sets as small as ~1.5GB, despite 24GB of unified memory being available and Apple Diagnostics confirming the hardware is fully functional. This affects multiple tools: MLX, llama.cpp (Metal backend), and native apps using Metal for inference. System Component Value Model MacBook Pro (Mac17,9) Chip Apple M5 Pro (applegpu_g17s) GPU Cores 16 RAM 24 GB LPDDR5 macOS 26.5 (25F71) Metal Metal 4 GPU Driver AGXMetalG17X 351.2 Xcode 26.5 (17F42) Reproduction MLX (Python) pip install mlx mlx-lm python -m mlx_lm.generate \ --model mlx-community/Qwen2.5-3B-Instruct-4bit \ --max-tokens 10 \ --prompt "Hello" Expected: Normal text generation Actual: Crash with: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) llama.cpp brew install llama.cpp llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99 Expected: Fast GPU generation Actual: Process hangs indefinitely Test Results Tool Model Peak Memory Result MLX Qwen2.5-0.5B-4bit 0.36 GB ✅ Works MLX Qwen2.5-1.5B-4bit 0.98 GB ✅ Works MLX Qwen3-1.7B-4bit 1.01 GB ✅ Works MLX Qwen2.5-3B-4bit ~1.5 GB ❌ Metal OOM crash MLX Qwen3-4B-4bit ~2.1 GB ❌ Metal OOM crash MLX Qwen3-8B-4bit ~4.5 GB ❌ Metal OOM crash llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ❌ Hangs with GPU llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ✅ Works with CPU only Key Evidence Hardware is healthy — Apple Diagnostics passed all tests Basic Metal works — matmul, array ops work fine CPU inference works — llama.cpp with -ngl 0 runs correctly The error is NOT about actual memory exhaustion — kIOGPUCommandBufferCallbackErrorOutOfMemory means the kernel rejects the Metal memory commit, not that physical memory is full. The system reports 17.76GB available for Metal working set. Crash Log Extract Thread 31 Crashed: 0 libsystem_kernel.dylib __pthread_kill + 8 1 libsystem_pthread.dylib pthread_kill + 296 2 libsystem_c.dylib abort + 148 3 Metal MTLReportFailure.cold.1 + 48 4 Metal MTLReportFailure + 576 5 Metal -[_MTLCommandBuffer addCompletedHandler:] + 104 ... Exception Type: EXC_CRASH (SIGABRT) Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6 Related Issues ml-explore/mlx#3586 — Metal compiler regression on macOS 26.5 ml-explore/mlx#3534 — M5 float32 precision issue ml-explore/mlx#3568 — M5 random divergence ml-explore/mlx#3539 — Metal residency OOM (M4 Max) Request Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available). Happy to provide full crash logs, sysdiagnose archives, or run additional tests.
Replies
0
Boosts
0
Views
73
Activity
1w
MetalToolchain and auto updates...
Hello, I can understand why you do not ship the MetalToolchain with the default Xcode installation any more due to the relatively low usage and high download size. That said, every time Xcode runs an auto update it wipes MetalToolchain and breaks my local development build. It would be nice if the updates would be smart enough to honor the fact that. I have already run: "xcodebuild -downloadComponent MetalToolchain" and include that in the update, rather than deleting the module. Thanks, Chris
Replies
1
Boosts
0
Views
187
Activity
1w
Inexplicable Metal crash ever since iOS 26.5 beta 4
Hi all, I'm working on updating my audio visualizer app. I'm adding new visualizers based on Metal 4 compute shaders. They worked in iOS 26.4 and iOS 26.5 up until beta 3. However, after that, the visualizers started crashing the phone and forcing a restart. On the latest version of iOS 26.5, the crash is still there. I submitted feedback, but haven't heard anything back just yet. I was wondering if others have faced this same issue, and if there are any workarounds. Here is my repo if you want to look at the code (forgive me if it's sloppy, I'm quite new to graphics programming and Metal): https://github.com/aabagdi/VisualMan/tree/main Thank you!
Replies
4
Boosts
0
Views
1.3k
Activity
1w
XPC Communication between Editor app and user-compiled code
Hello! I'm trying to implement an editor app (macOS) that allows the user to write code, which will be compiled and executed, showing the result in the editor window. Imagine it like SwiftUI previews, but the graphic output is created with Metal, not SwiftUI. I found that IOSurface can be used to share that kind of data over XPC, so I would not have to rely on the private NSRemoteView. However, I'm confused if it is, at all, possible for my editor app to connect to an XPC Service, that was NOT bundled with it (but compiled by it at runtime). I succeeded to launch an XPC service defined as: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>Label</key> <string>com.myteam.myproject.service</string> <key>MachServices</key> <dict> <key>com.myteam.myproject.service</key> <true/> </dict> <key>Program</key> <string>/Path/to/service/run_my_service.sh</string> </dict> </plist> But the call to let connection = NSXPCConnection(machServiceName: "com.myteam.myproject.service") let proxy = connection.remoteObjectProxyWithErrorHandler { error in continuation.resume(throwing: error) } as? MyServiceProtocol fails with "The connection to service named com.myteam.myproject.service was invalidated: Connection init failed at lookup with error 3 - No such process." I have added <key>com.apple.security.temporary-exception.mach-lookup.global-name</key> <array> <string>com.myteam.myproject.service</string> </array> to my entitlements. Since the tutorials I followed are quite old, I'm wondering if support for something like this was dropped at some point. Thanks for any advice!
Replies
6
Boosts
0
Views
563
Activity
2w
Possibilities of Overclocking Apple Silicon
I've been testing Apple Silicon devices in their desktop configurations on the Mac Studio and now retired Mac Pro and it seems like they're greatly bottlenecked by their clock speeds. For reference here's my testing results. Testing Results: Mac Studio M2 Max • 32GBs RAM • 30 core GPU • 1TB Storage CPU Utilization • 60% • 20W CPU Temperature • 47ºC GPU Utilization • 100% • 20W GPU Temperature • 55ºC Fan Speed • 50% Workload Duration • 2hrs Another point is that the clock speed on the M2 Max's CPU is 3.5 GHz and on the GPU it is 1.44 GHz at max performance. Which the Mac Studio has no trouble pushing. My question is how do I push those clock speeds higher? Cause 1.44 GHz at 55ºC is evidence for extensive headroom. I'm sure there are tools internally for testing the upper limits of the silicon, but it makes no sense why it would be set so low the Mac Studio is at no worries of melting. Is there any way to push the performance of my Mac Studio? FB22713867 - Possibilities of Overclocking Apple Silicon
Replies
1
Boosts
0
Views
279
Activity
3w
Metal, Vulkan, OpenGL & Godot
Greetings! I'm preparing to publish an app in Apple Store. It's a 2D Audio app made in Godot, already published in Google Store.. As we know, OpenGL is considered deprecated since iOS 12 / 2018 .. However given the current state of Metal, or Vulkan integration in Godot, and with the idea of bringing the Best possible experience on iOS.. I'm not completely sure what will be the best API to use as primary option.. -As good as Metal, or even Vulkan work in Godot; the fact of the matter is, each API has its strong and weak points.. -Metal: Native on iOS, fully compliant and supported. However it has two weak points: Initial Compilation Freeze - +5 sec. Performance Hit, (although negligible for final user) app uses 25% more CPU (on my iPhone 12). Battery drain? -Vulkan: In godot, Vulkan > MoltenVk > Metal More complex translation layer, but interestingly gives slightly better Performance than Metal.. Initial Compilation doesn't cause Freeze, because is lazy/delayed and performed while the app is starting. Uses 25% less CPU than Metal and gives slightly more stable Framerate. (iPhone 12) However, given the extra complexity it could be more prone to error, or Compatibility Problems, which are known and have been reported with older iOS devices (iPads come to mind..) Right? -OpenGL: No Initial Compilation Needed Max Performance, No CPU munch Universally supported, (in theory?) works Perfectly on my iPhone 12 with iOS 26.3 and 26.4.2 And all in all, gives the best Performance and user experience. -And that's pretty much the situation! Since the graphics API of choice, will have an effect and directly translate to User experience... what's then the best one? -This will be the first app I Publish on Apple Store, so as you can imagine I want to Comply with Apple as much as possible; and bring iOS users the best possible experience. However each one of the APIs seem to have a negative aspect.. Metal: 5sec Compilation Freeze Vulkan: Compatibility Problems? OpenGL: "Deprecated" In practical terms, right now, OpenGL gives the best Performance, and the best User Experience.. So what to do? -The Android version is published in Google Store in OpenGL Compat mode. Works perfectly. Even tho OpenGL has been Deprecated on iOS for 7+ years, it has survived all along, with no announced removal date from Apple. And it seems to work perfectly and be fully operational up to the latest iOS 26 version.. right? Maybe Apple is maintaining it for stability and compatibility reasons, even if they're no longer actively developing it? Butthee "deprecated" label sounds alarming, as if support could drop any day.. So what will be the best choice in this situation? -Will an app built primarily for OpenGL, (with Metal fallback) be Rejected right away in Apple Store? -Otoh Vulkan (via MoltenVK) could be a middle term solution, second best Performance, no Compilation Freeze.. But yeah, the Compatibility aspect is important; and while considerable improvements have been made in Godot's implementation, the current status or possible outcome is harder to assess.. Both Metal and OpenGL seem safer options in that sense..
Replies
5
Boosts
0
Views
1k
Activity
Apr ’26
LowLevelInstanceData & animation
AppleOS 26 introduces LowLevelInstanceData that can reduce CPU draw calls significantly by instancing. However, I have noticed trouble with animating each individual instance. As I wanted low-level control, I'm using a custom system and LowLevelInstanceData.replace(using:) to update the transform each frame. The update closure itself is extremely efficient (Xcode Instruments reports nearly no cost). But I noticed extremely high runloop time, reach around 20ms. Time Profiler shows that the CPU is blocked by kernel.release.t6401. I think it is caused by synchronization between CPU and GPU, however, as I am already using a MTLCommandBuffer to coordinate it, I don't understand why I am still seeing large CPU time.
Replies
3
Boosts
0
Views
896
Activity
Apr ’26
SCNTechnique clearColor Always Shows sceneBackground When Passes Share Depth Buffer
Problem Description I'm encountering an issue with SCNTechnique where the clearColor setting is being ignored when multiple passes share the same depth buffer. The clear color always appears as the scene background, regardless of what value I set. The minimal project for reproducing the issue: https://www.dropbox.com/scl/fi/30mx06xunh75wgl3t4sbd/SCNTechniqueCustomSymbols.zip?rlkey=yuehjtk7xh2pmdbetv2r8t2lx&st=b9uobpkp&dl=0 Problem Details In my SCNTechnique configuration, I have two passes that need to share the same depth buffer for proper occlusion handling: "passes": [ "box1_pass": [ "draw": "DRAW_SCENE", "includeCategoryMask": 1, "colorStates": [ "clear": true, "clearColor": "0 0 0 0" // Expecting transparent black ], "depthStates": [ "clear": true, "enableWrite": true ], "outputs": [ "depth": "box1_depth", "color": "box1_color" ], ], "box2_pass": [ "draw": "DRAW_SCENE", "includeCategoryMask": 2, "colorStates": [ "clear": true, "clearColor": "0 0 0 0" // Also expecting transparent black ], "depthStates": [ "clear": false, "enableWrite": false ], "outputs": [ "depth": "box1_depth", // Sharing the same depth buffer "color": "box2_color", ], ], "final_quad": [ "draw": "DRAW_QUAD", "metalVertexShader": "myVertexShader", "metalFragmentShader": "myFragmentShader", "inputs": [ "box1_color": "box1_color", "box2_color": "box2_color", ], "outputs": [ "color": "COLOR" ] ] ] And the metal shader used to display box1_color and box2_color with splitting: fragment half4 myFragmentShader(VertexOut in [[stage_in]], texture2d<half, access::sample> box1_color [[texture(0)]], texture2d<half, access::sample> box2_color [[texture(1)]]) { half4 color1 = box1_color.sample(s, in.texcoord); half4 color2 = box2_color.sample(s, in.texcoord); if (in.texcoord.x < 0.5) { return color1; } return color2; }; Expected Behavior Both passes should clear their color targets to transparent black (0, 0, 0, 0) The depth buffer should be shared between passes for proper occlusion Actual Behavior Both box1_color and box2_color targets contain the scene background instead of being cleared to transparent (see attached image) This happens even when I explicitly set clearColor: "0 0 0 0" for both passes Setting scene.background.contents = UIColor.clear makes the clearColor work as expected, but I need to keep the scene background for other purposes What I've Tried Setting different clearColor values - all are ignored when sharing depth buffer Using DRAW_NODE instead of DRAW_SCENE - didn't solve the issue Creating a separate pass to capture the background - the background still appears in the other passes Various combinations of clear flags and render orders Environment iOS/macOS, running with "My Mac (Designed for iPad)" Xcode 16.2 Question Is this a known limitation of SceneKit when passes share a depth buffer? Is there a workaround to achieve truly transparent clear colors while maintaining a shared depth buffer for occlusion testing? The core issue seems to be that SceneKit automatically renders the scene background in every DRAW_SCENE pass when a shared depth buffer is detected, overriding any clearColor settings. Any insights or workarounds would be greatly appreciated. Thank you!
Replies
1
Boosts
0
Views
907
Activity
Apr ’26
Cannot load .mtlpackage to MTLLibrary
After watching WWDC 2025 session "Combine Metal 4 machine learning and graphics", I have decided to give it a shot to integrate the latest MTL4MachineLearningCommandEncoder to my existing render pipeline. After a lot of trial and errors, I managed to set up the pipeline and have the app compiled. However, I am now stuck on creating a MTLLibrary with .mtlpackage. Here is the code I have to create a MTLLibrary according the WWDC session https://developer.apple.com/videos/play/wwdc2025/262/?time=550: let coreMLFilePath = bundle.path(forResource: "my_model", ofType: "mtlpackage")! let coreMLURL = URL(string: coreMLFilePath)! do { metalDevice.makeLibrary(URL: coreMLURL) } catch { print("error: \(error)") } With the above code, I am getting error: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid metal package" UserInfo={NSLocalizedDescription=Invalid metal package} What is the correct way to create a MTLLibrary with .mtlpackage? Do I see this error because the .mtlpackage I am using is incorrect? How should I go with debugging this? I'd really appreciate if I could get some help on this as I have been stuck with it for some time now. Thanks in advance!
Replies
1
Boosts
0
Views
544
Activity
Apr ’26
Can a compute pipeline be as efficient as a render pipeline for rasterization?
I'm new to graphics and game design and I just wanted to know if a compute pipeline could be as efficient as a render pipeline for rasterization and an explanation on how and why. Also is it possible to manually perform rasterization with a render pipeline as in manipulate individual pixel data in a metal texture yourself but do it with a render pipeline?
Replies
1
Boosts
0
Views
580
Activity
Apr ’26
GPTK 3 and D3DMetal issue with Modern Pipeline Creation
Death Stranding 2: On the Beach (v1.0.48.0, Steam) crashes during rendering initialization when running through CrossOver 26 with D3DMetal 3.0 on an Apple M2 Max Mac Studio running macOS Sequoia. The game successfully initializes Streamline, NVAPI, DLSS (Result::eOk), DLSSG (Result::eOk), Reflex, and XeSS — all subsystems report success. The crash occurs immediately after, during rendering pipeline creation, before the game reaches NXStorage initialization or window creation. Minidump analysis confirms the crash is an access violation (0xc0000005) at DS2.exe+0x67233d, writing to address 0x0. RAX=0x0 (null pointer being dereferenced), R12=0xFFFFFFFFFFFFFFFF (error/invalid handle return). The game appears to call a D3D12 API — likely CheckFeatureSupport or a pipeline state creation function — that D3DMetal acknowledges as supported but returns null or invalid data for. The game trusts the response and dereferences the null pointer. Two other Nixxes titles using the same engine and D3DMetal setup run without issue: Spider-Man 2 (~50 FPS) and Horizon Zero Dawn Remastered (~34 FPS). DS2 uses newer technology versions (DLSS 4, FSR 4, XeSS 2) and a newer DirectX 12 Agility SDK, which likely queries D3D12 features that D3DMetal does not yet fully implement. The crash also reproduces when D3DMetal reports as AMD vendor (1002) instead of NVIDIA (10de), crashing at the same executable offset, confirming it is a D3D12 feature reporting gap in D3DMetal rather than a vendor-specific issue. How To Reproduce Install Crossover 26+ on MacOS 26.4 Install Steam and download Death Stranding 2 Run Death Stranding 2 and check logs after crash in Documents\DEATH STRANDING 2 ON THE BEACH Feedback Requests FB22285513 — Game Porting Toolkit 3 issue with Modern Pipeline Creation
Replies
1
Boosts
4
Views
821
Activity
Apr ’26
Xcode26 Replay frame broken
Got a broken frame when using Xcode to capture a frame and replay it from a Unity game. It seems like the vertex buffer is broken; I see a bunch of "nan"s in the vertex buffer. However, the game displays correct when running, and it only happend when I upgrade my Xcode and iphone to Xcode26 and IOS26 ios26
Replies
1
Boosts
0
Views
423
Activity
Apr ’26
Missing DirectX Calls for Tearing and Depth Bound Test in D3DMetal and GPTK 3
I want to address the missing or incomplete DirectX calls from D3DMetal and Game Porting Toolkit 3. These missing calls have in part caused issue with our porting process and we are reconsidering. Missing or Incomplete Calls DXGI_FEATURE_PRESENT_ALLOW_TEARING — IDXGIFactory5::CheckFeatureSupport — this calls has to do with how VSync is handled and some modern games require it to initialize. Currently D3DMetal return 0 maybe by design but most likely because it’s not integrated. Adding a stub that returns 1 can fix this. I’m my use case I simply Noped the check and forced it to continue. D3D12_FEATURE_D3D12_OPTIONS2.DepthBoundsTestSupported — this call is also not present. Which causes games to not initialize rendering. Thankfully this was fixed by once again skipping the check. But this is essential for water rendering. This could be one reason currently water is not rendering in our game. IDXGIOutput6::GetDesc1().ColorSpace — returns DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709 (SDR) on external HDR compatible displays. We were able to fix this by forcing HDR to be enabled. It should return HDR support. These calls may exist but they need to be updated to return the correct values. Specifically for depth bound test you can reference MoltenVK which sets it up on top of Metal since it’s not a native feature. The water issue could be also an issue with how the shaders are compiled. But I’m unable to check because of the closed source nature of GPTK and its debuggers. What is a better way we can debug our game to see why the water isn’t rendering. Does D3DMetal have some debug options or something similar? Feedback Number FB22330617 - Missing DirectX Calls for Tearing and Depth Bound Test in D3DMetal and GPTK 3 We hope these issues are resolved quickly because we were thinking of a simultaneous release with our Windows version, but we can't ship with such large bugs.
Replies
6
Boosts
3
Views
551
Activity
Apr ’26
Xcode Metal Capture crash when using MTLSamplerState
The sample code just draw a triangle and sample texture. both sample code can draw a correct triangle and sample texture as expected. there are no error message from terminal. Sample code using constexpr Sampler can capture and replay well. Sample code using a argumentTable to bind a MTLSamplerState was crashed when using Metal capture and replay on Xcode. Here are sample codes. Sample Code Test Environment: M1 Pro MacOS 26.3 (25D125) Xcode Version 26.2 (17C52) Feedback ID: FB22031701
Replies
2
Boosts
0
Views
443
Activity
Apr ’26
D3DMetal Extreme Over Synchronization Issues
Explanation Currently, D3DMetal’s GPU synchronization approach introduces significant compute overhead on the CPU. This specifically affects D3D12 games that use modern rendering pipelines on Apple Silicon. Specifically, I’ve tested Death Stranding 2 On the Beach for how it handles its rendering. And the results are extreme: frame times are suffering from a 42% decrease from synchronization. Although there are obviously other effects at play, such as the overhead introduced by Rosetta and Wine, both of them don’t introduce as much overhead as D3DMetal. This issue isn’t just specific to Death Stranding 2 On the Beach; most games running through D3DMetal suffer from this. Most games still seem to force synchronization to ~30 ms to reach the 30 fps amount. But it could be better with better synchronization, such as how DXMT handles it. Instead of doubling the work, it allows Metal to single-handedly track resource dependencies internally. This is in part due to the unfortunate bad mapping of D3D12 calls onto shared logic between D3D11 and D3D12. System M2 Max Mac Studio — 32 GBs — 30-core GPU macOS 26.4 Tahoe CrossOver 26.1 RC Death Stranding 2 On the Beach — Steam Assassin’s Creed Valhalla — Steam & Ubisoft Connect Thank you for your commitment. Another game that I recommend testing to really see this swell is Assassin’s Creed Valhalla. Feedback FB22426600 - D3DMetal Extereme Over Syncranization Issues
Replies
1
Boosts
1
Views
594
Activity
Apr ’26
Xcode_26 not compiling Metal project
Hello Xcode 26.0.1 (17A400) Missing some Metal components When building a program using Metal, it induces an unexpected error : “error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain Command CompileMetalFile failed with a nonzero exit code” Which terminates the build The fix given “xcodebuild -downloadComponent MetalToolchain” using sudo does not work Did someone find a work around or could resolve the issue? Many thanks Jean MacBook Air M4; macOS 26.0.1; Xcode 26.0.1
Replies
5
Boosts
2
Views
497
Activity
Mar ’26
How to load and draw texture with opacity in Metal
The background I'm finally working to convert my very old Mac kaleidoscope application, ScopeWorks, which was written in OpenGL and Objective-C, to a Multiplatform app in SwiftUI and Metal. I'm using the MetalKit MTKView class, wrapped for SwiftUI as an NSViewRepresentable or UIViewRepresentable. I then provide an MTKViewDelegate that provides a draw method. The draw method fetches the current render pass descriptor, creates a command buffer, sets up a render pipeline, and does its drawing. My renderer's makePipeline method looks like this: func makePipeline() { let library = device.makeDefaultLibrary() let pipelineDesc = MTLRenderPipelineDescriptor() pipelineDesc.vertexFunction = library?.makeFunction(name: "vertex_main") pipelineDesc.fragmentFunction = library?.makeFunction(name: "fragment_main") pipelineDesc.colorAttachments[0].pixelFormat = .bgra8Unorm pipeline = try! device.makeRenderPipelineState(descriptor: pipelineDesc) } And my shaders look like this: struct VertexOut { float4 position [[position]]; float2 texCoord; }; vertex VertexOut vertex_main(const device float2* position [[buffer(0)]], uint vid [[vertex_id]]) { VertexOut out; float2 pos = position[vid]; out.position = float4(pos, 0, 1); out.texCoord = pos * 0.5 + 0.5; // basic mapping return out; } fragment float4 fragment_main(VertexOut in [[stage_in]], texture2d<float> tex [[texture(0)]], constant float4& color [[buffer(1)]]) { constexpr sampler s(address::repeat, filter::linear); // float4 texColor = tex.sample(s, in.texCoord); // return texColor * color; float4 textureColor = {1, 2, 3, 4}; if (all(color == textureColor)) { return tex.sample(s, in.texCoord); } else { return color; } // Sample the texture directly — no color tint applied return tex.sample(s, in.texCoord); } The first part of my MTKViewDelegate's draw method looks like this: func draw(in view: MTKView) { guard let drawable = view.currentDrawable, let descriptor = view.currentRenderPassDescriptor, let pipeline = pipeline, let texture = texture else { return } let commandBuffer = commandQueue.makeCommandBuffer()! let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)! encoder.setRenderPipelineState(pipeline) encoder.setFragmentTexture(texture, index: 0) descriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0.0, green: 0, blue: 0, alpha: 1.0) // Draw six equilateral triangles forming the hexagon let radius: Float = 0.6 for i in 0..<6 { let angle = Float(i) * (.pi / 3) let cosA = cos(angle) let sinA = sin(angle) let nextA = Float(i+1) * (.pi / 3) let cosB = cos(nextA) let sinB = sin(nextA) let verts: [simd_float2] = [ simd_float2(0, 0), simd_float2(radius * cosA, radius * sinA), simd_float2(radius * cosB, radius * sinB) ] encoder.setVertexBytes(verts, length: MemoryLayout<simd_float2>.stride * 3, index: 0) // Tell the fragment shader to use the texture color. var textureColor: simd_float4 = simd_float4(1, 2, 3, 4) encoder.setFragmentBytes(&textureColor, length: MemoryLayout<SIMD4<Float>>.stride, index: 1) encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3) One of the things the existing app does is load PNG or TIFF images with an alpha channel, and then overlay parts of the image on top of themselves flipped, so you get interesting Moiré patterns in the lines in the resulting kaleidoscope. For now I'm working on a single sample image, loading it into a texture in Metal, and just rendering it as a hexagon and drawing lines for the triangles that make up the hexagon. (For now I'm using the vertex coordinates as the texture coordinates, so I get a hexagonal part of my texture rather than a single triangular part tessellated into a hexagon. I'll fix that later.) In both iOS and OS I set the clear color to black at the beginning of the draw function. The issue: The source image is mostly transparent, but with a lot of partly transparent pixels. Here's what it looks like in Photoshop, where you can see the transparent parts as a checkerboard pattern: (I tried to crop the original image to show the approximate part that I'm rendering in a hexagon, but it's not exact. Look for the same shapes in the different images to compare them.) When I render my hexagon in the Metal view in the iOS version of the app, it looks like it's forcing each pixel to fully opaque or fully transparent: And in the macOS version of the app, it seems to force ALL the pixels to opaque: I haven't shown all the setup code, because it's' a lot. Is there some rendering mode setup I'm missing in order to get it to draw the pixels into the output based on their opacity, including partial opacity?
Replies
2
Boosts
0
Views
1k
Activity
Mar ’26
CoreML regression between macOS 26.0.1 and macOS 26.1 Beta causing scrambled tensor outputs
We’ve encountered what appears to be a CoreML regression between macOS 26.0.1 and macOS 26.1 Beta. In macOS 26.0.1, CoreML models run and produce correct results. However, in macOS 26.1 Beta, the same models produce scrambled or corrupted outputs, suggesting that tensor memory is being read or written incorrectly. The behavior is consistent with a low-level stride or pointer arithmetic issue — for example, using 16-bit strides on 32-bit data or other mismatches in tensor layout handling. Reproduction Install ON1 Photo RAW 2026 or ON1 Resize 2026 on macOS 26.0.1. Use the newest Highest Quality resize model, which is Stable Diffusion–based and runs through CoreML. Observe correct, high-quality results. Upgrade to macOS 26.1 Beta and run the same operation again. The output becomes visually scrambled or corrupted. We are also seeing similar issues with another Stable Diffusion UNet model that previously worked correctly on macOS 26.0.1. This suggests the regression may affect multiple diffusion-style architectures, likely due to a change in CoreML’s tensor stride, layout computation, or memory alignment between these versions. Notes The affected models are exported using standard CoreML conversion pipelines. No custom operators or third-party CoreML runtime layers are used. The issue reproduces consistently across multiple machines. It would be helpful to know if there were changes to CoreML’s tensor layout, precision handling, or MLCompute backend between macOS 26.0.1 and 26.1 Beta, or if this is a known regression in the current beta.
Replies
8
Boosts
4
Views
2.4k
Activity
Mar ’26
Cannot find devices in RemoteImmersiveSpace
Hi, I'm running the Spatial Rendering App sample on a Macbook Pro running 26.4 Beta and the Vision Pro running visionOS 26.3.1. Handoff and SharePlay are on, both devices are on the same Apple ID and network, and SharePlay screen sharing works fine between the two devices. However, when calling openImmersiveSpace, the device picker fails to present and no devices are found. Errors from console: ((processConfiguration != nil && configuration != nil) || (processConfiguration == nil && configuration == nil)) - .../ExtensionKit/Source/HostViewController/Internal/EXHostSessionDriver.m:80: `processConfiguration` and `configuration` must be both non-nil or both nil Unable to obtain a task name port right for pid 638: (os/kern) failure (0x5) Unable to present an ImmersiveSpace for Scene id 'Compositor Services' Is this a known bug or I'm I missing something? Thanks!
Replies
2
Boosts
1
Views
1.6k
Activity
Mar ’26