Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Posts under Metal tag

129 Posts

Post

Replies

Boosts

Views

Activity

Optimizing HZB Mip-Chain Generation and Bindless Argument Tables in a Custom Metal Engine
Hi everyone, I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture. While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges. Current Core Implementations: GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders. Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering. Temporal Stability: Custom TAA with history rejection and Motion Blur resolve. Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy. The Architectural Challenge: I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers. && m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty(); if (canBuildHzb) { MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder(); hzbInit->setComputePipelineState(m_hzbInitPipeline); hzbInit->setTexture(m_depthTexture, 0); hzbInit->setTexture(m_hzbMipViews[0], 1); if (m_pointClampSampler) { hzbInit->setSamplerState(m_pointClampSampler, 0); } else if (m_linearClampSampler) { hzbInit->setSamplerState(m_linearClampSampler, 0); } const uint32_t hzbWidth = m_hzbMipViews[0]->width(); const uint32_t hzbHeight = m_hzbMipViews[0]->height(); const uint32_t threads = 8; MTL::Size tgSize = MTL::Size(threads, threads, 1); MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads, (hzbHeight + threads - 1) / threads * threads, 1); hzbInit->dispatchThreads(gridSize, tgSize); hzbInit->endEncoding(); for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) { MTL::Texture* src = m_hzbMipViews[mip - 1]; MTL::Texture* dst = m_hzbMipViews[mip]; if (!src || !dst) { continue; } MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder(); downEncoder->setComputePipelineState(m_hzbDownsamplePipeline); downEncoder->setTexture(src, 0); downEncoder->setTexture(dst, 1); const uint32_t mipWidth = dst->width(); const uint32_t mipHeight = dst->height(); MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads, (mipHeight + threads - 1) / threads * threads, 1); downEncoder->dispatchThreads(downGrid, tgSize); downEncoder->endEncoding(); } if (m_instanceCullHzbPipeline) { dispatchInstanceCulling(m_instanceCullHzbPipeline, true); } } My Questions: Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR? visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency. Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process? I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).
0
0
444
Feb ’26
Xcode Metal Trace
Code is download from apple official metal4 sample [https://developer.apple.com/documentation/metal/drawing-a-triangle-with-metal-4?language=objc] enable metal gpu trace in macOS schema and trace a frame in Xcode. Xcode may show segment fault on App from some 'GTTrace' function when click trace button. When replay a .gputrace file, Xcode may crash , throw an internal error or a XPC error. The example code using old metal-renderer can trace without any problem and everything works fine. Test Environment: Xcode Version 26.2 (17C52) macOS 26.2 (25C56) M1 Pro 16GB A2442
2
0
517
Jan ’26
Metal 4: Proper usage of requestResidency() with unique per-frame textures at 120fps
Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it? I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame: computeResidencySet.removeAllAllocations() сomputeResidencySet.addAllocation(TextureA) computeResidencySet.addAllocation(TextureB) computeResidencySet.addAllocation(TextureC) computeResidencySet.commit() computeResidencySet.requestResidency() // Should we call it every frame? Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame)."
1
0
618
Jan ’26
Linker trying to link Metal toolchain for every object file on Catalyst
When building our project for Mac Catalyst with Xcode 26.2, we get this warning almost a hundred times, once for every object file: directory not found for option '-L/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.3.48.0.UZtKea/Metal.xctoolchain/usr/lib/swift/maccatalyst' Somehow, every Link <FileName>.o build step got the following parameter, regardless if the target contained Metal files or not: -L/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.3.48.0.UZtKea/Metal.xctoolchain/usr/lib/swift/maccatalyst The toolchain is mounted at this point, but the directory usr/lib/swift/maccatalyst doesn't exist. When building the project for iOS, the option doesn't exist and the warning is not shown. We already check the build settings, but we couldn't find a reason why the linker is trying to link against the toolchain here. Even for targets that do contain Metal files, we get the following linker warning: search path '/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.3.48.0.UZtKea/Metal.xctoolchain/usr/lib/swift/maccatalyst' not found Is this a known issue? Is there a way to get rid of these warnings?
2
3
475
3w
SpriteKit Offline Rendering with SKRenderer
Hi! I'd like to share a technical sample app, SKRenderer Demo. This app demonstrates: Setting up SKRenderer Recording SpriteKit scenes to image sequences Recording SpriteKit scenes to video using IOSurface and AVFoundation Applying Core Image filters Exploring SpriteKit's simulation timing and physics determinism Use Case Record SpriteKit simulations as video or images for sharing and creating content. I explored several approaches, including the excellent view.texture(from:crop:) for live recording from SKView. The SKRenderer approach assumes recording happens asynchronously: you capture user interactions as commands during live interaction, then replay those commands through an offline render pass to generate the final output. I hope this helps others working on replay systems, simulation capture, or SpriteKit projects in general!
0
0
272
Jan ’26
'__abort_with_payload' from CompositorNonUI on visionOS 26.2 (device + simulator, Omniverse streaming)
I am developing a custom app for Apple Vision Pro using Compositor Services to stream content from NVIDIA Omniverse. The app is based on: https://github.com/NVIDIA-Omniverse/apple-configurator-sample Environment: Device: Apple Vision Pro OS Version: visionOS 26.2 Xcode Version: 26.2 The Issue: The application crashes hard (__abort_with_payload) in "libsystem_kernel.dylib" on Task 6 immediately after initialization. This appears to be a deliberate abort triggered by the compositor, not a typical crash. The issue occurs on both physical device and simulator. Important detail: The console output shows a specific CLIENT BUG assertion. By checking the metadata of the warning, I found that it is related to "Library: CompositorNonUI". Relevant console output before abort: Missed 'FrameLimiter' target of 90.0 Hz running compositor services to get IPD, FOV, etc fence tx observer 14f27 timed out after 0.600000 fence tx observer bc1b timed out after 0.600000 BUG IN CLIENT: For mixed reality experiences please use cp_drawable_compute_projection API
0
0
145
Jan ’26
# [CRITICAL] Metal RHI Memory Leak - Resource exhaustion vulnerability (CWE-400) - Bug Report
[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400) Security Classification This issue constitutes a resource exhaustion vulnerability (CWE-400): Aspect Details Type Uncontrolled Resource Consumption CWE CWE-400 Vector Local (any Metal application) Impact System instability, denial of service User Control None - no mitigation available Recovery Requires application restart Summary Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation. This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level. Environment OS: macOS Tahoe 26.2 Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3) API: Metal Reproduction Steps Run any Metal application that allocates and deallocates GPU buffers via Metal heaps Open Activity Monitor and observe the application's memory usage Let the application run idle (no user interaction required) Observe memory growing continuously at ~1-2 MB per second Memory never plateaus or stabilizes Eventually system becomes unstable For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1) Observed Behavior Memory Analysis Using Unreal's memreport -full command, two reports taken 86 seconds apart: Metric Report 1 (183s) Report 2 (269s) Delta Process Physical 4373.64 MB 4463.39 MB +89.75 MB Metal Heap Buffer 7168 MB 8192 MB +1024 MB Unused Heap 3453 MB 4477 MB +1024 MB Object Count 73,840 73,840 0 (no change) Key Finding Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates: Metal is allocating new heap blocks in ~1 GB increments Previously allocated heap memory becomes "unused" but is never released The unused memory accumulates indefinitely No application-level objects are leaking (count remains constant) Memory Growth Pattern Continuous growth while idle (no user interaction) Growth rate: approximately 1-2 MB per second No plateau or stabilization occurs Metal allocates new 1 GB heap blocks rather than reusing freed space Eventually leads to system instability and crash What is NOT Causing This We verified the following are NOT the source: Application objects - Object count remains constant Application code - Blank project with no code reproduces the issue Texture streaming - Disabling texture streaming had no effect CPU garbage collection - Running GC has no effect (this is GPU memory) Mitigations Attempted (None Worked) setPurgeableState Setting resources to purgeable state before release: [buffer setPurgeableState:MTLPurgeableStateEmpty]; Result: Metal ignores this hint and does not reclaim heap memory. Avoiding Heap Pooling Forcing individual buffer allocations instead of heap-based pooling. Result: Leak persists - Metal still manages underlying allocations. Aggressive Buffer Compaction Attempting to compact/defragment buffers within heaps every frame. Result: Only moves data between existing heaps. Does NOT release heaps back to OS. Reducing Pool Sizes Minimizing all buffer pool sizes to force more frequent reuse. Result: Slightly slows the leak rate but does not stop it. Root Cause Analysis How Metal Heap Allocation Appears to Work Metal allocates GPU heap blocks in large chunks (~1 GB observed) Application requests buffers from these heaps When application releases buffers, memory becomes "unused" within the heap Metal does NOT release heap blocks back to macOS, even when entirely unused When fragmentation prevents reuse, Metal allocates new heap blocks Result: Continuous memory growth with no upper bound The Core Problem There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application. Expected Behavior Metal should: Release unused heaps - When a heap block is entirely unused, release it back to macOS Respect purgeable hints - Honor setPurgeableState calls from applications Compact allocations - Defragment heap allocations to reduce fragmentation Provide control APIs - Allow applications to request heap compaction or release Enforce limits - Have configurable maximum heap memory consumption Security Implications Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance No Upper Bound - Memory consumption continues until system failure Unmitigable - End users have no way to prevent or limit the leak Affects All Metal Apps - Any application using Metal heaps is potentially affected Impact Applications become unstable after extended use System-wide performance degrades as memory pressure increases Users must periodically restart applications Developers cannot work around this at the application level Long-running applications (games, creative tools, servers) are particularly affected Request Investigate Metal heap memory management behavior Implement heap release when blocks become entirely unused Honor setPurgeableState hints from applications Consider providing an API for applications to request heap compaction Document any intended behavior or workarounds Additional Notes This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible. The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level. Tested: January 2026 Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)
5
2
1.1k
Jan ’26
Metal debug log in Swift Package
My goal is to print a debug message from a shader. I follow the guide that orders to set -fmetal-enable-logging metal compiler flag and following environment variables: MTL_LOG_LEVEL=MTLLogLevelDebug MTL_LOG_BUFFER_SIZE=2048 MTL_LOG_TO_STDERR=1 However there's an issue with the guide, it's only covers Xcode project setup, however I'm working on a Swift Package. It has a Metal-only target that's included into main target like this: targets: [ // A separate target for shaders. .target( name: "MetalShaders", resources: [ .process("Metal") ], plugins: [ // https://github.com/schwa/MetalCompilerPlugin .plugin(name: "MetalCompilerPlugin", package: "MetalCompilerPlugin") ] ), // Main target .target( name: "MegApp", dependencies: ["MetalShaders"] ), .testTarget( name: "MegAppTests", dependencies: [ "MegApp", "MetalShaders", ] ] So to apply compiler flag I use MetalCompilerPlugin which emits debug.metallib, it also allows to define DEBUG macro for shaders. This code compiles: #ifdef DEBUG logger.log_error("Hello There!"); os_log_default.log_debug("Hello thread: %d", gid); // this proves that code exectutes result.flag = true; #endif Environment is set via .xctestplan and valideted to work with ProcessInfo. However, nothing is printed to Xcode console nor to Console app. In attempt to fix it I'm trying to setup a MTLLogState, however the makeLogState(descriptor:) fails with error: if #available(iOS 18.0, *) { let logDescriptor = MTLLogStateDescriptor() logDescriptor.level = .debug logDescriptor.bufferSize = 2048 // Error Domain=MTLLogStateErrorDomain Code=2 "Cannot create residency set for MTLLogState: (null)" UserInfo={NSLocalizedDescription=Cannot create residency set for MTLLogState: (null)} let logState = try! device.makeLogState(descriptor: logDescriptor) commandBufferDescriptor.logState = logState } Some LLMs suggested that this is connected with Simulator, and truly, I run the tests on simulator. However tests don't want to run on iPhone... I found solution running them on My Mac (Mac Catalyst). Surprisingly descriptor log works there, even without MTLLogState. But the Simulator behaviour seems like a bug...
1
1
676
Jan ’26
Hover effects w/ Compositor Services w/ PSVR2 controllers
Hi, I would like clarification on whether the new hover effects feature introduced in vision os 26 supported pinch gestures through the psvr 2 controllers. In your sample application, I was not able to confirm that this was working. Only pinch clicking with my hands worked. Pulling the trigger on the controller whilst looking at a 3d object did not activate the hover effect spatial event in the sample application. (The object is showing the highlight though) This is inconsistent with hover effect behavior with psvr2 controllers on swift ui views, where the trigger press does count as a button click. The sample I used was this one: https://developer.apple.com/documentation/compositorservices/rendering_hover_effects_in_metal_immersive_apps
0
0
454
Jan ’26
recent JAX versions fail on Metal
Hi, I'm not sure whether this is the appropriate forum for this topic. I just followed a link from the JAX Metal plugin page https://developer.apple.com/metal/jax/ I'm writing a Python app with JAX, and recent JAX versions fail on Metal. E.g. v0.8.2 I have to downgrade JAX pretty hard to make it work: pip install jax==0.4.35 jaxlib==0.4.35 jax-metal==0.1.1 Can we get an updated release of jax-metal that would fix this issue? Here is the error I get with JAX v0.8.2: WARNING:2025-12-26 09:55:28,117:jax._src.xla_bridge:881: Platform 'METAL' is experimental and not all JAX functionality may be correctly supported! WARNING: All log messages before absl::InitializeLog() is called are written to STDERR W0000 00:00:1766771728.118004 207582 mps_client.cc:510] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported! Metal device set to: Apple M3 Max systemMemory: 36.00 GB maxCacheSize: 13.50 GB I0000 00:00:1766771728.129886 207582 service.cc:145] XLA service 0x600001fad300 initialized for platform METAL (this does not guarantee that XLA will be used). Devices: I0000 00:00:1766771728.129893 207582 service.cc:153] StreamExecutor device (0): Metal, <undefined> I0000 00:00:1766771728.130856 207582 mps_client.cc:406] Using Simple allocator. I0000 00:00:1766771728.130864 207582 mps_client.cc:384] XLA backend will use up to 28990554112 bytes on device 0 for SimpleAllocator. Traceback (most recent call last): File "<string>", line 1, in <module> import jax; print(jax.numpy.arange(10)) ~~~~~~~~~~~~~~~~^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 5951, in arange return _arange(start, stop=stop, step=step, dtype=dtype, out_sharding=sharding) File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 6012, in _arange return lax.broadcasted_iota(dtype, (size,), 0, out_sharding=out_sharding) ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/lax/lax.py", line 3415, in broadcasted_iota return iota_p.bind(dtype=dtype, shape=shape, ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^ dimension=dimension, sharding=out_sharding) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 633, in bind return self._true_bind(*args, **params) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 649, in _true_bind return self.bind_with_trace(prev_trace, args, params) ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 661, in bind_with_trace return trace.process_primitive(self, args, params) ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 1210, in process_primitive return primitive.impl(*args, **params) ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/dispatch.py", line 91, in apply_primitive outs = fun(*args) jax.errors.JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.13.0 -------------------- For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these. I0000 00:00:1766771728.149951 207582 mps_client.h:209] MetalClient destroyed.
0
0
580
Dec ’25
Error: "CoreImage Metal library does not contain function"
Hey I'm using the CIDepthBlurEffect Core Image Filter in my app. It seems to work ok but I get these errors in the console when calling the class. CoreImage Metal library does not contain function for name: sparserendering_xhlrb_scan CoreImage Metal library does not contain function for name: sparserendering_xhlrb_diffuse CoreImage Metal library does not contain function for name: sparserendering_xhlrb_copy_back CoreImage Metal library does not contain function for name: plain_or_sRGB_copy Am I missing some sort of import to gain these Metal functions? I am using my own custom shaders but I assume you'd be able to use them along side the built in ones.
0
0
540
Dec ’25
Few screen tap feedback ideas (and implementation YouTube video)
I've been playing around with iPad PRO M5 13" as part of my goal to implement some music relating SPH particle simulation effects on it - and this involves utilizing tap events also from the incredible looking fresh screen the device has. See more information from here, all should be overreactively implemented but the ideas remain (with almost zero cost copy fragment shader) : `https://youtu.be/ci-GSgQ0wlM` This attached image shows the tap effects implementation brought just bit a little further than in the video.
1
0
1.2k
Dec ’25
CompileMetalFile Failed With Xcode Cloud
Hello guys, recently I integrated a third-party library into my code to handle blur effects (Glur). This library leverages Metal's compute capabilities and appears to automatically depend on the Metal toolchain during the build process. When using Xcode Cloud, the archive step consistently fails with a "CompileMetalFile Failed" error. However, when I manually archive the project in Xcode locally, everything works fine without any issues. I’ve confirmed that the macOS and Xcode versions specified in my Xcode Cloud workflow are correct. Reply if you know how to fix it or it will be fixed future, thanks. I'm using Xcode 26.2(17C52) and macOS(15.7.1 (24G231))
0
0
140
Dec ’25
Quantumsort, next level of Radix Sort implementations
Hello All, I've been finally able to finish my so called Quantumsort implementation, which is basically only slightly modified Radix Sort at the basis, but utilizes the somewhat newish Apple Metal SIMD commands heavily. Or rather rather renders the threadgroup shared memory void altogether with those SIMD sharing options available. The sorting itself works "well enough" for my needs here on these Github shared main.swift and sort.metal files already, and the question relates more on it that - can someone spot would it be possible to get rid of the "sorted_temp" array memory too and use one "sorted" list only at all times..? Here's the related repository and copilots I prefer you and you and you more for haha tho : https://guthib.com/roger-more-fi/quantumsort Cheers, make it to the perfection those good last ones of '25 'aights Apple soldiers!
1
0
415
Dec ’25
CIRAWFilter.outputImage first-time cost is huge (~3s), subsequent calls are ~3ms. Any official way to pre-initialize RAW pipeline (without taking a real photo)?
Hi Apple Developer Forums, I’m developing an iOS camera app that processes RAW captures using Core Image. I’m seeing a large “first use” performance penalty specifically when creating the CIImage from CIRAWFilter.outputImage. What’s slow (important detail) I’m measuring the time for: let rawFilter = CIRAWFilter(imageData: rawData, identifierHint: hint) let ciImage = rawFilter.outputImage This is not CIContext.render(...) / createCGImage(...). It’s just the time to access outputImage (i.e., building the Core Image graph / RAW pipeline setup). Observed behavior First time accessing CIRAWFilter.outputImage: ~3 seconds Second time (same app session, similar RAW): ~3 milliseconds So something heavy is happening only on first use (decoder initialization, pipeline setup, shader/library compilation, caching, etc.). Using Metal System Trace, I also noticed that during the slow first call there are many “Create MTLLibrary” events, while the second call doesn’t show this pattern. Warm-up attempts using bundled DNG I tried to “warm up” early (e.g., on camera screen entry) by loading a bundled DNG and then accessing CIRAWFilter.outputImage by taking a photo: Warm-up with a ~247 KB DNG → first real RAW outputImage cost drops to ~1.42s Warm-up with a ~25 MB DNG → first real RAW outputImage cost drops to ~843ms This helps, but it’s still far from the steady-state ~3ms. Warm-up by capturing a real RAW (works, but concerns) The only method that fully eliminates the delay is to trigger a real RAW capture programmatically before the user’s first photo, then use that captured rawData to warm up the CIRAWFilter.outputImage path. This brings the first user-facing capture close to the steady-state timing. However: In some regions, the camera shutter sound cannot be suppressed, so “hidden warm-up capture” is unacceptable UX. I’m also unsure whether triggering a real capture without an explicit user action could raise compliance/privacy concerns, even if the image is immediately discarded and never saved/uploaded. Questions Is the large first-time cost of CIRAWFilter.outputImage expected (RAW pipeline initialization / shader compilation)? Is there an Apple-recommended way to pre-initialize the Core Image RAW pipeline / Metal resources so the first outputImage is fast, without taking a real photo? Are there any best practices (e.g. CIContext creation timing, prepareRender(...), specific options) that reliably reduce this first-use overhead for CIRAWFilter? Attachments Figure 1: First RAW capture with no warm-up (~3s outputImage time) Figure 2: First RAW capture after warm-up with bundled DNG (improved but still hundreds of ms) Thanks for any guidance or experience sharing!
3
2
650
Jan ’26
NSScreen's maximumExtendedDynamicRangeColorComponentValue does not seem to provide the proper value after sleep/wake on third party HDR displays even when there is EDR content on screen in macOS Tahoe
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR. This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible. Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
1
0
291
Dec ’25
Metal toolchain compilation error after migration to MacOS 26 and XCode 26.1
Hello. I migrated yesterday two machines from OS15 to OS26 (same Ram; M1Max chip). On the first one, a MBP, everything is allright, and my old Metal projects can compile and run, even with MacOS12 as a destination. On the second computer (MacStudio) I got a compile "error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain". I spent hours on many forums and tried all proposed solutions and still get the same error. Any idea? Thanks
2
0
164
Dec ’25
Race conditions when changing CAMetalLayer.drawableSize?
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size. Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right? Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)? func onWindowResized(newSize: CGSize) { // Called on the Main thread metalLayer.drawableSize = newSize } func onVsync(drawable: MTLDrawable) { // Called on a background rendering thread renderer.renderInto(drawable: drawable) }
1
1
533
Dec ’25
App Freezes on iPadOS 26.x - GPU Metal Errors
I work on a Qt/QML app that uses Esri Maps SDK for Qt and that is deployed to both Windows and iPads. With a recent iPad OS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users. I was able to reproduce this and grabbed the following error messages when the freeze happens: IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault) IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource) Environment: Qt 6.5.4 (Qt for iOS) Esri Maps SDK for Qt 200.3 iPadOS 26.1 Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set hte target graphics api): QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL) Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple. I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update? Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now. Appreciate any thoughts/insights....
3
0
566
Dec ’25
Optimizing HZB Mip-Chain Generation and Bindless Argument Tables in a Custom Metal Engine
Hi everyone, I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture. While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges. Current Core Implementations: GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders. Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering. Temporal Stability: Custom TAA with history rejection and Motion Blur resolve. Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy. The Architectural Challenge: I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers. && m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty(); if (canBuildHzb) { MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder(); hzbInit->setComputePipelineState(m_hzbInitPipeline); hzbInit->setTexture(m_depthTexture, 0); hzbInit->setTexture(m_hzbMipViews[0], 1); if (m_pointClampSampler) { hzbInit->setSamplerState(m_pointClampSampler, 0); } else if (m_linearClampSampler) { hzbInit->setSamplerState(m_linearClampSampler, 0); } const uint32_t hzbWidth = m_hzbMipViews[0]->width(); const uint32_t hzbHeight = m_hzbMipViews[0]->height(); const uint32_t threads = 8; MTL::Size tgSize = MTL::Size(threads, threads, 1); MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads, (hzbHeight + threads - 1) / threads * threads, 1); hzbInit->dispatchThreads(gridSize, tgSize); hzbInit->endEncoding(); for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) { MTL::Texture* src = m_hzbMipViews[mip - 1]; MTL::Texture* dst = m_hzbMipViews[mip]; if (!src || !dst) { continue; } MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder(); downEncoder->setComputePipelineState(m_hzbDownsamplePipeline); downEncoder->setTexture(src, 0); downEncoder->setTexture(dst, 1); const uint32_t mipWidth = dst->width(); const uint32_t mipHeight = dst->height(); MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads, (mipHeight + threads - 1) / threads * threads, 1); downEncoder->dispatchThreads(downGrid, tgSize); downEncoder->endEncoding(); } if (m_instanceCullHzbPipeline) { dispatchInstanceCulling(m_instanceCullHzbPipeline, true); } } My Questions: Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR? visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency. Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process? I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).
Replies
0
Boosts
0
Views
444
Activity
Feb ’26
Xcode Metal Trace
Code is download from apple official metal4 sample [https://developer.apple.com/documentation/metal/drawing-a-triangle-with-metal-4?language=objc] enable metal gpu trace in macOS schema and trace a frame in Xcode. Xcode may show segment fault on App from some 'GTTrace' function when click trace button. When replay a .gputrace file, Xcode may crash , throw an internal error or a XPC error. The example code using old metal-renderer can trace without any problem and everything works fine. Test Environment: Xcode Version 26.2 (17C52) macOS 26.2 (25C56) M1 Pro 16GB A2442
Replies
2
Boosts
0
Views
517
Activity
Jan ’26
Metal 4: Proper usage of requestResidency() with unique per-frame textures at 120fps
Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it? I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame: computeResidencySet.removeAllAllocations() сomputeResidencySet.addAllocation(TextureA) computeResidencySet.addAllocation(TextureB) computeResidencySet.addAllocation(TextureC) computeResidencySet.commit() computeResidencySet.requestResidency() // Should we call it every frame? Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame)."
Replies
1
Boosts
0
Views
618
Activity
Jan ’26
Linker trying to link Metal toolchain for every object file on Catalyst
When building our project for Mac Catalyst with Xcode 26.2, we get this warning almost a hundred times, once for every object file: directory not found for option '-L/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.3.48.0.UZtKea/Metal.xctoolchain/usr/lib/swift/maccatalyst' Somehow, every Link <FileName>.o build step got the following parameter, regardless if the target contained Metal files or not: -L/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.3.48.0.UZtKea/Metal.xctoolchain/usr/lib/swift/maccatalyst The toolchain is mounted at this point, but the directory usr/lib/swift/maccatalyst doesn't exist. When building the project for iOS, the option doesn't exist and the warning is not shown. We already check the build settings, but we couldn't find a reason why the linker is trying to link against the toolchain here. Even for targets that do contain Metal files, we get the following linker warning: search path '/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.3.48.0.UZtKea/Metal.xctoolchain/usr/lib/swift/maccatalyst' not found Is this a known issue? Is there a way to get rid of these warnings?
Replies
2
Boosts
3
Views
475
Activity
3w
SpriteKit Offline Rendering with SKRenderer
Hi! I'd like to share a technical sample app, SKRenderer Demo. This app demonstrates: Setting up SKRenderer Recording SpriteKit scenes to image sequences Recording SpriteKit scenes to video using IOSurface and AVFoundation Applying Core Image filters Exploring SpriteKit's simulation timing and physics determinism Use Case Record SpriteKit simulations as video or images for sharing and creating content. I explored several approaches, including the excellent view.texture(from:crop:) for live recording from SKView. The SKRenderer approach assumes recording happens asynchronously: you capture user interactions as commands during live interaction, then replay those commands through an offline render pass to generate the final output. I hope this helps others working on replay systems, simulation capture, or SpriteKit projects in general!
Replies
0
Boosts
0
Views
272
Activity
Jan ’26
'__abort_with_payload' from CompositorNonUI on visionOS 26.2 (device + simulator, Omniverse streaming)
I am developing a custom app for Apple Vision Pro using Compositor Services to stream content from NVIDIA Omniverse. The app is based on: https://github.com/NVIDIA-Omniverse/apple-configurator-sample Environment: Device: Apple Vision Pro OS Version: visionOS 26.2 Xcode Version: 26.2 The Issue: The application crashes hard (__abort_with_payload) in "libsystem_kernel.dylib" on Task 6 immediately after initialization. This appears to be a deliberate abort triggered by the compositor, not a typical crash. The issue occurs on both physical device and simulator. Important detail: The console output shows a specific CLIENT BUG assertion. By checking the metadata of the warning, I found that it is related to "Library: CompositorNonUI". Relevant console output before abort: Missed 'FrameLimiter' target of 90.0 Hz running compositor services to get IPD, FOV, etc fence tx observer 14f27 timed out after 0.600000 fence tx observer bc1b timed out after 0.600000 BUG IN CLIENT: For mixed reality experiences please use cp_drawable_compute_projection API
Replies
0
Boosts
0
Views
145
Activity
Jan ’26
# [CRITICAL] Metal RHI Memory Leak - Resource exhaustion vulnerability (CWE-400) - Bug Report
[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400) Security Classification This issue constitutes a resource exhaustion vulnerability (CWE-400): Aspect Details Type Uncontrolled Resource Consumption CWE CWE-400 Vector Local (any Metal application) Impact System instability, denial of service User Control None - no mitigation available Recovery Requires application restart Summary Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation. This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level. Environment OS: macOS Tahoe 26.2 Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3) API: Metal Reproduction Steps Run any Metal application that allocates and deallocates GPU buffers via Metal heaps Open Activity Monitor and observe the application's memory usage Let the application run idle (no user interaction required) Observe memory growing continuously at ~1-2 MB per second Memory never plateaus or stabilizes Eventually system becomes unstable For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1) Observed Behavior Memory Analysis Using Unreal's memreport -full command, two reports taken 86 seconds apart: Metric Report 1 (183s) Report 2 (269s) Delta Process Physical 4373.64 MB 4463.39 MB +89.75 MB Metal Heap Buffer 7168 MB 8192 MB +1024 MB Unused Heap 3453 MB 4477 MB +1024 MB Object Count 73,840 73,840 0 (no change) Key Finding Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates: Metal is allocating new heap blocks in ~1 GB increments Previously allocated heap memory becomes "unused" but is never released The unused memory accumulates indefinitely No application-level objects are leaking (count remains constant) Memory Growth Pattern Continuous growth while idle (no user interaction) Growth rate: approximately 1-2 MB per second No plateau or stabilization occurs Metal allocates new 1 GB heap blocks rather than reusing freed space Eventually leads to system instability and crash What is NOT Causing This We verified the following are NOT the source: Application objects - Object count remains constant Application code - Blank project with no code reproduces the issue Texture streaming - Disabling texture streaming had no effect CPU garbage collection - Running GC has no effect (this is GPU memory) Mitigations Attempted (None Worked) setPurgeableState Setting resources to purgeable state before release: [buffer setPurgeableState:MTLPurgeableStateEmpty]; Result: Metal ignores this hint and does not reclaim heap memory. Avoiding Heap Pooling Forcing individual buffer allocations instead of heap-based pooling. Result: Leak persists - Metal still manages underlying allocations. Aggressive Buffer Compaction Attempting to compact/defragment buffers within heaps every frame. Result: Only moves data between existing heaps. Does NOT release heaps back to OS. Reducing Pool Sizes Minimizing all buffer pool sizes to force more frequent reuse. Result: Slightly slows the leak rate but does not stop it. Root Cause Analysis How Metal Heap Allocation Appears to Work Metal allocates GPU heap blocks in large chunks (~1 GB observed) Application requests buffers from these heaps When application releases buffers, memory becomes "unused" within the heap Metal does NOT release heap blocks back to macOS, even when entirely unused When fragmentation prevents reuse, Metal allocates new heap blocks Result: Continuous memory growth with no upper bound The Core Problem There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application. Expected Behavior Metal should: Release unused heaps - When a heap block is entirely unused, release it back to macOS Respect purgeable hints - Honor setPurgeableState calls from applications Compact allocations - Defragment heap allocations to reduce fragmentation Provide control APIs - Allow applications to request heap compaction or release Enforce limits - Have configurable maximum heap memory consumption Security Implications Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance No Upper Bound - Memory consumption continues until system failure Unmitigable - End users have no way to prevent or limit the leak Affects All Metal Apps - Any application using Metal heaps is potentially affected Impact Applications become unstable after extended use System-wide performance degrades as memory pressure increases Users must periodically restart applications Developers cannot work around this at the application level Long-running applications (games, creative tools, servers) are particularly affected Request Investigate Metal heap memory management behavior Implement heap release when blocks become entirely unused Honor setPurgeableState hints from applications Consider providing an API for applications to request heap compaction Document any intended behavior or workarounds Additional Notes This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible. The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level. Tested: January 2026 Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)
Replies
5
Boosts
2
Views
1.1k
Activity
Jan ’26
Metal debug log in Swift Package
My goal is to print a debug message from a shader. I follow the guide that orders to set -fmetal-enable-logging metal compiler flag and following environment variables: MTL_LOG_LEVEL=MTLLogLevelDebug MTL_LOG_BUFFER_SIZE=2048 MTL_LOG_TO_STDERR=1 However there's an issue with the guide, it's only covers Xcode project setup, however I'm working on a Swift Package. It has a Metal-only target that's included into main target like this: targets: [ // A separate target for shaders. .target( name: "MetalShaders", resources: [ .process("Metal") ], plugins: [ // https://github.com/schwa/MetalCompilerPlugin .plugin(name: "MetalCompilerPlugin", package: "MetalCompilerPlugin") ] ), // Main target .target( name: "MegApp", dependencies: ["MetalShaders"] ), .testTarget( name: "MegAppTests", dependencies: [ "MegApp", "MetalShaders", ] ] So to apply compiler flag I use MetalCompilerPlugin which emits debug.metallib, it also allows to define DEBUG macro for shaders. This code compiles: #ifdef DEBUG logger.log_error("Hello There!"); os_log_default.log_debug("Hello thread: %d", gid); // this proves that code exectutes result.flag = true; #endif Environment is set via .xctestplan and valideted to work with ProcessInfo. However, nothing is printed to Xcode console nor to Console app. In attempt to fix it I'm trying to setup a MTLLogState, however the makeLogState(descriptor:) fails with error: if #available(iOS 18.0, *) { let logDescriptor = MTLLogStateDescriptor() logDescriptor.level = .debug logDescriptor.bufferSize = 2048 // Error Domain=MTLLogStateErrorDomain Code=2 "Cannot create residency set for MTLLogState: (null)" UserInfo={NSLocalizedDescription=Cannot create residency set for MTLLogState: (null)} let logState = try! device.makeLogState(descriptor: logDescriptor) commandBufferDescriptor.logState = logState } Some LLMs suggested that this is connected with Simulator, and truly, I run the tests on simulator. However tests don't want to run on iPhone... I found solution running them on My Mac (Mac Catalyst). Surprisingly descriptor log works there, even without MTLLogState. But the Simulator behaviour seems like a bug...
Replies
1
Boosts
1
Views
676
Activity
Jan ’26
Why there's no rgb32Float in Metal?
I noticed that MTLPixelFormat has this cases: case r32Float = 55 case rg32Float = 105 case rgba32Float = 125 But no case rgb32Float. What's the reason for such a discrimination?
Replies
1
Boosts
0
Views
283
Activity
Jan ’26
Hover effects w/ Compositor Services w/ PSVR2 controllers
Hi, I would like clarification on whether the new hover effects feature introduced in vision os 26 supported pinch gestures through the psvr 2 controllers. In your sample application, I was not able to confirm that this was working. Only pinch clicking with my hands worked. Pulling the trigger on the controller whilst looking at a 3d object did not activate the hover effect spatial event in the sample application. (The object is showing the highlight though) This is inconsistent with hover effect behavior with psvr2 controllers on swift ui views, where the trigger press does count as a button click. The sample I used was this one: https://developer.apple.com/documentation/compositorservices/rendering_hover_effects_in_metal_immersive_apps
Replies
0
Boosts
0
Views
454
Activity
Jan ’26
recent JAX versions fail on Metal
Hi, I'm not sure whether this is the appropriate forum for this topic. I just followed a link from the JAX Metal plugin page https://developer.apple.com/metal/jax/ I'm writing a Python app with JAX, and recent JAX versions fail on Metal. E.g. v0.8.2 I have to downgrade JAX pretty hard to make it work: pip install jax==0.4.35 jaxlib==0.4.35 jax-metal==0.1.1 Can we get an updated release of jax-metal that would fix this issue? Here is the error I get with JAX v0.8.2: WARNING:2025-12-26 09:55:28,117:jax._src.xla_bridge:881: Platform 'METAL' is experimental and not all JAX functionality may be correctly supported! WARNING: All log messages before absl::InitializeLog() is called are written to STDERR W0000 00:00:1766771728.118004 207582 mps_client.cc:510] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported! Metal device set to: Apple M3 Max systemMemory: 36.00 GB maxCacheSize: 13.50 GB I0000 00:00:1766771728.129886 207582 service.cc:145] XLA service 0x600001fad300 initialized for platform METAL (this does not guarantee that XLA will be used). Devices: I0000 00:00:1766771728.129893 207582 service.cc:153] StreamExecutor device (0): Metal, <undefined> I0000 00:00:1766771728.130856 207582 mps_client.cc:406] Using Simple allocator. I0000 00:00:1766771728.130864 207582 mps_client.cc:384] XLA backend will use up to 28990554112 bytes on device 0 for SimpleAllocator. Traceback (most recent call last): File "<string>", line 1, in <module> import jax; print(jax.numpy.arange(10)) ~~~~~~~~~~~~~~~~^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 5951, in arange return _arange(start, stop=stop, step=step, dtype=dtype, out_sharding=sharding) File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 6012, in _arange return lax.broadcasted_iota(dtype, (size,), 0, out_sharding=out_sharding) ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/lax/lax.py", line 3415, in broadcasted_iota return iota_p.bind(dtype=dtype, shape=shape, ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^ dimension=dimension, sharding=out_sharding) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 633, in bind return self._true_bind(*args, **params) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 649, in _true_bind return self.bind_with_trace(prev_trace, args, params) ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 661, in bind_with_trace return trace.process_primitive(self, args, params) ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 1210, in process_primitive return primitive.impl(*args, **params) ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/dispatch.py", line 91, in apply_primitive outs = fun(*args) jax.errors.JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.13.0 -------------------- For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these. I0000 00:00:1766771728.149951 207582 mps_client.h:209] MetalClient destroyed.
Replies
0
Boosts
0
Views
580
Activity
Dec ’25
Error: "CoreImage Metal library does not contain function"
Hey I'm using the CIDepthBlurEffect Core Image Filter in my app. It seems to work ok but I get these errors in the console when calling the class. CoreImage Metal library does not contain function for name: sparserendering_xhlrb_scan CoreImage Metal library does not contain function for name: sparserendering_xhlrb_diffuse CoreImage Metal library does not contain function for name: sparserendering_xhlrb_copy_back CoreImage Metal library does not contain function for name: plain_or_sRGB_copy Am I missing some sort of import to gain these Metal functions? I am using my own custom shaders but I assume you'd be able to use them along side the built in ones.
Replies
0
Boosts
0
Views
540
Activity
Dec ’25
Few screen tap feedback ideas (and implementation YouTube video)
I've been playing around with iPad PRO M5 13" as part of my goal to implement some music relating SPH particle simulation effects on it - and this involves utilizing tap events also from the incredible looking fresh screen the device has. See more information from here, all should be overreactively implemented but the ideas remain (with almost zero cost copy fragment shader) : `https://youtu.be/ci-GSgQ0wlM` This attached image shows the tap effects implementation brought just bit a little further than in the video.
Replies
1
Boosts
0
Views
1.2k
Activity
Dec ’25
CompileMetalFile Failed With Xcode Cloud
Hello guys, recently I integrated a third-party library into my code to handle blur effects (Glur). This library leverages Metal's compute capabilities and appears to automatically depend on the Metal toolchain during the build process. When using Xcode Cloud, the archive step consistently fails with a "CompileMetalFile Failed" error. However, when I manually archive the project in Xcode locally, everything works fine without any issues. I’ve confirmed that the macOS and Xcode versions specified in my Xcode Cloud workflow are correct. Reply if you know how to fix it or it will be fixed future, thanks. I'm using Xcode 26.2(17C52) and macOS(15.7.1 (24G231))
Replies
0
Boosts
0
Views
140
Activity
Dec ’25
Quantumsort, next level of Radix Sort implementations
Hello All, I've been finally able to finish my so called Quantumsort implementation, which is basically only slightly modified Radix Sort at the basis, but utilizes the somewhat newish Apple Metal SIMD commands heavily. Or rather rather renders the threadgroup shared memory void altogether with those SIMD sharing options available. The sorting itself works "well enough" for my needs here on these Github shared main.swift and sort.metal files already, and the question relates more on it that - can someone spot would it be possible to get rid of the "sorted_temp" array memory too and use one "sorted" list only at all times..? Here's the related repository and copilots I prefer you and you and you more for haha tho : https://guthib.com/roger-more-fi/quantumsort Cheers, make it to the perfection those good last ones of '25 'aights Apple soldiers!
Replies
1
Boosts
0
Views
415
Activity
Dec ’25
CIRAWFilter.outputImage first-time cost is huge (~3s), subsequent calls are ~3ms. Any official way to pre-initialize RAW pipeline (without taking a real photo)?
Hi Apple Developer Forums, I’m developing an iOS camera app that processes RAW captures using Core Image. I’m seeing a large “first use” performance penalty specifically when creating the CIImage from CIRAWFilter.outputImage. What’s slow (important detail) I’m measuring the time for: let rawFilter = CIRAWFilter(imageData: rawData, identifierHint: hint) let ciImage = rawFilter.outputImage This is not CIContext.render(...) / createCGImage(...). It’s just the time to access outputImage (i.e., building the Core Image graph / RAW pipeline setup). Observed behavior First time accessing CIRAWFilter.outputImage: ~3 seconds Second time (same app session, similar RAW): ~3 milliseconds So something heavy is happening only on first use (decoder initialization, pipeline setup, shader/library compilation, caching, etc.). Using Metal System Trace, I also noticed that during the slow first call there are many “Create MTLLibrary” events, while the second call doesn’t show this pattern. Warm-up attempts using bundled DNG I tried to “warm up” early (e.g., on camera screen entry) by loading a bundled DNG and then accessing CIRAWFilter.outputImage by taking a photo: Warm-up with a ~247 KB DNG → first real RAW outputImage cost drops to ~1.42s Warm-up with a ~25 MB DNG → first real RAW outputImage cost drops to ~843ms This helps, but it’s still far from the steady-state ~3ms. Warm-up by capturing a real RAW (works, but concerns) The only method that fully eliminates the delay is to trigger a real RAW capture programmatically before the user’s first photo, then use that captured rawData to warm up the CIRAWFilter.outputImage path. This brings the first user-facing capture close to the steady-state timing. However: In some regions, the camera shutter sound cannot be suppressed, so “hidden warm-up capture” is unacceptable UX. I’m also unsure whether triggering a real capture without an explicit user action could raise compliance/privacy concerns, even if the image is immediately discarded and never saved/uploaded. Questions Is the large first-time cost of CIRAWFilter.outputImage expected (RAW pipeline initialization / shader compilation)? Is there an Apple-recommended way to pre-initialize the Core Image RAW pipeline / Metal resources so the first outputImage is fast, without taking a real photo? Are there any best practices (e.g. CIContext creation timing, prepareRender(...), specific options) that reliably reduce this first-use overhead for CIRAWFilter? Attachments Figure 1: First RAW capture with no warm-up (~3s outputImage time) Figure 2: First RAW capture after warm-up with bundled DNG (improved but still hundreds of ms) Thanks for any guidance or experience sharing!
Replies
3
Boosts
2
Views
650
Activity
Jan ’26
NSScreen's maximumExtendedDynamicRangeColorComponentValue does not seem to provide the proper value after sleep/wake on third party HDR displays even when there is EDR content on screen in macOS Tahoe
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR. This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible. Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
Replies
1
Boosts
0
Views
291
Activity
Dec ’25
Metal toolchain compilation error after migration to MacOS 26 and XCode 26.1
Hello. I migrated yesterday two machines from OS15 to OS26 (same Ram; M1Max chip). On the first one, a MBP, everything is allright, and my old Metal projects can compile and run, even with MacOS12 as a destination. On the second computer (MacStudio) I got a compile "error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain". I spent hours on many forums and tried all proposed solutions and still get the same error. Any idea? Thanks
Replies
2
Boosts
0
Views
164
Activity
Dec ’25
Race conditions when changing CAMetalLayer.drawableSize?
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size. Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right? Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)? func onWindowResized(newSize: CGSize) { // Called on the Main thread metalLayer.drawableSize = newSize } func onVsync(drawable: MTLDrawable) { // Called on a background rendering thread renderer.renderInto(drawable: drawable) }
Replies
1
Boosts
1
Views
533
Activity
Dec ’25
App Freezes on iPadOS 26.x - GPU Metal Errors
I work on a Qt/QML app that uses Esri Maps SDK for Qt and that is deployed to both Windows and iPads. With a recent iPad OS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users. I was able to reproduce this and grabbed the following error messages when the freeze happens: IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault) IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource) Environment: Qt 6.5.4 (Qt for iOS) Esri Maps SDK for Qt 200.3 iPadOS 26.1 Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set hte target graphics api): QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL) Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple. I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update? Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now. Appreciate any thoughts/insights....
Replies
3
Boosts
0
Views
566
Activity
Dec ’25