I'm using a class with tangents to render on RealityKit for VisionOS but in Vision26 it cause a crash on App and there not documentation how implement cp_drawable_compute_projection I have tried a few options but without success. Could you help me to implement it ?
The part of code is:
return drawable.views.map { view in
let userViewpointMatrix = (simdDeviceAnchor * view.transform).inverse
let projectionMatrix = ProjectiveTransform3D(
leftTangent: Double(view.tangents[0]),
rightTangent: Double(view.tangents[1]),
topTangent: Double(view.tangents[2]),
bottomTangent: Double(view.tangents[3]),
nearZ: Double(drawable.depthRange.y),
farZ: Double(drawable.depthRange.x),
reverseZ: true
)
let screenSize = SIMD2(x: Int(view.textureMap.viewport.width),
y: Int(view.textureMap.viewport.height))
return ModelRendererViewportDescriptor(viewport: view.textureMap.viewport,
projectionMatrix: .init(projectionMatrix),
viewMatrix: userViewpointMatrix * translationMatrix * rotationMatrix * scalingMatrix * commonUpCalibration,
screenSize: screenSize)
}
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Posts under Metal tag
149 Posts
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I maintain a couple of CoreImage libraries that provide custom Metal kernel backed CIFilters. In iOS/iPadOS 26, the CIColorKernel.apply() method invoked in the CIFilter subclass fails to add the coreimage::destination parameter to the Metal function call:
-[CIColorKernel applyWithExtent:arguments:options:] argument count mismatch for kernel 'FractalNoise3D', expected 13 but saw 12.
I've compiled the code with Xcode 26 and deployed to iOS 18 devices without any breakage, so this is definitely an iOS problem, not an Xcode problem.
Library here: https://github.com/JoshuaSullivan/SimplexNoiseFilter
Feedback ID: FB17874311
I'm building a weather map that shows the rain on the map. I'm able to retrieve PNG images that are used as tiles to put onto the map. I then reload all the tiles on the map with each timeframe (tile set for every 10 minutes).
I'm able to get the map loaded up and I'm able to place the tiles and reload the data for each time slot. I preload all the PNG data needed for the tiles and store that NSData for them in memory so that they are quick for loading and showing on the map.
I have timer's set to reload the overlay with the next set of tiles for each time slot. Giving the view of a moving precipitation map over time (just like you'd see on any weather map.)
I have 12 time slots (timestamps) showing every 10 minutes for the past 2 hours. I have it showing each in sequence and then repeating.
Over time I get a crash with this error as a Thread 1: signal SIGABRT.
Failed to acquire drawable, rendering to temporary texture
validateRenderPassDescriptor:782: failed assertion `RenderPass Descriptor Validation
MTLRenderPassAttachmentDescriptor MTLStoreActionMultisampleResolve store action at attachment 0 requires resolve texture
'
validateRenderPassDescriptor:782: failed assertion `RenderPass Descriptor Validation
MTLRenderPassAttachmentDescriptor MTLStoreActionMultisampleResolve store action at attachment 0 requires resolve texture
'
Through some searching I've discovered that this seems to be console output from Metal. I assume Metal is used for MapKit to render the overlay tiles?
I'm using the same custom overlay where I set the timestamp on it and then tell it to reload. I also reuse the same MKOverlayRenderer as shown here...
- (MKOverlayRenderer*)mapView:(MKMapView*)mapView rendererForOverlay:(id<MKOverlay>)overlay
{
if ([overlay isKindOfClass:[MKTileOverlay class]]) {
if (!self.rainRenderer) {
self.rainRenderer = [[MKTileOverlayRenderer alloc] initWithTileOverlay:overlay];
self.rainRenderer.alpha = 0.5;
}
return self.rainRenderer;
}
return nil;
}
And here's the function that reloads the overlay...
- (void) updateRainFrame {
self.currentFrameIndex = (self.currentFrameIndex + 1) % self.timestamps.count;
if ((self.currentFrameIndex >= 0) && (self.timestamps.count > self.currentFrameIndex)) {
NSLog (@"self.currentFrameIndex = %lu", self.currentFrameIndex);
NSString *timestamp = self.timestamps[self.currentFrameIndex];
[self.overlay setTimestamp:timestamp];
[self.rainRenderer reloadData];
}
}
The time it takes to crash seems arbitrary. Sometimes it's very quick. Less than a minute. But usually it's several minutes. 10 or 20 minutes or more. Feels like some sort of race condition that's occurring. Perhaps ARC is not able to release the images for the tiles quick enough for each overlay reload? That's a wild guess but I think it's something more deeper in Metal as I feel I would see other errors related to memory availability.
Some of my searches point to something about MSAA needing to be turned off in Metal to resolve this. However I have no idea how I would do that through MapKit.
Any suggestions? Let me know if there is somehow a way to capture more from the crash to give more insight.
I am develop visionOS app. I am now very interested in Metal and Compositor Services, but I have not explored them in depth. I know that Metal has a higher degree of control freedom. I am wondering if using Compositor Services will have fewer functions than RealityKit in AR technology (such as scene reconstruction and understanding, hover effect, etc.).
Just downloaded XCode 26 and I see build fails despite Metal toolchain 26.0 downloaded. What am I missing?
cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
Hi,
I'm developing a SwiftUI app using RealityKit and ARKit for an AR measuring feature. I’ve noticed that after navigating away from my AR view and performing extensive cleanup (including removing all anchors/entities, pausing the ARSession, and nil-ing out all references), memory usage remains elevated and sometimes grows with repeated AR sessions.
Each time I enter and exit the AR view, memory increases
The memory does not return to the baseline after cleanup, even though all custom objects are deallocated.
Are there best practices beyond what I’ve described to ensure all ARKit/RealityKit resources are released after an AR session?
Hey, I need to know how to use texture mapping for rendering a spectrogram in metal. As I need smoothens the spectrogram. In my current project I am using vertex based approach which results in blocky behaviour between each quad. I need to smooth across each qaud so that It will smoothly gradient over.
Hi everyone,
I'm creating an educational App that allows doing computational design in an immersive environment with the Vision Pro. The App is free and can be found here:
https://apps.apple.com/us/app/arcade-topology/id6742103633
The problem I have is that the mesh of voxels I currently create use ModelEntity and I recently read that this is horrible for scalability. I already start to see issues when I try to use thousands of voxels. I also read somewhere that I should then take advantage of GPUs and use metal to that end. I was wondering if someone could point me to a tutorial or article that discusses this. In essence, I need to create a 3D voxel mesh, and those voxels have to update their opacity within an iterative loop.
Thanks!
—Alejandro
Problem Description
I'm encountering an issue with SCNTechnique where the clearColor setting is being ignored when multiple passes share the same depth buffer. The clear color always appears as the scene background, regardless of what value I set. The minimal project for reproducing the issue: https://www.dropbox.com/scl/fi/30mx06xunh75wgl3t4sbd/SCNTechniqueCustomSymbols.zip?rlkey=yuehjtk7xh2pmdbetv2r8t2lx&st=b9uobpkp&dl=0
Problem Details
In my SCNTechnique configuration, I have two passes that need to share the same depth buffer for proper occlusion handling:
"passes": [
"box1_pass": [
"draw": "DRAW_SCENE",
"includeCategoryMask": 1,
"colorStates": [
"clear": true,
"clearColor": "0 0 0 0" // Expecting transparent black
],
"depthStates": [
"clear": true,
"enableWrite": true
],
"outputs": [
"depth": "box1_depth",
"color": "box1_color"
],
],
"box2_pass": [
"draw": "DRAW_SCENE",
"includeCategoryMask": 2,
"colorStates": [
"clear": true,
"clearColor": "0 0 0 0" // Also expecting transparent black
],
"depthStates": [
"clear": false,
"enableWrite": false
],
"outputs": [
"depth": "box1_depth", // Sharing the same depth buffer
"color": "box2_color",
],
],
"final_quad": [
"draw": "DRAW_QUAD",
"metalVertexShader": "myVertexShader",
"metalFragmentShader": "myFragmentShader",
"inputs": [
"box1_color": "box1_color",
"box2_color": "box2_color",
],
"outputs": [
"color": "COLOR"
]
]
]
And the metal shader used to display box1_color and box2_color with splitting:
fragment half4 myFragmentShader(VertexOut in [[stage_in]],
texture2d<half, access::sample> box1_color [[texture(0)]],
texture2d<half, access::sample> box2_color [[texture(1)]]) {
half4 color1 = box1_color.sample(s, in.texcoord);
half4 color2 = box2_color.sample(s, in.texcoord);
if (in.texcoord.x < 0.5) {
return color1;
}
return color2;
};
Expected Behavior
Both passes should clear their color targets to transparent black (0, 0, 0, 0)
The depth buffer should be shared between passes for proper occlusion
Actual Behavior
Both box1_color and box2_color targets contain the scene background instead of being cleared to transparent (see attached image)
This happens even when I explicitly set clearColor: "0 0 0 0" for both passes
Setting scene.background.contents = UIColor.clear makes the clearColor work as expected, but I need to keep the scene background for other purposes
What I've Tried
Setting different clearColor values - all are ignored when sharing depth buffer
Using DRAW_NODE instead of DRAW_SCENE - didn't solve the issue
Creating a separate pass to capture the background - the background still appears in the other passes
Various combinations of clear flags and render orders
Environment
iOS/macOS, running with "My Mac (Designed for iPad)"
Xcode 16.2
Question
Is this a known limitation of SceneKit when passes share a depth buffer? Is there a workaround to achieve truly transparent clear colors while maintaining a shared depth buffer for occlusion testing?
The core issue seems to be that SceneKit automatically renders the scene background in every DRAW_SCENE pass when a shared depth buffer is detected, overriding any clearColor settings.
Any insights or workarounds would be greatly appreciated. Thank you!
Hey, I've been struggling with this for some days now.
I am trying to write to a sparse texture in a compute shader. I'm performing the following steps:
Set up a sparse heap and create a texture from it
Map the whole area of the sparse texture using updateTextureMapping(..)
Overwrite every value with the value "4" in a compute shader
Blit the texture to a shared buffer
Assert that the values in the buffer are "4".
I have a minimal example (which is still pretty long unfortunately).
It works perfectly when removing the line heapDesc.type = .sparse.
What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated.
import Metal
func sparseTexture64x64Demo() throws {
// ── Metal objects
guard let device = MTLCreateSystemDefaultDevice()
else { throw NSError(domain: "SparseNotSupported", code: -1) }
let queue = device.makeCommandQueue()!
let lib = device.makeDefaultLibrary()!
let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!)
// ── Texture descriptor
let width = 64, height = 64
let format: MTLPixelFormat = .r32Uint // 4 B per texel
let desc = MTLTextureDescriptor()
desc.textureType = .type2D
desc.pixelFormat = format
desc.width = width
desc.height = height
desc.storageMode = .private
desc.usage = [.shaderWrite, .shaderRead]
// ── Sparse heap
let bytesPerTile = device.sparseTileSizeInBytes
let meta = device.heapTextureSizeAndAlign(descriptor: desc)
let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile
let heapDesc = MTLHeapDescriptor()
heapDesc.type = .sparse
heapDesc.storageMode = .private
heapDesc.size = heapBytes
let heap = device.makeHeap(descriptor: heapDesc)!
let tex = heap.makeTexture(descriptor: desc)!
// ── CPU buffers
let bytesPerPixel = MemoryLayout<UInt32>.stride
let rowStride = width * bytesPerPixel
let totalBytes = rowStride * height
let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)!
let cb = queue.makeCommandBuffer()!
let fence = device.makeFence()!
// 2. Map the sparse tile, then signal the fence
let rse = cb.makeResourceStateCommandEncoder()!
rse.updateTextureMapping(
tex,
mode: .map,
region: MTLRegionMake2D(0, 0, width, height),
mipLevel: 0,
slice: 0)
rse.update(fence) // ← capture all work so far
rse.endEncoding()
let ce = cb.makeComputeCommandEncoder()!
ce.waitForFence(fence)
ce.setComputePipelineState(pipeline)
ce.setTexture(tex, index: 0)
let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1)
let tgCount = MTLSize(width: (width + 7) / 8,
height: (height + 7) / 8,
depth: 1)
ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG)
ce.updateFence(fence)
ce.endEncoding()
// Blit texture into shared buffer
let blit = cb.makeBlitCommandEncoder()!
blit.waitForFence(fence)
blit.copy(
from: tex,
sourceSlice: 0,
sourceLevel: 0,
sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
sourceSize: MTLSize(width: width, height: height, depth: 1),
to: dstBuf,
destinationOffset: 0,
destinationBytesPerRow: rowStride,
destinationBytesPerImage: totalBytes)
blit.endEncoding()
cb.commit()
cb.waitUntilCompleted()
assert(cb.error == nil, "GPU error: \(String(describing: cb.error))")
// ── Verify a few texels
let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height)
print("first three texels:", out[0], out[1], out[width]) // 0 1 64
assert(out[0] == 4 && out[1] == 4 && out[width] == 4)
}
Metal shader:
#include <metal_stdlib>
using namespace metal;
kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]],
uint2 gid [[thread_position_in_grid]])
{
tex.write(4, gid);
}
I'm trying to build an MDLMesh then add normals
let mdlMesh = MDLMesh.newBox(withDimensions: SIMD3<Float>(1, 1, 1),
segments: SIMD3<UInt32>(2, 2, 2),
geometryType: MDLGeometryType.triangles,
inwardNormals:false,
allocator: allocator)
mdlMesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0)
When I render the mesh, some normals are (0,0,0). I don't know if the problem is in the mesh, or in the conversion to MTKMesh. Is there a way to examine an MDLMesh with the geometry viewer?
When I look at the variable values for my mdlMesh I get this:
Not too useful. I don't know how to track down the normals.
What's the best way to find out where the normals getting broken?
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU.
However it seems that the way in which I am using the GPU, even when my app is fully compute-limited, the OS never increases the power/performance state. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time.
I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away.
For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well.
I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds!
I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.
I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work.
Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up?
Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
I made a box with MDLMesh.newBox(). I added normals.
let mdlMesh = MDLMesh.newBox(withDimensions: SIMD3<Float>(1, 1, 1),
segments: SIMD3<UInt32>(2, 2, 2),
geometryType: MDLGeometryType.triangles,
inwardNormals:false,
allocator: allocator)
mdlMesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0.25)
After I convert to MTKMesh the normals are (0,0,0) for a group of vertices. I can only inspect the geometry after I convert to MTKMesh. Is there a way you can use Geometry Viewer on a MDLMesh?
hi
When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
I am new to Xcode and trying to learn how to use Metal for my internship. I am trying to link the binaries of Foundation.framework, Metal.framework, and Quartcore.framework. But whenever I try to build it always fails to find any of them. I have my Header Search Path as $(PROJECT_DIR)/metal-cpp, I tried adding some for the Frameworks but that did not work either. I do have the binaries linked in the Build Phases, so I don't know what else I could be missing.
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command:
cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/
I keep getting an error:
CMake Error at CMakeLists.txt:5 (include):
include could not find requested file:
/Users/macuser/USDInstall/bin/pxrConfig.cmake
I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
I’m trying to follow Apple’s “WWDC24: Bring your machine learning and AI models to Apple Silicon” session to convert the Mistral-7B-Instruct-v0.2 model into a Core ML package, but I’ve run into a roadblock that I can’t seem to overcome. I’ve uploaded my full conversion script here for reference:
https://pastebin.com/T7Zchzfc
When I run the script, it progresses through tracing and MIL conversion but then fails at the backend_mlprogram stage with this error:
https://pastebin.com/fUdEzzKM
The core of the error is:
ValueError: Op "keyCache_tmp" (op_type: identity) Input x="keyCache" expects list, tensor, or scalar but got state[tensor[1,32,8,2048,128,fp16]]
I’ve registered my KV-cache buffers in a StatefulMistralWrapper subclass of nn.Module, matching the keyCache and valueCache state names in my ct.StateType definitions, but Core ML’s backend pass reports the state tensor as an invalid input. I’m using Core ML Tools 8.3.0 on Python 3.9.6, targeting iOS18, and forcing CPU conversion (MPS wasn’t available). Any pointers on how to satisfy the handle_unused_inputs pass or properly declare/cache state for GQA models in Core ML would be greatly appreciated!
Thanks in advance for your help,
Usman Khan
Topic:
Machine Learning & AI
SubTopic:
Core ML
Tags:
Metal
Metal Performance Shaders
Core ML
tensorflow-metal
Hi,
seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example..
useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU..
also useful for MoltenVK to support that extensions..
thanks..
Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks.
My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance?
To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work?
Thanks!
Starting with iOS 18.0 beta 1, I've noticed that RealityKit frequently crashes in the simulator when an app launches and presents an ARView.
I was able to create a small sample app with repro steps that demonstrates the issue, and I've submitted feedback: FB16144085
I've included a crash log with the feedback.
If possible, I'd appreciate it if an Apple engineer could investigate and suggest a workaround. It's awkward to be restricted to the iOS 17 simulator, which does not exhibit this behavior.
Please let me know if there's anything I can do to help.
Thank you.