We set the CVDisplayLink on macOS to 0 or 120, and get the following. This then clamps maximum refresh to 60Hz on the 120Hz ProMotion display on a MBP M2 Max laptop. How is this not fixed in 4 macOS releases?
CoreVideo: currentVBLDelta returned 200000 for display 1 -- ignoring unreasonable value
CoreVideo: [0x7fe2fb816020] Bad CurrentVBLDelta for display 1 is zero. defaulting to 60Hz.
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Can't seem to get the Metal HUD to display value range's (pre 26 Tahoe). The documented environment variable MTL_HUD_SHOW_VALUE_RANGE doesn't seem to work.
https://developer.apple.com/documentation/xcode/monitoring-your-metal-apps-graphics-performance#Display-the-value-range-of-metrics
Anyone having any luck?
Topic:
Graphics & Games
SubTopic:
Metal
Hello,
Shaders in our application is written using HLSL and we rely on Metal Shader Converter to convert DXIL to Metal IR. We ran into an issue that causes metal pipeline state creation to fail when vertex stage-in function is used on AMD GPUs.
Here's the error reported by Metal in Xcode output:
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
XPC_ERROR_CONNECTION_INTERRUPTED
MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 4 try. This error suggests an unexpected interruption in the connection. Possible reasons: a crash in the compiler service, termination by the OS due to resource constraints (e.g., jetsam), a timeout in the service, or an issue with IPC. Verify system stability and check the logs for more details.
Compiler failed with XPC_ERROR_CONNECTION_INVALID
XPC_ERROR_CONNECTION_INVALID
MTLCompiler: Compiler encountered XPC_ERROR_CONNECTION_INVALID: failed to check-in, peer may have been unloaded: mach_error=10000003 (is the OS shutting down or process jetsammed?)
Compilation failed due to an interrupted connection: XPC_ERROR_CONNECTION_INTERRUPTED. This error occurred after multiple retries.
which seems to indicate a internal compiler error.
I have a minimal repro here: https://github.com/kcloudy0717/metal_pso_fail/tree/main, simply follow the instructions in README.
I rewrote my graphics pipeline to use Load/Store better for clearing and don't care cases. All my tests pass, and in the Metal debugger, all the draw calls succeed.
But when I present drawables (before [commandBuffer commit]) I only get a pink screen. I've tried everything I can think of: making sure the pixel formats are the same for the back buffer as my render targets, etc. But it's still pink.
Could you point me in the right direction so I can fix this, or help describe why it's pink. That would be really helpful.
Thank you,
Brian Hapgood
Topic:
Graphics & Games
SubTopic:
Metal
Hi,
I’m using the latest iPad Pro (13-inch) and I can see that Metal offers an rgb10a2unorm texture for rendering, but when I render a grey ramp and measure the actual luminance, I get a pattern that I would expect from an 8-bit texture (see below). Before I start ripping apart all my code, is there anything else I need to do to convince iOS to render my texture in 10-bit?
I already tried setting the PixelFormat in my CMetalLayer to rgb10a2unorm, but that didn’t change anything.
Hello, I am quite new to using the metal API and was wondering if it was common (or even possible) if you knew that, when a pipeline was created, you never needed to make another one with the same shaders again, if it is safe to release the library the was used to reference the shaders? Only asking because this is possible in other apis, but apple never mentions (as far as I have found) if this is safe or not safe to do.
Topic:
Graphics & Games
SubTopic:
Metal
Hello. In the iOS app i'm working on we are very tight on memory budget and I was looking at ways to reduce our texture memory usage. However I noticed that comparing ASTC8x8 to ASTC12x12, there is no actual difference in allocated memory for most of our textures despite ASTC12x12 having less than half the bpp of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with astc12x12 for most textures there is over a 100mb difference in astc size on disk versus when loaded, so I would love to be able to recover even a portion of that memory.
Here is some test code with some measurements i've taken using an iphone 11:
for(int i = 0; i < 11; i++) {
MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
int dim = 12;
int n = 2 << i;
int mips = i+1;
texDesc.width = n;
texDesc.height = n;
texDesc.mipmapLevelCount = mips;
texDesc.resourceOptions = MTLResourceStorageModeShared;
texDesc.usage = MTLTextureUsageShaderRead;
// Calculate the equivalent astc texture size
int blocks = 0;
if(mips == 1) {
blocks = n/dim + (n%dim>0? 1 : 0);
blocks *= blocks;
} else {
for(int j = 0; j < mips; j++) {
int a = 2 << j;
int cur = a/dim + (a%dim>0? 1 : 0);
blocks += cur*cur;
}
}
auto tex = [objCObj newTextureWithDescriptor:texDesc];
printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize);
}
MTLPixelFormatASTC_12x12_LDR
128x128, mips 7, Astc: 2768, Metal: 6016
256x256, mips 8, Astc: 10512, Metal: 32768
512x512, mips 9, Astc: 40096, Metal: 98304
1024x1024, mips 10, Astc: 158432, Metal: 262144
128x128, mips 1, Astc: 1936, Metal: 4096
256x256, mips 1, Astc: 7744, Metal: 16384
512x512, mips 1, Astc: 29584, Metal: 65536
1024x1024, mips 1, Astc: 118336, Metal: 147456
MTLPixelFormatASTC_8x8_LDR
128x128, mips 7, Astc: 5488, Metal: 6016
256x256, mips 8, Astc: 21872, Metal: 32768
512x512, mips 9, Astc: 87408, Metal: 98304
1024x1024, mips 10, Astc: 349552, Metal: 360448
128x128, mips 1, Astc: 4096, Metal: 4096
256x256, mips 1, Astc: 16384, Metal: 16384
512x512, mips 1, Astc: 65536, Metal: 65536
1024x1024, mips 1, Astc: 262144, Metal: 262144
I also tried using MTLHeaps (placement and automatic) hoping they might be better, but saw nearly the same numbers.
Is there any way to have metal allocate these textures in a more compact way to save on memory?
Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks.
My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance?
To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work?
Thanks!
my app use mtkview to render video, but [MTKView initwithFrame:device] takes 2-3s in some some 2019 macbook pro, system macos 15.0.1.
how can I do?
Hello, I am using MTKView to display: camera preview & video playback. I am testing on iPhone 16. App crashes at a random moment whenever MTKView is rendering CIImage.
MetalView:
public enum MetalActionType {
case image(CIImage)
case buffer(CVPixelBuffer)
}
public struct MetalView: UIViewRepresentable {
let mtkView = MTKView()
public let actionPublisher: any Publisher<MetalActionType, Never>
public func makeCoordinator() -> Coordinator {
Coordinator(self)
}
public func makeUIView(context: UIViewRepresentableContext<MetalView>) -> MTKView {
guard let metalDevice = MTLCreateSystemDefaultDevice() else {
return mtkView
}
mtkView.device = metalDevice
mtkView.framebufferOnly = false
mtkView.clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0)
mtkView.drawableSize = mtkView.frame.size
mtkView.delegate = context.coordinator
mtkView.isPaused = true
mtkView.enableSetNeedsDisplay = true
mtkView.preferredFramesPerSecond = 60
context.coordinator.ciContext = CIContext(
mtlDevice: metalDevice, options: [.priorityRequestLow: true, .highQualityDownsample: false])
context.coordinator.metalCommandQueue = metalDevice.makeCommandQueue()
context.coordinator.actionSubscriber = actionPublisher.sink { type in
switch type {
case .buffer(let pixelBuffer):
context.coordinator.updateCIImage(pixelBuffer)
break
case .image(let image):
context.coordinator.updateCIImage(image)
break
}
}
return mtkView
}
public func updateUIView(_ nsView: MTKView, context: UIViewRepresentableContext<MetalView>) {
}
public class Coordinator: NSObject, MTKViewDelegate {
var parent: MetalView
var metalCommandQueue: MTLCommandQueue!
var ciContext: CIContext!
private var image: CIImage? {
didSet {
Task { @MainActor in
self.parent.mtkView.setNeedsDisplay() //<--- call Draw method
}
}
}
var actionSubscriber: (any Combine.Cancellable)?
private let operationQueue = OperationQueue()
init(_ parent: MetalView) {
self.parent = parent
operationQueue.qualityOfService = .background
super.init()
}
public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
public func draw(in view: MTKView) {
guard let drawable = view.currentDrawable, let ciImage = image,
let commandBuffer = metalCommandQueue.makeCommandBuffer(), let ci = ciContext
else {
return
}
//making sure nothing is nil, now we can add the current frame to the operationQueue for processing
operationQueue.addOperation(
MetalOperation(
drawable: drawable, drawableSize: view.drawableSize, ciImage: ciImage,
commandBuffer: commandBuffer, pixelFormat: view.colorPixelFormat, ciContext: ci))
}
//consumed by Subscriber
func updateCIImage(_ img: CIImage) {
image = img
}
//consumed by Subscriber
func updateCIImage(_ buffer: CVPixelBuffer) {
image = CIImage(cvPixelBuffer: buffer)
}
}
}
now the MetalOperation class:
private class MetalOperation: Operation, @unchecked Sendable {
let drawable: CAMetalDrawable
let drawableSize: CGSize
let ciImage: CIImage
let commandBuffer: MTLCommandBuffer
let pixelFormat: MTLPixelFormat
let ciContext: CIContext
init(
drawable: CAMetalDrawable, drawableSize: CGSize, ciImage: CIImage,
commandBuffer: MTLCommandBuffer, pixelFormat: MTLPixelFormat, ciContext: CIContext
) {
self.drawable = drawable
self.drawableSize = drawableSize
self.ciImage = ciImage
self.commandBuffer = commandBuffer
self.pixelFormat = pixelFormat
self.ciContext = ciContext
}
override func main() {
let width = Int(drawableSize.width)
let height = Int(drawableSize.height)
let ciWidth = Int(ciImage.extent.width) //<-- Thread 22: EXC_BAD_ACCESS (code=1, address=0x5e71f5490) A bad access to memory terminated the process.
let ciHeight = Int(ciImage.extent.height)
let destination = CIRenderDestination(
width: width, height: height, pixelFormat: pixelFormat, commandBuffer: commandBuffer,
mtlTextureProvider: { [self] () -> MTLTexture in
return drawable.texture
})
let transform = CGAffineTransform(
scaleX: CGFloat(width) / CGFloat(ciWidth), y: CGFloat(height) / CGFloat(ciHeight))
do {
try ciContext.startTask(toClear: destination)
try ciContext.startTask(toRender: ciImage.transformed(by: transform), to: destination)
} catch {
}
commandBuffer.present(drawable)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
}
}
Now I am no Metal expert, but I believe it's a very simple execution that shouldn't cause memory leak especially after we have already checked for whether CIImage is nil or not. I have also tried running this code without OperationQueue and also tried with @autoreleasepool but none of them has solved this problem.
Am I missing something?
I’ve been trying to run Jurassic World Evolution 2 using the Game Porting Toolkit on macOS, but the game doesn’t launch and crashes immediately. Based on the error and research, it seems the issue is related to missing support for D3D12_TILED_RESOURCES_TIER_2 in the Metal API.
If this is the case, does anyone know if support for tiled resources is planned for future updates of the toolkit? Or are there any potential workarounds for bypassing this limitation?
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area.
Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong.
UIImage *tempImg = [UIImage imageWithData:data];
CGImageRef cgRef = tempImg.CGImage;
MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device];
id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
NSData *tempData = UIImagePNGRepresentation(tempImg);
id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
}] resume];
What evidence exists that it's safe to call nextDrawable() on CAMetalLayer off the main thread? I have seen developers claiming that it's OK, but the official docs are silent on the topic. Attempting to do so with Strict Concurrency Checking set to Complete complains that CAMetalLayer is not @Sendable.
I want to call it off the main thread since there doesn't seem to be any way to prevent it from blocking the UI for up to a second. I have read hints and allegations that this won't happen if you avoid asking for too many drawables, but that doesn't seem to be true 100% of the time in my experience.
Supposing it is allowed, I wonder how races are handled such as when the layer's size is changed on the main thread, or if the layer is removed from the layer hierarchy.
Topic:
Graphics & Games
SubTopic:
Metal
I am trying to use the SVGF denoiser to denoise my ray traced shadows (and also other textures later). I do get a smoothed image, but with wonky denoising.
I need the depth-normal textures and motion textures for the SVGF and assume that these are badly filled in my case. However, neither in the above linked documentation nor in the WWDC19 video I find how they should be defined. I am looking to answers to:
Is depth in red or alpha channel for the depth-normal texture?
Are the normals in screen space?
Is depth linear?
Is it distance or z coordinate in view space? Or even logarithmically scaled or something else?
Are the motion vectors supposed to be in pixels per frame?
What is the orientation of the axis? Is y up or down?
Are there are other restrictions on the formats?
Also the linked code did not help me (I have not found any SVGF so far; also all the code is in Objective-C++, not Swift, but that's a different topic).
So how should I fill these textures.
Can someone point me to the documentation where these kinds of questions are answered?
Hello!
I need to "draw" a set of particles into the texture. It would be trivial in render encoder of course. However, I would like to implement the task in compute kernel. Every particle draw operation is expected to set 5 texels - "center" one and left/right/upper/lower. Particles can and will overlap, so concurrent draws are to be expected.
I tried using texture atomics - atomic_store() to be more precise. This worked, albeit pretty slowly - too slow for my purpose.
Just to test what would happen, I tried using normal texture write(). I was expecting to see some kind of visual artefacts, but to my surprise, it worked very well (and much faster).
My question: is it safe? I understand that calling write() doesn't guarantee any ordering of the operations, so if multiple threads write to the same texel, the final value may come from any of those threads. But suppose all the threads were to write the very same color? Can I assume that the texel in question will have said color after the compute kernel finishes?
I am using M2 Pro MacBook, but ideally I would love to get the answer for the all Apple Silicon devices. My texture format is R32Int (so as to be able to use atomics), but I could do with any single-channel format, the purpose of the texture is to be binary mask of sorts.
Thanks!
I used xcode gpu capture to profile render pipeline's bandwidth of my game.Then i found depth buffer and stencil buffer use the same buffer whitch it's format is Depth32Float_Stencil8.
But why in a single pass of pipeline, this buffer was loaded twice, and the Load Attachment Size of Encoder Statistics was double.
Is there any bug with xcode gpu capture?Or the pass really loaded the buffer twice times?
Topic:
Graphics & Games
SubTopic:
Metal
Hi,
Apple’s documentation on Order-Independent Transparency (OIT) describes an approach using image blocks, where an array of size 4 is allocated per fragment to store depth and color in a tile shading compute pass.
However, when increasing the scene’s depth complexity by adding more overlapping quads, the OIT implementation fails due to the fixed array size.
Is there a way to dynamically allocate storage for fragments based on actual depth complexity encountered during rasterization, rather than using a fixed-size array? Specifically, can an adaptive array of fragments be maintained and sorted by depth, where the size grows as needed instead of being limited to 4 entries?
Any insights or alternative approaches would be greatly appreciated.
Thank you!
Hello! I'm currently porting a videogame console emulator to iOS and I'm trying to make the renderer (tested on MacOS) work on iOS as well.
The emulator core is written in C++ and uses metal-cpp for rendering, whereas the iOS frontend is written in Swift with SwiftUI. I have an Objective-C++ bridging header for bridging the Swift and C++ sides.
On the Swift side, I create an MTKView. Inside the MTKView delegate, I run the emulator for 1 video frame and pass it the view's backing layer for it to render the final output image with. The emulator runs and returns, but when it returns I get a crash in Swift land (callstack attached below), inside objc_release, which indicates I'm doing something wrong with memory management.
My bridging interface (ios_driver.h):
#pragma once
#include <Foundation/Foundation.h>
#include <QuartzCore/QuartzCore.h>
void iosCreateEmulator();
void iosRunFrame(CAMetalLayer* layer);
Bridge implementation (ios_driver.mm):
#import <Foundation/Foundation.h>
extern "C" {
#include "ios_driver.h"
}
<...>
#define IOS_EXPORT extern "C" __attribute__((visibility("default")))
std::unique_ptr<Emulator> emulator = nullptr;
IOS_EXPORT void iosCreateEmulator() { ... }
// Runs 1 video frame of the emulator and
IOS_EXPORT void iosRunFrame(CAMetalLayer* layer) {
void* layerBridged = (__bridge void*)layer;
// Pass the CAMetalLayer to the emulator
emulator->getRenderer()->setMTKLayer(layerBridged);
// Runs the emulator for 1 frame and renders the output image using our layer
emulator->runFrame();
}
My MTKView delegate:
class Renderer: NSObject, MTKViewDelegate {
var parent: ContentView
var device: MTLDevice!
init(_ parent: ContentView) {
self.parent = parent
if let device = MTLCreateSystemDefaultDevice() {
self.device = device
}
super.init()
}
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}
func draw(in view: MTKView) {
var metalLayer = view.layer as! CAMetalLayer
// Run the emulator for 1 frame & display the output image
iosRunFrame(metalLayer)
}
}
Finally, the emulator's render function that interacts with the layer:
void RendererMTL::setMTKLayer(void* layer) {
metalLayer = (CA::MetalLayer*)layer;
}
void RendererMTL::display() {
CA::MetalDrawable* drawable = metalLayer->nextDrawable();
if (!drawable) {
return;
}
MTL::Texture* texture = drawable->texture();
<rest of rendering follows here using the drawable & its texture>
}
This is the Swift callstack at the time of the crash:
To my understanding, I shouldn't be violating ARC rules as my bridging header uses CAMetalLayer* instead of void* and Swift will automatically account for ARC when passing CoreFoundation objects to Objective-C. However I don't have any other idea as to what might be causing this. I've been trying to debug this code for a couple of days without much success.
If you need more info, the emulator code is also on Github
Metal renderer: https://github.com/wheremyfoodat/Panda3DS/blob/ios/src/core/renderer_mtl/renderer_mtl.cpp#L58-L68
Bridge implementation: https://github.com/wheremyfoodat/Panda3DS/blob/ios/src/ios_driver.mm
Bridging header: https://github.com/wheremyfoodat/Panda3DS/blob/ios/include/ios_driver.h
Any help is more than appreciated. Thank you for your time in advance.
The flushContextInternal function in glr_sync.mm:262 called abort internally. What caused this? Was it due to high device temperature or some other reason?
Date/Time: 2024-08-29 09:20:09.3102 +0800
Launch Time: 2024-08-29 08:53:11.3878 +0800
OS Version: iPhone OS 16.7.10 (20H350)
Release Type: User
Baseband Version: 8.50.04
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Triggered by Thread: 0
Thread 0 name:
Thread 0 Crashed:
0 libsystem_kernel.dylib 0x00000001ed053198 __pthread_kill + 8 (:-1)
1 libsystem_pthread.dylib 0x00000001fc5e25f8 pthread_kill + 208 (pthread.c:1670)
2 libsystem_c.dylib 0x00000001b869c4b8 abort + 124 (abort.c:118)
3 AppleMetalGLRenderer 0x00000002349f574c GLDContextRec::flushContextInternal() + 700 (glr_sync.mm:262)
4 DiSpecialDriver 0x000000010824b07c Di::RHI::onRenderFrameEnd() + 184 (RHIDevice.cpp:118)
5 DiSpecialDriver 0x00000001081b85f8 Di::Client::drawFrame() + 120 (Client.cpp:155)
2024-08-27_14-44-10.8104_+0800-07d9de9207ce4c73289507e608e5de4320d02ccf.crash
Topic:
Graphics & Games
SubTopic:
Metal
I have this drawing app that I have been working on for the past few years when I have free time. I recently rebuilt the app in Metal to build out other brushes and improve performance, need to render 10000s of lines in realtime.
I’m running into this issue trying to create a uniform opacity per path. I have a solution but do not love it - as this is a realtime app and the solution could have some bottlenecks. If I just generate a triangle strip from touch points and do my best to smooth, resample, and handle miters I will always get some overlaps. See:
To create a uniform opacity I render to an offscreen texture with blending disabled. I then pre-multiply the color and draw that texture to a composite texture with blending on (I do this per path). This works but gets tricky when you introduce a textured brush, the edges of the texture in the frag shader cut out the line.
Pasted Graphic 1.png
Solution: I discard below a threshold
fragment float4 fragment_line(VertexOut in [[stage_in]],
texture2d<float> texture [[ texture(0) ]]) {
constexpr sampler s(coord::normalized, address::mirrored_repeat, filter::linear);
float2 texCoord = in.texCoord;
float4 texColor = texture.sample(s, texCoord);
if (texColor.a < 0.01) discard_fragment(); // may be slow (from what I read)
return in.color * texColor;
}
Better but still not perfect.
Question: I'm looking for better ways to create a uniform opacity per path. I tried .max blending but that will cause no blending of other paths. Any tips, ideas, much appreciated. If this is too detailed of a question just achieve.