Metal
Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Posts under Metal tag (160 posts)
I’m building a professional camera app where users can customize the video recording format and color grading. In the func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) method, I handle video frames and use Metal for real-time color grading. This works well when device.activeColorSpace is sRGB or P3, and the results are great. However, when the color space is HLG_BT2020 or appleLog, the MTKTextureLoader.newTexture(cgImage: cgImage, options: options) method throws an error. After researching, I found that video frames in these color spaces have a bit depth greater than 8 bits per channel after being converted to CGImage, which causes the texture creation to fail. I tried converting the CGImage to a lower bit depth so the texture could be created, but the final output image is garbled and not as expected. Is there a solution to this issue?
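One route I'm considering is to skip the CGImage conversion entirely and wrap the capture pixel buffer's planes directly with a CVMetalTextureCache. A minimal sketch (assuming a biplanar 10-bit capture format and an existing cache named textureCache; the Y'CbCr-to-RGB and HLG/log handling would then live in the grading shader):

import CoreVideo
import Metal

// Sketch only: wrap one plane of the capture CVPixelBuffer in an MTLTexture.
// For 10-bit biplanar formats the luma plane maps to .r16Unorm and the
// chroma plane to .rg16Unorm, so no 8-bit CGImage round trip is needed.
func planeTexture(from pixelBuffer: CVPixelBuffer,
                  plane: Int,
                  format: MTLPixelFormat,
                  cache: CVMetalTextureCache) -> MTLTexture? {
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        format, width, height, plane, &cvTexture)
    guard status == kCVReturnSuccess, let cvTexture = cvTexture else { return nil }
    // Note: in a real renderer, also keep cvTexture alive until the GPU finishes the frame.
    return CVMetalTextureGetTexture(cvTexture)
}

// Usage (plane indices/formats assume a biplanar 10-bit buffer):
// let luma   = planeTexture(from: buffer, plane: 0, format: .r16Unorm,  cache: textureCache)
// let chroma = planeTexture(from: buffer, plane: 1, format: .rg16Unorm, cache: textureCache)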
Currently I am using a mixed-style immersive space to place both my WindowView (plain style) and ImmersiveView content together. The issue is that depth testing during rendering always lets the virtual content occlude my normal WindowView. Is it possible to force the windowed view to always display in front of my virtual content in mixed-style immersion? (I know about modelSortGroup, but it doesn't quite fit here.)
Or, can I dynamically change the .progressive value while the immersive space is open? (Setting the value to zero effectively means .mixed itself, right?)
Hi,
I am working with a large project. We compile each material to its own .metallib. They all include many common files full of inline functions. Finally, we link it all together at the end with a single big path-trace kernel. Everything works as expected; however, the compile times have gotten completely out of hand, and it takes multiple minutes to compile to native code at runtime. I have gathered that I can do this offline by using metal-tt, but I am wondering if there is a way to reduce the compile times in such a scenario, and how to investigate the root cause of the problem. I suspect it could have to do with the fact that every material's metallib contains duplicates of all the inline functions. Any ideas on how to profile and debug this?
Thanks,
Rasmus
Hello. In the iOS app I'm working on we are very tight on memory budget, and I was looking at ways to reduce our texture memory usage. However, I noticed that comparing ASTC 8x8 to ASTC 12x12, there is no actual difference in allocated memory for most of our textures, despite ASTC 12x12 having less than half the bits per pixel of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with ASTC 12x12 for most textures, there is over a 100 MB difference between the ASTC size on disk and the size when loaded, so I would love to be able to recover even a portion of that memory.
Here are some test code and measurements I've taken on an iPhone 11:
for (int i = 0; i < 11; i++) {
    MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
    texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
    int dim = 12;
    int n = 2 << i;
    int mips = i + 1;
    texDesc.width = n;
    texDesc.height = n;
    texDesc.mipmapLevelCount = mips;
    texDesc.resourceOptions = MTLResourceStorageModeShared;
    texDesc.usage = MTLTextureUsageShaderRead;

    // Calculate the equivalent astc texture size
    int blocks = 0;
    if (mips == 1) {
        blocks = n/dim + (n%dim>0? 1 : 0);
        blocks *= blocks;
    } else {
        for (int j = 0; j < mips; j++) {
            int a = 2 << j;
            int cur = a/dim + (a%dim>0? 1 : 0);
            blocks += cur*cur;
        }
    }

    auto tex = [objCObj newTextureWithDescriptor:texDesc];
    printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize);
}
MTLPixelFormatASTC_12x12_LDR
128x128, mips 7, Astc: 2768, Metal: 6016
256x256, mips 8, Astc: 10512, Metal: 32768
512x512, mips 9, Astc: 40096, Metal: 98304
1024x1024, mips 10, Astc: 158432, Metal: 262144
128x128, mips 1, Astc: 1936, Metal: 4096
256x256, mips 1, Astc: 7744, Metal: 16384
512x512, mips 1, Astc: 29584, Metal: 65536
1024x1024, mips 1, Astc: 118336, Metal: 147456
MTLPixelFormatASTC_8x8_LDR
128x128, mips 7, Astc: 5488, Metal: 6016
256x256, mips 8, Astc: 21872, Metal: 32768
512x512, mips 9, Astc: 87408, Metal: 98304
1024x1024, mips 10, Astc: 349552, Metal: 360448
128x128, mips 1, Astc: 4096, Metal: 4096
256x256, mips 1, Astc: 16384, Metal: 16384
512x512, mips 1, Astc: 65536, Metal: 65536
1024x1024, mips 1, Astc: 262144, Metal: 262144
I also tried using MTLHeaps (placement and automatic) hoping they might be better, but saw nearly the same numbers.
Is there any way to have Metal allocate these textures in a more compact way to save on memory?
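For reference, the size and alignment Metal itself requires for a given descriptor can also be queried directly, which makes it easier to see how much of the gap is mandatory alignment versus texture data. A small sketch (in Swift for brevity, assuming device is the MTLDevice):

import Metal

// Sketch: ask Metal what it would require for the same descriptor, so the
// padding can be compared against the raw ASTC block payload.
let texDesc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .astc_12x12_ldr,
                                                       width: 1024,
                                                       height: 1024,
                                                       mipmapped: true)
texDesc.storageMode = .shared
texDesc.usage = .shaderRead
let sizeAlign = device.heapTextureSizeAndAlign(descriptor: texDesc)
print("required size: \(sizeAlign.size), alignment: \(sizeAlign.align)")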
I am working on a project for macOS where I am taking an AVCaptureSession's CVPixelBuffer and I need to convert it into an MTLTexture for rendering. On macOS the pixel format is '2vuy', and there does not seem to be an obvious matching Metal pixel format for the conversion. I have been able to convert it to a texture, but the colors are distorted and the image appears doubled, so something about the format handling seems off.
I believe '2vuy' is a single-plane format and I have tried to account for that, but I am unaware of what is wrong.
I have attached the CVPixelBuffer, the distorted MTLTexture, and a laundry list of errors.
On iOS my conversions are fine; it is only the macOS '2vuy' pixel format that seems to have issues.
My code for the conversion is also attached.
If there are any suggestions or guidance on how to properly convert a '2vuy' CVPixelBuffer to an MTLTexture, I would greatly appreciate it.
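For reference, the conversion path I would expect to work for '2vuy' goes through a CVMetalTextureCache with the .gbgr422 Metal pixel format. A minimal sketch (the cache creation and error handling are assumed, and the Y'CbCr-to-RGB conversion still has to happen in the shader):

import CoreVideo
import Metal

// Sketch: wrap a '2vuy' (kCVPixelFormatType_422YpCbCr8) CVPixelBuffer in an
// MTLTexture. '2vuy' is a single-plane, interleaved Cb Y'0 Cr Y'1 layout, so
// the matching Metal format is .gbgr422 at the buffer's full width and height.
func makeTexture(from pixelBuffer: CVPixelBuffer,
                 cache: CVMetalTextureCache) -> MTLTexture? {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        .gbgr422, width, height, 0, &cvTexture)   // plane index 0: single-plane format
    guard status == kCVReturnSuccess, let cvTexture = cvTexture else { return nil }
    // The sampled values are still video-range Y'CbCr; convert to RGB in the shader.
    return CVMetalTextureGetTexture(cvTexture)
}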
Many Thanks
Conversion_Logs.txt
ConversionCode.swift
Hello!
I have a question about how threadgroups work with tile shading. When running "traditional" compute, I get to choose both the threadgroup size and the grid size. However, when using a tile shading kernel, I only have the dispatchThreadsPerTile method, which controls how many threads will be run in each tile. So far so good, but what about threadgroups?
The examples in video "Tile Shading on A11" seem to suggest that there will be only one thread group per tile. In the video, [[thread_index_in_threadgroup]] is called "local_id" and it is used to access the image block.
I assume this is the default configuration. So when one does the following (see the sketch below):
Creates MTLRenderPassDescriptor with tileWidth set to W and tileHeight set to H
Fires up the tile shading kernel using dispatchThreadsPerTile with MTLSize size = { W, H, 1 }
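A minimal sketch of that setup (attachment configuration and building the pipeline from the tile function are omitted, and the names are placeholders):

// Sketch: tile dispatch inside a render pass, one thread per tile "pixel".
let passDesc = MTLRenderPassDescriptor()
// ... configure color attachments ...
passDesc.tileWidth = 16     // W
passDesc.tileHeight = 16    // H

let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDesc)!
// tilePipelineState was built from an MTLTileRenderPipelineDescriptor
encoder.setRenderPipelineState(tilePipelineState)
encoder.dispatchThreadsPerTile(MTLSize(width: 16, height: 16, depth: 1))
encoder.endEncoding()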
I understand that the result is 1-to-1 mapping between the tile "pixels" and kernel threads. Now, what I would like to do is to have more than one thread group there. I want this for performance reasons: I have a certain compute kernel which I know executes very well with small thread group size. In fact, { 32, 1, 1 } seems to be the fastest. My understanding is that even if I set tile size to 16x16, and so I am executing 256 threads there, there will only be one SIMD group active in a thread group. Meaning that this SIMD group has to execute 8 times over the tile.
Is it possible somehow? Or perhaps the limitations of the API are pointing at the limitations of hardware itself, and if I want to execute with SIMD group sized thread groups I have to use "traditional" compute encoder?
Will be grateful for help.
Michał
I have this drawing app that I have been working on for the past few years when I have free time. I recently rebuilt the app in Metal to build out other brushes and improve performance, as I need to render tens of thousands of lines in real time.
I'm running into an issue trying to create a uniform opacity per path. I have a solution, but I do not love it, as this is a real-time app and the solution could have some bottlenecks. If I just generate a triangle strip from touch points and do my best to smooth, resample, and handle miters, I will always get some overlaps. See:
To create a uniform opacity I render to an offscreen texture with blending disabled. I then pre-multiply the color and draw that texture to a composite texture with blending on (I do this per path). This works but gets tricky when you introduce a textured brush: the edges of the texture in the frag shader cut out the line.
Pasted Graphic 1.png
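For context, the per-path composite step described above uses blending configured roughly like this (a minimal sketch; library and device are assumed to exist, and the function names and pixel format are placeholders):

// Sketch: composite the offscreen path texture (already premultiplied) over
// the canvas with "source over" blending.
let desc = MTLRenderPipelineDescriptor()
desc.vertexFunction = library.makeFunction(name: "fullscreen_vertex")     // placeholder name
desc.fragmentFunction = library.makeFunction(name: "composite_fragment")  // placeholder name
desc.colorAttachments[0].pixelFormat = .bgra8Unorm
desc.colorAttachments[0].isBlendingEnabled = true
desc.colorAttachments[0].sourceRGBBlendFactor = .one                      // premultiplied source
desc.colorAttachments[0].sourceAlphaBlendFactor = .one
desc.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
desc.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha
let compositePipeline = try device.makeRenderPipelineState(descriptor: desc)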
Solution: I discard below a threshold
fragment float4 fragment_line(VertexOut in [[stage_in]],
                              texture2d<float> texture [[ texture(0) ]]) {
    constexpr sampler s(coord::normalized, address::mirrored_repeat, filter::linear);
    float2 texCoord = in.texCoord;
    float4 texColor = texture.sample(s, texCoord);
    if (texColor.a < 0.01) discard_fragment(); // may be slow (from what I read)
    return in.color * texColor;
}
Better but still not perfect.
Question: I'm looking for better ways to create a uniform opacity per path. I tried .max blending, but that prevents any blending with other paths. Any tips or ideas would be much appreciated.
I have been concentrating on developing a visionOS application. While I am currently quite familiar with RealityKit, CompositorServices has also captured my attention, but I have not yet learned it. Could you please clarify whether it is essential for me to learn CompositorServices? Additionally, I would appreciate insights into the respective advantages of RealityKit and CompositorServices.
In this video, tile fragment shading is recommended for image processing. In this example, the unpack function takes two arguments, one of which is RasterizerData. As I understand it, this is the data passed to us from the previous stage (Vertex) of the graphics pipeline.
However, the properties of MTLTileRenderPipelineDescriptor do not include an option for specifying a Vertex function. Therefore, in this render pass, a mix of commands is used: first, a draw command is executed to obtain UV coordinates, and then threads are dispatched.
My question is: without using a draw command, only dispatch, how can I get pixel coordinates in the fragment tile function? For the kernel tile function, everything is clear.
typedef struct
{
    float4 OPTexture       [[ color(0) ]];
    float4 IntermediateTex [[ color(1) ]];
} FragmentIO;

fragment FragmentIO Unpack(RasterizerData in [[ stage_in ]],
                           texture2d<float, access::sample> srcImageTexture [[ texture(0) ]])
{
    FragmentIO out;
    //...
    // Run necessary per-pixel operations
    out.OPTexture = // assign computed value;
    out.IntermediateTex = // assign computed value;
    return out;
}
When I toggle a panel like the navigation sidebar, I get a message in the console. I guess it's not a big issue, but is there a way to fix this message? It appears in every project.
Unable to open mach-O at path: /AppleInternal/Library/BuildRoots/d187757d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Binaries/RenderBox/install/TempContent/Root/System/Library/PrivateFrameworks/RenderBox.framework/Versions/A/Resources/default.metallib Error:2
I am trying to learn Metal development on my MacBook Pro M1 Pro (Sequoia 15.3.1) in an Xcode playground, but when I write these two lines of code:
import Metal
let device = MTLCreateSystemDefaultDevice()!
I get the error The LLDB RPC server has crashed. Any ideas as to what I can do to solve this? I have rebooted the machine and reinstalled Xcode...
My app is running Compute Shaders that use non-uniform thread groups.
When I run the app in the debugger with a simulator target the app crashes on encoder.dispatchThreads and the error message is:
Dispatch Threads with Non-Uniform Threadgroup Size is not supported on this device.
Previously the log output states that:
Metal Shader Validation is unsupported for Simulator.
However:
When I stop the debugger and just run the app in the simulator without the debugger attached, the app just runs fine and does not crash.
The SwiftUI Preview that also triggers the Compute Shader when preparing data also just runs fine without a crash.
I can run and debug on a real device no problem - I just don't have all sizes available.
Is there anything I need to check in my lldb/simulator configuration? It obviously does work, just the debugger cannot really deal with it?
Any input would be nice, as this really slows me down: I have to be extremely careful when debugging on the simulator.
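For what it's worth, a runtime guard like the following sketch (assuming device, encoder, the threadgroup size, and the grid dimensions already exist) would let the same code path run where non-uniform dispatch isn't supported:

// Sketch: fall back to a padded uniform dispatch when the device (or the
// simulator) does not report support for non-uniform threadgroup sizes.
let tgSize = MTLSize(width: 8, height: 8, depth: 1)
let grid = MTLSize(width: imageWidth, height: imageHeight, depth: 1)

if device.supportsFamily(.apple4) {
    // Exact grid; the hardware handles partial threadgroups at the edges.
    encoder.dispatchThreads(grid, threadsPerThreadgroup: tgSize)
} else {
    // Round the grid up to whole threadgroups and bounds-check in the kernel.
    let groups = MTLSize(width: (grid.width + tgSize.width - 1) / tgSize.width,
                         height: (grid.height + tgSize.height - 1) / tgSize.height,
                         depth: 1)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: tgSize)
}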
I’m working on a Vision Pro app using Metal and need to implement multi-pass rendering. Specifically, I want to render intermediate results to a texture, then use that texture in a second pass for post-processing before presenting the final output.
What’s the best approach in visionOS? Should I use multiple render passes in a single command buffer or separate command buffers? Any insights on efficiently handling this in RealityKit or Metal?
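For reference, the single-command-buffer structure I have in mind looks roughly like this (a sketch; the intermediate texture, both pipelines, and the final pass descriptor are assumed to exist):

// Sketch: two render passes encoded into one command buffer.
let commandBuffer = commandQueue.makeCommandBuffer()!

// Pass 1: render the scene into an intermediate texture.
let scenePass = MTLRenderPassDescriptor()
scenePass.colorAttachments[0].texture = offscreenTexture
scenePass.colorAttachments[0].loadAction = .clear
scenePass.colorAttachments[0].storeAction = .store
let sceneEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: scenePass)!
sceneEncoder.setRenderPipelineState(scenePipeline)
// ... draw the scene ...
sceneEncoder.endEncoding()

// Pass 2: post-process by sampling the intermediate texture into the final target.
let postEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: finalPassDescriptor)!
postEncoder.setRenderPipelineState(postProcessPipeline)
postEncoder.setFragmentTexture(offscreenTexture, index: 0)
// ... draw a fullscreen quad ...
postEncoder.endEncoding()

commandBuffer.commit()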
Thanks!
Hi,
Apple’s documentation on Order-Independent Transparency (OIT) describes an approach using image blocks, where an array of size 4 is allocated per fragment to store depth and color in a tile shading compute pass.
However, when increasing the scene’s depth complexity by adding more overlapping quads, the OIT implementation fails due to the fixed array size.
Is there a way to dynamically allocate storage for fragments based on actual depth complexity encountered during rasterization, rather than using a fixed-size array? Specifically, can an adaptive array of fragments be maintained and sorted by depth, where the size grows as needed instead of being limited to 4 entries?
Any insights or alternative approaches would be greatly appreciated.
Thank you!
Hi, I've got a Swift framework with a bunch of Metal files. Currently, to use the Swift package, users have to manually include a separately provided Metal lib in their bundle.
First question: is there a way to make a Metal lib target in a Swift package and just include the .metal files (without a binary asset)?
Second question: if not, Swift 5.3 has resource support; how would you recommend bundling a Metal lib in a Swift package?
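For the resource route, a sketch of what I mean (the target name and the prebuilt default.metallib path are placeholders):

// swift-tools-version:5.3
// Package.swift sketch: ship a prebuilt metallib as a bundle resource.
import PackageDescription

let package = Package(
    name: "MyShaders",
    targets: [
        .target(
            name: "MyShaders",
            resources: [
                .copy("Resources/default.metallib")
            ]
        )
    ]
)

At runtime the library could then be loaded via Bundle.module.url(forResource: "default", withExtension: "metallib") and MTLDevice.makeLibrary(URL:).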
Hello! I'm currently porting a video game console emulator to iOS and I'm trying to make the renderer (tested on macOS) work on iOS as well.
The emulator core is written in C++ and uses metal-cpp for rendering, whereas the iOS frontend is written in Swift with SwiftUI. I have an Objective-C++ bridging header for bridging the Swift and C++ sides.
On the Swift side, I create an MTKView. Inside the MTKView delegate, I run the emulator for 1 video frame and pass it the view's backing layer for it to render the final output image with. The emulator runs and returns, but when it returns I get a crash in Swift land (callstack attached below), inside objc_release, which indicates I'm doing something wrong with memory management.
My bridging interface (ios_driver.h):
#pragma once
#include <Foundation/Foundation.h>
#include <QuartzCore/QuartzCore.h>
void iosCreateEmulator();
void iosRunFrame(CAMetalLayer* layer);
Bridge implementation (ios_driver.mm):
#import <Foundation/Foundation.h>
extern "C" {
#include "ios_driver.h"
}
<...>
#define IOS_EXPORT extern "C" __attribute__((visibility("default")))
std::unique_ptr<Emulator> emulator = nullptr;
IOS_EXPORT void iosCreateEmulator() { ... }
// Runs 1 video frame of the emulator and
IOS_EXPORT void iosRunFrame(CAMetalLayer* layer) {
    void* layerBridged = (__bridge void*)layer;

    // Pass the CAMetalLayer to the emulator
    emulator->getRenderer()->setMTKLayer(layerBridged);

    // Runs the emulator for 1 frame and renders the output image using our layer
    emulator->runFrame();
}
My MTKView delegate:
class Renderer: NSObject, MTKViewDelegate {
    var parent: ContentView
    var device: MTLDevice!

    init(_ parent: ContentView) {
        self.parent = parent
        if let device = MTLCreateSystemDefaultDevice() {
            self.device = device
        }
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        let metalLayer = view.layer as! CAMetalLayer
        // Run the emulator for 1 frame & display the output image
        iosRunFrame(metalLayer)
    }
}
Finally, the emulator's render function that interacts with the layer:
void RendererMTL::setMTKLayer(void* layer) {
    metalLayer = (CA::MetalLayer*)layer;
}

void RendererMTL::display() {
    CA::MetalDrawable* drawable = metalLayer->nextDrawable();
    if (!drawable) {
        return;
    }

    MTL::Texture* texture = drawable->texture();
    <rest of rendering follows here using the drawable & its texture>
}
This is the Swift callstack at the time of the crash:
To my understanding, I shouldn't be violating ARC rules as my bridging header uses CAMetalLayer* instead of void* and Swift will automatically account for ARC when passing CoreFoundation objects to Objective-C. However I don't have any other idea as to what might be causing this. I've been trying to debug this code for a couple of days without much success.
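One thing I have not ruled out is per-frame autorelease handling on the Swift side. A variant of the delegate's draw method with an explicit pool (a sketch, not something I have confirmed fixes the crash) would look like this:

func draw(in view: MTKView) {
    // Wrap the frame in an explicit autorelease pool so autoreleased Metal
    // objects created during the frame (e.g. the drawable) are released
    // deterministically once per frame.
    autoreleasepool {
        let metalLayer = view.layer as! CAMetalLayer
        iosRunFrame(metalLayer)
    }
}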
If you need more info, the emulator code is also on Github
Metal renderer: https://github.com/wheremyfoodat/Panda3DS/blob/ios/src/core/renderer_mtl/renderer_mtl.cpp#L58-L68
Bridge implementation: https://github.com/wheremyfoodat/Panda3DS/blob/ios/src/ios_driver.mm
Bridging header: https://github.com/wheremyfoodat/Panda3DS/blob/ios/include/ios_driver.h
Any help is more than appreciated. Thank you for your time in advance.
I notice that some metal-cpp classes have static functions like
static URL* fileURLWithPath(const class String* pPath);
static class ComputePassDescriptor* computePassDescriptor();
static class AccelerationStructurePassDescriptor* accelerationStructurePassDescriptor();
which return a new object.
These classes also provide alloc and init functions to create objects in the usual way.
For objects created via alloc and init, I use something like NS::SharedPtr or call release directly to free the memory; however, alloc and init are not explicitly called when using these static functions.
I wonder how to correctly free objects created by these static functions. Are they managed by an autorelease pool?
Hello ladies and gentlemen, I'm writing a simple renderer on the main actor using Metal and Swift 6. I am at the stage now where I want to create a render pipeline state using asynchronous API:
@MainActor
class Renderer {
    let opaqueMeshRPS: MTLRenderPipelineState

    init(/*...*/) async throws {
        let descriptor = MTLRenderPipelineDescriptor()
        // ...
        opaqueMeshRPS = try await device.makeRenderPipelineState(descriptor: descriptor)
    }
}
I get a compilation error if I try to use the asynchronous version of the makeRenderPipelineState method:
Non-sendable type 'any MTLRenderPipelineState' returned by implicitly asynchronous call to nonisolated function cannot cross actor boundary
Which is understandable, since MTLRenderPipelineState is not Sendable. But it looks like no matter where or how I try to access this method, I just can't do it - you have this API, but you can't use it, you can only use the synchronous versions.
Am I missing something or is Metal just not usable with Swift 6 right now?
On an iOS 18 phone, I use AVCaptureSession to capture HDR with the 'x420' format. The output CMSampleBuffer is in the HLG color space, and the propagated attachments contain kCVImageBufferAmbientViewingEnvironmentKey and kCVImageBufferSceneIlluminationKey. Now I use a CAMetalLayer to render the CVPixelBuffer to the screen, but the result is brighter than with AVSampleBufferDisplayLayer.
Here is my code.
- (void)_updateColorSpaceIfNeed:(CVPixelBufferRef)pixelBuffer {
    CAMetalLayer *layer = (CAMetalLayer *)_mtkView.layer;
    if (![layer isKindOfClass:CAMetalLayer.class]) return;

    layer.wantsExtendedDynamicRangeContent = YES;

    CFDataRef ambientViewingEnvironment = (CFDataRef)CVBufferCopyAttachment(pixelBuffer, kCVImageBufferAmbientViewingEnvironmentKey, NULL);
    NSData *data = (__bridge NSData *)ambientViewingEnvironment;
    if (ambientViewingEnvironment) CFRelease(ambientViewingEnvironment);

    CAEDRMetadata *metadata = [CAEDRMetadata HLGMetadataWithAmbientViewingEnvironment:data];
    // CAEDRMetadata *metadata = [CAEDRMetadata HLGMetadata];
    layer.EDRMetadata = metadata;
    layer.pixelFormat = MTLPixelFormatRGBA16Float;

    CGColorSpaceRef colorspace = CGColorSpaceCreateWithName(kCGColorSpaceITUR_2100_HLG);
    layer.colorspace = colorspace;
    if (colorspace) CGColorSpaceRelease(colorspace);
}
Why does the CAEDRMetadata class have "HLGMetadataWithAmbientViewingEnvironment:" and "HLGMetadata" methods, but does not provide the "HLGMetadataWithAmbientViewingEnvironment:sceneIllumination" method?
I want to know how kCVImageBufferAmbientViewingEnvironmentKey and kCVImageBufferSceneIlluminationKey affect tone mapping. Is there any documentation I can refer to?