Here is the test code, run in a macOS app (macOS 15 beta 3).
If the executable path contains no Chinese characters, everything works as we expect. Otherwise (simply place the executable in a directory with a Chinese name), the MTLLibrary we create with newLibraryWithSource: contains no functions, and we only get these logs:
"Library contains the following functions: {}"
"Function 'squareKernel' not found."
Note: macOS 14 works fine.
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
if (!device) {
    NSLog(@"Metal is not supported.");
}

NSString *shaderSource =
    @"#include <metal_stdlib>\n"
     "using namespace metal;\n"
     "kernel void squareKernel(device float* data [[buffer(0)]], uint gid [[thread_position_in_grid]]) {\n"
     "    data[gid] *= data[gid];\n"
     "}";

MTLCompileOptions *options = [[MTLCompileOptions alloc] init];
options.languageVersion = MTLLanguageVersion2_0;

NSError *error = nil;
id<MTLLibrary> library = [device newLibraryWithSource:shaderSource options:options error:&error];
if (error) {
    NSLog(@"New MTLLibrary error: %@", error);
}

NSArray<NSString *> *functionNames = [library functionNames];
NSLog(@"Library contains the following functions: %@", functionNames);

id<MTLFunction> computeShaderFunction = [library newFunctionWithName:@"squareKernel"];
if (computeShaderFunction) {
    NSLog(@"Found function 'squareKernel'.");

    NSError *pipelineError = nil;
    id<MTLComputePipelineState> pipelineState = [device newComputePipelineStateWithFunction:computeShaderFunction error:&pipelineError];
    if (pipelineError) {
        NSLog(@"Create pipeline state error: %@", pipelineError);
    } else {
        NSLog(@"Create pipeline state succeeded!");
    }
} else {
    NSLog(@"Function 'squareKernel' not found.");
}
I had a problem where my app did not work outside Xcode (I had to include MetalKit.framework).
I found a simple trick to capture my NSLog output and locate the issue:
NSURL *downloadsDirectory = [[[NSFileManager defaultManager] URLsForDirectory:NSDownloadsDirectory inDomains:NSUserDomainMask] firstObject];
NSURL *fileURL = [downloadsDirectory URLByAppendingPathComponent:@"Log.txt"];
NSString *filePath = [fileURL path];
// Redirect stderr to the file
freopen([filePath fileSystemRepresentation], "a+", stderr);
Now you get debug info in your Downloads folder!
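If your project is Swift rather than Objective-C, the same trick looks roughly like this (just a sketch; a sandboxed app may need a directory it can actually write to):
import Foundation

// Redirect stderr (where NSLog writes) to a file in Downloads.
if let downloads = FileManager.default.urls(for: .downloadsDirectory, in: .userDomainMask).first {
    let logURL = downloads.appendingPathComponent("Log.txt")
    freopen(logURL.path, "a+", stderr)
}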
I have recently developed an interest in the shader effects commonly found in Apple's UI and have been studying them. Additionally, as I own a Vision Pro, I have a strong desire to understand LowLevelMesh and am currently analyzing the sample code after watching the related session.
The part where I am completely stuck and unable to understand is the initializer section of CurveExtruder.
/// Initializes the `CurveExtruder` with the shape to sweep along the curve.
///
/// - Parameters:
/// - shape: The 2D shape to sweep along the curve.
init(shape: [SIMD2<Float>]) {
    self.shape = shape

    // Compute topology //
    // Triangle fan lists each vertex in `shape` once for each ring, except for vertex `0` of `shape` which
    // is listed twice. Plus one extra index for the end-index (0xFFFFFFFF).
    let indexCountPerFan = 2 * (shape.count + 1) + 1
    var topology: [UInt32] = []
    topology.reserveCapacity(indexCountPerFan)

    // Build triangle fan.
    for vertexIndex in shape.indices.reversed() {
        topology.append(UInt32(vertexIndex))
        topology.append(UInt32(shape.count + vertexIndex))
    }

    // Wrap around to the first vertex.
    topology.append(UInt32(shape.count - 1))
    topology.append(UInt32(2 * shape.count - 1))

    // Add end-index.
    topology.append(UInt32.max)

    assert(topology.count == indexCountPerFan)
I have tried to understand why the capacity reserved for the topology array is 2 * (shape.count + 1) + 1, but I am struggling to figure it out.
I do not understand the principle behind the order in which vertexIndex is added to the topology.
The confusion is even greater because, while the comment mentions a triangle fan, the actual creation of the LowLevelMesh.Part object uses the topology: .triangleStrip argument. (Did I misunderstand? I know the topology options include triangle, but that would use duplicated vertices.)
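For what it's worth, this is as far as my own counting of the reserved capacity got, using a hypothetical square cross-section (shape.count == 4); it is the vertex ordering and the fan-versus-strip question that I still cannot follow:
// Hypothetical input: a square cross-section, so shape.count == 4.
// The reversed loop appends 2 indices per shape vertex:  2 * 4 = 8
// The wrap-around appends 2 more indices:                8 + 2 = 10 == 2 * (4 + 1)
// The end-index sentinel (UInt32.max) appends 1 more:    10 + 1 = 11 == 2 * (4 + 1) + 1
// which matches indexCountPerFan and the final assert.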
I am feeling very stuck. It's hard to find answers even through search options or LLMs. Maybe this requires specialized knowledge in computer graphics, which makes me feel embarrassed to ask.
However, personally, I have tried various directions without external help but still cannot find a clear path, so I am desperately seeking assistance!
P.S. As Korean is my primary language, I apologize in advance if there are any awkward or rude expressions.
Hey, I’m building a camera app where I am applying real-time effects to the viewfinder. One of those effects is a variable blur, so to improve performance I am scaling down the input image using CIFilter.lanczosScaleTransform(). This works fine and runs at 30 FPS, but when running the Metal profiler I can see that the scaling transforms use a lot of GPU time, almost as much as the variable blur. Is there a more efficient way to do this?
The simplified chain is like this (roughly sketched in code after the list):
Scale down viewFinder CVPixelBuffer (CIFilter.lanczosScaleTransform)
Scale up depthMap CVPixelBuffer to match viewFinder size (CIFilter.lanczosScaleTransform)
Create CIImages from both CVPixelBuffers
Apply VariableDepthBlur (CIFilter.maskedVariableBlur)
Scale up final image to metal view size (CIFilter.lanczosScaleTransform)
Render CIImage to a MTKView using CIRenderDestination
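Roughly, in code, the chain looks like this (a simplified sketch: the function name, the fixed radius, and the scale math are placeholders, and the CIRenderDestination/MTKView step is omitted):
import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

// Simplified per-frame chain; filter/context caching and error handling omitted.
func processFrame(viewFinder: CVPixelBuffer, depthMap: CVPixelBuffer, downscale: Float) -> CIImage? {
    let viewImage = CIImage(cvPixelBuffer: viewFinder)
    let depthImage = CIImage(cvPixelBuffer: depthMap)

    // 1. Scale the viewfinder down so the blur runs on fewer pixels.
    let scaleDown = CIFilter.lanczosScaleTransform()
    scaleDown.inputImage = viewImage
    scaleDown.scale = downscale

    // 2. Scale the depth map so it matches the (downscaled) viewfinder.
    let scaleMask = CIFilter.lanczosScaleTransform()
    scaleMask.inputImage = depthImage
    scaleMask.scale = Float(viewImage.extent.width) * downscale / Float(depthImage.extent.width)

    // 3. Variable blur driven by the depth map.
    let blur = CIFilter.maskedVariableBlur()
    blur.inputImage = scaleDown.outputImage
    blur.mask = scaleMask.outputImage
    blur.radius = 10

    // 4. Scale back up; in the real app this targets the Metal view's size,
    //    and the result is rendered to the MTKView via CIRenderDestination.
    let scaleUp = CIFilter.lanczosScaleTransform()
    scaleUp.inputImage = blur.outputImage
    scaleUp.scale = 1 / downscale
    return scaleUp.outputImage
}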
From some research, I wonder if scaling the CVPixelBuffer using the Accelerate framework would be faster? Also, instead of scaling the final image, perhaps I could offload this to the Metal view?
Any pointers greatly appreciated!
Hi, all.
I've been writing various computational functions using Metal.
However, in the following kernel, unlike + and *, there is an accuracy issue with the / operation.
This is a function that divides a matrix of shape [n, x, y] by a scalar [1].
Compared to numpy or torch, if I change the operator in this function to * or + instead of /, I get exactly the same results; but with /, the mean differs by more than 1e-5.
(For reference, this was written with reference to the Metal kernel code in llama.cpp.)
kernel void kernel_div_single_f16(
        device const half * src0,
        device const half * src1,
        device       half * dst,
        constant int64_t & ne00,
        constant int64_t & ne01,
        constant int64_t & ne02,
        constant int64_t & ne03,
        uint3 tgpig[[threadgroup_position_in_grid]],
        uint3 tpitg[[thread_position_in_threadgroup]],
        uint3 ntg[[threads_per_threadgroup]]) {
    const int64_t i03 = tgpig.z;
    const int64_t i02 = tgpig.y;
    const int64_t i01 = tgpig.x;
    const uint offset = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00;
    for (int i0 = tpitg.x; i0 < ne00; i0 += ntg.x) {
        dst[offset + i0] = src0[offset + i0] / *src1;
    }
}
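As a quick CPU-side sanity check (Swift, not Metal), I also compared the same divide done in Float16 versus Float; since half only carries an 11-bit significand, rounding differences on the order of 1e-4 at these magnitudes are expected even before any GPU fast-math is involved:
// Float16 in Swift requires Apple silicon (or another arm64 target).
let a: Float16 = 0.3337
let b: Float16 = 1.7
let halfQuotient = Float(a / b)         // divide carried out in half precision
let floatQuotient = Float(a) / Float(b) // divide carried out in single precision
print(halfQuotient, floatQuotient, abs(halfQuotient - floatQuotient))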
My MacBook is a MacBook Pro (16-inch, 2021) / macOS 12.5 / Apple M1 Pro.
Are there any known issues related to division? Thanks in advance for your reply.
Hello!
I noticed that after WWDC 24 there was support added for MTKView in visionOS 1.0+. This is great! But when I use an MTKView in anything before visionOS 2.0 it doesn't work and the app ends up crashing.
Console error when running on a device that is on visionOS 1.2:
Symbol not found: _$s27_CompositorServices_SwiftUI0A5LayerV13configuration8rendererAcA0aE13Configuration_p_ySo019CP_OBJECT_cp_layer_G0CScMYcctcfC
Expected in: <EFD973D2-97E1-380B-B89A-13CC3820B7F7> /System/Library/Frameworks/_CompositorServices_SwiftUI.framework/_CompositorServices_SwiftUI
Looks like MTKView may be using Compositor Services under the hood?
Any help would be great.
Thank you!
I am using Metal for rendering, and calls to Metal's newCommandQueue sometimes return nil. However, MTLCreateSystemDefaultDevice does return a non-nil value, which means my device supports Metal. What causes newCommandQueue to return nil, and is there any way to avoid this situation? Thank you.
tl;dr how can I get raw YUV in a Metal fragment shader from a VideoToolbox 10-bit/BT.2020 HEVC stream without any extra/secret format conversions?
With VideoToolbox and 10-bit HEVC, I've found that it defaults to CVPixelBuffers with the formats kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarFullRange or kCVPixelFormatType_Lossy_420YpCbCr10PackedBiPlanarFullRange. To mitigate this, I added the following snippet of code to my application:
// We need our pixels unpacked for 10-bit so that the Metal textures actually work
var pixelFormat: OSType? = nil
let bpc = getBpcForVideoFormat(videoFormat!)
let isFullRange = getIsFullRangeForVideoFormat(videoFormat!)

// TODO: figure out how to check for 422/444, CVImageBufferChromaLocationBottomField?
if bpc == 10 {
    pixelFormat = isFullRange ? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
}

let videoDecoderSpecification: [NSString: AnyObject] = [kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder: kCFBooleanTrue]
var destinationImageBufferAttributes: [NSString: AnyObject] = [kCVPixelBufferMetalCompatibilityKey: true as NSNumber, kCVPixelBufferPoolMinimumBufferCountKey: 3 as NSNumber]
if pixelFormat != nil {
    destinationImageBufferAttributes[kCVPixelBufferPixelFormatTypeKey] = pixelFormat! as NSNumber
}

var decompressionSession: VTDecompressionSession? = nil
err = VTDecompressionSessionCreate(allocator: nil, formatDescription: videoFormat!, decoderSpecification: videoDecoderSpecification as CFDictionary, imageBufferAttributes: destinationImageBufferAttributes as CFDictionary, outputCallback: nil, decompressionSessionOut: &decompressionSession)
In short, I need kCVPixelFormatType_420YpCbCr10BiPlanar so that I have a straightforward MTLPixelFormat.r16Unorm/MTLPixelFormat.rg16Unorm texture binding for Y/CbCr. Metal, seemingly, has no direct pixel format for 420YpCbCr10PackedBiPlanar. I'd also rather not use any color conversion in VideoToolbox, in order to save on processing (and to ensure that the color transforms/transfer characteristics match between streamer/client, since I also have a custom transfer characteristic to mitigate blocking in dark scenes).
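For context, the binding I mean is roughly this (a sketch: the function name and the `cache` I create elsewhere with CVMetalTextureCacheCreate are mine, and error handling is stripped):
import CoreVideo
import Metal

// Wrap the two planes of a 420YpCbCr10BiPlanar pixel buffer as Metal textures.
func makeTextures(from pixelBuffer: CVPixelBuffer, cache: CVMetalTextureCache) -> (y: MTLTexture, cbcr: MTLTexture)? {
    var yCVTexture: CVMetalTexture?
    var cbcrCVTexture: CVMetalTexture?

    // Plane 0: luma, 10 bits stored in 16-bit words -> r16Unorm.
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, pixelBuffer, nil, .r16Unorm,
                                              CVPixelBufferGetWidthOfPlane(pixelBuffer, 0),
                                              CVPixelBufferGetHeightOfPlane(pixelBuffer, 0),
                                              0, &yCVTexture)

    // Plane 1: interleaved CbCr -> rg16Unorm.
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, pixelBuffer, nil, .rg16Unorm,
                                              CVPixelBufferGetWidthOfPlane(pixelBuffer, 1),
                                              CVPixelBufferGetHeightOfPlane(pixelBuffer, 1),
                                              1, &cbcrCVTexture)

    guard let y = yCVTexture.flatMap(CVMetalTextureGetTexture),
          let cbcr = cbcrCVTexture.flatMap(CVMetalTextureGetTexture) else { return nil }
    return (y, cbcr)
}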
However, I noticed that in visionOS 2, the CVPixelBuffer I receive is no longer a compressed render target (likely a bug), which caused GPU texture read bandwidth to skyrocket from 2GiB/s to 30GiB/s. More importantly, this implies that VideoToolbox may in fact be doing an extra color conversion step, wasting memory bandwidth.
Does Metal actually have no way to handle 420YpCbCr10PackedBiPlanar? Are there any examples for reading 10-bit HDR HEVC buffers directly with Metal?
The title is self-explanatory. I wasn't able to find CAMetalDisplayLink in the most recent metal-cpp release (metal-cpp_macOS15_iOS18-beta). Are there any plans to include it in the next release?
What is the info property of SwiftUI::Layer?
I couldn't find any document or resource about it.
It appears in SwiftUI::Layer's definition:
struct Layer {
    metal::texture2d<half> tex;
    float2 info[5];

    /// Samples the layer at `p`, in user-space coordinates,
    /// interpolating linearly between pixel values. Returns an RGBA
    /// pixel value, with color components premultiplied by alpha (i.e.
    /// [R*A, G*A, B*A, A]), in the layer's working color space.
    half4 sample(float2 p) const {
        p = metal::fma(p.x, info[0], metal::fma(p.y, info[1], info[2]));
        p = metal::clamp(p, info[3], info[4]);
        return tex.sample(metal::sampler(metal::filter::linear), p);
    }
};
Suppose I want to draw a red rectangle onto my render target using a compute shader.
id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
[encoder setComputePipelineState:pipelineState];

simd_ushort2 position = simd_make_ushort2(100, 100);
simd_ushort2 size = simd_make_ushort2(50, 50);
[encoder setBytes:&position length:sizeof(position) atIndex:0];
[encoder setTexture:drawable.texture atIndex:0];

[encoder dispatchThreads:MTLSizeMake(size.x, size.y, 1)
   threadsPerThreadgroup:MTLSizeMake(32, 32, 1)];
[encoder endEncoding];
#include <metal_stdlib>
using namespace metal;

kernel void
Compute(ushort2 position_in_grid [[thread_position_in_grid]],
        constant ushort2 &position,
        texture2d<half, access::write> texture)
{
    texture.write(half4(1, 0, 0, 1), position_in_grid + position);
}
This works just fine:
Now, say for whatever reason I want to start using imageblocks in my compute kernel. First, I set the imageblock size on the CPU side:
id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
[encoder setComputePipelineState:pipelineState];

MTLSize threadgroupSize = MTLSizeMake(32, 32, 1);
[encoder setImageblockWidth:threadgroupSize.width
                     height:threadgroupSize.height];

simd_ushort2 position = simd_make_ushort2(100, 100);
simd_ushort2 size = simd_make_ushort2(50, 50);
[encoder setBytes:&position length:sizeof(position) atIndex:0];
[encoder setTexture:drawable.texture atIndex:0];

MTLSize gridSize = MTLSizeMake(size.x, size.y, 1);
[encoder dispatchThreads:gridSize threadsPerThreadgroup:threadgroupSize];
And then I update the compute kernel to simply declare the imageblock – note I never actually read from or write to it:
#include <metal_stdlib>
using namespace metal;

struct Foo
{
    int foo;
};

kernel void
Compute(ushort2 position_in_grid [[thread_position_in_grid]],
        constant ushort2 &position,
        texture2d<half, access::write> texture,
        imageblock<Foo> imageblock)
{
    texture.write(half4(1, 0, 0, 1), position_in_grid + position);
}
And now out of nowhere Metal’s shader validation starts complaining about mismatched texture usage flags:
2024-06-22 00:57:15.663132+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.672004+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.682422+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.687587+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
2024-06-22 00:57:15.698106+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0
The texture I’m writing to comes from a CAMetalDrawable whose associated CAMetalLayer has framebufferOnly set to NO. What am I missing?
Here is a sample of some error messages presented in the Xcode log during execution of a program. There is nothing in the messages that helps identify the component that originated them, nor is there a locatable definition for the various labels and fields of the text. What component, or even framework, do these messages originate from? Your search engine is useless because it returns gibberish. It doesn't even follow the common behavior of search engines, because it takes label strings compounded from common words and searches for the common word instead of using the concatenated string that is the internal variable name in the text.
I am looking for a definition of the error and a way to locate the context in which it occurs.
2024-06-22 19:45:58.089943-0500 RoomPlanExampleApp[733:30145] [ClientDonation] (+[PPSClientDonation isRegisteredSubsystem:category:]) Permission denied: GenerativeFunctionMetrics / ANEInferenceOperationPrepareForEncode
2024-06-22 19:45:58.089967-0500 RoomPlanExampleApp[733:30145] [ClientDonation] (+[PPSClientDonation sendEventWithIdentifier:payload:]) Invalid inputs: payload={
aneModelPath = "/System/Library/PrivateFrameworks/RoomScanCore.framework/PrecompiledModels/lcnn_floorplan_model.bundle/H14G.bundle/main/segment_0__ane/net.hwx";
bundleIdentfier = "com.example.apple-samplecode.RoomPlanExampleApp9QSS565686";
}
2024-06-22 19:45:58.094770-0500 RoomPlanExampleApp[733:30145] [ClientDonation] (+[PPSClientDonation sendEventWithIdentifier:payload:]) Invalid inputs: payload={
bundleIdentfier = "com.example.apple-samplecode.RoomPlanExampleApp9QSS565686";
e5FunctionName = main;
numSegments = 1;
}
I have an immersive space that is rendered using metal. Is there a way that I can position swiftUI views at coordinates relative to positions in my immersive space?
I know that I can display a volume with RealityKit content alongside my Metal content. But the volume's coordinate system, specifically its bounds, does not coincide with my entire Metal scene.
One approach I thought of would be to open two views in my immersive space. That way, I could simply add Attachments to invisible RealityKit entities in one view at positions where I have content in my Metal scene.
Unfortunately, it seems that while I can declare an ImmersiveSpace composed of multiple RealityViews:
ImmersiveSpace() {
    RealityView { content in
        // load first view
    } update: { content in
        // update
    }
    RealityView { content in
        // load second view
    } update: { content in
        // update
    }
}
that results in two coinciding RealityKit views in the immersive space.
I cannot, however, do something like this:
ImmersiveSpace() {
    CompositorLayer(configuration: ContentStageConfiguration()) { layerRenderer in
        // set up my metal renderer and stuff
    }
    RealityView { content in
        // set up a view where I could use attachments
    } update: { content in
    }
}
I'm having issues compiling the visionOS app via Xcode Cloud.
Here's the error:
2024-06-20T09:24:47.634651911Z compileSkybox /Volumes/workspace/DerivedData/Build/Intermediates.noindex/ArchiveIntermediates/Fi22/InstallationBuildProductsLocation/Applications/VR22.app/ImageBasedLighting.rclink /Volumes/workspace/repository/Fi22/ImageBasedLighting.skybox (in target 'Fi22' from project 'Fi22')
2024-06-20T09:24:47.634669847Z cd /Volumes/workspace/repository
2024-06-20T09:24:47.634681756Z /Applications/Xcode.app/Contents/Developer/usr/bin/rctool create skybox -skyboxPath=/Volumes/workspace/repository/Fi22/ImageBasedLighting.skybox -o=/Volumes/workspace/DerivedData/Build/Intermediates.noindex/ArchiveIntermediates/Fi22/InstallationBuildProductsLocation/Applications/VR22.app
2024-06-20T09:24:47.634730433Z Error: There is no available Metal device on this system.
2024-06-20T09:24:47.634745602Z Command compileSkybox failed with a nonzero exit code
How do I configure Xcode Cloud to use an instance with Metal support? Another option would be precompiling the skybox, but I couldn't find any info on that either.
Has there been an adjustment in the maximum number of DrawableQueues that can be swapped for textures in visionOS 2? Or an adjustment in the total amount of RAM allowed in a scene?
I have been having a difficult time getting more than one DrawableQueue to appear, when it worked fine in visionOS 1.x.
Hi,
Reading the copyFromBuffer documentation, it states that on macOS, sourceOffset, destinationOffset, and size "needs to be a multiple of 4, but can be any value in iOS and tvOS".
However, I have noticed that, at least on my M2 Max, this limitation does not seem to exist as there are no warnings and the copy works correctly regardless of the offset value.
I'm curious to know if this is something that should still be avoided. Is the multiple-of-4 limitation only for non-Apple-silicon devices, meaning the note can be ignored on Apple silicon?
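For reference, in Swift terms (Metal.jl wraps the same API), the call I mean is roughly this, with deliberately unaligned offsets; the buffers and sizes are made up:
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let src = device.makeBuffer(length: 64, options: .storageModeShared)!
let dst = device.makeBuffer(length: 64, options: .storageModeShared)!

let commandBuffer = queue.makeCommandBuffer()!
let blit = commandBuffer.makeBlitCommandEncoder()!
// Offsets and size that are not multiples of 4, which the docs say macOS
// disallows, yet this runs without warnings on my M2 Max.
blit.copy(from: src, sourceOffset: 2, to: dst, destinationOffset: 6, size: 10)
blit.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()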
I ask because I am a contributor to Metal.jl, and recently noticed that our tests pass even when copying using copyWithBuffer with offsets and sizes that are not multiples of 4. If that could cause issues or correctness problems, we would need to fix it.
Thank you.
Christian
Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU-bound and can block forward progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of underutilizing the CPU and even creating deadlocks.
My question is: what is the Metal team's general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal-bound tasks in order to get maximum performance?
To integrate with Swift Concurrency, my idea is to use continuations that kick off render jobs via Dispatch or operation queues. Would this be the best solution to bridge async tasks with Metal work?
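Concretely, the bridge I have in mind is something like this sketch, which resumes a continuation from the command buffer's completion handler instead of blocking a cooperative thread with waitUntilCompleted:
import Metal

extension MTLCommandBuffer {
    /// Commits the command buffer and suspends until the GPU has finished it,
    /// without tying up a thread from the cooperative pool.
    func commitAndWait() async {
        await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
            self.addCompletedHandler { _ in
                continuation.resume()
            }
            self.commit()
        }
    }
}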
Thanks!
I've been upgrading Xcode consistently for years and have never seen Metal shaders behave differently from one version to another until now.
On macOS 14.5, Xcode 16 beta, suddenly several color outputs turn out completely black where there should be color. All validation is on and nothing seems to be wrong (and hasn't been since maybe Xcode version 11).
I've attached two screenshots. The first is the normal color scheme; the second is in Xcode 16. The settings are exactly the same.
Normal:
Buggy with black + transparent colors (so it seems like either colors are overflowing or are all 0s)?
Before I file a bug report or a code-level support request, may I have some thoughts on how to debug this? The only clue I have is that I'm using bindless to multiply color texture samples with color values from my vertex struct. But it still fails even if I use hard-coded values for the texture samples, meaning somehow the color values are not being sent to the shader correctly? This is the most stable part of my rendering pipeline, so I'm surprised if the issue is there.
Thank you.
Hello everyone,
Super exciting stuff released this year!
I was playing around with the Metal passthrough sample code
(see: https://developer.apple.com/documentation/compositorservices/interacting_with_virtual_content_blended_with_passthrough)
... and noticed that upperLimbVisibility set to .automatic does not seem to work; my hands are always rendered on top.
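For reference, this is roughly how I set it (an abbreviated sketch; ContentStageConfiguration and the renderer setup come from the sample, the app name is made up):
import SwiftUI
import CompositorServices

@main
struct PassthroughApp: App {
    var body: some Scene {
        ImmersiveSpace(id: "ImmersiveSpace") {
            CompositorLayer(configuration: ContentStageConfiguration()) { layerRenderer in
                // Metal renderer setup from the sample code.
            }
        }
        // Hands should only show in front of content that is closer to the user.
        .upperLimbVisibility(.automatic)
    }
}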
How to reproduce:
Draw something
Position your hand behind the brush stroke
Notice that your hands are always rendered on top
Taking a GPU frame capture reveals that the depth is correctly written.
Xcode: Version 16.0 beta (16A5171c)
VisionOS: visionOS 2.0 (22N5252n)
It’s great that we’ll be able to use Metal custom renderers in passthrough mode on visionOS.
https://developer.apple.com/wwdc24/10092
This is a lot of complicated set-up, however. It's also unclear how occlusion and custom algorithms / ray tracing will work in tandem with scene understanding. May we have a project template and/or sample? Preferably with the C API and not just Swift. This would be much appreciated and helpful to everyone who wants this set-up. I'd like to see the whole process.
Thank you for introducing this feature!