Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Posts under Metal tag

200 Posts
Sort by:

Post

Replies

Boosts

Views

Activity

MacOS 15 Beta3: Metal Shader with newLibraryWithSource didn't work if the executable path contains Chinese character.
Here is the test code run in a macOS app (MacOS 15 Beta3). If the excutable path does not contain Chinese character, every thing go as We expect. Otherwise(simply place excutable in a Chinese named directory) , the MTLLibrary We made by newLibraryWithSource: function contains no functions, We just got logs: "Library contains the following functions: {}" "Function 'squareKernel' not found." Note: macOS 14 works fine id<MTLDevice> device = MTLCreateSystemDefaultDevice(); if (!device) { NSLog(@"not support Metal."); } NSString *shaderSource = @ "#include <metal_stdlib>\n" "using namespace metal;\n" "kernel void squareKernel(device float* data [[buffer(0)]], uint gid [[thread_position_in_grid]]) {\n" " data[gid] *= data[gid];\n" "}"; MTLCompileOptions *options = [[MTLCompileOptions alloc] init]; options.languageVersion = MTLLanguageVersion2_0; NSError *error = nil; id<MTLLibrary> library = [device newLibraryWithSource:shaderSource options:options error:&error]; if (error) { NSLog(@"New MTLLibrary error: %@", error); } NSArray<NSString *> *functionNames = [library functionNames]; NSLog(@"Library contains the following functions: %@", functionNames); id<MTLFunction> computeShaderFunction = [library newFunctionWithName:@"squareKernel"]; if (computeShaderFunction) { NSLog(@"Found function 'squareKernel'."); NSError *pipelineError = nil; id<MTLComputePipelineState> pipelineState = [device newComputePipelineStateWithFunction:computeShaderFunction error:&pipelineError]; if (pipelineError) { NSLog(@"Create pipeline state error: %@", pipelineError); } NSLog(@"Create pipeline state succeed!"); } else { NSLog(@"Function 'squareKernel' not found."); }
0
0
42
7h
A little trick
I had a problem that my app did not work outside Xcode (had to include the metalkit.framework I found a simple trick to get my NSLog outputs and locate the issue: NSURL *downloadsDirectory = [[[NSFileManager defaultManager] URLsForDirectory:NSDownloadsDirectory inDomains:NSUserDomainMask] firstObject]; NSURL *fileURL = [downloadsDirectory URLByAppendingPathComponent:@"Log.txt"]; NSString *filePath = [fileURL path]; // Redirect stderr to the file freopen([filePath fileSystemRepresentation], "a+", stderr); Now you get debug info in your Download folder!
0
0
94
1d
Understanding the Topology of LowLevelMesh in RealityKitDrawingApp Sample Code
I have recently developed an interest in the shader effects commonly found in Apple's UI and have been studying them. Additionally, as I own a Vision Pro, I have a strong desire to understand LowLevelMesh and am currently analyzing the sample code after watching the related session. The part where I am completely stuck and unable to understand is the initializer section of CurveExtruder. /// Initializes the `CurveExtruder` with the shape to sweep along the curve. /// /// - Parameters: /// - shape: The 2D shape to sweep along the curve. init(shape: [SIMD2<Float>]) { self.shape = shape // Compute topology // // Triangle fan lists each vertex in `shape` once for each ring, except for vertex `0` of `shape` which // is listed twice. Plus one extra index for the end-index (0xFFFFFFFF). let indexCountPerFan = 2 * (shape.count + 1) + 1 var topology: [UInt32] = [] topology.reserveCapacity(indexCountPerFan) // Build triangle fan. for vertexIndex in shape.indices.reversed() { topology.append(UInt32(vertexIndex)) topology.append(UInt32(shape.count + vertexIndex)) } // Wrap around to the first vertex. topology.append(UInt32(shape.count - 1)) topology.append(UInt32(2 * shape.count - 1)) // Add end-index. topology.append(UInt32.max) assert(topology.count == indexCountPerFan) I have tried to understand why the capacity reserved for the topology array is 2 * (shape.count + 1) + 1, but I am struggling to figure it out. I do not understand the principle behind the order in which vertexIndex is added to the topology. The confusion is even greater because, while the comment mentions trianglefan, the actual creation of the LowLevelMesh.Part object uses the topology: .triangleStrip argument. (Did I misunderstand? I know that the topology option includes triangle, but this uses duplicated vertices.) I am feeling very stuck. It's hard to find answers even through search options or LLMs. Maybe this requires specialized knowledge in computer graphics, which makes me feel embarrassed to ask. However, personally, I have tried various directions without external help but still cannot find a clear path, so I am desperately seeking assistance! P.S. As Korean is my primary language, I apologize in advance if there are any awkward or rude expressions.
0
0
109
1d
Performant alternative to scaling a CIImage / PixelBuffer
Hey, I’m building a camera app where I am applying real time effects to the view finder. One of those effects is a variable blur, so to improve performance I am scaling down the input image using CIFilter.lanczosScaleTransform(). This works fine and runs at 30FPS, but when running the metal profiler I can see that the scaling transforms use a lot of GPU time, almost as much as the variable blur. Is there a more efficient way to do this? The simplified chain is like this: Scale down viewFinder CVPixelBuffer (CIFilter.lanczosScaleTransform) Scale up depthMap CVPixelBuffer to match viewFinder size (CIFilter.lanczosScaleTransform) Create CIImages from both CVPixelBuffers Apply VariableDepthBlur (CIFilter.maskedVariableBlur) Scale up final image to metal view size (CIFilter.lanczosScaleTransform) Render CIImage to a MTKView using CIRenderDestination From some research, I wonder if scaling the CVPixelBuffer using the accelerate framework would be faster? Also, Instead of scaling the final image, perhaps I could offload this to the metal view? Any pointers greatly appreciated!
2
0
90
2h
Div calculation issue in metal
Hi, all. I've been writing various computational functions using Metal. However, in the following operation functions, unlike + and *, there is an accuracy issue in the / operation. This is a function that divides a matrix of shape [n, x, y] and a scalar [1]. When compared to numpy or torch, if I change the operator of the above function to * or + instead of /, I can get completely the same results, but in the case of /, there is a difference in the mean of more than 1e-5. (For reference, this was written with reference to the metal kernel code in llama.cpp) kernel void kernel_div_single_f16( device const half * src0, device const half * src1, device half * dst, constant int64_t & ne00, constant int64_t & ne01, constant int64_t & ne02, constant int64_t & ne03, uint3 tgpig[[threadgroup_position_in_grid]], uint3 tpitg[[thread_position_in_threadgroup]], uint3 ntg[[threads_per_threadgroup]]) { const int64_t i03 = tgpig.z; const int64_t i02 = tgpig.y; const int64_t i01 = tgpig.x; const uint offset = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; for (int i0 = tpitg.x; i0 < ne00; i0 += ntg.x) { dst[offset + i0] = src0[offset+i0] / *src1; } } My mac book is, Macbork Pro(16, 2021) / macOS 12.5 / Apple M1 Pro. Are there any issues related to Div? Thanks in advance for your reply.
0
0
116
6d
MTKView is now available on visionOS but isn't working on visionOS 1.x
Hello! I noticed that after WWDC 24 there was support added for MTKView in visionOS 1.0+. This is great! But when I use an MTKView in anything before visionOS 2.0 it doesn't work and the app ends up crashing. Console error when running on a device that is on visionOS 1.2: Symbol not found: _$s27_CompositorServices_SwiftUI0A5LayerV13configuration8rendererAcA0aE13Configuration_p_ySo019CP_OBJECT_cp_layer_G0CScMYcctcfC Expected in: <EFD973D2-97E1-380B-B89A-13CC3820B7F7> /System/Library/Frameworks/_CompositorServices_SwiftUI.framework/_CompositorServices_SwiftUI Looks like MTKView may be using compositor services under the hood? Any help would be great. Thank you!
0
2
182
1w
newCommandQueue got nil
I am using Metal for rendering, and when calling the newCommandQueue interface of Metal, there is a certain probability that I will get a nil return value. However, when I call the MTLCreateSystemDefaultDevice interface, I can get a non-empty return value, which means my device supports Metal. I would like to ask what causes the newCommandQueue to return nil? Is there any way to avoid this situation? Thank you.
1
0
187
1w
Is there a way to directly go from VideoToolbox to Metal for 10-bit/BT.2020 YCbCr HEVC?
tl;dr how can I get raw YUV in a Metal fragment shader from a VideoToolbox 10-bit/BT.2020 HEVC stream without any extra/secret format conversions? With VideoToolbox and 10-bit HEVC, I've found that it defaults to CVPixelBuffers w/ formats kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarFullRange or kCVPixelFormatType_Lossy_420YpCbCr10PackedBiPlanarFullRange. To mitigate this, I have the following snippet of code to my application: // We need our pixels unpacked for 10-bit so that the Metal textures actually work var pixelFormat:OSType? = nil let bpc = getBpcForVideoFormat(videoFormat!) let isFullRange = getIsFullRangeForVideoFormat(videoFormat!) // TODO: figure out how to check for 422/444, CVImageBufferChromaLocationBottomField? if bpc == 10 { pixelFormat = isFullRange ? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange } let videoDecoderSpecification:[NSString: AnyObject] = [kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder:kCFBooleanTrue] var destinationImageBufferAttributes:[NSString: AnyObject] = [kCVPixelBufferMetalCompatibilityKey: true as NSNumber, kCVPixelBufferPoolMinimumBufferCountKey: 3 as NSNumber] if pixelFormat != nil { destinationImageBufferAttributes[kCVPixelBufferPixelFormatTypeKey] = pixelFormat! as NSNumber } var decompressionSession:VTDecompressionSession? = nil err = VTDecompressionSessionCreate(allocator: nil, formatDescription: videoFormat!, decoderSpecification: videoDecoderSpecification as CFDictionary, imageBufferAttributes: destinationImageBufferAttributes as CFDictionary, outputCallback: nil, decompressionSessionOut: &decompressionSession) In short, I need kCVPixelFormatType_420YpCbCr10BiPlanar so that I have a straightforward MTLPixelFormat.r16Unorm/MTLPixelFormat.rg16Unorm texture binding for Y/CbCr. Metal, seemingly, has no direct pixel format for 420YpCbCr10PackedBiPlanar. I'd also rather not use any color conversion in VideoToolbox, in order to save on processing (and to ensure that the color transforms/transfer characteristics match between streamer/client, since I also have a custom transfer characteristic to mitigate blocking in dark scenes). However, I noticed that in visionOS 2, the CVPixelBuffer I receive is no longer a compressed render target (likely a bug), which caused GPU texture read bandwidth to skyrocket from 2GiB/s to 30GiB/s. More importantly, this implies that VideoToolbox may in fact be doing an extra color conversion step, wasting memory bandwidth. Does Metal actually have no way to handle 420YpCbCr10PackedBiPlanar? Are there any examples for reading 10-bit HDR HEVC buffers directly with Metal?
2
0
230
2w
What is the info property of SwiftUI::Layer
What is the info property of SwiftUI::Layer? I couldn't find any document or resource about it. It appears in SwiftUI::Layer's definition: struct Layer { metal::texture2d<half> tex; float2 info[5]; /// Samples the layer at `p`, in user-space coordinates, /// interpolating linearly between pixel values. Returns an RGBA /// pixel value, with color components premultipled by alpha (i.e. /// [R*A, G*A, B*A, A]), in the layer's working color space. half4 sample(float2 p) const { p = metal::fma(p.x, info[0], metal::fma(p.y, info[1], info[2])); p = metal::clamp(p, info[3], info[4]); return tex.sample(metal::sampler(metal::filter::linear), p); } };
0
1
262
3w
Why do I get a “Texture usage flags mismatch” shader validation error when declaring an imageblock in a compute pass?
Suppose I want to draw a red rectangle onto my render target using a compute shader. id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder]; [encoder setComputePipelineState:pipelineState]; simd_ushort2 position = simd_make_ushort2(100, 100); simd_ushort2 size = simd_make_ushort2(50, 50); [encoder setBytes:&position length:sizeof(position) atIndex:0]; [encoder setTexture:drawable.texture atIndex:0]; [encoder dispatchThreads:MTLSizeMake(size.x, size.y, 1) threadsPerThreadgroup:MTLSizeMake(32, 32, 1)]; [encoder endEncoding]; #include <metal_stdlib> using namespace metal; kernel void Compute(ushort2 position_in_grid [[thread_position_in_grid]], constant ushort2 &position, texture2d<half, access::write> texture) { texture.write(half4(1, 0, 0, 1), position_in_grid + position); } This works just fine: Now, say for whatever reason I want to start using imageblocks in my compute kernel. First, I set the imageblock size on the CPU side: id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder]; [encoder setComputePipelineState:pipelineState]; MTLSize threadgroupSize = MTLSizeMake(32, 32, 1); [encoder setImageblockWidth:threadgroupSize.width height:threadgroupSize.height]; simd_ushort2 position = simd_make_ushort2(100, 100); simd_ushort2 size = simd_make_ushort2(50, 50); [encoder setBytes:&position length:sizeof(position) atIndex:0]; [encoder setTexture:drawable.texture atIndex:0]; MTLSize gridSize = MTLSizeMake(size.x, size.y, 1); [encoder dispatchThreads:gridSize threadsPerThreadgroup:threadgroupSize]; And then I update the compute kernel to simply declare the imageblock – note I never actually read from or write to it: #include <metal_stdlib> using namespace metal; struct Foo { int foo; }; kernel void Compute(ushort2 position_in_grid [[thread_position_in_grid]], constant ushort2 &position, texture2d<half, access::write> texture, imageblock<Foo> imageblock) { texture.write(half4(1, 0, 0, 1), position_in_grid + position); } And now out of nowhere Metal’s shader validation starts complaining about mismatched texture usage flags: 2024-06-22 00:57:15.663132+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0 2024-06-22 00:57:15.672004+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0 2024-06-22 00:57:15.682422+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0 2024-06-22 00:57:15.687587+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0 2024-06-22 00:57:15.698106+1000 TextureUsage[80558:4539093] [GPUDebug] Texture usage flags mismatch executing kernel function "Compute" encoder: "1", dispatch: 0 The texture I’m writing to comes from a CAMetalDrawable whose associated CAMetalLayer has framebufferOnly set to NO. What am I missing?
3
0
360
4w
Permission denied: GenerativeFunctionMetrics
A sample of some error messages that are presented in the Xcode log for executon of a program. There is nothing in the messages that will help identify a component as the origin of the message, nor is there a locatable derinition for the various labels and fields of the text. What component or even framework does this set of messages originate? Your search engine is useless because it returns gibberish. It doesn’t even follow the common behavior of SEARCH ENGINES because it takes label strings compounded from common words and searches for the common word instead of using the catenated string that is the internal variable name that is in the text. 2024-06-22 19:45:58.089943-0500 RoomPlanExampleApp[733:30145] [ClientDonation] (+[PPSClientDonation isRegisteredSubsystem:category:]) Permission denied: GenerativeFunctionMetrics / ANEInferenceOperationPrepareForEncode. I am looking for a definition of the error with a way to locate the context in which the error occurs. 2024-06-22 19:45:58.089967-0500 RoomPlanExampleApp[733:30145] [ClientDonation] (+[PPSClientDonation sendEventWithIdentifier:payload:]) Invalid inputs: payload={ aneModelPath = "/System/Library/PrivateFrameworks/RoomScanCore.framework/PrecompiledModels/lcnn_floorplan_model.bundle/H14G.bundle/main/segment_0__ane/net.hwx"; bundleIdentfier = "com.example.apple-samplecode.RoomPlanExampleApp9QSS565686"; } 2024-06-22 19:45:58.094770-0500 RoomPlanExampleApp[733:30145] [ClientDonation] (+[PPSClientDonation sendEventWithIdentifier:payload:]) Invalid inputs: payload={ bundleIdentfier = "com.example.apple-samplecode.RoomPlanExampleApp9QSS565686"; e5FunctionName = main; numSegments = 1; }
1
0
264
3w
Attachment-like functionality in metal immersive space
I have an immersive space that is rendered using metal. Is there a way that I can position swiftUI views at coordinates relative to positions in my immersive space? I know that I can display a volume with RealityKit content simultaneously to my metal content. The volume's coordinate system specifically, it's bounds, does not, coincide with my entire metal scene. One approach I thought of would be to open two views in my immersive space. That way, I could simply add Attachment's to invisible RealityKit Entities in one view at positions where I have content in my metal scene. unfortunately it seems that, while I can declare an ImmersiveSpace be composed of multiple RealityViews ImmersiveSpace(){ RealityView { content in // load first view } update: { content in // update } } RealityView{ content in //load second view } } update: { content in //update } } That results in two coinciding realty kit views in the immersive space. I can not however do something like this: ImmersiveSpace(){ CompositorLayer(configuration: ContentStageConfiguration()){ layerRenderer in //set up my metal renderer and stuff } RealityView{ content in //set up a view where I could use attachments } update: { content in } }
2
0
258
3w
XcodeCloud and VisionOS
I'm having issues compiling the visionOS app via XcodeCloud. Here's the error: 2024-06-20T09:24:47.634651911Z compileSkybox /Volumes/workspace/DerivedData/Build/Intermediates.noindex/ArchiveIntermediates/Fi22/InstallationBuildProductsLocation/Applications/VR22.app/ImageBasedLighting.rclink /Volumes/workspace/repository/Fi22/ImageBasedLighting.skybox (in target 'Fi22' from project 'Fi22') 2024-06-20T09:24:47.634669847Z cd /Volumes/workspace/repository 2024-06-20T09:24:47.634681756Z /Applications/Xcode.app/Contents/Developer/usr/bin/rctool create skybox -skyboxPath=/Volumes/workspace/repository/Fi22/ImageBasedLighting.skybox -o=/Volumes/workspace/DerivedData/Build/Intermediates.noindex/ArchiveIntermediates/Fi22/InstallationBuildProductsLocation/Applications/VR22.app 2024-06-20T09:24:47.634730433Z [91mError: [39m There is no available Metal device on this system. 2024-06-20T09:24:47.634745602Z Command compileSkybox failed with a nonzero exit code How do I configure XcodeCloud to use an instance with Metal support? Another option would be precompiling the skybox but I couldn't find any info on that either.
1
1
216
Jun ’24
copyFromBuffer offset and size working even when not multiple of 4
Hi, Reading the copyFromBuffer documentation states that on macOS, sourceOffset, destinationOffset, and size "needs to be a multiple of 4, but can be any value in iOS and tvOS". However, I have noticed that, at least on my M2 Max, this limitation does not seem to exist as there are no warnings and the copy works correctly regardless of the offset value. I'm curious to know if this is something that should still be avoided. Is the multiple of 4 limitation reserved for non Apple Silicon devices and that note can be ignored for Apple Silicon? I ask because I am a contributor to Metal.jl, and recently noticed that our tests pass even when copying using copyWithBuffer with offsets and sizes that are not multiples of 4. If that coul cause issues/correctness problems, we would need to fix that. Thank you. Christian
0
0
248
Jun ’24
Metal and Swift Concurrency
Hi, Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool. GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks. My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance? To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work? Thanks!
4
0
355
Jun ’24
Bug? Xcode 16 macOS 15 SDK on macos 14.5 causes Metal Shader Colors to be Wrong
I've been upgrading Xcode consistently for years and have never seen Metal shaders behave differently from one version to another until now. On macOS 14.5, Xcode 16 beta, suddenly several color outputs turn out completely black where there should be color. All validation is on and nothing seems to be wrong (and hasn't been since maybe Xcode version 11). I've attached two screens. The first is the normal color scheme, the second is in Xcode 16. The settings are the exact same. Normal: Buggy with black + transparent colors (so it seems like either colors are overflowing or are all 0s)? Before I file a bug report or code level request, may I have some thoughts on how to debug this? The only clue I have is that I'm using bindless to multiply color texture samples with color values from my vertex struct. But it still fails even if I use hard-coded values for the texture samples, meaning somehow the color values are not being sent to the shader correctly? This is the most stable part of my rendering pipeline, so I'm surprised if the issue is there. Thank you.
1
0
420
Jun ’24
[Metal Passthrough] upperLimbVisibility not respected in sample code
Hello everyone, Super exciting stuff released this year! I was playing around with the Metal passthrough sample code (see: https://developer.apple.com/documentation/compositorservices/interacting_with_virtual_content_blended_with_passthrough) ... and noticed that the upperLimbVisibility set to .automatic does not seem to work and my hand is always on top. How to reproduce: Draw something Position your hand behind the brush stroke Notice that your hands are always rendered on top Taking a GPU frame capture reveals that the depth is correctly written. Xcode: Version 16.0 beta (16A5171c) VisionOS: visionOS 2.0 (22N5252n)
1
0
277
2w
Sample Project for WWDC24 10092 Metal with Passthrough?
It’s great that we’ll be able to use Metal custom renderers in passthrough mode on visionOS. https://developer.apple.com/wwdc24/10092 This is a lot of complicated set-up, however. It’s also unclear how occlusion and custom algorithms / raytracing will work in tandem with scene understanding. May we have a project template and/or sample? Preferably with the C api and not just swift. This would be much-appreciated and helpful to everyone who wants this set-up. I’d like to see the whole process. Thank you for introducing this feature!
2
1
426
Jun ’24