Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Posts under Metal tag

200 Posts
Sort by:

Post

Replies

Boosts

Views

Activity

Metal os_log not working
I wanted to try the new logging feature for Metal but could not get it to work. I modified the PerformingCalculationsOnAGPU example by adding os_log_default.log_debug("Hello thread: %d", index); to log the current thread id. But never saw any messages neither in the console nor in Xcode. I also added the -fmetal-enable-logging flag. I am running the Sequoia release candidate 15.0 (24A335) on M1 Max and Xcode 16.0 (16A242). What am I missing?
1
0
49
8h
how do you pass a texture to a metal shader in swiftui?
specifically using the newer colorEffect, layerEffect, etc routes. the below does not seem to work. do i have to use the old MTK stuff? import SwiftUI import MetalKit struct HoloPreview: View { let startDate = Date() @State private var noiseTexture: MTLTexture? var body: some View { TimelineView(.animation) { context in RoundedRectangle(cornerRadius: 20) .fill(Color.red) .layerEffect(ShaderLibrary.iridescentEffect( .float(startDate.timeIntervalSinceNow), .texture(noiseTexture) ), maxSampleOffset: .zero) } .onAppear { noiseTexture = loadTexture(named: "perlinNoiseMap") } } func loadTexture(named imageName: String) -> MTLTexture? { guard let device = MTLCreateSystemDefaultDevice(), let url = Bundle.main.url(forResource: imageName, withExtension: "png") else { return nil } let textureLoader = MTKTextureLoader(device: device) let texture = try? textureLoader.newTexture(URL: url, options: nil) return texture } } #Preview { HoloPreview() }
1
0
73
2d
Running 120Hz with low latency on M1 Max
I am trying to get a little game prototype up and running using Metal using the metal-cpp libraries where I run everything natively at 120Hz with a coupled renderer using Vsync turned on so that I have the absolute physically minimum input to photon latency possible. // Create the metal view SDL_MetalView metal_view = SDL_Metal_CreateView(window); CA::MetalLayer *swap_chain = (CA::MetalLayer *)SDL_Metal_GetLayer(metal_view); // Set up the Metal device MTL::Device *device = MTL::CreateSystemDefaultDevice(); swap_chain->setDevice(device); swap_chain->setPixelFormat(MTL::PixelFormat::PixelFormatBGRA8Unorm); swap_chain->setDisplaySyncEnabled(true); swap_chain->setMaximumDrawableCount(2); I am using SDL3 just for creating the window. Now when I go through my game / render loop - I stall for a long time on getting the next drawable which is understandable - my app runs in about 2-3ms. m_CurrentContext->m_Drawable = m_SwapChain->nextDrawable(); m_CurrentContext->m_CommandBuffer = m_CommandQueue->commandBuffer()->retain(); char frame_label[32]; snprintf(frame_label, sizeof(frame_label), "Frame %d", m_FrameIndex); m_CurrentContext->m_CommandBuffer->setLabel(NS::String::string(frame_label, NS::UTF8StringEncoding)); m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal] = MTL::RenderPassDescriptor::alloc()->init(); MTL::RenderPassColorAttachmentDescriptor* cd = m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal]->colorAttachments()->object(0); cd->setTexture(m_CurrentContext->m_Drawable->texture()); cd->setLoadAction(MTL::LoadActionClear); cd->setClearColor(MTL::ClearColor( 0.53f, 0.81f, 0.98f, 1.0f )); cd->setStoreAction(MTL::StoreActionStore); However my ProMotion display does not reliably run at 120Hz when fullscreen and using the direct to display system - it seems to run faster when windowed in composite which is the opposite of what I would expect. The Metal HUD says 120Hz, but the delay to getting the next drawable and looking at what Instruments is saying tells otherwise. When I profile it, the game loop has completed and is sitting there waiting for the next drawable, but the screen does not want to complete in 8.33ms, so the whole thing slows down for no discernible reason. Also as a game developer it is very strange for the command buffer to actually need the drawable texture free to be allowed to encode commands - usually the command buffers and swapping the front and back render buffers are not directly dependent on each other. Usually you only actually need the render buffer texture free when you want to draw to it. I could give myself another drawable, but because I am completing in less than 3ms, all it would do would be to add another frame of latency. I also looked at the FramePacing example and its behaviour is even worse at having high framerate with low latency - the direct to display is always rejected for some reason. Is this just a flaw in the Metal API? Or am I missing something important? I hope someone can help - the behaviour of the display is baffling.
1
0
132
3d
realitytool requires Metal for this operation and it is not available in this build environment
Hello, I'm getting started for my project with Xcode Cloud since I upgraded to the macOS Sequioa Beta and Xcode 16 now refuses to archive builds for TestFlight. Somewhere very late in the build process I get the following error: realitytool requires Metal for this operation and it is not available in this build environment The log says this happens at: Compile Skybox urban.skybox My project uses RealityKit. How can I fix this issue? Thanks!
1
0
78
3d
Video Background Removal
I am searching for a method to remove background from a video. it can be from camera Session fileOutput url or from photo library. I was able to accomplish live preview of removed background with the depth data and some metal framework code from the example Enhancing Live Video by Leveraging TrueDepth Camera Data. However I count figure out a way to save this as a video so that I can upload it. Also this method is using over 150% of cpu ( Xcode cpu usage ), which seems to be quite a lot and the device is getting heated up so fast and drops the frames when It hot. I also found something similar from GitHub using CoreML example by Dmitry Voitekh which only uses less than 40% cpu. Any information regarding this will be helpful. Objective : Remove Background from video and save it
5
0
251
3d
Can an SDF be rendered using RealityKit?
I'm trying to ray-march an SDF inside a RealityKit surface shader. For the SDF primitive to correctly render with other primitives, the depth of the fragment needs to be set according to the ray-surface intersection point. Is there a way to do that within a RealityKit surface shader? It seems the only values I can set are within surface::surface_properties. If not, can an SDF still be rendered in RealityKit using ray-marching?
1
1
209
1w
How is CAShapeLayer implemented
Hello, I want to create a painting app for iOS and I saw many examples use a CAShapeLayer to draw a UIBezierPath. As I understand CoreAnimation uses the GPU so I was wondering how is this implemented on the GPU? Or in other words, how would you do it with Metal or OpenGL? I can only think of continuously updating a texture in response to the user's drawing but that would be a very resource intensive operation... Thanks
2
0
189
1w
CIFilter chain failing to render parts of output
I’ve built a iOS camera app that applies many CIFilters to an image captured by the camera. Some of my users have reported that on occasion the images have large parts that are blank, see below: Frustratingly, I can’t reproduce this myself! Does anyone know what could he causing it, is it a memory issue? I haven’t posted the code as there’s a lot to look over and I’m not sure it would help diagnose it. Thanks for any pointers.
0
0
224
1w
Memory Leak using simple app with visionOS
Hello. When displaying a simple app like this: struct ContentView: View { var body: some View { EmptyView() } } And run the Leaks app from the developer tools in Xcode, I see a memory leak which I don't see when running the same application on iOS. You can simply run the app and it will show a memory leak. And this is what I see in the Leaks application. Any ideas on what is going on? Thanks!
2
0
252
1w
RealityKit 3D texture
In my Metal-based app, I ray-march a 3D texture. I'd like to use RealityKit instead of my own code. I see there is a LowLevelTexture (beta) where I could specify a 3D texture. However on the Metal side, there doesn't seem to be any way to access a 3D texture (realitykit::texture::textures::custom returns a texture2d). Any work-arounds? Could I even do something icky like cast the texture2d to a texture3d in MSL? (is that even possible?) Could I encode the 3d texture into an argument buffer and get that in somehow?
1
0
230
1w
Disable Automatic Color Space conversion on Vision Pro Metal Shader
I am trying to convert a ThreeJS project to Metal for the Vision Pro. The issue is ThreeJS doesn't do any color space conversion (when I output a color in a fragment shader and then read it using the digital color meter in SRGB mode I get the same value I inputed in the fragment shader) This is not the case when using metal. When setting up my LayerRenderer I set the colorFormat to rgba16Unorm since it is the only non srgb color format supported on the vision pro apps. However switching between bgra8Unorm_srgb and rgba16Unorm seems to have no affect. when I set up the renderPassDescriptor I use the drawable colorTexture renderPassDescriptor.colorAttachments[0].texture = drawable.colorTextures[0] and when printing its pixel format it seems to be passed from the configuration. If there is anyway to disable this behavior or perform an inverse function of such that I get the original value out from the shader, that would be appreciated.
0
0
185
2w
To use ARSCNView to capture a 3D model of a scene and obtain the mesh information, how can I retrieve the texture information for the mesh?
arScnView = ARSCNView(frame: CGRect.zero, options: nil) arScnView.delegate = self arScnView.automaticallyUpdatesLighting = true arScnView.allowsCameraControl = true addSubview(arScnView) arSession = arScnView.session arSession.delegate = self config = ARWorldTrackingConfiguration() config.sceneReconstruction = .meshWithClassification config.environmentTexturing = .automatic func session(_ session: ARSession, didAdd anchors: [ARAnchor]) { anchors.forEach({ anchor in if let meshAnchor = anchor as? ARMeshAnchor { let node = meshAnchor.toSCNNode() self.arScnView.scene.rootNode.addChildNode(node) } if let environmentProbeAnchor = anchor as? AREnvironmentProbeAnchor { // Can I retrieve the texture map corresponding to ARMeshAnchor from Environment Probe Anchor? // Or how can I retrieve the texture map corresponding to ARMeshAnchor? } }) } How can I scan a 3D scene and save it as USDZ? I want to achieve the following scenario?
0
0
183
1w
Metal UIView to transform what's behind it
I'm trying to create a custom Metal-based visual effect as a UIView to be used inside an existing UIKit-based interface. (An example might be a view that applies a blur effect to what's behind it.) I need to capture the MTLTexture of what's behind the view so that I can feed it to MTLRenderCommandEncoder.setFragmentTexture(_:index:). Can someone show me how or point me to an example? Thanks!
1
0
189
2w
CIImageProcessorKernel using Metal Compute Pipeline error
Greetings! I have been battling with a bit of a tough issue. My use case is running a pixelwise regression model on a 2D array of images using CIImageProcessorKernel and a custom Metal Shader. It mostly works great, but the issue that arises is that if the regression calculation in Metal takes too long, an error occurs and the resulting output texture has strange artifacts, for example: The specific error is: Error excuting command buffer = Error Domain=MTLCommandBufferErrorDomain Code=1 "Internal Error (0000000e:Internal Error)" UserInfo={NSLocalizedDescription=Internal Error (0000000e:Internal Error), NSUnderlyingError=0x60000320ca20 {Error Domain=IOGPUCommandQueueErrorDomain Code=14 "(null)"}} (com.apple.CoreImage) There are multiple levels of concurrency: Swift Concurrency calling the Core Image code (which shouldn't have an impact) and of course the Metal command buffer. Is there anyway to ensure the compute command encoder can complete its work? Here is the full implementation of my CIImageProcessorKernel subclass: class ParametricKernel: CIImageProcessorKernel { static let device = MTLCreateSystemDefaultDevice()! override class var outputFormat: CIFormat { return .BGRA8 } override class func formatForInput(at input: Int32) -> CIFormat { return .BGRA8 } override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws { guard let commandBuffer = output.metalCommandBuffer, let images = arguments?["images"] as? [CGImage], let mask = arguments?["mask"] as? CGImage, let fillTime = arguments?["fillTime"] as? CGFloat, let betaLimit = arguments?["betaLimit"] as? CGFloat, let alphaLimit = arguments?["alphaLimit"] as? CGFloat, let errorScaling = arguments?["errorScaling"] as? CGFloat, let timing = arguments?["timing"], let TTRThreshold = arguments?["ttrthreshold"] as? CGFloat, let input = inputs?.first, let sourceTexture = input.metalTexture, let destinationTexture = output.metalTexture else { return } guard let kernelFunction = device.makeDefaultLibrary()?.makeFunction(name: "parametric") else { return } guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else { return } let imagesTexture = Texture.textureFromImages(images) let pipelineState = try device.makeComputePipelineState(function: kernelFunction) commandEncoder.setComputePipelineState(pipelineState) commandEncoder.setTexture(imagesTexture, index: 0) let maskTexture = Texture.textureFromImages([mask]) commandEncoder.setTexture(maskTexture, index: 1) commandEncoder.setTexture(destinationTexture, index: 2) var errorScalingFloat = Float(errorScaling) let errorBuffer = device.makeBuffer(bytes: &errorScalingFloat, length: MemoryLayout<Float>.size, options: []) commandEncoder.setBuffer(errorBuffer, offset: 0, index: 1) // Other buffers omitted.... let threadsPerThreadgroup = MTLSizeMake(16, 16, 1) let width = Int(ceil(Float(sourceTexture.width) / Float(threadsPerThreadgroup.width))) let height = Int(ceil(Float(sourceTexture.height) / Float(threadsPerThreadgroup.height))) let threadGroupCount = MTLSizeMake(width, height, 1) commandEncoder.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadsPerThreadgroup) commandEncoder.endEncoding() } }
3
0
308
2w
How many warps can be run in parallel on a single shader core?
The Metal feature set tables specifies that beginning with the Apple4 family, the "Maximum threads per threadgroup" is 1024. Given that a single threadgroup is guaranteed to be run on the same GPU shader core, it means that a shader core of any new Apple GPU must be capable of running at least 1024/32 = 32 warps in parallel. From the WWDC session "Scale compute workloads across Apple GPUs (6:17)": For relatively complex kernels, 1K to 2K concurrent threads per shader core is considered a very good occupancy. The cited sentence suggests that a single shader core is capable of running at least 2K (I assume this is meant to be 2048) threads in parallel, so 2048/32 = 64 warps running in parallel. However, I am curious what is the maximum theoretical amount of warps running in parallel on a single shader core (it sounds like it is more than 64). The WWDC session mentions 2K to be only "very good" occupancy. How many threads would be "the best possible" occupancy?
1
0
252
3w
In Metal compute kernels, when do thread variables get spilled into the device memory?
How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself? Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve maximum possible performance, I do not want to the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space. To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory? I searched in the Metal feature set tables, but I could not find any details.
0
0
184
3w
Enabling EDR on Apple TV for custom HDR video playback with VideoToolbox
How is it possible to enable EDR on Apple TV without AVFoundation for custom HDR video playback? The use case is a custom video player for HDR playback via VideoToolbox and Metal, which seem to render colors correctly on iOS but not on tvOS. All related documentation and WWDC sessions describe APIs that are unavailable for tvOS: let metalLayer = CAMetalLayer() metalLayer.wantsExtendedDynamicRangeContent = true metalLayer.edrMetadata = CAEDRMetadata.hdr10(minLuminance: 0.0, maxLuminance: 1000, opticalOutputScale: 100) What's the alternative path for tvOS to have correct system tone mapping for a setup like: metalLayer.pixelFormat = .rgba16Float // (or .bgr10_xr) metalLayer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ) Video format: HEVC, YUV 4:2:0 10bit, BT.2020 PQ. We do set the preferredDisplayCriteria on AVDisplayManager and thus video range matching is in place. WWDC Ref: https://developer.apple.com/videos/play/wwdc2022/110565?time=557
1
1
206
3w