I wanted to try the new logging feature for Metal but could not get it to work.
I modified the PerformingCalculationsOnAGPU example by adding os_log_default.log_debug("Hello thread: %d", index); to log the current thread id, but I never saw any messages, either in the console or in Xcode.
I also added the -fmetal-enable-logging flag. I am running the Sequoia release candidate 15.0 (24A335) on M1 Max and Xcode 16.0 (16A242).
What am I missing?
Metal
Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Posts under Metal tag
200 Posts
When I load certain USDZ files, the app crashes 100% of the time. Why?
It crashes in the simulator, but not on Vision Pro.
-[MTLDebugDevice newBufferWithBytes:length:options:]:723: failed assertion `Buffer Validation
newBufferWith*:length 0x100fff80 must not exceed 256 MB.
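The length in the assertion can be checked against the limit with a little arithmetic. The sketch below assumes that "256 MB" in the message means 256 × 1024 × 1024 bytes; the variable names are illustrative only.

```swift
// Requested buffer length, copied from the assertion message.
let requestedLength = 0x100fff80

// Assumption: the validation layer's "256 MB" is 256 * 1024 * 1024 bytes.
let limit = 256 * 1024 * 1024

// How far the request overshoots the limit, in bytes.
let excess = requestedLength - limit

print("requested: \(requestedLength) bytes")
print("limit:     \(limit) bytes")
print("excess:    \(excess) bytes")
```

Under that assumption the request is 269,483,904 bytes, which exceeds the limit by 1,048,448 bytes (just under 1 MiB).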
Specifically using the newer colorEffect, layerEffect, etc. routes. The code below does not seem to work. Do I have to use the old MTK stuff?
import SwiftUI
import MetalKit
struct HoloPreview: View {
    let startDate = Date()
    @State private var noiseTexture: MTLTexture?
    var body: some View {
        TimelineView(.animation) { context in
            RoundedRectangle(cornerRadius: 20)
                .fill(Color.red)
                .layerEffect(ShaderLibrary.iridescentEffect(
                    .float(startDate.timeIntervalSinceNow),
                    .texture(noiseTexture)
                ), maxSampleOffset: .zero)
        }
        .onAppear {
            noiseTexture = loadTexture(named: "perlinNoiseMap")
        }
    }
    func loadTexture(named imageName: String) -> MTLTexture? {
        guard let device = MTLCreateSystemDefaultDevice(),
              let url = Bundle.main.url(forResource: imageName, withExtension: "png") else {
            return nil
        }
        let textureLoader = MTKTextureLoader(device: device)
        let texture = try? textureLoader.newTexture(URL: url, options: nil)
        return texture
    }
}
#Preview {
    HoloPreview()
}
I am trying to get a little game prototype up and running in Metal with the metal-cpp libraries, where I run everything natively at 120Hz with a coupled renderer and Vsync turned on, so that I have the absolute minimum input-to-photon latency physically possible.
// Create the metal view
SDL_MetalView metal_view = SDL_Metal_CreateView(window);
CA::MetalLayer *swap_chain = (CA::MetalLayer *)SDL_Metal_GetLayer(metal_view);
// Set up the Metal device
MTL::Device *device = MTL::CreateSystemDefaultDevice();
swap_chain->setDevice(device);
swap_chain->setPixelFormat(MTL::PixelFormat::PixelFormatBGRA8Unorm);
swap_chain->setDisplaySyncEnabled(true);
swap_chain->setMaximumDrawableCount(2);
I am using SDL3 just for creating the window. Now when I go through my game/render loop, I stall for a long time getting the next drawable, which is understandable since my app runs in about 2-3ms.
m_CurrentContext->m_Drawable = m_SwapChain->nextDrawable();
m_CurrentContext->m_CommandBuffer = m_CommandQueue->commandBuffer()->retain();
char frame_label[32];
snprintf(frame_label, sizeof(frame_label), "Frame %d", m_FrameIndex);
m_CurrentContext->m_CommandBuffer->setLabel(NS::String::string(frame_label, NS::UTF8StringEncoding));
m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal] = MTL::RenderPassDescriptor::alloc()->init();
MTL::RenderPassColorAttachmentDescriptor* cd = m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal]->colorAttachments()->object(0);
cd->setTexture(m_CurrentContext->m_Drawable->texture());
cd->setLoadAction(MTL::LoadActionClear);
cd->setClearColor(MTL::ClearColor( 0.53f, 0.81f, 0.98f, 1.0f ));
cd->setStoreAction(MTL::StoreActionStore);
However my ProMotion display does not reliably run at 120Hz when fullscreen and using the direct to display system - it seems to run faster when windowed in composite which is the opposite of what I would expect. The Metal HUD says 120Hz, but the delay to getting the next drawable and looking at what Instruments is saying tells otherwise.
When I profile it, the game loop has completed and is sitting there waiting for the next drawable, but the screen does not want to complete in 8.33ms, so the whole thing slows down for no discernible reason.
Also as a game developer it is very strange for the command buffer to actually need the drawable texture free to be allowed to encode commands - usually the command buffers and swapping the front and back render buffers are not directly dependent on each other. Usually you only actually need the render buffer texture free when you want to draw to it. I could give myself another drawable, but because I am completing in less than 3ms, all it would do would be to add another frame of latency.
I also looked at the FramePacing example and its behaviour is even worse at having high framerate with low latency - the direct to display is always rejected for some reason.
Is this just a flaw in the Metal API? Or am I missing something important? I hope someone can help - the behaviour of the display is baffling.
Hello,
I'm getting started with Xcode Cloud for my project, since I upgraded to the macOS Sequoia Beta and Xcode 16 now refuses to archive builds for TestFlight.
Somewhere very late in the build process I get the following error:
realitytool requires Metal for this operation and it is not available in this build environment
The log says this happens at:
Compile Skybox urban.skybox
My project uses RealityKit. How can I fix this issue?
Thanks!
I am searching for a method to remove the background from a video. The video can come from a camera session's fileOutput URL or from the photo library.
I was able to get a live preview with the background removed using the depth data and some Metal code from the example Enhancing Live Video by Leveraging TrueDepth Camera Data. However, I couldn't figure out a way to save the result as a video so that I can upload it.
This method also uses over 150% CPU (as reported by Xcode), which seems like a lot; the device heats up quickly and drops frames when it is hot.
I also found something similar on GitHub, a Core ML example by Dmitry Voitekh, which uses less than 40% CPU.
Any information regarding this will be helpful.
Objective : Remove Background from video and save it
I'm trying to ray-march an SDF inside a RealityKit surface shader. For the SDF primitive to correctly render with other primitives, the depth of the fragment needs to be set according to the ray-surface intersection point. Is there a way to do that within a RealityKit surface shader? It seems the only values I can set are within surface::surface_properties.
If not, can an SDF still be rendered in RealityKit using ray-marching?
Hello,
I want to create a painting app for iOS and I saw many examples use a CAShapeLayer to draw a UIBezierPath.
As I understand CoreAnimation uses the GPU so I was wondering how is this implemented on the GPU? Or in other words, how would you do it with Metal or OpenGL?
I can only think of continuously updating a texture in response to the user's drawing but that would be a very resource intensive operation...
Thanks
Hello,
My project is simple: first I want to draw wireframe hexahedra, tetrahedra and octahedra.
I have drawn a cube with Metal, but I could not find out how to do rotation, translation and scaling.
I have searched for help, but the examples I found are too complicated for me.
Kind regards,
VanceRegnet
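Rotation, translation and scaling in Metal are usually expressed as 4x4 matrices that are multiplied together and applied to each vertex. The following is a minimal sketch in plain Swift, not a definitive implementation. The type alias and function names (Matrix4, translation, scaling, rotationY, multiply, apply) are hypothetical, and the matrices are written row-major for readability, whereas Metal's float4x4 is column-major, so a real renderer would transpose them or build them with simd before uploading.

```swift
import Foundation

// 4x4 matrix stored as rows of Floats (row-major, for readability only).
typealias Matrix4 = [[Float]]

// Moves a point by (tx, ty, tz).
func translation(_ tx: Float, _ ty: Float, _ tz: Float) -> Matrix4 {
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]
}

// Scales a point along each axis.
func scaling(_ sx: Float, _ sy: Float, _ sz: Float) -> Matrix4 {
    return [[sx, 0, 0, 0],
            [0, sy, 0, 0],
            [0, 0, sz, 0],
            [0, 0, 0, 1]]
}

// Rotates a point around the Y axis (right-handed convention).
func rotationY(_ radians: Float) -> Matrix4 {
    let c = cos(radians)
    let s = sin(radians)
    return [[c, 0, s, 0],
            [0, 1, 0, 0],
            [-s, 0, c, 0],
            [0, 0, 0, 1]]
}

// Standard matrix product a * b.
func multiply(_ a: Matrix4, _ b: Matrix4) -> Matrix4 {
    var result = Matrix4(repeating: [Float](repeating: 0, count: 4), count: 4)
    for row in 0..<4 {
        for col in 0..<4 {
            for k in 0..<4 {
                result[row][col] += a[row][k] * b[k][col]
            }
        }
    }
    return result
}

// Applies matrix m to a homogeneous point p = [x, y, z, 1].
func apply(_ m: Matrix4, to p: [Float]) -> [Float] {
    return (0..<4).map { row in
        (0..<4).reduce(Float(0)) { sum, k in sum + m[row][k] * p[k] }
    }
}

// Model matrix: scale first, then rotate, then translate.
let model = multiply(translation(0, 0, 5),
                     multiply(rotationY(.pi / 2), scaling(2, 2, 2)))

// One cube corner direction, transformed; approximately (0, 0, 3, 1).
let corner = apply(model, to: [1, 0, 0, 1])
print(corner)
```

The order of multiplication matters: with column vectors, the matrix closest to the point is applied first, so this sketch scales, then rotates, then translates.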
I’ve built an iOS camera app that applies many CIFilters to an image captured by the camera. Some of my users have reported that on occasion the images have large parts that are blank, see below:
Frustratingly, I can’t reproduce this myself! Does anyone know what could be causing it? Is it a memory issue? I haven’t posted the code as there’s a lot to look over and I’m not sure it would help diagnose it.
Thanks for any pointers.
Hello.
When displaying a simple app like this:
struct ContentView: View {
    var body: some View {
        EmptyView()
    }
}
and then running the Leaks app from the developer tools in Xcode, I see a memory leak which I don't see when running the same application on iOS.
You can simply run the app and it will show a memory leak. And this is what I see in the Leaks application.
Any ideas on what is going on?
Thanks!
In my Metal-based app, I ray-march a 3D texture. I'd like to use RealityKit instead of my own code. I see there is a LowLevelTexture (beta) where I could specify a 3D texture. However on the Metal side, there doesn't seem to be any way to access a 3D texture (realitykit::texture::textures::custom returns a texture2d).
Any work-arounds? Could I even do something icky like cast the texture2d to a texture3d in MSL? (is that even possible?) Could I encode the 3d texture into an argument buffer and get that in somehow?
I am trying to convert a ThreeJS project to Metal for the Vision Pro. The issue is that ThreeJS doesn't do any color space conversion: when I output a color in a fragment shader and then read it using the Digital Color Meter in sRGB mode, I get the same value I input in the fragment shader. This is not the case when using Metal. When setting up my LayerRenderer I set the colorFormat to rgba16Unorm, since it is the only non-sRGB color format supported in Vision Pro apps. However, switching between bgra8Unorm_srgb and rgba16Unorm seems to have no effect.
When I set up the renderPassDescriptor I use the drawable colorTexture
renderPassDescriptor.colorAttachments[0].texture = drawable.colorTextures[0]
and when printing its pixel format it seems to be passed from the configuration.
If there is any way to disable this behavior, or to apply an inverse function so that I get the original value out of the shader, that would be appreciated.
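If the system is applying a linear-to-sRGB encode to the shader output (an assumption; the post does not establish where the conversion happens), the inverse would be the standard sRGB decode applied to the color before it is returned. The sketch below shows both directions of the standard sRGB transfer function in plain Swift so the values can be checked on the CPU; the function names are hypothetical, and the same arithmetic would have to be ported to the fragment shader.

```swift
import Foundation

// Standard sRGB decode: sRGB-encoded value in [0, 1] -> linear value.
func srgbToLinear(_ c: Double) -> Double {
    return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4)
}

// Standard sRGB encode: linear value in [0, 1] -> sRGB-encoded value.
func linearToSRGB(_ c: Double) -> Double {
    return c <= 0.0031308 ? c * 12.92 : 1.055 * pow(c, 1.0 / 2.4) - 0.055
}

// The value we want the color meter to read back.
let wanted = 0.5

// Pre-compensate in the shader: decode before output.
let shaderOutput = srgbToLinear(wanted)

// Simulate the assumed system-side encode.
let displayed = linearToSRGB(shaderOutput)

print(shaderOutput, displayed)
```

If the displayed value still differs after pre-compensating this way, the conversion being applied is probably not a plain sRGB encode, and this sketch would not apply.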
arScnView = ARSCNView(frame: CGRect.zero, options: nil)
arScnView.delegate = self
arScnView.automaticallyUpdatesLighting = true
arScnView.allowsCameraControl = true
addSubview(arScnView)
arSession = arScnView.session
arSession.delegate = self
config = ARWorldTrackingConfiguration()
config.sceneReconstruction = .meshWithClassification
config.environmentTexturing = .automatic
func session(_ session: ARSession, didAdd anchors: [ARAnchor])
{
    anchors.forEach({ anchor in
        if let meshAnchor = anchor as? ARMeshAnchor {
            let node = meshAnchor.toSCNNode()
            self.arScnView.scene.rootNode.addChildNode(node)
        }
        if let environmentProbeAnchor = anchor as? AREnvironmentProbeAnchor {
            // Can I retrieve the texture map corresponding to ARMeshAnchor from Environment Probe Anchor?
            // Or how can I retrieve the texture map corresponding to ARMeshAnchor?
        }
    })
}
How can I scan a 3D scene and save it as USDZ?
I want to achieve the following scenario?
I'm trying to create a custom Metal-based visual effect as a UIView to be used inside an existing UIKit-based interface. (An example might be a view that applies a blur effect to what's behind it.) I need to capture the MTLTexture of what's behind the view so that I can feed it to MTLRenderCommandEncoder.setFragmentTexture(_:index:). Can someone show me how or point me to an example? Thanks!
Greetings! I have been battling with a bit of a tough issue. My use case is running a pixelwise regression model on a 2D array of images using CIImageProcessorKernel and a custom Metal Shader.
It mostly works great, but the issue that arises is that if the regression calculation in Metal takes too long, an error occurs and the resulting output texture has strange artifacts, for example:
The specific error is:
Error excuting command buffer = Error Domain=MTLCommandBufferErrorDomain Code=1 "Internal Error (0000000e:Internal Error)" UserInfo={NSLocalizedDescription=Internal Error (0000000e:Internal Error), NSUnderlyingError=0x60000320ca20 {Error Domain=IOGPUCommandQueueErrorDomain Code=14 "(null)"}} (com.apple.CoreImage)
There are multiple levels of concurrency: Swift Concurrency calling the Core Image code (which shouldn't have an impact) and of course the Metal command buffer.
Is there any way to ensure the compute command encoder can complete its work?
Here is the full implementation of my CIImageProcessorKernel subclass:
class ParametricKernel: CIImageProcessorKernel {
    static let device = MTLCreateSystemDefaultDevice()!
    override class var outputFormat: CIFormat {
        return .BGRA8
    }
    override class func formatForInput(at input: Int32) -> CIFormat {
        return .BGRA8
    }
    override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws {
        guard
            let commandBuffer = output.metalCommandBuffer,
            let images = arguments?["images"] as? [CGImage],
            let mask = arguments?["mask"] as? CGImage,
            let fillTime = arguments?["fillTime"] as? CGFloat,
            let betaLimit = arguments?["betaLimit"] as? CGFloat,
            let alphaLimit = arguments?["alphaLimit"] as? CGFloat,
            let errorScaling = arguments?["errorScaling"] as? CGFloat,
            let timing = arguments?["timing"],
            let TTRThreshold = arguments?["ttrthreshold"] as? CGFloat,
            let input = inputs?.first,
            let sourceTexture = input.metalTexture,
            let destinationTexture = output.metalTexture
        else {
            return
        }
        guard let kernelFunction = device.makeDefaultLibrary()?.makeFunction(name: "parametric") else {
            return
        }
        guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else {
            return
        }
        let imagesTexture = Texture.textureFromImages(images)
        let pipelineState = try device.makeComputePipelineState(function: kernelFunction)
        commandEncoder.setComputePipelineState(pipelineState)
        commandEncoder.setTexture(imagesTexture, index: 0)
        let maskTexture = Texture.textureFromImages([mask])
        commandEncoder.setTexture(maskTexture, index: 1)
        commandEncoder.setTexture(destinationTexture, index: 2)
        var errorScalingFloat = Float(errorScaling)
        let errorBuffer = device.makeBuffer(bytes: &errorScalingFloat, length: MemoryLayout<Float>.size, options: [])
        commandEncoder.setBuffer(errorBuffer, offset: 0, index: 1)
        // Other buffers omitted....
        let threadsPerThreadgroup = MTLSizeMake(16, 16, 1)
        let width = Int(ceil(Float(sourceTexture.width) / Float(threadsPerThreadgroup.width)))
        let height = Int(ceil(Float(sourceTexture.height) / Float(threadsPerThreadgroup.height)))
        let threadGroupCount = MTLSizeMake(width, height, 1)
        commandEncoder.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadsPerThreadgroup)
        commandEncoder.endEncoding()
    }
}
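The threadgroup-count computation at the end of the kernel above rounds the texture size up to a whole number of 16x16 threadgroups. The same rounding can be sketched with integer arithmetic, which avoids the Float round trip; the helper name threadgroupCount is hypothetical and the texture sizes are example values, not taken from the post.

```swift
// Number of threadgroups needed to cover `size` threads along one axis,
// rounding up (integer ceiling division).
func threadgroupCount(for size: Int, threadsPerGroup: Int) -> Int {
    precondition(size >= 0 && threadsPerGroup > 0)
    return (size + threadsPerGroup - 1) / threadsPerGroup
}

// Example: a 1920x1080 texture with 16x16 threads per threadgroup.
let groupsWide = threadgroupCount(for: 1920, threadsPerGroup: 16)
let groupsHigh = threadgroupCount(for: 1080, threadsPerGroup: 16)

// Threads actually launched along the height; some fall outside the texture.
let launchedRows = groupsHigh * 16

print(groupsWide, groupsHigh, launchedRows)
```

Because the grid is rounded up, some threads land outside the texture whenever a dimension is not a multiple of 16, so the kernel needs a bounds check before it reads or writes.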
The Metal feature set tables specify that, beginning with the Apple4 family, the "Maximum threads per threadgroup" is 1024. Given that a single threadgroup is guaranteed to run on the same GPU shader core, this means that a shader core of any new Apple GPU must be capable of running at least 1024/32 = 32 warps in parallel.
From the WWDC session "Scale compute workloads across Apple GPUs (6:17)":
For relatively complex kernels, 1K to 2K concurrent threads per shader core is considered a very good occupancy.
The cited sentence suggests that a single shader core is capable of running at least 2K (I assume this is meant to be 2048) threads in parallel, so 2048/32 = 64 warps running in parallel.
However, I am curious what the maximum theoretical number of warps running in parallel on a single shader core is (it sounds like it is more than 64). The WWDC session mentions 2K to be only "very good" occupancy. How many threads would be "the best possible" occupancy?
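The arithmetic in this question can be written out as a small sketch. The width of 32 threads per warp (SIMD-group) is the figure the post assumes, and the function name is hypothetical; nothing here answers what the hardware maximum is.

```swift
// Threads per warp / SIMD-group; 32 is the value assumed in the post.
let simdWidth = 32

// Number of warps needed to hold the given number of resident threads.
func residentWarps(forThreads threads: Int) -> Int {
    return threads / simdWidth
}

// Maximum threads per threadgroup from the feature set tables.
let warpsForMaxThreadgroup = residentWarps(forThreads: 1024)

// "2K" threads per shader core, read as 2048.
let warpsForVeryGoodOccupancy = residentWarps(forThreads: 2048)

print(warpsForMaxThreadgroup, warpsForVeryGoodOccupancy)
```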
Our app encountered the following error:
Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself?
Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve the maximum possible performance, I do not want the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space.
To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory?
I searched in the Metal feature set tables, but I could not find any details.
How is it possible to enable EDR on Apple TV without AVFoundation for custom HDR video playback? The use case is a custom video player for HDR playback via VideoToolbox and Metal, which seem to render colors correctly on iOS but not on tvOS.
All related documentation and WWDC sessions describe APIs that are unavailable for tvOS:
let metalLayer = CAMetalLayer()
metalLayer.wantsExtendedDynamicRangeContent = true
metalLayer.edrMetadata = CAEDRMetadata.hdr10(minLuminance: 0.0, maxLuminance: 1000, opticalOutputScale: 100)
What's the alternative path for tvOS to have correct system tone mapping for a setup like:
metalLayer.pixelFormat = .rgba16Float // (or .bgr10_xr)
metalLayer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
Video format: HEVC, YUV 4:2:0 10bit, BT.2020 PQ.
We do set the preferredDisplayCriteria on AVDisplayManager and thus video range matching is in place.
WWDC Ref: https://developer.apple.com/videos/play/wwdc2022/110565?time=557