How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself?
Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512, or more. To achieve the maximum possible performance, I do not want the local thread variables to get spilled into slow device memory. I want all N variables to be stored "on-chip", in the thread memory space.
To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory?
I searched in the Metal feature set tables, but I could not find any details.
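To illustrate, here is a minimal sketch of the kind of kernel I have in mind, with the source built at runtime so N can be varied (all names and buffer bindings here are hypothetical):

import Metal

let n = 256 // the N I am asking about

let source = """
#include <metal_stdlib>
using namespace metal;

kernel void localArrayKernel(device const float *input [[buffer(0)]],
                             device float *output [[buffer(1)]],
                             uint gid [[thread_position_in_grid]])
{
    // The array I would like to keep entirely on-chip, never spilled to device memory.
    thread float localArray[\(n)];
    for (uint i = 0; i < \(n); ++i) {
        localArray[i] = input[gid * \(n) + i];
    }
    float sum = 0.0f;
    for (uint i = 0; i < \(n); ++i) {
        sum += localArray[i];
    }
    output[gid] = sum;
}
"""

let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: source, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "localArrayKernel")!)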
Metal: Render advanced 3D graphics and perform data-parallel computations using graphics processors.
Posts under Metal tag (160 posts)
After following the instructions here:
https://developer.apple.com/metal/cpp/
I attempted building my project and Xcode presented several errors. In essence it's complaining about some redeclarations in the Metal-CPP headers.
NSBundle.hpp and NSError.hpp are included in the metal-cpp/foundation directory from the metal-cpp download.
Any help in getting these issues resolved is appreciated.
Thanks!
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSBundle allFrameworks]: unrecognized selector sent to instance
NS::Bundle* Bundle = NS::Bundle::mainBundle(); Bundle->allFrameworks();
Calls to allFrameworks() and allBundles() throw an exception, but other functions work well.
The following code crashes (SIGSEGV in lldb-rpc-server) when run as Swift 6, but runs correctly when run as Swift 5 (it's from "Metal by Tutorials"):
import PlaygroundSupport
import MetalKit
print("start")
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("GPU is not supported")
}
let frame = CGRect(x: 0, y: 0, width: 600, height: 600)
let view = MTKView(frame: frame, device: device)
view.clearColor = MTLClearColor(red: 1, green: 1, blue: 0.8, alpha: 1)
let allocator = MTKMeshBufferAllocator(device: device)
let mdlMesh = MDLMesh(sphereWithExtent: [0.75,0.75,0.75], segments: [100, 100], inwardNormals: false, geometryType: .triangles, allocator: allocator)
let mesh = try MTKMesh(mesh: mdlMesh, device: device)
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Could not create a command queue")
}
let shader = """
#include <metal_stdlib>
using namespace metal;
struct VertexIn {
    float4 position [[attribute(0)]];
};

vertex float4 vertex_main(const VertexIn vertex_in [[stage_in]])
{
    return vertex_in.position;
}

fragment float4 fragment_main() {
    return float4(1, 0, 0, 1);
}
"""
print("A")
let library = try device.makeLibrary(source: shader, options: nil)
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
print("X")
pipelineDescriptor.vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(mesh.vertexDescriptor)
let pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
guard let commandBuffer = commandQueue.makeCommandBuffer(),
      let renderPassDescriptor = view.currentRenderPassDescriptor,
      let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
else {
    fatalError()
}
renderEncoder.setRenderPipelineState(pipelineState)
renderEncoder.setVertexBuffer(mesh.vertexBuffers[0].buffer, offset: 0, index: 0)
guard let submesh = mesh.submeshes.first else {
    fatalError()
}
renderEncoder.drawIndexedPrimitives(type: .triangle, indexCount: submesh.indexCount, indexType: submesh.indexType, indexBuffer: submesh.indexBuffer.buffer, indexBufferOffset: 0)
renderEncoder.endEncoding()
guard let drawable = view.currentDrawable else {
    fatalError()
}
commandBuffer.present(drawable)
commandBuffer.commit()
print("test")
PlaygroundPage.current.liveView = view
Crash report: https://gist.githubusercontent.com/tumdum/8aa53bc806619c0d21c93a55fae07937/raw/370b00c07b08fff8856f9fc678de9888faa8d06e/crash.log
I'm on macOS 15.1.1 (24B2091) + Xcode 16.2 (16C5032a)
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:
1- the resolve texture of the color attachment
2- a rw tile shader texture set via [renderEncoder setTileTexture: myResolvedTexture]
Option 2 is more than twice as slow as option 1.
Our compute shader writes to 4 UAVs so just using the resolve texture entry is not possible.
Why such a difference as there is no more data being written? Can option 2 be as fast as option 1?
I can demonstrate the issue in a modified version of the Multisample code sample.
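For reference, option 2 is set up roughly like this (a simplified Swift sketch; resolveTileFunction, myResolvedTexture, device, and renderEncoder are placeholders for objects that already exist in our code):

// Tile pipeline whose tile function performs the custom resolve.
let tileDescriptor = MTLTileRenderPipelineDescriptor()
tileDescriptor.tileFunction = resolveTileFunction
tileDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
tileDescriptor.threadgroupSizeMatchesTileSize = true
let tilePipeline = try! device.makeRenderPipelineState(tileDescriptor: tileDescriptor,
                                                       options: [],
                                                       reflection: nil)

// Inside the render pass, after the normal draws:
renderEncoder.setRenderPipelineState(tilePipeline)
renderEncoder.setTileTexture(myResolvedTexture, index: 0) // option 2: the rw tile texture we write to
renderEncoder.dispatchThreadsPerTile(MTLSize(width: 16, height: 16, depth: 1))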
I searched the Metal Shading Language Specification Version 3.0 document, however I cannot see any function for inverting a matrix. Is there really no function in Metal for inverting a matrix?
I often need to do this in linear equations and have so far resorted to writing the necessary function each time, most of the time just copy-and-pasting code.
inverse exists in SIMD and GLSL, so why not in Metal? It seems so unexpected that this function does not exist that I am almost certain I have just overlooked something obvious. I even tried 1 / M, to no avail.
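For what it's worth, when the matrix is uniform across a draw or dispatch, my current workaround is to invert it on the CPU with simd and pass the result to the shader (a sketch; how you bind the buffer is up to you):

import simd

// Invert once on the CPU instead of per thread in the shader.
let m = simd_float4x4(columns: (SIMD4<Float>(1, 0, 0, 0),
                                SIMD4<Float>(0, 2, 0, 0),
                                SIMD4<Float>(0, 0, 4, 0),
                                SIMD4<Float>(1, 2, 3, 1)))
let mInv = m.inverse // simd has the inverse that MSL lacks
// Pass mInv to the kernel via setBytes(_:length:index:) or a constant buffer.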
Hi everyone,
I'm currently working on a Swift Playgrounds project where I need to incorporate a Metal shader file. However, when I tried to include my shader file (PincushionShader.metal), I encountered the following error:
Is it possible to use Metal shader files within Swift Playgrounds? It is really important for my Swift Student Challenge scene. If not, are there any workarounds or recommended approaches for testing Metal shaders in a similar environment?
Any guidance or suggestions would be greatly appreciated!
Topic: Community | SubTopic: Swift Student Challenge | Tags: Swift Student Challenge, Metal, Swift Playground, Playground Support
Hello guys,
I have a question regarding the submission requirements. My app uses ARKit and requires Metal files for shaders, which are not supported by Swift Playgrounds. Therefore, I developed my app using Xcode.
(Swift Playgrounds returns an error for the Metal file.)
Since my app relies on a real device for proper functionality, I would like to know if, under these circumstances, the scene build is performed by Xcode. If the build were instead done by Swift Playgrounds, my scene would not function correctly.
I'm asking that because of this note
Thank you for your time and assistance.
Topic: Community | SubTopic: Swift Student Challenge | Tags: Swift Student Challenge, Metal, Swift Playground, Xcode
I am currently finalizing my Swift Student Challenge submission, and Metal shaders are an essential part of my app. However, during submission, I noticed a note explaining: "Note: Xcode app playgrounds are run in Simulator", which is not possible for my app, as it also requires the camera of a physical device to function. So, I am currently transferring my app from Xcode into Swift Playgrounds, which I presume will run on physical devices.
However, I noticed that Swift Playgrounds do not yet support Metal shaders directly, so I am now pre-compiling my shaders to load them at runtime instead. Note that all the code below was run either in the terminal or in Xcode.
I have already compiled my Metal shaders with:
xcrun -sdk iphoneos metal -o Shaders.ir -c Shaders.metal
xcrun -sdk iphoneos metallib Shaders.ir -o Shaders.metallib
Which seems to have run without any problems.
When I run:
let shaderPath = Bundle.main.path(forResource: "Shaders", ofType: "metallib")
let shaderURL = URL(fileURLWithPath: shaderPath!)
let shaderData = try! Data(contentsOf: shaderURL)
do {
    let device = MTLCreateSystemDefaultDevice()!
    let library = try shaderData.withUnsafeBytes { bytes -> MTLLibrary? in
        let dispatchData = DispatchData(bytes: bytes)
        return try device.makeLibrary(data: dispatchData as __DispatchData)
    }
    print(library!.functionNames)
} catch {
    print(error.localizedDescription)
}
My Metal shader functions are printed correctly in the console. However, based on my research, it seems that an MTLLibrary cannot be converted into a SwiftUI ShaderLibrary.
That is why I am now looking at these two initializers:
ShaderLibrary(url: URL)
ShaderLibrary(data: Data)
These state: "Creates a new Metal shader library from the contents of url/data, which must be the contents of a precompiled Metal library. Functions compiled from the returned library will only be cached as long as the returned library exists." I believe this should work for my use case.
However, the problem arises when I run this code:
let shaderPath = Bundle.main.path(forResource: "Shaders", ofType: "metallib")
let shaderURL = URL(fileURLWithPath: shaderPath!)
let library = ShaderLibrary(url: shaderURL)
My app consistently seems to crash on the ShaderLibrary initialization, rendering the app unusable. Why does ShaderLibrary(url: shaderURL) cause a crash, even though my .metallib file is valid? Are there additional requirements for loading a ShaderLibrary that I may have missed?
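For completeness, the data-based variant I would try next looks like this (same metallib as above; I don't know yet whether it avoids the crash):

let shaderPath = Bundle.main.path(forResource: "Shaders", ofType: "metallib")
let shaderURL = URL(fileURLWithPath: shaderPath!)
let shaderData = try! Data(contentsOf: shaderURL)
// Same precompiled library, loaded through the Data initializer instead of the URL one.
let library = ShaderLibrary(data: shaderData)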
In my project I need to do the following:
1. At runtime, create a Metal dynamic library from source.
2. At runtime, create a Metal executable library from source and link it with the previously created dynamic library.
3. Create a compute pipeline using the two libraries created above.
But I get the following error at the third step:
Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols:
_Z5noisev, referenced from: OnTheFlyKernel
" UserInfo={NSLocalizedDescription=Undefined symbols:
_Z5noisev, referenced from: OnTheFlyKernel
}
import Foundation
import Metal
class MetalShaderCompiler {
    let device = MTLCreateSystemDefaultDevice()!
    var pipeline: MTLComputePipelineState!

    func compileDylib() -> MTLDynamicLibrary {
        let source = """
        #include <metal_stdlib>
        using namespace metal;
        half3 noise() {
            return half3(1, 0, 1);
        }
        """
        let option = MTLCompileOptions()
        option.libraryType = .dynamic
        option.installName = "@executable_path/libFoundation.metallib"
        let library = try! device.makeLibrary(source: source, options: option)
        let dylib = try! device.makeDynamicLibrary(library: library)
        return dylib
    }

    func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary {
        let source = """
        #include <metal_stdlib>
        using namespace metal;
        extern half3 noise();
        kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]],
                                   texture2d<half, access::write> dst [[texture(1)]],
                                   ushort2 gid [[thread_position_in_grid]]) {
            half4 rgba = src.read(gid);
            rgba.rgb += noise();
            dst.write(rgba, gid);
        }
        """
        let option = MTLCompileOptions()
        option.libraryType = .executable
        option.libraries = [dylib]
        let library = try! self.device.makeLibrary(source: source, options: option)
        return library
    }

    func runtime() {
        let dylib = self.compileDylib()
        let exlib = self.compileExlib(dylib: dylib)

        let pipelineDescriptor = MTLComputePipelineDescriptor()
        pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel")
        pipelineDescriptor.preloadedLibraries = [dylib]

        pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil)
    }
}
Our app uses Metal for image processing. We have found that if our app (and its possibly intensive image processing) is started quickly after the user logs in, then calls to Metal may hang for a good while.
Example: it can take 1-2 minutes for something that usually takes 3-5 seconds! Metal threads are just hanging in a memmove...
In Activity Monitor we see that a lot is happening right after log-in. But why Metal calls block for so long is unknown to us...
The workaround is to wait a minute before we start our app and its intensive Metal-based image processing. But that is hard to explain to end users...
It doesn't happen on all computers, but it is fairly easy to reproduce on some.
We are using macOS 15.3.1. M1/M3 Max.
Any good ideas for how to proceed with this problem and possibly reach Apple engineers?
Thanks! :)
We are a team of engineers working on an app intended to visualize medical images. The situations where the app is used involve time-critical decision making for acute clinical conditions. Stability and performance of the app are of utmost importance and can directly help timely treatment action. The app we are developing uses multiple libraries and tools such as VTK, WebGL, OpenGL, WebKit, gl-matrix, etc.
The problem can be described as follows: when a 3D volume is rendered in the app and we try to rotate the volume, the rotation is slow, unresponsive, and laggy. Specifically, we have noticed that on iOS 18.1 the volume rotation is much smoother than on the latest iOS 18.2. Earlier, we faced a somewhat similar issue with iOS 17, but it improved in iOS 18.1. This performance regression is affecting the user experience in our healthcare application.
We have taken reference from the cornerstone.js code and you can reproduce the issue using the following example: https://www.cornerstonejs.org/live-examples/volumeviewport3d
Steps to Reproduce:
Load the above-mentioned test example on an iPhone running iOS 18.2 using Safari.
Perform volume rendering using the provided dataset.
Measure the time taken by the volume for each rotate or drag action.
Repeat the same steps on an iPhone running version 18.1 for comparison.
Additional information:
Device models tested: iPhone 12, iPhone 13, iPhone 14
iOS versions with issue: 18.2, 18.3 (beta)
I would appreciate any insights or suggestions on how to address this performance regression. If additional information is needed, please let me know.
Thank you.
Hello! I have asked this question in previous years, but I want to make sure, as each challenge could be different.
Are applicants for the Swift Student Challenge allowed to use the features and technologies involved with Metal/MetalKit? Last year, the answer was yes, and I have seen a few people here and there use it with Swift and win.
I would like to know if we can use it for the 2025 challenge for this year as well.
Thanks! :)
Topic: Developer Tools & Services | SubTopic: Swift Playground | Tags: Swift Student Challenge, Metal, MetalKit
The title is self-explanatory. I wasn't able to find CAMetalDisplayLink in the most recent metal-cpp release (metal-cpp_macOS15_iOS18-beta). Are there any plans to include it in the next release?
Hi,
The metal-cpp distribution appears to contain only headers for Foundation and QuartzCore. The LearnMetalCPP download [1] provides a ZIP with a metal-cpp-extensions directory containing AppKit.hpp and MetalKit.hpp headers. First question: are these headers distributed anywhere else more publicly? Without them, only the renderer can be fully written in C++ as far as I can tell, i.e. no complete C++ NSApplication. Second question: will these headers, if needed, be maintained (e.g. updated and/or extended) by Apple alongside metal-cpp?
[1] https://developer.apple.com/metal/cpp/
Thank you and regards.
Following the post on
https://developer.apple.com/documentation/realitykit/custommaterial it's simple to use a shader for materials and get uniforms and params for each vertex. However, it's not available on visionOS. Is there any alternative in this case? I want to write a shader to fill the material myself. (I have shader experience from the web and am familiar with fragment shaders.)
Hi,
We are trying to port our Unity app from other XR devices to Vision Pro. Thus it's way easier for us to use the Metal rendering layer, fully immersive. And to stay true to the platform, we want to keep the gaze/pinch interaction system.
But we just noticed that, unlike PolySpatial XR apps, visionOS XR in Metal does not provide gaze info unless the user is actively pinching... which prevents any attempt to give visual feedback on what they are looking at (buttons, etc.).
Is this planned on Apple's roadmap?
Thanks
I have a huge sphere where the camera stays inside the sphere, and I turn on front-face culling on my ShaderGraphMaterial applied to that sphere, so that I can place other 3D content inside. However, when it comes to attachments, the occlusion never works as I expect. Specifically, my attachments are occluded by my sphere (some are not, so the behavior is not deterministic).
I then suspected a depth-testing issue, so I started using ModelSortGroup to reorder the rendering sequence. However, it doesn't work. Searching through the internet, this post's comments show that ModelSortGroup simply doesn't work on attachments.
So I wonder how I should tackle this issue now, to let my attachments appear inside my sphere.
OS/Sys: visionOS 2.3 / Xcode 16.3
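For reference, this is roughly how I applied the sort group (entity names are from my scene; it did not change the occlusion of the attachments):

import RealityKit

// Ask RealityKit to draw the sphere first, then the attachment entity on top of it.
let sortGroup = ModelSortGroup(depthPass: nil)
sphereEntity.components.set(ModelSortGroupComponent(group: sortGroup, order: 0))
attachmentEntity.components.set(ModelSortGroupComponent(group: sortGroup, order: 1))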
Hello!
I need to "draw" a set of particles into the texture. It would be trivial in render encoder of course. However, I would like to implement the task in compute kernel. Every particle draw operation is expected to set 5 texels - "center" one and left/right/upper/lower. Particles can and will overlap, so concurrent draws are to be expected.
I tried using texture atomics - atomic_store() to be more precise. This worked, albeit pretty slowly - too slow for my purpose.
Just to test what would happen, I tried using normal texture write(). I was expecting to see some kind of visual artefacts, but to my surprise, it worked very well (and much faster).
My question: is it safe? I understand that calling write() doesn't guarantee any ordering of the operations, so if multiple threads write to the same texel, the final value may come from any of those threads. But suppose all the threads were to write the very same color? Can I assume that the texel in question will have said color after the compute kernel finishes?
I am using an M2 Pro MacBook, but ideally I would love to get the answer for all Apple Silicon devices. My texture format is R32Int (so as to be able to use atomics), but I could do with any single-channel format; the purpose of the texture is to be a binary mask of sorts.
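For concreteness, the plain-write version of the kernel looks roughly like this (a simplified sketch; names and bindings are placeholders, and bounds checks are omitted):

let particleKernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void drawParticles(device const uint2 *positions [[buffer(0)]],
                          texture2d<int, access::write> mask [[texture(0)]],
                          uint gid [[thread_position_in_grid]])
{
    uint2 p = positions[gid];
    int4 value = int4(1, 0, 0, 0); // every thread writes the very same value
    mask.write(value, p);                 // center
    mask.write(value, p + uint2(1, 0));   // right
    mask.write(value, p - uint2(1, 0));   // left
    mask.write(value, p + uint2(0, 1));   // lower
    mask.write(value, p - uint2(0, 1));   // upper
}
"""
// Compiled with device.makeLibrary(source:options:) and dispatched with one thread per particle.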
Thanks!
I have a bare-bones Metal app setup where I attach a CAMetalLayer to a window that inherits from a NSWindow with a custom delegate. Everything else is vanilla. I'm also using metal-cpp and metal shader converter.
I'm running into an issue where the application runs fine in the beginning, but once I resize the window, it starts hitching. It turns out that [CAMetalLayer nextDrawable] frequently (but not always) takes around a full second (plus or minus a few milliseconds) to return once drawableSize has been updated.
I've tried setting allowsNextDrawableTimeout to false which doesn't work; it returns a valid drawable after a second instead of nil. Setting displaySyncEnabled to false reduces the likelihood of this happening to around 50% from 90%+ but does not eliminate it. Setting maximumDrawableCount to 2 or 3 does not seem to make a difference.
By dumping the resource IDs of the returned textures I've noticed something interesting: Before resizing, the layer seems to shuffle between 2 textures or at least 2 resource IDs, but after resizing it starts to create new textures for each returned drawable. Occasionally it seems to reuse a previous resource ID, but it does not seem to have anything to do with whether the method returns quickly or not.
Why does this happen, and how can I fix it? Should I create a new CAMetalLayer when resizing the window instead of updating drawableSize?
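For context, the resize path amounts to updating drawableSize from the window delegate, roughly like this Swift sketch (my actual code does the equivalent through metal-cpp and the custom delegate mentioned above):

import AppKit
import QuartzCore

final class WindowDelegate: NSObject, NSWindowDelegate {
    let metalLayer: CAMetalLayer

    init(metalLayer: CAMetalLayer) {
        self.metalLayer = metalLayer
        super.init()
    }

    func windowDidResize(_ notification: Notification) {
        guard let window = notification.object as? NSWindow,
              let contentView = window.contentView else { return }
        // Updating drawableSize here is what precedes the slow nextDrawable calls.
        let scale = window.backingScaleFactor
        metalLayer.drawableSize = CGSize(width: contentView.bounds.width * scale,
                                         height: contentView.bounds.height * scale)
    }
}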