Project: I have some data wich could be transformed by shader, result may be kept in rgb channels of image. Great.
But now to mix dozens of those results? Not one by one, image after image, but all at once. Something like „complicated average” color of particular pixel from all delivered images.
Is it possible?
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Currently looking for Metal developers to port Quake 2 RTX to Metal RT in order to give Apple Silicon Macs an amazing Pathtracing demo, This project falls under NightSightProductions who is also working on a Portal 2 with RTX Remaster. if you are interested and want to help further Mac gaming, message me here or on discord at king_vulpes
Hi! I just installed GPTK2 on my new Mac , but the Terminal gave “Error:OpenSSL1.1 has been disabled.”
How should I fix it?Or waiting for the GPTK2 beta4?
Thanks.
I am making a framework in C++ using metal-cpp, basically a small game engine. I am also consequently using metal-cpp-extensions provided in LearnMetalCPP to make applications work.
For one of my classes, I needed to add AppKit.hpp inside a public header file, so I moved it and its associate headers(NSApplication.hpp, NSMenu.hpp, etc.) from Project headers to Public in Build Phases' Headers, however, it started giving me the error "cast of C pointer type 'void *' to Objective-C pointer type 'Class' requires a bridged cast" at several points in the AppKit headers. They don't appear when AppKit and its associates are in the Project headers, or when they are in the Private headers and no headers import it.
I imagined that disabling Objective-C ARC and Using __bridge casts outside of ARC in Build Settings would solve it, but it didn't budge.
I imagined it wouldn't involve actively changing the headers would be the answer, but even if I try to put __bridge before the problematic casts, it didn't recognize __bridge.
How do I solve this? And why is it only happening in Public and not Project headers?
I have an M1 Pro with a 16-core GPU. When I run a shader with 8193 threads, atomic_thread_fence is violated across the boundary between thread 8191 (the last thread in the 7th threadgroup) and 8192 (the first thread in the 9th threadgroup).
I've attached the Metal and Swift files, but I'll repost the relevant kernel here. It's a function that launches N threads to iterate through a binary tree from the leaves, where the first thread to reach the parent terminates and the second one populates it with the sum of the nodes two children.
// clang-format off
void sum(device const int& size,
device const int* __restrict__ in,
device int* __restrict__ out,
device atomic_int* visited,
uint i [[thread_position_in_grid]]) {
// clang-format on
int val = in[i];
uint cur = (size + i - 1);
out[cur] = val;
atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
cur = (cur - 1) / 2;
int proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
while (proceed == 1) {
uint left = 2 * cur + 1;
uint right = 2 * cur + 2;
uint val_left = out[left];
uint val_right = out[right];
uint val_cur = val_left + val_right;
out[cur] = val_cur;
if (cur == 0) {
break;
}
cur = (cur - 1) / 2;
atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
}
}
What I'm observing is that thread 8192 hits the atomic_fetch_add first and terminates, while thread 8191 hits it second (observes that thread 8192 had incremented it by 1) and proceeds into the loop. Thread 8191 reads out[16383] (which it populated with 8191) and out[16384] (which thread 8192 populated with 8192 prior to the atomic_thread_fence). Instead of reading 8192 from out[16384] though, it reads 0.
Maybe I'm missing something but this seems like a pretty clear violation of the atomic_thread_fence which (I thought) was supposed to guarantee that the write from thread 8192 to out[16384] would be visible to any thread observing the effects of the following atomic_fetch_add. Is atomic_fetch_add not a store operation? Modifying it to an atomic_store or atomic_exchange still results in the bug. Adding another atomic_thread_fence between the atomic_fetch_add and reading of out also doesn't change anything.
I only begin to observe this on grid sizes of 8193 and upwards. That's 9 threadgroups per grid, which I assume could be related to my M1 Pro GPU having 16 cores.
Running the same example on an A17 Pro GPU doesn't show any of this behavior up through a tested grid size of 4194303 (2^22-1), at which point testing larger grid sizes starts to run into other issues so I can't test anything larger.
Removing the atomic_thread_fences on both the M1 and A17 cause the test to fail at much smaller grid sizes, as expected.
sum.metal
main.swift
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:
1- the resolve texture of the color attachment
2- a rw tile shader texture set via [renderEncoder setTileTexture: myResolvedTexture]
Option 2 is more than twice as slow than option 1.
Our compute shader writes to 4 UAVs so just using the resolve texture entry is not possible.
Why such a difference as there is no more data being written? Can option 2 be as fast as option 1?
I can demonstrate the issue in a modified version of the Multisample code sample.
Is there a working example of imageblock_slice with implicit layout somewhere?
I get a compilation error when i write this:
imageblock_slilce color_slice = img_blk.slice(frag->color);
Error:
No matching member function for call to 'slice'
candidate template ignored: couldn't infer template argument 'E'
candidate function template not viable: requires 2 arguments, but 1 was provided
Too few template arguments for class template 'imageblock_slice'
It seems the syntax has changed since the Imageblocks presentation https://developer.apple.com/videos/play/tech-talks/603/
I tried supplying the struct type of the image block between <> but it still does not work.
I'm trying to pass a buffer of float2 items from CPU to GPU.
In the kernel, I can provide a parameter for the buffer:
device const float2* values
for example.
How do I specify float2 as the type for the MTL::Buffer?
I managed to get the code to work by "cheating" by defining a simple class that has the same data members as a float2, but there is probably a better way.
class Coord_f { public: float x{0.0f}; float y{0.0f}; };
then using code to allocate like this:
NS::TransferPtr(device->newBuffer(n_elements * sizeof(Coord_f), MTL::ResourceStorageModeManaged))
The headers for metal-cpp do not appear to define vector objects like float2, but I'm doubtless missing something.
Thanks.
Hello, Apple!
This post is a bug report for Metal driver in MacOS Sequoia.
I'm working on opensource game engine and one of my users reported a bug, on "MacOS Sequoia" + "AMD Radeon RX 6900 XT". Engine crashes, when liking a compute pipeline, with following NSError:
"Compiler encountered an internal error: I"
Offended shader (depth aware blur): https://shader-playground.timjones.io/27565de40391f62f078c891077ba758c
On my end, compiling same shader on M1 (Sonoma) or with offline compiler doesn't reproduce the issue.
Post with bug-report on github: https://github.com/Try/OpenGothic/issues/712
Looking forward for your help and driver fix ;)
Topic:
Graphics & Games
SubTopic:
Metal
I have a bare-bones Metal app setup where I attach a CAMetalLayer to a window that inherits from a NSWindow with a custom delegate. Everything else is vanilla. I'm also using metal-cpp and metal shader converter.
I'm running into a issue where the application runs fine in the beginning, but once I resize the window, it starts hitching. It turns out that [CAMetalLayer nextDrawable:] frequently (but not always) takes around a full second (plus or minus a few milliseconds) to return once drawableSize has been updated.
I've tried setting allowsNextDrawableTimeout to false which doesn't work; it returns a valid drawable after a second instead of nil. Setting displaySyncEnabled to false reduces the likelihood of this happening to around 50% from 90%+ but does not eliminate it. Setting maximumDrawableCount to 2 or 3 does not seem to make a difference.
By dumping the resource IDs of the returned textures I've noticed something interesting: Before resizing, the layer seems to shuffle between 2 textures or at least 2 resource IDs, but after resizing it starts to create new textures for each returned drawable. Occasionally it seems to reuse a previous resource ID, but it does not seem to have anything to do with whether the method returns quickly or not.
Why does this happen, and how can I fix it? Should I create a new CAMetalLayer when resizing the window instead of updating drawableSize?
I am interested in learning the Metal framework for rendering development. However, most of Apple’s official documentation uses Objective-C code. Therefore, I am seeking guidance on whether it is more advantageous for me to focus solely on learning Swift to gain proficiency in Metal.
When inspecting the geometry in Xcode's metal debugger, I noticed that the shown "frustrum box" didn't make sense. Since Metal uses depth range 0,1 in NDC space, I would expect a vertex that is projected to z:0 to be on the front clipping plane of the frustrum shown in the geometry inspector. This is however not the case. A vertex with ndc z:0 is shown halfway inside the frustrum. Vertices with ndc z less than 0 are correctly culled during rendering, while the geometry inspector's frustrum shows that the vertex is stil inside the frustrum.
The image shows vertices that are visually in the middle of the frustrum on z axis, but at the same time the out position shows that they are projected to z:0. How is this possible, unless there's a bug in the geometry inspector?
Hey everyone,
I’m trying to run Kingdom Come: Deliverance 2 using the Game Porting Toolkit, but I’m encountering a black screen when launching the game. From what I know about the game’s requirements, it might be using Shader Model 6.5, which supports advanced features like DirectX Raytracing (DXR) Tier 1.1. This leads me to suspect that the issue could be related to missing support for DirectX 12.1 Features or Shader Model 6.5 in GPTK.
Does anyone know if these features are currently supported by GPTK? If not, are there any plans to implement them in future updates? Alternatively, is there any workaround for games that rely on Shader Model 6.5 and ray tracing?
Thanks a lot for your help!
Our app uses Metal for image processing. We have found that if our app (and its possible intensive image processing) is started quickly after user is logged in, then calls to Metal may be hanging/stuck for a good while.
Example: it can take 1-2 minutes for something that usually takes 3-5 seconds! Metal threads are just hanging in a memmove...
In Activity Monitor we see a lot of things are happening right after log-in. But why Metal calls are blocking for so long is unknown to us...
The workaround is to wait a minute before we start our app and start intensive image processing using Metal. But hard to explain this workaround to end-users...
It doesn't happen on all computers but fairly easy to reproduce on some computers.
We are using macOS 15.3.1. M1/M3 Max.
Any good ideas for how to proceed with this problem and possible reach out to Apple engineers?
Thanks! :)
Hello ladies and gentlemen, I'm writing a simple renderer on the main actor using Metal and Swift 6. I am at the stage now where I want to create a render pipeline state using asynchronous API:
@MainActor
class Renderer {
let opaqueMeshRPS: MTLRenderPipelineState
init(/*...*/) async throws {
let descriptor = MTLRenderPipelineDescriptor()
// ...
opaqueMeshRPS = try await device.makeRenderPipelineState(descriptor: descriptor)
}
}
I get a compilation error if try to use the asynchronous version of the makeRenderPipelineState method:
Non-sendable type 'any MTLRenderPipelineState' returned by implicitly asynchronous call to nonisolated function cannot cross actor boundary
Which is understandable, since MTLRenderPipelineState is not Sendable. But it looks like no matter where or how I try to access this method, I just can't do it - you have this API, but you can't use it, you can only use the synchronous versions.
Am I missing something or is Metal just not usable with Swift 6 right now?
My app is running Compute Shaders that use non-uniform thread groups.
When I run the app in the debugger with a simulator target the app crashes on encoder.dispatchThreads and the error message is:
Dispatch Threads with Non-Uniform Threadgroup Size is not supported on this device.
Previously the log output states that:
Metal Shader Validation is unsupported for Simulator.
However:
When I stop the debugger and just run the app in the simulator without the debugger attached, the app just runs fine and does not crash.
The SwiftUI Preview that also triggers the Compute Shader when preparing data also just runs fine without a crash.
I can run and debug on a real device no problem - I just don't have all sizes available.
Is there anything I need to check in my lldb/simulator configuration? It obviously does work, just the debugger cannot really deal with it?
Any input would be nice as this really slows my down as I have to be extremely careful when debugging on the simulator.
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSBundle allFrameworks]: unrecognized selector sent to instance
NS::Bundle* Bundle = NS::Bundle::mainBundle(); Bundle->allFrameworks();
call to allFrameworks() and allBundles() will throw exception, but other functions work well.
Hey all! I'm got my hands on a refurbished mac mini m1 and already diving into metal. At the moment, i'm currently studying graphics programming with opengl and got to a point where I can almost create a 3d cube. However, I noticed there aren't many tutorials for metal cpp but rather demos. One thing I love about graphic programming, is skinning/skeletal animation. At the moment, I can't find any sources or tutorials on how to load skeletal animations into metal-cpp. So, if I create my character in blender and had all types of animations all loaded into a .FBX or maybe .DAE and load this into metal api with metal-cpp, how can I go on about how this works?
I am trying to learn Metal development on my MacBook Pro M1 Pro (Sequoia 15.3.1) on Xcode Playground, but when I write these two lines of code:
import Metal
let device = MTLCreateSystemDefaultDevice()!
I get the error The LLDB RPC server has crashed. Any ideas as to what I can do to solve this? I have rebooted the machine and reinstalled Xcode...
After following the instructions here:
https://developer.apple.com/metal/cpp/
I attempted building my project and Xcode presented several errors. In essence it's complaining about some redeclarations in the Metal-CPP headers.
NSBundle.hpp and NSError.hpp are included in the metal-cpp/foundation directory from the metal-cpp download.
Any help in getting these issues resolved is appreciated.
Thanks!