Posts

Post not yet marked as solved
1 Reply
566 Views
I posted this question to StackOverflow. Perhaps it is better suited here, where Apple developers are more likely to see it. I was looking through the project linked on the page "Selecting Device Objects for Compute Processing" in the Metal documentation (https://developer.apple.com/documentation/metal/gpu_selection_in_macos/selecting_device_objects_for_compute_processing). There, I noticed a clever use of threadgroup memory that I am hoping to adopt in my own particle simulator. However, before I do so, I need to understand a particular aspect of threadgroup memory and what the developers are doing in this scenario. The code contains a segment like so:

```metal
// In AAPLKernels.metal

// Parameter of the kernel
threadgroup float4* sharedPosition [[threadgroup(0)]]

// Body
...
// For each particle / body
for(i = 0; i < params.numBodies; i += numThreadsInGroup)
{
    // Because sharedPosition uses the threadgroup address space, 'numThreadsInGroup' elements
    // of sharedPosition will be initialized at once (not just one element at lid as it
    // may look like)
    sharedPosition[threadInGroup] = oldPosition[sourcePosition];

    j = 0;

    while(j < numThreadsInGroup)
    {
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
    } // while

    sourcePosition += numThreadsInGroup;
} // for
```

In particular, I found the comment just before the assignment to sharedPosition, starting with "Because...", confusing. I haven't read anywhere that threadgroup memory writes happen on all threads in the same threadgroup simultaneously; in fact, I thought a barrier would be needed before reading from the shared memory pool again to avoid undefined behavior, since *each* thread subsequently reads from the entire pool of threadgroup memory after the assignment (the assignment being a write, of course). Why is a barrier unnecessary here?
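For reference, the barrier-based version I had in mind looks roughly like the sketch below. This is not the sample's code: the kernel name `tiledAcceleration`, the helper `pullFrom`, and the buffer layout are hypothetical names of my own, and it assumes the body count is a multiple of the threadgroup size and that the grid has exactly one thread per body. The only part I am relying on from the Metal Shading Language itself is `threadgroup_barrier(mem_flags::mem_threadgroup)`.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical stand-in for the sample's computeAcceleration:
// softened gravitational pull of `source` (mass in .w) on `target`.
static float3 pullFrom(float4 source, float4 target, float softeningSqr)
{
    float3 r = source.xyz - target.xyz;
    float distSqr = dot(r, r) + softeningSqr;
    float invDist = rsqrt(distSqr);
    return r * (source.w * invDist * invDist * invDist);
}

// Hypothetical kernel. Assumes numBodies is a multiple of the threadgroup
// size and that the grid contains exactly numBodies threads.
kernel void tiledAcceleration(device const float4* oldPosition    [[buffer(0)]],
                              device float4*       acceleration   [[buffer(1)]],
                              constant uint&       numBodies      [[buffer(2)]],
                              constant float&      softeningSqr   [[buffer(3)]],
                              threadgroup float4*  sharedPosition [[threadgroup(0)]],
                              uint gid    [[thread_position_in_grid]],
                              uint lid    [[thread_index_in_threadgroup]],
                              uint tgSize [[threads_per_threadgroup]])
{
    float4 currentPosition = oldPosition[gid];
    float3 accel = float3(0.0f);

    for (uint tile = 0; tile < numBodies; tile += tgSize) {
        // Each thread writes exactly one element of the shared tile.
        sharedPosition[lid] = oldPosition[tile + lid];

        // Wait until every thread's write to the tile is visible.
        threadgroup_barrier(mem_flags::mem_threadgroup);

        // Every thread now reads the whole tile.
        for (uint j = 0; j < tgSize; ++j) {
            accel += pullFrom(sharedPosition[j], currentPosition, softeningSqr);
        }

        // Wait until all reads finish before the tile is overwritten.
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    acceleration[gid] = float4(accel, 0.0f);
}
```

On the host side, this sketch would need the threadgroup memory length set for index 0 to the threadgroup size times `sizeof(float4)`. The two barriers per tile are what I expected to find in the sample and did not.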
Post not yet marked as solved
6 Replies
959 Views
I have been unable to use the Metal debugger ever since Apple released Xcode 12 as an update on the App Store. It is very frustrating. Xcode 12.0.1 simply crashed on frame capture or after trying to debug a fragment/vertex shader. Now, Xcode 12.2 issues the following message: "Shader Debugger is not supported in this system configuration. Please install an Xcode with an SDK that is aligned to your target device OS version." I have macOS 10.15.7 and have not upgraded to Big Sur yet. I downloaded Xcode 11.7 from the developer website, but again, Xcode simply crashes. I will try other, older Xcode versions, but this should not be something that developers face, especially those working with Metal, as it is nearly impossible to debug shaders without the shader debugger. Has anybody else had this issue? If so, what did you do to resolve it?
Post marked as solved
2 Replies
829 Views
In the WWDC talks on Metal that I have watched so far, many of the videos talk about Apple's A_ (fill in the blank: 11, 12, etc.) chip and the power it gives to the developer, such as allowing developers to leverage tile memory by opting to use TBDR. On macOS (at least on Intel Macs without the M1 chip), TBDR is unavailable, and other features that leverage tile memory, like image blocks, are also unavailable. That made me wonder about the structure of the GPUs on macOS and external GPUs like the Blackmagic eGPU (which is currently hooked up to my computer). Are the concepts of tile memory ubiquitous across GPU architectures? For example, if in a Metal kernel function we declared `threadgroup float tgfloats[16];`, is this value stored in tile memory (threadgroup memory) on the Blackmagic? Or is there an equivalent storage that is dependent on hardware but available on all hardware in some form? I know there are some WWDC sessions that deal with multiple GPUs which will probably be helpful, but extra information is always useful. Any links to information about GPU hardware architectures would be appreciated as well.
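To make the question concrete, here is a minimal sketch of the kind of kernel I mean. The kernel name `sumWithScratch` and the buffer indices are hypothetical, and the sketch assumes a dispatch with exactly 16 threads per threadgroup so that each thread owns one slot of `tgfloats`.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel; assumes exactly 16 threads per threadgroup.
kernel void sumWithScratch(device const float* input  [[buffer(0)]],
                           device float*       output [[buffer(1)]],
                           uint gid     [[thread_position_in_grid]],
                           uint lid     [[thread_index_in_threadgroup]],
                           uint groupId [[threadgroup_position_in_grid]])
{
    // Scratch storage shared by the 16 threads of this threadgroup.
    threadgroup float tgfloats[16];

    // Each thread fills its own slot.
    tgfloats[lid] = input[gid];

    // Make all 16 writes visible before any thread reads them.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // One thread per group adds up the slots and writes the group's sum.
    if (lid == 0) {
        float sum = 0.0f;
        for (uint j = 0; j < 16; ++j) {
            sum += tgfloats[j];
        }
        output[groupId] = sum;
    }
}
```

My question is where the storage behind `tgfloats` physically lives when a kernel like this runs on the eGPU rather than on an A-series chip.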
Post marked as solved
2 Replies
685 Views
I have been working with Metal for a little while now and I have encountered the threadgroup address space. After reading a little about it in Apple’s MSL reference, I am aware of how threadgroups are formed and how they can be split into SIMD groups; however, I have not yet seen threadgroup memory in action. Can someone give me some examples of when/how threadgroup memory is used? Specifically, how is the [[threadgroup(n)]] attribute used in both kernel and fragment shaders? References to WWDC videos, articles, and/or other resources would be appreciated.
Post not yet marked as solved
3 Replies
517 Views
I recently updated to Xcode 12.0.1 and was trying to debug my shaders when I received the error DYPShaderDebuggerErrorDomain:2 "Could not generate shader metadata." This is crippling my workflow. Is anybody else having this issue, and does anyone know how I can get it fixed? I did not encounter this issue in Xcode 11.