Metal Performance Shaders

Optimize graphics and compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family using Metal Performance Shaders.

Posts under Metal Performance Shaders tag

29 Posts

Transferring data to a Metal MTLBuffer dynamically in Objective-C
I have created the following MTLBuffer. How can I copy INPUTVALUE into the memINPUT buffer? I need to do this repeatedly in Objective-C.

    // header file
    @property id<MTLBuffer> memINPUT;

    // main file
    int length = 1000;
    ...
    memINPUT = [_device newBufferWithLength:(sizeof(float)*length) options:0];
    ...
    float INPUTVALUE[length];
    for (int i=0; i < length; i++) {
        INPUTVALUE[i] = (float)i;
    }
    // How to send INPUTVALUE to memINPUT?
    ...

The following is the Swift version. I am looking for the Objective-C equivalent.

    memINPUT.contents().copyMemory(from: INPUTVALUE, byteCount: length * MemoryLayout<Float>.stride);
Jul ’23
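For the MTLBuffer question above, one common approach is a plain memcpy into the pointer returned by the buffer's contents property, for example `memcpy(memINPUT.contents, INPUTVALUE, sizeof(float) * length);`. The sketch below shows the same copy in portable C so it can be compiled without Metal. Note that `fake_buffer_contents`, `upload_floats`, and `run_example` are hypothetical names introduced only for illustration, and the calloc'd block merely stands in for memory that the MTLBuffer would own in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the pointer returned by [memINPUT contents].
   In real Objective-C code this memory belongs to the MTLBuffer. */
void *fake_buffer_contents(size_t byte_count) {
    return calloc(1, byte_count);
}

/* Copy `count` floats from a host array into the buffer's contents.
   This can be called again whenever the input values change. */
void upload_floats(void *contents, const float *src, size_t count) {
    memcpy(contents, src, count * sizeof(float));
}

/* Fill an array like INPUTVALUE in the question, upload it, and
   return the last element as read back through the contents pointer. */
float run_example(void) {
    enum { LENGTH = 1000 };
    float input[LENGTH];
    for (int i = 0; i < LENGTH; i++) {
        input[i] = (float)i;
    }

    void *contents = fake_buffer_contents(sizeof(float) * LENGTH);
    assert(contents != NULL);

    upload_floats(contents, input, LENGTH);

    float last = ((const float *)contents)[LENGTH - 1];
    free(contents);
    return last;
}
```

When the copy is repeated every frame or dispatch, the CPU should not overwrite the buffer while the GPU may still be reading it; waiting for the previous command buffer to complete, or rotating between several buffers, are typical ways to handle that.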
Where is MPSGraphTool?
In the video here, the speaker refers to MPSGraphTool, which is supposed to convert from CoreML and other formats to the new MPSGraphPackage format. Searching for MPSGraphTool on Google returns only that video, and there is no mention of it on the forums here or elsewhere. When can we expect the tool to be released? How can we find out more information about it? My use case is that the ANECompilerService that runs on the Mac / iOS devices to compile CoreML Models / Programs is extremely slow and unreliable for large models. It often crashes entirely, sitting at 100% CPU usage forever and never completing the task at hand, meaning the user is stuck in a loading state. This also applies in Xcode when running a performance test. I would really like to compile the graph once and just run it on device directly.
Jul ’23
Build and execute a Metal app that performs calculations on the GPU without using Xcode
I am following this on building a Metal app for performing a GPU calculation. I cannot figure out how to build and execute the project from the command line. Any help on how to build a main.m file using xcrun would be useful. I have tried xcrun -sdk macosx clang MetalComputeBasic/main.m but it doesn't work.
Jun ’23
How to sample a mesh in a Metal ray-tracing acceleration structure
I am working through the "Accelerating ray tracing using Metal" sample. The area light has its own struct in this sample code, but I want to sample rays directly from the LightMesh. Can I get the instances and geometry of lightMesh without using a resources buffer? It seems the geometries are already loaded on the GPU, because Metal 3 is able to do the intersection test. However, I can only get primitive_data during the intersection, and cannot get this information when I try to do the sampling. Thanks a lot!
Jun ’23
Bindless/GPU-Driven approach with dynamic scenes?
I have been experimenting with different rendering approaches in Metal and am hitting a wall when it comes to reconciling "bindless" or GPU-driven approaches* with a dynamic scene where meshes can be added, removed, and changed. All the examples I have found of such approaches use fixed scenes, where all the data is fixed before the first draw call into something like a MeshBuffer that holds all scene geometry in the form of Mesh objects (for instance). While I can assume that recreating a MeshBuffer from scratch each frame would be possible but completely undesirable, and that there may be some clever tricks with pointers to update a MeshBuffer as needed, I would like to know if there is an established or optimal solution to this problem, or if these approaches are simply incompatible with dynamic geometry. Any example projects that do what I am asking that I may have missed would be appreciated, too. * I know these are not the same, but they seem to share some common characteristics, namely providing your entire geometry to the GPU at once. Looping over an array of meshes and calling drawIndexedPrimitives from the CPU does not pose any such obstacles, but also precludes some of the benefits of offloading work to the GPU, or having access to all geometry on the GPU for things like path tracing.
Jun ’23
Failed assertion `destination kernel width and filter kernel width mismatch'
Hi, I am training an adversarial autoencoder using PyTorch 2.0.0 on an Apple M2 (Ventura 13.1), with conda 23.1.0 as the package manager. I encountered this error:

    /AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/ failed assertion `destination kernel width and filter kernel width mismatch'
    /Users/vk/miniconda3/envs/betavae/lib/python3.10/multiprocessing/ UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

To my knowledge, the code fails at self.manual_backward(loss["g_loss"]) in this block:

    g_opt.zero_grad()
    self.manual_backward(loss["g_loss"])
    g_opt.step()

The same code runs without problems on a Linux distribution. Any thoughts on how to fix it are highly appreciated!
Jul ’23
Can I run CatBoost/XGBoost on my GPU(s) on my Mac?
I'm interested in using CatBoost and XGBoost for some machine learning projects on my Mac, and I was wondering if it's possible to run these algorithms on my GPU(s) to speed up training times. I have a Mac with two GPUs, an AMD Radeon Pro 5600M and an Intel UHD Graphics 630, and I'm running macOS Ventura 13.2.1. I've read that both CatBoost and XGBoost support GPU acceleration, but I'm not sure if this is possible on my system. Can anyone point me in the right direction for getting started with GPU-accelerated CatBoost/XGBoost on macOS? Are there any specific drivers or tools I need to install, or any other considerations I should be aware of? Thank you.
Sep ’23