Metal Performance Shaders


Use Metal Performance Shaders to optimize graphics and compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family.

Metal Performance Shaders Documentation

Posts under Metal Performance Shaders tag

37 Posts
Post marked as solved
1 Reply
456 Views
Hello. I'm working with Metal on Apple Vision Pro, and I assumed I could use mesh shaders to work with meshlets. But when creating the render pipeline, I get the following error message: "device does not support mesh shaders". The test is on the simulator, and my question is: will Apple Vision Pro support mesh shaders on physical devices? Thanks.
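A minimal Swift sketch of a runtime guard, assuming mesh shaders are gated on a recent Apple GPU family (the exact family requirement is an assumption here; the Metal feature set tables are authoritative). The simulator reports no support, which matches the error above:

import Metal

guard let device = MTLCreateSystemDefaultDevice() else { fatalError("No Metal device") }

if device.supportsFamily(.apple7) {
    // Likely safe to build a mesh-shader render pipeline on this device.
    print("Mesh shaders should be available")
} else {
    // e.g. the visionOS simulator, which rejects mesh render pipelines
    // with "device does not support mesh shaders".
    print("Fall back to a traditional vertex/fragment pipeline")
}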
Posted by
Post not yet marked as solved
0 Replies
435 Views
Hi, I am trying to extend the PyTorch library. I would like to add an MPS-native Cholesky decomposition. I finally got it working (mostly), but I am struggling to implement the status codes. What I did:

// init status
id<MTLBuffer> status = [device newBufferWithLength:sizeof(int) options:MTLResourceStorageModeShared];
if (status) {
    int* statusPtr = (int*)[status contents];
    *statusPtr = 42; // Set the initial content to 42
    NSLog(@"Status Value: %d", *statusPtr);
} else {
    NSLog(@"Failed to allocate status buffer");
}
...
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> commandBuffer) {
    // Your completion code here
    int* statusPtr = (int*)[status contents];
    int statusVal = *statusPtr;
    NSLog(@"Status Value: %d", statusVal);
    // Update the 'info' tensor here based on statusVal
    // ...
}];
for (const auto i : c10::irange(batchSize)) {
    ...
    [filter encodeToCommandBuffer:commandBuffer
                     sourceMatrix:sourceMatrix
                     resultMatrix:solutionMatrix
                           status:status];
}

(full code here: https://github.com/pytorch/pytorch/blob/ab6a550f35be0fdbb58b06ff8bfda1ab0cc236d0/aten/src/ATen/native/mps/operations/LinearAlgebra.mm)

But this code prints the following when given a non-positive-definite tensor:

2023-09-02 19:06:24.167 python[11777:2982717] Status Value: 42
2023-09-02 19:06:24.182 python[11777:2982778] Status Value: 0

initial tensor:
tensor([[-0.0516, 0.7090, 0.9474],
        [ 0.8520, 0.3647, -1.5575],
        [ 0.5346, -0.3149, 1.9950]], device='mps:0')
L:
tensor([[-0.0516, 0.0000, 0.0000],
        [ 0.8520, -0.3612, 0.0000],
        [ 0.5346, -0.3149, 1.2689]], device='mps:0')

What am I doing wrong? Why do I get a 0 (success) status even though the matrix is not positive definite? Thank you in advance!
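For comparison, a minimal Swift sketch of the same status-buffer pattern around MPSMatrixDecompositionCholesky, reading the status only after the command buffer completes. The 3x3 size, the sample values, and the status word width matching MPSMatrixDecompositionStatus's raw value are assumptions for illustration, not a fix for the PyTorch code above:

import Metal
import MetalPerformanceShaders

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let n = 3

// Describe two n x n float32 matrices (source and result).
let rowBytes = n * MemoryLayout<Float>.stride
let desc = MPSMatrixDescriptor(rows: n, columns: n, rowBytes: rowBytes, dataType: .float32)
let source = MPSMatrix(device: device, descriptor: desc)
let result = MPSMatrix(device: device, descriptor: desc)

// Fill the source with a known positive-definite matrix (placeholder data).
let values: [Float] = [  4, 12, -16,
                        12, 37, -43,
                       -16, -43, 98]
source.data.contents().copyMemory(from: values, byteCount: values.count * MemoryLayout<Float>.stride)

// One status word in shared storage so the CPU can read it back.
let status = device.makeBuffer(length: MemoryLayout<Int>.size, options: .storageModeShared)!

let cholesky = MPSMatrixDecompositionCholesky(device: device, lower: true, order: n)
let commandBuffer = queue.makeCommandBuffer()!
cholesky.encode(commandBuffer: commandBuffer,
                sourceMatrix: source,
                resultMatrix: result,
                status: status)

commandBuffer.addCompletedHandler { _ in
    // Only read the status after the GPU has finished the decomposition.
    let raw = status.contents().load(as: Int.self)
    if let decoded = MPSMatrixDecompositionStatus(rawValue: raw) {
        print("Cholesky status: \(decoded)")   // e.g. success or nonPositiveDefinite
    } else {
        print("Cholesky status raw value: \(raw)")
    }
}
commandBuffer.commit()
commandBuffer.waitUntilCompleted()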
Posted by
Post marked as solved
3 Replies
1k Views
Hello everyone! I have a small concern about one little thing when it comes to programming in Metal. There are some models that I wish to use, along with animations and skins on them, whose file extension is glTF. glTF has been used in a number of projects such as Unity, Unreal Engine, Godot, and Blender. I was wondering whether Metal supports this file format or not. Does anyone here know the answer?
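Metal itself only consumes buffers and textures; it has no notion of model file formats, so glTF loading happens a layer above it (Model I/O, or a third-party loader that feeds Metal/RealityKit). A quick, hedged way to see what Model I/O claims it can import on a given OS version:

import ModelIO

// Prints whether Model I/O reports it can import each extension on this system.
// A false result for "gltf" just means a separate loader is needed before Metal.
for ext in ["gltf", "glb", "usdz", "obj"] {
    print(ext, MDLAsset.canImportFileExtension(ext))
}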
Posted by
Post not yet marked as solved
1 Reply
575 Views
Hello, I've been working on an app that involves training a neural network model on the iPhone. I've been using the Metal Performance Shaders Graph (MPS Graph) for this purpose. In the training process the loss becomes Nan on iOS17 (21A329). I noticed that the official sample code for Training a Neural Network using MPS Graph (link) works perfectly fine on Xcode 14.3.1 with iOS 16.6.1. However, when I run the same code on Xcode 15.0 beta 8 with iOS 17.0 (21A329), the training process produces a NaN loss in function updateProgressCubeAndLoss. The official sample code and my own app exhibit the same issue. Has anyone else experienced this issue? Is this a known bug, or is there something specific that needs to be adjusted for iOS 17? Any guidance would be greatly appreciated. Thank you!
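Not a fix, but a small hedged helper for narrowing down where things go wrong: read the scalar loss back from its MPSGraphTensorData each step and stop at the first non-finite value, so the failing step and its inputs can be inspected. The scalar-loss shape and the lossData name are assumptions:

import MetalPerformanceShadersGraph

// Assumes the loss tensor is a single float32 scalar; adjust if yours is shaped.
func checkLoss(_ lossData: MPSGraphTensorData, step: Int) {
    var loss: Float = 0
    lossData.mpsndarray().readBytes(&loss, strideBytes: nil)
    if !loss.isFinite {
        print("Loss became non-finite at step \(step): \(loss)")
        // Dump the batch / learning rate here to compare iOS 16 vs. iOS 17 behavior.
    }
}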
Posted by
Post not yet marked as solved
7 Replies
1.7k Views
Hello - I have been struggling to find a solution online and I hope you can help me in a timely manner. I have installed the latest tensorflow and tensorflow-metal, and I even went as far as installing tensorflow-nightly. My app generates the following as a result of my fit function on a CNN model with 8 layers.

2023-09-29 22:21:06.115768: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2023-09-29 22:21:06.115846: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-09-29 22:21:06.116048: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-09-29 22:21:06.116264: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-29 22:21:06.116483: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

Most importantly, the learning process is very slow and I'd like to take advantage of all the new features of the latest versions. What can I do?
Posted by
Post not yet marked as solved
6 Replies
897 Views
I only get this error when using the JAX Metal device (CPU is fine). It seems to be a problem whenever I want to modify values of an array in-place using at and set.

note: see current operation:
%2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({
^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>):
  "mhlo.return"(%arg5) : (tensor<f32>) -> ()
}) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>

blocks = blocks.at[i].set( ...
Posted by
Post not yet marked as solved
2 Replies
739 Views
I'm experimenting with visionOS and Apple Vision Pro using the Xcode beta. I'm using Xcode 15.1 beta and visionOS 1.0 beta 4. I'm currently doing a project where I draw a polygon using a mesh generated from MeshDescriptor/MeshResource and present it in an ImmersiveView. I want to change the color of parts (i.e. not all) of my 3D rendered polygon, and I want to do it dynamically, for example when the user presses a button. I have gotten into shaders and CustomMaterial from RealityKit, only to find out that CustomMaterial is not supported on visionOS! Does anyone know how I can color portions/parts of a mesh that is generated from MeshDescriptor and MeshResource?
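One approach that does not need CustomMaterial (a sketch under assumptions, not the only way): build the mesh from one MeshDescriptor per colorable region, give each region its own material index, and later swap entries in the model's materials array. The placeholder triangles below stand in for your real polygon data:

import RealityKit
import UIKit

// Two descriptors = two independently colorable parts of one MeshResource.
func makeTwoPartEntity() throws -> ModelEntity {
    let p1: [SIMD3<Float>] = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0]]
    var part1 = MeshDescriptor(name: "part1")
    part1.positions = MeshBuffers.Positions(p1)
    part1.primitives = .triangles([0, 1, 2])
    part1.materials = .allFaces(0)                 // uses materials[0]

    let p2: [SIMD3<Float>] = [[0.2, 0, 0], [0.3, 0, 0], [0.2, 0.1, 0]]
    var part2 = MeshDescriptor(name: "part2")
    part2.positions = MeshBuffers.Positions(p2)
    part2.primitives = .triangles([0, 1, 2])
    part2.materials = .allFaces(1)                 // uses materials[1]

    let mesh = try MeshResource.generate(from: [part1, part2])
    let materials: [Material] = [
        SimpleMaterial(color: .red, isMetallic: false),
        SimpleMaterial(color: .blue, isMetallic: false)
    ]
    return ModelEntity(mesh: mesh, materials: materials)
}

// Later, e.g. in a button handler, recolor only the second region:
// entity.model?.materials[1] = SimpleMaterial(color: .green, isMetallic: false)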
Posted by
Post not yet marked as solved
2 Replies
506 Views
device: iPhone 11
os: iOS 15.6
I have a Metal application on iOS where a series of compute shaders are encoded, then dispatched and committed together at the end. When I capture a GPU trace of my application, however, I notice there are gaps between each compute shader invocation, and these gaps seem to take up a big part of the GPU time. I'm wondering what these gaps are and what is causing them. Since all compute dispatch commands are committed together at once, these gaps shouldn't be synchronizations between the CPU and GPU. PS: In my application, later compute commands mostly depend on earlier ones and use the result buffers from earlier invocations. But as shown in the picture, the bandwidth and read/write buffer limiters are not high as far as I can tell.
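A hedged sketch of one thing worth ruling out: if each dispatch is encoded in its own compute encoder, every encoder boundary becomes a separate GPU pass and can show up as a gap in the trace. Keeping dependent dispatches inside a single encoder avoids those boundaries; the names below are placeholders for your pipelines and buffers:

import Metal

// All dependent dispatches share one compute encoder and therefore one GPU pass.
func encodeChain(queue: MTLCommandQueue,
                 pipelines: [MTLComputePipelineState],
                 buffer: MTLBuffer,
                 threadgroups: MTLSize,
                 threadsPerGroup: MTLSize) {
    let commandBuffer = queue.makeCommandBuffer()!
    let encoder = commandBuffer.makeComputeCommandEncoder()!
    for pipeline in pipelines {
        encoder.setComputePipelineState(pipeline)
        encoder.setBuffer(buffer, offset: 0, index: 0)   // later passes read what earlier ones wrote
        encoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerGroup)
    }
    encoder.endEncoding()
    commandBuffer.commit()
}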
Posted by
Post not yet marked as solved
2 Replies
615 Views
In Instruments' CPU Profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead related to MPS matrix multiplication (GEMM) is in a call to getenv(). Please see the attached screenshot. It seems unnecessary to perform this same check over and over; whatever hack needs this should be able to perform the getenv() only once and cache the result for future use.
Posted by
Post not yet marked as solved
0 Replies
293 Views
Hi, I am using Xcode frame capture to profile my app's shaders, and I have some questions about the per-line shader profile statistics. Please see the two screenshots first; this is my compute shader. Begin: End: The first image is the head of the shader. The profile shows that the shader entry function takes 72.44% of the time, and at the end of the shader, the profile shows that the closing brace '}' takes 60.45%. Here are my questions: How do I properly interpret the profile data? What is the real performance data of this shader? Why doesn't the shader entry function take 100% of the time? Can someone help me answer these questions? Thanks! Boson
Posted by
Post not yet marked as solved
0 Replies
358 Views
Hello, I'm currently facing an issue while working with MetalPerformanceShaders while testing a Python project.
Code: https://github.com/Thinklab-SJTU/Crossformer
Error Description:
/AppleInternal/Library/BuildRoots/4e1473ee-9f66-11ee-8daf-cedaeb4cabe2/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:126: failed assertion `[MPSNDArrayDescriptor sliceDimension:withSubrange:] error: the range subRange.start + subRange.length does not fit in dimension[2] (15)'
I've tried updating macOS to the latest version. I've also attempted running the code with MPS_NO_DEVICE_CHECK=1 to bypass device checks, but the error happened again. I'm seeking insights or solutions to this problem; if anyone has encountered a similar issue or has suggestions on how to troubleshoot and resolve this assertion failure, I would greatly appreciate your help.
Project Details:
Language: Python
Environment: PyCharm CE
Relevant Technologies: MetalPerformanceShaders, MPSNDArray
Thank you for your time and assistance!
Posted by
Post not yet marked as solved
0 Replies
333 Views
It appears that the Metal debugging interface does not support this method; at least, the function hashing algorithm does not have a pattern for it in the symbol dictionary as presented. Where do we get updated C libraries and functions that sync with the things presented in the demo kits and samples that Apple puts in the user domain? Why does this stuff get out into the wild insufficiently tested? It seems that the demo kits made available to users should be included in the test suite used to verify new code releases. I came from a development environment where the six-month release cycle involved automated execution of the test suite before anything went to beta or anywhere else.
Posted by
Post not yet marked as solved
0 Replies
349 Views
Hello, I have been following the excellent/informative "Metal for Machine Learning" from WWDC19 to learn how to do on-device training (I have a specific use case for this) and it is all working really well using the MPSNNGraph. However, I would like to call my own Metal compute/render function/pipeline to transform the inference result before calculating the loss. Does anyone know if this is possible and what it would look like in code? Please see my current code below; at the comment I need to call an intermediate compute/render function to transform the inference result image before passing it to the MPSNNForwardLossNode.

let rgbImageNode = MPSNNImageNode(handle: nil)
let inferGraph = makeInferenceGraph()
let reshape = MPSNNReshapeNode(source: inferGraph.resultImage, resultWidth: 64, resultHeight: 64, resultFeatureChannels: 4)
//Need to call render or compute pipeline to post process in the inference result image
let rgbLoss = MPSNNForwardLossNode(source: reshape.resultImage, labels: rgbImageNode, lossDescriptor: lossDescriptor)
let initGrad = MPSNNInitialGradientNode(source: rgbLoss.resultImage)
let gradNodes = initGrad.trainingGraph(withSourceGradient: nil, nodeHandler: nil)
guard let trainGraph = MPSNNGraph(device: device, resultImage: gradNodes![0].resultImage, resultImageIsNeeded: true) else {
    fatalError("Unable to get training graph.")
}

Thanks
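One possible way to do this (a sketch under assumptions, not necessarily the intended MPSNNGraph mechanism): split the work into an inference graph and a training graph, and run your own compute pipeline on the intermediate MPSImage between the two, all on one command buffer. inferenceGraph, trainingGraph, postProcessPipeline, and the image arguments below are placeholders for your own objects:

import Metal
import MetalPerformanceShaders

// Assumes the inference graph was built with resultImageIsNeeded = true, so its
// result is a regular MPSImage the custom kernel can read.
func encodeTrainingStep(commandBuffer: MTLCommandBuffer,
                        inferenceGraph: MPSNNGraph,
                        trainingGraph: MPSNNGraph,
                        postProcessPipeline: MTLComputePipelineState,
                        inputImage: MPSImage,
                        labelsImage: MPSImage,
                        transformedImage: MPSImage) {
    // 1. Run inference; keep its result image on the GPU.
    guard let inferred = inferenceGraph.encode(to: commandBuffer,
                                               sourceImages: [inputImage]) else { return }

    // 2. Transform the inference result with your own compute kernel.
    let encoder = commandBuffer.makeComputeCommandEncoder()!
    encoder.setComputePipelineState(postProcessPipeline)
    encoder.setTexture(inferred.texture, index: 0)
    encoder.setTexture(transformedImage.texture, index: 1)
    let tg = MTLSize(width: 8, height: 8, depth: 1)
    let groups = MTLSize(width: (transformedImage.width + 7) / 8,
                         height: (transformedImage.height + 7) / 8,
                         depth: 1)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: tg)
    encoder.endEncoding()

    // 3. Feed the transformed image plus labels into the loss/training graph.
    //    The source-image order must match that graph's input nodes.
    _ = trainingGraph.encode(to: commandBuffer,
                             sourceImages: [transformedImage, labelsImage])
}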
Posted by
Post not yet marked as solved
1 Reply
243 Views
I'm brand new to Metal. I've googled, but can't get the right answer to come up. (Thanks, unhelpful ChatGPT generated answers polluting everything, but I digress...) Ultimately, I'm trying to figure out how to use Metal to render 3D DICOM data on iOS specifically. If you're not familiar with DICOM, let's just say I've got a whole stack of CT image slices. Or to get really simple, I've got a cube of voxel values with differing values at each voxel coordinate. Where do I even start in Metal to render something like this? (I was trying to get the VTK toolkit compiled for iOS, which uses OpenGL, but that appears to be a dead end. And besides, Metal is supposed to be so much better.) Thanks for any tips/leads/suggestions/general pointers.
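A common starting point (a sketch under assumptions, not specific to any DICOM library) is to load the voxel cube into a 3D texture and then ray-march it in a fragment or compute shader. The 16-bit unsigned format and the dimensions are placeholders; applying the DICOM rescale slope/intercept is up to you:

import Metal

// Upload a voxel cube (e.g. a stack of CT slices) into a 3D texture that a
// ray-marching shader can sample.
func makeVolumeTexture(device: MTLDevice,
                       voxels: [UInt16],
                       width: Int, height: Int, depth: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor()
    desc.textureType = .type3D
    desc.pixelFormat = .r16Unorm
    desc.width = width
    desc.height = height
    desc.depth = depth
    desc.usage = .shaderRead
    desc.storageMode = .shared

    guard let texture = device.makeTexture(descriptor: desc) else { return nil }
    let bytesPerRow = width * MemoryLayout<UInt16>.stride
    let bytesPerImage = bytesPerRow * height
    voxels.withUnsafeBytes { raw in
        texture.replace(region: MTLRegionMake3D(0, 0, 0, width, height, depth),
                        mipmapLevel: 0,
                        slice: 0,
                        withBytes: raw.baseAddress!,
                        bytesPerRow: bytesPerRow,
                        bytesPerImage: bytesPerImage)
    }
    return texture
}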
Posted by
Post not yet marked as solved
1 Reply
41 Views
What is best to pass to the options parameter of:
MTLDevice.makeBuffer(length:options:)
MTLDevice.makeBuffer(bytes:length:options:)
MTLDevice.makeBuffer(bytesNoCopy:length:options:deallocator:)
Basically I'm looking for a "plain English" explanation of the MTLResourceOptions doc page.
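Not official wording, but a hedged summary of the common choices, as code comments:

import Metal

let device = MTLCreateSystemDefaultDevice()!

// CPU and GPU both read/write the same memory. The usual default for buffers
// you fill from, or read back to, the CPU on iOS / Apple silicon.
let shared = device.makeBuffer(length: 4096, options: .storageModeShared)

// GPU-only memory. The CPU can never touch it; fill it via a blit from a
// shared staging buffer. Generally the fastest choice for GPU-only data.
let gpuOnly = device.makeBuffer(length: 4096, options: .storageModePrivate)

// Shared storage whose CPU writes bypass the CPU cache. Can help for buffers
// the CPU only writes sequentially and never reads back.
let writeCombined = device.makeBuffer(length: 4096,
                                      options: [.storageModeShared, .cpuCacheModeWriteCombined])

// On macOS with discrete GPUs there is also .storageModeManaged (separate CPU
// and GPU copies that you synchronize explicitly); it is unavailable on iOS.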
Posted by