arScnView = ARSCNView(frame: CGRect.zero, options: nil)
arScnView.delegate = self
arScnView.automaticallyUpdatesLighting = true
arScnView.allowsCameraControl = true
addSubview(arScnView)
arSession = arScnView.session
arSession.delegate = self
config = ARWorldTrackingConfiguration()
config.sceneReconstruction = .meshWithClassification
config.environmentTexturing = .automatic
func session(_ session: ARSession, didAdd anchors: [ARAnchor])
{
anchors.forEach({ anchor in
if let meshAnchor = anchor as? ARMeshAnchor {
let node = meshAnchor.toSCNNode()
self.arScnView.scene.rootNode.addChildNode(node)
}
if let environmentProbeAnchor = anchor as? AREnvironmentProbeAnchor {
// Can I retrieve the texture map corresponding to ARMeshAnchor from Environment Probe Anchor?
// Or how can I retrieve the texture map corresponding to ARMeshAnchor?
}
})
}
How can I scan a 3D scene and save it as USDZ?
I want to achieve the following scenario?
Metal Performance Shaders
RSS for tagOptimize graphics and compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family using Metal Performance Shaders.
Posts under Metal Performance Shaders tag
19 Posts
Sort by:
Post
Replies
Boosts
Views
Activity
I'm making an app that reads a ProRes file, processes each frame through metal to resize and scale it, then outputs a new ProRes file. In the future the app will support other codecs but for now just ProRes. I'm reading the ProRes 422 buffers in the kCVPixelFormatType_422YpCbCr16 pixel format. This is what's recommended by Apple in this video https://developer.apple.com/wwdc20/10090?time=599.
When the MTLTexture is run through a metal performance shader, the colorspace seems to force RGB or is just not allowing yCbCr textures as the output is all green/purple. If you look at the render code, you will see there's a commented out block of code to just blit copy the outputTexture, if you perform the copy instead of the scaling through MPS, the output colorspace is fine. So it appears the issue is from Metal Performance Shaders.
Side note - I noticed that when using this format, it brings in the YpCbCr texture as a single plane. I thought it's preferred to handle this as two separate planes? That said, if I have two separate planes, that makes my app more complicated as I would need to scale both planes or merge it to RGB. But I'm going for the most performance possible.
A sample project can be found here: https://www.dropbox.com/scl/fo/jsfwh9euc2ns2o3bbmyhn/AIomDYRhxCPVaWw9XH-qaN0?rlkey=sp8g0sb86af1u44p3xy9qa3b9&dl=0
Inside the supporting files, there is a test movie. For ease, I would move this to somewhere easily accessible (i.e Desktop).
Load and run the example project.
Click 'Select Video'
Select that video you placed on your desktop
It will now output a new video next to the selected one, named "Output.mov"
The new video should just be scaled at 50%, but the colorspace is all wrong.
Below is a photo of before and after the metal performance shader.
Here is the test code run in a macOS app (MacOS 15 Beta3).
If the excutable path does not contain Chinese character, every thing go as We expect. Otherwise(simply place excutable in a Chinese named directory) , the MTLLibrary We made by newLibraryWithSource: function contains no functions, We just got logs:
"Library contains the following functions: {}"
"Function 'squareKernel' not found."
Note: macOS 14 works fine
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
if (!device) {
NSLog(@"not support Metal.");
}
NSString *shaderSource = @
"#include <metal_stdlib>\n"
"using namespace metal;\n"
"kernel void squareKernel(device float* data [[buffer(0)]], uint gid [[thread_position_in_grid]]) {\n"
" data[gid] *= data[gid];\n"
"}";
MTLCompileOptions *options = [[MTLCompileOptions alloc] init];
options.languageVersion = MTLLanguageVersion2_0;
NSError *error = nil;
id<MTLLibrary> library = [device newLibraryWithSource:shaderSource options:options error:&error];
if (error) {
NSLog(@"New MTLLibrary error: %@", error);
}
NSArray<NSString *> *functionNames = [library functionNames];
NSLog(@"Library contains the following functions: %@", functionNames);
id<MTLFunction> computeShaderFunction = [library newFunctionWithName:@"squareKernel"];
if (computeShaderFunction) {
NSLog(@"Found function 'squareKernel'.");
NSError *pipelineError = nil;
id<MTLComputePipelineState> pipelineState = [device newComputePipelineStateWithFunction:computeShaderFunction error:&pipelineError];
if (pipelineError) {
NSLog(@"Create pipeline state error: %@", pipelineError);
}
NSLog(@"Create pipeline state succeed!");
} else {
NSLog(@"Function 'squareKernel' not found.");
}
Unity 2022.3.33f1
For some reason modifying MeshRenderer material shader SetVectorArray doesn't work on IOS, but it works on android and windows builds!!
I was working on Fog Of War, where I used SimpleFOW by Revision3 it's very simple FOW shader where it manipulates the alpha based on UVs vertices.
This is the FogOfWarShaderControl.cs script
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
namespace SimpleFOW
{
[RequireComponent(typeof(MeshRenderer))]
public class FogOfWarShaderControl : MonoBehaviour
{
public static FogOfWarShaderControl Instance { get; private set; }
[Header("Maximum amount of revealing points")]
[SerializeField] private uint maximumPoints = 512;
[Header("Game Camera")]
[SerializeField] private Camera mainCamera;
private List<Vector4> points = new List<Vector4>();
private Vector2 meshSize, meshExtents;
private Vector4[] sendBuffer;
private MeshRenderer meshRenderer;
private void Awake()
{
Instance = this;
Init();
}
// Initialize required variables
public void Init()
{
meshRenderer = GetComponent<MeshRenderer>();
meshExtents = meshRenderer.bounds.extents;
meshSize = meshRenderer.bounds.size;
points = new List<Vector4>();
sendBuffer = new Vector4[maximumPoints];
}
// Transform world point to UV coordinate of FOW mesh
public Vector2 WorldPointToMeshUV(Vector2 wp)
{
Vector2 toRet = Vector2.zero;
toRet.x = (transform.position.x - wp.x + meshExtents.x) / meshSize.x;
toRet.y = (transform.position.y - wp.y + meshExtents.y) / meshSize.y;
return toRet;
}
// Show or hide FOW
public void SetEnabled(bool on)
{
meshRenderer.enabled = on;
}
// Add revealing point to FOW renderer if amount of points is lower than MAX_POINTS
public void AddPoint(Vector2 worldPoint)
{
if (points.Count < maximumPoints)
{
points.Add(WorldPointToMeshUV(worldPoint));
}
}
// Remove FOW revealing point
public void RemovePoint(Vector2 worldPoint)
{
if (worldPoint == new Vector2(0, 0))
{
return;
}
if (points.Contains(WorldPointToMeshUV(worldPoint)))
{
points.Remove(WorldPointToMeshUV(worldPoint));
}
}
// Send any change to revealing point list to shader for rendering
public void SendPoints()
{
points.ToArray().CopyTo(sendBuffer, 0);
meshRenderer.material.SetVectorArray("_PointArray", sendBuffer);
meshRenderer.material.SetInt("_PointCount", points.Count);
}
// Send new range value to shader
public void SendRange(float range)
{
meshRenderer.material.SetFloat("_RadarRange", range);
}
// Send new scale value to shader
public void SendScale(float scale)
{
meshRenderer.material.SetFloat("_Scale", scale);
}
}
}
And this is the FogOfWar.shader
Shader "Revision3/FogOfWar"
{
Properties
{
_MainTex ("Texture", 2D) = "black" {}
_PointCount("Point count", Range(0,512)) = 0
_Scale("Scale", Float) = 1.0
_RadarRange("Range", Float) = .5
_MaxAlpha("Maximum Alpha", Float) = 1.0
}
SubShader
{
Tags { "RenderType"="Transparent" "Queue"="Transparent" }
LOD 100
ZWrite Off
Blend SrcAlpha OneMinusSrcAlpha
Pass
{
CGPROGRAM
#pragma vertex vert
#pragma fragment frag
// make fog work
#pragma multi_compile_fog
#include "UnityCG.cginc"
struct appdata
{
float4 vertex : POSITION;
float2 uv : TEXCOORD0;
};
struct v2f
{
float2 uv : TEXCOORD0;
UNITY_FOG_COORDS(1)
float4 vertex : SV_POSITION;
};
sampler2D _MainTex;
float4 _MainTex_ST;
float _RadarRange;
uint _PointCount;
float _Scale;
float _MaxAlpha;
float2 _PointArray[512];
v2f vert (appdata v)
{
v2f o;
o.vertex = UnityObjectToClipPos(v.vertex);
o.uv = TRANSFORM_TEX(v.uv, _MainTex);
return o;
}
float getDistance(float2 pa[512], float2 uv) {
float cdist = 99999.0;
for (uint i = 0; i < _PointCount; i++) {
cdist = min(cdist, distance(pa[i]*_Scale, uv));
}
return cdist;
}
fixed4 frag (v2f i) : SV_Target
{
// sample the texture
fixed4 col = tex2D(_MainTex, i.uv);
i.uv *= _Scale;
if (_PointCount > 0)
col.w = min(_MaxAlpha, max(0.0f, getDistance(_PointArray, i.uv) - _RadarRange));
else
col.w = _MaxAlpha;
return col;
}
ENDCG
}
}
}
Now I create a gameobject called FogOfWar as follows
And then in Unit.cs script and Building.cs script I add the following logic
private Vector3 lastPos;
private void Update()
{
if (lastPos != transform.position)
{
FogOfWarShaderControl.Instance.RemovePoint(lastPos);
FogOfWarShaderControl.Instance.AddPoint(transform.position);
lastPos = transform.position;
FogOfWarShaderControl.Instance.SendPoints();
}
}
Now this gives me the effect of FOW as follows on IOS
where the result should be as follows on other devices
I don't know what causes this to happen only on IOS devices.
The logic works fine on android/windows/Linux/editor but not IOS devices.
So why metal API doesn't support shader set vector array?!
What best to pass to the options parameter of:
MTLDevice.makeBuffer(length:options:)
MTLDevice.makeBuffer(bytes:length:options:)
MTLDevice.makeBuffer(bytesNoCopy:length:options:deallocator:)
Basically I'm looking for a "plain English" explanation of MTLResourceOptions doc page.
I'm brand new to Metal. I've googled, but can't get the right answer to come up. (Thanks, unhelpful ChatGPT generated answers polluting everything, but I digress...)
Ultimately, I'm trying to figure out how to use Metal to render 3D DICOM data on iOS specifically. If you're not familiar with DICOM, let's just say I've got a whole stack of CT image slices. Or to get really simple, I've got a cube of voxel values with differing values at each voxel coordinate.
Where do I even start in Metal to render something like this?
(I was trying to get the VTK toolkit compiled for iOS, which uses OpenGL, but that appears to be a dead end. And besides, Metal is supposed to be so much better.)
Thanks for any tips/leads/suggestions/general pointers.
Hello,
I have been following the excellent/informative "Metal for Machine Learning" from WWDC19 to learn how to do on device training (I have a specific use case for this) and it is all working really well using the MPSNNGraph.
However, I would like to call my own metal compute/render function/pipeline to transform the inference result before calculating the loss, does anyone know if this possible and what would this look like in code?
Please see my current code below, at the comment I need to call an intermediate compute/render function to transform the inference result image before passing to the MPSNNForwardLossNode.
let rgbImageNode = MPSNNImageNode(handle: nil)
let inferGraph = makeInferenceGraph()
let reshape = MPSNNReshapeNode(source: inferGraph.resultImage, resultWidth: 64, resultHeight: 64, resultFeatureChannels: 4)
//Need to call render or compute pipeline to post process in the inference result image
let rgbLoss = MPSNNForwardLossNode(source:reshape.resultImage, labels:rgbImageNode, lossDescriptor:lossDescriptor)
let initGrad = MPSNNInitialGradientNode(source:rgbLoss.resultImage)
let gradNodes = initGrad.trainingGraph(withSourceGradient:nil, nodeHandler:nil)
guard let trainGraph = MPSNNGraph(device: device, resultImage: gradNodes![0].resultImage, resultImageIsNeeded: true) else{
fatalError("Unable to get training graph.")
}
Thanks
Hi there, I've met a problem that in my working project build settings. There is no Metal Compiler Build Setting, but it works well in my demo project.
And I'm certain I've move the shader files(.metal) into the bundle.
How could I resolve this problem?
XCode Version 15.0
13.6.1 (22G313)
it appears that the Metal Debugging interface does not support this method, at least the function hashing algorithm does not have a pattern for it in the symbol dictionary as presented. Where do we get updated C- libraries and functions that sync with the things that are presented in the Demo Kits and Samples that Apple puts in the user domain? Why does this stuff get out into the wild insufficiently tested? It seems thet the demo kits made available to users should be included in the test domain used to verify new code releases. I came from a development environment where the 6 month release cycle involved automated execution of the test suite before it went beta or anywhere else.
Hello,
I'm currently facing an issue while working with MetalPerformanceShaders in testing a Python project.
Code:
https://github.com/Thinklab- SJTU/Crossformer
Error Description:
/AppleInternal/Library/BuildRoots/4e1473ee-9f66-11ee-8daf-cedaeb4cabe2/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:126: failed assertion `[MPSNDArrayDescriptor sliceDimension:withSubrange:] error: the range subRange.start + subRange.length does not fit in dimension[2] (15)'
I've tried updating macOS to the latest version. Also, I've attempted running the code with MPS_NO_DEVICE_CHECK=1 to bypass device checks. But errors happened again.
I'm seeking insights or solutions to this problem and If anyone has encountered a similar issue or has suggestions on how to troubleshoot and resolve this assertion failure, I would greatly appreciate your help.
Project Details:
Language: Python
Environment: PyCharm CE
Relevant Technologies: MetalPerformanceShaders, MPSNDArray
Thank you for your time and assistance!
Hi,
I am using xcode frame capture to profile my app's shader. And I got some question about the shader per line profile statistics. Please see the two screen shot first, it is my compute shader.
Begin:
End:
The first image is the head of the shader. The profile show's that the shader entry function takes 72.44% of the time.
And at the end of the shader, the profile shows that the right brace '}' takes 60.45%.
Here is my question:
How to properly understand the profile data? What's the real performance data of this shader?
Why the shader entry function does not take 100% of the time?
Can someone help me to answer the question?
Thanks!
Boson
I Instrument's CPU Profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead related to MPS matrix multiplication (GEMM) is in a call to getenv(). Please see attached screenshot.
It seems unnecessary to perform this same check over and over, as whatever hack that needs this should be able to perform the getenv() only once and cache the result for future use.
I'm using instrument to profile my metal app, found there are more than one display devices shows on my view, I'm wondering what these display devices mean, thx .
device: iphone 11 os: ios 15.6
I have a metal applicaton on IOS where a series of computer shaders are encoded, then disptached and comiited together at last. When I capture a GPU trace of my application, however I noticed there are these gaps between each computer shader invocation. And these gaps seem to take up a big part of the GPU time.
I'm wondering what are these gaps and what are causing them. Since all compute dispatch commands are commiited toghether at once, these gaps shouldn't be synchronizations between cpu and GPU
PS: In my application, later compute commands mostly depend on former ones and would use the result buffer from former invocations. But as shown in the picture, bandwith and read/write buffer limiter are not high as far as I'm concerned.
I'm experimenting with Vision OS and Apple Vision Pro using the Xcode Beta. I'm using Xcode 15.1 Beta and visionOS 1.0 beta 4.
I'm currently doing a project where I draw a polygon using a mesh generated from MeshDescriptor/MeshResource and present it in an ImmersiveView.
I want to change the color of parts, i.e. not all of, my 3D rendered polygon and I want to do it dynamically. For example when the user presses a button.
I have gotten into Shaders and the CustomMaterial from RealityKit, only to find out that CustomMaterial is not supported on Vision OS!
Does anyone know how I can color portions/parts of a mesh that is generated from MeshDescriptor and MeshResource?
I only get this error when using the JAX Metal device (CPU is fine). It seems to be a problem whenever I want to modify values of an array in-place using at and set.
note: see current operation:
%2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({
^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>):
"mhlo.return"(%arg5) : (tensor<f32>) -> ()
}) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>
blocks = blocks.at[i].set(
...
Hello - I have been struggling to find a solution online and I hope you can help me timely. I have installed the latest tesnorflow and tensorflow-metal, I even went to install the ternsorflow-nightly. My app generates the following as a result of my fit function on a CNN model with 8 layers.
2023-09-29 22:21:06.115768: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2023-09-29 22:21:06.115846: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-09-29 22:21:06.116048: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-09-29 22:21:06.116264: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-29 22:21:06.116483: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
Most importantly, the learning process is very slow and I'd like to take advantage of al the new features of the latest versions. What can I do?
Hello,
I've been working on an app that involves training a neural network model on the iPhone. I've been using the Metal Performance Shaders Graph (MPS Graph) for this purpose. In the training process the loss becomes Nan on iOS17 (21A329).
I noticed that the official sample code for Training a Neural Network using MPS Graph (link) works perfectly fine on Xcode 14.3.1 with iOS 16.6.1. However, when I run the same code on Xcode 15.0 beta 8 with iOS 17.0 (21A329), the training process produces a NaN loss in function updateProgressCubeAndLoss. The official sample code and my own app exhibit the same issue.
Has anyone else experienced this issue? Is this a known bug, or is there something specific that needs to be adjusted for iOS 17?
Any guidance would be greatly appreciated.
Thank you!
I'm interested in using CatBoost and XGBoost for some machine learning projects on my Mac, and I was wondering if it's possible to run these algorithms on my GPU(s) to speed up training times.
I have a Mac with an AMD Radeon Pro 5600M and an Intel UHD Graphics 630 GPUs, and I'm running macOS Ventura 13.2.1. I've read that both CatBoost and XGBoost support GPU acceleration, but I'm not sure if this is possible on my system.
Can anyone point me in the right direction for getting started with GPU-accelerated CatBoost/XGBoost on macOS? Are there any specific drivers or tools I need to install, or any other considerations I should be aware of?
Thank you.