Hi!
How to define and call an inline function in Metal? Or simple function that will return some value.
Case:
inline uint index4D(constant _4D& shape,
constant uint& n,
constant uint& c,
constant uint& h,
constant uint& w) {
return n * shape.C * shape.H * shape.W + c * shape.H * shape.W + h * shape.W + w;
}
When I call it in my kernel function I get No matching function for call error.
Thx in advance.
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Post
Replies
Boosts
Views
Activity
Hello,
We are experimenting with Metal to accelerate some peculiar numerical computation. Our workloads are relatively small, so the ability to avoid moving data to and from the GPU's memory is very appealing. However, we are observing higher overhead compared to CUDA, which negates the benefits of avoiding data transfer.
In our tests using an empty kernel, CUDA completes in 0.001 ms (Intel i7 10700K, RTX 3080), while Metal's waitUntilCompleted takes 0.12 ms (M2 Max). As we do not have prior experience with Metal, we are wondering if we are using the APIs just fine and this timing is expected, or if there is a way to reduce it.
Thank you in advance for any comment!
test-metal.cpp
Hello,
This exact question was already asked in this forum (8 years ago) but I can't find a definitive answer:
Does Metal allow using the same color texture as both an input and output (color attachment) of a fragment shader? Is the behavior defined somewhere?
I believe this results in undefined behavior under both DirectX and OpenGL, so I'd assume the same for Metal, but then why doesn't Metal warn me about this as it does on some many other "misconfigurations"? It also seems to work correctly in my case, as I found out by accident.
Would love to get a clarification!
Thanks ahead!
I'm experiencing a strange issue where I'm seeing black in a metal drawable where it should be a different color. When I capture the frame and inspect the returned value from the fragment function, it's correct, but the drawable isn't.
This screenshot hopefully illustrates the issue.
I've not found any references to similar issues. I saw something about some out of bounds or NaN values being dropped to 0 (which would be black), but the debugger doesn't indicate this is happening.
I am looking to implement CAMetalDisplayLink on a separate thread on a macOS application. I am basing my implementation on the following example project:
Achieving Smooth Frame Rates with Metal Display Link
This project allows you to configure whether a separate thread is used for rendering by setting RENDER_ON_MAIN_THREAD in GameConfig to 0. However, when I set it to use a separate thread nothing is rendered. Stepping through the code shows that a separate thread is created, but a CAMetalDisplayLinkUpdate is never received. Does anyone know why this does not work?
Hi everyone,
I encountered a very strange shader bug that seems related to Metal only (not OpenGL).
You can find the full description of the issue on the Babylon.js forums here: https://forum.babylonjs.com/t/strange-shader-related-issue-on-macos-with-safari-and-chrome-not-firefox/54289 (sorry, I couldn't post a clickable link here as this seems to be blocked here).
I have a workaround to fix the issue (as described in the link above), but this really looks like an issue in Metal itself.
Let me know if you need more details or explanations.
First I get this
ar_world_tracking_provider_query_device_anchor_at_timestamp <0x302b9c0a0>: The device_anchor can only be queried when the world tracking provider is running.
This seemed to all break with the auto-update to 2.0.1. Simulator runs the code fine.
I seem to see an infinite stall here
frameLayer.endUpdate()
// Pace frames by waiting for the optimal prediction time.
try await LayerRenderer.Clock().sleep(until: timing.optimalInputTime, tolerance: nil)
// Start submitting the updated frame.
frameLayer.startSubmission() <-
I've been working with ARKit and Metal on the Vision Pro, but I've encountered a slight flickering issue with the mesh rendered using Metal. The flickering tends to occur around the edges of objects or on pixels with high color contrast, and it becomes more noticeable as the distance increases. Is there any way to resolve this issue?
Is there anywhere that the format of error/warning/other? messages that are generated by the Metal compiler are documented?
Hey guys,
is it possible to implement mirror like reflections like in this project:
https://developer.apple.com/documentation/metal/metal_sample_code_library/rendering_reflections_in_real_time_using_ray_tracing
for visionOS? Or is the hardware not prepared for Metal Raytracing?
Thanks in advance
Given
I do not understand much at all about how to write shaders
I do not understand the math associated with page-curl effects
I am trying to:
implement a page-curl shader for use on SwiftUI views.
I've lifted a shader from HIROKI IKEUCHI that I believe they lifted from a non-metal shader resource online, and I'm trying to digest it.
One thing I want to do is to paint the "underside" of the view with a given color and maintain the transparency of rounded corners when they are flipped over.
So, if an underside pixel is "clear" then I want to sample the pixel at that position on the original layer instead of the "curl effect" pixel.
There are two comments in the shader below where I check the alpha, and underside flags, and paint the color red as a debug test.
The shader gives this result:
The outside of those rounded corners is appropriately red and the white border pixels are detected as "not-clear". But the "inner" portion of the border is... mistakingly red?
I don't get it. Any help would be appreciated. I feel tapped out and I don't have any IRL resources I can ask.
//
// PageCurl.metal
// ShaderDemo3
//
// Created by HIROKI IKEUCHI on 2023/10/17.
//
#include <metal_stdlib>
#include <SwiftUI/SwiftUI_Metal.h>
using namespace metal;
#define pi float(3.14159265359)
#define blue half4(0.0, 0.0, 1.0, 1.0)
#define red half4(1.0, 0.0, 0.0, 1.0)
#define radius float(0.4)
// そのピクセルの色を返す
[[ stitchable ]] half4 pageCurl
(
float2 _position,
SwiftUI::Layer layer,
float4 bounds,
float2 _clickedPoint,
float2 _mouseCursor
) {
half4 undersideColor = half4(0.5, 0.5, 1.0, 1.0);
float2 originalPosition = _position;
// y座標の補正
float2 position = float2(_position.x, bounds.w - _position.y);
float2 clickedPoint = float2(_clickedPoint.x, bounds.w - _clickedPoint.y);
float2 mouseCursor = float2(_mouseCursor.x, bounds.w - _mouseCursor.y);
float aspect = bounds.z / bounds.w;
float2 uv = position * float2(aspect, 1.) / bounds.zw;
float2 mouse = mouseCursor.xy * float2(aspect, 1.) / bounds.zw;
float2 mouseDir = normalize(abs(clickedPoint.xy) - mouseCursor.xy);
float2 origin = clamp(mouse - mouseDir * mouse.x / mouseDir.x, 0., 1.);
float mouseDist = clamp(length(mouse - origin)
+ (aspect - (abs(clickedPoint.x) / bounds.z) * aspect) / mouseDir.x, 0., aspect / mouseDir.x);
if (mouseDir.x < 0.)
{
mouseDist = distance(mouse, origin);
}
float proj = dot(uv - origin, mouseDir);
float dist = proj - mouseDist;
float2 linePoint = uv - dist * mouseDir;
half4 pixel = layer.sample(position);
if (dist > radius)
{
pixel = half4(0.0, 0.0, 0.0, 0.0); // background behind curling layer (note: 0.0 opacity)
pixel.rgb *= pow(clamp(dist - radius, 0., 1.) * 1.5, .2);
}
else if (dist >= 0.0)
{
// THIS PORTION HANDLES THE CURL SHADED PORTION OF THE RESULT
// map to cylinder point
float theta = asin(dist / radius);
float2 p2 = linePoint + mouseDir * (pi - theta) * radius;
float2 p1 = linePoint + mouseDir * theta * radius;
bool underside = (p2.x <= aspect && p2.y <= 1. && p2.x > 0. && p2.y > 0.);
uv = underside ? p2 : p1;
uv = float2(uv.x, 1.0 - uv.y); // invert y
pixel = layer.sample(uv * float2(1. / aspect, 1.) * float2(bounds[2], bounds[3])); // ME<----
if (underside && pixel.a == 0.0) { //<---- PIXEL.A IS 0.0 WHYYYYY
pixel = red;
}
// Commented out while debugging alpha issues
// if (underside && pixel.a == 0.0) {
// pixel = layer.sample(originalPosition);
// } else if (underside) {
// pixel = undersideColor; // underside
// }
// Shadow the pixel being returned
pixel.rgb *= pow(clamp((radius - dist) / radius, 0., 1.), .2);
}
else
{
// THIS PORTION HANDLES THE NON-CURL-SHADED PORTION OF THE SAMPLING.
float2 p = linePoint + mouseDir * (abs(dist) + pi * radius);
bool underside = (p.x <= aspect && p.y <= 1. && p.x > 0. && p.y > 0.);
uv = underside ? p : uv;
uv = float2(uv.x, 1.0 - uv.y); // invert y
pixel = layer.sample(uv * float2(1. / aspect, 1.) * float2(bounds[2], bounds[3])); // ME
if (underside && pixel.a == 0.0) { //<---- PIXEL.A IS 0.0 WHYYYYY
pixel = red;
}
// Commented out while debugging alpha issues
// if (underside && pixel.a == 0.0) {
// // If the new underside pixel is clear, we should sample the original image's pixel.
// pixel = layer.sample(originalPosition);
// } else if (underside) {
// pixel = undersideColor;
// }
}
return pixel;
}
Hi,
I can capture a frame on the Apple TV, but when I try to profile the capture for GPU timing information, I got "Abort Trap 6" error and with following error in the report:
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Triggered by Thread: 7
Application Specific Information:
abort() called
Last Exception Backtrace:
0 CoreFoundation 0x18c0a99d0 __exceptionPreprocess + 160
1 libobjc.A.dylib 0x18b596d24 objc_exception_throw + 71
2 CoreFoundation 0x18bfa7308 -[__NSArrayM insertObject:atIndex:] + 1239
3 MTLReplayController 0x101f5d148 DYMTLReplayFrameProfiler_loadAnalysis + 1140
4 MTLReplayController 0x101e97f90 GTMTLReplayClient_collectGPUShaderTimelineData + 224
5 MTLReplayController 0x101e81794 __30-[GTMTLReplayService profile:]_block_invoke_4 + 288
6 Foundation 0x18eb6072c __NSOPERATION_IS_INVOKING_MAIN__ + 11
7 Foundation 0x18eb5cc1c -[NSOperation start] + 623
8 Foundation 0x18eb60edc __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 11
9 Foundation 0x18eb60bc4 __NSOQSchedule_f + 167
10 libdispatch.dylib 0x18b8d6a84 _dispatch_block_async_invoke2 + 103
11 libdispatch.dylib 0x18b8c9420 _dispatch_client_callout + 15
12 libdispatch.dylib 0x18b8cc5d0 _dispatch_continuation_pop + 531
13 libdispatch.dylib 0x18b8cbcd4 _dispatch_async_redirect_invoke + 635
14 libdispatch.dylib 0x18b8d9224 _dispatch_root_queue_drain + 335
15 libdispatch.dylib 0x18b8d9a08 _dispatch_worker_thread2 + 163
16 libsystem_pthread.dylib 0x18b6e652c _pthread_wqthread + 223
17 libsystem_pthread.dylib 0x18b6ed8d0 start_wqthread + 7
It's Xcode 16.0 + Apply TV 4K (4th Gen) tvOS 18, does anyone know what's the cause of this error and is there any solution for it?
Thank you very much,
Kai
Hello Everyone,
within the renderEncoder?.drawIndexedPrimitives(type: .line…. function, I can't render all the lines of the object. I can see approx. 80%. Do you know what could be causing this? Other game engines, like those in C++, handle this just fine.
import MetalKit
class Renderer: NSObject, MTKViewDelegate {
var parent: ContentView
var metalDevice: MTLDevice!
var metalCommandQueue: MTLCommandQueue!
let allocator: MTKMeshBufferAllocator
let pipelineState: MTLRenderPipelineState
var scene: RenderScene
let mesh: ObjMesh
init(_ parent: ContentView) {
self.parent = parent
if let metalDevice = MTLCreateSystemDefaultDevice() {
self.metalDevice = metalDevice
}
self.metalCommandQueue = metalDevice.makeCommandQueue()
self.allocator = MTKMeshBufferAllocator(device: metalDevice)
mesh = ObjMesh(device: metalDevice, allocator: allocator, filename: "cube")
let pipelineDescriptor = MTLRenderPipelineDescriptor()
let library = metalDevice.makeDefaultLibrary()
pipelineDescriptor.vertexFunction = library?.makeFunction(name: "vertexShader")
pipelineDescriptor.fragmentFunction = library?.makeFunction(name: "fragmentShader")
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(mesh.metalMesh.vertexDescriptor)
do {
try pipelineState = metalDevice.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch {
fatalError()
}
scene = RenderScene()
super.init()
}
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
func draw(in view: MTKView) {
//update
scene.update()
guard let drawable = view.currentDrawable else {
return
}
let commandBuffer = metalCommandQueue.makeCommandBuffer()
let renderPassDescriptor = view.currentRenderPassDescriptor
renderPassDescriptor?.colorAttachments[0].clearColor = MTLClearColorMake(0, 0.5, 0.5, 1.0)
renderPassDescriptor?.colorAttachments[0].loadAction = .clear
renderPassDescriptor?.colorAttachments[0].storeAction = .store
let renderEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor!)
renderEncoder?.setRenderPipelineState(pipelineState)
var cameraData: CameraParameters = CameraParameters()
cameraData.view = Matrix44.create_lookat(
eye: scene.player.position,
target: scene.player.position + scene.player.forwards,
up: scene.player.up
)
cameraData.projection = Matrix44.create_perspective_projection(
fovy: 45, aspect: 800/600, near: 0.1, far: 10
)
renderEncoder?.setVertexBytes(&cameraData, length: MemoryLayout<CameraParameters>.stride, index: 2)
renderEncoder?.setVertexBuffer(mesh.metalMesh.vertexBuffers[0].buffer, offset: 0, index: 0)
for cube in scene.cubes {
var model: matrix_float4x4 = Matrix44.create_from_rotation(eulers: cube.eulers)
model = Matrix44.create_from_translation(translation: cube.position) * model
renderEncoder?.setVertexBytes(&model, length: MemoryLayout<matrix_float4x4>.stride, index: 1)
for submesh in mesh.metalMesh.submeshes {
renderEncoder?.drawIndexedPrimitives(
type: .line, indexCount: submesh.indexCount,
indexType: submesh.indexType, indexBuffer: submesh.indexBuffer.buffer,
indexBufferOffset: submesh.indexBuffer.offset
)
}
}
renderEncoder?.endEncoding()
commandBuffer?.present(drawable)
commandBuffer?.commit()
}
}
====================
import MetalKit
class ObjMesh {
let modelIOMesh: MDLMesh
let metalMesh: MTKMesh
init(device: MTLDevice, allocator: MTKMeshBufferAllocator, filename: String) {
guard let meshURL = Bundle.main.url(forResource: filename, withExtension: "obj") else {
fatalError()
}
let vertexDescriptor = MTLVertexDescriptor()
var offset: Int = 0
//position
vertexDescriptor.attributes[0].format = .float3
vertexDescriptor.attributes[0].offset = offset
vertexDescriptor.attributes[0].bufferIndex = 0
offset += MemoryLayout<SIMD3<Float>>.stride
vertexDescriptor.layouts[0].stride = offset
let meshDescriptor = MTKModelIOVertexDescriptorFromMetal(vertexDescriptor)
(meshDescriptor.attributes[0] as! MDLVertexAttribute).name = MDLVertexAttributePosition
let asset = MDLAsset(url: meshURL,
vertexDescriptor: meshDescriptor,
bufferAllocator: allocator)
self.modelIOMesh = asset.childObjects(of: MDLMesh.self).first as! MDLMesh
do {
metalMesh = try MTKMesh(mesh: self.modelIOMesh, device: device)
} catch {
fatalError("couldn't load mesh")
}
}
}
===============
cube.obj
Blender v2.91.0 OBJ File: ''
www_blender_org
mtllib piece.mtl
o Cube_Cube.001
v -1.000000 1.000000 -1.000000
v -1.000000 1.000000 1.000000
v 1.000000 1.000000 -1.000000
v 1.000000 1.000000 1.000000
v -1.000000 -1.000000 -1.000000
v -1.000000 -1.000000 1.000000
v 1.000000 -1.000000 -1.000000
v 1.000000 -1.000000 1.000000
vt 0.375000 0.000000
vt 0.625000 0.000000
vt 0.625000 0.250000
vt 0.375000 0.250000
vt 0.625000 0.500000
vt 0.375000 0.500000
vt 0.625000 0.750000
vt 0.375000 0.750000
vt 0.625000 1.000000
vt 0.375000 1.000000
vt 0.125000 0.500000
vt 0.125000 0.750000
vt 0.875000 0.500000
vt 0.875000 0.750000
vn 0.0000 1.0000 0.0000
vn 1.0000 0.0000 0.0000
vn 0.0000 -1.0000 0.0000
vn -1.0000 0.0000 0.0000
vn 0.0000 0.0000 -1.0000
vn 0.0000 0.0000 1.0000
usemtl None
s off
f 1/1/1 2/2/1 4/3/1 3/4/1
f 3/4/2 4/3/2 8/5/2 7/6/2
f 7/6/3 8/5/3 6/7/3 5/8/3
f 5/8/4 6/7/4 2/9/4 1/10/4
f 3/11/5 7/6/5 5/8/5 1/12/5
f 8/5/6 4/13/6 2/14/6 6/7/6
I created an empty binary archive and try to save it on disk, but it returned false without giving any error message.
The code is from
https://developer.apple.com/documentation/metal/shader_libraries/metal_binary_archives/creating_binary_archives_from_device-built_pipeline_state_objects?language=objc
I have a very simple Mac app with just a MTKView in it which shows a single color. I want to move the rendering code to C++. For this I created a C++ framework target which interoperates with the Swift code - main project target. I am trying to link metal-cpp library to the C++ framework target using these instructions. Approach described in this article works with simple C++ Mac console apps. But in my mixed Swift/C++ project Xcode cannot find Foundation/Foundation.hpp (and probably other headers) to include into the C++ header.
I inserted metal-cpp folder into my project and added it to C++ target's header search paths, as written in the instructions.
I'm trying to create a custom Metal-based visual effect as a UIView to be used inside an existing UIKit-based interface. (An example might be a view that applies a blur effect to what's behind it.) I need to capture the MTLTexture of what's behind the view so that I can feed it to MTLRenderCommandEncoder.setFragmentTexture(_:index:). Can someone show me how or point me to an example? Thanks!
I use xcode16 and swiftUI for programming on a macos15 system. There is a problem. When I render a picture through mtkview, it is normal when displayed on a regular view. However, when the view is displayed through the .sheet method, the image cannot be displayed. There is no error message from xcode.
import Foundation
import MetalKit
import SwiftUI
struct CIImageDisplayView: NSViewRepresentable {
typealias NSViewType = MTKView
var ciImage: CIImage
init(ciImage: CIImage) {
self.ciImage = ciImage
}
func makeNSView(context: Context) -> MTKView {
let view = MTKView()
view.delegate = context.coordinator
view.preferredFramesPerSecond = 60
view.enableSetNeedsDisplay = true
view.isPaused = true
view.framebufferOnly = false
if let defaultDevice = MTLCreateSystemDefaultDevice() {
view.device = defaultDevice
}
view.delegate = context.coordinator
return view
}
func updateNSView(_ nsView: MTKView, context: Context) {
}
func makeCoordinator() -> RawDisplayRender {
RawDisplayRender(ciImage: self.ciImage)
}
class RawDisplayRender: NSObject, MTKViewDelegate {
// MARK: Metal resources
var device: MTLDevice!
var commandQueue: MTLCommandQueue!
// MARK: Core Image resources
var context: CIContext!
var ciImage: CIImage
init(ciImage: CIImage) {
self.ciImage = ciImage
self.device = MTLCreateSystemDefaultDevice()
self.commandQueue = self.device.makeCommandQueue()
self.context = CIContext(mtlDevice: self.device)
}
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}
func draw(in view: MTKView) {
guard let currentDrawable = view.currentDrawable,
let commandBuffer = commandQueue.makeCommandBuffer()
else {
return
}
let dSize = view.drawableSize
let drawImage = self.ciImage
let destination = CIRenderDestination(width: Int(dSize.width),
height: Int(dSize.height),
pixelFormat: view.colorPixelFormat,
commandBuffer: commandBuffer,
mtlTextureProvider: { () -> MTLTexture in
return currentDrawable.texture
})
_ = try? self.context.startTask(toClear: destination)
_ = try? self.context.startTask(toRender: drawImage, from: drawImage.extent,
to: destination, at: CGPoint(x: (dSize.width - drawImage.extent.width) / 2, y: 0))
commandBuffer.present(currentDrawable)
commandBuffer.commit()
}
}
}
struct ShowCIImageView: View {
let cii = CIImage.init(contentsOf: Bundle.main.url(forResource: "9-10", withExtension: "jpg")!)!
var body: some View {
CIImageDisplayView.init(ciImage: cii).frame(width: 500, height: 500).background(.red)
}
}
struct ContentView: View {
@State var showImage = false
var body: some View {
VStack {
Image(systemName: "globe")
.imageScale(.large)
.foregroundStyle(.tint)
Text("Hello, world!")
ShowCIImageView()
Button {
showImage = true
} label: {
Text("showImage")
}
}
.frame(width: 800, height: 800)
.padding()
.sheet(isPresented: $showImage) {
ShowCIImageView()
}
}
}
Guten Tag,
my project is simple, first I want draw wired Hexa,-Tetra- and Octahedrons.
I draw a cube with Metal but I didn't found rotation, translation and scale.
I have searched help , the examples I found are too complicated for me.
Mit freundlichen Grüßen
VanceRegnet
hello everyone.
I got a texture loading error on visionOS 2.0:
Can't create texture(Error Domain=MTKTextureLoaderErrorDomain Code=0 "Pixel format(MTLPixelFormatInvalid) is not valid on this device" UserInfo={NSLocalizedDescription=Pixel format(MTLPixelFormatInvalid) is not valid on this device, MTKTextureLoaderErrorKey=Pixel format(MTLPixelFormatInvalid) is not valid on this device}
But this texture can load correctly on visionOS1.3.
I don't know what happen between visionOS1.3 and visionOS2.0.
The texture is a ktx file which stores cubemap that encoding in astc6x6hdr. And the ktx texture has a glInternalFormat info: GL_COMPRESSED_RGBA_ASTC_6x6.
I wonder if visionOS2.0 no longer supports astc6x6hdr cubemap format, or there is something wrong with my assets.
Hi, I am recently writing metal shader language to parallelize the algorithms to accelerate the speed of it.
I created a simple example to show the acceleration result of it. Since Rust is used in our algorithm, so I used metal-rs as the wrapper to execute the MSL kernels from rust side.
In this example, I am calculating the result of two arrays, and kernel looks like:
kernel void two_array_addition_2(
constant uint* a [[buffer(0)]],
constant uint* b [[buffer(1)]],
device uint* c [[buffer(2)]],
uint idx [[thread_position_in_grid]]
) {
c[idx] = a[idx] + b[idx];
}
in the main.rs, you can see a function called execute_kernel() , this function has all it needs to execute the kernel in MSL (such as commandEncoder, piplelineState, etc).
use core::mem;
use metal::{Buffer, MTLSize};
use objc::rc::autoreleasepool;
use std::time::Instant;
use two_array_addition::abstractions::state::MetalState;
fn execute_kernel(
name: &str,
state: &MetalState,
input_a: &Buffer,
input_b: &Buffer,
output_c: &Buffer,
) -> Vec<u32> {
// assert!(input_a.len() == input_b.len() && input_a.len() == output_c.len());
// let len = input_a.len() as u64;
let len = input_a.length() as u64 / mem::size_of::<u32>() as u64;
// 1. Init the MetalState
// - we inited it
// 2. Set up Pipeline State
let pipeline = state.setup_pipeline(name).unwrap();
// 3. Allocate the buffers for A, B, and C
// - we allocated outside of this function
let mut result: &[u32] = &[];
autoreleasepool(|| {
// 4. Create the command buffer & command encoder
let (command_buffer, command_encoder) = state.setup_command(
&pipeline,
Some(&[(0, input_a), (1, input_b), (2, output_c)]),
);
// 5. command encoder dispatch the threadgroup size and num of threads per threadgroup
let threadgroup_count = MTLSize::new((len + 256 - 1) / 256, 1, 1);
let thread_per_threadgroup = MTLSize::new(256, 1, 1);
// let grid_size = MTLSize::new(len, 1, 1);
// let threadgroup_count = MTLSize::new(pipeline.max_total_threads_per_threadgroup(), 1, 1);
command_encoder.dispatch_thread_groups(threadgroup_count, thread_per_threadgroup);
command_encoder.end_encoding();
command_buffer.commit();
command_buffer.wait_until_completed();
// 6. Copy the result back to the host
let start = Instant::now();
result = MetalState::retrieve_contents::<u32>(output_c);
let duration = start.elapsed();
println!("Duration for copying result back to host: {:?}", duration);
});
result.to_vec()
}
The performance of the result is kinda interesting to me.
This is the result:
$ cargo run -r
This is expected to run for a while... please wait...
Generating input arrays...
Generating input arrays...
Generating output array...
Generating expected output...
Duration for allocating buffers: 2.015258s
Executing 1st kernel (1)...
Duration for copying result back to host: 5.75µs
Executing 1st kernel (2)...
Duration for copying result back to host: 542ns
Executing 2nd kernel (1)...
Duration for copying result back to host: 1µs
Executing 2nd kernel (2)...
Duration for copying result back to host: 458ns
Duration expected: 183.406167ms
Duration for 1st kernel (1): 1.894994875s
Duration for 1st kernel (2): 537.318208ms
Duration for 2nd kernel (1): 501.33275ms
Duration for 2nd kernel (2): 497.339916ms
You have successfully run the kernels!
The speed is slower when executing in the MSL kernel, while I reckon of the dataset is quite big ($2^{29}$)
The first kernel execution takes more time to launch.
Is there any way to optimize the MSL in this case? And in most case, when you design the algorithm into parallelism, what would be the concerns?
The machine I am using is M1 Pro with 14-core GPU and 16 GB memory.
Does anyone have idea / explanation for why these happen?
Thank you