Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Post

Replies

Boosts

Views

Activity

Metal Inline Functions
Hi! How to define and call an inline function in Metal? Or simple function that will return some value. Case: inline uint index4D(constant _4D& shape, constant uint& n, constant uint& c, constant uint& h, constant uint& w) { return n * shape.C * shape.H * shape.W + c * shape.H * shape.W + h * shape.W + w; } When I call it in my kernel function I get No matching function for call error. Thx in advance.
2
0
442
Nov ’24
Metal Compute Overhead
Hello, We are experimenting with Metal to accelerate some peculiar numerical computation. Our workloads are relatively small, so the ability to avoid moving data to and from the GPU's memory is very appealing. However, we are observing higher overhead compared to CUDA, which negates the benefits of avoiding data transfer. In our tests using an empty kernel, CUDA completes in 0.001 ms (Intel i7 10700K, RTX 3080), while Metal's waitUntilCompleted takes 0.12 ms (M2 Max). As we do not have prior experience with Metal, we are wondering if we are using the APIs just fine and this timing is expected, or if there is a way to reduce it. Thank you in advance for any comment! test-metal.cpp
0
0
410
Nov ’24
Using the same texture for both input & output of a fragment shader
Hello, This exact question was already asked in this forum (8 years ago) but I can't find a definitive answer: Does Metal allow using the same color texture as both an input and output (color attachment) of a fragment shader? Is the behavior defined somewhere? I believe this results in undefined behavior under both DirectX and OpenGL, so I'd assume the same for Metal, but then why doesn't Metal warn me about this as it does on some many other "misconfigurations"? It also seems to work correctly in my case, as I found out by accident. Would love to get a clarification! Thanks ahead!
8
1
946
May ’24
Metal rendering shows incorrect color, gputrace shows fragment function returning correct color
I'm experiencing a strange issue where I'm seeing black in a metal drawable where it should be a different color. When I capture the frame and inspect the returned value from the fragment function, it's correct, but the drawable isn't. This screenshot hopefully illustrates the issue. I've not found any references to similar issues. I saw something about some out of bounds or NaN values being dropped to 0 (which would be black), but the debugger doesn't indicate this is happening.
1
0
329
Oct ’24
CAMetalDisplayLink does not work on separate thread
I am looking to implement CAMetalDisplayLink on a separate thread on a macOS application. I am basing my implementation on the following example project: Achieving Smooth Frame Rates with Metal Display Link This project allows you to configure whether a separate thread is used for rendering by setting RENDER_ON_MAIN_THREAD in GameConfig to 0. However, when I set it to use a separate thread nothing is rendered. Stepping through the code shows that a separate thread is created, but a CAMetalDisplayLinkUpdate is never received. Does anyone know why this does not work?
1
0
454
Oct ’24
Strange Metal related shader issue
Hi everyone, I encountered a very strange shader bug that seems related to Metal only (not OpenGL). You can find the full description of the issue on the Babylon.js forums here: https://forum.babylonjs.com/t/strange-shader-related-issue-on-macos-with-safari-and-chrome-not-firefox/54289 (sorry, I couldn't post a clickable link here as this seems to be blocked here). I have a workaround to fix the issue (as described in the link above), but this really looks like an issue in Metal itself. Let me know if you need more details or explanations.
0
0
337
Oct ’24
Getting stuck in first frame of renderLoop.
First I get this ar_world_tracking_provider_query_device_anchor_at_timestamp <0x302b9c0a0>: The device_anchor can only be queried when the world tracking provider is running. This seemed to all break with the auto-update to 2.0.1. Simulator runs the code fine. I seem to see an infinite stall here frameLayer.endUpdate() // Pace frames by waiting for the optimal prediction time. try await LayerRenderer.Clock().sleep(until: timing.optimalInputTime, tolerance: nil) // Start submitting the updated frame. frameLayer.startSubmission() <-
0
0
381
Oct ’24
Page-Curl Shader -- Pixel transparency check is wrong?
Given I do not understand much at all about how to write shaders I do not understand the math associated with page-curl effects I am trying to: implement a page-curl shader for use on SwiftUI views. I've lifted a shader from HIROKI IKEUCHI that I believe they lifted from a non-metal shader resource online, and I'm trying to digest it. One thing I want to do is to paint the "underside" of the view with a given color and maintain the transparency of rounded corners when they are flipped over. So, if an underside pixel is "clear" then I want to sample the pixel at that position on the original layer instead of the "curl effect" pixel. There are two comments in the shader below where I check the alpha, and underside flags, and paint the color red as a debug test. The shader gives this result: The outside of those rounded corners is appropriately red and the white border pixels are detected as "not-clear". But the "inner" portion of the border is... mistakingly red? I don't get it. Any help would be appreciated. I feel tapped out and I don't have any IRL resources I can ask. // // PageCurl.metal // ShaderDemo3 // // Created by HIROKI IKEUCHI on 2023/10/17. // #include <metal_stdlib> #include <SwiftUI/SwiftUI_Metal.h> using namespace metal; #define pi float(3.14159265359) #define blue half4(0.0, 0.0, 1.0, 1.0) #define red half4(1.0, 0.0, 0.0, 1.0) #define radius float(0.4) // そのピクセルの色を返す [[ stitchable ]] half4 pageCurl ( float2 _position, SwiftUI::Layer layer, float4 bounds, float2 _clickedPoint, float2 _mouseCursor ) { half4 undersideColor = half4(0.5, 0.5, 1.0, 1.0); float2 originalPosition = _position; // y座標の補正 float2 position = float2(_position.x, bounds.w - _position.y); float2 clickedPoint = float2(_clickedPoint.x, bounds.w - _clickedPoint.y); float2 mouseCursor = float2(_mouseCursor.x, bounds.w - _mouseCursor.y); float aspect = bounds.z / bounds.w; float2 uv = position * float2(aspect, 1.) / bounds.zw; float2 mouse = mouseCursor.xy * float2(aspect, 1.) / bounds.zw; float2 mouseDir = normalize(abs(clickedPoint.xy) - mouseCursor.xy); float2 origin = clamp(mouse - mouseDir * mouse.x / mouseDir.x, 0., 1.); float mouseDist = clamp(length(mouse - origin) + (aspect - (abs(clickedPoint.x) / bounds.z) * aspect) / mouseDir.x, 0., aspect / mouseDir.x); if (mouseDir.x < 0.) { mouseDist = distance(mouse, origin); } float proj = dot(uv - origin, mouseDir); float dist = proj - mouseDist; float2 linePoint = uv - dist * mouseDir; half4 pixel = layer.sample(position); if (dist > radius) { pixel = half4(0.0, 0.0, 0.0, 0.0); // background behind curling layer (note: 0.0 opacity) pixel.rgb *= pow(clamp(dist - radius, 0., 1.) * 1.5, .2); } else if (dist >= 0.0) { // THIS PORTION HANDLES THE CURL SHADED PORTION OF THE RESULT // map to cylinder point float theta = asin(dist / radius); float2 p2 = linePoint + mouseDir * (pi - theta) * radius; float2 p1 = linePoint + mouseDir * theta * radius; bool underside = (p2.x <= aspect && p2.y <= 1. && p2.x > 0. && p2.y > 0.); uv = underside ? p2 : p1; uv = float2(uv.x, 1.0 - uv.y); // invert y pixel = layer.sample(uv * float2(1. / aspect, 1.) * float2(bounds[2], bounds[3])); // ME<---- if (underside && pixel.a == 0.0) { //<---- PIXEL.A IS 0.0 WHYYYYY pixel = red; } // Commented out while debugging alpha issues // if (underside && pixel.a == 0.0) { // pixel = layer.sample(originalPosition); // } else if (underside) { // pixel = undersideColor; // underside // } // Shadow the pixel being returned pixel.rgb *= pow(clamp((radius - dist) / radius, 0., 1.), .2); } else { // THIS PORTION HANDLES THE NON-CURL-SHADED PORTION OF THE SAMPLING. float2 p = linePoint + mouseDir * (abs(dist) + pi * radius); bool underside = (p.x <= aspect && p.y <= 1. && p.x > 0. && p.y > 0.); uv = underside ? p : uv; uv = float2(uv.x, 1.0 - uv.y); // invert y pixel = layer.sample(uv * float2(1. / aspect, 1.) * float2(bounds[2], bounds[3])); // ME if (underside && pixel.a == 0.0) { //<---- PIXEL.A IS 0.0 WHYYYYY pixel = red; } // Commented out while debugging alpha issues // if (underside && pixel.a == 0.0) { // // If the new underside pixel is clear, we should sample the original image's pixel. // pixel = layer.sample(originalPosition); // } else if (underside) { // pixel = undersideColor; // } } return pixel; }
1
0
466
Oct ’24
Can't profile Metal on Apple TV
Hi, I can capture a frame on the Apple TV, but when I try to profile the capture for GPU timing information, I got "Abort Trap 6" error and with following error in the report: Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Triggered by Thread: 7 Application Specific Information: abort() called Last Exception Backtrace: 0 CoreFoundation 0x18c0a99d0 __exceptionPreprocess + 160 1 libobjc.A.dylib 0x18b596d24 objc_exception_throw + 71 2 CoreFoundation 0x18bfa7308 -[__NSArrayM insertObject:atIndex:] + 1239 3 MTLReplayController 0x101f5d148 DYMTLReplayFrameProfiler_loadAnalysis + 1140 4 MTLReplayController 0x101e97f90 GTMTLReplayClient_collectGPUShaderTimelineData + 224 5 MTLReplayController 0x101e81794 __30-[GTMTLReplayService profile:]_block_invoke_4 + 288 6 Foundation 0x18eb6072c __NSOPERATION_IS_INVOKING_MAIN__ + 11 7 Foundation 0x18eb5cc1c -[NSOperation start] + 623 8 Foundation 0x18eb60edc __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 11 9 Foundation 0x18eb60bc4 __NSOQSchedule_f + 167 10 libdispatch.dylib 0x18b8d6a84 _dispatch_block_async_invoke2 + 103 11 libdispatch.dylib 0x18b8c9420 _dispatch_client_callout + 15 12 libdispatch.dylib 0x18b8cc5d0 _dispatch_continuation_pop + 531 13 libdispatch.dylib 0x18b8cbcd4 _dispatch_async_redirect_invoke + 635 14 libdispatch.dylib 0x18b8d9224 _dispatch_root_queue_drain + 335 15 libdispatch.dylib 0x18b8d9a08 _dispatch_worker_thread2 + 163 16 libsystem_pthread.dylib 0x18b6e652c _pthread_wqthread + 223 17 libsystem_pthread.dylib 0x18b6ed8d0 start_wqthread + 7 It's Xcode 16.0 + Apply TV 4K (4th Gen) tvOS 18, does anyone know what's the cause of this error and is there any solution for it? Thank you very much, Kai
0
0
262
Oct ’24
renderEncoder?.drawIndexedPrimitives(type: .line…
Hello Everyone, within the renderEncoder?.drawIndexedPrimitives(type: .line…. function, I can't render all the lines of the object. I can see approx. 80%. Do you know what could be causing this? Other game engines, like those in C++, handle this just fine. import MetalKit class Renderer: NSObject, MTKViewDelegate { var parent: ContentView var metalDevice: MTLDevice! var metalCommandQueue: MTLCommandQueue! let allocator: MTKMeshBufferAllocator let pipelineState: MTLRenderPipelineState var scene: RenderScene let mesh: ObjMesh init(_ parent: ContentView) { self.parent = parent if let metalDevice = MTLCreateSystemDefaultDevice() { self.metalDevice = metalDevice } self.metalCommandQueue = metalDevice.makeCommandQueue() self.allocator = MTKMeshBufferAllocator(device: metalDevice) mesh = ObjMesh(device: metalDevice, allocator: allocator, filename: "cube") let pipelineDescriptor = MTLRenderPipelineDescriptor() let library = metalDevice.makeDefaultLibrary() pipelineDescriptor.vertexFunction = library?.makeFunction(name: "vertexShader") pipelineDescriptor.fragmentFunction = library?.makeFunction(name: "fragmentShader") pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm pipelineDescriptor.vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(mesh.metalMesh.vertexDescriptor) do { try pipelineState = metalDevice.makeRenderPipelineState(descriptor: pipelineDescriptor) } catch { fatalError() } scene = RenderScene() super.init() } func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) { } func draw(in view: MTKView) { //update scene.update() guard let drawable = view.currentDrawable else { return } let commandBuffer = metalCommandQueue.makeCommandBuffer() let renderPassDescriptor = view.currentRenderPassDescriptor renderPassDescriptor?.colorAttachments[0].clearColor = MTLClearColorMake(0, 0.5, 0.5, 1.0) renderPassDescriptor?.colorAttachments[0].loadAction = .clear renderPassDescriptor?.colorAttachments[0].storeAction = .store let renderEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor!) renderEncoder?.setRenderPipelineState(pipelineState) var cameraData: CameraParameters = CameraParameters() cameraData.view = Matrix44.create_lookat( eye: scene.player.position, target: scene.player.position + scene.player.forwards, up: scene.player.up ) cameraData.projection = Matrix44.create_perspective_projection( fovy: 45, aspect: 800/600, near: 0.1, far: 10 ) renderEncoder?.setVertexBytes(&cameraData, length: MemoryLayout<CameraParameters>.stride, index: 2) renderEncoder?.setVertexBuffer(mesh.metalMesh.vertexBuffers[0].buffer, offset: 0, index: 0) for cube in scene.cubes { var model: matrix_float4x4 = Matrix44.create_from_rotation(eulers: cube.eulers) model = Matrix44.create_from_translation(translation: cube.position) * model renderEncoder?.setVertexBytes(&model, length: MemoryLayout<matrix_float4x4>.stride, index: 1) for submesh in mesh.metalMesh.submeshes { renderEncoder?.drawIndexedPrimitives( type: .line, indexCount: submesh.indexCount, indexType: submesh.indexType, indexBuffer: submesh.indexBuffer.buffer, indexBufferOffset: submesh.indexBuffer.offset ) } } renderEncoder?.endEncoding() commandBuffer?.present(drawable) commandBuffer?.commit() } } ==================== import MetalKit class ObjMesh { let modelIOMesh: MDLMesh let metalMesh: MTKMesh init(device: MTLDevice, allocator: MTKMeshBufferAllocator, filename: String) { guard let meshURL = Bundle.main.url(forResource: filename, withExtension: "obj") else { fatalError() } let vertexDescriptor = MTLVertexDescriptor() var offset: Int = 0 //position vertexDescriptor.attributes[0].format = .float3 vertexDescriptor.attributes[0].offset = offset vertexDescriptor.attributes[0].bufferIndex = 0 offset += MemoryLayout<SIMD3<Float>>.stride vertexDescriptor.layouts[0].stride = offset let meshDescriptor = MTKModelIOVertexDescriptorFromMetal(vertexDescriptor) (meshDescriptor.attributes[0] as! MDLVertexAttribute).name = MDLVertexAttributePosition let asset = MDLAsset(url: meshURL, vertexDescriptor: meshDescriptor, bufferAllocator: allocator) self.modelIOMesh = asset.childObjects(of: MDLMesh.self).first as! MDLMesh do { metalMesh = try MTKMesh(mesh: self.modelIOMesh, device: device) } catch { fatalError("couldn't load mesh") } } } =============== cube.obj Blender v2.91.0 OBJ File: '' www_blender_org mtllib piece.mtl o Cube_Cube.001 v -1.000000 1.000000 -1.000000 v -1.000000 1.000000 1.000000 v 1.000000 1.000000 -1.000000 v 1.000000 1.000000 1.000000 v -1.000000 -1.000000 -1.000000 v -1.000000 -1.000000 1.000000 v 1.000000 -1.000000 -1.000000 v 1.000000 -1.000000 1.000000 vt 0.375000 0.000000 vt 0.625000 0.000000 vt 0.625000 0.250000 vt 0.375000 0.250000 vt 0.625000 0.500000 vt 0.375000 0.500000 vt 0.625000 0.750000 vt 0.375000 0.750000 vt 0.625000 1.000000 vt 0.375000 1.000000 vt 0.125000 0.500000 vt 0.125000 0.750000 vt 0.875000 0.500000 vt 0.875000 0.750000 vn 0.0000 1.0000 0.0000 vn 1.0000 0.0000 0.0000 vn 0.0000 -1.0000 0.0000 vn -1.0000 0.0000 0.0000 vn 0.0000 0.0000 -1.0000 vn 0.0000 0.0000 1.0000 usemtl None s off f 1/1/1 2/2/1 4/3/1 3/4/1 f 3/4/2 4/3/2 8/5/2 7/6/2 f 7/6/3 8/5/3 6/7/3 5/8/3 f 5/8/4 6/7/4 2/9/4 1/10/4 f 3/11/5 7/6/5 5/8/5 1/12/5 f 8/5/6 4/13/6 2/14/6 6/7/6
3
0
480
Oct ’24
Can't link metal-cpp to C++ framework in Swift Mac app
I have a very simple Mac app with just a MTKView in it which shows a single color. I want to move the rendering code to C++. For this I created a C++ framework target which interoperates with the Swift code - main project target. I am trying to link metal-cpp library to the C++ framework target using these instructions. Approach described in this article works with simple C++ Mac console apps. But in my mixed Swift/C++ project Xcode cannot find Foundation/Foundation.hpp (and probably other headers) to include into the C++ header. I inserted metal-cpp folder into my project and added it to C++ target's header search paths, as written in the instructions.
4
0
909
Jun ’24
Metal UIView to transform what's behind it
I'm trying to create a custom Metal-based visual effect as a UIView to be used inside an existing UIKit-based interface. (An example might be a view that applies a blur effect to what's behind it.) I need to capture the MTLTexture of what's behind the view so that I can feed it to MTLRenderCommandEncoder.setFragmentTexture(_:index:). Can someone show me how or point me to an example? Thanks!
2
0
642
Aug ’24
Cannot Display MTKView on a sheeted view on macOS15
I use xcode16 and swiftUI for programming on a macos15 system. There is a problem. When I render a picture through mtkview, it is normal when displayed on a regular view. However, when the view is displayed through the .sheet method, the image cannot be displayed. There is no error message from xcode. import Foundation import MetalKit import SwiftUI struct CIImageDisplayView: NSViewRepresentable { typealias NSViewType = MTKView var ciImage: CIImage init(ciImage: CIImage) { self.ciImage = ciImage } func makeNSView(context: Context) -&gt; MTKView { let view = MTKView() view.delegate = context.coordinator view.preferredFramesPerSecond = 60 view.enableSetNeedsDisplay = true view.isPaused = true view.framebufferOnly = false if let defaultDevice = MTLCreateSystemDefaultDevice() { view.device = defaultDevice } view.delegate = context.coordinator return view } func updateNSView(_ nsView: MTKView, context: Context) { } func makeCoordinator() -&gt; RawDisplayRender { RawDisplayRender(ciImage: self.ciImage) } class RawDisplayRender: NSObject, MTKViewDelegate { // MARK: Metal resources var device: MTLDevice! var commandQueue: MTLCommandQueue! // MARK: Core Image resources var context: CIContext! var ciImage: CIImage init(ciImage: CIImage) { self.ciImage = ciImage self.device = MTLCreateSystemDefaultDevice() self.commandQueue = self.device.makeCommandQueue() self.context = CIContext(mtlDevice: self.device) } func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {} func draw(in view: MTKView) { guard let currentDrawable = view.currentDrawable, let commandBuffer = commandQueue.makeCommandBuffer() else { return } let dSize = view.drawableSize let drawImage = self.ciImage let destination = CIRenderDestination(width: Int(dSize.width), height: Int(dSize.height), pixelFormat: view.colorPixelFormat, commandBuffer: commandBuffer, mtlTextureProvider: { () -&gt; MTLTexture in return currentDrawable.texture }) _ = try? self.context.startTask(toClear: destination) _ = try? self.context.startTask(toRender: drawImage, from: drawImage.extent, to: destination, at: CGPoint(x: (dSize.width - drawImage.extent.width) / 2, y: 0)) commandBuffer.present(currentDrawable) commandBuffer.commit() } } } struct ShowCIImageView: View { let cii = CIImage.init(contentsOf: Bundle.main.url(forResource: "9-10", withExtension: "jpg")!)! var body: some View { CIImageDisplayView.init(ciImage: cii).frame(width: 500, height: 500).background(.red) } } struct ContentView: View { @State var showImage = false var body: some View { VStack { Image(systemName: "globe") .imageScale(.large) .foregroundStyle(.tint) Text("Hello, world!") ShowCIImageView() Button { showImage = true } label: { Text("showImage") } } .frame(width: 800, height: 800) .padding() .sheet(isPresented: $showImage) { ShowCIImageView() } } }
1
0
506
Sep ’24
MTKTextureLoader loading texture error on visionOS2.0
hello everyone. I got a texture loading error on visionOS 2.0: Can't create texture(Error Domain=MTKTextureLoaderErrorDomain Code=0 "Pixel format(MTLPixelFormatInvalid) is not valid on this device" UserInfo={NSLocalizedDescription=Pixel format(MTLPixelFormatInvalid) is not valid on this device, MTKTextureLoaderErrorKey=Pixel format(MTLPixelFormatInvalid) is not valid on this device} But this texture can load correctly on visionOS1.3. I don't know what happen between visionOS1.3 and visionOS2.0. The texture is a ktx file which stores cubemap that encoding in astc6x6hdr. And the ktx texture has a glInternalFormat info: GL_COMPRESSED_RGBA_ASTC_6x6. I wonder if visionOS2.0 no longer supports astc6x6hdr cubemap format, or there is something wrong with my assets.
1
0
392
Sep ’24
Why is the speed of metal shading kernel so slow?
Hi, I am recently writing metal shader language to parallelize the algorithms to accelerate the speed of it. I created a simple example to show the acceleration result of it. Since Rust is used in our algorithm, so I used metal-rs as the wrapper to execute the MSL kernels from rust side. In this example, I am calculating the result of two arrays, and kernel looks like: kernel void two_array_addition_2( constant uint* a [[buffer(0)]], constant uint* b [[buffer(1)]], device uint* c [[buffer(2)]], uint idx [[thread_position_in_grid]] ) { c[idx] = a[idx] + b[idx]; } in the main.rs, you can see a function called execute_kernel() , this function has all it needs to execute the kernel in MSL (such as commandEncoder, piplelineState, etc). use core::mem; use metal::{Buffer, MTLSize}; use objc::rc::autoreleasepool; use std::time::Instant; use two_array_addition::abstractions::state::MetalState; fn execute_kernel( name: &str, state: &MetalState, input_a: &Buffer, input_b: &Buffer, output_c: &Buffer, ) -> Vec<u32> { // assert!(input_a.len() == input_b.len() && input_a.len() == output_c.len()); // let len = input_a.len() as u64; let len = input_a.length() as u64 / mem::size_of::<u32>() as u64; // 1. Init the MetalState // - we inited it // 2. Set up Pipeline State let pipeline = state.setup_pipeline(name).unwrap(); // 3. Allocate the buffers for A, B, and C // - we allocated outside of this function let mut result: &[u32] = &[]; autoreleasepool(|| { // 4. Create the command buffer & command encoder let (command_buffer, command_encoder) = state.setup_command( &pipeline, Some(&[(0, input_a), (1, input_b), (2, output_c)]), ); // 5. command encoder dispatch the threadgroup size and num of threads per threadgroup let threadgroup_count = MTLSize::new((len + 256 - 1) / 256, 1, 1); let thread_per_threadgroup = MTLSize::new(256, 1, 1); // let grid_size = MTLSize::new(len, 1, 1); // let threadgroup_count = MTLSize::new(pipeline.max_total_threads_per_threadgroup(), 1, 1); command_encoder.dispatch_thread_groups(threadgroup_count, thread_per_threadgroup); command_encoder.end_encoding(); command_buffer.commit(); command_buffer.wait_until_completed(); // 6. Copy the result back to the host let start = Instant::now(); result = MetalState::retrieve_contents::<u32>(output_c); let duration = start.elapsed(); println!("Duration for copying result back to host: {:?}", duration); }); result.to_vec() } The performance of the result is kinda interesting to me. This is the result: $ cargo run -r This is expected to run for a while... please wait... Generating input arrays... Generating input arrays... Generating output array... Generating expected output... Duration for allocating buffers: 2.015258s Executing 1st kernel (1)... Duration for copying result back to host: 5.75µs Executing 1st kernel (2)... Duration for copying result back to host: 542ns Executing 2nd kernel (1)... Duration for copying result back to host: 1µs Executing 2nd kernel (2)... Duration for copying result back to host: 458ns Duration expected: 183.406167ms Duration for 1st kernel (1): 1.894994875s Duration for 1st kernel (2): 537.318208ms Duration for 2nd kernel (1): 501.33275ms Duration for 2nd kernel (2): 497.339916ms You have successfully run the kernels! The speed is slower when executing in the MSL kernel, while I reckon of the dataset is quite big ($2^{29}$) The first kernel execution takes more time to launch. Is there any way to optimize the MSL in this case? And in most case, when you design the algorithm into parallelism, what would be the concerns? The machine I am using is M1 Pro with 14-core GPU and 16 GB memory. Does anyone have idea / explanation for why these happen? Thank you
1
0
507
Sep ’24