Hi, I am very new to this coding gig. I am an art student who happened to develop an interest due to some problems I wanted to solve. To a lot of developers' shock and disgust, I am in fact starting with C/C++, and I am using the humble Xcode for lack of better knowledge. I tried to use Visual Basic, but I had an SVN problem because my Mac is on Catalina. I just want some insight into what makes Xcode not that great, so that if I run into problems in the future I will know why. Your info will be much appreciated.
Hi,
I am generating a Metal library that I build using the command line tools on macOS for iphoneos, following the instructions here.
Then I serialise this to a binary blob that I load at runtime, which seems to work ok as everything renders as expected.
When I do a frame capture and open a shader function, it tries to load the symbols and fails. I tried pointing it at the directory (and at the symbols file itself), but it never resolves them.
In the bottom half of the Import External Sources dialogue there is one entry in the Library | Debug Info section:
The library name is Library 0x21816b5dc0 and below Debug Info it says Invalid UUID.
The validation layer doesn't flag any invalid behaviour, so I am a bit lost and not sure what to try next.
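For reference, the runtime load on my side is essentially the following (shown with metal-cpp for brevity; the function and variable names are placeholders, and I am going from my reading of the metal-cpp headers, so treat the exact signatures as an assumption; the Objective-C path would use newLibraryWithData:error: instead):
#include <dispatch/dispatch.h>
#include <Foundation/Foundation.hpp>
#include <Metal/Metal.hpp>

// Loads a metallib that was serialised as a raw byte blob (blobBytes/blobSize
// are whatever my own deserialisation code produced).
MTL::Library* loadLibraryFromBlob( MTL::Device* pDevice, const void* blobBytes, size_t blobSize )
{
    dispatch_data_t data = dispatch_data_create( blobBytes, blobSize,
                                                 dispatch_get_main_queue(),
                                                 DISPATCH_DATA_DESTRUCTOR_DEFAULT );
    NS::Error* pError = nullptr;
    MTL::Library* pLibrary = pDevice->newLibrary( data, &pError );
    if ( !pLibrary && pError )
    {
        // pError->localizedDescription()->utf8String() holds the loader message.
    }
    return pLibrary;
}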
Device: iPod 7
iOS version: 14.4.1
GMetalDevice = MTLCreateSystemDefaultDevice();
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00000001d7a3784c libsystem_kernel.dylib`__pthread_kill + 8
* frame #1: 0x00000001f39c29e8 libsystem_pthread.dylib`pthread_kill + 212
frame #2: 0x00000001b4b738f4 libsystem_c.dylib`abort + 100
frame #3: 0x00000001bb238030 libsystem_malloc.dylib`malloc_vreport + 556
frame #4: 0x00000001bb2381e8 libsystem_malloc.dylib`malloc_report + 60
frame #5: 0x00000001bb22e2e8 libsystem_malloc.dylib`free + 432
frame #6: 0x00000001f48dab10 AGXMetalA10`___lldb_unnamed_symbol1249 + 1644
frame #7: 0x00000001f49064d4 AGXMetalA10`___lldb_unnamed_symbol1504 + 72
frame #8: 0x00000001c18ca694 Metal`+[MTLIOAccelDevice registerDevices] + 224
frame #9: 0x00000001c18ccbf8 Metal`invocation function for block in MTLDeviceArrayInitialize() + 872
frame #10: 0x0000000119505528 libdispatch.dylib`_dispatch_client_callout + 16
frame #11: 0x0000000119506e6c libdispatch.dylib`_dispatch_once_callout + 84
frame #12: 0x00000001c18cc470 Metal`MTLCreateSystemDefaultDevice + 200
frame #13: 0x00000001054f5f54 MyProject`+[FIOSView layerClass](self=<unavailable>, _cmd=<unavailable>) at IOSView.cpp:134:18 [opt]
frame #14: 0x00000001aebbd034 UIKitCore`UIViewCommonInitWithFrame + 1040
frame #15: 0x00000001aebbcbcc UIKitCore`-[UIView initWithFrame:] + 124
frame #16: 0x00000001054f6274 MyProject`-[FIOSView initWithFrame:](self=<unavailable>, _cmd=<unavailable>, Frame=<unavailable>) at IOSView.cpp:233:14 [opt]
frame #17: 0x0000000102d237f8 MyProject`invocation function for block in FAppEntry::PlatformInit() [inlined] MainThreadInit() at LaunchIOS.cpp:348:24 [opt]
frame #18: 0x0000000102d237a8 MyProject`invocation function for block in FAppEntry::PlatformInit(.block_descriptor=<unavailable>) at LaunchIOS.cpp:373:47 [opt]
frame #19: 0x0000000119503ce4 libdispatch.dylib`_dispatch_call_block_and_release + 24
frame #20: 0x0000000119505528 libdispatch.dylib`_dispatch_client_callout + 16
frame #21: 0x0000000119513994 libdispatch.dylib`_dispatch_main_queue_callback_4CF + 972
frame #22: 0x00000001abdf85e0 CoreFoundation`__CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 12
frame #23: 0x00000001abdf2a88 CoreFoundation`__CFRunLoopRun + 2480
frame #24: 0x00000001abdf1ba0 CoreFoundation`CFRunLoopRunSpecific + 572
frame #25: 0x00000001c2b5a598 GraphicsServices`GSEventRunModal + 160
frame #26: 0x00000001ae6e32f4 UIKitCore`-[UIApplication _run] + 1052
frame #27: 0x00000001ae6e8874 UIKitCore`UIApplicationMain + 164
frame #28: 0x0000000102d312bc MyProject`main(argc=3, argv=0x000000016d4e7700) at LaunchIOS.cpp:584:13 [opt]
frame #29: 0x00000001abad0568 libdyld.dylib`start + 4
Hey all,
I have a few questions about the Loading textures and models using Metal fast resource loading project. I'm not experienced in 3D rendering or Mac development in general, so bear with me :)
I noticed that the model and texture data for the scene objects have a .dat extension and appear to be binary data files. This is different from the models and textures in the Rendering a Scene with Deferred Lighting in C++ sample, which uses .obj and .mtl files of the kind commonly exported from 3D modelling programs like Blender or Maya.
First question: I didn't notice any explanation of the difference in format between the two projects in the source code. From reading a bit online, it seems that binary formats are more efficient and generally more representative of what you'd actually want to ship. Is that correct?
Second, I was also wondering if the binary format used in "Loading textures and models using Metal fast resource loading" is in a standard format, or if it was created by the project's author just for the project.
Third, I was wondering what the typical process is for storing assets in a binary format. Are they usually exported directly from 3D modelling programs? Or is intermediate output from those programs (such as .obj files) usually parsed by the developer and then written to the binary format? If it's the latter, are there commonly used libraries for this, or do people usually just hand-roll parsers? (A rough sketch of what I imagine that conversion step looks like is below.)
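To make the third question concrete, this is the kind of offline conversion step I have in mind; the vertex layout, file format, and names here are purely made up for illustration, not the sample's actual format:
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical interleaved vertex layout, just for illustration.
struct Vertex
{
    float position[3];
    float normal[3];
    float uv[2];
};

// Writes a vertex array (e.g. parsed from an .obj file) as a raw binary blob:
// a uint32 count followed by tightly packed Vertex records.
void writeBinaryMesh( const std::vector<Vertex>& vertices, const char* path )
{
    std::ofstream out( path, std::ios::binary );
    uint32_t count = static_cast<uint32_t>( vertices.size() );
    out.write( reinterpret_cast<const char*>( &count ), sizeof( count ) );
    out.write( reinterpret_cast<const char*>( vertices.data() ),
               count * sizeof( Vertex ) );
}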
Any recommended learning material would be appreciated.
Thank you!
I have been working with AR for a while now, still sort of learning, and getting increasingly frustrated. I created complex animations using Reality Composer, and now that feels like a joke. The thing is huge; I need to rewrite it in RealityKit, but something tells me that if I go Metal on this **** it's going to decrease the latency and make things run much faster. I really need my app to be as light as possible because the 3D graphics will function like a 3D UI system.
My greatest pain point is the anxiety I feel knowing my application will be huge and will crash once I publish it. It makes me feel so anxious, like an earthquake is coming. My goal is to create the lightest thing possible. I am reading a book on linear algebra for machine learning and leaning into this more mathy direction, so I am thinking I might as well just rewrite it using Metal and C++?
I've never used Metal btw, and never used C++ either.
I have some experience with animation and experience with sculpture so the 3D world IRL is not new to me.
What is the best source for information/tutorial material on using Metal with C++?
metal-cpp?
Dear experts, I'm working on adding a UI for my C++-based path tracer renderer.
I want to create a metal-cpp device and pass it to the renderer, but I also want to use ImGui and GLFW (for window management and input event handling). I've found a solution for mixing the Objective-C code that the GLFW window setup requires with C++ code: https://github.com/ikryukov/MetalCppImGui
// Key part of integrating GLFW with metal-cpp:
// GLFW only exposes an Objective-C window handle.
NSWindow *nswin = glfwGetCocoaWindow(window);
CA::MetalLayer* layer = CA::MetalLayer::layer();
layer->setDevice(device);
layer->setPixelFormat(MTL::PixelFormatBGRA8Unorm);
// Bridge back to Objective-C here, because NSWindow expects a CAMetalLayer.
CAMetalLayer* l = (__bridge CAMetalLayer*)layer;
nswin.contentView.layer = l;
nswin.contentView.wantsLayer = YES;
Is there any official way to handle events in metal-cpp without Objective-C support? Maybe MetalKit will gain such features in the future?
Hi experts, I'm working on a path tracer using metal-cpp, and I use the per-primitive data from your latest presentation: https://developer.apple.com/videos/play/wwdc2022/10105/ , also described here: https://developer.apple.com/documentation/metal/ray_tracing_with_acceleration_structures
Currently I want to store something like this structure:
struct Triangle
{
    vector_float3 positions[3];
    uint32_t normals[3];
    uint32_t tangent[3];
    uint32_t uv[3];
};
Are there any guidelines on the size of these small amounts of data that I can store in the acceleration structure? If the data is stored inside the BVH nodes, could that affect traversal performance?
Thanks!
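For context, this is roughly how I attach that data on the host side; the primitiveData* method names follow my reading of the metal-cpp bindings for the properties shown in that session, so treat the exact signatures as an assumption, and the buffer names are placeholders:
#include <Metal/Metal.hpp>

// Builds a triangle geometry descriptor whose per-primitive data is one packed
// Triangle record per primitive (pVertexBuffer/pTriangleBuffer are placeholders).
MTL::AccelerationStructureTriangleGeometryDescriptor*
makeGeometryDescriptor( MTL::Buffer* pVertexBuffer, MTL::Buffer* pTriangleBuffer,
                        NS::UInteger triangleCount )
{
    auto* pGeom = MTL::AccelerationStructureTriangleGeometryDescriptor::alloc()->init();
    pGeom->setVertexBuffer( pVertexBuffer );
    pGeom->setTriangleCount( triangleCount );
    // Per-primitive data: one Triangle record per primitive.
    pGeom->setPrimitiveDataBuffer( pTriangleBuffer );
    pGeom->setPrimitiveDataStride( sizeof( Triangle ) );
    pGeom->setPrimitiveDataElementSize( sizeof( Triangle ) );
    return pGeom;
}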
Hi!
I'm currently trying to convert this Objective-C example project ( https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu?language=objc ) to one using the metal-cpp wrapper.
However, when I make the MetalAdder class extend NS::Object (just like the original class extends NSObject), the default constructor gets deleted.
class MetalAdder : public NS::Object{ ... } is what I have.
When I instantiate this MetalAdder class as:
MetalAdder adder;
adder.initWithDevice(device);
or
auto adder = NS::TransferPtr(new MetalAdder);
I get the error Call to implicitly-deleted default constructor of 'MetalAdder'.
Is there something I'm doing wrong? Should I instantiate in a different way or should my MetalAdder class just not extend the NS::Object class?
Thanks in advance!
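The workaround I'm currently experimenting with is not inheriting from NS::Object at all and keeping MetalAdder as a plain C++ class (like the Renderer in the Learn Metal with C++ samples). A minimal sketch of that, with manual retain/release:
#include <Foundation/Foundation.hpp>
#include <Metal/Metal.hpp>

// Plain C++ class: no NS::Object base, so a normal constructor is available.
class MetalAdder
{
public:
    explicit MetalAdder( MTL::Device* pDevice )
        : _pDevice( pDevice->retain() )
        , _pCommandQueue( _pDevice->newCommandQueue() )
    {
    }

    ~MetalAdder()
    {
        _pCommandQueue->release();
        _pDevice->release();
    }

private:
    MTL::Device* _pDevice;
    MTL::CommandQueue* _pCommandQueue;
};

// Usage:
// MTL::Device* pDevice = MTL::CreateSystemDefaultDevice();
// MetalAdder adder( pDevice );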
Hello All,
I have CUDA code, and I can create several CUDA streams and run my kernels in parallel, which gives a performance boost for my task. I then rewrote the code for Metal and tried to parallelize the task in the same way.
CUDA Streams
Metal device: Mac Studio with M1 Ultra (the code is written with metal-cpp).
I create several MTLCommandBuffers in one MTLCommandQueue, or several MTLCommandQueues each with multiple MTLCommandBuffers.
Regarding Metal resources, there are two options:
Buffers (MTLBuffer) are created with the option MTLResourceStorageModeShared. In the profiler, all command buffers execute sequentially on the Compute timeline.
Buffers (MTLBuffer) are created with the options "MTLResourceStorageModeShared | MTLResourceHazardTrackingModeUntracked". In the profiler, I do see parallelism, but the maximum number of threads in the Compute timeline is never more than 2 (see pictures), which is also weird.
The compute commands do not depend on each other.
METAL Compute timeline
About performance:
[1] In the first variant, performance is the same regardless of the number of MTLCommandQueues and MTLCommandBuffers.
[2] In the second variant, performance with one MTLCommandBuffer is better than with 2 or more.
Question: why is this happening? How can I parallelize the work of the compute kernels to increase performance?
Additional information:
Also, the CUDA code has been rewritten in OpenCL, and it parallelizes perfectly on Windows (NVIDIA/AMD/Intel) when several OpenCL queues are running. The same code running on the M1 Ultra performs the same with one or with many OpenCL queues. Metal, in turn, is faster than OpenCL, so I am trying to figure out Metal specifically and make the kernels run in parallel on Metal.
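A stripped-down version of what I am doing (the pipeline, sizes, and queue count here are placeholders rather than my real workload, and resource releases are omitted for brevity):
#include <Metal/Metal.hpp>
#include <vector>

// Encodes the same compute pipeline into several command buffers, one per queue,
// using untracked shared buffers so Metal does not serialize them for hazards.
void dispatchInParallel( MTL::Device* pDevice, MTL::ComputePipelineState* pPSO,
                         int queueCount, NS::UInteger elementCount )
{
    const MTL::ResourceOptions options =
        MTL::ResourceStorageModeShared | MTL::ResourceHazardTrackingModeUntracked;

    std::vector<MTL::CommandBuffer*> commandBuffers;
    for ( int i = 0; i < queueCount; ++i )
    {
        MTL::CommandQueue* pQueue = pDevice->newCommandQueue();
        MTL::Buffer* pBuffer = pDevice->newBuffer( elementCount * sizeof( float ), options );

        MTL::CommandBuffer* pCmd = pQueue->commandBuffer();
        MTL::ComputeCommandEncoder* pEnc = pCmd->computeCommandEncoder();
        pEnc->setComputePipelineState( pPSO );
        pEnc->setBuffer( pBuffer, 0, 0 );
        pEnc->dispatchThreads( MTL::Size( elementCount, 1, 1 ),
                               MTL::Size( pPSO->maxTotalThreadsPerThreadgroup(), 1, 1 ) );
        pEnc->endEncoding();
        pCmd->commit();
        commandBuffers.push_back( pCmd );
    }

    // Wait for all command buffers only after every one of them has been committed.
    for ( MTL::CommandBuffer* pCmd : commandBuffers )
        pCmd->waitUntilCompleted();
}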
I've stepped through to verify that I have MTL::Buffer objects for each index buffer, but when I capture the GPU frame it just reads "indexBuffer: Null" for each draw. Is this just a bug? I'm guessing so, since some of the geometry is appearing correctly.
I'm not sure if this is somehow a cadence issue or something. If I attempt to use the stylized M button in Xcode to kick off a GPU capture on iOS, it seems to just go on forever capturing command buffers instead of exiting when we swap to the next display surface.
We are presenting the current drawable with presentDrawable and invoking nextDrawable on the CAMetalLayer (note that we do this from C++ by extending the bindings to support it).
If I trigger and end the capture myself it works fine, so that works for now, but I'm curious whether I'm doing something wrong that prevents the Xcode GUI version from recognizing the end of frame correctly.
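For reference, this is roughly how I trigger the capture programmatically (method names per my reading of the metal-cpp MTLCaptureManager bindings, so treat the exact signatures as an assumption):
#include <Metal/Metal.hpp>

// Starts a capture scoped to the device and targeted at the Xcode developer tools.
bool beginCapture( MTL::Device* pDevice )
{
    MTL::CaptureManager* pManager = MTL::CaptureManager::sharedCaptureManager();
    MTL::CaptureDescriptor* pDesc = MTL::CaptureDescriptor::alloc()->init();
    pDesc->setCaptureObject( pDevice );
    pDesc->setDestination( MTL::CaptureDestinationDeveloperTools );

    NS::Error* pError = nullptr;
    bool started = pManager->startCapture( pDesc, &pError );
    pDesc->release();
    return started;
}

// ...encode and present one frame, then:
void endCapture()
{
    MTL::CaptureManager::sharedCaptureManager()->stopCapture();
}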
Dear developers,
I need support to develop a simple computation on the GPU. I would like to perform matrix multiplication; metal-cpp suits this well because I need to export it as a C++ library. Following the documentation:
File Multiply.metal:
#include <metal_stdlib>
using namespace metal;

// Multiplies one 8x8 tile of A by one 8x8 tile of B and stores the result in R.
kernel void multiply(device const float *pMatA [[buffer(0)]],
                     device const float *pMatB [[buffer(1)]],
                     device float *pMatR [[buffer(2)]])
{
    simdgroup_float8x8 sgMatA;
    simdgroup_float8x8 sgMatB;
    simdgroup_float8x8 sgMatR;
    simdgroup_load(sgMatA, pMatA);
    simdgroup_load(sgMatB, pMatB);
    simdgroup_multiply(sgMatR, sgMatA, sgMatB);
    simdgroup_store(sgMatR, pMatR);
}
File Multiply.hpp
#include <Foundation/Foundation.hpp>
#include <Metal/Metal.hpp>
class Multiply {
public:
    MTL::Device* m_device;
    MTL::ComputePipelineState *m_dot_function_pso;
    MTL::CommandQueue *m_command_queue;
    MTL::Buffer *m_buffer_A;
    MTL::Buffer *m_buffer_B;
    MTL::Buffer *m_buffer_result;
    void init_with_device(MTL::Device*);
    void prepare_data();
    void send_compute_command();
private:
    void generate_random_float_data(MTL::Buffer* buffer);
    void encode_dot_command(MTL::ComputeCommandEncoder* compute_encoder);
    void verify_results();
};
File Multiply.cpp
#include <iostream>
#include <cassert>
#include "Multiply.hpp"
// The buffers hold a square matrix of array_length x array_length floats.
const unsigned int array_length = 1 << 5;
const unsigned int buffer_size = array_length * array_length * sizeof(float);
void Multiply::init_with_device(MTL::Device* device)
{
    m_device = device;
    NS::Error* error = nullptr;
    auto default_library = m_device->newDefaultLibrary();
    if (!default_library)
    {
        std::cerr << "Failed to load default library.";
        std::exit(-1);
    }
    auto function_name = NS::String::string("multiply", NS::ASCIIStringEncoding);
    auto dot_function = default_library->newFunction(function_name);
    if (!dot_function)
    {
        std::cerr << "Failed to find the dot function.";
    }
    m_dot_function_pso = m_device->newComputePipelineState(dot_function, &error);
    m_command_queue = m_device->newCommandQueue();
}
void Multiply::prepare_data()
{
    m_buffer_A = m_device->newBuffer(buffer_size, MTL::ResourceStorageModeShared);
    m_buffer_B = m_device->newBuffer(buffer_size, MTL::ResourceStorageModeShared);
    m_buffer_result = m_device->newBuffer(buffer_size, MTL::ResourceStorageModeShared);
    generate_random_float_data(m_buffer_A);
    generate_random_float_data(m_buffer_B);
}
void Multiply::generate_random_float_data(MTL::Buffer* buffer)
{
    // The buffer is a flat array; index it as row * array_length + column.
    float* data_ptr = (float*)buffer->contents();
    for (unsigned long row = 0; row < array_length; row++)
    {
        for (unsigned long col = 0; col < array_length; col++)
        {
            data_ptr[row * array_length + col] = (float)rand() / (float)(RAND_MAX);
        }
    }
}
void Multiply::send_compute_command()
{
    MTL::CommandBuffer* command_buffer = m_command_queue->commandBuffer();
    MTL::ComputeCommandEncoder* compute_encoder = command_buffer->computeCommandEncoder();
    encode_dot_command(compute_encoder);
    compute_encoder->endEncoding();
    command_buffer->commit();
    command_buffer->waitUntilCompleted();
    verify_results();
}
void Multiply::encode_dot_command(MTL::ComputeCommandEncoder* compute_encoder)
{
    compute_encoder->setComputePipelineState(m_dot_function_pso);
    compute_encoder->setBuffer(m_buffer_A, 0, 0);
    compute_encoder->setBuffer(m_buffer_B, 0, 1);
    compute_encoder->setBuffer(m_buffer_result, 0, 2);
    MTL::Size grid_size = MTL::Size(array_length, 1, 1);
    NS::UInteger thread_group_size_ = m_dot_function_pso->maxTotalThreadsPerThreadgroup();
    if (thread_group_size_ > array_length)
    {
        thread_group_size_ = array_length;
    }
    MTL::Size thread_group_size = MTL::Size(thread_group_size_, 1, 1);
    compute_encoder->dispatchThreads(grid_size, thread_group_size);
}
void Multiply::verify_results()
{
    auto a = (float*)m_buffer_A->contents();
    auto b = (float*)m_buffer_B->contents();
    auto result = (float*)m_buffer_result->contents();
    for (unsigned long row = 0; row < array_length; row++)
    {
        for (unsigned long col = 0; col < array_length; col++)
        {
            // Compares each element of the result against a[i] * b[i].
            unsigned long i = row * array_length + col;
            if (result[i] != (a[i] * b[i]))
            {
                std::cout << "Compute ERROR: index=" << i << " result=" << result[i] << " vs " << a[i] * b[i] << "=a*b\n";
                assert(result[i] == (a[i] * b[i]));
            }
        }
    }
    std::cout << "Compute results as expected\n";
}
Is all this implementation correct? Can someone kindly give suggestions about speed improvement or other solutions? Thank you in advance.
Hi all. I'm trying to get C++ code working with Metal.
I get the array of MTL::Device by calling
NS::Array *device_array = MTL::CopyAllDevices();
Next, I want to get the only element of the MTL::Device array by calling
MTL::Device *device = device_array->object(0);
I get an error:
Cannot initialize a variable of type 'MTL::Device *' with an rvalue of type 'NS::Object *'
Question: how to get an MTL::Device object from NS::Array?
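In case it helps frame the question: the workaround I'm using for now is an explicit cast, though I believe the NS::Array bindings may also expose a templated object<>() accessor (I'm not certain of its exact signature, so treat that part as an assumption):
NS::Array* device_array = MTL::CopyAllDevices();

// Explicit downcast from NS::Object* to MTL::Device*.
MTL::Device* device = static_cast<MTL::Device*>( device_array->object( 0 ) );

// Possibly also available, if the headers provide the templated accessor:
// MTL::Device* device2 = device_array->object<MTL::Device>( 0 );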
Hi. I'm new to Metal (actually to any kind of software development on Apple platforms). I have many questions about using MTL::Buffer, dispatch_semaphore, and drawInMTKView(). I read the README.md, but I need some more help understanding it.
Full code of 03-animation in metal-cpp sample
This is from the metal-cpp sample code by Apple (I downloaded it here). In this code, _pFrameData is an array of MTLBuffer, and kMaxFramesInFlight is the size of this array. Its type is static const int, and its value is 3. When the Renderer is created, _pFrameData is initialized like this:
void Renderer::buildFrameData()
{
    for ( int i = 0; i < Renderer::kMaxFramesInFlight; ++i )
    {
        _pFrameData[ i ] = _pDevice->newBuffer( sizeof( FrameData ), MTL::ResourceStorageModeManaged );
    }
}
The draw method, called by drawInMTKView:
void Renderer::draw( MTK::View* pView )
{
    NS::AutoreleasePool* pPool = NS::AutoreleasePool::alloc()->init();

    _frame = (_frame + 1) % Renderer::kMaxFramesInFlight;
    MTL::Buffer* pFrameDataBuffer = _pFrameData[ _frame ];

    MTL::CommandBuffer* pCmd = _pCommandQueue->commandBuffer();
    dispatch_semaphore_wait( _semaphore, DISPATCH_TIME_FOREVER );
    Renderer* pRenderer = this;
    pCmd->addCompletedHandler( ^void( MTL::CommandBuffer* pCmd ){
        dispatch_semaphore_signal( pRenderer->_semaphore );
    });

    reinterpret_cast< FrameData * >( pFrameDataBuffer->contents() )->angle = (_angle += 0.01f);
    pFrameDataBuffer->didModifyRange( NS::Range::Make( 0, sizeof( FrameData ) ) );

    MTL::RenderPassDescriptor* pRpd = pView->currentRenderPassDescriptor();
    MTL::RenderCommandEncoder* pEnc = pCmd->renderCommandEncoder( pRpd );
    pEnc->setRenderPipelineState( _pPSO );
    pEnc->setVertexBuffer( _pArgBuffer, 0, 0 );
    pEnc->useResource( _pVertexPositionsBuffer, MTL::ResourceUsageRead );
    pEnc->useResource( _pVertexColorsBuffer, MTL::ResourceUsageRead );
    pEnc->setVertexBuffer( pFrameDataBuffer, 0, 1 );
    pEnc->drawPrimitives( MTL::PrimitiveType::PrimitiveTypeTriangle, NS::UInteger(0), NS::UInteger(3) );
    pEnc->endEncoding();
    pCmd->presentDrawable( pView->currentDrawable() );
    pCmd->commit();

    pPool->release();
}
Q1. What is the meaning of kMaxFramesInFlight's name and value?
Q2. How do dispatch_semaphore_wait() and drawInMTKView() work together? At first I guessed that when the semaphore's count reaches 0, Renderer::draw is blocked by dispatch_semaphore_wait() until the GPU has read the buffer and executes dispatch_semaphore_signal. But now I think that's not a correct understanding, because I don't know how drawInMTKView behaves. How many times per second is drawInMTKView called, and when?
Q3. And... why use a dispatch_semaphore here at all? I tried changing my code to use a single MTLBuffer for the same work. After just a few changes (using a single buffer and removing the semaphore code), the changed code appears to work the same.
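(For reference, elsewhere in the same sample the semaphore is created with the same count as the buffer array, which I assume is the connection; I'm quoting this from memory, so double-check it against the downloaded code:)
// In the Renderer constructor of the 03-animation sample (as I recall it):
_semaphore = dispatch_semaphore_create( Renderer::kMaxFramesInFlight );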
I'm not quite understanding these errors. As far as I can tell, the Foundation, QuartzCore and Metal frameworks are included in the link line:
-framework Metal -framework QuartzCore -framework Foundation
Technically they are in there a few times; I'm not familiar enough with our project to know why.
I'm getting a ton of undefined symbols. metal-cpp is a header-only library, so it doesn't have any additional libraries of its own, right?
Undefined symbols for architecture arm64:
"MTL::Private::Selector::s_knewTextureViewWithPixelFormat_textureType_levels_slices_swizzle_", referenced from:
"MTL::Private::Selector::s_knewTextureWithDescriptor_", referenced from:
"NS::Private::Selector::s_kinit", referenced from:
"NS::Private::Selector::s_kautorelease", referenced from:
This is while compiling for iOS (thus the arm64).
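For what it's worth, the metal-cpp README has one requirement beyond linking the frameworks: exactly one .cpp file in the project must define the private implementation macros before including the headers, which is what actually generates the selector symbols. Something like the following in a single translation unit (double-check the exact macro list against the README version you have):
// MetalCppImpl.cpp -- the only file in the project that defines these macros.
#define NS_PRIVATE_IMPLEMENTATION
#define CA_PRIVATE_IMPLEMENTATION
#define MTL_PRIVATE_IMPLEMENTATION

#include <Foundation/Foundation.hpp>
#include <QuartzCore/QuartzCore.hpp>
#include <Metal/Metal.hpp>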
When I try to debug my compute shaders by pressing the ladybug button inside a GPU trace, I get the error "Unable to create shader debug session". This only happens on my M1 Max MacBook Pro; the exact same project does not produce this error on my old Intel MacBook Pro. I have tried reinstalling the past three generations of Xcode, yet I cannot fix this error. It is making it impossible for me to develop my program, as there is no information online about how to fix it.
Currently the metal-cpp examples target the desktop and use AppKit, which is not supported on iOS. Are there any tips for developing with metal-cpp on mobile devices?
Hi,
I am new to Metal and macOS development, and I am trying to learn Metal with the C++ wrapper for a toy rendering engine. I am mostly following the "Learn Metal with C++" sample code.
I am trying to read mouse and keyboard input. It seems like Objective-C or Swift lets you subclass MTKView and override the respective keyDown() and keyUp() methods.
However, when looking at the CPP wrapper, MTK::View doesn't have any virtual functions to override.
How can I read mouse and keyboard inputs in my application? Hopefully without having an Objective-C bridge.
Thank you,
Robin