Some feature requests for Metal

Hello guys. With the release of the M1 Pro and M1 Max in particular, the Mac has become a platform that could be very interesting for games in the future. However, because some features are still missing from Metal, porting games to it could be problematic for some developers. You can already see this tendency with Unreal Engine 5, where Nanite and Lumen are unfortunately not available on the Mac. As a Vulkan developer, I wanted to ask about some features that are not yet available in Metal. They are very interesting if you want to write a GPU-driven renderer for a modern game engine. They could also be used to emulate D3D12 on the Mac via MoltenVK, which would make more games available on the Mac.

  1. Buffer device address:

This feature allows the application to query a 64-bit buffer device address value for a buffer. It is very useful for D3D12 emulation and for compatibility with Vulkan, e.g. to implement ray tracing on MoltenVK.

  2. DrawIndirectCount:

This feature allows an application to source the number of draws for indirect drawing calls from a buffer. It is also very useful in many GPU-driven scenarios.

  3. Only 500000 resources per argument buffer

Metal has a limit of 500000 resources per argument buffer. To be equivalent to D3D12 Resource Binding Tier 2, you would need 1 million. This is also very important, because many DirectX 12 game engines could then be ported to Metal more easily.

  4. Mesh shader / Task shader:

Two interesting new shader stages to optimize the rendering pipeline.
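To make the DrawIndirectCount request above a bit more concrete, here is a minimal CPU-side sketch of the semantics I mean, written in C++. It is not Metal or Vulkan code: the function `emulateDrawIndirectCount` and the way it sums vertices are made up for illustration. The struct mirrors the field layout of Vulkan's `VkDrawIndirectCommand`, and the clamping to `maxDrawCount` follows how `vkCmdDrawIndirectCount` is specified.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Same field layout as Vulkan's VkDrawIndirectCommand.
struct DrawIndirectCommand {
    uint32_t vertexCount;
    uint32_t instanceCount;
    uint32_t firstVertex;
    uint32_t firstInstance;
};

// Hypothetical CPU stand-in for the GPU command processor.
// `gpuWrittenCount` models the value a compute pass wrote into the count
// buffer; `maxDrawCount` is the upper bound the CPU supplies when encoding.
// Returns the total number of vertices that would be submitted.
uint64_t emulateDrawIndirectCount(const std::vector<DrawIndirectCommand>& args,
                                  uint32_t gpuWrittenCount,
                                  uint32_t maxDrawCount) {
    // The executed draw count is the minimum of the buffer value and the bound.
    uint32_t drawCount = std::min(gpuWrittenCount, maxDrawCount);
    // Extra guard for this sketch only: never read past the argument array.
    drawCount = std::min(drawCount, static_cast<uint32_t>(args.size()));

    uint64_t vertices = 0;
    for (uint32_t i = 0; i < drawCount; ++i) {
        vertices += static_cast<uint64_t>(args[i].vertexCount) * args[i].instanceCount;
    }
    return vertices;
}
```

The point is that the CPU never needs to know how many draws survived GPU culling. It only supplies the upper bound, and the count itself stays on the GPU.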

Are there any plans to implement these features in the future? Is there a roadmap for Metal? Is there a website where I can suggest features to the Metal developers?

I hope to see at least the first 3 features in Metal in the future, and I think that many developers feel the same way.

Best regards, Marlon

Replies

Hi, thanks for the suggestions! We take feature requests through Feedback Assistant. If you do wish to file requests, it would be great to provide some info about your use case and what you are trying to do. Also, please post the feedback IDs here, it helps us with tracking the requests.

We don’t generally talk about our future plans, and new APIs are usually released during WWDC.

Thank you for your answer. I will file these feature requests in the next few days. Of course I will post the feedback IDs here.

The feedback ID for DrawIndirectCount is FB9826206. The feedback IDs for the other feature requests will be posted here in the next few days.

Have a nice day and stay healthy! :) Marlon

Well Said

That’s great work! Maybe you did not read this: Lumen & Nanite on MacOS https://forums.unrealengine.com/t/lumen-nanite-on-macos/508411

It might be good to get in touch with people like: Richmar1 https://forums.unrealengine.com/u/richmar1

Philip Turner https://forums.unrealengine.com/u/philipturner/ https://github.com/philipturner/metal-benchmarks#nanite-atomics

They are doing some pretty good tests to get Nanite and Lumen working on macOS, on both Intel and Apple silicon machines.

I hope you will all make Unreal Engine work with all its amazing features such as Nanite, Lumen, ray tracing… on Mac hardware, Intel or Apple silicon.

I just don’t want to have to buy a PC when I know that software like Houdini, Maya, ****, Final Cut Pro, Blender, DaVinci Resolve, Unity, etc… takes advantage of the very efficient and powerful Apple silicon hardware.

The machine I will buy to replace my old dual-CPU 2012 Mac Pro will need to run Unreal Engine 5 natively with the same features as the Windows version, whether it’s a MacBook Pro, a Mac Studio or a Mac Pro.

Thank you guys for working hard on making that happen!

Since there seems to be an interest in these questions a year after they were asked...

  1. Metal 3 has added support for GPU buffer addresses and resource handles, as well as mesh shaders. This ticks off two big feature requests from the list and allows flexible encoding of resources on the CPU (or the GPU!) without any encoder APIs.

  2. The 500k resources limit has been taken a bit out of context. DX12 tier 2 guarantees availability of 1 million descriptors (or binding points). But Metal does not have a concept of descriptors. The hardware still uses descriptors of course, but these details are hidden from the user and managed by the Metal driver. A GPU resource ID or a GPU buffer address are not descriptors, but either direct addresses or offsets into a hidden descriptor table.

    This changes the balance of things somewhat. There is no fixed limit to the number of "texture binding points" you can use with Argument Buffers, for example: the only limit is the size of the buffer itself. And Metal does not need data buffer descriptors in the first place, because it uses direct GPU address pointers instead. So if you are porting a DX12 game or emulating a DX12 environment, you can trivially create a Metal buffer with a million texture binding points. This is fully supported and will work. What's more, you can do resource type erasure by binding this buffer to multiple typed targets simultaneously (e.g. to use the same buffer to bind different types of texture resources).

    Metal Argument Buffers are basically syntactic sugar over the popular bindless model. In DX12 you'd use the descriptor pool and integer indices to select resources from the pool, while in Metal the descriptor pool is hidden and the index is hidden behind the texture2D etc. object. But any time you use this texture2D object, the Metal shader (at least on Apple Silicon hardware) actually translates it to something like pool[texture_id] (for more low-level info see here: https://github.com/dougallj/applegpu/issues/37). In fact, the Apple Silicon GPU works very similarly to current hardware from Nvidia.

    Instead, the 500k limit appears to be the maximum number of resources your application can use. Every time you create a texture object, the Metal driver adds a descriptor to the hidden descriptor pool. If you try to create a lot of textures, you will experience a major slowdown. I don't know whether this is a hardware limitation or a driver implementation limit. And since it's fairly unlikely that a game will actually need half a million textures (not with current GPU memory sizes anyway), I don't see this limitation being relevant in practice for the next few years.
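The distinction above between "binding points" and "resources" can be sketched as a small CPU-side model in C++. This is only an illustration of the idea, not how the Metal driver is implemented: every name here (`HiddenDescriptorPool`, `TextureHandle`, `ArgumentBuffer`, `sample`) is hypothetical, and the resource cap is modeled as a pool-size limit purely because that is what the observed behavior suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a hardware texture descriptor.
struct TextureDescriptor {
    std::string label;
    uint32_t width;
    uint32_t height;
};

// Opaque handle given to the application: just an offset into the pool.
struct TextureHandle {
    uint32_t poolIndex;
};

// Models the driver-managed pool: one entry per created texture.
class HiddenDescriptorPool {
public:
    explicit HiddenDescriptorPool(std::size_t maxResources)
        : maxResources_(maxResources) {}

    // Creating a texture consumes one pool slot; this is where a
    // per-application resource cap would apply in this model.
    TextureHandle createTexture(std::string label, uint32_t w, uint32_t h) {
        if (pool_.size() >= maxResources_) {
            throw std::length_error("descriptor pool exhausted");
        }
        pool_.push_back({std::move(label), w, h});
        return TextureHandle{static_cast<uint32_t>(pool_.size() - 1)};
    }

    // The pool[texture_id] lookup that a shader would perform.
    const TextureDescriptor& resolve(TextureHandle h) const {
        return pool_.at(h.poolIndex);
    }

private:
    std::size_t maxResources_;
    std::vector<TextureDescriptor> pool_;
};

// An "argument buffer" in this model is a plain array of handles, so the
// number of binding points is limited only by the size of the buffer.
using ArgumentBuffer = std::vector<TextureHandle>;

const TextureDescriptor& sample(const HiddenDescriptorPool& pool,
                                const ArgumentBuffer& ab, std::size_t slot) {
    return pool.resolve(ab.at(slot));
}
```

In this model, an `ArgumentBuffer` with a million slots is perfectly fine even when the pool holds only a handful of textures, because many slots can refer to the same handle. Only `createTexture` ever runs into the cap.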