Hello. In the iOS app i'm working on we are very tight on memory budget and I was looking at ways to reduce our texture memory usage. However I noticed that comparing ASTC8x8 to ASTC12x12, there is no actual difference in allocated memory for most of our textures despite ASTC12x12 having less than half the bpp of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with astc12x12 for most textures there is over a 100mb difference in astc size on disk versus when loaded, so I would love to be able to recover even a portion of that memory.
Here is some test code with some measurements i've taken using an iphone 11:
for(int i = 0; i < 11; i++) {
MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
int dim = 12;
int n = 2 << i;
int mips = i+1;
texDesc.width = n;
texDesc.height = n;
texDesc.mipmapLevelCount = mips;
texDesc.resourceOptions = MTLResourceStorageModeShared;
texDesc.usage = MTLTextureUsageShaderRead;
// Calculate the equivalent astc texture size
int blocks = 0;
if(mips == 1) {
blocks = n/dim + (n%dim>0? 1 : 0);
blocks *= blocks;
} else {
for(int j = 0; j < mips; j++) {
int a = 2 << j;
int cur = a/dim + (a%dim>0? 1 : 0);
blocks += cur*cur;
}
}
auto tex = [objCObj newTextureWithDescriptor:texDesc];
printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize);
}
MTLPixelFormatASTC_12x12_LDR
128x128, mips 7, Astc: 2768, Metal: 6016
256x256, mips 8, Astc: 10512, Metal: 32768
512x512, mips 9, Astc: 40096, Metal: 98304
1024x1024, mips 10, Astc: 158432, Metal: 262144
128x128, mips 1, Astc: 1936, Metal: 4096
256x256, mips 1, Astc: 7744, Metal: 16384
512x512, mips 1, Astc: 29584, Metal: 65536
1024x1024, mips 1, Astc: 118336, Metal: 147456
MTLPixelFormatASTC_8x8_LDR
128x128, mips 7, Astc: 5488, Metal: 6016
256x256, mips 8, Astc: 21872, Metal: 32768
512x512, mips 9, Astc: 87408, Metal: 98304
1024x1024, mips 10, Astc: 349552, Metal: 360448
128x128, mips 1, Astc: 4096, Metal: 4096
256x256, mips 1, Astc: 16384, Metal: 16384
512x512, mips 1, Astc: 65536, Metal: 65536
1024x1024, mips 1, Astc: 262144, Metal: 262144
I also tried using MTLHeaps (placement and automatic) hoping they might be better, but saw nearly the same numbers.
Is there any way to have metal allocate these textures in a more compact way to save on memory?
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks.
My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance?
To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work?
Thanks!
hello everyone.
I got a texture loading error on visionOS 2.0:
Can't create texture(Error Domain=MTKTextureLoaderErrorDomain Code=0 "Pixel format(MTLPixelFormatInvalid) is not valid on this device" UserInfo={NSLocalizedDescription=Pixel format(MTLPixelFormatInvalid) is not valid on this device, MTKTextureLoaderErrorKey=Pixel format(MTLPixelFormatInvalid) is not valid on this device}
But this texture can load correctly on visionOS1.3.
I don't know what happen between visionOS1.3 and visionOS2.0.
The texture is a ktx file which stores cubemap that encoding in astc6x6hdr. And the ktx texture has a glInternalFormat info: GL_COMPRESSED_RGBA_ASTC_6x6.
I wonder if visionOS2.0 no longer supports astc6x6hdr cubemap format, or there is something wrong with my assets.
I created an empty binary archive and try to save it on disk, but it returned false without giving any error message.
The code is from
https://developer.apple.com/documentation/metal/shader_libraries/metal_binary_archives/creating_binary_archives_from_device-built_pipeline_state_objects?language=objc
Hi,
I can capture a frame on the Apple TV, but when I try to profile the capture for GPU timing information, I got "Abort Trap 6" error and with following error in the report:
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Triggered by Thread: 7
Application Specific Information:
abort() called
Last Exception Backtrace:
0 CoreFoundation 0x18c0a99d0 __exceptionPreprocess + 160
1 libobjc.A.dylib 0x18b596d24 objc_exception_throw + 71
2 CoreFoundation 0x18bfa7308 -[__NSArrayM insertObject:atIndex:] + 1239
3 MTLReplayController 0x101f5d148 DYMTLReplayFrameProfiler_loadAnalysis + 1140
4 MTLReplayController 0x101e97f90 GTMTLReplayClient_collectGPUShaderTimelineData + 224
5 MTLReplayController 0x101e81794 __30-[GTMTLReplayService profile:]_block_invoke_4 + 288
6 Foundation 0x18eb6072c __NSOPERATION_IS_INVOKING_MAIN__ + 11
7 Foundation 0x18eb5cc1c -[NSOperation start] + 623
8 Foundation 0x18eb60edc __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 11
9 Foundation 0x18eb60bc4 __NSOQSchedule_f + 167
10 libdispatch.dylib 0x18b8d6a84 _dispatch_block_async_invoke2 + 103
11 libdispatch.dylib 0x18b8c9420 _dispatch_client_callout + 15
12 libdispatch.dylib 0x18b8cc5d0 _dispatch_continuation_pop + 531
13 libdispatch.dylib 0x18b8cbcd4 _dispatch_async_redirect_invoke + 635
14 libdispatch.dylib 0x18b8d9224 _dispatch_root_queue_drain + 335
15 libdispatch.dylib 0x18b8d9a08 _dispatch_worker_thread2 + 163
16 libsystem_pthread.dylib 0x18b6e652c _pthread_wqthread + 223
17 libsystem_pthread.dylib 0x18b6ed8d0 start_wqthread + 7
It's Xcode 16.0 + Apply TV 4K (4th Gen) tvOS 18, does anyone know what's the cause of this error and is there any solution for it?
Thank you very much,
Kai
Topic:
Graphics & Games
SubTopic:
Metal
Hey guys,
is it possible to implement mirror like reflections like in this project:
https://developer.apple.com/documentation/metal/metal_sample_code_library/rendering_reflections_in_real_time_using_ray_tracing
for visionOS? Or is the hardware not prepared for Metal Raytracing?
Thanks in advance
Is there anywhere that the format of error/warning/other? messages that are generated by the Metal compiler are documented?
Topic:
Graphics & Games
SubTopic:
Metal
I've been working with ARKit and Metal on the Vision Pro, but I've encountered a slight flickering issue with the mesh rendered using Metal. The flickering tends to occur around the edges of objects or on pixels with high color contrast, and it becomes more noticeable as the distance increases. Is there any way to resolve this issue?
First I get this
ar_world_tracking_provider_query_device_anchor_at_timestamp <0x302b9c0a0>: The device_anchor can only be queried when the world tracking provider is running.
This seemed to all break with the auto-update to 2.0.1. Simulator runs the code fine.
I seem to see an infinite stall here
frameLayer.endUpdate()
// Pace frames by waiting for the optimal prediction time.
try await LayerRenderer.Clock().sleep(until: timing.optimalInputTime, tolerance: nil)
// Start submitting the updated frame.
frameLayer.startSubmission() <-
Hi everyone,
I encountered a very strange shader bug that seems related to Metal only (not OpenGL).
You can find the full description of the issue on the Babylon.js forums here: https://forum.babylonjs.com/t/strange-shader-related-issue-on-macos-with-safari-and-chrome-not-firefox/54289 (sorry, I couldn't post a clickable link here as this seems to be blocked here).
I have a workaround to fix the issue (as described in the link above), but this really looks like an issue in Metal itself.
Let me know if you need more details or explanations.
Topic:
Graphics & Games
SubTopic:
Metal
I'm experiencing a strange issue where I'm seeing black in a metal drawable where it should be a different color. When I capture the frame and inspect the returned value from the fragment function, it's correct, but the drawable isn't.
This screenshot hopefully illustrates the issue.
I've not found any references to similar issues. I saw something about some out of bounds or NaN values being dropped to 0 (which would be black), but the debugger doesn't indicate this is happening.
Topic:
Graphics & Games
SubTopic:
Metal
Hi!
How to define and call an inline function in Metal? Or simple function that will return some value.
Case:
inline uint index4D(constant _4D& shape,
constant uint& n,
constant uint& c,
constant uint& h,
constant uint& w) {
return n * shape.C * shape.H * shape.W + c * shape.H * shape.W + h * shape.W + w;
}
When I call it in my kernel function I get No matching function for call error.
Thx in advance.
Hello,
We are experimenting with Metal to accelerate some peculiar numerical computation. Our workloads are relatively small, so the ability to avoid moving data to and from the GPU's memory is very appealing. However, we are observing higher overhead compared to CUDA, which negates the benefits of avoiding data transfer.
In our tests using an empty kernel, CUDA completes in 0.001 ms (Intel i7 10700K, RTX 3080), while Metal's waitUntilCompleted takes 0.12 ms (M2 Max). As we do not have prior experience with Metal, we are wondering if we are using the APIs just fine and this timing is expected, or if there is a way to reduce it.
Thank you in advance for any comment!
test-metal.cpp
I have multiple CAMetalLayers that I render content to and noticed that the graphics overview HUD does not function properly when I have more than one CAMetalLayer. The values reported will be very strange. For example, FPS may report 999 or some large negative value. It the HUD simply not designed to work with multiple CAMetalLayers or MTKViews? When I disable all but one of my CAMetalLayers, the HUD works as expected.
I've noticed a major third-party app has the following flag set to 1/true in its Info.plist: CAMetalLayerWantsCompositingDependencies
Does anyone know if it’s recognized by Core Animation / Metal, and what it’s supposed to do? It might obviously have zero relationship to the OS, defined by that app and for that app... but since it looks very much like an unofficial/undocumented environment setting, it might be great to know what problem it solves. I happen to have issues related to compositing other CALayers over a CAMetalLayer in my app... so this definitely stood out as interesting.
Thank you!
Topic:
Graphics & Games
SubTopic:
Metal
I would like to take YCbCr CVPixelBuffers from AVCaptureVideoDataOutput, apply some processing in RGB space, render to an MTKView, and pass to AVAssetWriter for recording.
Right now, I'm doing this all manually – deswing the incoming data if necessary, choose the right matrix to convert to RGB, apply processing, etc. I also have to convert back to YCbCr before feeding the frames to AVAssetWriter because encoding performs much better if I do. Is there any efficient, built-in way to achieve the same?
I can't use AVCaptureVideoPreviewLayer, since I need to do some further processing before display. I can't use AVCaptureVideoDataOutput's videoSettings to get automatic BGRA conversion because that would lose bit depth for 10 bit video formats (and isn't available on all formats anyway).
I see these Accelerate functions, but they seemingly don't use the GPU, nor do they support all the formats and bit depths I'd need.
I found reference to some undocumented MTLPixelFormats that seem to do exactly what I want, but I don't want to rely on something like this unless it's explicitly endorsed. This would also incur an RGB/YCbCr conversion on every texture read and write, right?
Is there anything I'm missing here?
Hi, wanted to test if possible to use Mesa3D Dozen driver(Vulkan on D3D12 )+D3DMetal 2b3 to get maybe better Vulkan driver on Wine than default MoltenVK.. this will support Vulkan windows apps via using D3D12Metal..
using vulkan_dzn.dll,dzn_icd.x86_64.json,dxil.dll from x64 folder from: https://github.com/pal1000/mesa-dist-win/releases/download/24.3.0-rc1/mesa3d-24.3.0-rc1-release-msvc.7z
using simple vulkaninfo app and running like:
wine64 vulkaninfo
I get error:
[D3DMetal:LOG:2A825] Unsupported API: CheckFeatureSupport, unhandled support query 53
also seems D3DMetal Wine integration on Whisky doesn't expose d3d12core.dll and d3d12.dll like new Agility D3D12 dlls or VKD3D, so
getting:
MESA: error: Failed to retrieve D3D12GetInterface MESA: error: Failed to load DXCore
but anyways seems to try to load the driver as:
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
full log:
MESA: error: Failed to retrieve D3D12GetInterface MESA: error: Failed to load DXCore WARNING: dzn is not a conformant Vulkan implementation, testing use only. [D3DMetal:LOG:2A825] Unsupported API: CheckFeatureSupport, unhandled support query 53 00bc:fixme:dcomp:DCompositionCreateDevice 0000000000000000, {c37ea93a-e7aa-450d-b16f-9746cb0407f3}, 000000000011E328. MESA: error: Failed to load DXCore WARNING: dzn is not a conformant Vulkan implementation, testing use only. [D3DMetal:LOG:2A825] Unsupported API: CheckFeatureSupport, unhandled support query 53 00bc:fixme:dcomp:DCompositionCreateDevice 0000000000000000, {c37ea93a-e7aa-450d-b16f-9746cb0407f3}, 000000000011E578. ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs: Call to 'vkEnumeratePhysicalDevices' in ICD c:\windows\system32\.\vulkan_dzn.dll failed with error code -3 ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs: Failed to detect any valid GPUs in the current config ERROR at C:\j\msdk0\build\Khronos-Tools\repo\vulkaninfo\vulkaninfo.h:241:vkEnumeratePhysicalDevices failed with ERROR_INITIALIZATION_FAILED
my app use mtkview to render video, but [MTKView initwithFrame:device] takes 2-3s in some some 2019 macbook pro, system macos 15.0.1.
how can I do?
Hello, I am using MTKView to display: camera preview & video playback. I am testing on iPhone 16. App crashes at a random moment whenever MTKView is rendering CIImage.
MetalView:
public enum MetalActionType {
case image(CIImage)
case buffer(CVPixelBuffer)
}
public struct MetalView: UIViewRepresentable {
let mtkView = MTKView()
public let actionPublisher: any Publisher<MetalActionType, Never>
public func makeCoordinator() -> Coordinator {
Coordinator(self)
}
public func makeUIView(context: UIViewRepresentableContext<MetalView>) -> MTKView {
guard let metalDevice = MTLCreateSystemDefaultDevice() else {
return mtkView
}
mtkView.device = metalDevice
mtkView.framebufferOnly = false
mtkView.clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0)
mtkView.drawableSize = mtkView.frame.size
mtkView.delegate = context.coordinator
mtkView.isPaused = true
mtkView.enableSetNeedsDisplay = true
mtkView.preferredFramesPerSecond = 60
context.coordinator.ciContext = CIContext(
mtlDevice: metalDevice, options: [.priorityRequestLow: true, .highQualityDownsample: false])
context.coordinator.metalCommandQueue = metalDevice.makeCommandQueue()
context.coordinator.actionSubscriber = actionPublisher.sink { type in
switch type {
case .buffer(let pixelBuffer):
context.coordinator.updateCIImage(pixelBuffer)
break
case .image(let image):
context.coordinator.updateCIImage(image)
break
}
}
return mtkView
}
public func updateUIView(_ nsView: MTKView, context: UIViewRepresentableContext<MetalView>) {
}
public class Coordinator: NSObject, MTKViewDelegate {
var parent: MetalView
var metalCommandQueue: MTLCommandQueue!
var ciContext: CIContext!
private var image: CIImage? {
didSet {
Task { @MainActor in
self.parent.mtkView.setNeedsDisplay() //<--- call Draw method
}
}
}
var actionSubscriber: (any Combine.Cancellable)?
private let operationQueue = OperationQueue()
init(_ parent: MetalView) {
self.parent = parent
operationQueue.qualityOfService = .background
super.init()
}
public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
public func draw(in view: MTKView) {
guard let drawable = view.currentDrawable, let ciImage = image,
let commandBuffer = metalCommandQueue.makeCommandBuffer(), let ci = ciContext
else {
return
}
//making sure nothing is nil, now we can add the current frame to the operationQueue for processing
operationQueue.addOperation(
MetalOperation(
drawable: drawable, drawableSize: view.drawableSize, ciImage: ciImage,
commandBuffer: commandBuffer, pixelFormat: view.colorPixelFormat, ciContext: ci))
}
//consumed by Subscriber
func updateCIImage(_ img: CIImage) {
image = img
}
//consumed by Subscriber
func updateCIImage(_ buffer: CVPixelBuffer) {
image = CIImage(cvPixelBuffer: buffer)
}
}
}
now the MetalOperation class:
private class MetalOperation: Operation, @unchecked Sendable {
let drawable: CAMetalDrawable
let drawableSize: CGSize
let ciImage: CIImage
let commandBuffer: MTLCommandBuffer
let pixelFormat: MTLPixelFormat
let ciContext: CIContext
init(
drawable: CAMetalDrawable, drawableSize: CGSize, ciImage: CIImage,
commandBuffer: MTLCommandBuffer, pixelFormat: MTLPixelFormat, ciContext: CIContext
) {
self.drawable = drawable
self.drawableSize = drawableSize
self.ciImage = ciImage
self.commandBuffer = commandBuffer
self.pixelFormat = pixelFormat
self.ciContext = ciContext
}
override func main() {
let width = Int(drawableSize.width)
let height = Int(drawableSize.height)
let ciWidth = Int(ciImage.extent.width) //<-- Thread 22: EXC_BAD_ACCESS (code=1, address=0x5e71f5490) A bad access to memory terminated the process.
let ciHeight = Int(ciImage.extent.height)
let destination = CIRenderDestination(
width: width, height: height, pixelFormat: pixelFormat, commandBuffer: commandBuffer,
mtlTextureProvider: { [self] () -> MTLTexture in
return drawable.texture
})
let transform = CGAffineTransform(
scaleX: CGFloat(width) / CGFloat(ciWidth), y: CGFloat(height) / CGFloat(ciHeight))
do {
try ciContext.startTask(toClear: destination)
try ciContext.startTask(toRender: ciImage.transformed(by: transform), to: destination)
} catch {
}
commandBuffer.present(drawable)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
}
}
Now I am no Metal expert, but I believe it's a very simple execution that shouldn't cause memory leak especially after we have already checked for whether CIImage is nil or not. I have also tried running this code without OperationQueue and also tried with @autoreleasepool but none of them has solved this problem.
Am I missing something?
I’ve been trying to run Jurassic World Evolution 2 using the Game Porting Toolkit on macOS, but the game doesn’t launch and crashes immediately. Based on the error and research, it seems the issue is related to missing support for D3D12_TILED_RESOURCES_TIER_2 in the Metal API.
If this is the case, does anyone know if support for tiled resources is planned for future updates of the toolkit? Or are there any potential workarounds for bypassing this limitation?