tl;dr: How can I get raw YUV in a Metal fragment shader from a VideoToolbox 10-bit/BT.2020 HEVC stream without any extra (or hidden) format conversions?
With VideoToolbox and 10-bit HEVC, I've found that the decoder defaults to CVPixelBuffers with the formats kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarFullRange or kCVPixelFormatType_Lossy_420YpCbCr10PackedBiPlanarFullRange. To work around this, I added the following code to my application:
// We need our pixels unpacked for 10-bit so that the Metal textures actually work
var pixelFormat: OSType? = nil
let bpc = getBpcForVideoFormat(videoFormat!)
let isFullRange = getIsFullRangeForVideoFormat(videoFormat!)
// TODO: figure out how to check for 422/444, CVImageBufferChromaLocationBottomField?
if bpc == 10 {
    pixelFormat = isFullRange ? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
}
let videoDecoderSpecification: [NSString: AnyObject] = [
    kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder: kCFBooleanTrue
]
var destinationImageBufferAttributes: [NSString: AnyObject] = [
    kCVPixelBufferMetalCompatibilityKey: true as NSNumber,
    kCVPixelBufferPoolMinimumBufferCountKey: 3 as NSNumber
]
if let pixelFormat = pixelFormat {
    // Override the decoder's default (packed/compressed) output format
    destinationImageBufferAttributes[kCVPixelBufferPixelFormatTypeKey] = pixelFormat as NSNumber
}
var decompressionSession: VTDecompressionSession? = nil
err = VTDecompressionSessionCreate(allocator: nil,
                                   formatDescription: videoFormat!,
                                   decoderSpecification: videoDecoderSpecification as CFDictionary,
                                   imageBufferAttributes: destinationImageBufferAttributes as CFDictionary,
                                   outputCallback: nil,
                                   decompressionSessionOut: &decompressionSession)
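To confirm which format VideoToolbox actually delivered (for example, whether the requested unpacked format was honored), one option is to log the four-character code of CVPixelBufferGetPixelFormatType(_:) wherever the decoded CVPixelBuffer is received. A minimal sketch in plain Swift; fourCCString is a hypothetical helper, not a CoreVideo API:

```swift
// Hypothetical debugging helper (not part of CoreVideo): turns an OSType such as
// the value returned by CVPixelBufferGetPixelFormatType(_:) into its four-character code.
func fourCCString(_ code: UInt32) -> String {
    // An OSType packs four ASCII bytes, most significant byte first.
    let bytes = [24, 16, 8, 0].map { UInt8(truncatingIfNeeded: code >> UInt32($0)) }
    // Replace non-printable bytes with "?" so unusual codes still log cleanly.
    let chars = bytes.map { (0x20...0x7E).contains($0) ? Character(UnicodeScalar($0)) : "?" }
    return String(chars)
}

// The unpacked 10-bit formats requested in the snippet above:
// kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange == 'x420'
// kCVPixelFormatType_420YpCbCr10BiPlanarFullRange  == 'xf20'
print(fourCCString(0x7834_3230)) // x420
```

On Apple platforms the call site would be something like fourCCString(CVPixelBufferGetPixelFormatType(pixelBuffer)), which makes it easy to spot when a different OS release starts returning a different format.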
In short, I need kCVPixelFormatType_420YpCbCr10BiPlanar so that I have a straightforward MTLPixelFormat.r16Unorm/MTLPixelFormat.rg16Unorm texture binding for Y/CbCr. Metal seems to have no direct pixel format for 420YpCbCr10PackedBiPlanar.
. I'd also rather not use any color conversion in VideoToolbox, in order to save on processing (and to ensure that the color transforms/transfer characteristics match between streamer/client, since I also have a custom transfer characteristic to mitigate blocking in dark scenes).
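Skipping VideoToolbox's color conversion means the shader has to undo the quantization range itself before applying the custom transfer characteristic. A sketch of that step, assuming integer 10-bit code values and the standard ITU-R BT.2100 quantization levels (narrow range: Y' from 64 to 940, chroma from 64 to 960 centered on 512); normalize10Bit and YCbCr are hypothetical names, and no transfer function or Y'CbCr-to-R'G'B' matrix is applied:

```swift
// Hypothetical helper mirroring the range expansion a fragment shader would do
// itself when VideoToolbox is not asked to color-convert.
// Only quantization levels are handled; the transfer function is NOT applied.

struct YCbCr { var y: Float; var cb: Float; var cr: Float }

func normalize10Bit(y: UInt16, cb: UInt16, cr: UInt16, fullRange: Bool) -> YCbCr {
    if fullRange {
        // Full range: Y' spans 0...1023, chroma is centered on 512.
        return YCbCr(y: Float(y) / 1023,
                     cb: (Float(cb) - 512) / 1023,
                     cr: (Float(cr) - 512) / 1023)
    } else {
        // Narrow (video) range: Y' spans 64...940, chroma 64...960 centered on 512.
        return YCbCr(y: (Float(y) - 64) / 876,
                     cb: (Float(cb) - 512) / 896,
                     cr: (Float(cr) - 512) / 896)
    }
}
```

With video range, code 64/512/512 maps to Y' = 0 with zero chroma, and Y' code 940 maps to exactly 1.0.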
However, I noticed that in visionOS 2 the CVPixelBuffer I receive is no longer a compressed render target (likely a bug), which caused GPU texture read bandwidth to skyrocket from 2 GiB/s to 30 GiB/s. More importantly, this implies that VideoToolbox may in fact be doing an extra color conversion step, wasting memory bandwidth.
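One way to sanity-check bandwidth counters like these is to compare them with the size of an uncompressed frame. The sketch below assumes the unpacked 'x420'/'xf20' layout (2 bytes per luma sample and per chroma component, chroma at half resolution in both dimensions) and ignores caching, tiling, and lossless compression; the resolution and frame rate in the example are hypothetical, not taken from the stream described here.

```swift
// Hypothetical estimator: bytes touched by ONE full read of an uncompressed
// 'x420'/'xf20' frame. Ignores caches, tiling and framebuffer compression.
func uncompressedX420BytesPerFrame(width: Int, height: Int) -> Int {
    let lumaBytes = width * height * 2                                // Y plane: one 16-bit word per sample
    let chromaBytes = ((width + 1) / 2) * ((height + 1) / 2) * 2 * 2  // CbCr plane: two 16-bit words per 2x2 block
    return lumaBytes + chromaBytes
}

// Converts a per-frame byte count into GiB/s for a frame rate and a number of
// full-frame reads per frame (e.g. several passes sampling the same buffer).
func gibPerSecond(bytesPerFrame: Int, framesPerSecond: Double, readsPerFrame: Double = 1) -> Double {
    Double(bytesPerFrame) * framesPerSecond * readsPerFrame / Double(1 << 30)
}

// Hypothetical 3840x2160 stream at 90 fps, read once per frame.
let frameBytes = uncompressedX420BytesPerFrame(width: 3840, height: 2160)
print(frameBytes) // 24883200
print(gibPerSecond(bytesPerFrame: frameBytes, framesPerSecond: 90))
```

Measured numbers well above this baseline would point to repeated reads per frame or an extra intermediate buffer rather than the decoded frame alone.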
Does Metal really have no way to handle 420YpCbCr10PackedBiPlanar? Are there any examples of reading 10-bit HDR HEVC buffers directly with Metal?