Is there a way to directly go from VideoToolbox to Metal for 10-bit/BT.2020 YCbCr HEVC?

tl;dr how can I get raw YUV in a Metal fragment shader from a VideoToolbox 10-bit/BT.2020 HEVC stream without any extra/secret format conversions?

With VideoToolbox and 10-bit HEVC, I've found that it defaults to CVPixelBuffers w/ formats kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarFullRange or kCVPixelFormatType_Lossy_420YpCbCr10PackedBiPlanarFullRange. To work around this, I added the following snippet to my application, which requests the unpacked biplanar format instead:

        // We need our pixels unpacked for 10-bit so that the Metal textures actually work
        var pixelFormat:OSType? = nil
        let bpc = getBpcForVideoFormat(videoFormat!)
        let isFullRange = getIsFullRangeForVideoFormat(videoFormat!)
        
        // TODO: figure out how to check for 422/444, CVImageBufferChromaLocationBottomField?
        if bpc == 10 {
            pixelFormat = isFullRange ? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
        }
        
        let videoDecoderSpecification:[NSString: AnyObject] = [kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder:kCFBooleanTrue]
        var destinationImageBufferAttributes:[NSString: AnyObject] = [kCVPixelBufferMetalCompatibilityKey: true as NSNumber, kCVPixelBufferPoolMinimumBufferCountKey: 3 as NSNumber]
        if pixelFormat != nil {
            destinationImageBufferAttributes[kCVPixelBufferPixelFormatTypeKey] = pixelFormat! as NSNumber
        }

        var decompressionSession:VTDecompressionSession? = nil
        err = VTDecompressionSessionCreate(allocator: nil, formatDescription: videoFormat!, decoderSpecification: videoDecoderSpecification as CFDictionary, imageBufferAttributes: destinationImageBufferAttributes as CFDictionary, outputCallback: nil, decompressionSessionOut: &decompressionSession)

In short, I need kCVPixelFormatType_420YpCbCr10BiPlanar so that I have a straightforward MTLPixelFormat.r16Unorm/MTLPixelFormat.rg16Unorm texture binding for Y/CbCr. Metal, seemingly, has no direct pixel format for 420YpCbCr10PackedBiPlanar. I'd also rather not use any color conversion in VideoToolbox, in order to save on processing (and to ensure that the color transforms/transfer characteristics match between streamer/client, since I also have a custom transfer characteristic to mitigate blocking in dark scenes).
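For anyone following along, the unpacked binding described above can be sketched roughly like this. It's a minimal sketch, assuming a CVMetalTextureCache already exists and the buffer really is one of the unpacked 10-bit biplanar formats; planePixelFormat and makePlaneTexture are hypothetical helper names, not Apple API.

```swift
import CoreVideo
import Metal

/// Metal format for each plane of an unpacked 10-bit biplanar 4:2:0 buffer:
/// plane 0 holds Y, plane 1 holds interleaved CbCr.
func planePixelFormat(forPlane planeIndex: Int) -> MTLPixelFormat {
    return planeIndex == 0 ? .r16Unorm : .rg16Unorm
}

/// Wraps one plane of `pixelBuffer` in an MTLTexture without copying.
/// Returns the CVMetalTexture as well, because the caller should keep it
/// alive until the GPU has finished reading the texture.
func makePlaneTexture(from pixelBuffer: CVPixelBuffer,
                      plane planeIndex: Int,
                      cache: CVMetalTextureCache)
    -> (backing: CVMetalTexture, texture: MTLTexture)? {
    // The chroma plane is half width and half height for 4:2:0.
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)

    var cvTexture: CVMetalTexture? = nil
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        planePixelFormat(forPlane: planeIndex),
        width, height, planeIndex, &cvTexture)

    guard status == kCVReturnSuccess,
          let backing = cvTexture,
          let texture = CVMetalTextureGetTexture(backing) else { return nil }
    return (backing, texture)
}
```

Calling makePlaneTexture once with plane 0 and once with plane 1 gives the Y and CbCr textures to bind in the fragment shader.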

However, I noticed that in visionOS 2, the CVPixelBuffer I receive is no longer a compressed render target (likely a bug), which caused GPU texture read bandwidth to skyrocket from 2GiB/s to 30GiB/s. More importantly, this implies that VideoToolbox may in fact be doing an extra color conversion step, wasting memory bandwidth.
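A quick way to see which format the decoder actually hands back is to print the buffer's pixel format as a four-character code. This is a small sketch in plain Swift; fourCCString is a hypothetical helper, and imageBuffer in the comment stands in for whatever buffer the output handler receives. As far as I know the unpacked 10-bit formats show up as "x420" (video range) and "xf20" (full range).

```swift
/// Renders a four-character code (such as an OSType pixel format) as text.
/// Falls back to the decimal value when the bytes are not printable ASCII,
/// which is the case for some older pixel formats with small numeric codes.
func fourCCString(_ code: UInt32) -> String {
    let shifts: [UInt32] = [24, 16, 8, 0]
    let bytes = shifts.map { UInt8((code >> $0) & 0xFF) }
    guard bytes.allSatisfy({ $0 >= 0x20 && $0 < 0x7F }) else {
        return String(code)
    }
    return String(decoding: bytes, as: UTF8.self)
}

// Usage with CoreVideo (OSType is a UInt32):
// print(fourCCString(CVPixelBufferGetPixelFormatType(imageBuffer)))
```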

Does Metal actually have no way to handle 420YpCbCr10PackedBiPlanar? Are there any examples for reading 10-bit HDR HEVC buffers directly with Metal?
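For the unpacked path, the per-pixel work the shader has to do is small. Below is a CPU-side reference in Swift, as a sketch only: it assumes video-range 10-bit samples, the standard BT.2020 non-constant-luminance matrix, and that the unpacked format stores each 10-bit value in the most significant bits of a 16-bit word. It does not include any transfer function (custom or otherwise), and both function names are made up for illustration.

```swift
/// Recovers the 10-bit code value from an r16Unorm sample in 0...1,
/// assuming the 10 bits sit in the MSBs of the 16-bit word.
func codeValue(fromUnorm16 sample: Double) -> Double {
    return sample * 65535.0 / 64.0
}

/// Converts one video-range 10-bit BT.2020 (non-constant luminance) sample
/// to non-linear R'G'B'. Inputs are 10-bit code values in 0...1023.
func bt2020VideoRangeToRGB(y: Double, cb: Double, cr: Double)
    -> (r: Double, g: Double, b: Double) {
    // Range expansion: luma spans 64...940, chroma 64...960 centered on 512.
    let yN = (y - 64.0) / 876.0
    let cbN = (cb - 512.0) / 896.0
    let crN = (cr - 512.0) / 896.0
    // BT.2020 NCL coefficients derived from Kr = 0.2627, Kb = 0.0593.
    let r = yN + 1.4746 * crN
    let g = yN - 0.16455 * cbN - 0.57135 * crN
    let b = yN + 1.8814 * cbN
    return (r, g, b)
}
```

The same arithmetic ports directly to a fragment shader; full-range content would skip the range expansion and only recenter chroma.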

I attempted to read the packed 10-bit data out with the following code (does not work):

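// Attempt to unpack one 10-bit luma sample, assuming 4 samples are packed into 5 bytes.
half readPackedY(texture2d<uint> in_tex_y, uint2 xyPx, uint stride) {
    uint idx = (xyPx.x + xyPx.y * stride);   // linear sample index
    uint idxPackedBase = (idx / 4) * 5;       // first byte of the 5-byte group
    uint which = idx % 4;                     // sample position within the group
    // Byte offsets, masks, and shifts for the two bytes each sample spans
    uint px0Lut[4] = {0, 1, 2, 3};
    uint px0Mask[4] = {0xFF, 0x3F, 0xF, 3};
    int px0Shift[4] = {2, 4, 6, 8};
    uint px1Lut[4] = {1, 2, 3, 4};
    uint px1Mask[4] = {0xC0, 0xF0, 0xFC, 0xFF};
    int px1Shift[4] = {6, 4, 2, 0};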
    
    uint px0Idx = idxPackedBase + px0Lut[which];
    uint px1Idx = idxPackedBase + px1Lut[which];
    
    uint2 px0XY = uint2(px0Idx % stride, px0Idx / stride);
    uint2 px1XY = uint2(px1Idx % stride, px1Idx / stride);
    
    uint8_t px0 = (in_tex_y.read(px0XY).r & px0Mask[which]);
    uint8_t px1 = (in_tex_y.read(px1XY).r & px1Mask[which]);
    uint px = (px0 << px0Shift[which]) | (px1 >> px1Shift[which]);
    
    return half(float(px) / 1023.0);
}

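However, Metal reads the texture in a swizzled layout (it seems to be 4px x 4px tiles), so the byte offsets above don't land on the right data. I could try to undo the swizzle by hand, but I'd rather not add undefined behavior to my app.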

Signs seem to be pointing to 'actually there is no API for this' because WebKit has its own fancy MTLPixelFormatYCBCR10_420_2P_PACKED (boooooooo private texture formats 🍅🍅🍅)

Yeah, it turns out the private texture formats work exactly how CVPixelBuffer should work when getting an MTLTexture: you pass the format into CVMetalTextureCacheCreateTextureFromImage for plane 0, ignore plane 1, and the MTLTexture handles all of the YUV->RGB format conversion for you. It has a fairly significant performance/power benefit as well.

Here's an example of their usage: https://gist.github.com/shinyquagsire23/81c86f4bf670aaa68b5804080ff964a0. Might not be kosher for App Store submission, but if it's good enough for WebKit it's good enough for me. If Apple doesn't want it used directly they should provide an actual API/abstraction tbh.
