Reference White Calculation for HDR Video Rendering in Metal

Our multimedia application Boinx FotoMagico displays media files of various kinds with a Metal rendering engine. At the moment we still use the .bgra8Unorm pixel format and sRGB color space and only render in SDR, which is increasingly a problem, as much video content is HDR nowadays (e.g. videos shot on an iPhone). For that reason we would like to switch to EDR rendering with the .rgba16Float pixel format and the extendedLinearDisplayP3 color space.

We have already worked out how to do this for HDR image files, but we still have a technical problem when rendering HDR video files. We are using AVFoundation to get the video frames as CVPixelBuffers and convert them to MTLTextures using a CVMetalTextureCache. The MTLTextures are then further processed in various compute shaders before being rendered to screen. However, the pixel values in the textures are not what we expected: video frames appear too bright and overexposed.

In WWDC21 session "Explore HDR rendering with EDR" Ken Greenebaum mentioned:

“AVFoundation does not presently decode HDR formats, such as HDR10, to EDR. Consequently, these need to be adapted for use with EDR rendering. This conversion is straightforward and involves two steps. First, converting to linear light by applying the inverse transfer function. And second, dividing by the medium's reference white.”

https://developer.apple.com/videos/play/wwdc2021/10161?time=1498
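For HLG content, those two steps might look like the following sketch. The inverse OETF constants come from ITU-R BT.2100; the referenceWhite parameter is deliberately left open, since finding its correct value is exactly what this question is about:

```swift
import Foundation

// Sketch of the two-step conversion from the session quote, for an HLG-encoded signal.
// Constants a, b, c are the ITU-R BT.2100 HLG inverse OETF constants; `referenceWhite`
// is a parameter because its correct value is the open question here.
func hlgToEDR(_ signal: Double, referenceWhite: Double) -> Double {
    let a = 0.17883277, b = 0.28466892, c = 0.55991073
    // Step 1: inverse transfer function -> scene-linear light in 0...1
    let linear: Double
    if signal <= 0.5 {
        linear = (signal * signal) / 3.0
    } else {
        linear = (exp((signal - c) / a) + b) / 12.0
    }
    // Step 2: divide by the medium's reference white
    return linear / referenceWhite
}
```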

However, the session does not explain how to get or calculate the correct value for "reference white", and we could not find any relevant info on the web. This is why we need DTS assistance: we need code that calculates the correct value for reference white for any kind of video, whether it is SDR or HDR, and regardless of codec and encoding. I assume that Ken Greenebaum is the best Apple engineer to ask in this case, because he recorded most of the EDR-related WWDC sessions in recent years.

We have written a small test app that renders a short sample video (HLG encoding). The window contains two views. The upper view uses an AVPlayerLayer and renders the video natively just like QuickTime Player. The video content looks correct here. BTW, the window background is SDR white, so that bright EDR pixels can be clearly identified, e.g. the clouds just above the mountains in the upper left corner of the sample video. You may need to lower display brightness a bit if these clouds do not appear brighter than the white window background.

The bottom view uses a CAMetalLayer and low-level Metal rendering. The CVPixelBuffers we receive from AVFoundation still need to be scaled down so that SDR reference white reaches pixel value 1.0. Entering a value of 9.0 to 10.0 for reference white in the text field makes it look about right on my Studio Display. But that is just experimental for this sample video file. We need code to calculate the correct value for reference white for any kind of video file!

We have a couple of questions:

  1. SDR videos should probably use 1.0 as reference white, as their encoded pixel values can already be used as-is. Is this assumption correct?
  2. Will different HDR video encodings (HLG, PQ, etc.) lead to different values for reference white?
  3. Is the value for reference white constant throughout a video, or can it vary over time, either scene by scene, or even frame by frame?
  4. If it can vary, does the CVPixelBuffer of the current video frame contain all the necessary metadata to calculate the correct value?
  5. Does the NSScreen.maximumExtendedDynamicRangeColorComponentValue also influence the reference white value?

The attached sample project is structured in a way that the only piece of code that needs to be modified is the ViewController.sdrReferenceWhiteValue() function. Please read the comments and the #warning in this function. This is where the code for calculating the reference white value should be inserted.

Here is the download link for the sample project:

https://www.dropbox.com/scl/fi/4w5gmftav5xhbixu9u6pb/HDRMetalTest.zip?rlkey=n8cm02soux3rx03vplgo6h1lm&dl=0

Answered by DTS Engineer in 824003022


Hello, and thank you for reaching out to us with your interesting questions and sample project.

As mentioned previously, we aim to support you here rather than through the incident system (TSIs).

More forthcoming.

Thanks for letting us know that you're working on this.

The sample project has been quite helpful for us, thanks.

Starting with 3 and 4:

Reference white is constant with respect to a given video. As mentioned, the value may differ between videos, e.g. 100 nits versus 203 nits, but not within any particular video. Take care not to confuse reference white with diffuse white: diffuse white serves as a reference for the brightest "non-shiny" parts of an image, whereas reference white defines what "white" means for a consistent viewing experience.

Per 3 above, it doesn't vary, and it would be helpful to have explicit metadata here, so please consider filing an API enhancement request. You can do this with Feedback Assistant.

Going back to question 1:

Yes, SDR video should be decoded to logical values between 0.0 and 1.0.

Note: Super-white values are an exception. Some SDR formats encode super-white values, and those should be decoded to EDR values above 1.0. For instance, rec.709 8-bit video-range luma encoding uses codes from 16 to 235. To decode to SDR we would subtract 16 to remove the black offset (clipping the super-blacks below it), then scale by 255/219 to map the brightest non-super white to 1.0, and then clamp all values to 1.0. For EDR we would not apply the clamp.
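As a concrete sketch of that decode path (assuming 8-bit rec.709 video-range luma, normalized to 0...1 by dividing by 255):

```swift
// Sketch: decode an 8-bit video-range (16...235) luma code. For SDR the result is
// clamped to 0...1; for EDR the clamp is skipped, so super-white codes (236...254)
// decode to values above 1.0.
func decodeVideoRangeLuma(_ code: UInt8, clampForSDR: Bool) -> Double {
    let normalized = Double(code) / 255.0
    var value = (normalized - 16.0 / 255.0) * (255.0 / 219.0) // remove black offset, rescale
    value = max(value, 0.0)                                   // clip super-blacks (codes < 16)
    return clampForSDR ? min(value, 1.0) : value
}
```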

Question 2:

Yes, different but well-specified. In Hollywood, 100 nits is used as reference white for PQ, following the original specification. In television, however, reference white has shifted to 203 nits in more recent PQ specifications.
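In code, once a PQ signal has been decoded through its EOTF (which yields absolute luminance in nits), the reference-white step is a simple division. A minimal sketch, with the 203-versus-100 choice from above as the only parameter:

```swift
// Sketch: map a PQ-decoded absolute luminance (in nits) to an EDR value.
// 203 nits is the television convention; pass 100 for content mastered
// to the original specification.
func pqNitsToEDR(_ nits: Double, referenceWhiteNits: Double = 203.0) -> Double {
    nits / referenceWhiteNits
}
```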

HLG sets reference white at the 75% code value, which makes the max HLG code decode to 4.96. The 100% code after the OOTF should be 1000 nits, and the 75% code value after the 1000-nit OOTF should be 203 nits, thus establishing the 4.96 EDR headroom.
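That derivation can be checked numerically. The sketch below applies the BT.2100 HLG inverse OETF and the 1000-nit OOTF (system gamma 1.2) to the 75% and 100% code values; the 75% code lands near 203 nits, and the resulting ratio comes out close to the headroom quoted above:

```swift
import Foundation

// Sketch: verify the HLG reference-white derivation numerically (BT.2100 constants,
// 1000-nit nominal peak, system gamma 1.2, achromatic signal so luminance equals
// the per-channel value).
func hlgSceneLinear(_ v: Double) -> Double {
    let a = 0.17883277, b = 0.28466892, c = 0.55991073
    return v <= 0.5 ? (v * v) / 3.0 : (exp((v - c) / a) + b) / 12.0
}

func hlgDisplayNits(_ v: Double, peakNits: Double = 1000.0, gamma: Double = 1.2) -> Double {
    peakNits * pow(hlgSceneLinear(v), gamma) // OOTF for an achromatic pixel
}

// hlgDisplayNits(0.75) ≈ 203 nits, hlgDisplayNits(1.0) ≈ 1000 nits,
// so the headroom hlgDisplayNits(1.0) / hlgDisplayNits(0.75) ≈ 4.9
```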

... and finally question 5:

EDR pixels use a reference white of 1.0, just as SDR pixels did before, i.e. EDR pixels in the 0 … 1 range are equivalent to SDR pixels in that range.

When rendering to SDR or HDR displays, EDR 1.0 maps to the display’s reference white brightness, which is adjustable with the brightness control of the display. The headroom of a display is the ratio of its peak brightness to its reference white brightness. EDR values above this headroom will be clipped.

Reducing the reference white brightness of a given display often maintains the same peak brightness, e.g. on the Pro Display XDR (default preset, peak 1600 nits), the reference white can be adjusted from 500 nits (3.2x headroom) down to 4 nits (400x headroom) by changing the brightness.
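Those numbers plug directly into the headroom relationship (a trivial sketch; the nit values in the comments are the Pro Display XDR figures quoted above):

```swift
// Sketch: a display's headroom is the ratio of its peak brightness
// to its reference white brightness.
func displayHeadroom(peakNits: Double, referenceWhiteNits: Double) -> Double {
    peakNits / referenceWhiteNits
}

// Pro Display XDR, default preset:
// displayHeadroom(peakNits: 1600, referenceWhiteNits: 500) == 3.2
// displayHeadroom(peakNits: 1600, referenceWhiteNits: 4)   == 400.0
```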

Thank you for providing answers to questions 1 through 5. While they do provide some insight, we are still confused about how to calculate the reference white value with the available metadata from the AVAssetTrack.

I did file the API enhancement request (FB16501084), but since we still have a deployment target of macOS Catalina (10.15), we cannot wait for a future API enhancement; we need some code to calculate that value ourselves right now.

Could you explain again how to calculate that value from the available metadata, or provide us with a code snippet, please?

Got it!

Let's see what we can do to refine and provide more detail.

In the meantime, a quick tip: the necessary metadata isn't present, so you'll need to infer the value based on the standards (see the answer to question 2).
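Putting that standards-based inference together, here is a sketch of the kind of logic that could go into sdrReferenceWhiteValue(). The transfer-function strings are assumed to match the raw values of the corresponding kCVImageBufferTransferFunction_* constants (worth verifying), as read e.g. via CVBufferGetAttachment with kCVImageBufferTransferFunctionKey; the nit values follow the conventions from the answers above, and how they translate into the final divisor depends on how your shader normalizes linear light:

```swift
// Sketch: infer a nominal reference white (in nits) from a CVPixelBuffer's
// transfer-function attachment. The strings are assumed to equal the raw values
// of the kCVImageBufferTransferFunction_* constants; verify before relying on them.
func nominalReferenceWhiteNits(transferFunction: String?) -> Double {
    switch transferFunction {
    case "SMPTE_ST_2084_PQ": return 203.0 // PQ television convention (100 for cinema masters)
    case "ITU_R_2100_HLG":   return 203.0 // 75% HLG code after the 1000-nit OOTF
    default:                 return 100.0 // SDR: decoded 1.0 is already reference white
    }
}
```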
