Display HDR video in EDR with AVFoundation and Metal
Learn how you can take advantage of AVFoundation and Metal to build an efficient EDR pipeline. Follow along as we demonstrate how you can use AVPlayer to display HDR video as EDR, add playback into an app view, render it with Metal, and use Core Image or custom Metal shaders to add video effects such as keying or color management. Whether you develop games or pro apps, we'll help you decide which frameworks to use and share best practices for selecting transports, colorspaces, and pixelbuffer formats.
♪ ♪ Ken Greenebaum: Hi everyone! Welcome to WWDC 2022. My name is Ken Greenebaum, and I'm with the Color and Display Technologies team at Apple. We are thrilled to have three EDR talks this year. Hope you've had an opportunity to watch "Explore EDR on iOS," where we announced EDR API support for iOS, as well as "Display EDR content with Core Image, Metal, and SwiftUI." Some of you may have also watched my EDR talk last year, where we demonstrated how to use AVPlayer to play back HDR video, using EDR.
In this talk we're gonna go deeper and explore how to use Core Media interfaces, not only to provide EDR playback, but also to decode and play back HDR video into your own EDR layers or views.
Then we'll continue beyond simply playing back content, to show how to access the decoded video frames in real time, via Core Video's display link, send those frames to CoreImage Filters, or a Metal Shader, to add color management, visual effects, or apply other signal processing, and finally, plumb the resulting frames to Metal to render. We're going to start by reviewing the EDR compatible video media frameworks, to help you decide which best matches your application's requirements.
Next we will briefly discuss the high-level AVKit and AVFoundation frameworks, which can do all of the work of playing HDR video if your application requires straightforward playback.
And finally, we'll discuss best practices for using decoded video frames, with Core Video and Metal, in your EDR playback, editing, or image processing engine.
Let's begin by taking a quick survey of Apple's video frameworks, starting with the highest-level interfaces, which are the easiest to use, and continuing to lower-level frameworks that offer more opportunities at the expense of adding complexity to your code. It is best to use the highest-level framework possible to take advantage of the optimizations provided automatically for you. This will get us ready to dive into the body of the talk, where we will be exploring a number of scenarios, from simple EDR playback to more sophisticated plumbing of decoded video frames to Core Image or Metal for real-time processing.
At the highest level, there is AVKit. With AVKit you can create user interfaces for media playback, complete with transport controls, chapter navigation, Picture in Picture support, and display of subtitles and closed captions. AVKit can play back HDR content as EDR, as we will demonstrate using AVPlayerViewController. However, if your application requires further processing of video frames, you will have to use a media framework that can give you more control over your pipeline.
Next there is AVFoundation. AVFoundation is the full-featured framework for working with time-based audiovisual media on Apple platforms. Using AVFoundation, you can easily play, create, and edit QuickTime movies and MPEG-4 files, play HLS streams, and build powerful media functionality into your apps. We'll be exploring use of AVPlayer and the related AVPlayerLayer interface in this talk.
Core Video is a framework that provides a pipeline model for digital video. It simplifies the way you work with videos by partitioning the process into discrete steps. Core Video also makes it easier for you to access and manipulate individual frames, without having to worry about translating between data types or about display synchronization. We'll be demonstrating use of DisplayLink and CVPixelBuffers with Core Image, and CVMetalTextureCache with Metal.
Next there is Video Toolbox. This is a low-level framework that provides direct access to hardware encoders and decoders. Video Toolbox provides services for video compression and decompression, and for conversion between raster image formats stored in Core Video pixel buffers. VTDecompressionSession is a powerful low-level interface that is outside of the scope of this talk, but advanced developers might want to investigate further.
And finally, there is Core Media. This framework defines the media pipeline used by AVFoundation and the other high-level media frameworks. You can always use Core Media's low-level data types and interfaces to efficiently process media samples and manage queues of media data.
In the remainder of this talk we will demonstrate how and when to use these frameworks in your app. First, how to use AVKit and AVFoundation to easily play back HDR video rendered as EDR. Then a series of more sophisticated applications of AVPlayer: to render to your own layer, to access individual decoded frames via CADisplayLink and send the resulting CVPixelBuffers to Core Image for processing, and finally, to access the decoded frames as Metal textures via the CVMetalTextureCache for processing and rendering in Metal.
Now that we have an overview of the video media layer on Apple platforms, we'll focus on the AVKit and AVFoundation frameworks. Let's get things started by first discussing playback of your HDR video content using AVFoundation's AVPlayer interface. An AVPlayer is a controller object used to manage the playback and timing of a media asset. The AVPlayer interface can be used for high-performance playback of HDR video, automatically rendering the result as EDR when possible.
With AVPlayer, you can play local and remote file-based media, such as QuickTime movies, as well as streaming media served using HLS. Essentially, AVPlayer is used to play one media asset at a time. You can reuse the player instance to serially play additional media assets, or even create multiple instances to play more than one asset simultaneously, but AVPlayer manages the playback of only a single media asset at a time. The AVFoundation framework also provides a subclass of AVPlayer called AVQueuePlayer that you can use to create and manage the queuing and playing of sequential HDR media assets. If your application requires simple playback of HDR video media rendered to EDR, then AVPlayer with AVPlayerViewController may be the best approach. Use AVPlayer with AVPlayerLayer to play back into your own views on iOS or macOS.
These are the most straightforward ways of using AVPlayer. Let's look at examples of both. First we will look at how you can use AVFoundation's AVPlayer interface in conjunction with AVKit's AVPlayerViewController. Here, we start by instantiating AVPlayer from the media's URL.
Next we create an AVPlayerViewController, then set the player property of our view controller to the player we just created from the media's URL.
And present the view controller modally to start playback of the video. AVKit manages all the details for you and will automatically play back HDR Video as EDR on displays supporting EDR. As I mentioned, some applications will need to play back HDR video media into their own view. Let's look at how to accomplish this using AVPlayer with AVPlayerLayer. To play HDR video media as EDR in your own view, we again start by creating an AVPlayer with the media's URL. However this time we instantiate an AVPlayerLayer with the player we just created. Next we need to set the bounds on the player layer, which we get from the view. Now that the player layer has the bounds from the view, we can add the player layer as a sublayer to the view. Finally, to play back the HDR video media, we call AVPlayer's play method. That's all that is needed to play back HDR video media as EDR in your own layer using AVPlayer and AVPlayerLayer. We just explored the two most straightforward HDR video playback workflows using AVPlayer. However, many applications require more than simple media playback.
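The two playback workflows just described might be sketched as follows. This is a minimal illustration for iOS with UIKit, not the session's own sample code. The function names `presentHDRVideo` and `attachHDRVideo` and their parameters are hypothetical; only the AVKit and AVFoundation calls come from the frameworks.

```swift
import AVFoundation
import AVKit
import UIKit

// Workflow 1: AVPlayer + AVPlayerViewController.
// `presenter` is assumed to be a view controller that is already on screen.
func presentHDRVideo(from url: URL, on presenter: UIViewController) {
    // Create the player from the media's URL.
    let player = AVPlayer(url: url)

    // Hand the player to AVKit's view controller.
    let playerViewController = AVPlayerViewController()
    playerViewController.player = player

    // Present modally and start playback once the presentation completes.
    presenter.present(playerViewController, animated: true) {
        player.play()
    }
}

// Workflow 2: AVPlayer + AVPlayerLayer hosted in your own view.
// Returns the player so the caller can keep it alive and control playback.
@discardableResult
func attachHDRVideo(from url: URL, to view: UIView) -> AVPlayer {
    let player = AVPlayer(url: url)

    // Create a layer that displays the player's output.
    let playerLayer = AVPlayerLayer(player: player)

    // Size the layer from the view's bounds, then add it as a sublayer.
    playerLayer.frame = view.bounds
    view.layer.addSublayer(playerLayer)

    player.play()
    return player
}
```

In both cases the frameworks decide whether the content can be shown as EDR; the sketch does not configure anything EDR-specific itself.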
For example, an application might require image processing, such as color grading or chroma keying to be applied to the video. Let's explore a workflow that gets decoded video frames from AVPlayer, applies Core Image filters or Metal shaders in real time, and renders the results as EDR. We will be demonstrating how to use AVPlayer and the AVPlayerItem to decode EDR frames from your HDR video media, access the decoded frames from the Core Video display link, send the resulting pixel buffers to Core Image or Metal for processing, then render the results in a CAMetalLayer as EDR on displays with EDR support. With this in mind, let's first demonstrate setting a few key properties on the CAMetalLayer, which are required to ensure HDR media will render correctly as EDR. First we need to get the CAMetalLayer that we will be rendering the HDR video content to. On that layer we opt into EDR by setting the wantsExtendedDynamicRangeContent flag to true.
Please be sure to use a pixel format that supports Extended Dynamic Range content. For the AVPlayer example that follows, we will set the CAMetalLayer to use a half float pixel format, however a ten bit format used in conjunction with a PQ or HLG transfer function would also work. To avoid limiting the result to SDR, we also need to set the layer to an EDR compatible extended range color space.
In our examples we will be setting the half float Metal texture to the extended linear Display P3 color space. We just scratched the surface regarding EDR, color spaces, and pixel buffer formats. You might want to check out my session from last year, "HDR rendering with EDR," as well as this year's "Explore EDR on iOS," for more details.
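The three layer properties discussed above could be set roughly like this. It is a sketch under the assumption that you already have a CAMetalLayer backing your view; the helper name `configureForEDR` is hypothetical.

```swift
import CoreGraphics
import Metal
import QuartzCore

// Hypothetical helper: applies the EDR-related settings described in the talk
// to an existing CAMetalLayer.
func configureForEDR(_ layer: CAMetalLayer) {
    // Opt in to EDR so values above 1.0 are not clamped to SDR.
    // (On iOS this property requires iOS 16 or later.)
    layer.wantsExtendedDynamicRangeContent = true

    // Half float pixel format, which can carry extended-range values.
    layer.pixelFormat = .rgba16Float

    // Extended-range linear Display P3 color space.
    layer.colorspace = CGColorSpace(name: CGColorSpace.extendedLinearDisplayP3)
}
```

A ten-bit format combined with a PQ or HLG color space is the alternative the talk mentions; it is not shown here.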
Now that we have set the basic properties on the CAMetalLayer, let's continue the demonstration by adding real-time image processing using Core Image or a Metal shader. We'll be using a display link in conjunction with AVPlayer to access the decoded video frames in real time.
For this workflow, you start by creating an AVPlayer from an AVPlayerItem. Next, you instantiate an AVPlayerItemVideoOutput, configured with appropriate pixel buffer format and color space for EDR. Then you create and configure a Display link. And lastly, you run the Display link to get the pixel buffers to Core Image or Metal for processing. We will demonstrate a CADisplayLink as is used on iOS. Please use the equivalent CVDisplayLink interface when developing for macOS. This time we choose to create an AVPlayerItem from our media's URL, and instantiate an AVPlayer with the AVPlayerItem that we just created. Now we create a pair of dictionaries to specify the color space and pixel buffer format of the decoded frames. The first dictionary, videoColorProperties, is where the color space and transfer function are specified. In this example we request the Display P3 colorspace, which corresponds to the color space of most Apple displays, and the linear transfer function which allows AVFoundation to maintain the extended range values required for EDR.
The second dictionary, outputVideoSettings, specifies the characteristics of the pixel buffer format and also provides a reference to the videoColorProperties dictionary we just created. In this example, we request wide color and the half float pixel buffer format. It is very helpful that AVPlayerItemVideoOutput, not only decodes video into the pixel buffer format we specify in the output settings dictionary, but also automatically performs any color conversion required via a pixel transfer session. Recall, a video might contain multiple clips, potentially with different colorspaces. AVFoundation automatically manages these for us, and as we'll soon be demonstrating, this behavior also allows the resulting decoded video frames to be sent to low level frameworks like Metal that don't themselves provide automatic colorspace conversion to the display's colorspace.
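The pair of dictionaries described above might look like the following sketch. The dictionary names follow the talk; the function names are hypothetical. The Y'CbCr matrix entry is an assumption that the talk does not mention; it is included because color property dictionaries are commonly specified with all three keys.

```swift
import AVFoundation
import CoreVideo

// Hypothetical helper that builds the output settings described in the talk.
func makeEDROutputVideoSettings() -> [String: Any] {
    // Color space and transfer function of the decoded frames.
    let videoColorProperties: [String: Any] = [
        AVVideoColorPrimariesKey: AVVideoColorPrimaries_P3_D65,       // Display P3 primaries
        AVVideoTransferFunctionKey: AVVideoTransferFunction_Linear,   // linear, keeps extended-range values
        AVVideoYCbCrMatrixKey: AVVideoYCbCrMatrix_ITU_R_2020          // assumption, not stated in the talk
    ]

    // Pixel buffer format, wide color, and a reference to the color properties.
    let outputVideoSettings: [String: Any] = [
        AVVideoAllowWideColorKey: true,
        AVVideoColorPropertiesKey: videoColorProperties,
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_64RGBAHalf  // half float RGBA
    ]
    return outputVideoSettings
}

// Hypothetical helper: the settings are then handed to AVPlayerItemVideoOutput,
// which decodes and color-converts frames into the requested format.
func makeEDRVideoOutput() -> AVPlayerItemVideoOutput {
    AVPlayerItemVideoOutput(outputSettings: makeEDROutputVideoSettings())
}
```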
Now we create the AVPlayerItemVideoOutput with the outputVideoSettings dictionary. As the third step, we set up the display link, which will be used to access the decoded frames in real time. CADisplayLink takes a callback that is run on each display update. In our example we call a local function that we will explore in a moment to get the CVPixelBuffers that we will be sending to Core Image for processing. Next we create a video player item observer to allow us to handle changes to specified player item properties.
Our example will execute this code every time the player item's status changes.
When the player item's status changes to readyToPlay, we add our AVPlayerItemVideoOutput to the new AVPlayerItem that was just returned, register CADisplayLink with the main run loop set to common mode, and start the real time decoding of the HDR video by calling play on the video player.
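Putting the player item, video output, display link, and status observer together could look something like this on iOS. The class `EDRVideoFrameSource`, its property names, and the `onDisplayRefresh` hook are hypothetical; the video output is passed in so that it can be configured however your pipeline requires.

```swift
import AVFoundation
import QuartzCore

// Hypothetical wrapper that wires AVPlayer, AVPlayerItemVideoOutput and
// CADisplayLink together as described in the talk (iOS; use CVDisplayLink on macOS).
final class EDRVideoFrameSource: NSObject {
    let playerItem: AVPlayerItem
    let player: AVPlayer
    let videoOutput: AVPlayerItemVideoOutput

    // Called on every display refresh once playback has started.
    var onDisplayRefresh: (() -> Void)?

    private var displayLink: CADisplayLink?
    private var statusObserver: NSKeyValueObservation?

    init(url: URL, videoOutput: AVPlayerItemVideoOutput) {
        // Create the item first, then the player from that item.
        playerItem = AVPlayerItem(url: url)
        player = AVPlayer(playerItem: playerItem)
        self.videoOutput = videoOutput
        super.init()

        // The display link calls back on each display update.
        displayLink = CADisplayLink(target: self, selector: #selector(displayLinkFired(_:)))

        // Observe the item's status; this closure runs each time it changes.
        statusObserver = playerItem.observe(\.status, options: [.new]) { [weak self] item, _ in
            guard let self = self, item.status == .readyToPlay else { return }
            item.add(self.videoOutput)                          // attach the video output
            self.displayLink?.add(to: .main, forMode: .common)  // main run loop, common mode
            self.player.play()                                  // start real-time decoding
        }
    }

    // Stop callbacks; CADisplayLink retains its target until invalidated.
    func invalidate() {
        displayLink?.invalidate()
        displayLink = nil
    }

    @objc private func displayLinkFired(_ link: CADisplayLink) {
        onDisplayRefresh?()
    }
}
```

Because the display link holds a strong reference to its target, calling `invalidate()` when you are done is what allows the object to be released.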
Finally, we will take a look at an example CADisplayLink callback implementation, which we referred to earlier as the `displayLinkCopyPixelBuffers` local function. Once the HDR video begins to play, the CADisplayLink callback function is called on each display refresh. For instance it might be called 60 times a second for a typical display. This is our code's opportunity to update the frame displayed if there is a new CVPixelBuffer. On each display callback, we attempt to copy a CVPixelBuffer containing the decoded video frame to be displayed at the current wall clock time. However, the `copyPixelBuffer` call might fail, as there won't always be a new CVPixelBuffer available at every display refresh, especially when the screen refresh rate exceeds that of the video being played. If there is not a new CVPixelBuffer, then the call fails and we skip the render. This causes the preceding frame to remain on-screen for another display refresh. But if the copy succeeds, then we have a fresh frame of video in a CVPixelBuffer. There are a number of ways that we might process and render this new frame. One opportunity is to send the CVPixelBuffer to Core Image for processing. Core Image can string together one or more CIFilters to provide GPU accelerated image processing to the video frame.
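The per-refresh frame fetch described above might be sketched as a small function. The name `copyLatestPixelBuffer` is hypothetical; it would be called from the display link callback, and a `nil` result means "no new frame, skip this render".

```swift
import AVFoundation
import CoreVideo
import QuartzCore

// Hypothetical helper: returns a fresh decoded frame if one is due at the
// current wall clock time, or nil if the previous frame should stay on screen.
func copyLatestPixelBuffer(from output: AVPlayerItemVideoOutput,
                           hostTime: CFTimeInterval = CACurrentMediaTime()) -> CVPixelBuffer? {
    // Convert the host (wall clock) time into the player item's timeline.
    let itemTime = output.itemTime(forHostTime: hostTime)

    // There is not always a new frame on every refresh, for example when the
    // display refreshes faster than the video's frame rate.
    guard output.hasNewPixelBuffer(forItemTime: itemTime) else { return nil }

    // Copy the frame that should be displayed at this time.
    return output.copyPixelBuffer(forItemTime: itemTime, itemTimeForDisplay: nil)
}
```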
Please note that not all CIFilters are compatible with EDR, and some might have trouble with HDR content, including clamping to SDR or worse. Core Image provides many EDR-compatible filters. Use `filterNames(inCategory:)` with `kCICategoryHighDynamicRange` to enumerate EDR-compatible Core Image filters. In our example, we will be adding a simple sepia tone effect. Now let's return to our example and integrate Core Image. On each display link callback that yields a fresh CVPixelBuffer, create a CIImage from that pixel buffer.
Instantiate the CIFilter to implement the desired effect. I am using the sepia tone filter because of its parameter-less simplicity; however, there are many CIFilters built into the system, and it is straightforward to write your own, too. Set the CIFilter's inputImage to the CIImage we just created.
And the processed video result will be available in the filter's outputImage. Chain as many CIFilters together as are required to achieve your desired effect. Then use a CIRenderDestination to render the resulting image to your application's view code.
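The Core Image steps above could be sketched as follows. The function names are hypothetical, and the render step assumes you already have a CIContext, a destination Metal texture (for example a drawable's texture), and a command buffer; the destination color space chosen here is an assumption that matches the layer configuration discussed earlier in the talk.

```swift
import CoreGraphics
import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo
import Metal

// Lists the built-in filters that Core Image tags as high dynamic range capable.
func edrCompatibleFilterNames() -> [String] {
    CIFilter.filterNames(inCategory: kCICategoryHighDynamicRange)
}

// Wraps the decoded frame in a CIImage and applies the sepia tone filter.
func applySepia(to pixelBuffer: CVPixelBuffer) -> CIImage {
    let image = CIImage(cvPixelBuffer: pixelBuffer)
    let filter = CIFilter.sepiaTone()
    filter.inputImage = image
    // Fall back to the unfiltered frame if the filter produces no output.
    return filter.outputImage ?? image
}

// Renders the processed image into a Metal texture via CIRenderDestination.
func render(_ image: CIImage,
            into texture: MTLTexture,
            commandBuffer: MTLCommandBuffer,
            context: CIContext) throws {
    let destination = CIRenderDestination(mtlTexture: texture, commandBuffer: commandBuffer)
    // Assumption: render into the same extended linear Display P3 space as the layer.
    destination.colorSpace = CGColorSpace(name: CGColorSpace.extendedLinearDisplayP3)
    _ = try context.startTask(toRender: image, to: destination)
}
```

To chain several filters, feed one filter's `outputImage` into the next filter's `inputImage` before rendering.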
Please refer to the WWDC 2020 talk "Optimize the Core Image pipeline for your video app" to learn more about this workflow. Another opportunity is to process and render the fresh CVPixelBuffer using Metal and custom Metal shaders. We will briefly describe the process of converting the CVPixelBuffer to a Metal texture. However, implementing this conversion while maintaining best performance is a deep topic best left for another talk. We instead recommend deriving the Metal texture from the Core Video Metal texture cache, and will walk through that process as the final example in this talk. Generally speaking, the process is to get the IOSurface from the CVPixelBuffer, create an MTLTextureDescriptor, and then create an MTLTexture from the MTLDevice, using `newTextureWithDescriptor`.
However, there is a danger that the textures may be reused and overdrawn if careful locking is not applied. Further, not all pixel buffer formats are natively supported by MTLTexture, which is why we use half float in this example. Because of these complications, we instead recommend directly accessing Metal textures from Core Video, as we will now demonstrate. Let's further explore Core Video and Metal. As mentioned, CVMetalTextureCache is both a straightforward and efficient way to use CVPixelBuffers with Metal. CVMetalTextureCache is handy because you get a Metal texture directly from the cache without need for further conversion. CVMetalTextureCache automatically bridges between CVPixelBuffers and MTLTextures, thereby both simplifying your code and keeping you on the fast path. In conjunction with CVPixelBufferPools, CVMetalTextureCache also provides performance benefits by keeping the MTLTexture to IOSurface mapping alive.
Finally, using CVMetalTextureCache removes the need to manually track IOSurfaces. Now the final example in our talk: how to extract Metal textures directly from Core Video using CVMetalTextureCache.
Here, we start by getting the system default Metal device. We use that to create a Core Video Metal texture cache, and then create a Core Video Metal texture from that cache for each decoded pixel buffer. That can then be used to access decoded video frames as Metal textures, which, conveniently, can be directly used in our Metal engine. In this example, we create and use the Metal system default device. Next we create the CVMetalTextureCache with `CVMetalTextureCacheCreate`, specifying the Metal device we just created. We get the height and width of the CVPixelBuffer needed to create the Core Video Metal texture. Then we call `CVMetalTextureCacheCreateTextureFromImage` to instantiate a CVMetalTexture object and associate that with the CVPixelBuffer. Finally we call `CVMetalTextureGetTexture` to get the desired Metal texture. Swift applications should use a strong reference for CVMetalTexture. When using Objective-C, however, you must ensure that Metal is done with your texture before you release the CVMetalTextureRef. This may be accomplished using Metal command buffer completion handlers.
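The texture cache steps above might be sketched like this. The class `PixelBufferTextureConverter` and its method names are hypothetical. The sketch assumes the incoming pixel buffers are half float RGBA and Metal compatible (IOSurface backed), as buffers vended by AVPlayerItemVideoOutput with the settings discussed earlier are expected to be.

```swift
import CoreVideo
import Metal

// Hypothetical helper that turns decoded CVPixelBuffers into Metal textures
// through a CVMetalTextureCache.
final class PixelBufferTextureConverter {
    let device: MTLDevice
    private let textureCache: CVMetalTextureCache

    // Fails if there is no Metal device or the cache cannot be created.
    init?() {
        guard let device = MTLCreateSystemDefaultDevice() else { return nil }
        var cacheOut: CVMetalTextureCache?
        let status = CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cacheOut)
        guard status == kCVReturnSuccess, let cache = cacheOut else { return nil }
        self.device = device
        self.textureCache = cache
    }

    // Returns both the CVMetalTexture and the MTLTexture. Keep a strong reference
    // to the CVMetalTexture for as long as the GPU may still be using the texture.
    func makeTexture(from pixelBuffer: CVPixelBuffer) -> (cvTexture: CVMetalTexture, texture: MTLTexture)? {
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)

        var cvTextureOut: CVMetalTexture?
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, textureCache, pixelBuffer, nil,
            .rgba16Float,          // matches the half float pixel buffer format
            width, height,
            0,                     // plane index; the RGBA buffer has a single plane
            &cvTextureOut)

        guard status == kCVReturnSuccess,
              let cvTexture = cvTextureOut,
              let texture = CVMetalTextureGetTexture(cvTexture) else { return nil }
        return (cvTexture, texture)
    }
}
```

One way to honor the lifetime requirement is to capture the returned `cvTexture` in the command buffer's completion handler, so it is released only after the GPU has finished with the frame.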
And that's all, folks! To review, we explored a number of workflows that will render your HDR video media to EDR, for playback, editing, or image processing.
You learned how to go from AVPlayer to AVKit's AVPlayerViewController for playback of HDR media. You also learned how to use AVPlayer, along with AVPlayerLayer, to display HDR media in your own view. And finally, we explored how to add real-time effects during playback, connecting AVFoundation's AVPlayer to Core Video and then to Metal for rendering, and applying real-time effects using Core Image filters as well as Metal shaders.
If you want to dig deeper, I recommend a few WWDC sessions related to creating video workflows, as well as integrating HDR media with EDR. I especially want to call out the session "Edit and play back HDR video with AVFoundation". This session explores use of AVVideoComposition with `applyingCIFiltersWithHandler` for applying effects to your HDR media. In this session you'll also learn how to use a custom compositor, which can then be used with a CVPixelBuffer when each video frame becomes available for processing. As I mentioned at the beginning, this year we're also presenting two other sessions on EDR: "Explore EDR on iOS," where we announced that EDR API support has expanded to include iOS, and "Display EDR content with Core Image, Metal, and SwiftUI," where we further explore integrating EDR with other media frameworks. I hope you incorporate HDR video into your EDR-enabled applications on both macOS and now iOS. Thanks for watching.