Metal 2 on A11 - Imageblock Sample Coverage Control
Imageblock sample coverage control provides access to multisample tracking data within a tile shader, enabling development of custom MSAA resolve algorithms and more. Understand how the A11 GPU tracks unique samples, then explore an example that optimizes rendering of dense geometry through surface aggregation.
Building upon Imageblocks and tile shading,
Imageblock Sample Coverage Control gives you
new opportunities to optimize your multisampled render passes
and is designed for the enhanced multisampling hardware
in the A11 GPU.
Imageblock Sample Coverage Control
is the third in a series of presentations
focused on new Metal 2 features on A11.
Before we dive into the enhanced multisampling features
in Metal 2,
let's have a quick refresher on Multisample Antialiasing.
Multisample Antialiasing, or MSAA,
is a technique used to improve the appearance
of primitive edges by representing each pixel
with multiple depth and color samples.
First, let's take a look at how a GPU renders a triangle.
Here's a four-by-four grid of pixels
representing your color attachment.
The GPU's rasterizer samples the center of a pixel
to determine if it is covered by a primitive.
Let's bring in a triangle.
All the pixel centers covered by the primitive
are colored red.
If each pixel only has a single sample position,
a pixel is classified as covered or not covered
based only on the pixel center.
You can see in the resulting image
the classic symptom of edge aliasing:
the staircase artifacts also known as jaggies.
Let's see the same triangle
rendered with multisample antialiasing.
For this example, each pixel
has four evenly distributed sampling positions.
With four samples per pixel,
the GPU's rasterizer can determine finer grained coverage
for the primitive.
When a multisampled attachment is resolved,
the GPU will average the values of the samples
to determine the final color of each pixel.
This results in a smoother edge
and reduces the appearance of the staircase effect.
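To make the coverage idea concrete, here is a minimal Python sketch (not Metal code). It tests four sample positions inside each pixel against a triangle and then averages the samples. The sample offsets, the triangle, and the function names are hypothetical choices for illustration. Intensities are single grayscale values and coordinates are y-up. Real hardware sample patterns differ.

```python
# Hypothetical 4x sample pattern: offsets inside a unit pixel (y-up coordinates).
SAMPLE_OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
CENTER_ONLY = [(0.5, 0.5)]  # single-sample rasterization: pixel center only

# Counter-clockwise example triangle in pixel units.
TRI = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]


def edge(a, b, p):
    """Edge function: >= 0 when p is on or to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri, p):
    """True if point p lies inside or on a counter-clockwise triangle."""
    a, b, c = tri
    return edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0


def coverage_mask(tri, px, py, offsets):
    """Bit i is set when sample i of pixel (px, py) is covered by the triangle."""
    mask = 0
    for i, (dx, dy) in enumerate(offsets):
        if covers(tri, (px + dx, py + dy)):
            mask |= 1 << i
    return mask


def resolve_pixel(tri, px, py, tri_value, clear_value, offsets=SAMPLE_OFFSETS):
    """Covered samples take tri_value, the rest keep clear_value; then average them."""
    mask = coverage_mask(tri, px, py, offsets)
    samples = [tri_value if (mask >> i) & 1 else clear_value for i in range(len(offsets))]
    return sum(samples) / len(samples)


print(resolve_pixel(TRI, 2, 1, 1.0, 0.0, CENTER_ONLY))  # 0.0: center sample misses
print(resolve_pixel(TRI, 2, 1, 1.0, 0.0))               # 0.25: one of four samples covered
```

With a single sample at the pixel center, pixel (2, 1) misses the triangle entirely. With four samples, one sample is covered, so the pixel resolves to a quarter of the triangle's intensity. That partial value is what softens the staircase.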
In traditional multisampling implementations,
the tradeoffs for the improved edge appearance
are the computational overhead of blending each sample,
the larger memory footprint
of multisampled attachment textures,
and the higher memory bandwidth
to store and resolve multiple samples per pixel.
Apple's A-Series GPUs
have a very efficient MSAA implementation
that directly addresses these tradeoffs.
The hardware tracks
whether each pixel contains a primitive edge
so that your blending executes per sample
only for the pixels that have different sample values.
With Metal on A-series GPUs, you can eliminate
the extra memory storage requirements
by using memoryless render targets
for the multisampled attachments.
The full sample data
only exists temporarily in tile memory.
By using Metal's multisample resolve store action,
you avoid incurring any additional
system memory bandwidth
by directly resolving from tile memory
to the resolve attachment.
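Back-of-the-envelope arithmetic shows the size of these savings. The Python sketch below compares two paths. The first is a simplified traditional path, where every color sample is stored to system memory and then read back by a separate resolve. The second uses a memoryless multisampled attachment that is resolved straight from tile memory.

The model counts only one color attachment, at a hypothetical 1920 x 1080 resolution with 4 bytes per sample. It ignores depth, compression, and overdraw, so treat the numbers as illustrative.

```python
def msaa_memory_cost(width, height, bytes_per_sample, sample_count, memoryless):
    """Estimate system-memory allocation and end-of-pass traffic for one color attachment.

    Simplified model: ignores depth/stencil, framebuffer compression, and overdraw.
    """
    resolved = width * height * bytes_per_sample   # resolve attachment
    multisampled = resolved * sample_count         # full sample data
    if memoryless:
        # Samples live only in tile memory; only the resolved image is written out.
        return {"allocated": resolved, "traffic": resolved}
    # Store every sample, read them back in a separate resolve, write the result.
    return {
        "allocated": multisampled + resolved,
        "traffic": multisampled + multisampled + resolved,
    }


print(msaa_memory_cost(1920, 1080, 4, 4, memoryless=False))
# {'allocated': 41472000, 'traffic': 74649600}
print(msaa_memory_cost(1920, 1080, 4, 4, memoryless=True))
# {'allocated': 8294400, 'traffic': 8294400}
```

In this model, the memoryless path allocates one fifth of the memory and moves one ninth of the bytes.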
In addition, Metal 2 introduced Programmable Sample Positions
to allow you to choose the sample locations
and control your sampling pattern.
With Metal 2 on A11,
we made multisampling even more efficient at blending.
While current A-series GPUs
already track whether edges intersect each pixel,
the A11 GPU extends this tracking
to an even finer granularity by tracking
the number of unique samples within each pixel.
Even without using any other Metal 2 features,
your existing multisampled applications
will blend more efficiently
with the A11 GPU's enhanced multisampling hardware.
Leveraging the flexibility of other Metal 2 features
such as Imageblocks and tile shading,
Imageblock Sample Coverage Control gives you access
to each pixel's sample coverage tracking data
for even more control
of your multisampled render passes.
With Imageblock Sample Coverage Control,
your tile pipelines can resolve sample data
at any time in a render pass
using your own custom resolve algorithms.
To understand the additional flexibility
provided by Imageblock Sample Coverage Control,
you first need to understand how edge tracking works
in the A-Series GPUs.
Current A-series GPUs rasterize your scene in tiles,
and each tile contains metadata
that tracks whether a pixel contains a primitive edge.
In this image, the red pixels contain primitive edges,
and the white pixels do not.
The number of edge-containing pixels increases
with the density of your scene's geometry.
When a pixel contains a primitive edge,
that means it has multiple unique sample values.
For this reason, the blend equation executes per sample
for edge-containing pixels.
Pixels that do not contain an edge need to blend only once.
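A small Python model makes the cost of per-pixel edge tracking visible. It counts blend-equation executions for a single 4x pixel as a list of translucent primitives is drawn. Each primitive is described only by a hypothetical coverage mask, and the model assumes every primitive contributes one flat color to the samples it covers.

Once a partial-coverage primitive introduces an edge, the model falls back to blending each covered sample. This matches the behavior described above for pixels that contain an edge.

```python
NUM_SAMPLES = 4
FULL_MASK = (1 << NUM_SAMPLES) - 1  # 0b1111: every sample covered


def popcount(mask):
    """Number of set bits, i.e. covered samples."""
    return bin(mask).count("1")


def count_blends(coverage_masks, track_edges):
    """Count blend executions for one pixel over a sequence of translucent primitives.

    track_edges=False models plain per-sample blending.
    track_edges=True models per-pixel edge tracking: a pixel without an edge
    blends once when fully covered; a pixel with an edge blends per covered sample.
    """
    has_edge = False  # the pixel starts as a single clear color
    blends = 0
    for mask in coverage_masks:
        if track_edges and not has_edge and mask == FULL_MASK:
            blends += 1
        else:
            blends += popcount(mask)
        if mask != FULL_MASK:
            has_edge = True  # partial coverage leaves different values in the samples
    return blends


scene = [FULL_MASK, FULL_MASK, 0b0011, FULL_MASK, FULL_MASK]
print(count_blends(scene, track_edges=False))  # 18
print(count_blends(scene, track_edges=True))   # 12
```

The last two primitives still cost four blends each, even though the pixel holds only two distinct values. That is the gap the A11's finer-grained tracking addresses.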
The A11 GPU tracks edges with a finer granularity.
Pixels that contain edges
often only have a few unique samples.
The A11 GPU's enhanced multisampling
tracks and blends only the unique samples.
In the Metal Shading Language,
we give a name to these unique samples: colors.
So let's take a look at how the A11 GPU
improves multisampled blending performance.
For pixels that contain a primitive edge,
the A11 GPU will track
the number of unique colors in that pixel.
As primitives intersect or completely cover a pixel,
the number of colors contained in that pixel
will grow and shrink.
The A11 GPU tracks these transitions automatically.
In the diagram on the right,
the pixel contains four samples, but only two colors.
When the A11 GPU
needs to blend the next primitive with this pixel,
there are only two unique samples to blend.
And for advanced rendering algorithms
that use programmable blending, the savings can be significant.
So let's follow a pixel
through a few of these color tracking transitions.
Initially, each pixel contains a single color
representing all samples.
This would be your clear color when a render pass begins.
If a primitive edge cuts through a pixel,
the A11 GPU will create a new unique color
and assign the covered samples to the new color.
The two samples covered by this green triangle
are assigned to the new color one.
The samples that are not covered
are still assigned to the color zero.
Let's say the next primitive that intersects this pixel
is a red translucent triangle.
The red triangle covers three samples.
Current A-Series GPUs would blend each
of the three covered samples.
The A11 GPU blends only twice
because two of the covered samples
share the same color index.
In this case, color one becomes a blend of green and red.
The covered sample that was still color zero
becomes a new, unique color at index two,
a blend of red and the clear color.
The number of unique colors in a pixel can grow
when a primitive cuts through a pixel.
But there are times when the hardware will reduce
the number of unique colors.
Here, an opaque, non-blending triangle
entirely covers the pixel.
All four samples are completely replaced by blue,
so the A11 GPU will merge the three colors back to one,
since all the blue samples
can be represented by a single color again.
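The walkthrough above can be replayed with a toy Python model of unique-color tracking. Each pixel keeps a list of unique colors plus a color index per sample. The model follows three rules:

- A primitive blends once per unique color it touches.
- A color is split when the primitive covers only some of that color's samples.
- As in the opaque blue example, the pixel collapses back to one color when a primitive opaquely covers every sample.

This is a sketch of the behavior described in this talk, not the A11's actual storage format. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

NUM_SAMPLES = 4
FULL_MASK = (1 << NUM_SAMPLES) - 1


def blend(src, dst, alpha):
    """Source-over blend of RGB tuples: alpha * src + (1 - alpha) * dst."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


@dataclass
class TrackedPixel:
    """Toy model of unique-color tracking for one 4x pixel (not the hardware format)."""
    colors: list  # unique color values, indexed by color index
    sample_color: list = field(default_factory=lambda: [0] * NUM_SAMPLES)

    def color_mask(self, c):
        """Bit mask of the samples currently assigned to color index c."""
        return sum(1 << s for s, idx in enumerate(self.sample_color) if idx == c)

    def draw(self, mask, src, alpha):
        """Apply one primitive covering `mask`; return the number of blends performed."""
        if alpha == 1.0 and mask == FULL_MASK:
            # Opaque and fully covering: every sample gets src, so merge to one color.
            self.colors = [src]
            self.sample_color = [0] * NUM_SAMPLES
            return 1
        blends = 0
        for c in range(len(self.colors)):  # only colors that existed before this draw
            owned = self.color_mask(c)
            hit = owned & mask
            if not hit:
                continue
            blends += 1  # one blend per unique color touched, not per sample
            value = blend(src, self.colors[c], alpha)
            if hit == owned:
                self.colors[c] = value  # all samples of this color covered: update in place
            else:
                self.colors.append(value)  # partially covered: split off a new color
                new_index = len(self.colors) - 1
                for s in range(NUM_SAMPLES):
                    if (hit >> s) & 1:
                        self.sample_color[s] = new_index
        return blends


# Replay the walkthrough: clear, green edge, translucent red, opaque blue.
p = TrackedPixel(colors=[(0.0, 0.0, 0.0)])
print(p.draw(0b0011, (0.0, 1.0, 0.0), alpha=1.0), p.sample_color)  # 1 [1, 1, 0, 0]
print(p.draw(0b0111, (1.0, 0.0, 0.0), alpha=0.5), p.sample_color)  # 2 [1, 1, 2, 0]
print(p.draw(FULL_MASK, (0.0, 0.0, 1.0), alpha=1.0), p.colors)     # 1 [(0.0, 0.0, 1.0)]
```

The red draw costs two blends rather than three, because samples 0 and 1 share color one.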
The enhanced multisampling hardware in the A11 GPU
is so powerful that we extended
the Metal Shading Language
to give you explicit control over sample coverage
with Imageblock Sample Coverage Control.
With this new feature,
tile pipelines have the capability
to resolve sample data in place in the middle of a render pass
by changing the color coverage of the pixel.
And since you write the kernel in Metal Shading Language,
that means you can write your own custom resolve filters.
Let's go through a simple example.
First, we have a kernel
that has an imageblock argument.
Next, we query the number of colors
at a given coordinate in the imageblock.
For a render pass with four samples per pixel,
the value returned can be one, two, three, or four,
depending on how many unique colors are at that pixel.
A multisampled imageblock can return imageblock data
for each sample or color.
In this example, we will get the imageblock data for color c.
Since this example is looping over
the number of unique colors,
we also need to consider the number of samples
that are covered by each color.
We do this by getting the coverage mask
for this color index
and calling popcount
to get the number of set bits in the mask,
then weighting the color by that count
as we add it to the running total.
Next, we finish resolving our color
by dividing by the number of samples per pixel
and writing the resolved value back to the imageblock
with a full sample mask.
By writing a single value with a full sample mask,
the A11 GPU will merge all the sample data
back to a single color.
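The resolve steps above translate almost line for line into the Python sketch below. It operates on one pixel, described as a list of unique colors and the coverage mask of each color.

The Python names are hypothetical stand-ins for the imageblock queries the kernel makes. The actual Metal Shading Language functions for querying colors and coverage masks, and for writing with a sample mask, are documented in the Metal Shading Language Specification.

```python
NUM_SAMPLES = 4
FULL_MASK = (1 << NUM_SAMPLES) - 1


def popcount(mask):
    """Number of set bits in a coverage mask."""
    return bin(mask).count("1")


def resolve_pixel(colors, coverage_masks):
    """Box-filter resolve of one pixel, following the kernel steps in the talk.

    colors[c]: RGB tuple for unique color index c.
    coverage_masks[c]: bit mask of the samples that use color c.
    Returns the pixel's (colors, coverage_masks) after writing with a full mask.
    """
    num_colors = len(colors)  # stand-in for querying the number of colors
    total = [0.0, 0.0, 0.0]
    for c in range(num_colors):
        weight = popcount(coverage_masks[c])  # how many samples this color represents
        for i in range(3):
            total[i] += colors[c][i] * weight
    resolved = tuple(t / NUM_SAMPLES for t in total)  # divide by samples per pixel
    # Writing one value with a full sample mask merges the pixel back to one color.
    return [resolved], [FULL_MASK]


# A pixel with three unique colors covering 2, 1, and 1 of its four samples.
print(resolve_pixel([(0.5, 0.5, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.0)],
                    [0b0011, 0b0100, 0b1000]))
# ([(0.375, 0.25, 0.0)], [15])
```

The popcount weighting matters. Averaging the three unique colors without weights would treat the two-sample color as if it covered only one sample.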
Now this is an example of a basic resolve,
but since it is a tile pipeline,
you can write a kernel to resolve your sample data
in a way that best fits your application.
So you just saw an example of writing a custom resolve filter.
Let's discuss another reason to use tile shading
to resolve sample data.
Now, some applications render complex scenes
with lots of opaque geometry and lots of translucent geometry.
While the A11 GPU will do its best
to blend only the unique colors for each pixel,
if you know that your scene has a lot of blended geometry
with a large amount of overdraw,
you may want to resolve your sample data
with a tile pipeline before the heavy blending phase.
With Imageblock Sample Coverage Control,
you can resolve the sample data with a tile pipeline
after rendering your opaque geometry
to ensure that all pixels contain a single unique color
prior to blending.
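One way to see why resolving before the blending phase can pay off is to compare both orders for one pixel. The Python sketch below makes these assumptions:

- Each sample holds a scalar intensity.
- Each translucent layer covers the whole pixel with a constant alpha below 1, so it never splits or merges a color.
- Blending costs one operation per unique color per layer, following the A11 behavior described earlier.

Function names and values are hypothetical.

```python
def source_over(src, dst, alpha):
    """Linear source-over blend: alpha * src + (1 - alpha) * dst."""
    return alpha * src + (1.0 - alpha) * dst


def blend_then_resolve(samples, layers):
    """Blend every layer into the multisampled pixel, then average the samples."""
    values = list(samples)
    for src, alpha in layers:
        values = [source_over(src, v, alpha) for v in values]
    # With alpha < 1 and full coverage, distinct colors stay distinct,
    # so each layer costs one blend per unique starting color.
    blends = len(set(samples)) * len(layers)
    return sum(values) / len(values), blends


def resolve_then_blend(samples, layers):
    """Average the samples first (one color left), then blend each layer once."""
    value = sum(samples) / len(samples)
    for src, alpha in layers:
        value = source_over(src, value, alpha)
    return value, len(layers)


edge_pixel = [0.0, 1.0, 1.0, 0.0]   # two unique colors after the opaque phase
layers = [(1.0, 0.5), (0.0, 0.25)]  # (layer color, alpha), each fully covering
print(blend_then_resolve(edge_pixel, layers))  # (0.5625, 4)
print(resolve_then_blend(edge_pixel, layers))  # (0.5625, 2)
```

The two orders agree here because source-over with a constant alpha is linear in the destination, so averaging commutes with blending. A nonlinear programmable blend would generally give a different result once the samples are averaged. Weigh that when deciding where in the render pass to dispatch the resolve.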
Let's look at a more advanced example
of using a tile pipeline
to change the coverage of the sample data.
Since the tile pipeline can be implemented
with a compute function,
you can do much more than simply average values together.
Our Surface Aggregation sample app starts
with a multisampled single-pass deferred shading algorithm
and uses a tile-based kernel dispatch
to reduce the number of shaded samples
in the deferred pass.
The goal of this algorithm is to shade fewer samples
in the expensive deferred pass
while retaining the edge-smoothing benefits
of multisample antialiasing.
We won't dive into all the details of the algorithm,
so be sure to download and explore
our Surface Aggregation sample app.
But now let's visualize how this technique
reduces the cost of shading.
The two images visualize the pixels
containing more than one unique sample per pixel in the g-buffer.
The image on the left shows the g-buffer
before merging surfaces,
and the image on the right
shows the g-buffer after merging surfaces.
The surface aggregation kernel is able to reduce
the number of g-buffer samples
that need to be shaded.
As you can see in the right image,
the only pixels containing multiple unique samples
are on true creases and depth boundaries.
Before Metal 2 on the A11 GPU,
this algorithm would require
a separate render pass for each phase of the algorithm,
incurring multiple round trips to system memory.
But with Imageblock Sample Coverage Control,
all three phases of the algorithm
can be merged into one render pass,
saving your app a ton of memory bandwidth.
As you can see in the diagram,
all three phases operate on the imageblock
keeping all the working data inside tile memory.
First, you will render the g-buffer
to the imageblock in tile memory.
And next, you will dispatch
the surface aggregation tile pipeline
to reduce the number of g-buffer samples
into fewer aggregate g-buffer samples.
Finally, the deferred shading pass
shades only the aggregate samples.
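The rough shape of the aggregation phase can be sketched in Python. The talk does not spell out how the sample app decides which g-buffer samples to merge. The greedy clustering below merges samples whose depth and normal are close to a group's first sample. It and its thresholds are hypothetical stand-ins, not the sample app's algorithm.

Each aggregate keeps an averaged depth and normal plus the coverage mask of the samples it absorbed. The deferred pass then shades one value per aggregate.

```python
import math

# Hypothetical merge thresholds; the sample app's real criteria may differ.
DEPTH_EPSILON = 0.01      # max depth difference for samples on the same surface
MIN_NORMAL_COSINE = 0.95  # min cosine between normals on the same surface


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def aggregate_pixel(samples):
    """Greedily merge one pixel's g-buffer samples into aggregate surfaces.

    samples: list of (depth, unit_normal) tuples, one per MSAA sample.
    Returns a list of (depth, normal, coverage_mask) aggregates.
    """
    groups = []  # each group: (seed_sample, list of member sample indices)
    for i, (depth, normal) in enumerate(samples):
        for (seed_depth, seed_normal), members in groups:
            if (abs(depth - seed_depth) < DEPTH_EPSILON
                    and dot(normal, seed_normal) > MIN_NORMAL_COSINE):
                members.append(i)
                break
        else:  # no existing surface matched: start a new aggregate
            groups.append(((depth, normal), [i]))

    aggregates = []
    for _, members in groups:
        depth = sum(samples[i][0] for i in members) / len(members)
        normal = normalize(tuple(sum(samples[i][1][k] for i in members) for k in range(3)))
        mask = sum(1 << i for i in members)
        aggregates.append((depth, normal, mask))
    return aggregates


def shaded_invocations(pixels):
    """Deferred-shading cost after aggregation: one invocation per aggregate."""
    return sum(len(aggregate_pixel(p)) for p in pixels)


UP, SIDE = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
flat = [(0.500, UP), (0.501, UP), (0.502, UP), (0.503, UP)]  # tessellated flat surface
depth_edge = [(0.5, UP), (0.5, UP), (0.9, UP), (0.9, UP)]    # depth boundary
crease = [(0.5, UP), (0.5, UP), (0.5, SIDE), (0.5, SIDE)]    # crease between faces
print(shaded_invocations([flat, depth_edge, crease]))  # 5 (versus 12 per-sample shades)
```

As in the right-hand visualization, the flat pixel merges to one aggregate. Only the depth boundary and the crease keep more than one surface.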
If you're interested in learning more about this technique,
please visit the link at the end of this presentation
to download the sample app.
To recap, we first talked about the hardware enhancements
made to multisampling in the A11 GPU.
The A11 GPU tracks the number of unique samples
in every pixel to reduce the cost of blending.
This optimization applies to both API blending
as well as programmable blending.
We then discussed the enhancements
in Metal 2 for the A11 GPU
that expose this powerful hardware feature
to kernels used in tile shading.
With Imageblock Sample Coverage Control,
you can write your own custom resolve kernels
and dispatch them at any time in a render pass
to implement powerful new optimizations.
Together, the enhanced multisampling in the A11 GPU
and the new shading language features in Metal 2
enable new techniques to keep your data on-chip longer.
You can use this feature
to implement algorithms like Surface Aggregation
in a single render pass.
For more information about Metal 2
and links to the sample code,
please visit the developer website.
Thank you for watching.