Building on the fundamentals, dive into the specifics of constructing games and graphics apps with Metal. Learn about scene management and understand how to manage and update Metal resources. Understand the rendering loop, command encoding, and multi-thread synchronization.
I hope you're all having a good time so far
and you've seen some nice sessions.
We've got a great week for you guys.
It's going to be really fun.
I'm Matt Collins.
This is my colleague Jared Marsau and we're here
to talk Adopting Metal, Part 2.
This is Section 603.
So if you're in the wrong place, you get to see some graphics.
So let's recap.
We have two Adopting Metal Sessions.
Hopefully you were here
for Warren's presentation a little bit ago, where we talked
about the fundamental concepts: Basic drawing, lighting,
texturing, good stuff like that.
And in this presentation we're going
to take it to the next level.
We're going to draw many objects.
We're going to talk about managing dynamic data,
large amounts of dynamic data, GPU-CPU synchronization,
and we'll cap it off with some multithreaded encoding.
Tomorrow we've got some great presentations.
We'll talk about what's new in Metal.
We'll have the first session, tessellation, resource heaps,
memoryless frame buffers, and some stuff
about our improved tools
to really help you guys get the best out of your apps.
Part 2, we'll talk about function specialization
and function resource read-writes, wide color
and texture assets, and additions
to Metal performance shaders.
And if you really want to dig in heavy,
we'll have an awesome talk about advanced shader optimization,
shader performance fundamentals, tuning shader code,
more detail about how the hardware works.
It'll be great.
So if you're really interested in tuning your shaders
to make them the best they can be, check out that talk tomorrow.
So this is Part 2 of Adopting Metal and we're going to build
on what we learned in Part 1.
We figured out how to get up and running.
So let's take a look at the concepts that you need
to get the most out of Metal in a real-world situation.
We've got a demo that will draw a ton of stuff in a simple scene
and we'll use that demo for context during today's session
as we discuss and learn a couple lessons from it.
We'll talk about the ideal organization and flow of your data,
how to manage large chunks of dynamic data, the importance
of synchronization between the CPU and the GPU, and,
like I said before, some multithreaded encoding.
So hopefully you're familiar with the fundamentals of Metal
because we won't be going over them again.
So we expect that you understand how to create a Metal queue,
a Metal command buffer, how to encode commands, and we'll build
on that to go forward.
So let's start with the demo itself
and see what we're aiming towards.
So right now we've got 10,000 cubes
and they're all spinning around, floating in space.
It's an interesting scene.
Metal allows us to issue a ton of draw calls
with very low overhead.
So here we have 10,000 cubes and 10,000 draw calls.
You can see on the bottom there's a little shadow.
We're using a shadow map projected on the bottom,
some nice anti-aliased lines give you some depth cues,
and of course all of our cubes.
So what goes into rendering a scene like this?
As you can see, we've got a lot of objects and each
of these objects has its own associated piece of unique data.
We need the position, rotation, and color.
And this has to update every frame
because we're animating them.
So this is a bunch of data that we're constantly changing,
constantly have to reinform the GPU what we're drawing.
We can also draw a few more objects, maybe a little more.
You can spin it around a little bit and see
that we're actually floating in space.
So we have a draw call per cube and a bunch of data per cube,
and we have to think about the best way to organize this data,
how to manage it,
and how to communicate it to the GPU.
So let's dive right in.
Managing Dynamic Data: This is a huge chunk
of data that's changing every frame.
And as you can imagine in a modern app like a game,
you also have a bunch of data
that every frame needs to be updated.
So our draw basically looks like this.
We want to go through all the objects we're interested
in drawing and update them.
Then we want to encode draw calls for every object
and then we have to submit all these GPU commands.
We have a lot of objects.
We started at 10,000 and we were cranking it
up to 100,000, 200,000.
Each of these objects has its own set of data and we have
to figure out the best way to update this.
Now in the past, you might've done something like this.
You push updated data to the GPU, maybe uniforms
or something, you bind a shader, some buffers,
some textures, and you draw.
And you push some more data up.
You bind shader, buffers, textures.
You draw your next object.
In our scene we repeat this 10,000, 20,000 times,
but we really want to get away from this sort of paradigm
and try something new.
What if we could just load all our data upfront
and have every command that we issue reference the data
that was already there.
The GPU is a massively powerful processor
and it does not like to wait.
So if all our data is already in place, we can just point the GPU
to it and it will go happily crunch away
and do all our rendering for us.
And each draw call we make then references the appropriate data
that's already there.
In our sample, it's very straightforward.
We have one draw that references one chunk of data.
So the first draw call references the first chunk
of data, the second, the second chunk, and so on.
But it doesn't have to be that way
and we can actually reuse data.
We have some data, like at the front here, frame data,
that we can reference from all our draw calls
or we could have a draw call that references two pieces
of data in different places.
If you're familiar with instancing,
it's a very similar idea.
All your data will be in place before you start rendering.
So how do we do this in Metal?
In our application, we create one single Metal buffer
and this is our constant buffer.
It holds all the data that we need to render our frame.
We want to create this upfront, outside of the rendering loop,
and reuse it every time we draw.
We don't duplicate any data.
Again, any draw call can reference any piece of data,
so there's no need for duplication.
Each draw call will reference an offset into the buffer.
It'll do a little bit of tracking to know
which draw represents which offset.
And then you'll just draw with everything
and everything will be in place.
Let's take a look at the code for this.
Here's the code from the app.
You can think of us as having two sets of data.
Like I mentioned before, there's a set of frame data
that will update here and there's a set of data
that will change per object.
This is the unique rotation position, et cetera.
So we need to put both sets of data in place.
Now what do I mean by per-frame data?
Well this is data that is consistent
across every draw call we make.
For example, in our sample we have a ViewProjection matrix.
It's a 4 by 4 matrix, very straightforward,
if you're familiar with graphics.
It represents the camera transform and the projection.
This is not going to change throughout our frame,
so we only need one copy of it.
And we'd like to reuse data as much as we can
so we can create one copy and put it into our buffer.
Let's start filling this out.
So here, we have our constant buffer,
which is just a Metal buffer we've created.
And with the Contents function, we have a pointer to it.
Our app has a helper function, which is GetFrameData,
and this returns that main pass structure I just showed you
that has the view transform in it,
the ViewProjection transform.
And then we simply just copy this into the start
of our buffer and then we're in place.
So our buffer will look like this.
We'll have a MainPass with the appropriate data for our frame
and we'll put it at the start of our giant constant buffer.
So now we have all this empty space afterwards.
And like we saw, we need to do 10,000, 20,000 draw calls,
so we need to start filling this out with a ton of information.
So then we have a set of per-object data
and this is the unique data we need to draw a single object.
In our case, we have a single LocalToWorld transform,
which is the concatenation of the position and the rotation
and we have the color.
So this is the set of data we need per draw call.
So we'll walk through every object we want to render.
We'll keep track of the offset into the buffer.
We have our updateData utility function,
which will do our little update for our rotation,
and then we'll update the offset.
This will pack our data tightly
and we'll fill it out as we go through.
Let's take a closer look at what updateData looks like.
It's quite simple.
Now, animation is kind of out of the scope of this talk,
so I have a little helper function here that's
updateAnimation with a deltaTime.
This could be whatever you want in your own application,
depending on what sort of animation you need.
But in my case it returns an objectData object
which has the LocalToWorld transform and the color.
And just as I did before, I copy it into my constant buffer.
So here's what that looks like.
I've got my frame data in place.
I have my other data, another piece, and another piece.
So all our data is in place and we're ready for rendering.
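The packing logic above can be sketched in plain Swift. This isn't the sample's actual code: the FrameData and ObjectData structs here are simplified stand-ins, and plain allocated memory stands in for the pointer you'd get from the Metal buffer's contents function.

```swift
import Foundation

// Simplified stand-ins for the sample's per-frame and per-object structs.
struct FrameData  { var time: Float; var aspect: Float }      // would hold the ViewProjection matrix
struct ObjectData { var rotation: Float; var color: Float }   // would hold LocalToWorld + color

let objectCount = 3
let bufferSize = MemoryLayout<FrameData>.stride + objectCount * MemoryLayout<ObjectData>.stride

// In the real app this pointer would come from the constant buffer's
// contents function; here we allocate plain memory to show just the packing.
let contents = UnsafeMutableRawPointer.allocate(
    byteCount: bufferSize, alignment: MemoryLayout<FrameData>.alignment)

// 1. The per-frame data goes at the very start of the buffer.
contents.storeBytes(of: FrameData(time: 1.0, aspect: 1.6), as: FrameData.self)

// 2. Per-object data is packed tightly after it, tracking the offset as we go.
var offset = MemoryLayout<FrameData>.stride
for i in 0..<objectCount {
    let object = ObjectData(rotation: 0.1 * Float(i), color: Float(i))
    contents.storeBytes(of: object, toByteOffset: offset, as: ObjectData.self)
    offset += MemoryLayout<ObjectData>.stride   // each draw call references its own offset
}

// A draw call for object 2 would reference this offset into the buffer.
let offset2 = MemoryLayout<FrameData>.stride + 2 * MemoryLayout<ObjectData>.stride
let object2 = contents.load(fromByteOffset: offset2, as: ObjectData.self)
print(object2.color)   // 2.0
contents.deallocate()
```

The same idea scales to any number of objects: one allocation up front, and every draw just points at its slice.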
But are we missing anything?
Turns out that we are and I want to bring your attention to this.
We have one constant buffer.
I mentioned I created one Metal buffer and I was reusing it.
Now there's a problem with this.
The CPU and the GPU are actually two unique parallel processors.
They can read and write the same memory at the same time.
So what happens when you have something reading from a piece
of memory while something else is writing to it?
So it looks a little like this.
The CPU prepares a frame and writes it to a buffer.
The GPU starts working on this and reads from the buffer.
The CPU doesn't know anything about this,
so it decides I'm going to prepare the next frame
and it starts overwriting the same data.
And now our results are undefined.
We don't actually know what we're reading from or writing to,
or what the data state will be.
So it's important to realize in Metal,
this is not handled for you implicitly.
The CPU and GPU can write the same data
at the same time however they'd like.
You must synchronize access yourself.
It's just like writing CPU code that's multithreaded.
You have to ensure you're not stomping yourself.
And that brings us to CPU-GPU synchronization.
Let's start simple.
The easiest way to do this would just be to wait
after you've submitted commands to the GPU.
Your CPU draw function does all of its work,
submits the commands, and then just sits there
until it's ensured the GPU is done working.
That way we know we won't ever overwrite it
because the GPU will be idle by the time we try
to generate our next frame.
This won't be fast but it's safe.
So we need some sort of mechanism for the GPU
to let us know, hey, I'm done with this, go do your thing.
Metal provides this in the form of callbacks.
We call them handlers and there are two of them
that are interesting. There's addScheduledHandler,
which executes when a command buffer has been scheduled
to run on the GPU.
And for us, an even more interesting one is the
completion handler and this is called
when the GPU has finished executing a command buffer.
The command buffer is completely retired and we're ensured
at this point it's safe to modify whatever resources
that we were using there.
So this is perfect.
We just need some way to signal ourselves that, hey,
we're done, we can go forward.
Now how many of you are familiar with the concept of a semaphore?
Anyone? Pretty good.
Quick background on semaphores.
They're a synchronization primitive and they're used
to control access to a limited resource
and that fits us perfectly here.
We have one constant buffer and that's a limited resource,
so we'll have a semaphore
and we'll create it with a value of 1.
The count on a semaphore represents how many resources
we're trying to protect.
So we'll create our semaphore.
And again, this is something
that should be created outside of your render loop.
And the first thing we do once we start
to draw is we wait on the semaphore.
Now with Apple's semaphores, we call it waiting.
Some people call this taking.
Some people call it downing.
It doesn't really matter.
The idea is that you wait on it and our timeout we set
to distant future,
which effectively means we'll wait forever.
Our thread will go to sleep if there's nothing available
and wait for something to do.
When we're done,
in our completion handler we will signal the semaphore.
That'll tell us that it's safe to modify the resources again.
We're completely done with it and we can go forward.
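Here's a minimal sketch of that wait/signal handshake, with a serial dispatch queue standing in for the GPU. The real code would add a completion handler to the command buffer; here a closure on the fake-GPU queue plays that role.

```swift
import Dispatch

// One constant buffer to protect, so the semaphore starts at 1.
let semaphore = DispatchSemaphore(value: 1)
let fakeGPU = DispatchQueue(label: "fake-gpu")   // stand-in for command buffer execution
var framesCompleted = 0

for frame in 0..<3 {
    semaphore.wait()            // sleep until the constant buffer is free
    // ... write this frame's data into the constant buffer, encode, commit ...
    fakeGPU.async {
        // This block plays the role of the completion handler: it runs
        // once the "GPU" has fully retired this frame's commands.
        framesCompleted = frame + 1
        semaphore.signal()      // now the buffer is safe to modify again
    }
}

semaphore.wait()                // drain: wait for the last frame to finish
print(framesCompleted)          // 3
```

Each iteration can't start writing until the previous frame has signaled, which is exactly the safe-but-slow lockstep described above.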
So this is sort of a naive approach to synchronization
but it looks a little like this.
Frame 0 we'll write into the buffer.
And on the GPU, we'll read from the buffer.
The CPU will wait.
When the GPU is done processing frame 0,
it will fire the completion handler, the CPU will wake up,
and it will create frame 1.
And that will process on the GPU and so on.
So this works but, as you can see here,
we have all these waits and both the CPU
and GPU are actually idle half the time.
It doesn't seem like a good use of our computing resources.
What we'd like to do is overlap the CPU and the GPU work.
That way we can actually leverage the parallelism that's
inherent in this system, but we still need
to somehow avoid stomping our data.
So we'd like our ideal workload to look like this.
Frame 0 would be prepared on the CPU, pushed to the GPU.
While the GPU is processing it, the CPU then gets
to work creating frame 1 and so on, and again.
So one thing to keep in mind here is
that the CPU is actually getting a little ahead of the GPU.
If you notice where frame 2 is on the CPU,
frame 0 is the only thing that's done on the GPU.
So we're a little bit ahead and I want you to keep
that in mind for a little later.
But first let's talk about our solution
in the demo and what we do here.
We'd like to overlap our CPU and GPU but we know we can't do it
with one constant buffer without waiting a lot.
So our solution is to create a pool of buffers.
So when we create a frame, we write into one buffer
and then our CPU proceeds
to create the next frame while writing into another buffer.
While it's doing this, the GPU is free to read from the buffer
that was produced before.
Now we don't have an infinite number of buffers
because we don't have infinite memory.
So our pool has to have a limit.
On our application, we've chosen three.
This is something that you need to decide for yourself.
We can't tell you what to do because there are a lot
of things that go into the latency consideration,
how much memory you want to use.
So we recommend you experiment with your app to find what fits for you.
For this example, we've chosen three.
So here, you can see we've exhausted our pool.
We have three frames that have been prepared
but only one is finished on the GPU.
So we need to wait a little bit.
But by now, frame 0 is done, so we can reuse the buffer
from the pool and so on.
So let's look at this in code.
Here's synchronizing access to constant buffers.
We've already got a semaphore and they're great
for controlling access to limited resources.
In this case our limit is three
but it can be whatever you'd like.
So here we create our semaphore with our count.
And instead of creating one constant buffer,
we now create an array of them.
And lastly, we need an index and we'll use this index
to represent the currently available constant buffer
for us to use.
We can walk through the array and wrap around
and the semaphore will control our access and protect us.
So in our draw function, we'll immediately wait
on the semaphore, and if there's nothing available,
we'll go to sleep.
Once we've taken the semaphore and proceeded, we know it's safe
for us to grab the current constant buffer.
Our current-constant-buffer index is tracking
which one's available.
Then we fill out our frame as normal, encode all our commands,
do all our updates, add the completion handler,
and then we'll signal the semaphore, saying, hey,
we're done with this frame.
You can go forward.
And the last thing we need to do is update the index.
We'll add one.
We'll use modulo to wrap around.
And don't worry, we don't have to worry
about overwriting ourselves
because the semaphore will protect us.
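The pool scheme might look like this in plain Swift. Arrays of floats stand in for the three Metal buffers, and a serial queue again stands in for the GPU retiring each frame; the counts and names are the sample's, but the code is a sketch, not the sample itself.

```swift
import Dispatch

let inflightBufferCount = 3
let semaphore = DispatchSemaphore(value: inflightBufferCount)
// Three "constant buffers" -- in the app these would be Metal buffers.
var constantBuffers = [[Float]](repeating: [0], count: inflightBufferCount)
var currentIndex = 0
let fakeGPU = DispatchQueue(label: "fake-gpu")

for frame in 0..<10 {
    semaphore.wait()                       // blocks only if all 3 are still in flight
    let bufferIndex = currentIndex
    constantBuffers[bufferIndex][0] = Float(frame)   // write this frame's data
    fakeGPU.async {
        semaphore.signal()                 // completion handler: buffer is reusable
    }
    currentIndex = (currentIndex + 1) % inflightBufferCount   // wrap around the pool
}

// Drain everything still in flight before looking at the results.
for _ in 0..<inflightBufferCount { semaphore.wait() }
print(constantBuffers.map { $0[0] })       // [9.0, 7.0, 8.0]
```

Frames 0, 3, 6, 9 landed in buffer 0, frames 1, 4, 7 in buffer 1, and so on: the CPU only ever stalls when it gets a full pool ahead of the GPU.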
So constant buffers in the demo.
The demo has an array of three buffers
and I've seen some applications track buffers
by marking them as, oh, this is being read
from in frame number 7, this is written
to in frame number 5.
But with this model you don't actually have to do that.
The semaphore takes care of all the synchronization for you.
And if you can take the semaphore, you're guaranteed
that the last frame that was using that was done,
otherwise you'd still be asleep.
So now all our data is in place and it's protected.
And we'd like to start issuing a bunch of draw calls
to get some stuff on the screen.
So here's the basic rendering loop for our demo.
We have two passes: One pass that draws a shadow map
and one pass that reads the shadow map, and we've decided
to split these into two separate command buffers.
There's a good reason for this.
It lets us have two encoding functions
that are independent and unique.
They don't depend on each other.
You encode the shadow pass.
You pass it the command buffer and the constant buffer
that you've already filled out and it encodes all the commands
to render the shadow map.
And then you have a separate encoding function
that encodes the main pass.
You pass it to mainCommandBuffer and the other data you need
and it encodes all those other commands.
When the encoding is all done, you call commit
on your two command buffers, push them off,
and then you've got your frame.
So what goes into actually encoding the drawing of one
of our cubes?
We need a bunch of data and not just the rotation data.
We need some geometric data for the cubes,
which is quite simple; you know, a cube is, what,
eight vertices, maybe an index buffer.
And in our sample, we don't really have complex materials
or anything, just some very simple Lambert shading.
So we could reuse that pipeline state object
across all of our cubes.
We mentioned the per-frame data earlier.
We need one copy of that.
So we'll update it.
Stick it in place.
And then of course we need the per-object data,
that LocalToWorld and the color information
that we're animating.
So when we issue our draw calls,
we want to make sure we reference the correct data.
So our encoder will produce commands,
put them into our command buffer,
draw call 0 will reference both the frame data and the object
that we're interested in.
Draw call 1, similarly, will reference the frame data
and the object 1 data and so on.
This way everything's in place.
We issue our calls and the GPU will start crunching away.
Now we have a ton of draw calls to issue.
You know, in our demo, it was minimal, 10,000,
and we want to issue these as efficiently as possible.
So we'd like to avoid doing redundant work.
We don't want to reset everything every draw.
Anything that's shared, geometry, pipeline states,
we'd like to set that once and leave that in place.
So avoid redundant state updates
and avoid redundant argument table updates.
It's also worth keeping in mind that the vertex
and fragment stage argument tables are completely separate.
You can bind a buffer to the vertex stage and not
to the fragment stage or vice-versa.
But if you have to bind everything to both stages,
this can potentially double the number of calls you make.
This is one reason we didn't use setVertexBytes in our example.
You can imagine we have 50,000 objects and we had
to make a copy of all that data twice, once for the vertex stage
and once for the fragment stage.
That would quickly get really big.
But if we kept it all in one buffer and just referenced it,
we wouldn't have to worry about that.
And the last guideline I want to point
out is using a new function, setVertexBufferOffset.
This merely changes the pointer into one of your buffers.
So you can see here when you call these,
they actually don't take a reference to a Metal buffer.
They take an offset and an index.
This is because you must have already set the buffer
to that specific point and this just changes the pointer
within it and that's perfect for what we want.
We have one constant buffer and we're just walking through it.
So we can set it once in the beginning
and then every time we draw, we call setVertexBufferOffset
and just point the next draw call
to the current spot in our buffer.
It looks a little something like this.
We bind this constant buffer
and then we call setVertexBufferOffset
with this offset.
Then we call it again striding it forward
and again striding it forward.
We're not changing the buffer that we've set to this index.
We're just changing the offset within that buffer.
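A sketch of the offset arithmetic behind those calls. The struct layouts are stand-ins for the sample's types, and the buffer index you'd pass alongside each offset is an assumption; the point is that every draw's offset is just frame-data size plus a per-object stride.

```swift
// A 4x4 float matrix: 64 bytes.
struct Float4x4 { var c0, c1, c2, c3: (Float, Float, Float, Float) }

// Stand-ins for the sample's per-frame and per-object structs.
struct FrameData  { var viewProjection: Float4x4 }                // 64 bytes
struct ObjectData {
    var localToWorld: Float4x4                                    // 64 bytes
    var color: (Float, Float, Float, Float)                       // +16 bytes
}

// The offset a given draw call points at: skip the frame data at the
// front of the buffer, then stride one tightly packed ObjectData per draw.
func objectOffset(_ drawIndex: Int) -> Int {
    return MemoryLayout<FrameData>.stride +
           drawIndex * MemoryLayout<ObjectData>.stride
}

print(objectOffset(0))   // 64
print(objectOffset(1))   // 144
```

Each draw would then call setVertexBufferOffset (and setFragmentBufferOffset) with objectOffset(i) for its index. One caveat worth checking on your own targets: some devices impose minimum alignment on bound buffer offsets, so consult the Metal feature tables before packing constants this tightly.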
With these guidelines in mind,
our encoding is actually pretty simple.
We have a bunch of data we can set up front.
The per-frame constants is pretty obvious
because we know we're not going to change it.
So we'll set that.
We'll set the constant buffer once because we know it has
to be in place for us to use the setVertexBufferOffset function.
We'll set the geometry buffer and the pipeline state
because we know they're shared across all of our cubes.
Then finally we can start looping
through all the objects we want to draw.
We'll set the offset into the constant buffer
for our current draw.
And then we'll actually issue the draw.
And here's the code from the encode main pass function
in the sample.
We'll start off by setting the vertex buffer
that is our geometry and the render pipeline state,
which is our litShadowedPipeline.
We'll set the constant buffer
so we can use setVertexBufferOffset later.
In this case we're setting it to both the vertex
and the fragment stages.
And then we'll set the per-frame data.
Now you'll notice here that I've set the constant buffer
to two separate indices with different offsets.
And Metal allows you to do this as much as you want.
You could set the same constant buffer to every index
at a different offset if you'd like, completely up to you.
And then we dive right into our loop.
We need to track the offset because we know
that we're not starting right at the beginning
of our constant buffer.
There's some frame data in there.
So the offset will be pushed back past the frame data.
Then we'll call setVertexBufferOffset
and setFragmentBufferOffset to point this draw
to the correct data that we want to draw with.
We'll issue the draw call
and then we'll set the offset again just striding one object
data struct at a time.
So our draws are in place.
This is still very linear.
And I promised you some multithreading
and Warren mentioned that, hey, you can actually encode a bunch
of stuff in parallel in Metal.
So how would you do this?
An ideal frame might look like this.
Our render thread is chugging along and it realizes, hey,
I need to render a shadow map and I need
to render a main pass.
It'd be great if I could encode this in parallel.
I've got multiple CPUs.
So what if I dispatch this work out, encoded some stuff,
then I rejoin back to the render thread
and the render thread pushed this over to the GPU
to do a bunch of work.
This would look great.
How many of you have used GCD?
This is a great fit for Grand Central Dispatch.
If you're not familiar,
Grand Central Dispatch is Apple's multiprocessing API.
This is an API that lets you create queues
and these queues manage computing resources
on your machine.
There are two types of queues you can create.
There's a serial queue.
When you dispatch work through a serial queue, you're guaranteed
that all that work will happen in order.
But what's more interesting for us is the concurrent queue.
When you dispatch work to the concurrent queue, GCD will look
at your system and figure out the best way
to schedule this for you.
And that's perfect.
We have two jobs we need to do in parallel.
So if we created this one queue and just pushed the work to it,
it would do that for us.
This is another object you want to create once and reuse.
So here's some code to create a concurrent dispatch queue.
You should always put a label on your queues.
I've used the very creative label queue here
but you might want to call it something else.
So we made some modifications to the code.
We still create the command buffers at the start.
But since we were smart enough to use two command buffers
and separate our encoding functions
into two unique things, there isn't much else for us
to do other than dispatch the work.
So dispatchQueue.async is the main call you use
to dispatch work to a queue in GCD.
This is an asynchronous call.
It'll push the work on and your thread will keep going.
So here we dispatch the shadow pass
and then we dispatch the main pass.
We'll want to commit this work somehow
so we call dispatch barrier sync and this makes sure
that all the work is done by the time we get to this point.
And then finally we've rejoined and we can commit our work.
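The dispatch pattern just described, sketched with stand-in encode work; encodeShadowPass and encodeMainPass in the comments refer to the sample's two encoding functions, and the queue label is made up.

```swift
import Dispatch

// A private concurrent queue: GCD decides how to schedule the two jobs.
let queue = DispatchQueue(label: "com.example.encode", attributes: .concurrent)

var shadowEncoded = false
var mainEncoded = false

// Dispatch both passes; async returns immediately and the render thread keeps going.
queue.async { shadowEncoded = true }   // would call encodeShadowPass(...)
queue.async { mainEncoded = true }     // would call encodeMainPass(...)

// The barrier blocks until every block dispatched above has finished,
// rejoining the render thread so it can commit the command buffers in order.
queue.sync(flags: .barrier) {}

print(shadowEncoded && mainEncoded)    // true
```

After the barrier returns, both encodes are done and it's safe to commit the shadow command buffer first, then the main one.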
Now the ordering is important here.
The shadow map has to be done by the time we reference it.
So we have to commit the shadow command buffer first
and then the main command buffer later.
There's something else I want to bring up here.
How many of you are familiar with the concept of a closure?
Great. How many of you have ever had an issue
where a closure captures self
and you thought you were referencing something else?
You can be honest.
It's happened to all of us.
I just wanted to call this out.
Closures capture self.
So if you're referencing a member variable or an iVar
within them and you're not explicitly saying self.iVar,
it's still actually going to reference that variable.
So if you want to make sure you're going
to reference the correct data, it's a good idea
to capture it outside and I'll show you what I mean
in a second.
These two things don't do the same thing.
So in the first one where I encode the shadow pass,
you can see the constant buffer I'm grabbing is dependent
on the current index.
I don't actually know what that will be at the time it executes.
This is really asynchronous programming.
So by the time my dispatch is actually running,
this could've changed behind my back.
It may be right but it may not be.
I can't guarantee it.
So keep that in mind and don't do it that way.
Instead, we'd like to capture a reference
to the constant buffer we're interested in.
So here we just say let constant buffer
and grab it out of the array.
But then when we issue our dispatch,
we reference the specific one that we've already grabbed.
That makes sure we know exactly what data we're reading from.
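The capture pitfall in miniature, with strings standing in for the Metal buffers:

```swift
var currentIndex = 0
let constantBuffers = ["buffer0", "buffer1", "buffer2"]

// Wrong: the closure captures the variable, so it reads currentIndex
// at the time it executes, not at the time it was created.
let wrongClosure = { constantBuffers[currentIndex] }

// Right: grab this frame's buffer first, then capture that value.
let constantBuffer = constantBuffers[currentIndex]
let rightClosure = { constantBuffer }

currentIndex = 2   // the render loop moves on before the closures run
print(wrongClosure())   // buffer2 -- not the buffer we meant
print(rightClosure())   // buffer0
```

The same applies to member variables: an implicit self.constantBuffers[self.currentIndex] inside the dispatch reads whatever the index is when the block finally runs.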
So this is some multithreading fun.
The actual code in the sample looks like this.
We capture the constant buffer.
And when we use it, we make sure we're using the correct one,
the one that we've captured already,
to know that we're using this frame's constant buffer.
Now I had mentioned the ordering earlier
and how this was important.
When you create a command buffer and you commit it, the ordering
that this executes on your GPU is implied
by the order you commit it in.
So if I commit the shadow command buffer first
and the main command buffer second, I'm guaranteed
that the shadow one will happen first on the GPU followed
by the main command buffer.
Sometimes we refer to this
as implicit command buffer ordering.
But you can be a little more explicit about it.
Metal provides an enqueue function
that enforces command buffer ordering.
If you have a set of command buffers, you can enqueue them
and you're guaranteed that they will execute
in that order regardless of how you commit them
or when you commit them.
This is something really cool because it allows you
to commit command buffers from multiple threads, in any order,
and you don't have to worry about it.
The runtime will ensure you're executing in the correct order.
So let's see how to apply this to our code.
A couple new additions here.
Now when we create our command buffers,
we immediately enqueue them in the order.
Again, the order matters, so we still have
to enqueue shadowCommandBuffer first
and then mainCommandBuffer second.
But now when we dispatch, we can actually commit
from within our other thread.
Again, the runtime is going to ensure the ordering.
So we don't actually have to worry about it.
This actually lets us remove that barrier we had before
because we have no need to rejoin
and commit the command buffers.
They're already committed for us.
But I seem to have skipped over all
that synchronization stuff I talked about a second ago
and we still need it because we're still going
to be overwriting ourselves if we don't have it.
So can we apply these same synchronization lessons
to this sort of multithreaded world?
It turns out we can and it's actually quite straightforward.
We bring back our friendly semaphore
and our array of constant buffers.
And again, don't forget to grab the correct one that you want.
At the start, we'll wait on the semaphore and sleep
if nothing's available.
We've enforced our ordering with enqueue and we push it through.
Now we know
that mainCommandBuffer is the final command buffer
in our frame.
And we know that we want to signal that our frame is done.
So we should add our completion handler to the mainCommandBuffer
and you could do this from within the dispatch.
So the mainCommandBuffer is the final command buffer.
We add the completion handler to it, to signal our semaphore,
and we commit it from within the dispatch,
just like we did before.
Now you may notice here that I'm referencing self.semaphore
and a second ago I just told you to watch out for that.
So what's going on?
Well it turns out a semaphore is a synchronization primitive
and we do actually want to be looking at the same one
as all of our other threads.
So we want the value of the semaphore
at the time the thread is executing.
So in this case, we actually want self.semaphore,
something to be aware of.
And here's the recipe for our rendering.
At the start of our render function,
we wait on the semaphore.
We select the current constant buffer.
We write the data into our constant buffer
that represents all of our objects.
We encode the commands into command buffers.
We can do the single-threaded,
multithreaded, however you'd like.
We add a completion handler onto our final command buffer
and we use it to signal the semaphore to let us know
when we're done and we commit our command buffers.
And the GPU takes all this
and starts chugging away at our frame.
So let's look at the demo again and see what this got us.
So here you can see in the top left,
this is single-threaded encode mode
and you can see how many draws we're issuing, 10,000.
And the top right, you can see the time it takes us
to encode a frame.
So here we've got 5 milliseconds and we can crank the number
of draws up and see that it starts costing more
and more as we draw things.
Now this is single-threaded mode.
And when you think about it, we're drawing a shadow map,
which means we have to issue 40,000 draws in the shadow map,
and then we're drawing the main pass, which means we have
to issue another 40,000 draws to reference that.
But again, we can do this in parallel,
so we've added a parallel mode to this demo.
And you can see how it's faster to go through.
Now take a look at everything that's going on.
You can fly around a little bit.
So here we have 40,000 cubes, unique, independent.
They're all being updated.
We're using GCD to encode a bunch of stuff in parallel.
We have two command buffers: One to generate the shadow map
on the ground and one to render all of the cubes in color.
The lighting is quite simple, Lambert shading,
which is basically what Warren talked
about earlier, the N dot L lighting.
And that's our demo.
This will be available as sample code
for you guys to take a look at.
Hopefully you can rip it apart, take some of the ideas
and the thoughts in it and apply them to your own code.
So what did we talk about today?
When you walked in here, hopefully you came
to Warren's session earlier and maybe you knew a little bit
about graphics or had done some programming before,
but we took you through everything in Metal.
The conceptual overview of Metal, the reasoning
behind it is to use an API that is close to the hardware
and close to the driver.
We learned about the Metal device, which is the root object
in Metal that everything comes from.
We talked a bit about loading data into Metal
and the different resource types and how you use them,
the Metal shading language, which is the C++ variant you use
to write programs on the GPU.
We talked about building pipeline states,
prevalidated objects that contain your two functions,
vertex and fragment or a compute function,
and a bunch of other baked-in, prevalidated state
to save you time at runtime.
Then we went into issuing GPU commands,
creating a Metal queue, creating command buffers off that queue,
and creating encoders to fill the command buffer in,
and then issuing that work and sending it over to the GPU.
We walked you through animation and texturing
and using set vertex bytes to send small bits of data
to do your animation in.
Then when the small bits of data weren't enough,
we talked about managing large chunks of dynamic data
and using one big constant buffer and referencing it
in multiple places to get some data reuse out of the system.
We talked about CPU-GPU synchronization, the importance
of making sure your CPU and your GPU aren't overwriting each other
and playing nicely.
And then lastly, we talked a little bit
about multithreaded encoding, how you can use GCD with Metal
to encode multiple command buffers
on your queues at the same time.
And that's adopting Metal.
Hopefully you enjoyed the talk and you can apply some of these
to your apps and make your apps even better
than they already are.
If you'd like some more information, you can check
out this website, developer.apple.com/wwdc/603.
We have a few more sessions tomorrow
that I recommend you go check out.
At 11:00, we have What's New in Metal,
Part 1 and then a little later at 1:40,
we have What's New in Metal, Part 2.
That'll tell us everything that's new in the world
of Metal, awesome stuff you can add
to your applications to make them better.
And then for you hardcore shader heads out there,
we have Advanced Metal Shader Optimization at 3:00.
So if you want to know how to get the best
out of your shaders, I recommend you go check out that talk.
It's really great.
Thanks for coming to hear us talk.
Welcome to WWDC.
Have a good rest of the week.