Hear from the experts about how you can write faster Swift code and use Instruments to identify performance bottlenecks. Dive deep into specific techniques that will help you produce the most efficient code possible.
Good morning, and welcome to Optimizing Swift Performance.
My name is Nadav, and together with my colleagues, Michael
and Joe, I am going to show you how
to optimize your Swift programs.
Now, we, the engineers on the Compiler Team, are passionate
about making code run fast.
We believe that you can build amazing things
when your apps are highly optimized.
And if you feel the same way, then this talk is for you.
Today I'll start by telling you about some
of the new compiler optimizations
that we have added over the last year.
Later, Michael will describe the underlying implementation
of Swift and give you some advice
on writing high-performance Swift code.
And finally, Joe will demonstrate how
to use Instruments to identify
and analyze performance bottlenecks in your Swift code.
So Swift is a flexible and safe programming language with lots
of great features, like closures and protocols and generics and,
of course, automatic reference counting.
Now, some of you may associate these features with slowness
because the program has to do more work
to implement these high-level features.
But Swift is a very fast programming language that's
compiled to highly optimized native code.
So how did we make Swift fast?
Well, we made Swift fast
by implementing compiler optimizations that target all
of these high-level features.
These compiler optimizations make sure that the overhead
of the high-level features is minimal.
Now, we have lots of compiler optimizations,
and we don't have enough time to go over all of them,
so I decided to bring you one example
of one compiler optimization.
This optimization is called bounds checks elimination.
On the screen, you can see a very simple loop.
This loop encrypts the content of the array
by XORing all the elements in the array with the number 13.
It's not a very good encryption.
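The slide itself isn't reproduced in the transcript, but the loop being described might look something like this sketch (the names are illustrative):

```swift
// Toy "encryption": XOR every element with 13.
// Swift bounds-checks each data[i] access; the optimization described
// below hoists that check out of the loop.
func encrypt(_ data: inout [UInt8]) {
    for i in 0..<data.count {
        data[i] ^= 13
    }
}

var message: [UInt8] = [72, 105]
encrypt(&message)   // message is now [69, 100]
encrypt(&message)   // XOR is its own inverse: back to [72, 105]
```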
Now, reading or writing outside the bounds
of the array is a serious bug
and can also have security implications,
so Swift protects you by adding a little bit of code
that checks that you don't read or write outside
of the bounds of the array.
Now, the problem is that this check slows your code down.
Another problem is that it blocks other optimizations.
For example, we cannot vectorize this code
with this check in place.
So we've implemented a compiler optimization
for hoisting this check outside of the loop, making the cost
of the check negligible, because instead of checking
on each iteration of the loop
that we are inside the bounds of the array,
we only check once, when we enter the loop.
So this is a very powerful optimization
that makes numeric code run faster.
Okay. So this was one example of one optimization,
and we have lots of optimizations.
And we know that these optimizations work
and that they are very effective because we are tracking hundreds
of programs and benchmarks, and over the last year,
we noticed that these programs became significantly faster.
Every time we added a new optimization,
every time we made an improvement
to existing optimizations,
we noticed that these programs became faster.
Now, it's not going to be very interesting for you to see all
of these programs, so I decided to bring you five programs.
The programs that you see
on the screen behind me right now are programs
from multiple domains.
One is an object-oriented program.
Another one is numeric.
Another one is functional.
And I believe that these programs represent the kind
of code that users write today in Swift.
And as you can see, over the last year,
these programs became significantly faster,
between two and eight times faster, which is great.
Now, these programs are optimized in release mode.
But I know that you also care about the performance
of unoptimized programs because you are spending a lot
of time writing your code and debugging it and running it
in simulator, so you care
about the performance of unoptimized code.
So, these are the same five programs,
this time in debug mode.
They are unoptimized.
So you are probably asking yourself, wait,
how can improvements
to the optimizer improve the performance of unoptimized code?
Right? Well, we made unoptimized code run faster
by doing two things.
First of all, we improved the Swift runtime component.
The runtime is responsible for allocating memory,
accessing metadata, things like that.
So we optimized that.
And the second thing that we did is that now we are able
to optimize the Swift Standard Library better.
The Standard Library is the component
that has the implementation of array and dictionary and set.
So by optimizing the Standard Library better,
we are able to accelerate the performance
of unoptimized programs.
We know that over the last year, the performance
of both optimized
and unoptimized programs became significantly better.
But to get the full picture,
I want to show you a comparison to Objective-C.
So on the screen you can see two very well-known benchmarks.
It's Richards and DeltaBlue, both written
in object-oriented style.
And on these benchmarks,
Swift is a lot faster than Objective-C.
At this point in the talk, I am not going
to tell you why Swift is faster than Objective-C,
but I promise you that we will get back to this slide
and we will talk about why Swift is faster.
Okay. Now I am going to talk about something different.
I want to talk about a new compiler optimization mode
that's called "Whole Module Optimization"
that can make your programs run significantly faster.
But before I do that, I would like to talk
about the way Xcode compiles files.
So Xcode compiles your files individually.
And this is a good idea because it can compile many files
in parallel on multiple cores in your machine.
It can also recompile only files that need to be updated.
So that's good.
But the problem is that the optimizer is limited
to the scope of one file.
With Whole Module Optimization, the compiler is able
to optimize the entire module at once, which is great
because it can analyze everything
and make aggressive optimizations.
Now, naturally, Whole Module Optimization builds take longer.
But the generated binaries usually run faster.
In Swift 2, we made two major improvements
to Whole Module Optimization.
So first, we added new optimizations that rely
on Whole Module Optimization mode.
So your programs are likely to run faster.
And second, we were able to parallelize some parts
of the compilation pipeline.
So compiling projects in Whole Module Optimization mode should
take less time.
On the screen behind me, you can see two programs
that became significantly faster with Whole Module Optimization
because the compiler was able to make better decisions,
it was able to analyze the entire module
and make more aggressive optimizations
with the information that it had.
In Xcode 7, we've made some changes
to the optimization level menu,
and now Whole Module Optimization is one
of the options that you can select.
And I encourage you to try Whole Module Optimization
on your programs.
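For command-line builds, the equivalent of that Xcode setting is a compiler flag; the file names below are just placeholders:

```shell
# Optimize the whole module at once instead of file-by-file.
# (-wmo is the short form of -whole-module-optimization.)
swiftc -O -whole-module-optimization File1.swift File2.swift -o MyApp
```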
At this point, I would like to invite Michael on stage
to tell you about the underlying implementation of Swift
and give you some advice
on writing high-performance Swift code.
MICHAEL GOTTESMAN: Thanks, Nadav.
Today I would like to speak to you
about three different aspects of the Swift programming language
and their performance characteristics.
For each I will give specific techniques that you can use
to improve the performance of your app today.
Let's begin by talking about reference counting.
In general, the compiler can eliminate most reference
counting overhead without any help.
But sometimes you may still find slowdowns in your code due
to reference counting overhead.
Today I'm going to present two techniques that you can use
to reduce or even eliminate this overhead.
Let's begin by looking at the basics of reference counting
by looking at how reference counting
and classes go together.
So here I have a block of code.
It consists of a class C, a function foo that takes
in an optional C, and a couple of variable definitions.
Let's walk through the code's execution line by line.
First we begin by allocating a new instance of class C
and assigning it to the variable x.
Notice how at the top of the class instance, there is a box
with the number 1 in it.
This represents the reference count of the class instance.
Of course, it's 1 because there's only one reference
to the class instance currently, namely x.
Then we assign x to the variable y.
This creates a new reference to the class instance,
causing us to increment the reference count
of the class instance, giving us a reference count of 2.
Then we pass off y to foo,
but we don't actually pass off y itself.
Instead, we create a temporary, c, and assign y to it.
This acts as a third reference to the class instance,
which causes us to increment the reference count
of the class instance once more.
Then when foo exits, c is destroyed, which causes us
to decrement the reference count of the class instance,
bringing us back to a reference count of 2.
Then finally, we assign nil to y and nil to x,
bringing the reference count of our class instance to 0,
and then it's deallocated.
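The block of code being walked through isn't shown in the transcript; a minimal sketch, with the reference-count effects as comments, might be:

```swift
class C {}

func foo(_ c: C?) {
    // c is a third reference while foo runs (+1 on entry, -1 on exit)
}

var x: C? = C()   // reference count: 1
var y: C? = x     // retain: reference count 2
foo(y)            // retain/release around the call: back to 2
y = nil           // release: reference count 1
x = nil           // release: reference count 0, instance deallocated
```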
Notice how every time we made an assignment,
we had to perform a reference counting operation
to maintain the reference count of the class instance.
This is important since we always have
to maintain memory safety.
Now, for those of you who are familiar with Objective-C,
nothing new is happening here, with increment and decrement
being, of course, retain and release respectively.
But now I'd like to talk to you
about something that's perhaps a bit more exotic,
namely, how structs interact with reference counting.
Let's begin this discussion by looking at a class
that doesn't contain any references.
Here I have a class, Point.
Of course, it doesn't contain any references,
but it does have two properties in it,
x and y, that are both floats.
If I store one of these points in an array,
because it's a class, of course,
I don't store it directly in the array.
Instead, I store a reference to the point in the array.
So when I iterate over the array,
when I initialize the loop variable p,
I am actually creating a new reference to the class instance,
meaning that I have to perform a reference count increment.
Then, when p is destroyed at the end of the loop iteration,
I then have to decrement that reference count.
In Objective-C, one would oftentimes have
to make simple data structures, like Point,
a class so you could use data structures
from Foundation like NSArray.
Then whenever you manipulated the simple data structure,
you would have the overhead of having a class.
In Swift, we can work around this issue by using a struct
instead of a class.
So let's make Point a struct.
Immediately, we can store each Point in the array directly,
since Swift arrays can store structs directly.
But more importantly, since a struct does not inherently
require reference counting and both properties
of the struct also don't require reference counting,
we can immediately eliminate all the reference counting overhead
from the loop.
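As a sketch, the struct version of Point (property names from the talk) stores its values inline, so the loop below involves no reference counting:

```swift
struct Point {
    var x: Float
    var y: Float
}

// The array stores the Point values directly, not references to them.
let points = [Point(x: 0, y: 0), Point(x: 1, y: 2)]

var sum: Float = 0
for p in points {   // p is a plain copy: no retain/release per iteration
    sum += p.x + p.y
}
// sum == 3.0
```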
Let's now consider a slightly more elaborate example of this
by considering a struct with a reference inside of it.
While a struct itself does not inherently require reference
counting modifications on assignment,
like I mentioned before, it does require such modifications
if the struct contains a reference.
This is because assigning a struct is equivalent
to assigning each one
of its properties independently of each other.
So consider the struct Point that we saw previously:
it is copied efficiently,
and no reference counting is needed when we assign it.
But let's say that one day I'm working on my app
and I decide that, well, I would like to make each one
of my Points to be drawn a different color.
So I add a UIColor property to my struct.
Of course, UIColor being a class,
this is actually adding a reference to my struct.
Now, this means that every time I assign this struct,
it's equivalent to assigning this UIColor independently
of the struct, which means that I have
to perform a reference counting modification.
Now, having a struct with one reference in it is not
that expensive; we work with classes all the time,
and classes have the same property.
I would now like to present to you a more extreme example,
namely, a struct with many reference counted fields.
Here I have a struct, User, and I am using it to model users
in an app I am writing.
And each user instance has some data associated with it, namely,
three strings -- one for the first name of the user,
one for the last name of the user,
and one for the user's address.
I also have a field for an array and a dictionary
that stores app-specific data about the user.
Even though all of these properties are value types,
internally, they contain a class which is used
to manage the lifetime of their internal data.
So this means that every time I assign one of these structs,
every time I pass it off to a function, I actually have
to perform five reference counting modifications.
Well, we can work around this by using a wrapper class.
Here again, I have my user struct, but this time,
instead of standing on its own, it's contained
within a wrapper class.
I can still manipulate the struct using the class reference
and, more importantly, if I pass off this reference to a function
or I initialize a variable with the reference,
I am only performing one reference count increment.
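A sketch of the struct and its wrapper class (the field names are assumptions based on the description):

```swift
struct User {
    var firstName: String         // String, Array, and Dictionary are
    var lastName: String          // value types, but each holds a
    var address: String           // reference internally, so copying
    var appData: [String]         // a User costs five reference
    var settings: [String: Int]   // counting operations
}

// Wrapping the struct in a class means passing it around costs a
// single retain/release, at the price of reference semantics.
final class UserRef {
    var user: User
    init(_ user: User) { self.user = user }
}
```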
Now, it's important to note
that there's been a change in semantics here.
We've changed from using something with value semantics
to something with reference semantics.
This may cause unexpected data sharing that may lead
to weird results or things that you may not expect.
But it turns out there is a way
that you can have value semantics and still benefit
from this optimization.
If you'd like to learn more about this,
please go to the Building Better Apps
with Value Types in Swift talk tomorrow in Mission
at 2:30 p.m. It's going to be a great talk.
I really suggest that you go.
Now that we've talked about reference counting,
I'd like to continue by talking a little bit about generics.
Here I have a generic function min.
It's generic over a type T that conforms
to the Comparable protocol from the Swift Standard Library.
From a source code perspective,
this doesn't really look that big.
I mean, it's just three lines.
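Those three lines would be, roughly:

```swift
// Generic over any T that conforms to Comparable.
func min<T: Comparable>(_ x: T, _ y: T) -> T {
    return y < x ? y : x
}
```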
But in reality, a lot more is going
on behind the scenes than one might think.
For instance, the code that's actually emitted --
here, again I am using a pseudo-Swift
to represent the code the compiler emits --
the code the compiler emits is not these three lines.
Instead, it's this.
First notice that the compiler is using indirection
to compare both x and y.
This is because we could be passing in two integers
to the min function, or we could be passing in two floats
or two strings, or we could be passing in any comparable type.
So the compiler must be correct in all cases and be able
to handle any of them.
Additionally, because the compiler can't know
if T requires reference counting modifications or not,
it must insert additional indirection
so the min T function can handle both types T
that require reference counting and those types T that do not.
In the case of an integer, for instance,
these are just no-op calls into the Swift runtime.
In both of these cases, the compiler is being conservative
since it must be able to handle any type T in this case.
Luckily, there is a compiler optimization
that can help us here, that can remove this overhead.
This compiler optimization is called generic specialization.
Here I have a function foo that passes two integers
to the generic min-T function.
When the compiler performs generic specialization,
first it looks at the call to min in foo and sees, oh,
there are two integers being passed
to the generic min-T function here.
Then since the compiler can see the definition
of the generic min-T function, it can clone min-T
and specialize this clone function
by replacing the generic type T with the specialized type Int.
Then the specialized function is optimized for Int,
and all the overhead associated with this function is removed:
the unnecessary reference counting calls are removed,
and we can compare the two integers directly.
Finally, the compiler replaces the call
to the generic min-T function with a call
to the specialized min Int function,
enabling further optimizations.
While generic specialization is a very powerful optimization,
it does have one limitation:
the visibility of the generic definition,
in this case the definition
of the generic min-T function.
Here we have a function compute
which calls a generic min-T function with two integers.
In this case, can we perform generic specialization?
Well, even though the compiler can see
that two integers are being passed
to the generic min-T function,
because we are compiling File1.swift
and File2.swift separately, the definitions of functions
from File2.swift are not visible to the compiler
when the compiler is compiling File1.swift.
So in this case, the compiler cannot see the definition
of the generic min-T function when it's compiling File1.swift,
and so we must make an actual call to the generic min-T function.
But what if we have Whole Module Optimization enabled?
Well, if we have Whole Module Optimization enabled,
both File1.swift and File2.swift are compiled together.
This means that definitions from both files are visible
when the compiler is compiling either of them.
So the generic min-T function,
even though it's in File2.swift,
can be seen when we are compiling File1.swift.
Thus, we are able to specialize the generic min-T function
into min-Int and replace the call to min-T with min-Int.
This is but one case where the power
of whole module optimization is apparent.
The only reason the compiler can perform generic specialization
in this case is because of the extra information provided to it
by having Whole Module Optimization being enabled.
Now that I have spoken about generics, I'd like to conclude
by talking about dynamic dispatch.
Here I have a class hierarchy for the class Pet.
Notice that Pet has a method noise, a property name,
and a method noiseImpl, which is used
to implement the method noise.
Also notice it has a subclass
of Pet called Dog that overrides noise.
Now consider the function makeNoise.
It's a very simple function:
it takes an argument p that's an instance of class Pet.
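A sketch of the hierarchy and function being described (the method bodies and property value are assumptions):

```swift
class Pet {
    var name = "Pet"
    func noise() -> String { return noiseImpl() }
    func noiseImpl() -> String { return "..." }
}

class Dog: Pet {
    override func noise() -> String { return "Woof" }
}

func makeNoise(_ p: Pet) {
    // Both of these accesses are dynamically dispatched: the compiler
    // cannot know here whether a subclass overrides name or noise.
    print("\(p.name) says \(p.noise())")
}
```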
Even though this block of code only involves a small amount
of source again, a lot more is occurring here behind the scenes
than one might think.
For instance, the following pseudo-Swift code is not what is
actually emitted by the compiler.
Name and noise are not called directly.
Instead, the compiler emits this code.
Notice the indirection here that's used
to call name's getter or the method noise.
The compiler must insert this indirection
because it cannot know given the current class hierarchy whether
or not the property name or the method noise are meant
to be overridden by subclasses.
The compiler in this case can only emit direct calls
if it can prove that there are no possible overrides
of name or noise by any subclasses.
In the case of noise, this is exactly what we want.
We want noise to be able to be overridden
by subclasses in this API.
We want to make it so that if I have an instance
of Pet that's really a dog, the dog barks when I call noise.
And if I have an instance of Pet that's actually a cat,
when I call noise, we get a meow.
That makes perfect sense.
But in the case of name, this is actually undesirable.
This is because in this API,
name is never overridden;
it's not necessary to override name.
We can model this
by constraining this API's class hierarchy.
There are two Swift language features that I am going
to show you today that you can use
to constrain your API's class hierarchy.
The first are constraints on inheritance,
and the second are constraints on access, via access control.
Let's begin by talking about inheritance constraints,
namely, the final keyword.
When an API contains a declaration
with the final keyword attached, the API is communicating
that this declaration will never be overridden by a subclass.
Consider again the makeNoise example.
By default, the compiler must use indirection
to call the getter for name.
This is because without more information, it can't know
if name is overridden by a subclass.
But we know that in this API, name is never overridden,
and we know that in this API, it's not intended for name
to be able to be overridden.
So we can enforce this and communicate this
by attaching the final keyword to name.
Then the compiler can look at name and realize, oh,
this will never be overridden by a subclass,
and the dynamic dispatch, the indirection, can be eliminated.
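Sketched in code, the change is just the one keyword:

```swift
class Pet {
    final var name = "Pet"   // no subclass may override name, so the
                             // getter can be called directly or inlined
    func noise() -> String { return "..." }
}
```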
Now that we've talked about final inheritance constraints,
I'd like to talk a little bit about access control.
It turns out that in this API, Pet and Dog are in separate files,
Pet.swift and Dog.swift, but are in the same module, module A.
Additionally, there is another subclass of Pet called Cat
in a different module, in the file Cat.swift.
The question I'd like to ask is,
can the compiler emit a direct call to noiseImpl?
By default, it cannot.
This is because by default, the compiler must assume
that this API intended for noiseImpl to be overridden
in subclasses like Cat and Dog.
But we know that this is not true.
We know that noiseImpl is a private implementation detail
of Pet.swift and that it shouldn't be visible outside of that file.
We can enforce this by attaching the private keyword to noiseImpl.
Once we do, noiseImpl is no longer visible outside of Pet.swift.
This means that the compiler can immediately know
that there cannot be any overrides of noiseImpl in Cat
or Dog because, well, they are not in Pet.swift.
And since there is only one class in Pet.swift
that implements noiseImpl, namely Pet,
the compiler can emit a direct call to noiseImpl in this case.
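As a sketch, the contents of Pet.swift after the change might be:

```swift
// Pet.swift
class Pet {
    func noise() -> String { return noiseImpl() }

    // private confines noiseImpl to this file, so no subclass in
    // another file can override it: the call above can be direct.
    private func noiseImpl() -> String { return "..." }
}
```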
Now that we've spoken about private, I would like to talk
about the interaction between Whole Module Optimization
and access control.
We have been talking a lot
about the class Pet, but what about Dog?
Remember that Dog is a subclass of Pet
that has internal access instead of public access.
If we call noise on an instance of class Dog,
without more information, the compiler must insert indirection
because it cannot know if there is a subclass of Dog
in a different file of module A.
But when we have Whole Module Optimization enabled,
the compiler has module-wide visibility.
It can see all the files in the module together.
And so the compiler is able to see
that there are no subclasses of Dog,
so it can call noise directly
on instances of class Dog.
The key thing to notice here is that all I needed to do was
to turn on Whole Module Optimization.
I didn't need to change my code at all.
By giving the compiler more information,
by allowing it to understand my class hierarchy,
I was able to get this optimization
for free, without any work on my part.
Now I'd like to bring back that graph
that Nadav introduced earlier.
Why is Swift so much faster than Objective-C
on these object-oriented benchmarks?
The reason is that in Objective-C,
the compiler cannot eliminate the dynamic dispatch
through an Objective-C message send.
It can't inline through it.
It can't perform any analysis.
The compiler must assume that there could be anything
on the other side of an Objective-C message send.
But in Swift, the compiler has more information.
It's able to see, in many cases, exactly what is on the other side,
and it's able to eliminate this dynamic dispatch in many of those cases.
And where it does, the result is significantly faster code.
So please, use the final keyword and access control
to communicate your API's intent.
This will help the compiler to understand your class hierarchy,
which will enable additional optimizations.
However, keep in mind that existing clients may need
to be updated in response to such changes.
And try out Whole Module Optimization
in your release builds.
It will enable the compiler to make further optimizations --
for instance, more aggressive specialization --
and by allowing the compiler
to better understand your API's class hierarchy,
without any work on your part, you can benefit
from increased elimination of dynamic dispatch.
Now I'd like to turn this presentation over to Joe,
who will show you how you can use these techniques
and instruments to improve the performance
of your application today.
JOE GRZYWACZ: Thank you, Michael.
My name is Joe Grzywacz.
I am an engineer on the Instruments Team,
and today I want to take you
through a demo application that's running a little slowly
right now, so let's get started.
So here we have my Swift application that's running
slowly, so what I want to do is go ahead and click and hold
on the Run button and choose Profile.
That's going to build my application in release mode
and then launch Instruments with its template chooser
so we can decide how we want to profile this.
Since it's running slowly, a good place to start is
with the time profiler template.
From Instruments, just press Record,
your application launches, and Instruments is recording data
in the background about what it's doing.
So here we can see we are running
at 60 frames per second before I've started anything,
which is my target performance.
But as soon as I add these particles to the screen,
they are moving around and avoiding each other just
like I wanted, but we are running at only
about 38 frames per second.
We lost about a third of our performance.
Now that we have reproduced the problem,
we can quit our application and come back to Instruments.
Let me make this a little bit larger
so we can see what's going on.
You can just drag this around.
View > Snap Track to Fit is handy
to make your data fill the horizontal timeline.
Now what are we looking at?
Here in the track view, this is our CPU usage
of our application.
We can see on the left before I did anything, CPU usage was low;
after I added those particles, CPU usage became higher.
You can see what those values are by moving your mouse
and hovering it inside this ruler view.
You can see that earlier we were around 10% or so, not doing much.
Later on we were around 100%.
So we saturated our CPU.
In order to increase our performance,
we need to decrease how much work we're doing.
So what work were we doing?
That's where this detail pane down below comes in.
So here's all of our threads.
Go ahead and open this up a little bit.
You are probably familiar with this call stack
from seeing it inside of Xcode in the debugger.
start calls main, which calls NSApplicationMain, et cetera.
But what Instruments is also going
to tell you is how much time you were spending inside
of that function, including its children,
right here in this first column Running Time.
We can see 11,220 milliseconds, or 99% of our time,
was spent in NSApplicationMain or the things it called.
The second column, Self,
is how much time Instruments sampled inside
that function itself, so it excludes its children.
So what I want to do is see where that Self number gets large;
that means that function is actually performing a lot of work.
You can continue opening these up one by one, hunting around,
but that can take a little while.
Instead we recommend you come over here to the right side,
this extended detail view,
and Instruments will show you the single heaviest stack trace
in your application.
That's where it sampled the most number of times.
You can see again here is our main thread,
it took 11,229 milliseconds.
It began in Start.
Symbols in gray are system frameworks.
Symbols in black here, like Main, are your code.
And what I'd like to do is just look down this list and see
if it's kind of a big jump.
That means something interesting happened around this time.
If I scan down this list,
the number is slowly getting smaller,
but there's no big jumps going on, until I get down here
where I see a jump from about 9,000 to about 4,000.
So something happened there.
I am going to go ahead and click on my code,
and Instruments has automatically expanded the call
tree on the left side so you can see what you just clicked on.
Let me frame this up.
And what's going on here?
Well, if I back up just a little bit for a moment,
here is my NSTimer fire call, which is driving my simulation,
trying to run at 60 frames per second.
Down here is my ParticleSim AppDelegate.update routine;
that's my Swift routine driving my simulation.
But in between is this weird @objc thing sitting here.
I want to point out that that's just a thunk.
Basically, it's a compiler-inserted function that gets us
from the Objective-C world of the NSTimer here
down to the Swift world inside of my code.
That's all it is.
Otherwise, we can ignore it.
Now, we can see my update routine is taking 89%
of the time, so continuing
to optimize this function is a good idea.
So everything else above it is not really interesting to me.
I am going to go ahead and hide it by focusing
in on just this update routine
by clicking this arrow here on the right.
Everything else around this has been hidden.
Running time has been renormalized to 100%,
just to help you do a little less mental math.
If we look in on what's going on in this function,
Update Phase Avoid calls Find Nearest Neighbor,
that calls down into something really interesting here.
We see Swift release is taking 40% of our time,
and Swift retain is taking another 35% of our time.
So between just these two functions, about 75%
of our update routine is spent just managing reference counts.
So what's going on here?
Well, if I double-click on my Find Nearest Neighbor routine
that calls those retains and releases,
Instruments will show you the source code.
However, Swift is an automatic reference counted language,
so you are not going to see the releases
and retains here directly.
But if you go over to the disassembly view
by clicking on that button there,
Instruments will show you what the compiler actually generated.
And you can hunt around in here
and see there's a bunch of calls here.
There's 23% of the time on this release.
There's some more retains and releases here.
There is another release down here.
They are all over the place.
So what can we do about that?
Let's return to our code here and go to my particle file.
Here is my class Particle,
so it's an internal class by default.
And it adheres to some collidable protocol.
Down below is the Find Nearest Neighbor routine
that was taking all of that time before.
Now, I know that when the update timer fires, that code is going
to call Find Nearest Neighbor on every single particle
on the screen, and then there's this inner for loop that's going
to iterate over every single particle on the screen.
So we effectively have an N-squared algorithm here:
the stuff that happens inside this for loop is going
to happen a really large number of times.
Whatever we do to optimize this thing should have a big payoff.
So what is going on?
We have our for loop itself
where we access one of those particles.
So there's some retain release overhead.
There are property getters being called here,
this dot ID property.
And as Michael was talking about,
since this is an internal class,
there might be some other Swift files somewhere
that override these property getters, so we are going
to be performing a dynamic dispatch
to these property getters,
which has retain/release overhead as well.
Down here there is this distance squared function call.
Despite the fact that it lives literally a dozen source code
lines away, once again, we are going
to be doing a dynamic dispatch to this routine with all
of that overhead as well as the retain release overhead.
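As a rough sketch, the code under discussion looks something like this. The names come from the demo, but the bodies here are assumptions, not the session's actual source:

```swift
protocol Collidable {
    var id: Int { get }
    var position: (x: Double, y: Double) { get }
}

// Internal by default, so the compiler must assume some other file
// in the module could subclass it: the property getters and method
// calls below go through dynamic dispatch, with retain/release
// traffic on each access.
class Particle: Collidable {
    var id: Int
    var position: (x: Double, y: Double)

    init(id: Int, position: (x: Double, y: Double)) {
        self.id = id
        self.position = position
    }

    func distanceSquared(to other: Particle) -> Double {
        let dx = position.x - other.position.x
        let dy = position.y - other.position.y
        return dx * dx + dy * dy
    }

    // Called for every particle each frame; the inner loop visits
    // every other particle, so this body runs O(n^2) times.
    func findNearestNeighbor(in particles: [Particle]) -> Particle? {
        var nearest: Particle?
        var bestDistance = Double.infinity
        for other in particles where other.id != id {
            let d = distanceSquared(to: other)
            if d < bestDistance {
                bestDistance = d
                nearest = other
            }
        }
        return nearest
    }
}
```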
So what can we do about this code?
Well, this code is complete.
I wrote this application, I am finished,
my particle class is complete,
and I have no need to subclass it.
So what I should do is communicate my intention
to the compiler by marking this class as final.
So with that one little change, let's go ahead
and profile the application again and see what happened.
This time, the compiler was able to compile that file
knowing that there are no subclasses
of that Particle class, and that means it's able
to perform additional optimizations.
It can call those functions directly,
maybe even inline them, or any other number of optimizations
that can reduce the overhead that we had before.
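In sketch form, the only source change is the `final` keyword. The members here are hypothetical, chosen just to make the example self-contained:

```swift
protocol Collidable { var id: Int { get } }

// With `final`, the compiler knows no subclass can override
// `radius` or `area()`, so it can call them directly, and even
// inline them, instead of dispatching through the method table.
final class Particle: Collidable {
    let id: Int
    let radius: Double

    init(id: Int, radius: Double) {
        self.id = id
        self.radius = radius
    }

    func area() -> Double { Double.pi * radius * radius }
}
```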
So if we record, this time when I add the particles,
we can see they are moving around
at 60 frames per second,
so we got back 20 frames per second with just
that one small change.
That's looking good.
However, as you may guess,
I have a second phase here called collision
where we swap the algorithm
and now they are bouncing off one another,
and again our frame rate dropped by about 25 percent
down to 45 frames per second.
We've reproduced the problem, so let's return to Instruments
and see what's happening.
We will do what we did before: make this a little bit larger,
Snap Track to Fit, and now what do we see?
Over here on the left, this was our avoidance phase.
Things are running much better, around 30% to 40% of the CPU,
so that's why we are hitting our 60 frames per second.
But over here on the right, this is our collision phase.
And now this is capping out at 100% of our CPU,
and that's why our frame rate is suffering again.
If we did what we did a moment ago, this call tree data
down here in the detail pane would have data
from this avoidance phase, which is running fine,
as well as this collision phase, which is what I really want
to be focusing on.
Those avoidance samples are going
to water down our results.
Instead, I would like to set a time filter so I am only looking
at my collision phase.
That's really simple to do.
Just click and drag in the timeline view,
and now our detail pane has been updated
to only consider the samples from our collision phase.
Now we can do what we did before:
head over to our extended detail view
and look down this list to see where there's a jump.
Something interesting happens here:
we went from about 8,000 milliseconds
to 2,000 milliseconds.
So I am going to click on my collision detection class here.
Instruments once again automatically expands this call
tree for us.
And if we just kind of look at what's going on here,
88% of my time is spent inside of this runtime step routine.
This is a good place to dig in.
I'll do what I did before and click
on this Focus arrow here on the right.
Now we are looking at just our runtime step routine,
and let's see what it's doing.
Well, 25% of its time is being spent inside
of Swift.Array._getElement.
When you see this A inside of angle brackets,
that means you are calling into the generic form
of that function, with all the overhead that entails.
You will see it again here inside
of Swift.Array._isValidSubscript;
there's that A inside of angle brackets.
It also happens when you have
that A inside of square brackets,
which means we are calling a generic property getter here.
So between these three generic functions,
about 50% of our time is being spent inside
of generic code.
So what can we do about getting rid of that overhead?
All right, back over to Xcode.
Here is my collision detection file.
Here we can see that Collidable protocol
that my Particle was conforming to.
Here is that generic class, CollisionDetection,
with a type parameter T that conforms to the Collidable protocol.
What does it do? Well, it has this collidables array here,
and that's of generic type T.
And here down below is our runtime step routine,
which is where we were spending all of our time.
So what does this function do?
Well, it iterates over all our collidables: it accesses one
of the collidables from that array and calls a bunch
of property getters here.
Here's some more.
There is an inner for loop where we do kind
of the same thing again: we pull
out a second collidable from that array,
then call all sorts of property getters down below.
We're doing a lot of generic operations here, and we'd really
like to get rid of that.
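A sketch of the shape of that class, with names approximating the demo. The bodies are my assumptions, and I've made the time-step routine return a collision count just so the sketch is self-checking:

```swift
protocol Collidable {
    var position: (x: Double, y: Double) { get }
    var radius: Double { get }
}

// Generic over any Collidable. Unless the compiler can specialize
// this class for a concrete T, every subscript access and property
// getter in the loop below goes through the unspecialized generic
// entry points that showed up in the profile.
class CollisionDetection<T: Collidable> {
    var collidables: [T] = []

    // Compares every pair of collidables and returns how many
    // pairs currently overlap.
    func runTimeStep() -> Int {
        var collisions = 0
        for i in 0..<collidables.count {
            let a = collidables[i]
            for j in (i + 1)..<collidables.count {
                let b = collidables[j]
                let dx = a.position.x - b.position.x
                let dy = a.position.y - b.position.y
                let minDist = a.radius + b.radius
                if dx * dx + dy * dy < minDist * minDist {
                    collisions += 1
                }
            }
        }
        return collisions
    }
}
```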
How do we do that?
Well, this time my CollisionDetection class is
here inside of this Swift file.
However, the users of this class are inside the app delegate
and this Particle Swift file, so they're in other parts
of this module, which means we are going to have to turn
to Whole Module Optimization.
Doing that's really easy, just click on your project.
Go over here to build settings.
Make sure you are looking at all of your build settings.
Then just do a search for optimization.
And here is that setting that Nadav showed you earlier.
You just want to switch your release build
over to Whole Module Optimization.
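From the command line, the equivalent is the `-whole-module-optimization` flag alongside `-O`. The file names here are illustrative, not the demo project's actual files:

```shell
# Compile all the module's files as one unit so the optimizer can
# see across file boundaries (file names are hypothetical).
swiftc -O -whole-module-optimization \
    Particle.swift CollisionDetection.swift AppDelegate.swift \
    -o ParticleDemo
```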
And now when we profile, the compiler is going to look
at all those files together and build a more optimized binary.
Let's check and see what happened.
So we will launch Time Profiler for the third time,
start our recording at 60 frames per second,
and add our particles; this avoidance phase is still running
at 60 frames per second.
Good, I expected that not to change,
but it's always good to verify.
Then we move over to our collision phase.
Now that is running at 60 frames per second as well.
All it took was a couple minutes of analysis
and a few small tweaks,
and we made our application a lot faster.
So to summarize what we saw here today,
we know that Swift is a flexible, safe programming language
that uses automatic reference counting
to perform its memory management.
Now, those powerful features are what make it a delight
to program in, but they can come with a cost.
What we want you to do is keep performance in mind
when you are writing your APIs and your code.
And how do you know what costs you are paying?
Profile your application inside of Instruments,
and do it throughout the lifetime
of your application's development, so that when you find
a problem, you find it sooner and can react to it more easily,
especially if it involves changing some of your APIs.
There's documentation online, of course,
and the Developer Forums, where you can
ask questions about Swift and Instruments
and get them answered.
And speaking of Instruments, there's a Profiling
in Depth talk today in Mission at 3:30.
It's an entire session devoted to Time Profiler,
getting into even more depth
than we're able to today.
And as Michael mentioned earlier,
there's a Building Better Apps with Value Types in Swift session
that will also build upon what you saw today.
So thank you very much.