
If you already use Shark, the tool that measures performance and helps you optimize your code, you may be ready to go farther in optimizing the
performance of your application. This article introduces some powerful features in Shark 4.2 that can help get you there. (If you are new to Shark, see the
introductory articles listed below under For More Information.)
Shark comes with several different measurement configurations, each one tailored to make specific measurements of performance, such as
malloc tracing, L2 cache misses, and more. Here we will cover System Trace, an important new measurement configuration that first
appears in Shark version 4.2. After explaining System Trace and how to read its output, we show you how to work with the new
programmatic controls available in the System Trace configuration:
custom Sign Posts that record events during the trace. With sign posts, you can easily correlate the System Trace to program state.
The goal of System Trace is to give you a way to measure and understand how your code interacts with Mac OS X and how the threads in your multi-threaded
application interact with each other. It complements Shark's system-wide time profiling by allowing you to observe behavior that is otherwise transparent to statistical profiling
techniques.
With System Trace, you get a much more granular view of your application's performance, and it provides additional ways to make it run faster and more efficiently.
Getting Shark
Before we start, make sure you have the latest copy of Shark (version 4.3.1 as of this writing). Shark comes as part of the Xcode Developer Tools package, and the most recent
version is available as an upgrade from the ADC Tools Downloads page. The Developer Tools package also comes on a separate CD with retail copies of Mac OS X—but make sure you have already installed version 4.2 or later.
When you first launch the application, be sure to spend some time with the Shark User Guide, available from Shark's Help menu. The user guide is a detailed manual that
covers all of the functions Shark provides.
Shark supports profiling of universal binaries, and starting with version 4.2, Shark includes ISA documentation for both Intel, and PowerPC processors. You can
find the ISA documentation on Shark's Help menu.
Using System Trace
The System Trace configuration allows you to see how your code interacts with the operating system. If your
application is multi-threaded, the System Trace configuration also shows you how the threads interact with each other.
When your code crosses the boundary from user space to kernel space, System Trace records the occurrence. Crossing from user space to kernel space
can happen in three ways:
- System calls
- Virtual memory faults
- Asynchronous hardware interrupts
Before we look at an example System Trace, let's look at the default parameters of the System Trace configuration. First, select the System Trace
profile from the popup button in Shark's main window. Next, choose Show Mini Config Editor from the Config menu (the keyboard shortcut is Command-Shift-C).
Figure 1 shows the Configuration Editor for System Trace:
Figure 1: Default System Trace Configuration
System Trace is not a sampled profile. Rather, it is an exact trace of all kernel space boundary crossings for all processes
running during the profile. On a heavily loaded system, the trace can become very large, so the default configuration is limited
to 10 seconds or 1,000,000 trace records. If you need a longer running trace, you can alter the configuration, or create a new
System Trace configuration with the required parameters.
Now let's look at a System Trace profile:
Figure 2: System Trace Showing All Processes
As shown in Figure 2, Shark can display the System Trace in three different ways: Summary view, Trace View, and Timeline View. The Summary view is the best place to start your analysis.
Summary View
In the Summary view, the top half of our example System Trace shows us an overall breakdown of where the running processes are
spending their time. In this case, the system was idle for 82.5% of the ten-second profile.
Some 12% of the time was spent in user space, and 1.9% was spent in system calls.
Notice that the summary begins with all processes showing. You can select a specific process using the Process popup button in the lower
left corner of the window.
Below the overall summary data, another tabbed view allows you to look at three different individual summaries of the various event types:
Scheduler summary, System Call summary, and Virtual Memory Faults summary.
Summary View: Scheduler
The Scheduler summary tab shows an outline view of the scheduling behavior of all threads in the selected process (the figure above shows the scheduler summary for all processes).
You can see all the threads of a process by clicking the triangle.
The Scheduler Summary view allows you to verify that the threads of your application are
running efficiently. For example, if you thread your CPU-bound code, you would expect to see worker threads running for a few milliseconds (as opposed to microseconds).
If the Scheduler Summary view shows this not to be the case, it could indicate the worker threads are completing their computations and going back into
a wait-state. In this case, the overhead of synchronization outweighs the advantage of multithreading.
The scheduler summary shows data for four different metrics:
- Busy Time: The total time the thread ran (user space time plus system non-idle time).
- User Time: The total time the thread spent in user space.
- System Time: The total time the thread spent in kernel or driver code.
- Priority: The thread priority used by the scheduler
You can choose between these four metrics using the popup button in the lower right corner, under the outline.
For all four metrics, the Total, Avg, Min, and Max columns display the total, average, minimum, and maximum values during the session. When the Busy Time metric is selected, the Intervals
column displays the number of times a thread was scheduled to run.
Summary View: System Calls
The System Calls summary tab shows the time spent in each system call. Additionally, the System Calls Summary view allows you to correlate system calls with
your own code. For example, if the System Calls Summary shows your code to be spending a lot of time making system calls, you can use that to optimize how and when your application
acquires that information. Perhaps you will find you can cache certain results instead of making repeated calls into the system.
Figure 3 shows the System Call summary view with the Core Image Fun House process selected.
Figure 3: System Call Summary Showing the Core Image Fun House Process
Like Shark's Time profile, the System Call summary can be viewed in either heavy or tree view mode. You can customize the System Calls summary using the Advanced Settings drawer.
To open Advanced Settings, choose Show Advanced Settings from Shark's View menu. Using the Weight By popup button you can customize the view by selecting one of the following metrics:
- CPU Time: The total time spent in system calls for a processor.
- Wait Time: The total time that system calls spent in a blocked state, waiting on a resource or event.
- Total Time: The total time spent in system calls.
- Count: The total number of system calls.
Summary View: Virtual Memory Faults
Finding virtual memory faults can be difficult using a time profile. The System Trace VM Faults summary view, however, can show you exactly what code causes a virtual memory fault.
For example, the VM Fault Summary view might show that your code is generating an inordinate amount of zero-fill faults, caused by frequent memory allocations and deallocations
in a loop.
You could optimize your code to use a single allocation of a large memory buffer, rather than allocating and freeing several smaller buffers each time through the loop.
Figure 4 shows the VM Faults summary for the Core Image Fun House process:
Figure 4: VM Faults Summary Showing the Core Image Fun House Process
There are five different types of virtual memory faults:
- Page In: A nonresident page was read from disk.
- Page Out: A page was written to disk.
- Zero Fill: A page marked for zero fill on demand was touched.
- Copy on Write: A shared page was modified.
- Page Cache Hit: A resident, but unmapped page was touched.
The VM Fault summary groups the data by the type of fault. In the example, you can see that the Core Image Fun House process spent 3.2ms in Page Cache Hit faults.
By clicking the triangle, you can see the exact calls that caused this type of fault to occur. The VM Fault summary can be customized using the Advanced Settings drawer.
Like the System Call summary, you can select the same metrics (CPU Time, Wait Time, etc.). For VM Faults, the Advanced Settings drawer has an additional metric, Size, which
lets you see the total number of bytes impacted by the fault.
Trace View
The Trace view shows all of the events that occurred during the session. An event can be a thread interval, system call, or virtual memory fault. Events are viewed using the
Scheduler, System Calls, and VM Faults tabs, respectively.
Trace View: Scheduler
The duration of time during which a thread runs is called a thread run interval. The Scheduler trace shows all thread intervals that occurred during the session. You can filter the view
to focus on a specific process, thread, and CPU using the popup buttons under the grid. Figure 5 shows the Scheduler trace view for the Core Image Fun House process:
Figure 5: Scheduler Trace View Showing the Core Image Fun House Process
The Scheduler trace view shows the unique index assigned to each thread interval, along with the breakdown of time spent during that interval. In the example, during thread interval index 211, the Core Image Fun House
process spent 564.2 microseconds in user space, and 36.6 microseconds in system space. The Δt column shows the amount of time elapsed since the thread was last scheduled to run. Finally,
the Reason column shows why the thread's tenure was ended.
Trace View: System Calls
The System Calls trace shows the system call events for the session. Figure 6 shows an example System Call trace for the Core Image Fun House process:
Figure 6: System Calls Trace View Showing the Core Image Fun House Process
System calls occur during thread intervals, but a system call might begin in one interval and end in another. The Interval column of this view shows the beginning and ending
thread run intervals for each system call event. Note the thread run interval numbers correspond back to the Index column of the Scheduler trace view. When a single number is displayed
in the Interval column, it means the system call began and ended in the same interval. If the call event started before the trace session started, the starting thread run interval will
be shown as "?". Likewise, if the call ended after the session stopped, the ending thread run interval will be "?".
You can toggle the call stack for the system call by clicking the button in the lower right corner of the table.
Trace View: VM Fault
The VM Fault trace shows the virtual memory fault events that occurred during the session. Figure 7 shows an example VM Fault trace for a session with the Core Image Fun House process selected:
Figure 7: VM Fault Trace View Showing the Core Image Fun House Process
As with the System Calls trace, each VM Fault event is assigned a unique index, shown in the Index column. Virtual memory faults occur during thread run intervals, and this is shown in the Interval column. Again,
the value in the Interval column corresponds back to the Index column of the Schedule trace view. The VM Fault table shows the type of fault that occurred, and the amount of time
that was spent processing it. You can toggle the call stack at each fault event by clicking the button in the lower right corner of the table.
It should be noted that double-clicking on an event in any of the three trace view tables (Scheduler, System Calls, and VM Faults) will take you immediately to that event in the
Timeline view, which is discussed next.
Timeline View
The Timeline view shows all events at the precise location they occurred during the System Trace session. Along a horizontal axis representing time, every event — a thread run interval, system call, or VM fault —
is shown exactly where and when it happened. Each row in the Timeline view corresponds to a thread. You can scroll the timeline horizontally to see the events that pertain to each thread.
The next figure shows an example Timeline view. Unlike previous images, Figure 8 shows all processes that were running during the session.
Figure 8: Timeline View Showing All Processes
Timeline View: Navigation
Using the zoom slider above the timeline table, you can magnify the view. To zoom in, drag the slider to the right. To zoom out, drag it left.
You can also click and drag a rectangle on the chart to zoom into. Option-click to zoom out again.
In the example view the timeline has been
magnified and scrolled to the left to show the very beginning of the System Trace session. You can see the system running in thread 0x20D0318 of the mach_kernel process. That thread is then
preempted, so that thread 0x27A7C60 of Shark can run.
Timeline View: Thread Lines and Event Labels
Each colored bar represents a thread run interval. Different colored bars represent each CPU in the system. Along the thread run interval, you can see the system calls, virtual memory faults, interrupts, and sign post events that occurred, if any (sign post
events will be covered later in this article). You can get data for an individual thread run interval in the timeline by clicking on the colored bar.
Events that occur during a thread run interval are denoted by special icons placed along the colored bar. System calls are denoted by colored telephone icons. The different types of
virtual memory faults are also represented by a set of icons. You can see the meaning of each icon by choosing the Show Advanced Settings item on Shark's File menu. The Advanced Settings
drawer for the Timeline view also allows you to filter out the various event types. To open a view showing data specific to an event, click on the event's icon. The black line under the icon
represents the start and end of the event. As noted previously, events can start and end in different thread run intervals.
Figure 9 shows the Advanced Settings drawer for the Timeline view, with the icons that represent each event type.
Figure 9: Timeline View Advanced Settings Drawer
Instrumenting System Trace with Sign Posts
You can control Shark programatically using the CHUD framework. You can even integrate programatic control directly into your Xcode project by creating custom build
configurations. The CHUD framework sports a large, but straightforward API, which allows you to control Shark in a variety of ways.
In this section, we will concentrate on programmatic control of Shark's System Trace configuration.
In the earlier discussion of the Timeline
view, you saw that you can create custom events called sign posts. With sign posts, you can correlate the System Trace with
program state. Let's take a look at an example that shows how to add programatic control of Shark to your project. Then,
we'll define a custom sign post event and show how it appears when the application is profiled with Shark.
Creating a Custom Build Configuration for Programmatic Control of Shark
The example application we will use is the Core Image Fun House, which is found in the folder:
/Developer/Examples/Quartz/CoreImage/FunHouse. Open this project in Xcode, and
then add the CHUD framework by doing the following:
- In the Groups and Files pane, right-click on the project node and choose Add > Existing Frameworks.
- Navigate to the /System/Library/PrivateFrameworks folder and select CHUD.framework.
- In the Groups and FIles pane, drag the CHUD.framework node into the Frameworks folder.
Next, add a new build configuration for controlling Shark.
- Click the project node in the Groups and Files pane.
- Open the Inspector window.
- In the Inspector window, click the Configurations tab.
- Select the Development build configuration.
- Click Duplicate.
A new build configuration is created.
- Rename the new configuration, "Development [Instrumented]".
You have now created a duplicate of the Development build configuration called "Development [Instrumented]".
The Configurations tab of the Inspector window should look like Figure 10.
Figure 10: Development [Instrumented] Build Configuration
Select the "Development [Instrumented]" build configuration in Xcode's Active Build Configuration popup button.
Next, add a preprocessor macro that will enable us to tell when this configuration is active:
- In the Groups and Files pane, select the FunHouse target node of the project.
- Open the Inspector window.
- Click the Build tab.
- In the Configuration popup button, select the "Development [Instrumented]" build configuration.
Scroll the list to the Preprocessor Macros setting.
- Add a preprocessor macro called
DEVELOPMENT_INSTRUMENTED.
The Inspector window should look like Figure 11.
Figure 11: Development [Instrumented] Preprocessor Macro
Adding a custom build configuration is not strictly necessary, but in doing so, you can create a build specifically for programatic control of Shark.
In code, you can detect when the "Development [Instrumented]" configuration is active by testing for the DEVELOPMENT_INSTRUMENTED macro.
Note: You should turn off Xcode's Zero Link, and Fix and Continue options in the "Development [Instrumented]" build configuration.
Zero Link and Fix and Continue are meant for development and debugging, rather than performance measurement.
Next, let's create two custom sign post events for the Fun House application. Even this can be incorporated into the Xcode project.
Defining the Point Sign Post Events
Sign post events are defined in a simple text file. Add a new file that will contain event definitions to the project:
- Choose New File from the Xcode File menu.
- Select Empty File in Project from the list.
- Click Next.
- In the File Name text box, enter the name "FunHouseSharkEvents.text".
- Click Finish.
The sign post definition file contains one or more event definitions, each of which consists of a hexadecimal value followed by the event name.
In our sign post event file, type the following lines:
0x0 FunHouseAwakeFromNib
0x1 FunHouseTerminate
The value of the event must be in the range 0x0 to 0x3FFF. Shark will ignore event definition codes outside of this range. You may create multiple
event definition files, but be careful the values do not overlap.
To make Shark aware of the sign post events, we must copy the sign post event definition file to the folder: ~/Library/Application Support/Shark/KDebugCodes. We can accomplish
this in Xcode by adding a Copy Files build phase to the FunHouse target, as follows:
- In the Groups and Files pane, select the FunHouse target.
- Right click on the target and choose Add > New Build Phase > New Copy Files Build Phase.
The Copy Files Inspector window opens.
- In the Destination popup button, select Absolute Path.
- In the Full Path text box, enter $(HOME)/Library/Application Support/Shark/KDebugCodes.
The Copy Files Inspector window should look like Figure 12.
Figure 12: Copy Files Inspector
- Close the Inspector window.
- In the Groups and Files pane, click the
FunHouseSharkEvents.text file and drag it to the Copy Files build phase you just added.
This way, Xcode will track the sign post file as part of the project. When the target is built, Xcode will copy the file to the proper location as part of the build process.
Instrumenting Code for Sign Post Events
Now for some details about sign post events. Shark can record two types of sign post events:
- Point event: A point event marks a single occurrence.
- Interval event: An interval event marks an event that has a starting and ending point.
All sign post events are recorded by calling the CHUD framework function chudRecordSignPost(). The signature of the function is:
int chudRecordSignPost(unsigned code, chud_signpost_t type, unsigned arg1, unsigned arg2,
unsigned arg3, unsigned arg4);
The first argument to the function is the event code value defined in the sign post file. The second parameter is the type of sign post, which can be either: chudPointSignPost, for a point event, or
chudBeginIntervalSignPost, to mark the start of an interval event. The end of an interval event is marked by passing the value chudEndIntervalSignPost.
The remaining four parameters are for your own use. You may associate meaningful values with these parameters, and they will be displayed when you inspect the event in Shark.
We will use the two events defined earlier to create two point events: the first event will show exactly when the awakeFromNib method is called.
The second will show exactly when the applicationWillTerminate method is called. We are going to record the two sign posts
in two different source files. First, open the FunHouseApplication.m file and add the CHUD framework header:
#import <CHUD/CHUD.h>
Next, add a #define preprocessor macro so that you can refer to the event by name rather than numeric value:
#define FunHouseAwakeFromNib 0x0
Now add the following lines to the beginning of the awakeFromNib method:
- (void)awakeFromNib
{
#ifdef DEVELOPMENT_INSTRUMENTED
chudInitialize();
chudRecordSignPost(FunHouseAwakeFromNib, chudPointSignPost, 0, 0, 0, 0);
#endif
// ...
}
This code initializes the CHUD framework, and calls the chudRecordSignPost() function with sign post code 0, which
corresponds to the FunHouseAwakeFromNib event.
The FunHouseTerminate sign post is recorded in the FunHouseAppDelegate.m file. Open the file,
and add the #import <CHUD/CHUD.h> header.
Again, add a preprocessor macro for the event's numeric value:
#define FunHouseTerminate 0x1
Finally, add the following code to the beginning of the applicationWillTerminate method:
- (void)applicationWillTerminate:(NSNotification *)aNotification
{
#ifdef DEVELOPMENT_INSTRUMENTED
chudRecordSignPost(FunHouseTerminate, chudPointSignPost, 0, 0, 0, 0);
chudCleanup();
#endif
// ...
}
This code records sign post code 0x1, which corresponds to the FunHouseTerminate event.
Now you can build the application — we're almost there.
Viewing Sign Post Events
Start Shark, and select the System Trace configuration. We need to modify the configuration a bit, because the default System Trace only samples for ten seconds. Choose
Show Mini Config Editor from Shark's Config menu. Shark's main window now displays an additional three parameters for the System Trace configuration: Start Delay, Time Limit, and Sample Limit.
Increase the Time Limit to 45 seconds, and turn off the Sample Limit check box. We won't run the trace for that long, but it will give enough time to start the
Fun House application and then shut it down.
In Shark's main window, click the Start button.
Now, run the Core Image Fun House application. When the program starts, it immediately displays an Open File dialog with several files displayed. Select a file and click Open.
We only want to record our two startup and shutdown events, so exit the Fun House application.
In Shark's main window, click the Stop button.
When the System Trace window opens, click the Trace tab. You can see that an additional tab has been added, called Sign Posts. Figure 13 shows the additional tab:
Figure 13: Fun House Trace Showing Sign Posts Tab
Shark recorded the two sign post events. The additional four parameters are displayed under the table. You can double-click on a row
in the table to go directly to that sign post on the Timeline view. Figure 14 shows the FunHouseAwakeFromNib sign post
on the Timeline. You can click on the sign post icon to display detailed information on the event.
Figure 14: Fun House Timeline Showing Point Sign Post Events
Creating Interval Sign Posts
Interval sign posts let you mark sections of code, so that you can directly correlate the System Trace timeline with your program state. The Fun House application
uses a timer to play transition effects. For the next example, you will see how to instrument the code to mark the start and end of the timer with sign posts.
You will also instrument the code so that interval sign posts mark the start and end of each timer event.
Defining Interval Sign Post Events
The first step is to add the new sign post event codes to the text file in the project. Open the FunHouseSharkEvents.text file and add codes
for events 2, and 3:
0x0 FunHouseAwakeFromNib
0x1 FunHouseTerminate
0x2 FunHouseTimer
0x3 FunHouseAutoTimer
The FunHouseTimer sign post will mark the beginning and ending of the effect timer. The FunHouseAutoTimer sign post will surround
the section of code in the autoTimer method. This will allow us to see the amount of time spent in that particular code block.
Instrumenting Code for Interval Sign Post Events
The transition effect timer code is located in the file EffectStackController.m. Open the file and make the modifications necessary to use the
framework, and define preprocessor macros for the sign posts:
- Add
#import <CHUD/CHUD.h>
- Add preprocessor macros:
#define FunHouseTimer 0x2
#define FunHouseAutoTimer 0x3
Locate the startTimer method. Modify the method to record the start of the timer sign post:
// start the transition timer
- (void)startTimer
{
if (!timer)
{
#ifdef DEVELOPMENT_INSTRUMENTED
chudRecordSignPost(FunHouseTimer, chudBeginIntervalSignPost, 0, 0, 0, 0);
#endif
timer = [NSTimer scheduledTimerWithTimeInterval:1.0 / 30.0 target:self selector:@selector(autoTimer:)
userInfo:nil repeats:YES];
[timer retain];
}
}
Locate the stopTimer method. Modify it to record the end of the interval sign post:
// stop the transition timer
- (void)stopTimer
{
if (timer)
{
#ifdef DEVELOPMENT_INSTRUMENTED
chudRecordSignPost(FunHouseTimer, chudEndIntervalSignPost, 0, 0, 0, 0);
#endif
[timer invalidate];
[timer release];
timer = nil;
}
}
Locate the autoTimer method. Modify the code to create a point sign post each time the timer fires:
// called by the transition timer every 1/30 second
// this animates the transitions in sequence - one after another
- (void)autoTimer:(id)sender
{
int count, i, transitionIndex;
CIFilter *f;
NSString *type;
NSDictionary *attr;
double now;
float transitionValue, value, lastTimeValue;
#ifdef DEVELOPMENT_INSTRUMENTED
chudRecordSignPost(FunHouseAutoTimer, chudBeginIntervalSignPost, 0, 0, 0, 0);
#endif
// ...
Add the following code to the end of the autoTimer method:
// ...
// let core image recompute the view
[_inspectingCoreImageView setNeedsDisplay:YES]; // force a redisplay
#ifdef DEVELOPMENT_INSTRUMENTED
chudRecordSignPost(FunHouseAutoTimer, chudEndIntervalSignPost, 0, 0, 0, 0);
#endif
// ...
Recording the Interval Sign Posts
Now build and run the Fun House application. Select one of the sample images to begin.
Add a "flash" transition to the effect stack, and select a different image as the target.
Start Shark, and once again configure it to run the System Trace for 45 seconds, without a sample limit. Click Start to begin sampling.
Switch back to Fun House, and click the play button to run the flash transition effect. Once the effect is complete, switch back to Shark and stop the
System Trace. Let's take a look at what happened.
Viewing the Interval Sign Post Events
Figure 15 shows the Sign Posts tab of the Trace view.
Figure 15: Fun House Trace View Showing Interval Sign Post Events
The Trace view Sign Posts tab contains the FunHouseTimer sign post, which shows the total time the effect timer ran, from start to finish. Also shown is
each occurrence of the FunHouseAutoTimer sign post. Each sign post corresponds to one timer event. The FunHouseAutoTimer sign post
tells you the total time spent in the block of code surrounded by the
chudBeginIntervalSignPost and chudEndIntervalSignPost in the autoTimer method. Double-clicking on any
of the sign post events will take you directly to the event in the Timeline view.
The Trace Sign Posts tab shows a great deal of information about the time spent in each timer event. The columns of the table display the following information:
- Index: The thread interval index.
- Interval: The beginning and ending thread interval index.
- Process: The process the event occurred in.
- Thread: The thread the event occurred in.
- Name: The name of the sign post event, as defined in the sign post event file.
- CPU Time: The active CPU time during the interval sign post event.
- Wait Time: The wait time during the interval sign post event.
You could use the additional four arguments of the chudRecordSignPost function to record information specific to each timer event. These arguments
would be shown in the Trace Sign Post view, allowing you to correlate the trace to program state at the finest level of granularity.
For more information on programatic control, see the Shark User Guide. Additionally, the header files of the CHUD framework are well-commented
and contain descriptions for each function in the framework.
Summary
Understanding how your code interacts with Mac OS X is another important aspect of optimization. Shark's System Trace facility makes behind-the-scenes
behavior such as thread scheduling, system calls, and virtual memory faults visible. Shark is the only tool that can pin system calls and virtual
memory faults directly to the code that caused them, giving you insight into parts of your code that might benefit from optimization or redesign.
Through the use of programatic control and custom sign posts, you can easily correlate the System Trace to your program state. Using interval sign posts,
you can get data on how long a certain critical operation takes by surrounding the code with sign post markers.
The overarching theme of Shark's System Trace is to show system behavior, and to show you the code that caused that behavior.
For More Information
Posted: 2006-01-30
|