Xcode Instruments: Mismatch Time Profiler results vs. CPU Cycles

Question

Created Oct ’24

Replies 2

Boosts 0

Participants 2

Recently, we reworked a crucial part of our app and managed to half the amount of CPU cycles our app requires (according to Xcode Instruments).

Nonetheless, when using the Time Profiler component in instruments, it shows that the CPU time spent was either higher or the same (depending on execution).

The main time-consuming factor here: libsystem_pthread.dylib - the amount of CPU time spent by this library has doubled from original implementation to reworked implementation.

Therefore, I'm having a few questions:

How should I interpret this result?
How is this even possible if the CPU clock cycles halved?
What is the better metric here, the CPU cycles or the time profiler?
How can I reduce the impact of that said library? What does that library do and how can I influence its performance?

Thanks in advance.

Boost

Answer 1

DTS Engineer OP

Apple

Oct ’24

Nonetheless, when using the Time Profiler component in instruments, it shows that the CPU time spent was either higher or the same (depending on execution).

I'm not entirely sure what views your comparing here (some screenshots might clarify things) but the one thing I'd be careful about here is that a most of the views Time Profiler shows are in terms of "% of total resources used", not absolute terms. In concrete terms, if a thread runs "sleep(60)" then time profiler will say that it spent 100% of that minute in "sleep". That doesn't actually represent any kind of "problem" or consume resource, it's just the thread doing nothing.

Related to that point:

The main time-consuming factor here: libsystem_pthread.dylib - the amount of CPU time spent by this library has doubled from original implementation to reworked implementation.

Again, I'm not sure which view you're seeing here, but the call trees Instruments shows are built by "merging" a series of samples together, then showing what functions occurred in each trace. Every thread except the main thread* will almost always** start with 100% in one or more functions from "libsystem_pthread.dylib". That's simply because that is the library the system uses to create all of the threads it creates, not because "libsystem_pthread.dylib" is actually "doing" anything significant.

*The main thread is created through an act of spontaneous generation when your process is fork'd, not through the normal thread creation process.

**While it is technically possible to create a thread using mach, doing so is at a minimum quite strange and more likely, quite dangerous. I'm not aware of any Apple API that does this, so if a non-pthread shows up in your app it's time to starts asking some serious questions about the sorts of libraries you're spending time with.

How can I reduce the impact of that said library? What does that library do and how can I influence its performance?

I'd need to see a screenshot to know exactly what's going on, but I think you're just misunderstanding what it's showing you. One thing to keep in mind here is that TimeProfiler's "job" is simply to "tell you what your app is doing". If your app is doing "nothing", then it will happily show you a detailed stack trace of what your app is doing when it does "nothing".

Indeed, I think what I'd actually suggest is that you create a new project from one of our templates, run Time Profiler against that, then add "junk" code into the project so you can see how that changes things. I think it's easier to understand what it's actually showing you when you're starting with the most basic case instead of a "real" app.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

0

Answer 2

linus_b OP

Oct ’24

Hey Kevin,

thanks a lot for the swift response. There is an ongoing case regarding this issue with DTS: 9323940. I've provided the full trace file there for reference. Unfortunately, I cannot share the entire thing here in the forums.

Nonetheless, I've taken a screenshot of the time profiler view to make things more clear:

Bildschirmfoto 2024-10-04 um 10.53.29.png

In this picture, you see the view I'm talking about in my original post. The other view (CPU Profiler, see below), shows that the cycles halved. Nonetheless, the time profiler view (shown in the picture) did not show any improvement (the 12.38 seconds are no improvement to the 12.36 seconds measured for the original, unoptimized implementation [not displayed in the picture]).

I hope this makes it more clear - I don't think this is one of the views where time profiler shows the measurements relative to the total usage, or am I mistaking here? The way I understood it, this is the total time the app spent on the CPU, requiring calculation?

Aside of that, would you say that the regular CPU Profiler (with the cycles) is a better/more reliable indication of any performance improvements? I.e. can I conclude that my refactoring actually worked if the cycles halved?

Here you see the CPU Profiler view (the one I keep mentioning w.r.t. CPU cycles):

Bildschirmfoto 2024-10-04 um 11.15.09.png

Here you see a total of 6.60 Gc for the refactored implementation, as opposed to 11.40 Gc for the original, unoptimized, implementation.

Really appreciate your help on this issue here, thanks a lot!

0