Performance


Improve your app's performance.

Posts under Performance tag

52 Posts
Post not yet marked as solved
0 Replies
261 Views
How well does SolidWorks perform using Windows on a MacBook Pro with the M1 Max chip and 64GB unified memory? I’ll likely be running Windows through Parallels.
Posted by Big00. Last updated.
Post not yet marked as solved
1 Reply
288 Views
When I try to use Instruments, most of the time I can't identify the threads. I'm assuming these are "internal" threads? One of the threads was causing the CPU to go up to 100% usage. I didn't change anything in the code, so how do I identify what is causing these issues?
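One thing that helps in this situation: threads and queues you create yourself show up under their names in Instruments, which separates your work from the system's internal threads. A minimal sketch (the queue label and thread name here are hypothetical):

```swift
import Foundation

// Named threads and labeled queues appear under these names in the
// Time Profiler's thread list, making your own work easy to spot.
let queue = DispatchQueue(label: "com.example.render")  // hypothetical label

queue.async {
    Thread.current.name = "RenderWorker"  // hypothetical name
    // ... CPU-heavy work to profile ...
}
```

Any thread still showing up as an anonymous `0x…` identifier after this is likely a system or framework thread; the stack trace in the sample detail is then the best clue to what it is doing.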
Posted. Last updated.
Post not yet marked as solved
3 Replies
1.1k Views
This is a follow-up to feedback FB9144718, which we also discussed at a WWDC21 "Performance, power, and stability" lab session.

Issue Summary
The actual issue we are facing is that our XPC service is not running as fast as we would expect, especially not on Intel machines; it is somewhat better on M1 machines but still not really good. After a lot of profiling with Instruments, it finally turned out that the problem is caused by our processing getting regularly stopped: our processing thread is being preempted and put on hold for sometimes a tremendous amount of time (over 32 ms has been observed). Even if it is preempted for just a couple of ms most of the time, this is still a lot considering that the actual work it would otherwise perform is only in the range of microseconds. The reason this happens is probably that we don't use the XPC service just for processing application messages through the XPC protocol (which we do as well), but also to retrieve requests through a mach port from another process. This causes our thread priority to be dropped to 4 (see highlighted log line), and that's why we get preempted for so long. The reason it's not equally dramatic on M1 is that there we are not preempted; instead we are forced to run on the efficiency cores instead of the performance ones.

Ideas from the Lab
Other than completely restructuring our entire implementation, which is eventually going to happen in the future anyway for Big Sur and newer, we still have to maintain this structure as long as we need to support pre-Big Sur macOS versions, so it would be great to have a less dramatic fix. Two suggestions were made at the lab:
Change the RunLoopType in the XPC plist from dispatch_main to NSRunLoop. We tried that, but it didn't make any difference.
Add a key ProcessType with the value Interactive to the XPC plist. This key is only documented for launchd daemons, not for XPC services, but we were told it should actually work for XPC services as well. We tried that too, both at the top level and inside the XPC sub-key, but it didn't make a difference either.

Another Idea That Didn't Work
That second suggestion made me look up the key in the man page for launchd.plist, and what I found there was pretty interesting. Apparently there is a ProcessType value documented as Adaptive: "Adaptive jobs move between the Background and Interactive classifications based on activity over XPC connections. See xpc_transaction_begin(3) for details." This seems to be our problem: our XPC service is considered inactive while it processes messages over the mach port. The documentation of xpc_transaction_begin(3) says: "Services may extend the default behavior using xpc_transaction_begin() and xpc_transaction_end(), which increment and decrement the transaction count respectively. This may be necessary for services that send periodic messages to their clients, not in direct reply to a received message." Using these two functions also frees us from having to enable/disable sudden termination on our own, as that is automatically controlled by these two functions as well. Yet even using them to indicate activity doesn't prevent us from being preempted at regular intervals: our priority still drops to level 4 while we are still in the middle of processing (i.e., before we have called xpc_transaction_end()). We seem to use them correctly, though, as they correctly disable sudden termination on our behalf as long as our XPC service remains in the active state (it only receives mach messages for processing while in that state), and sudden termination gets re-enabled when we leave the active state again.

Final Thoughts
The man page of xpc_transaction_begin() also says: "The XPC runtime will also automatically manage the service's priority based on where a message came from. If an app sends a message to the service, the act of sending that message will boost the destination service's priority and resource limits so that it can more quickly fill the request. If, however, a service gets a message from a background process, the service stays at a lower priority so as not to interfere with work initiated as a direct result of user interaction." It looks like this is not working for the way we use the XPC service at the moment. Our mach port messages come either from a System Extension (Big Sur and up) or from a root daemon started by launchd (Catalina and below; its ProcessType is Interactive and its nice value is -10), but apparently these messages cannot boost our XPC service, so it stays at low priority.
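For reference, the transaction bracketing described above looks roughly like this (a sketch only; handleRequest is a placeholder for our mach-message processing, and as noted it did not stop the priority drop in our case):

```swift
import XPC

// Sketch of xpc_transaction_begin()/xpc_transaction_end() bracketing.
// While the transaction count is non-zero the service is classified as
// "active", and sudden termination is disabled automatically.
func processMachRequest(_ handleRequest: () -> Void) {
    xpc_transaction_begin()   // increment the transaction count: service is active
    handleRequest()           // the actual mach-message processing
    xpc_transaction_end()     // decrement: service may idle again
}
```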
Posted by xcoder112. Last updated.
Post not yet marked as solved
0 Replies
289 Views
For a Create ML activity classifier, I'm classifying "playing" tennis (the points or rallies) and a second class, "not playing", as the negative class. I'm not sure what to specify for the action duration parameter given how variable a tennis point or rally can be, but I went with 10 seconds, since that seems like the average duration for both the "playing" and "not playing" labels. When choosing this parameter, however, I'm wondering whether it affects performance, both speed of video processing and accuracy. Would the Vision framework return more results with smaller action durations?
Posted by Curiosity. Last updated.
Post not yet marked as solved
2 Replies
347 Views
Debugging a gputrace captured on an M1 Max on older hardware in Xcode warns "No compatible devices connected" and says to "Connect a device that supports the screen resolution and Metal feature profile that this gputrace file was generated on." Seriously? I was boasting about Xcode/Metal's ability to capture a gputrace and play it back, which is super helpful, but this was quite a letdown. Is there no way other than buying a new Mac with an M1 Max to get a look at the gputrace?
Posted. Last updated.
Post not yet marked as solved
2 Replies
484 Views
In our AR app and App Clip made with SceneKit, we experience very significant drops in framerate when we make our 3D content appear at different steps of the experience. For now, all of our 3D objects are in our main scene. Those that are supposed to appear at some point in the experience have their opacity set to 0.01 at the beginning and then fade in with an SCNAction (the reason we tried setting their opacity to 0.01 at the start was to make sure these objects are rendered from the start of the experience). However, if the objects all have their opacity set to 1 from the start of the experience, we do not see any fps drop. It is worth noting that the fps drops only happen the first time the app is opened; if I close it and re-open it, everything unfolds without any freeze. What would be the best way to load (or preload) these 3D elements to avoid these freezes? We have conducted our tests on an iPhone X (iOS 15.2.1), an iPhone 12 Pro (iOS 14), and an iPad Pro 2020 (iPadOS 14.8.1).
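One approach worth trying for this symptom: SCNSceneRenderer can upload geometry, textures, and compiled shaders ahead of time via prepare(_:completionHandler:), so the first fade-in doesn't pay the first-use cost. A sketch (sceneView and hiddenNodes are placeholders for your own objects):

```swift
import SceneKit

// Sketch: preload deferred objects right after the scene is set up, before
// the user reaches the step where they fade in.
func preload(_ hiddenNodes: [SCNNode], in sceneView: SCNView) {
    sceneView.prepare(hiddenNodes) { success in
        // Resources are now resident on the GPU; fading the nodes in later
        // should not trigger the first-render stall seen on a cold launch.
        print("SceneKit preload completed: \(success)")
    }
}
```

This would also let you drop the opacity-0.01 workaround, since the preload replaces the "force an early render" trick.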
Posted. Last updated.
Post not yet marked as solved
9 Replies
750 Views
Hi, I am developing a simple passthrough proxy system extension using NETransparentProxyProvider. This is what the extension fundamentally does:
In handleNewFlow, open a connection to the remote endpoint using the provider's createTCPConnection method.
Once the remote endpoint is connected, open the NEAppProxyTCPFlow and start both ends of the flow.
When I use iperf to test the network speed while sending, I see a 10x drop in speed when going through my system extension (iperf -c <server_address>; iperf sends 131072-byte blocks by default, for 10 seconds). My code for the inbound and outbound flows is quite simple:
For the inbound flow: read from the remote connection, in the read's completion handler write to the flow, and in the flow write's completion handler start another read from the remote.
For the outbound flow: read from the flow, in its completion handler write to the remote, and in the remote write's completion handler trigger another read from the flow.
Is there any problem with the above approach that could cause the network transfer slowdown? I also captured Wireshark traces with and without my system extension, and I see a pattern. When I read from the flow, the system extension reads chunks of varying sizes irrespective of what the application is sending (e.g. 4096, 16384, 8192 bytes). When I send these chunks to the remote connection, it waits for ACKs for each chunk irrespective of the TCP window size, and I also see a [PSH, ACK] in the last packet of each chunk. Without my system extension, iperf sends many packets in a short time without [PSH, ACK], as it uses a bigger buffer and does not wait for ACKs so frequently; it respects the TCP window size. I can provide any details needed to help root-cause this problem. I am testing on macOS Big Sur 11.5.1. Any help is greatly appreciated. Regards
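One suspicion from the description above: because each flow read is serialized behind the remote write's completion handler, the pipeline never has more than one chunk in flight, so throughput is bounded by one chunk per round trip. A sketch of an outbound pump that issues the next flow read as soon as the data is handed to the connection (all names are placeholders, and this is an untested restructuring, not a confirmed fix):

```swift
import NetworkExtension

// Sketch: keep the pipe full by reading ahead instead of waiting for the
// remote write to complete before the next flow read.
func pumpOutbound(flow: NEAppProxyTCPFlow, remote: NWTCPConnection) {
    flow.readData { data, error in
        guard let data = data, !data.isEmpty, error == nil else {
            // End of stream or error: close both directions.
            flow.closeReadWithError(error)
            remote.writeClose()
            return
        }
        // Hand the chunk to the connection, but don't serialize the next
        // read behind this write's completion.
        remote.write(data) { writeError in
            if writeError != nil {
                flow.closeReadWithError(writeError)
            }
        }
        pumpOutbound(flow: flow, remote: remote)  // read ahead immediately
    }
}
```

A production version would need backpressure (e.g. a cap on bytes in flight) so a slow remote doesn't cause unbounded buffering.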
Posted by Dhan18. Last updated.
Post not yet marked as solved
1 Reply
475 Views
Below, the sampleBufferProcessor closure is where the Vision body pose detection occurs.

/// Transfers the sample data from the AVAssetReaderOutput to the AVAssetWriterInput,
/// processing via a CMSampleBufferProcessor.
///
/// - Parameters:
///   - readerOutput: The source sample data.
///   - writerInput: The destination for the sample data.
///   - queue: The DispatchQueue.
///   - completionHandler: The completion handler to run when the transfer finishes.
/// - Tag: transferSamplesAsynchronously
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                           to writerInput: AVAssetWriterInput,
                                           onQueue queue: DispatchQueue,
                                           sampleBufferProcessor: SampleBufferProcessor?,
                                           completionHandler: @escaping () -> Void) {
    /* The writerInput continuously invokes this closure until finished or cancelled.
       It throws an NSInternalInconsistencyException if called more than once for
       the same writer. */
    writerInput.requestMediaDataWhenReady(on: queue) {
        var isDone = false
        /* While the writerInput accepts more data, process the sampleBuffer
           and then transfer the processed sample to the writerInput. */
        while writerInput.isReadyForMoreMediaData {
            if self.isCancelled {
                isDone = true
                break
            }
            // Get the next sample from the asset reader output.
            guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                // The asset reader output has no more samples to vend.
                isDone = true
                break
            }
            // Process the sample, if requested.
            do {
                try sampleBufferProcessor?(sampleBuffer)
            } catch {
                // The `readingAndWritingDidFinish()` function picks up this error.
                self.sampleTransferError = error
                isDone = true
            }
            // Append the sample to the asset writer input.
            guard writerInput.append(sampleBuffer) else {
                /* The writer could not append the sample buffer. The
                   `readingAndWritingDidFinish()` function handles any error
                   information from the asset writer. */
                isDone = true
                break
            }
        }
        if isDone {
            /* Calling `markAsFinished()` on the asset writer input does the following:
               1. Unblocks any other inputs needing more samples.
               2. Cancels further invocations of this "request media data" callback block. */
            writerInput.markAsFinished()
            // Tell the caller the reader output and writer input finished transferring samples.
            completionHandler()
        }
    }
}

The processor closure runs body pose detection on every sample buffer so that later, in the VNDetectHumanBodyPoseRequest completion handler, the VNHumanBodyPoseObservation results are fed into a custom Core ML action classifier.

private func videoProcessorForActivityClassification() -> SampleBufferProcessor {
    let videoProcessor: SampleBufferProcessor = { sampleBuffer in
        do {
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
            try requestHandler.perform([self.detectHumanBodyPoseRequest])
        } catch {
            print("Unable to perform the request: \(error.localizedDescription).")
        }
    }
    return videoProcessor
}

How could I improve the performance of this pipeline? After testing with an hour-long 4K video at 60 FPS, it took several hours to process, running as a Mac Catalyst app on an M1 Max.
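Two low-risk changes often help with a loop like the one above: wrapping each iteration in an autoreleasepool so CMSampleBuffer-backed memory is reclaimed promptly, and, if per-frame classification isn't strictly required, running the Vision request only every Nth frame. A sketch under those assumptions (the stride of 6 is arbitrary; at 60 fps it still samples 10 times per second, and the writer-append side is omitted for brevity):

```swift
import AVFoundation
import Vision

// Sketch: stride the Vision work and pool each iteration's autoreleased
// objects. Detection cost drops roughly in proportion to the stride.
func processStrided(readerOutput: AVAssetReaderOutput,
                    request: VNDetectHumanBodyPoseRequest,
                    detectionStride: Int = 6) {
    var frameIndex = 0
    while let sampleBuffer = readerOutput.copyNextSampleBuffer() {
        autoreleasepool {
            if frameIndex % detectionStride == 0 {
                let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
                try? handler.perform([request])
            }
            frameIndex += 1
        }
    }
}
```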
Posted by Curiosity. Last updated.
Post not yet marked as solved
2 Replies
463 Views
I just got an app feature working where the user imports a video file, each frame is fed to a custom action classifier, and then only frames with a certain classified action are exported. However, I'm finding that testing with a one-hour 4K video at 60 FPS is taking an unreasonably long time: it's been processing for 7 hours now on a MacBook Pro with M1 Max running the Mac Catalyst app. Are there any techniques or general guidance that would help with improving performance? As much as possible I'd like to preserve the input video quality, especially the frame rate. A one-hour video is expected, as it's of a tennis session (which could be anywhere from 10 minutes to a couple of hours). I made the body pose action classifier with Create ML.
Posted by Curiosity. Last updated.
Post not yet marked as solved
4 Replies
3.1k Views
So, I need to know if 16GB is enough for Xcode development, or do I have to choose 32GB? What I run/open while developing:
Xcode
VS Code
iOS Simulator
Android emulator
a few Terminal windows
one Chrome window with many tabs
Notes
Figma
Thanks.
Posted by iLook. Last updated.
Post not yet marked as solved
2 Replies
395 Views
In the app we see a 4-second loading time at launch. Sometimes it goes up to 8 seconds, but the average is 4 seconds. What could be the reason? There are no server calls in the AppDelegate. A breakpoint inside the AppDelegate's didFinishLaunchingWithOptions takes 4 seconds to be hit. I do have some API calls, but they are launched after that breakpoint, and their responses come back quickly. In Instruments I see some blocked time; is that normal? Is something wrong in the log?
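Since the breakpoint is only hit after 4 seconds, most of that time is likely spent before main() (dyld loading, static initializers). Running with the DYLD_PRINT_STATISTICS environment variable set in the scheme can break down pre-main time on older toolchains. For the remainder, signposts bracket suspect launch phases so Instruments shows exactly where the time goes. A sketch (the subsystem/category strings are placeholders):

```swift
import os.signpost

let launchLog = OSLog(subsystem: "com.example.app", category: "Launch")

// Call at the top of application(_:didFinishLaunchingWithOptions:):
func markLaunchBegin() {
    os_signpost(.begin, log: launchLog, name: "didFinishLaunching")
}

// Call when launch work completes; the interval between the two signposts
// appears in Instruments' os_signpost track.
func markLaunchEnd() {
    os_signpost(.end, log: launchLog, name: "didFinishLaunching")
}
```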
Posted by bicho. Last updated.
Post not yet marked as solved
6 Replies
2.3k Views
Hi there, I previously experienced a massive memory leak in SwiftUI on macOS which now seems to be fixed (https://developer.apple.com/forums/thread/676860). But my app still becomes slower, and its memory footprint grows, after some time of use. I was able to create a new minimal but more extreme example to reproduce it. This is a full SwiftUI app; compile for macOS (I'm using Xcode 12.4 on macOS 11.2.3):

import SwiftUI

@main
struct DemoApp: App {
    @State var strings = ["Hello 1", "Hello 2"]
    @State var bool = false
    let timer = Timer.publish(every: 0.1, on: .main, in: .common).autoconnect()
    @State var selected: Int?

    var body: some Scene {
        WindowGroup {
            List(strings.indices, id: \.self, selection: $selected) { stringIndex in
                Text(strings[stringIndex])
            }
            .toolbar(content: {
                ForEach(1...100, id: \.self) { _ in
                    Text("Hello World")
                }
            })
            .onReceive(timer) { input in
                if bool == true { selected = 0 }
                else { selected = 1 }
                bool = !bool
            }
        }
    }
}

The timer is optional, but it shows the problem more quickly. It seems closely connected to the toolbar content. The memory footprint quickly goes up, as does CPU utilization, and the app slows down significantly after just a few seconds. Please help! What can I do? My app is pretty much unusable because of this bug. Johannes
Posted by Yoko_Ono. Last updated.
Post not yet marked as solved
0 Replies
650 Views
When running on an iPhone 13 Pro Max (iOS 15.1) and tapping the screen with several fingers in succession, the app may drop to 30 fps at the moment of the tap.

self.displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(repaintDisplayLink)];
[self.displayLink setPreferredFramesPerSecond:60];
// or
// [self.displayLink setPreferredFrameRateRange:CAFrameRateRangeMake(60, 60, 60)];
[self.displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];

Our Unity and cocos2d-x builds use almost the same code, and the same phenomenon occurs in both. It does not reproduce on iPhone XS Max (iOS 15.1), iPhone SE 2, iPhone 7, etc. Also, if I set it to 120 fps and run it, then tap continuously, the 120 fps drops to 60 fps in the same way.
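One thing to check on ProMotion hardware (the iPhone 13 Pro family): since iOS 15, apps must explicitly opt in to sustained high frame rates, or the system may limit CADisplayLink and CAAnimation frame rates. This is a hedged suggestion based on the documented Info.plist key; it may or may not explain the dips on tap, but it only affects the devices showing the problem:

```xml
<!-- Info.plist: opt out of the default frame-rate limiting on ProMotion iPhones -->
<key>CADisableMinimumFrameDurationOnPhone</key>
<true/>
```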
Posted by Majing3. Last updated.
Post not yet marked as solved
0 Replies
402 Views
I have a multiple-render-backend application on macOS, so I can compare the offscreen-render performance of Metal and OpenGL. I found that when I do not use MSAA, Metal performs better. But when I enable MSAA, especially with large textures such as 3840x1920, Metal seems slower. I then simplified the render pass to have no draw calls; it seems the texture clear is where Metal and OpenGL differ, though I'm not sure. In OpenGL I use glClearBufferfv; in Metal I set colorAttachments.loadAction = MTLLoadActionClear (I tried enabling and disabling triple buffering). When I disable the clear action in both OpenGL and Metal, the OpenGL render-pass time drops by about 3 ms, while Metal's drops by about 8 ms. Does that mean Metal takes more time to clear the texture? How can I resolve this problem? (I need the clear action.) @Apple Device: Mac mini 2018. Xcode: Xcode 13.0. Here is the Metal Instruments screenshot (the GPU idle time seems too long, about 50 ms):
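One configuration worth checking: make sure the full multisampled contents are never stored to memory. With loadAction .clear and storeAction .multisampleResolve, only the resolved 3840x1920 image leaves the GPU; storing (or later reloading) the MSAA samples at that size is expensive. On Apple-silicon GPUs the MSAA texture can additionally be .memoryless so the samples live only in tile memory; that option isn't available on an Intel Mac mini 2018, but avoiding a .store of the multisample texture still matters there. A sketch (hypothetical helper, not a confirmed fix for the clear-time difference):

```swift
import Metal

// Sketch: an MSAA render pass that clears cheaply and only resolves,
// never storing the full multisample contents.
func makeMSAAPass(device: MTLDevice,
                  resolveTexture: MTLTexture,
                  sampleCount: Int = 4) -> MTLRenderPassDescriptor? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: resolveTexture.pixelFormat,
        width: resolveTexture.width,
        height: resolveTexture.height,
        mipmapped: false)
    desc.textureType = .type2DMultisample
    desc.sampleCount = sampleCount
    desc.usage = [.renderTarget]
    desc.storageMode = .private  // use .memoryless on Apple-silicon GPUs

    guard let msaaTexture = device.makeTexture(descriptor: desc) else { return nil }

    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = msaaTexture
    pass.colorAttachments[0].resolveTexture = resolveTexture
    pass.colorAttachments[0].loadAction = .clear               // keep the clear you need
    pass.colorAttachments[0].storeAction = .multisampleResolve // resolve only, never store
    return pass
}
```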
Posted. Last updated.
Post not yet marked as solved
1 Reply
501 Views
My app uses Core ML to run a neural network (Core ML uses the CPU for most layers). Sometimes performance is very good, but after 30 seconds it slows down (fps drops significantly). I profiled it and found that iOS uses the performance cores for my app at the beginning. After 30 seconds it stops using the performance cores and starts using the efficiency cores (whose frequency is lower), as you can see in the screenshot. The QoS of my queue is .userInitiated; I also tried .userInteractive, but it doesn't change anything. I assume this is a core-scheduling feature of iOS, but I cannot find any information about it. Is there documentation describing this behavior? Can I make iOS use the performance cores for my app all the time? I'm using an iPhone XR with iOS 14.7.1.
Posted. Last updated.
Post not yet marked as solved
0 Replies
413 Views
It seems that SpriteKit doesn't batch the draw calls for textures in the same texture atlas. There are two different behaviors, based on how the SKTextureAtlas gets initialized: The draw calls are not batched when the texture atlas is initialized using SKTextureAtlas(named:) (loading the texture atlas from data stored in the app bundle). But the draw calls seem to be batched when the texture atlas is created dynamically using SKTextureAtlas(dictionary:). The following images show the two different behaviors and the SpriteAtlas inside the Assets Catalog. 1. Draw calls not batched: 2. Draw calls batched: The SpriteAtlas: I created a sample Xcode 13 project to show the different behavior: https://github.com/clns/spritekit-atlas-batching. I have tried this on the iOS 15 simulator on macOS Big Sur 11.6.1. The code to reproduce is very simple:

import SpriteKit

class GameScene: SKScene {
    override func didMove(to view: SKView) {
        let atlas = SKTextureAtlas(named: "Sprites")
        // let atlas = SKTextureAtlas(dictionary: ["costume": UIImage(named: "costume")!, "tank": UIImage(named: "tank")!])
        let costume = SKSpriteNode(texture: atlas.textureNamed("costume"))
        costume.setScale(0.3)
        costume.position = CGPoint(x: 200, y: 650)
        let tank = SKSpriteNode(texture: atlas.textureNamed("tank"))
        tank.setScale(0.3)
        tank.position = CGPoint(x: 500, y: 650)
        addChild(costume)
        addChild(tank)
    }
}

Am I missing something?
Posted by calin. Last updated.
Post not yet marked as solved
0 Replies
312 Views
We have an iPhone app where we use NSPredicate extensively for array filtering. With the iOS 15.1 upgrade, we started seeing performance issues in our app. After troubleshooting, we found slowness in array filtering, which otherwise works pretty fast on iOS 15.0.2 and earlier versions. Has anyone else faced this issue? If so, is any resolution available? We couldn't find anything in the iOS 15.1 release notes that could explain it.
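As a workaround while the regression stands, simple format-string predicates can often be replaced one-for-one with native Swift closures, which bypass the NSPredicate evaluation machinery entirely. A minimal sketch (the Item model is hypothetical):

```swift
import Foundation

struct Item {
    let name: String
    let score: Int
}

let items = [Item(name: "a", score: 10),
             Item(name: "b", score: 80),
             Item(name: "c", score: 60)]

// Instead of evaluating NSPredicate(format: "score > %d", 50) against the
// array, express the same condition as a native closure:
let filtered = items.filter { $0.score > 50 }
print(filtered.map(\.name))  // ["b", "c"]
```

Predicates built at runtime from user input obviously can't be converted this way, but fixed-format ones usually can.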
Posted. Last updated.
Post not yet marked as solved
1 Reply
521 Views
My MacBook Pro 13" (2015) needed a new battery. To drain the existing battery I left the MacBook on, and also updated it to macOS Big Sur. I replaced the battery and charged it. Upon boot everything worked except the keyboard: the trackpad works, but the keyboard doesn't. I replaced the trackpad cable, but the keyboard is still dead. Any suggestions? Many thanks!
Posted by jef231. Last updated.
Post not yet marked as solved
0 Replies
415 Views
I'm using xcrun xctrace export --output results.xml --input test_trace.trace --xpath '//trace-toc[1]/run[1]/data[1]/table' to export data from a test run with Instruments as part of my app's CI. With Xcode 12 this produced an XML file that I could parse relatively quickly, but with Xcode 13 the export itself takes 90+ seconds and generates a 160 MB XML file for a 10-second recording. I noticed the table that has grown is the time-sample schema; just exporting that table with --xpath '//trace-toc[1]/run[1]/data[1]/table[4]' takes quite a while, and it has about 790 thousand rows. I'm using a custom instrument based on the Time Profiler, and I still have about the same number of stack trace samples in my output. Did anything change in Xcode 13 that causes Instruments to include many more time samples that don't correspond to a stack trace? Is it possible to disable this, so my trace has fewer time samples (while preserving the stack trace frequency) and the XML can be parsed more quickly?
Posted. Last updated.
Post not yet marked as solved
1 Reply
420 Views
I'm using Xcode 13.0 (13A233) and want to use the enablePerformanceTestsDiagnostics feature. When I run xcodebuild test -project PerformanceTest.xcodeproj -scheme PerformanceTest -destination "platform=iOS Simulator,id=D43B8013-11A6-4E66-A42A-7174B9109276" -enablePerformanceTestsDiagnostics YES in the terminal, the xcresult does not contain the memgraphset.zip file. I don't know why this happens. Can anyone confirm whether this is a bug, or give me some hints about what I'm doing wrong?
Posted by niekaihua. Last updated.