If the work we dispatch to a concurrent queue does not involve blocking (say, NO blocking system calls, I/O, etc.), though it may run for a few microseconds in some worst cases (still no blocking involved), then it should not lead to overcommitting (which in turn leads to thread explosion)?
Yes, but...
- Work like this (no I/O at all) is relatively rare, particularly in volumes high enough that parallel activity is actually relevant.
- "Bulk" CPU-bound work in GCD can cause serious delays and disruptions in your app. GCD's underlying "goal" is basically "keep all the cores busy". If all the cores are currently running CPU-bound work... then GCD has met its goal and will stop dispatching new work until that work finishes. That may not be what you wanted.
- Parallelizing long-running CPU work can be trickier than it looks.
As one particularly memorable example, I was once given a benchmark that clearly showed an iPad Air 2 (2014) was ~2x faster than an iPhone 7 (2016). Crucially, that result was entirely accurate: the iPad WAS faster than a much newer iPhone.
The problem was that the benchmark wasn't actually showing what he thought it was. He'd divided the work up into such small blocks that what was actually being tested was the device's ability to process effectively "empty" blocks. It turns out that if you want to shuffle empty blocks, an extra core (3 vs 2) is "better".
As it happens, I still have the raw numbers I worked up. Here was the original test, showing the iPad Air 2 at ~2x as fast:
iPad Air 2-> GCD Calls: 25600 process time: 0.225574
iPhone 7 -> GCD Calls: 25600 process time: 0.493609
However, here is the exact same test doing ALL the work in a single block:
iPad Air 2-> GCD Calls: 1 process time: 0.143264
iPhone 7 -> GCD Calls: 1 process time: 0.038005
That is, without ANY parallelization, the iPad Air 2 was ~2x faster and the iPhone 7 was ~13x faster than its chunked version.
Finally, the "ideal" result turned out to be:
iPad Air 2-> GCD Calls: 100 process time: 0.054692
iPhone 7 -> GCD Calls: 100 process time: 0.017707
The key lesson to take away here is that parallel is NOT inherently "better". Used carelessly, it can often end up creating problems that never needed to exist at all. In the case above, there was a starting assumption that the task (encryption) was "slow", so a parallel solution was created. That parallel solution then reinforced the impression, because his implementation WAS in fact slow. This dynamic is most obvious in the fastest device I tested:
iPhone XS -> GCD Calls: 25600 process time: 0.143785
iPhone XS -> GCD Calls: 1 process time: 0.024048
iPhone XS -> GCD Calls: 100 process time: 0.008627
Sure, the optimal solution is fast (REALLY fast), but 0.02s isn't exactly "slow". Ultimately, a lot of effort was wasted solving a performance problem that did not in fact exist.
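To make the chunking point concrete, here's a minimal sketch of the "~100 blocks" shape using `DispatchQueue.concurrentPerform`. The sum-of-squares function is a hypothetical stand-in for the real work (which in the story above was encryption); the `* 4` chunk multiplier is just one reasonable choice, not a rule:

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for the CPU-bound work (the real case was encryption).
func process(_ slice: ArraySlice<Int>) -> Int {
    slice.reduce(0) { $0 + $1 * $1 }
}

let data = (0..<1_000_000).map { $0 % 97 }

// A modest chunk count scaled to the core count keeps per-block dispatch
// overhead negligible while still using every core. Tens of thousands of
// tiny blocks would mostly benchmark the dispatch machinery itself.
let chunkCount = max(1, ProcessInfo.processInfo.activeProcessorCount * 4)
let chunkSize = (data.count + chunkCount - 1) / chunkCount

var partials = [Int](repeating: 0, count: chunkCount)
partials.withUnsafeMutableBufferPointer { out in
    DispatchQueue.concurrentPerform(iterations: chunkCount) { i in
        let lo = i * chunkSize
        let hi = min(lo + chunkSize, data.count)
        guard lo < hi else { return }
        out[i] = process(data[lo..<hi])   // each iteration owns slot i: no locking
    }
}
let parallelSum = partials.reduce(0, +)
print(parallelSum)
```

`concurrentPerform` is synchronous and sizes its parallelism to the hardware, which makes it a better fit for this kind of bulk CPU work than `async`-ing thousands of blocks onto a concurrent queue.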
The specific problem with concurrent queues: work that can block is dispatched to these queues, and when a thread picks up a block and that block blocks, GCD spawns more threads to pick up whatever work is still pending in the queue.
Is that understanding correct?
Yes, but I think it's easier to understand in reverse. GCD's "goal" is to keep all of the cores busy. If a thread blocks and there are blocks waiting to run, then it creates a new thread and starts another block.
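Here's a minimal sketch of that mechanism, where `Thread.sleep` stands in for a blocking syscall or I/O wait. Because every block blocks immediately, each time GCD sees "blocked thread + pending work" it brings up another worker, so the distinct-thread count ends up well above the core count:

```swift
import Dispatch
import Foundation

// Every block below blocks immediately (sleep stands in for blocking I/O),
// so GCD keeps spawning threads to keep the cores busy.
let queue = DispatchQueue(label: "demo.blocking", attributes: .concurrent)
let group = DispatchGroup()
let lock = NSLock()
var workerThreads = Set<ObjectIdentifier>()

for _ in 0..<20 {
    queue.async(group: group) {
        lock.lock()
        workerThreads.insert(ObjectIdentifier(Thread.current))
        lock.unlock()
        Thread.sleep(forTimeInterval: 0.25)   // blocked: the core is idle, so GCD spawns more
    }
}
group.wait()
print("distinct worker threads: \(workerThreads.count)")
```

On a typical run the count is far larger than the number of cores, which is exactly the overcommit behavior described above.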
However, the final issue here isn't catastrophic failure; it's "noise" and wasted performance. Under load, typical GCD work tends to have the following characteristics:
- The execution time for each block is relatively small. Making up a number, say ~0.01s.
- The actual work is a mix of CPU- and I/O-bound work, so it will block during its execution, at least briefly.
- Scheduling is "bursty", not "smooth". That is, it's more common for a number of blocks to be submitted in a short time window followed by a pause, instead of a steady, even "stream".
In concrete terms, imagine an app processing an event on the main thread which submits 10 blocks to GCD. What happens to those blocks?
- If they're submitted to a serial queue, then all blocks are done in 10 x 0.01s -> ~0.1s.
- If they're submitted concurrently, then things get... messy. In the worst case, GCD creates 10 threads, each of which runs for ~0.01s, and is then stuck with 10 threads with nothing to do.
The key point here is that for most apps, there isn't ANY functional difference between those two cases. In the best case, the performance difference is invisible to the user. In the worst case, there isn't ANY performance benefit.
The CLASSIC pattern here is that work is dispatched to the background, then the result is sent back to the main thread. If you assume exactly the same block length for the main-thread blocks, then both sequences take exactly the same time. That is:
- Serial: Block 1 finishes 0.01s after start and returns to the main thread. The last block finishes at ~0.10s and returns to the main thread. Final completion occurs at ~0.11s.
- Concurrent: The first block finishes ~0.01s after start and returns to the main thread. The other blocks finish at some point after that, and are all queued on the main thread. Each block takes 0.01s... so final completion occurs at ~0.11s.
...except #2 left 10 threads twiddling their thumbs with nothing to do.
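A small sketch of the two shapes, using `Thread.sleep` as the stand-in for a ~0.01s block (the queue labels and timings are illustrative, not from the original discussion):

```swift
import Dispatch
import Foundation

let block: () -> Void = { Thread.sleep(forTimeInterval: 0.01) }   // stand-in work
let group = DispatchGroup()

// Case 1: serial queue. One worker thread; blocks run back to back (~10 x 0.01s).
let serial = DispatchQueue(label: "demo.serial")
let serialStart = Date()
for _ in 0..<10 { serial.async(group: group, execute: block) }
group.wait()
let serialElapsed = Date().timeIntervalSince(serialStart)

// Case 2: concurrent queue. GCD may bring up ~10 threads for this burst,
// which then sit idle once the work is done.
let concurrent = DispatchQueue(label: "demo.concurrent", attributes: .concurrent)
let concurrentStart = Date()
for _ in 0..<10 { concurrent.async(group: group, execute: block) }
group.wait()
let concurrentElapsed = Date().timeIntervalSince(concurrentStart)

print(String(format: "serial: %.3fs  concurrent: %.3fs", serialElapsed, concurrentElapsed))
```

For real work that reports back to the main thread, the user-visible completion time of the two cases converges, as described above; the concurrent version just pays for the extra threads.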
What are the limits involved here? For example, the total number of threads running in parallel, or the total number running plus blocked?
The total number of threads it will create is ~64, but that's high enough that things won't really be working well by the time you get there. The ideal CPU-bound thread count is ~core count. However, keep in mind that most workloads are a mix of both CPU and I/O, so you can easily end up with lots of CPU-bound threads.
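You can probe that ceiling with a sketch like this: dispatch far more simultaneously-blocking blocks than GCD is willing to create threads for, and count the distinct worker threads. The exact cap is an implementation detail (commonly cited as ~64 per QoS on Apple platforms, and different on other libdispatch ports), so treat the printed number as an observation, not API:

```swift
import Dispatch
import Foundation

// 150 blocks that all block at once; the pool stops growing at its cap.
let group = DispatchGroup()
let lock = NSLock()
var workers = Set<ObjectIdentifier>()

for _ in 0..<150 {
    DispatchQueue.global().async(group: group) {
        lock.lock()
        workers.insert(ObjectIdentifier(Thread.current))
        lock.unlock()
        Thread.sleep(forTimeInterval: 0.3)   // every block blocks simultaneously
    }
}
group.wait()
print("worker threads created: \(workers.count)")
```

The pending blocks beyond the cap simply wait in the queue until a worker unblocks, which is why a blocked-up thread pool shows up as latency rather than as a crash.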
Glad that you mentioned it. Can we see how we can use it for some problem we intend to solve in the networking subsystem of our app (separate thread here)?
I'll add a quick note there.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware