Controlling the number of Pending Send Completions using NWConnection

Context: We are using NWConnection for UDP and TCP connections, and want to know the best way to keep the number of pending send completions under control to limit resource usage.

Questions:

  1. Is there a way to control the send rate so that too many pending send completions don’t get queued? Say I issue an extremely dense flurry of 10 million NWConnection.send calls: will they all go asynchronous without any complications, or will I be informed once some threshold is reached?
  2. Or is it the responsibility of the application using NWConnection.send to limit the number of outstanding completions, because going beyond a certain limit would have an impact on outstanding and subsequent requests?
  3. If so, how would one know what the limit is supposed to be at runtime? Is this a process-level or a system-level limit?
  4. Will errors like EAGAIN and ETIMEDOUT ever be reported? In the test I simulated, the TCP server was made to stop receiving, causing the socket send buffer to fill up on the sender side. My sends then stopped completing and became pending; millions of sends were pending for a long duration, hence I wanted to know whether we will ever get EAGAIN or ETIMEDOUT.
Answered by DTS Engineer in 822865022

Within Network framework, every connection has a send buffer [1]. That buffer has a high-water mark, that is, its expected maximum size. When you send data on the connection, the system always adds the data to the buffer. After that, one of two things happens:

  • If the amount of buffered data is below the high-water mark, the system immediately calls the completion handler associated with the send.

  • If not, it defers calling your completion handler. That is, it holds on to the completion handler and only calls it once the amount of buffered data has dropped to a reasonable level.

If you have a lot of data to send, the easiest approach is to send a chunk of data and, in the completion handler, send the next chunk. Assuming the network is consuming data slower than you’re producing it, the amount of buffered data will rapidly increase until it exceeds the high-water mark. At that point the system will stop calling your completion handler, which means you’ll stop sending new data. This gives the network transport a big buffer of data, allowing it to optimise its behaviour on the wire.
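
For example, here’s a minimal sketch of that pattern; connection is assumed to be a started NWConnection, and nextChunk() is a hypothetical producer that returns nil when there’s no more data:

import Foundation
import Network

// Self-paced sending: each completion triggers the next send, so the
// high-water mark throttles the producer to the speed of the network.
func sendNextChunk(on connection: NWConnection, nextChunk: @escaping () -> Data?) {
    guard let chunk = nextChunk() else { return }
    connection.send(content: chunk, completion: .contentProcessed({ error in
        if let error = error {
            print("send failed: \(error)")
            return
        }
        // This handler only runs once the buffered data has dropped back
        // below the high-water mark, so recursing here is self-limiting.
        sendNextChunk(on: connection, nextChunk: nextChunk)
    }))
}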

I think the above will let you resolve all your specific questions, but please do reply here if you need further help.

Finally, if you combine Network framework with another API that uses completion handlers, you might find the techniques shown in Handling Flow Copying to be useful.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] You can think of this like a socket send buffer and a user-space buffer, but keep in mind that in many cases Network framework doesn’t use BSD Sockets for its networking but instead relies on a user-space networking stack.

Thank you for the response.

QUOTE

Within Network framework, every connection has a send buffer [1]. That buffer has a high-water mark, that is, its expected maximum size. When you send data on the connection, the system always adds the data to the buffer.

UNQUOTE

Rephrasing the above: the statement means that NWConnection.send will always cause the data to get copied into the send buffer, but if the amount of buffered data is above the high-water mark, the system will defer calling the completion handler. My previous understanding was that the copying to the send buffer itself would be deferred, hence I wanted to clarify that doubt.

QUOTE

From the document https://developer.apple.com/documentation/networkextension/handling-flow-copying: Buffering larger amounts of data can lead to memory problems. If the provider must buffer data, set an upper bound on the buffer and don’t read until the buffer has space to hold more data.

UNQUOTE

We are using UDP, and want to parallelize the sending of multiple datagrams of a large message. Does the statement above mean that if the application wants to send multiple datagrams before waiting for a completion, it should define an upper limit in terms of size? Say the application defines an upper limit of 16,000 bytes and the datagrams are 1,000 bytes each; then the application can have at most 16 pending send completions, and once the pending count reaches 16 it should defer further sends. Hence my query: what is the recommended basis for deciding on the buffer size where parallelism is desired? Is there any function that can be used at runtime to know how many pending send completions, or how much pending buffer, is acceptable?

Written by tuhinkumar in 822883022
My previous understanding was that the copying to the send buffer itself would be deferred.

I think that’s a distinction that makes no difference in practice. Under the covers Network framework works with dispatch_data_t objects. Those are immutable. When you pass data to a send routine, it retains the object until it’s done with the data. Exactly when that happens is an implementation detail. Critically, you might see different behaviour with each networking stack:

  • With the in-kernel networking stack, the data must eventually get copied into a socket buffer, at which point the data object could be released.

  • With the user-space networking stack, you could imagine it using the data object as the socket buffer.

IMPORTANT I’m not saying that this is how it currently works; rather I’m pointing out how the implementation could differ and still maintain the current API.

Written by tuhinkumar in 822883022
We are using UDP, and want to parallelize the sending of multiple datagrams of a large message

How big is this message gonna get? Your example was 16 KB, which is a trivial size for a modern OS. If that’s your typical case then I’d:

  • Use the batch(_:) method.

  • Inside the batch, send each datagram.

  • Add a completion handler only to the last.

Written by tuhinkumar in 822883022
Is there any function that can be used at runtime to know how many pending send completions, or how much pending buffer, is acceptable?

There is for TCP but AFAIK there’s nothing similar for UDP.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hi @DTS Engineer,

Thanks for the advice! I’ve been trying to apply the batch(_:) method for both send and receive operations, but I’m having trouble figuring out how to do it, since both methods have their own completion handlers. Could you please provide a demonstration of how to apply batching in this context?

Also, when you mention attaching a completion handler only to the last send operation in the batch, I’m not clear on how failures from earlier send operations are communicated to me. Since each send and receive has its own callback, how would I be notified if an earlier send fails if I’m only handling the final completion?

Written by harshal_goyal in 823333022
Could you please provide a demonstration of how to apply batching in this context?

I don’t have any code snippets for this handy, but AFAIK there’s nothing tricky to it. You simply call batch(…) and, in the closure, make a bunch of send calls. Or a bunch of receive calls.

Written by harshal_goyal in 823333022
how would I be notified if an earlier send fails if I’m only handling the final completion?

You wouldn’t. But such is the nature of networking. In the UDP case, a datagram can be dropped at any time. Even in the TCP case, the completion handler just tells you that the networking stack has accepted the data for transmission, not that the data made it to the remote peer.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hi @DTS Engineer, thanks!

I was using the below code to test this out:

import Foundation
import Network

class NetworkConnection {

    let vConnection: NWConnection

    public init() {
        vConnection = NWConnection(
            to: NWEndpoint.hostPort(host: "10.20.16.144", port: NWEndpoint.Port(rawValue: 45000)!),
            using: .udp
        )

        vConnection.stateUpdateHandler = { state in
            print("did change state: \(state)")
        }
    }

    public func sendData() {
        vConnection.batch {
            // The first two sends are fire-and-forget; only the last send
            // in the batch carries a completion handler.
            vConnection.send(content: "Send1".data(using: .utf8), completion: .idempotent)

            vConnection.send(content: "Send2".data(using: .utf8), completion: .idempotent)

            vConnection.send(content: "Send3".data(using: .utf8), completion: .contentProcessed({ error in
                print(Thread.current)
                print("Send 3 completed with error: \(String(describing: error))")
            }))
        }
    }

    public func startConnection() {
        vConnection.start(queue: .global())
    }
}

func main() {
    let connObj = NetworkConnection()
    connObj.startConnection()
    connObj.sendData()
}

main()
RunLoop.main.run()

Now, I have a couple of questions:

  1. How can I determine whether "Send 1" and "Send 2" were successfully written to the OS buffer? Since only the callback for "Send 3" is triggered, is there any way to track the success of the previous sends?

  2. When I mark "Send 1" and "Send 2" as idempotent, what does that actually mean? I understand that marking something as idempotent usually means I can retry the operation without causing any issues if it fails. But how do I know if the operation succeeded or failed? Is this the same as usual, or does idempotency work differently in the batch API?

  3. When would I actually want to use idempotent sends within a batch as opposed to using callbacks for each operation? What is the specific use case for marking operations as idempotent within a batch, and how does it impact performance or behavior?

  4. If for some of my application logic I am still interested in receiving callbacks for each individual send operation, would using the batch API provide any benefits compared to just triggering the sends individually without batching them? Would the batch API improve efficiency or callback management in this scenario, or is it better to handle each send separately?

Written by harshal_goyal in 823519022
1. How can I determine whether "Send 1" and "Send 2" were successfully written to the OS buffer?

Technically you can’t.

In practice, however, it’s reasonable to assume that, once you receive the completion handler for the last send, the others are on their way.

Written by harshal_goyal in 823519022
2. When I mark "Send 1" and "Send 2" as idempotent, what does that actually mean?

It means that the networking stack doesn’t give you any indication that it accepted the data for transfer across the network.

The term idempotent is somewhat of a stretch here. In general it means that an operation can be applied multiple times and it’ll achieve the same result. In networking it usually means that a request can be safely retried. In this context it means Network framework won’t tell you about failures, so you apply it when you’re able to retry in some way.

Written by harshal_goyal in 823519022
3. When would I actually want to use idempotent sends within a batch as opposed to using callbacks for each operation?

It minimises the number of callbacks you’re dealing with.

Written by harshal_goyal in 823519022
4. … would using the batch API provide any benefits compared to just triggering the sends individually without batching them?

The batch mechanism is about performance. It shouldn’t affect the semantics of these operations. So, if you’re expecting semantic benefits then, no, you won’t get that.

You might get performance benefits but, like with anything related to performance, it’s best to measure rather than guess.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks @DTS Engineer!

This really helps to clear things up.

I have one more question:

In the case of TCP, if I trigger three sends simultaneously, either with or without batching, does this guarantee that the order in which I triggered the sends is the same order in which they are delivered? Or should I wait for the completion of the “n-1”th send before triggering the “nth” send?

Written by harshal_goyal in 823671022
In the case of TCP, if I trigger three sends simultaneously, either with or without batching, does this guarantee that the order in which I triggered the sends is the same order in which they are delivered?

If you do these sends “simultaneously” then there is no “order”, by definition.

Sending data on a TCP connection from multiple threads simultaneously is bananas. Don’t do that.

So, assuming that you’re doing all the sends from the same thread [1] then the data will be serialised on the wire in order. That’s true regardless of whether or not you use batching.
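
By way of illustration, here’s a minimal sketch of that; the queue label and payloads are made up, and connection is assumed to be a started TCP NWConnection:

import Foundation
import Network

// Funnelling every send through one serial dispatch queue fixes the
// submission order without waiting for each completion handler.
func sendInOrder(on connection: NWConnection) {
    let sendQueue = DispatchQueue(label: "com.example.tcp-send")
    for i in 1...3 {
        sendQueue.async {
            let payload = Data("message \(i)\n".utf8)
            connection.send(content: payload, completion: .idempotent)
        }
    }
}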

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] Or from the same async function, if you’re using Swift concurrency.

Hi @DTS Engineer,

Technically you can’t. In practice, however, it’s reasonable to assume that, once you receive the completion handler for the last send, the others are on their way.

In the above scenario, now assume that I am sending 100 datagrams using "batch", and the first 99 are marked idempotent whereas the last one has a completion callback attached to it. Whenever I get the completion callback for the last send, is it safe to assume that all the previous sends have completed, either successfully or with failure, and that I can safely de-allocate the buffers for all 100 sends?

Sending data on a TCP connection from multiple threads simultaneously is bananas. Don’t do that. So, assuming that you’re doing all the sends from the same thread [1] then the data will be serialised on the wire in order. That’s true regardless of whether or not you use batching.

Now, in the above case, if I am doing TCP sends from n different threads, not simultaneously but ensuring that one thread goes after another without waiting for the completion, will the order of sends still be maintained?

Also, assume I have a TCP client that is performing 10 sends and has a few pending receives waiting for data. If the other end closes the connection after receiving, say, the 3rd packet, what will happen to the remaining receives and sends? Will I get an error for all the pending sends and receives on my client?

Written by harshal_goyal in 824310022
is it safe to assume that … I can safely de-allocate the buffers

I don’t know what you mean by that. Network framework works in terms of Swift Data values [1]. When you call send(…), the framework makes a copy of the value [2]. As soon as the send(…) returns you are free to do whatever you want with the Data value you supplied [3].

Written by harshal_goyal in 824310022
I am doing TCP sends … will the order of sends still be maintained?

Yes. Anything else would completely break TCP semantics.

Written by harshal_goyal in 824310022
if the other end closes the connection after receiving, say, the 3rd packet

That depends on how they close the TCP connection. There are two different ways you can close a TCP connection:

  • Cleanly

  • Forcefully

In the clean case the peer sends a FIN, which effectively closes the write side of the TCP connection. It then waits for a FIN from the remote peer which closes the read side of the connection.

In the forceful case the peer just drops the connection on the floor, usually generating a RST in the process. It also responds to incoming traffic with RSTs.

In the clean case you’ll see the following behaviour for the scenario you describe:

  • On receiving the FIN, Network framework will complete all the outstanding receives with EOF.

  • It’ll continue to accept new data to send.

  • And it’ll try to send any outstanding data.

What happens at that point depends on the remote peer. If it receives and then ACKs the data then this process can continue indefinitely. If it fails to ACK the data then that’ll start building up and eventually the backlog gets too much and any future send(…) call will stop calling its completion handler. In that case the next step is determined by the send timeout (connectionDropTime), if any.
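
Here’s a minimal sketch of configuring that send timeout; the host, port, and 30-second value are illustrative:

import Network

// connectionDropTime is the TCP send timeout: the number of seconds that
// un-ACKed data may sit in the send buffer before the connection is failed.
let tcpOptions = NWProtocolTCP.Options()
tcpOptions.connectionDropTime = 30

let parameters = NWParameters(tls: nil, tcp: tcpOptions)
let connection = NWConnection(host: "192.0.2.1", port: 45000, using: parameters)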

If you’re using BSD Sockets on the remote peer, note the following:

  • The close system call always initiates a clean close. Likewise if the process terminates. AFAIK there’s no way to trigger a force close with BSD Sockets.

  • The little-known shutdown system call gives you more control over this stuff.

  • If you call close with an un-ACKed send, the behaviour is controlled by the SO_LINGER and SO_LINGER_SEC socket options.

Finally, if the remote peer forcefully closes the connection then Network framework will fail all pending receives (probably with ECONNRESET, but perhaps with EOF), drop any outstanding sends on the floor, and change the state of the overall connection state to something indicating the failure.
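
Here’s a minimal sketch of how those two outcomes surface on a pending receive:

let connection: NWConnection = …

connection.receive(minimumIncompleteLength: 1, maximumLength: 65536) { data, _, isComplete, error in
    if let data = data, !data.isEmpty {
        print("received \(data.count) bytes")
    }
    if isComplete {
        // A clean close (FIN) completes the receive with EOF,
        // reported here as isComplete == true.
        print("peer closed the connection cleanly")
    } else if let error = error {
        // A forceful close (RST) typically fails the receive,
        // for example with .posix(.ECONNRESET).
        print("receive failed: \(error)")
    }
}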

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] Or, on the C side, Dispatch data values. Those have different semantics but they don’t materially change this problem.

[2] A logical copy. This won’t make a copy of the actual bytes due to the copy-on-write optimisation.

[3] Although be aware that, if you modify it, you might trigger a copy due to the above-mentioned copy-on-write optimisation.

Hi @DTS Engineer,

I don’t know what you mean by that. Network framework works in terms of Swift Data values [1]. When you call send(…), the framework makes a copy of the value [2]. As soon as the send(…) returns you are free to do whatever you want with the Data value you supplied [3].

To clarify, in my case, I’m creating the actual memory to be sent in C++ and then using Data's bytesNoCopy initializer to wrap this C++ memory for the send operation.

Since I’m sending multiple packets in a batch operation (in this case, 100 sends), and the first 99 are marked as idempotent, while the last one has a completion callback, I want to ensure that I deallocate the memory at the right time.

sample code:

public class MyClientClass {

    public var uConnection: NWConnection

    // init and other methods

    private func getDataAndPerformIdempotentSend() {
        var dataSize = 0
        // CppGetMemory is our C++ allocator bridged into Swift; it returns
        // a buffer and fills in its size.
        let cppMemoryPointer: UnsafeMutableRawPointer = CppGetMemory(&dataSize)
        let data = Data(bytesNoCopy: cppMemoryPointer, count: dataSize, deallocator: .none)
        uConnection.send(content: data, completion: .idempotent)
    }

    private func getDataAndPerformSend() {
        var dataSize = 0
        let cppMemoryPointer: UnsafeMutableRawPointer = CppGetMemory(&dataSize)
        let data = Data(bytesNoCopy: cppMemoryPointer, count: dataSize, deallocator: .none)
        uConnection.send(content: data, completion: .contentProcessed(callbackFunction))
    }

    public func triggerBatchSends() {
        uConnection.batch {
            for _ in 1...99 {
                getDataAndPerformIdempotentSend()
            }
            getDataAndPerformSend()
        }
    }
}

In the above case, since my actual memory is allocated in C++, I wanted to know when I can free this memory. Is it safe to de-allocate all 100 packets' memory once I receive the callback for the last send?

The non-copy initialiser for Data is unlikely to benefit you here. If you want to go down this path, use a DispatchData.

You send DispatchData using the normal send method:

let connection: NWConnection = …
let data: DispatchData = …
connection.send(content: data, completion: .idempotent)

Create a no-copy DispatchData with the init(bytesNoCopy:deallocator:) initialiser. If you pass the .custom(…) deallocator, you get called back when it’s safe to free the memory.
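
Here’s a minimal sketch of that; CppGetMemory and CppFreeMemory stand in for your C++ allocation routines:

let connection: NWConnection = …

let dataSize = 1000
let cppMemoryPointer = CppGetMemory(dataSize)
let buffer = UnsafeRawBufferPointer(start: cppMemoryPointer, count: dataSize)

let dispatchData = DispatchData(
    bytesNoCopy: buffer,
    deallocator: .custom(nil, {
        // This callback, not the send completion handler, is the point at
        // which the system is done with the buffer and it's safe to free it.
        CppFreeMemory(cppMemoryPointer)
    })
)

connection.send(content: dispatchData, completion: .idempotent)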

IMPORTANT Your current approach, which relies on send completion callbacks, is incorrect and could lead to subtle memory corruption bugs.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hi @DTS Engineer,

Written by DTS Engineer in 824638022
The non-copy initialiser for Data is unlikely to benefit you here. If you want to go down this path, use a DispatchData. You send DispatchData using the normal send method:

I’m not entirely clear on what you meant here. Even with the no-copy initializer for Data, I can still specify a .custom deallocator. So, I didn’t fully understand your point. For Data, I get an UnsafeMutableRawPointer along with the size I passed, which should allow me to reuse the same Data object, pointing to different memory each time, right?

IMPORTANT Your current approach, which relies on send completion callbacks, is incorrect and could lead to subtle memory corruption bugs.

I have a question regarding the send callback. If the callback is triggered, is it safe to modify or deallocate the data within it? Is there a chance the OS is still referencing the data, even after the callback has been triggered? This is what I inferred from your earlier statement.

Also, I’m still unclear about your answer to my original question. If I use a batch API to perform 100 sends, where 99 are marked as idempotent (with no callbacks) and only the last one has a callback, does receiving the callback guarantee that the first 99 sends are already complete? In other words, can I safely deallocate their data since it's already been written to the OS buffer? If I use a deallocator for each data object, will I end up managing the same number of callbacks as if I had used a callback for every send, instead of just the final one?

Historically, there was a solid bridge between Data and NSData. Moreover, NSData and dispatch_data_t were also bridged [1]. However, the Data to NSData bridge is much creakier these days. There are many cases where seemingly benign operations will cause a copy to be made. For this reason, if performance is important and you have the option of using Dispatch data, you should do so.

Consider this program:

import Foundation

func main() {
    let p = calloc(3000, 1)!
    print("p: \(p)")
    let b = UnsafeRawBufferPointer(start: p, count: 3000)
    let dd = DispatchData(bytesNoCopy: b, deallocator: .free)
    let d = Data(dd)
    d.withUnsafeBytes { buf in
        print("d: \(buf.baseAddress!)")
    }
    let n = d as NSData
    print("n: \(n.bytes)")
}

main()

If I run it on my machine (macOS 15.2), I see this:

p: 0x000000013000a200
d: 0x000000013000e600
n: 0x000000013000e600

As you can see, the buffer got copied when the program constructed a Data from the DispatchData.

This is just one example of this phenomenon. You’ll find these eager copies crop up in all sorts of odd places. You’ll also find that they get elided in various places too. It’s hard to predict exactly when you’ll get a copy and when you won’t, which is why my advice is to use Dispatch data if you can.


Written by harshal_goyal in 824752022
For Data, I get an UnsafeMutableRawPointer along with the size I passed, which should allow me to reuse the same Data object, pointing to different memory each time, right?

I’m not sure exactly what you’re getting at here, but Data is not an object, it’s a struct, so the question doesn’t make any sense.

Written by harshal_goyal in 824752022
If the callback is triggered, is it safe to modify or deallocate the data within it?

That depends on what you mean by “safe”.

First up, an absolute rule: If you construct a data value with a no-copy initialiser, it’s never safe to modify the buffer ‘behind the back’ of the value. For example, this is not safe:

let p = calloc(1024, 1)!
let d = Data(bytesNoCopy: p, count: 1024, deallocator: .custom({ p, _ in free(p)}))
// vvv NOT SAFE vvv
p.storeBytes(of: 1, as: UInt8.self)
// ^^^ NOT SAFE ^^^
print(d)

In the no-copy case, the data value ‘owns’ the buffer until it calls your deallocator.

The above is true for Data, NSData, DispatchData, and dispatch_data_t. There are no exceptions.

Coming back to the NWConnection.send(…) case, the behaviour varies by type. Let’s start with dispatch_data_t. It’s a reference type, but immutable. Being immutable, you can’t modify the data, so that part of the question isn’t valid. That means the only concern is the reference. The reference must be valid when you call send(…) [2] and must remain valid until send(…) returns. At that point you can release your reference. If the connection needs to maintain a reference, it will have done that before returning from the send(…) call.

The situation with DispatchData is similar. It’s not an object per se, but acts much like one. As with dispatch_data_t, its contents are immutable. And Swift’s ARC ensures that things are valid for the duration of the send(…) call.

The Data type is quite different. It’s mutable, with a copy-on-write (CoW) implementation. Consider this code:

let connection: NWConnection = …
var d = Data("Hello Cruel World!".utf8)
connection.send(content: d, completion: .contentProcessed({ error in
    d[1] += UInt8(ascii: "E")   // B
}))
d[0] += UInt8(ascii: "h")       // A

When the send(…) returns, the connection has made a copy of the data. However, that’s a CoW copy. If you modify the data at point A, then you could trigger a copy. And that’s true at B as well. Remember that .contentProcessed(…) means that the data has been enqueued for sending. The connection could still be holding on to its copy in its send buffer.

CoW semantics mean that this isn’t unsafe, but it is a potential performance pitfall.

Written by harshal_goyal in 824752022
can I safely deallocate their data since it's already been written to the OS buffer?

If you’re talking about the backing buffer for a no-copy data value then, no, you can’t safely deallocate that. It’s only safe to deallocate it when the system calls the deallocator closure that you supplied when you passed that buffer to the data value’s initialiser. There’s absolutely no relationship between when that deallocator is called and the send(…) method calls its completion handler.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] In one direction.

[2] Well, nw_connection_send because we’re talking C at this point.
