macOS: NEDNSProxyProvider and NEFilterDataProvider = UDP datagram truncation

Environment:

macOS 13.5 and 13.4

Cisco VPN installed with active DNS Proxy, Content Filter, and Transparent Proxy extensions (yes, I know Apple can't support third parties here).

Our Content Filter extension is loaded.

In this scenario, DNS responses are truncated after our extension asks to see more data from the Cisco DNS Proxy flow.

Specifically, we ask to peek at a maximum of 96 bytes for all UDP traffic. When a DNS query is made, the DNS response is truncated to 96 bytes after we return [allowVerdict] from our [handleInboundDataFromFlow:] for those first 96 bytes.

This only occurs when both our extension and the Cisco extensions are loaded. It does not occur if just our extension is loaded or just the Cisco extensions are loaded.

  • So my question is: are we doing something wrong by returning [allowVerdict] after the initial 96-byte inspection in [handleInboundDataFromFlow:]? Should we instead respond only after a complete datagram is received? But how can we do that when we don't know the datagram size, since we only request peek data for TCP/UDP (and thus get no IP header info)?

  • Is Cisco at fault? From inspecting their binary, it appears they are not using the NWUDPSession/NWTCPConnection classes as recommended by Apple, but are instead using their own custom classes that probably wrap the BSD socket API.

  • Is this an OS bug? As far as I know, NEFilterFlow is a stream and the OS is supposed to handle proper data reassembly. Yet somehow a truncated UDP datagram is being passed on to the Cisco filter (and to dig). Shouldn't the system send the complete datagram after we return [allowVerdict] for the initial 96 bytes?
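One reason the datagram size is hard to recover from peeked bytes: DNS over TCP prefixes each message with a two-byte length field (RFC 1035, section 4.2.2), but DNS over UDP relies on the UDP header for its length, which the filter does not see. The fixed 12-byte DNS header only carries record counts. The minimal Swift sketch below illustrates this, assuming the peeked bytes begin at the start of the DNS message. All helper names are hypothetical and are not NetworkExtension API.

```swift
/// Record counts from the fixed 12-byte DNS header (RFC 1035, section 4.1.1).
struct DNSHeaderCounts: Equatable {
    let id: UInt16
    let questions: UInt16
    let answers: UInt16
    let authorities: UInt16
    let additionals: UInt16
}

/// Reads a big-endian UInt16 at `offset`; the caller must check bounds first.
func readUInt16(_ bytes: [UInt8], at offset: Int) -> UInt16 {
    return UInt16(bytes[offset]) << 8 | UInt16(bytes[offset + 1])
}

/// Hypothetical helper: parses the header of a DNS-over-UDP message.
/// It yields record counts only; nothing in it gives the total message length.
func parseUDPDNSHeader(_ bytes: [UInt8]) -> DNSHeaderCounts? {
    guard bytes.count >= 12 else { return nil }
    return DNSHeaderCounts(
        id: readUInt16(bytes, at: 0),
        questions: readUInt16(bytes, at: 4),
        answers: readUInt16(bytes, at: 6),
        authorities: readUInt16(bytes, at: 8),
        additionals: readUInt16(bytes, at: 10))
}

/// Hypothetical helper: DNS over TCP prefixes each message with a two-byte
/// length (RFC 1035, section 4.2.2), so the first two bytes give its size.
func tcpDNSMessageLength(_ bytes: [UInt8]) -> Int? {
    guard bytes.count >= 2 else { return nil }
    return Int(readUInt16(bytes, at: 0))
}

// Header fields rebuilt from the dig output (id 4251, flags qr rd ra,
// 1 question, 5 answers, 0 authority, 1 additional); not a packet capture.
let peekedHeader: [UInt8] = [0x10, 0x9B, 0x81, 0x80, 0x00, 0x01,
                             0x00, 0x05, 0x00, 0x00, 0x00, 0x01]
if let counts = parseUDPDNSHeader(peekedHeader) {
    print(counts.answers)  // 5
}
```

Parsing every record would reveal where a UDP message should end, but only if all of its bytes are visible, which a 96-byte peek rules out for larger responses.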

Example:

% dig 0f54fx204jpt.stspg-customer.com
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.10.6 <<>> 0f54fx204jpt.stspg-customer.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4251
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: Message has 35 extra bytes at end

;; QUESTION SECTION:
;0f54fx204jpt.stspg-customer.com. IN	A

;; Query time: 72 msec
;; SERVER: 2606:4700:4700::1111#53(2606:4700:4700::1111)
;; WHEN: Wed Aug 30 16:55:21 CDT 2023
;; MSG SIZE  rcvd: 96

The above response is supposed to be 211 bytes. 96 bytes is the peek data size that our filter requests.
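A quick arithmetic cross-check that 96 bytes cannot hold the response dig describes: the sketch below computes the size of the header plus question section, then adds the smallest possible size of the five declared answers. It assumes each answer's owner name is a 2-byte compression pointer with empty RDATA, which is a lower bound rather than the real record layout.

```swift
/// Wire length of an uncompressed domain name (RFC 1035, section 3.1):
/// a length byte plus the bytes of each label, then a terminating zero byte.
func encodedNameLength(_ name: String) -> Int {
    let labels = name.split(separator: ".")
    return labels.reduce(1) { total, label in total + 1 + label.utf8.count }
}

let headerBytes = 12
// Question section = QNAME + QTYPE (2 bytes) + QCLASS (2 bytes).
let questionBytes = encodedNameLength("0f54fx204jpt.stspg-customer.com") + 2 + 2
// Assumed lower bound per answer: 2-byte name pointer + TYPE (2) + CLASS (2)
// + TTL (4) + RDLENGTH (2), with zero bytes of RDATA.
let minAnswerBytes = 2 + 2 + 2 + 4 + 2
let declaredAnswers = 5  // "ANSWER: 5" in the dig output

let minimumThroughAnswers = headerBytes + questionBytes + declaredAnswers * minAnswerBytes
print(headerBytes + questionBytes)  // 49
print(minimumThroughAnswers)        // 109, more than the 96 bytes dig received
```

The 96-byte copy therefore ends somewhere inside the answer section, which fits dig's malformed-message warning.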

Logs from our filter:

// dig query made
2023-08-30 16:55:21.908228 [New] 24132: D89B5B5D-793C-4940-D720-1111F6E50100 17
2023-08-30 16:55:21.908380 [DATA] Out: 24132: D89B5B5D-793C-4940-D720-1111F6E50100 60@0
// Cisco proxy query made
2023-08-30 16:55:21.910918 [New] 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 17
2023-08-30 16:55:21.911470 [DATA] Out: 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 60@0
// Cisco proxy response
2023-08-30 16:55:21.978536 [DATA] In: 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 96@0
// dig response
2023-08-30 16:55:21.979407   [DATA] In: 24132: D89B5B5D-793C-4940-D720-1111F6E50100 96@0

From inspecting their binary it appears they are not using the NWUDPSession / NWTCPConnection classes as recommended by Apple, but are instead using their own custom classes that probably wrap the BSD socket API.

Two points about this:

  • The in-provider networking APIs aren’t really preferred. We generally recommend that NE providers use Network framework, with the caveat that, in some obscure cases, that’s not possible.

  • Regardless, this is definitely a preference, not a requirement. Using BSD Sockets in an NE provider is just fine.

I have more about this in TN3151 Choosing the right networking API.

Is this an OS bug?

That wouldn’t surprise me. Content filters are definitely more focused on stream-based protocols than on datagram-based ones.

My first suggestion here would be to get the other third-party’s stuff out of the equation. That way you’re only dealing with code that you control and the OS. Unfortunately, that’s not super easy to do because creating a DNS proxy provider from scratch is a bit of a challenge.

If you want to attempt this, open a DTS tech support incident and I can help you get bootstrapped.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks, Quinn, for the quick reply and the API clarification.

My first suggestion here would be to get the other third-party’s stuff out of the equation. That way you’re only dealing with code that you control and the OS.

We did test that. With just the Cisco VPN installed, everything works, and with just our content filter, everything works. But with both installed, the bug occurs. We also have our own DNS Proxy, but since the system allows only one DNS Proxy, we can't enable ours when customers require Cisco VPN to access corporate resources.

With our DNS Proxy enabled, we get no issues either. However, our DNS Proxy and our Content Filter are hosted in the same process, so our proxy traffic is not filtered through our Content Filter. Cisco is hosted the same way: all three network extensions in one process.

Given this, would a DTS incident still be useful, or should I just open a Radar?

however our DNS Proxy and our Content Filter are hosted in the same process

Could you temporarily split them apart?

