Network framework crashes from nw_browser_cancel call

Question

Created Dec ’21

Replies 5

Boosts 0

Views 1.3k

Participants 2

We have a few crash reports in TestFlight from users (see attached) in 14.6, 14.8.1 and 15.0.1. It appears that the crashes were triggered by a call to nw_browser_cancel.

Right now we only test the nw_browser_t object passed to the cancel call for != NULL before calling as that is initialised value. The calling function is in objective c++

As documentation is sparse, is there something else we should be doing before making this call?
thanks, Pose AI team

2021-12-03_13-10-05.7208_+0100-c2e89737c99b07f604471bd6d8084878972c756b.crash

2021-12-04_09-04-13.9016_+0100-dae4b31c43929ba720e5675ed86f27fa4f99d02a.crash

Boost

Answer 1

DTS Engineer OP

Apple

Dec ’21

Consider the backtrace of the crashing thread:

Thread 0 name:
Thread 0 Crashed:
0   libdispatch.dylib … dispatch_channel_async + 188 …
1   libdispatch.dylib … dispatch_channel_async + 60 …
2   libnetwork.dylib  … nw_browser_set_state_locked + 648 …
3   libnetwork.dylib  … nw_browser_cancel + 240 …
4   Pose Camera       … NetworkClient::disconnect(char const*) + 40 …

I had a look at the code for nw_browser_set_state_locked and the backtrace points to code that’s doing a dispatch_async to call your state changed handler on the client queue, that is, the queue you passed to nw_browser_set_queue. I can see two possibilities there:

You’ve accidentally over-release that queue.
The state handler block was actually called but smashed something.

I think the first is most likely. What queue are you using here? And what does its memory management look like?

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

0

Answer 2

PoseAI OP

Dec ’21

Thanks for the response.

We have a singleton client object which owns the network objects, including the queue which we set with a call to dispatch_queue_create("some string", NULL).
We currently use the same queue for the nw_browser and nw_connection, which may have been a bad idea based on your comment depending on what happens under the cancel hood.

When disconnect gets called we do a cleanup:

    if (browser != NULL){
        nw_browser_cancel(browser);
        browser = NULL;
    }
    if (connection != NULL){
        std::string bye = std::string();
        send(bye);
        nw_connection_cancel(connection);
        connection = NULL;
    }
    if (queue != NULL){
        queue = NULL;
    }

We do not explicitly release the queue anywhere because we thought ARC handled releases.

It is not clear to me what type of ownership browser_set_queue and connection_set_queue keeps for the queue (shared or weak ptr?). Could ARC be releasing the queue when we set our reference to NULL and before an async process in browser_cancel?

0

Answer 3

DTS Engineer OP

Apple

Dec ’21

We currently use the same queue for the nw_browser and nw_connection, which may have been a bad idea

No, that should be fine.

We do not explicitly release the queue anywhere because we thought ARC handled releases.

Correct.

The reason I brought that up is that it’s not uncommon for C++ code to not use ARC.

It is not clear to me what type of ownership nw_browser_set_queue and nw_connection_set_queue keeps for the queue (shared or weak ptr?).

I took a quick look at that yesterday and confirmed that the browser object definitely maintains a strong reference to the queue that you set via nw_browser_set_queue. And while I didn’t specifically check this, I fully expect that the connection object will do the same.

Could ARC be releasing the queue when we set our reference to NULL and before an async process in browser_cancel?

No. The browser object maintains its own reference up to the point where you cancel it, at which point it definitely schedules the state changed handler block on the queue before it releases its reference. And after that Dispatch should retain the queue until its drained.

It’s a bit of a mystery really. Have you tried to reproduce this problem by enabling zombies?

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

0

Answer 4

PoseAI OP

Dec ’21

I have not tried enabling zombies but I haven't actually reproduced the problem myself, these crash reports came from two anonymous users on Testflight.

We added the browser feature because it seemed like a 'freebie' for using the Network framework, to theoretically allow users to easily pair with bonjour enabled local servers, at least down the line. As bonjour is not included in the current server applications (ie Unity, Unreal) we don't actually use it currently (although apparently some anonymous users tested it somehow and crashed so maybe we should care).

If the source of the problem isn't clear to you, I think we will just remove the bonjour/browsing feature from the app at this point to avoid inadvertent crashes, and review down the line if bonjour becomes a more important feature.

Thanks again for taking a look.

P.S. Should I mark this as answered (since we are not investigating this further and just dropping the feature) or leave it as unanswered (since the problem is not identified as of yet)? I don't know the proper etiquette!

0

Answer 5

DTS Engineer OP

Apple

Dec ’21

I have not tried enabling zombies but I haven't actually reproduced the problem myself

Right, that’s kinda the point of zombies, to turn a hard-to-reproduce over-release crash into something that you can reproduce and debug.

could the crash occur because we created a queue with the same label

No. The queue label is just there for debugging. There’s no requirement that the labels be unique.

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

0