Network framework crashes on fork

Hello, I have a Cocoa application from which I fork a new process (a helper of sorts), and it crashes on fork, probably because of cleanup code that the Network framework registers with pthread_atfork().

This is the crash from the child process:

Application Specific Information:
*** multi-threaded process forked ***
BUG IN CLIENT OF LIBPLATFORM: os_unfair_lock is corrupt
Abort Cause 258
crashed on child side of fork pre-exec


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib      	       0x194551238 _os_unfair_lock_corruption_abort + 88
1   libsystem_platform.dylib      	       0x19454c788 _os_unfair_lock_lock_slow + 332
2   Network                       	       0x19b1b4af0 nw_path_shared_necp_fd + 124
3   Network                       	       0x19b1b4698 -[NWConcrete_nw_path_evaluator dealloc] + 72
4   Network                       	       0x19af9d970 __nw_dictionary_dispose_block_invoke + 32
5   libxpc.dylib                  	       0x194260210 _xpc_dictionary_apply_apply + 68
6   libxpc.dylib                  	       0x19425c9a0 _xpc_dictionary_apply_node_f + 156
7   libxpc.dylib                  	       0x1942600e8 xpc_dictionary_apply + 136
8   Network                       	       0x19acd5210 -[OS_nw_dictionary dealloc] + 112
9   Network                       	       0x19b1beb08 nw_path_release_globals + 120
10  Network                       	       0x19b3d4fa0 nw_settings_child_has_forked() + 312
11  libsystem_pthread.dylib       	       0x100c8f7c8 _pthread_atfork_child_handlers + 76
12  libsystem_c.dylib             	       0x1943d9944 fork + 112
(...)

I'm trying to create a child process with boost::process::child, which basically does just a fork() followed by execv(), and I do it before -[NSApplication run] is called.

Is this a known bug, or expected behavior that I've run into? Also, what is the correct way to spawn child processes in Cocoa applications? As far as I understand, basically all the available APIs (e.g. posix, NSTask) should be more or less the same thing, calling the same syscalls. So forking the process early, before the main run loop starts, and not starting another NSApplication in the forked child should be OK ...or not?

Accepted Reply

Combining fork with Apple’s frameworks is a tricky business [1]. If you only use Posix APIs, fork should behave reliably. Once you start using our higher-level frameworks, well… we try to keep things working but we can’t make any guarantees.

There are two cases here:

  • Calling fork without exec* to create a long-running ‘clone’ of the current process

  • Combining fork and exec* to run a new program in a child process

The first case will (usually :-) work if you limit yourself to Posix APIs. It’s fundamentally incompatible with our higher-level frameworks.

For the second case, the best way to avoid problems is to use an API that combines fork and exec*. At the BSD level, that’s posix_spawn. Higher up, you have NSTask (aka Process in Swift).
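
To make that concrete, here is a minimal sketch of launching a helper with posix_spawn instead of fork plus exec*. The function name run_helper and the choice to wait for the child and return its exit status are assumptions made for illustration; they are not something the original poster's code is known to do.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawns the program at `path` with `argv`, waits for
   it to finish, and returns its exit status, or -1 on any failure. */
static int run_helper(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);

    pid_t pid;
    /* No file actions and no spawn attributes; the child inherits the
       parent's environment. posix_spawn returns an error number directly. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn failed: %s\n", strerror(err));
        return -1;
    }

    int status;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);   /* retry if interrupted */
    } while (waited == -1 && errno == EINTR);
    if (waited == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as run_helper("/bin/sh", args) with args set to {"sh", "-c", "exit 7", NULL}, this should return 7. Because the parent process never calls fork itself, it never runs the child-side atfork handlers.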

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] There’s a fundamental disconnect between BSD and Mach on this topic, and Apple’s frameworks rely on Mach a lot.

Replies

Thank you @eskimo

I dug further and found the source of the problem, along with a repro [1]: in a dynamic library we load, a static initializer creates an nw_path_monitor, and once we fork the process it crashes in the atfork handler when the Network framework tries to clean up. I'll report a bug and see whether someone can tell me if it's operator error or just a bug :-)

I still have one more question, for educational purposes: are posix_spawn and NSTask doing something fundamentally different from just calling fork and exec*? I mean calling Mach APIs or other dark magic?


[1] Simple main.m (used application template from Xcode)

// main.m

#import <Cocoa/Cocoa.h>
#import <Network/Network.h>
#import <dispatch/dispatch.h>

int main(int argc, const char * argv[]) {
  @autoreleasepool {
    nw_path_monitor_t mon = nw_path_monitor_create();
    nw_path_monitor_set_update_handler(mon, ^(nw_path_t path) {
      NSLog(@"monitor updated");
    });
    nw_path_monitor_start(mon);
  }
  
  pid_t pid = fork();
  
  if (pid == -1) {
    NSLog(@"Fork failed");
    exit(1);
  }
  
  if (pid == 0) {
    while (true) {
      NSLog(@"Forked child here");
      sleep(1);
    }
    return 0;
  }
  
  return NSApplicationMain(argc, argv);
}

Child crashes with:

Process:               NWForkCrash [69516]
Path:                  /Users/USER/Library/Developer/Xcode/DerivedData/NWForkCrash-dnvdeuuhbnuhxublasbhfmmluzqb/Build/Products/Debug/NWForkCrash.app/Contents/MacOS/NWForkCrash
Identifier:            com.****.NWForkCrash
Version:               1.0 (1)
Code Type:             ARM-64 (Native)
Parent Process:        NWForkCrash [69508]
Responsible:           NWForkCrash [69508]
User ID:               501

Date/Time:             2023-09-15 16:52:56.2815 +0200
OS Version:            macOS 13.5.2 (22G91)
Report Version:        12
Anonymous UUID:        D6E5A34D-2127-16AF-16E7-BDA9139A6A82

Sleep/Wake UUID:       DF75986B-D513-4000-993D-69A7AB7261A1

Time Awake Since Boot: 88000 seconds
Time Since Wake:       740 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BREAKPOINT (SIGTRAP)
Exception Codes:       0x0000000000000001, 0x0000000194551238

Termination Reason:    Namespace SIGNAL, Code 5 Trace/BPT trap: 5
Terminating Process:   exc handler [69516]

Application Specific Information:
BUG IN CLIENT OF LIBPLATFORM: os_unfair_lock is corrupt
Abort Cause 258
crashed on child side of fork pre-exec


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib      	       0x194551238 _os_unfair_lock_corruption_abort + 88
1   libsystem_platform.dylib      	       0x19454c788 _os_unfair_lock_lock_slow + 332
2   Network                       	       0x19b1b4af0 nw_path_shared_necp_fd + 124
3   Network                       	       0x19b1b4698 -[NWConcrete_nw_path_evaluator dealloc] + 72
4   Network                       	       0x19af9d970 __nw_dictionary_dispose_block_invoke + 32
5   libxpc.dylib                  	       0x194260210 _xpc_dictionary_apply_apply + 68
6   libxpc.dylib                  	       0x19425c9a0 _xpc_dictionary_apply_node_f + 156
7   libxpc.dylib                  	       0x1942600e8 xpc_dictionary_apply + 136
8   Network                       	       0x19acd5210 -[OS_nw_dictionary dealloc] + 112
9   Network                       	       0x19b1beb08 nw_path_release_globals + 120
10  Network                       	       0x19b3d4fa0 nw_settings_child_has_forked() + 312
11  libsystem_pthread.dylib       	       0x10463f7c8 _pthread_atfork_child_handlers + 76
12  libsystem_c.dylib             	       0x1943d9944 fork + 112
13  NWForkCrash                   	       0x1045db024 main + 96 (main.m:21)
14  dyld                          	       0x1941c7f28 start + 2236

Are posix_spawn and NSTask doing something fundamentally different from just calling fork and exec*?

Yes.

On the posix_spawn front, that’s a system call [1], so no user space code runs in the intermediate forked-but-not-exec’d state:

  • The parent never forks.

  • The child starts executing at the post-exec point.

So, the child process just spontaneously, and atomically, pops into existence. That avoids issues like this.

Regarding NSTask, it uses posix_spawn under the covers and thus inherits the same behaviour.
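
One practical consequence is that any setup you would normally do between fork and exec*, such as redirecting file descriptors, has to be described to posix_spawn up front. Here is a rough sketch of that using file actions. The helper name spawn_and_capture and its buffer-based interface are made up for illustration, and error handling is kept minimal.

```c
#include <assert.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: runs `path` with its stdout connected to a pipe,
   reads up to cap - 1 bytes of output into `buf` (NUL-terminated), and
   returns the number of bytes read, or -1 on failure. */
static ssize_t spawn_and_capture(const char *path, char *const argv[],
                                 char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);

    int fds[2];
    if (pipe(fds) == -1) return -1;

    /* These actions are applied as part of the spawn, in place of the
       dup2/close calls one would otherwise write after fork. */
    posix_spawn_file_actions_t actions;
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&actions, fds[0]);
    posix_spawn_file_actions_addclose(&actions, fds[1]);

    pid_t pid;
    int err = posix_spawn(&pid, path, &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(fds[1]);                 /* parent keeps only the read end */
    if (err != 0) {
        close(fds[0]);
        return -1;
    }

    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - total)) > 0) {
        total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);         /* reap the child */
    return (ssize_t)total;
}
```

The redirection is expressed as data handed to posix_spawn rather than as code that you run yourself in the forked-but-not-exec’d child.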


Earlier I wrote:

There’s a fundamental disconnect between BSD and Mach on this topic

While that’s true, it’s not the full story. Mach led to pthreads, and any Unix system with pthreads has similar issues.

This stuff gets really ugly really fast. Consider code like this:

parent                          child
======                          =====
thread A        thread B        
--------        --------        
malloc
> lock                          thread B
                fork            --------
                                malloc
                                > lock

At this point the child is deadlocked because it’s trying to acquire the global malloc lock held by thread A, but thread A doesn’t exist in the child so it can’t unlock it.

The pthreads library came with pthread_atfork, which is the intended mechanism for a library developer to get themselves out of this mess. However, the reality is that many library vendors don’t know they have to do this, and others do this and get it wrong. The latter is not surprising given this quote from the pthread_atfork man page:

Important: only async-signal-safe functions are allowed on the child side of fork.

Ouch!
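
For illustration, this is roughly the conventional pattern a library might use to protect one lock of its own with pthread_atfork. All the lib_* names are hypothetical. Note that the child handler calls pthread_mutex_unlock, which is not on the async-signal-safe list, so even this textbook pattern runs into the restriction quoted above.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state guarded by a single lock. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;

/* Runs in the parent before fork: take the lock so that no other thread
   can be holding it at the moment the child is created. */
static void lib_prepare(void) { pthread_mutex_lock(&lib_lock); }

/* Runs in the parent after fork: release the lock again. */
static void lib_parent(void) { pthread_mutex_unlock(&lib_lock); }

/* Runs in the child after fork: the forking thread is the only thread
   that exists here, and it holds the lock, so it can release it. */
static void lib_child(void) { pthread_mutex_unlock(&lib_lock); }

static void lib_register(void) {
    int rc = pthread_atfork(lib_prepare, lib_parent, lib_child);
    assert(rc == 0);
    (void)rc;
}

/* Registers the handlers exactly once. */
void lib_init(void) { pthread_once(&lib_once, lib_register); }

/* Forks, takes and releases the lock in the child, and returns the
   child's exit status (0 if the child got the lock), or -1 on failure. */
int lib_fork_and_lock(void) {
    lib_init();
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Without the handlers this could block forever if another
           parent thread had held lib_lock at the time of the fork. */
        pthread_mutex_lock(&lib_lock);
        pthread_mutex_unlock(&lib_lock);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only covers a lock that the library itself owns. It does nothing for locks inside other libraries, which is why every library in the process has to get this right for fork without exec* to be safe.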

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] You can see the code for its in-kernel implementation in Darwin.