popen() crash

I have a multithreaded application that uses popen() for system calls.

Here is the snippet. It crashes in popen() (line 2 below).

1        fp = NULL;
2        if ((fp = popen(dnsCommand, "r")) == NULL) {
3            logPrint(errno, "*executeDNS: popen failed. Exiting...");
4            exitProcess(__func__, __LINE__);
5        }

I caught this both in the Xcode debugger and in command-line lldb.

To replicate it I have to let it run for several hours, and this function is called repeatedly during that time. I can't really tell the exact conditions that cause the crash.

The error message is

Thread 43: EXC_BAD_ACCESS (code=1, address=0x3010000080e)

In this case dnsCommand is

dnsCommand	char [200]	"/Applications/NetBeez/bin/dig +noall +search +stats +comments xfinity.com 2>/dev/null"	

Here is the backtrace

(lldb) bt
* thread #43, stop reason = EXC_BAD_ACCESS (code=1, address=0x3010000080e)
  * frame #0: 0x00007ff803997d22 libsystem_c.dylib`popen + 478
    frame #1: 0x000000010004b53e nbagentgdb`executeDNS(param=0x0000000128900000) at executeDNS.c:235:19
    frame #2: 0x0000000100015778 nbagentgdb`executeTest(param=0x0000000128900000) at utilities.c:910:17
    frame #3: 0x0000000100549cd0 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x0000000100551cff libsystem_pthread.dylib`thread_start + 15

Here is where it crashed in popen()

    0x7ff803997d1c <+472>: je     0x7ff803997d35            ; <+497>
    0x7ff803997d1e <+474>: leaq   -0x68(%rbp), %r12
->  0x7ff803997d22 <+478>: movl   0x10(%rbx), %esi
    0x7ff803997d25 <+481>: movq   %r12, %rdi
    0x7ff803997d28 <+484>: callq  0x7ff8039c68ee            ; symbol stub for: posix_spawn_file_actions_addclose
    0x7ff803997d2d <+489>: movq   (%rbx), %rbx

And the variables

(lldb) register read rbx
     rbx = 0x00000301000007fe
(lldb) register read esi
     esi = 0x000aa000

I still have the debugger session live in Xcode, so if you need any other debug info, let me know.

I suspect it crashes when the laptop goes to sleep, but it is not consistent.

Any idea how to troubleshoot further?


Replies

I’d like to see a crash report for this. There are two ways you can generate that:

  • Run the program outside of the debugger and wait for it to crash.

  • If you still have that live debugging session, choose Process > Detach.

Once you have the crash report, post it here using the techniques described in Posting a Crash Report.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Here is the crash report:

Thanks for the crash report.

With regard to the immediate cause of your crash, the backtrace looks like this:

Thread 39 Crashed:
0 libsystem_c.dylib       … popen + 478
1 nbagentgdb              … executeDNS + 1566 (executeDNS.c:235)
2 nbagentgdb              … executeTest + 584 (utilities.c:910)
3 libsystem_pthread.dylib … _pthread_start + 125
4 libsystem_pthread.dylib … thread_start + 15

Some internal magic reveals that frame 0 probably corresponds to this line in the Darwin source, or somewhere nearby. It’s hard to say exactly what’s going on without spending a lot more time looking at the disassembly but I suspect there’s an easier way forward: Copy that popen implementation into your own codebase, rename it to something unique, and call that instead. This has a bunch of advantages:

  • You’ll be able to get better crash reports.

  • And look at the state of the crash in the debugger.

  • And this will be non-optimised code, and thus much easier to understand.


Having said that, your crash report is kinda worrying. You have 120 threads, most of which are blocked waiting on locks and so on. There are, for example:

  • No less than 52 threads waiting in isTestInterfaceDown.

  • And 21 threads are waiting for dig to complete within executeDNS.

It’s quite likely that overthreading is running your process out of some critical resource, like file descriptors, which has triggered this problem in popen.
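One way to test the file descriptor theory is to log descriptor usage just before each popen call. The sketch below is an assumption about how that could be done, using getrlimit and fcntl; the function names and the 65536 cap are invented for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>

/* Soft limit on open descriptors, capped at 65536 so it is cheap to scan.
   Returns -1 if the limit cannot be read. */
long fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 65536)
        return 65536;
    return (long)rl.rlim_cur;
}

/* Count the descriptors currently open by probing each one with F_GETFD. */
long count_open_fds(void)
{
    long limit = fd_soft_limit();
    long open_count = 0;
    for (long fd = 0; fd < limit; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}

/* Write a one-line usage summary to stderr, tagged with the call site. */
void log_fd_usage(const char *where)
{
    fprintf(stderr, "%s: %ld of %ld file descriptors in use\n",
            where, count_open_fds(), fd_soft_limit());
}
```

If the count climbs steadily toward the limit over the hours before the crash, that points at a leak rather than at popen itself.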

What is your program actually doing? I suspect that you’d be able to make it more reliable (and a lot more efficient) by using APIs rather than running command-line tools. For example, what you’re doing with dig you could just as easily do with the <dns.h> API.
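To illustrate the in-process approach, here is a small sketch that resolves a name without spawning anything. Note the substitution: it uses POSIX getaddrinfo rather than the <dns.h> API, and it does not report the query statistics that dig +stats prints. The helper name resolve_first_ipv4 is hypothetical.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host and write its first IPv4 address into
   out as dotted-quad text. Returns 0 on success, otherwise a getaddrinfo
   error code (usable with gai_strerror). */
int resolve_first_ipv4(const char *host, char *out, size_t out_size)
{
    struct addrinfo hints;
    struct addrinfo *results = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;    /* avoid one result per socket type */

    int rc = getaddrinfo(host, NULL, &hints, &results);
    if (rc != 0)
        return rc;
    if (results == NULL)
        return EAI_FAIL;

    const struct sockaddr_in *sin =
        (const struct sockaddr_in *)results->ai_addr;
    const char *text =
        inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_size);
    freeaddrinfo(results);
    return text != NULL ? 0 : EAI_SYSTEM;
}
```

Each test thread could call this directly, so a lookup involves no child process, pipe, or shared popen state.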

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"


I also compiled with address sanitization, and I get

==64348==ERROR: AddressSanitizer: SEGV on unknown address 0x02ae4e000079 (pc 0x7ff803997d22 bp 0x70000aecbfc0 sp 0x70000aecbf40 T626)
==64348==The signal is caused by a READ memory access.
    #0 0x7ff803997d22 in popen+0x1de (libsystem_c.dylib:x86_64+0x53d22)
    #1 0x10d5d78f6 in wrap_popen+0x306 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x418f6)
    #2 0x10cc2cefa in executeDNS executeDNS.c:235
    #3 0x10cbc01fd in executeTest utilities.c:910
    #4 0x7ff803a79513 in _pthread_start+0x7c (libsystem_pthread.dylib:x86_64+0x6513)
    #5 0x7ff803a7502e in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x202e)

==64348==Register values:
rax = 0x0000000000000000  rbx = 0x000002ae4e000069  rcx = 0x0000625000827900  rdx = 0x0000000000000005  
rdi = 0x0000000118e98060  rsi = 0x0000000118e980e0  rbp = 0x000070000aecbfc0  rsp = 0x000070000aecbf40  
 r8 = 0x0000000000001048   r9 = 0x0000000000000000  r10 = 0x00000fffffffffff  r11 = 0x0000000000000000  
r12 = 0x000070000aecbf58  r13 = 0x00000000ffffffff  r14 = 0x000000010cf11000  r15 = 0x00007ff84528f8f0  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (libsystem_c.dylib:x86_64+0x53d22) in popen+0x1de
Thread T626 created by T0 here:
    #0 0x10d5d867c in wrap_pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4267c)
    #1 0x10cba0f33 in main nbagent_main.c:635
    #2 0x11239c4fd in start+0x1cd (dyld:x86_64+0x54fd)

==64348==ABORTING

I don't know if this adds any more information, other than confirming that it crashes in popen().

Can you point me to the best popen() implementation I could use to test what you suggested?

> Can you point me to the best popen implementation I could use to test what you suggested?

For this sort of test your best option is the Darwin open source that’s closest aligned to the system you’re testing on:

  1. Start here.

  2. Click Releases > View releases.

  3. Tunnel down to find the Darwin open source release that most closely matches the macOS release you’re testing on.

  4. Follow the path Libc > gen > FreeBSD > popen.c.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

It's failing here:

SLIST_FOREACH(p, &pidlist, next)
		(void)posix_spawn_file_actions_addclose(&file_actions, p->fd);
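That lines up with the disassembly: the faulting instruction, movl 0x10(%rbx), %esi, appears to load p->fd, and rbx plus 0x10 equals the reported bad address in both the lldb run and the AddressSanitizer run. A small sketch of that arithmetic, assuming the list node is laid out like the hypothetical struct pid_entry below (modeled on the node in popen.c, not copied from it) and a 64-bit target:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/queue.h>
#include <sys/types.h>

/* Assumed node layout; the field names and order are an illustration. */
struct pid_entry {
    SLIST_ENTRY(pid_entry) next;    /* one pointer: offset 0 */
    FILE *fp;                       /* offset 8 on LP64 */
    int fd;                         /* offset 16 (0x10) on LP64 */
    pid_t pid;
};

/* Address the loop reads when it evaluates p->fd for a node at `node`. */
uintptr_t address_read_for_fd(uintptr_t node)
{
    return node + offsetof(struct pid_entry, fd);
}
```

With rbx = 0x301000007fe this gives 0x3010000080e, and with rbx = 0x2ae4e000069 it gives 0x2ae4e000079, so in both runs p itself was a garbage pointer.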

Looking at the implementation, popen() appears intended to be thread safe (e.g. __isthreaded), but locking is not applied on all list manipulations.

I see that there is protection for SLIST_INSERT_HEAD

        THREAD_LOCK();
        SLIST_INSERT_HEAD(&pidlist, cur, next);
        THREAD_UNLOCK();

But there is no locking for SLIST_FOREACH (which is where the crash happened in my case).

Shouldn't the list be locked when iterating through it?

If another thread inserts or removes an element while this loop is walking the list, the loop can follow a stale pointer and read invalid memory.

By the way, here is the popen.c I used https://github.com/apple-oss-distributions/Libc/blob/Libc-1506.40.4/gen/FreeBSD/popen.c

This change fixes it for me:

        THREAD_LOCK();
        SLIST_FOREACH(p, &pidlist, next)
                (void)posix_spawn_file_actions_addclose(&file_actions, p->fd);
        THREAD_UNLOCK();
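The same rule, outside of Libc, can be sketched with a plain pthread mutex: every insert, removal, and traversal of the list takes the same lock. Everything named here (fd_entry, fd_list_add, fd_list_remove, fd_list_snapshot) is hypothetical and only illustrates the pattern; it is not the Libc code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

struct fd_entry {
    int fd;
    SLIST_ENTRY(fd_entry) next;
};

static SLIST_HEAD(, fd_entry) fd_list = SLIST_HEAD_INITIALIZER(fd_list);
static pthread_mutex_t fd_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head under the lock. Returns 0, or -1 if out of memory. */
int fd_list_add(int fd)
{
    struct fd_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->fd = fd;
    pthread_mutex_lock(&fd_list_lock);
    SLIST_INSERT_HEAD(&fd_list, e, next);
    pthread_mutex_unlock(&fd_list_lock);
    return 0;
}

/* Unlink the first entry matching fd under the lock, then free it.
   Returns 0 if an entry was removed, -1 if none matched. */
int fd_list_remove(int fd)
{
    struct fd_entry *e;
    pthread_mutex_lock(&fd_list_lock);
    SLIST_FOREACH(e, &fd_list, next) {
        if (e->fd == fd)
            break;
    }
    if (e != NULL)
        SLIST_REMOVE(&fd_list, e, fd_entry, next);
    pthread_mutex_unlock(&fd_list_lock);
    if (e == NULL)
        return -1;
    free(e);                /* safe: no traversal can still reach it */
    return 0;
}

/* Traverse under the lock, copying up to max fds into out.
   Returns the number of fds copied. */
size_t fd_list_snapshot(int *out, size_t max)
{
    size_t n = 0;
    struct fd_entry *e;
    pthread_mutex_lock(&fd_list_lock);
    SLIST_FOREACH(e, &fd_list, next) {
        if (n == max)
            break;
        out[n++] = e->fd;
    }
    pthread_mutex_unlock(&fd_list_lock);
    return n;
}
```

The point of the snapshot function is that the traversal holds the lock for its whole duration, so a concurrent fd_list_remove cannot free a node the loop is about to visit.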

> but locking is not applied on all list manipulations.

Indeed. I’d appreciate you filing a bug about that.

Please post your bug number, just for the record.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  • @eskimo can you confirm that the bug in popen has been fixed? We're considering removing all popen calls right now.


I submitted the bug, and I believe the bug number is FB12144217.

I also submitted a PR with the fix: https://github.com/apple-oss-distributions/Libc/pull/2

  • Ta!

  • @eskimo any idea how long it takes to review this kind of PR?
