Develop kernel-resident device drivers and kernel extensions using Kernel.

Posts under Kernel tag

43 Posts

Post

Replies

Boosts

Views

Activity

FSKit questions and clarifications
I work on EdenFS, an open-source virtual filesystem that runs on macOS, Linux, and Windows. My team is very interested in using FSKit as the basis for EdenFS on macOS, but we have found the documentation lacking, and it contains some mixed messaging on the future of FSKit. Below are a few questions that don’t seem to be fully covered by the current documentation:

1. Does FSKit support process attribution? Each FUSE request provides a requester Process ID (and other information) through the fuse_in_header structure. Does FSKit pass similar information along for each request?
2. Does the reclaimItem API function similarly to FUSE’s forget operation? If not, what are the differences? See #1 below for why forget/reclaimItem matters to us.
3. Is Apple committed to releasing and supporting FSKit? Is there any timeline for release that we can plan around?
4. Does FSKit have known performance/scalability limitations? We provide alternative methods that clients can use to make bulk requests to EdenFS, but some clients will necessarily be unable to use those and will stress the default filesystem APIs. Throughput (on the order of tens of thousands of filesystem requests per minute) and request size are the main concerns, followed closely by directory size restrictions.

Why we’re interested in FSKit

As mentioned above, my team supports EdenFS on 3 platforms. On Linux, we utilize FUSE; on Windows, we utilize ProjectedFS; and on macOS, we’ve utilized a few different solutions in the past. We first utilized the macFUSE kext, which was great while it lasted. Due to (understandable) changes in supporting kernel extensions, we were forced to move to NFS version 3. NFS has been lackluster in comparison (and our initial investigations show that NFS version 4(.2) would be similar). We have had numerous scalability and reliability issues, some listed below:

1. NFS does not provide a forget API similar to FUSE. EdenFS is forced to remember all file handles that have been loaded because the kernel never informs us when all references to that file handle have been dropped. We can hackily infer that a file handle should never be referenced again in some cases, but a large number of file handles end up being remembered forever. Many of our algorithms scale with the number of file handles that Eden has to consider, and therefore performance issues are inevitable after some time.
2. NFS does not provide information about clients (requesters). We cannot tell which processes are sending EdenFS requests. This attribution is important due to issue #1. We are forced to work with tool owners to modify their applications to be VFS-friendly. If we can’t track down which tools are behaving poorly, they will continue to load excess file handles and cause performance issues.
3. NFS “Server connections interrupted:” dialog during heavy load. Under heavy load, either EdenFS or system-wide, our users experience this dialog pop-up and are confused as to how they should respond (Ignore or Disconnect All). They become blocked in their work, and will be further blocked if they click “Disconnect All” as that unmounts their EdenFS mount. This forces them to restart EdenFS or reboot their laptop to remediate the issue.

The above issues make us extremely motivated to use FSKit and partner with Apple to flesh out the final version of the FSKit API. Our use case likely mirrors what other user-space filesystems will be looking for in the FSKit API (albeit at a larger scale than most), and we’re willing to collaborate to work out any issues in the current FSKit offerings.
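To make the forget concern concrete, here is a minimal sketch of the kind of bookkeeping a forget-style notification enables. It is an illustration only: `handle_table`, `table_remember`, and `table_forget` are hypothetical names, not EdenFS, FUSE, or FSKit APIs. Each handle carries a count of references the kernel still holds; a forget notification lets the entry be dropped once that count reaches zero, whereas without one the table can only grow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: one per file handle handed to the kernel. */
typedef struct handle_entry {
    uint64_t id;                 /* handle / inode number */
    uint64_t nlookup;            /* references the kernel still holds */
    struct handle_entry *next;
} handle_entry;

typedef struct {
    handle_entry *head;
    size_t count;                /* number of remembered handles */
} handle_table;

/* Return the link that points at the entry for `id`, or at the list tail. */
static handle_entry **find_slot(handle_table *t, uint64_t id) {
    handle_entry **pp = &t->head;
    while (*pp != NULL && (*pp)->id != id) {
        pp = &(*pp)->next;
    }
    return pp;
}

/* Called whenever a handle is returned to the kernel (for example, a lookup).
   Returns 0 on success, -1 if memory could not be allocated. */
static int table_remember(handle_table *t, uint64_t id) {
    assert(t != NULL);
    handle_entry **pp = find_slot(t, id);
    if (*pp == NULL) {
        handle_entry *e = calloc(1, sizeof(*e));
        if (e == NULL) {
            return -1;
        }
        e->id = id;
        *pp = e;
        t->count++;
    }
    (*pp)->nlookup++;
    return 0;
}

/* Called when the kernel reports that it dropped `n` references. */
static void table_forget(handle_table *t, uint64_t id, uint64_t n) {
    assert(t != NULL);
    handle_entry **pp = find_slot(t, id);
    handle_entry *e = *pp;
    if (e == NULL) {
        return;                  /* unknown handle: nothing to release */
    }
    if (n >= e->nlookup) {
        *pp = e->next;           /* last reference gone: drop the entry */
        free(e);
        t->count--;
    } else {
        e->nlookup -= n;
    }
}
```

With NFS there is no event that would trigger `table_forget`, so `count` only shrinks when the filesystem guesses that a handle is dead.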
4
0
1.7k
Jun ’25
Understanding `EINTR`
I’ve talked about EINTR a bunch of times here on DevForums. Today I found myself talking about it again. On reading my other explanations, I didn’t think any of them were good enough to link to, so I decided to write it up properly.

If you have questions or comments, please put them in a new thread here on DevForums. Use the App & System Services > Core OS topic area so that I see it.

Share and Enjoy — Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Understanding EINTR

Many BSD-layer routines can fail with EINTR. To see this in action, consider the following program:

    import Darwin

    func main() {
        print("will read, pid: \(getpid())")
        var buf = [UInt8](repeating: 0, count: 1024)
        let bytesRead = read(STDIN_FILENO, &buf, buf.count)
        if bytesRead < 0 {
            let err = errno
            print("did not read, err: \(err)")
        } else {
            print("did read, count: \(bytesRead)")
        }
    }

    main()

It reads some bytes from stdin and prints the result. Build this and run it in one Terminal window:

    % ./EINTRTest
    will read, pid: 13494

Then, in another window, stop and start the process by sending it the SIGSTOP and SIGCONT signals:

    % kill -STOP 13494
    % kill -CONT 13494

In the original window you’ll see something like this:

    % ./EINTRTest
    will read, pid: 13494
    zsh: suspended (signal)  ./EINTRTest
    % did not read, err: 4

    [1]  + done       ./EINTRTest

When you send the SIGSTOP the process stops and the shell tells you that. But look what happens when you continue the process. The read(…) call fails with error 4, that is, EINTR.

The read man page explains this as:

    [EINTR] A read from a slow device was interrupted before any data arrived by the delivery of a signal.

That’s true but unhelpful. You really want to know why this error happens and what you can do about it. There are other man pages that cover this topic in more detail — and you’ll find lots of info about it on the wider Internet — but the goal of this post is to bring that all together into one place.

IMPORTANT The description of the EINTR error, as returned by strerror and friends, is Interrupted system call. If you see code display or log that description, you’re dealing with EINTR.

Signal and Interrupts

In the beginning, Unix didn’t have threads. It implemented asynchronous event handling using signals. For more about signals, see the signal man page.

The mechanism used to actually deliver a signal is highly dependent on the specific Unix implementation, but the general idea is that:

- The system decides on a specific process (or, nowadays, a thread) to run the signal handler.
- If that’s blocked inside the kernel waiting for a system call to complete [1], the system unblocks the system call by failing it with an EINTR error.

Thus, every system call that can block [2] might fail with an EINTR. You see this listed as a potential error in the man pages for read, write, usleep, waitpid, and many others.

[1] There’s some subtlety around the definition of system call. On traditional Unix systems, executables would make system calls directly. On Apple platforms that’s not supported. Rather, an executable calls a routine in the System framework which then makes the system call. In this context the term system call is a shortcut for a System framework routine that maps to a traditional Unix system call.

[2] There’s also some subtlety around the definition of block. Pretty much every system call can block for some reason or another. In this context, however, a block means to enter an interruptible wait state, typically while waiting for I/O. This is what the above man page quote is getting at when it says slow device.

Solutions

This is an obvious pitfall and it would be nice if we could just get rid of it. However, that’s not possible due to compatibility concerns. And while there are a variety of mechanisms to automatically retry a system call after a signal interrupt, none of them are universally applicable.

If you’re working on a large scale program, like an app for Apple’s platforms, your only good option is to add code to retry any system call that can fail with EINTR. For example, to fix the program at the top of this post you might wrap the read(…) system call like so:

    func readQ(_ d: Int32, _ buf: UnsafeMutableRawPointer!, _ nbyte: Int) -> Int {
        repeat {
            let bytesRead = read(d, buf, nbyte)
            if bytesRead < 0 && errno == EINTR {
                continue
            }
            return bytesRead
        } while true
    }

Note In this specific case you’d be better off using the read(into:retryOnInterrupt:) method from System framework. It retries by default (if that’s not appropriate, pass false to the retryOnInterrupt parameter).

You can even implement the retry in a generic way. See the errnoQ(…) snippet in QSocket: System Additions.

Library Code

If you’re writing library code, it’s important that you handle EINTR so that your clients don’t have to. In some cases it might make sense to export a control for this, like the retryOnInterrupt parameter shown in the previous section, but it should default to retrying.

If you’re using library code, you can reasonably expect it to handle EINTR for you. If it doesn’t, raise that issue with the library author. And if you get this error back from an Apple framework, like Foundation or Network framework, please file a bug against the framework.

Revision History

- 2025-04-13 Added the description of the error, Interrupted system call, to make it easier for folks to find this post.
- 2024-10-14 First posted.
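
For readers working in C rather than Swift, the same retry idea can be sketched as follows. This is a minimal illustration, not code from the post: `read_retry` and `write_all` are hypothetical helper names. `write_all` also loops on short writes, which is a separate concern from EINTR but usually handled in the same place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Call read(2), retrying only when it fails with EINTR.
   Returns whatever read(2) finally returns. */
static ssize_t read_retry(int fd, void *buf, size_t nbyte) {
    ssize_t n;
    do {
        n = read(fd, buf, nbyte);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Write all `len` bytes, retrying on EINTR and continuing after short writes.
   Returns 0 on success, or -1 with errno set by write(2). */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;        /* interrupted before any data moved: retry */
            }
            return -1;           /* a real error: report it to the caller */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use `read_retry(STDIN_FILENO, buf, sizeof buf)` in place of the bare `read` call in the test program above.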
0
0
684
Apr ’25
When implementing a custom Mach exception handler, all recovery operations for SIGBUS/SIGSEGV except the first attempt will fail.
Recovery operations for signals SIGBUS/SIGSEGV fail when the process intercepts Mach exceptions. Only the first recovery attempt succeeds, and subsequent signal notifications are no longer received within the process. I think this is a bug in XNU. If we comment out the call to AddMachExceptionServer(), everything returns to normal. The test code main.c is:

    #include <fcntl.h>
    #include <mach/arm/kern_return.h>
    #include <mach/kern_return.h>
    #include <mach/mach.h>
    #include <mach/message.h>
    #include <mach/port.h>
    #include <pthread.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/_types/_mach_port_t.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    #pragma pack(4)
    typedef struct {
      mach_msg_header_t header;
      mach_msg_body_t body;
      mach_msg_port_descriptor_t thread;
      mach_msg_port_descriptor_t task;
      NDR_record_t NDR;
      exception_type_t exception;
      mach_msg_type_number_t codeCount;
      integer_t code[2];
      /** Padding to avoid RCV_TOO_LARGE. */
      char padding[512];
    } MachExceptionMessage;

    typedef struct {
      mach_msg_header_t header;
      NDR_record_t NDR;
      kern_return_t returnCode;
    } MachReplyMessage;
    #pragma pack()

    static jmp_buf jump_buffer;

    static void sigbus_handler(int signo, siginfo_t *info, void *context) {
      printf("Caught SIGBUS at address: %p\n", info->si_addr);
      longjmp(jump_buffer, 1);
    }

    static void *RunExcServer(void *userdata) {
      kern_return_t kr = KERN_FAILURE;
      mach_port_t exception_port = MACH_PORT_NULL;

      kr = mach_port_allocate(mach_task_self_, MACH_PORT_RIGHT_RECEIVE,
                              &exception_port);
      if (kr != KERN_SUCCESS) {
        printf("mach_port_allocate: %s", mach_error_string(kr));
        return NULL;
      }

      kr = mach_port_insert_right(mach_task_self_, exception_port,
                                  exception_port, MACH_MSG_TYPE_MAKE_SEND);
      if (kr != KERN_SUCCESS) {
        printf("mach_port_insert_right: %s", mach_error_string(kr));
        return NULL;
      }

      kr = task_set_exception_ports(
          mach_task_self_, EXC_MASK_ALL & ~(EXC_MASK_RPC_ALERT | EXC_MASK_GUARD),
          exception_port, EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES,
          THREAD_STATE_NONE);
      if (kr != KERN_SUCCESS) {
        printf("task_set_exception_ports: %s", mach_error_string(kr));
        return NULL;
      }

      MachExceptionMessage exceptionMessage = {{0}};
      MachReplyMessage replyMessage = {{0}};
      for (;;) {
        printf("Waiting for message\n");
        // Wait for a message.
        kern_return_t kr = mach_msg(&exceptionMessage.header, MACH_RCV_MSG, 0,
                                    sizeof(exceptionMessage), exception_port,
                                    MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
        if (kr == KERN_SUCCESS) {
          // Send a reply saying "I didn't handle this exception".
          replyMessage.header = exceptionMessage.header;
          replyMessage.NDR = exceptionMessage.NDR;
          replyMessage.returnCode = KERN_FAILURE;
          printf("Catch exception: %d codecnt:%d code[0]: %d, code[1]: %d\n",
                 exceptionMessage.exception, exceptionMessage.codeCount,
                 exceptionMessage.code[0], exceptionMessage.code[1]);
          mach_msg(&replyMessage.header, MACH_SEND_MSG, sizeof(replyMessage), 0,
                   MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
        } else {
          printf("Mach error: %s\n", mach_error_string(kr));
        }
      }
      return NULL;
    }

    static bool AddMachExceptionServer(void) {
      int error;
      pthread_attr_t attr;
      pthread_attr_init(&attr);
      pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
      pthread_t ptid = NULL;
      error = pthread_create(&ptid, &attr, &RunExcServer, NULL);
      if (error != 0) {
        pthread_attr_destroy(&attr);
        return false;
      }
      pthread_attr_destroy(&attr);
      return true;
    }

    int main(int argc, char *argv[]) {
      AddMachExceptionServer();

      struct sigaction sa;
      memset(&sa, 0, sizeof(sa));
      sa.sa_sigaction = sigbus_handler;
      sa.sa_flags = SA_SIGINFO;
      // #if TARGET_OS_IPHONE
      // sigaction(SIGSEGV, &sa, NULL);
      // #else
      sigaction(SIGBUS, &sa, NULL);
      // #endif

      int i = 0;
      while (i++ < 3) {
        printf("\nProgram start %d\n", i);
        bzero(&jump_buffer, sizeof(jump_buffer));
        if (setjmp(jump_buffer) == 0) {
          int fd = open("tempfile", O_RDWR | O_CREAT | O_TRUNC, 0666);
          ftruncate(fd, 0);
          char *map = (char *)mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
          close(fd);
          unlink("tempfile");
          printf("About to write to mmap of size 0 — should trigger SIGBUS...\n");
          map[0] = 'X'; // ❌ trigger a SIGBUS
          munmap(map, 4096);
        } else {
          printf("Recovered from SIGBUS via longjmp!\n");
        }
      }
      printf("_exit(0)\n");
      _exit(0);
      return 0;
    }
2
0
104
Apr ’25
How to Symbolicate an Apple Silicon Panic?
Investigating a kernel panic, I discovered that Apple Silicon panic traces do not work with the way I know to symbolicate panic information. I have not found proper documentation that corrects this situation.

The attached file is an identity-removed panic, produced by causing an intentional panic (dereferencing nullptr), so that I know what functions to expect in the call stack. This is cut-and-pasted from the "Report To Apple" dialog that appears after the reboot: panic_1_4_21_b.txt

To start, I download and install the matching KDK (in this case KDK_14.6.1_23G93.kdk), identified from these lines:

    OS version: 23G93
    Kernel version: Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:04 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8122

Then start lldb from Terminal, using this command:

    bash_prompt % lldb -arch arm64e /Library/Developer/KDKs/KDK_14.6.1_23G93.kdk/System/Library/Kernels/kernel.release.t8122

Next I load the remaining scripts per the instructions from lldb:

    (lldb) settings set target.load-script-from-symbol-file true

I need to know what address to load my kext symbols to, which I read from this line of the panic log, after the @ symbol:

    com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]@0xfffffe001203bfd0->0xfffffe00120533ab

I am using a debug build of my kext, so the DWARF symbols are part of the binary. I use this line to load the symbols into the lldb session:

    (lldb) addkext -F /Library/Extensions/KextName.kext/Contents/MacOS/KextName 0xfffffe001203bfd0

And now I should be able to use lldb image lookup to identify pointers on the stack that land within my kext. For example, the current PC at the moment of the crash lands within the kext (expected, because it was intentional):

    (lldb) image lookup -a 0xfffffe001203fe10

Which gives the following incorrect result:

    Address: KextName[0x0000000000003e40] (KextName.__TEXT.__cstring + 14456)
    Summary: "ffer has %d retains\n"

That's not even a program instruction; that's within a cstring. No, that cstring isn't involved in anything pertaining to the intentional panic I am expecting to see.

Can someone please explain what I'm doing wrong and provide instructions that will give symbol information from a panic trace on an Apple Silicon Mac?

Disclaimers:

- Yes, I know IOPCIFamily is deprecated. I am in the process of transitioning from an IOKit kext to a DriverKit dext. Until then I must maintain the kext.
- The Terminal command "atos" provides similar incorrect results, and seems not to work with debug-built binaries (only dSYM files).
- Yes, this is an intentional panic, so that I can verify the symbolication process before I move on to investigating an unexpected panic.
- I have set nvram boot-args to include keepsyms=1.
- I have tried (lldb) command script import lldb.macosx but get a result of error: no images in crash log (after the nvram settings).
5
0
1.7k
Apr ’25
How to prevent holes from being created by cluster_write() in files
A filesystem of my own making exhibits the following undesirable behaviour.

ClientA

    % echo line1 >>echo.txt
    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n
              6c  69  6e  65  31  0a
    0000006

ClientB

    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n
              6c  69  6e  65  31  0a
    0000006
    % echo line2 >>echo.txt
    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n   l   i   n   e   2  \n
              6c  69  6e  65  31  0a  6c  69  6e  65  32  0a
    000000c

ClientA

    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n   l   i   n   e   2  \n
              6c  69  6e  65  31  0a  6c  69  6e  65  32  0a
    000000c
    % echo line3 >>echo.txt

ClientB

    % echo line4 >>echo.txt

ClientA

    % echo line5 >>echo.txt

ClientB

    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n   l   i   n   e   2  \n   l   i   n   e
              6c  69  6e  65  31  0a  6c  69  6e  65  32  0a  6c  69  6e  65
    0000010    3  \n   l   i   n   e   4  \n  \0  \0  \0  \0  \0  \0
              33  0a  6c  69  6e  65  34  0a  00  00  00  00  00  00
    000001e

ClientA

    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n   l   i   n   e   2  \n   l   i   n   e
              6c  69  6e  65  31  0a  6c  69  6e  65  32  0a  6c  69  6e  65
    0000010    3  \n  \0  \0  \0  \0  \0  \0   l   i   n   e   5  \n
              33  0a  00  00  00  00  00  00  6c  69  6e  65  35  0a
    000001e

ClientB

    % od -Ax -ctx1 echo.txt
    0000000    l   i   n   e   1  \n   l   i   n   e   2  \n   l   i   n   e
              6c  69  6e  65  31  0a  6c  69  6e  65  32  0a  6c  69  6e  65
    0000010    3  \n  \0  \0  \0  \0  \0  \0   l   i   n   e   5  \n
              33  0a  00  00  00  00  00  00  6c  69  6e  65  35  0a
    000001e

The first write on clientA is done via the following call chain:

    vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()

The first write on clientB first does a read, which is expected:

    vnop_write()->cluster_write()->vnop_blockmap()->vnop_strategy()->myfs_read()

Followed by a write:

    vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()

The final write on clientA calls cluster_write(), which doesn't do that initial read before doing a write. I believe it is this write that introduces the hole.

What I don't understand is why this happens and how this may be prevented. Any pointers on how to combat this would be much appreciated.
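
One way to reason about the final dump is to model it in user space. The sketch below is only a model built on an assumption, namely that clientA learns the new end of file (0x18) from attributes but still holds a cached page that is valid only up to its own last write (0x12). It is not XNU's cluster layer, and `server_file`, `client_cache`, `append_stale`, and `append_fresh` are hypothetical names. `append_stale` zero-fills the unknown range and writes it back, which reproduces the clobbered `line4`; `append_fresh` fetches the missing range first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE 64

typedef struct {
    unsigned char data[PAGE];
    size_t size;                 /* authoritative end of file */
} server_file;

typedef struct {
    unsigned char page[PAGE];
    size_t valid;                /* bytes of `page` known to match the server */
} client_cache;

/* Write the cached range [from, to) back to the server and update sizes. */
static void write_back(server_file *srv, client_cache *c, size_t from, size_t to) {
    memcpy(srv->data + from, c->page + from, to - from);
    srv->size = to;
    c->valid = to;
}

/* Append using a stale page: the gap between the cached data and the
   server-reported EOF is zero-filled locally instead of being read. */
static void append_stale(server_file *srv, client_cache *c, const char *s) {
    size_t eof = srv->size, len = strlen(s), from = c->valid;
    assert(eof + len <= PAGE && from <= eof);
    memset(c->page + from, 0, eof - from);
    memcpy(c->page + eof, s, len);
    write_back(srv, c, from, eof + len);
}

/* Same append, but the gap is first fetched from the server. */
static void append_fresh(server_file *srv, client_cache *c, const char *s) {
    size_t eof = srv->size, len = strlen(s), from = c->valid;
    assert(eof + len <= PAGE && from <= eof);
    memcpy(c->page + from, srv->data + from, eof - from);
    memcpy(c->page + eof, s, len);
    write_back(srv, c, from, eof + len);
}
```

If this model matches what the filesystem is doing, the usual remedies are to invalidate or refresh cached pages when the server-side size or modification time changes, or to restrict the write-back to the bytes the client actually dirtied.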
2
0
162
Apr ’25
Is it possible to allocate virtual memory space between 0~4GB?
I tried to use the following code to get a virtual address within the 4GB memory space:

    int size = 4 * 1024;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT;
    void* addr = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);

I also tried MAP_FIXED, passing an address as the first argument of mmap. However, neither approach gives me what I want. Is there a way to get a virtual memory address within 4GB on arm64 on macOS?
1
0
92
Apr ’25
How to disable the built-in speakers and microphone on a Mac
I need to implement a solution through an API or custom driver to completely block out the built-in speakers and microphone of Mac, because I need other apps to use specified external devices as audio input and output. Is there a way to achieve this requirement? What I mean is that even in system preferences, it should not be possible to choose the built-in microphone and speakers; only my external device can be used.
0
0
83
Apr ’25
Compatibility between macOS VFS ACLs and Linux VFS ACLs
Implementing ACL support in a distributed filesystem, with macOS and Linux clients talking to a remote file server, requires compatibility between the ACL models supported in Darwin-XNU and Linux kernels to be taken into consideration. My filesystem does support EAs to facilitate ACL storage and retrieval. So setting ACLs via chmod(1) and retrieving them via ls(1) does work. However, the macOS and Linux ACL models are incompatible and would require some sort of conversion between them. chmod(1) uses acl(3) to create ACL entries. While acl(3) claims to implement POSIX.1e ACL security API, which, to the best of my knowledge, Linux VFS implements as well, their respective implementations of the standard obviously do differ. Which is also stated in acl(3): This implementation of the POSIX.1e library differs from the standard in a number of non-portable ways in order to support the MacOS/Darwin ACL semantic. Then there's this NFSv4 to POSIX ACL mapping draft that describes the conversion algorithm. What's the recommended way to bridge the compatibility gap there, so that macOS ACL rules are honoured in Linux and vice versa? Thanks.
2
0
171
Apr ’25
Porting VFS kext to FSKit
So if one were to start porting an existing kext VFS filesystem to the new FSKit (since presumably kexts could go away), how would that look now? Is it ready? Are there any samples out there that already work (a filesystem using FSKit)? How is the documentation? ChatGPT did not seem to know much at all. What would be Apple's reception to that? How flexible is FSKit? Is it locked to the idea that a mount is connected to a physical device (or partition)? Or is it more virtual, in that I will have a pool of disks, and present 1, or many, mount points?
3
1
2.3k
Mar ’25
Inconsistent KEXT Status Between System Information and kextstat
Hello Everyone,

I have noticed an inconsistency in the KEXT status between the System Information Extensions section and the output of the kextstat command. In System Information, the extension appears as loaded:

    ACS6x:
      Version: 3.8.3
      Last Modified: 2025/3/10, 8:03 PM
      Bundle ID: com.Accusys.driver.Acxxx
      Loaded: Yes
      Get Info String: ACS6x 3.8.4 Copyright (c) 2004-2020 Accusys, Ltd.
      Architectures: arm64e
      64-Bit (Intel): No
      Location: /Library/Extensions/ACS6x.kext/
      Kext Version: 3.8.3
      Load Address: 0
      Loadable: Yes
      Dependencies: Satisfied
      Signed by: Developer ID Application: Accusys, Inc (K3TDMD9Y6B)
      Issuer: Developer ID Certification Authority
      Signing time: 2025-03-10 12:03:20 +0000
      Identifier: com.Accusys.driver.Acxxx
      TeamID: K3TDMD9Y6B

However, when I check using kextstat, it does not appear as loaded:

    $ kextstat | grep ACS6x
    Executing: /usr/bin/kmutil showloaded
    No variant specified, falling back to release

I use a script to do these jobs:

    echo " Change to build/Release"
    echo " CodeSign ACS6x.kext"
    echo " Compress to zip file"
    echo " Notary & Staple"
    echo " Unload the old Acxxx Driver"
    echo " Copy ACS6x.kext driver to /Library/Extensions/"
    echo " Change ACS6x.kext driver owner"
    echo " Loaded ACS6x.kext driver"
    sudo kextload ACS6x.kext
    echo " Rebuild system cache"
    sudo kextcache -system-prelinked-kernel
    sudo kextcache -system-caches
    sudo kextcache -i /
    echo " Reboot"
    sudo reboot

But it seems that the KEXT is not always loaded successfully. What did I forget to do? Any help would be greatly appreciated.

Best regards, Charles
2
0
274
Mar ’25
Kernel panic in mac_label_verify()
Accessing a directory on my custom distributed filesystem results in a kernel panic. According to the backtrace, the last function called before the panic is triggered is mac_label_verify(). See the backtrace file attached. mac_label_verify-panic.txt The panic manifests itself given the following conditions: Machine-a: make a directory in Finder. Machine-b: remove the directory created on machine-a in Finder. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues. The panic is reproducible on both Apple Silicon and x86-64. The backtrace is for x86-64 as I wasn't able to symbolicate it on Apple Silicon. Not sure how to tackle this one. Any pointers would be much appreciated.
15
0
1.3k
Mar ’25
Any recent changes to dlopen() implementation?
In some recent releases of macOS (14.x and 15.x), we have noticed what seems to be a slower dlopen() implementation. I don't have any numbers to support this theory. I happened to notice this "slowness" when investigating something unrelated. In one part of the code we have a call of the form:

    const char * fooBarLib = ....;
    dlopen(fooBarLib, RTLD_NOW + RTLD_GLOBAL);

It so happened that due to some timing related issues, the process was crashing. A slow execution of code in this part of the code would trigger an issue in some other part of the code that would then lead to a process crash. The crash itself isn't a concern, because it's an internal issue that will be addressed in the application code. What was interesting is that the slowness appears to be contributed by the call to dlopen(). Specifically, whenever a slowness was observed, the crash reports showed stack frames of the form:

    Thread 1:
    0   dyld                     0x18f08b5b4 _kernelrpc_mach_vm_protect_trap + 8
    1   dyld                     0x18f08f540 vm_protect + 52
    2   dyld                     0x18f0b87e0 lsl::MemoryManager::writeProtect(bool) + 204
    3   dyld                     0x18f0a7fe4 invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 932
    4   dyld                     0x18f0e629c invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 172
    5   dyld                     0x18f0d9c38 invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 496
    6   dyld                     0x18f08c2dc dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 300
    7   dyld                     0x18f0d8bcc dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 192
    8   dyld                     0x18f0db5a0 dyld3::MachOFile::forEachInitializerPointerSection(Diagnostics&, void (unsigned int, unsigned int, bool&) block_pointer) const + 160
    9   dyld                     0x18f0e5f90 dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 432
    10  dyld                     0x18f0a7bb4 dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 176
    11  dyld                     0x18f0af190 dyld4::JustInTimeLoader::runInitializers(dyld4::RuntimeState&) const + 36
    12  dyld                     0x18f0a8270 dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&, dyld3::Array<dyld4::Loader const*>&) const + 312
    13  dyld                     0x18f0ac560 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_0::operator()() const + 180
    14  dyld                     0x18f0a8460 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 412
    15  dyld                     0x18f0c089c dyld4::APIs::dlopen_from(char const*, int, void*) + 2432
    16  libjli.dylib             0x1025515b4 DoFooBar + 56
    17  libjli.dylib             0x10254d2c0 Hello_World_Launch + 1160
    18  helloworld               0x10250bbb4 main + 404
    19  libjli.dylib             0x102552148 apple_main + 88
    20  libsystem_pthread.dylib  0x18f4132e4 _pthread_start + 136
    21  libsystem_pthread.dylib  0x18f40e0fc thread_start + 8

So, out of curiosity, have there been any known changes in the implementation of dlopen() which might explain the slowness? Like I noted, I don't have concrete numbers, but to quantify the slowness: I don't think it's slower by a noticeable amount, maybe a few milliseconds. I guess what I am trying to understand is whether there's anything that needs attention here.
3
0
456
Mar ’25
Kernel panic related to Watchdog in custom virtual file system
Hi. I am facing a panic in a distributed virtual filesystem of my own making. The panic arises when copying a large folder or writing a large file (both around 20 GB). An important note here is that the amount of data we try to copy is larger than the available space (for testing purposes, the virtual file system had a capacity of 18 gigabytes). The panic arises somewhere around 12-14 gigabytes into the copy. At the moment of the panic, there are still several gigabytes of storage left.

The problem is definitely present on the following architectures and macOS versions:

- Sonoma 14.7.1 arm64e
- Monterey 12.7.5 arm64e
- Ventura 13.7.1 intel

Part of the panic log from Ventura 13.7.1 intel, with symbolicated addresses:

    panic(cpu 2 caller 0xffffff80191a191a): watchdog timeout: no checkins from watchdogd in 90 seconds (48 total checkins since monitoring last enabled)
    Panicked task 0xffffff907c99f698: 191 threads: pid 0: kernel_task
    Backtrace (CPU 2), panicked thread: 0xffffff86e359cb30, Frame : Return Address
    0xffffffff001d7bb0 : 0xffffff8015e70c7d mach_kernel : _handle_debugger_trap + 0x4ad
    0xffffffff001d7c00 : 0xffffff8015fc52e4 mach_kernel : _kdp_i386_trap + 0x114
    0xffffffff001d7c40 : 0xffffff8015fb4df7 mach_kernel : _kernel_trap + 0x3b7
    0xffffffff001d7c90 : 0xffffff8015e11971 mach_kernel : _return_from_trap + 0xc1
    0xffffffff001d7cb0 : 0xffffff8015e70f5d mach_kernel : _DebuggerTrapWithState + 0x5d
    0xffffffff001d7da0 : 0xffffff8015e70607 mach_kernel : _panic_trap_to_debugger + 0x1a7
    0xffffffff001d7e00 : 0xffffff80165db9a3 mach_kernel : _panic_with_options + 0x89
    0xffffffff001d7ef0 : 0xffffff80191a191a com.apple.driver.watchdog : IOWatchdog::userspacePanic(OSObject*, void*, IOExternalMethodArguments*) (.cold.1)
    0xffffffff001d7f20 : 0xffffff80191a10a1 com.apple.driver.watchdog : IOWatchdog::checkWatchdog() + 0xd7
    0xffffffff001d7f50 : 0xffffff80174f960b com.apple.driver.AppleSMC : SMCWatchDogTimer::watchdogThread() + 0xbb
    0xffffffff001d7fa0 : 0xffffff8015e1119e mach_kernel : _call_continuation + 0x2e
    Kernel Extensions in backtrace:
    com.apple.driver.watchdog(1.0)[BD08CE2D-77F5-358C-8F0D-A570540A0BE7]@0xffffff801919f000->0xffffff80191a1fff
    com.apple.driver.AppleSMC(3.1.9)[DD55DA6A-679A-3797-947C-0B50B7B5B659]@0xffffff80174e7000->0xffffff8017503fff
      dependency: com.apple.driver.watchdog(1)[BD08CE2D-77F5-358C-8F0D-A570540A0BE7]@0xffffff801919f000->0xffffff80191a1fff
      dependency: com.apple.iokit.IOACPIFamily(1.4)[D342E754-A422-3F44-BFFB-DEE93F6723BC]@0xffffff8018446000->0xffffff8018447fff
      dependency: com.apple.iokit.IOPCIFamily(2.9)[481BF782-1F4B-3F54-A34A-CF12A822C40D]@0xffffff80188b6000->0xffffff80188e7fff
    Process name corresponding to current thread (0xffffff86e359cb30): kernel_task
    Boot args: keepsyms=1
    Mac OS version: 22H221
    Kernel version: Darwin Kernel Version 22.6.0: Thu Sep 5 20:48:48 PDT 2024; root:xnu-8796.141.3.708.1~1/RELEASE_X86_64

The origin of the problem is surely inside my filesystem. However, the panic happens not there but somewhere in the watchdog. As far as I can tell, the source code for the watchdog is not publicly available, so I can't work out what causes the panic.

Let's say we have run out of space and couldn't write data. The write receives a proper error and is aborted. That's what is expected. However, it is unclear why the panic arises.
4
0
502
Feb ’25
Subdirectory navigation fails for several GUI apps on custom VFS.
Hi. I am developing a custom virtual file system and am seeing the following behaviour: in some graphical apps, for example Adobe Media Encoder, attempting to navigate inside my filesystem deeper than the root folder fails. Nothing happens on a double click on a subfolder. Another problem is that when I try to navigate back to the root directory, it is empty. The problem is not present for most GUI apps. For example, navigation in Finder, choosing a download path for a file in Safari, and apps like Microsoft Word, Excel, and a range of other applications all work correctly. A quick note here: from what I have seen, all apps that work correctly call VFS_VGET, a predefined VFS layer hook, whereas Adobe Media Encoder does not call it, neither in my filesystem nor in Samba. My guess is that some applications use a different browsing and retrieval algorithm. Is there anything I should examine further? The default routines (vnop_open, vnop_lookup, vnop_readdir, vnop_close) behave as expected, without any errors. P.S. This application (Adobe Media Encoder) works properly on Samba.
3
0
386
Feb ’25
Missing Developer Kit for build 22H417
I cannot find this specific KDK for my build 22H417. I need help locating and downloading this Developer Kit. Error Domain=KMErrorDomain Code=34 "Missing Developer Kit: As of macOS 13.0, you will need to install a KDK matching your build 22H417 to rebuild kernel collections." UserInfo={NSLocalizedDescription=Missing Developer Kit: As of macOS 13.0, you will need to install a KDK matching your build 22H417 to rebuild kernel collections.} I
0
0
329
Feb ’25