Develop kernel-resident device drivers and kernel extensions using Kernel.

Posts under Kernel tag

40 Posts

Post

Replies

Boosts

Views

Activity

Hardlinks reported as non-existing on macOS Sequoia for 3rd party FS
After creating a hardlink on a distributed filesystem of my own via: % ln f.txt hlf.txt Neither the original file, f.txt, nor the hardlink, hlf.txt, are immediately accessible, e.g. via cat(1) with ENOENT returned. A short time later though, both the original file and the hardlink are accessible. Both files can be stat(1)ed though, which confirms that vnop_getattr returns success for both files. Dtruss(1) indicates it's the open(2) syscall that fails: % sudo dtruss -f cat hlf.txt 2038/0x4f68: open("hlf.txt\0", 0x0, 0x0) = -1 Err#2 ;ENOENT 2038/0x4f68: write_nocancel(0x2, "cat: \0", 0x5) = 5 0 2038/0x4f68: write_nocancel(0x2, "hlf.txt\0", 0x7) = 7 0 2038/0x4f68: write_nocancel(0x2, ": \0", 0x2) = 2 0 2038/0x4f68: write_nocancel(0x2, "No such file or directory\n\0", 0x1A) = 26 0 Dtrace(1)ing my KEXT no longer works on macOS Sequoia, so based on the diagnostics print statements I inserted into my KEXT, the following sequence of calls is observed: vnop_lookup(hlf.txt) -> EJUSTRETURN ;ln(1) vnop_link(hlf.txt) -> KERN_SUCCESS ;ln(1) vnop_lookup(hlf.txt) -> KERN_SUCCESS ;cat(1) vnop_open(/) ; I expected to see vnop_open(hlf.txt) here instead of the parent directory. Internally, hardlinks are created in vnop_link via a call to vnode_setmultipath with cache_purge_negatives called on the destination directory. On macOS Monterey for example, where the same code does result in hardlinks being accessible, the following calls are made: vnop_lookup(hlf.txt) -> EJUSTRETURN ;ln(1) vnop_link(hlf.txt) -> KERN_SUCCESS ;ln(1) vnop_lookup(hlf.txt) -> KERN_SUCCESS ;cat(1) vnop_open(hlf.txt) -> KERN_SUCCESS ;cat(1) Not sure how else to debug this. Perusing the kernel sources for uses of VISHARDLINK, VNOP_LINK and vnode_setmultipath call sites did not clear things up for me. Any pointers would be greatly appreciated.
3
0
295
Jul ’25
Support for Multi-Homed IPv6 Networks, esp. RFC 8028
Hi everyone, I’m running a dual-homed IPv6-mostly LAN where two on-link routers advertise distinct global Provider-Assigned prefixes (one per ISP). On Linux, the host stack appears to follow RFC 8028. It keeps one default route per prefix, and packets appear to leave through a router that recognises their source address and pass ISP BCP 38 (https://datatracker.ietf.org/doc/bcp38/) checks. On macOS Sequoia, I'm only seeing a single un-scoped default route. As a result, traffic sourced from prefix B often exits via router A and is dropped upstream. Questions: Is the single-default-per-interface model in macOS an intentional design choice or simply legacy behaviour that has not yet been updated to RFC 8028? Does the kernel perform any hidden next-hop selection that isn’t reflected in netstat -rn output? Are there any road-map items for fully adopting RFC 8028 in macOS? As a bonus, I'd be very interested in any info you might be able to provide on the status of implementation/support for https://datatracker.ietf.org/doc/html/rfc8978 (Reaction of IPv6 Stateless Address Autoconfiguration (SLAAC) to Flash-Renumbering Events).
2
0
160
Jul ’25
VNOP_MONITOR+vnode_notify() operation details
After perusing the sources of Apple's SMB and NFS clients' implementation of VNOP_MONITOR, my understanding of how VNOP_MONITOR+vnode_notify() operate is as follows: A user-space process advertises an interest in monitoring a file or directory via kqueue(2)/kevent(2). VFS calls the filesystem's implementation of VNOP_MONITOR. VNOP_MONITOR forwards the commencing or terminating of monitoring events request to the filesystem server. Network filesystem client nodes call vnode_notify() to notify the underlying VFS of a filesystem event, e.g. file/directory creation/removal, etc. What I'm still vague about is how does the server communicate back to client nodes that an event of interest has occurred? I'd appreciate being enlightened on the operation of `VNOP_MONITOR+vnode_notify()' in a network filesystem setting.
1
0
174
Aug ’25
Pinpointing dandling pointers in 3rd party KEXTs
I'm debugging the following kernel panic to do with my custom filesystem KEXT: panic(cpu 0 caller 0xfffffe004cae3e24): [kalloc.type.var4.128]: element modified after free (off:96, val:0x00000000ffffffff, sz:128, ptr:0xfffffe2e7c639600) My reading of this is that somewhere in my KEXT I'm holding a reference 0xfffffe2e7c639600 to a 128 byte zone that wrote 0x00000000ffffffff at offset 96 after that particular chunk of memory had been released and zeroed out by the kernel. The panic itself is emitted when my KEXT requests the memory chunk that's been tempered with via the following set of calls. zalloc_uaf_panic() __abortlike static void zalloc_uaf_panic(zone_t z, uintptr_t elem, size_t size) { ... (panic)("[%s%s]: element modified after free " "(off:%d, val:0x%016lx, sz:%d, ptr:%p)%s", zone_heap_name(z), zone_name(z), first_offs, first_bits, esize, (void *)elem, buf); ... } zalloc_validate_element() static void zalloc_validate_element( zone_t zone, vm_offset_t elem, vm_size_t size, zalloc_flags_t flags) { ... if (memcmp_zero_ptr_aligned((void *)elem, size)) { zalloc_uaf_panic(zone, elem, size); } ... } The panic is triggered if memcmp_zero_ptr_aligned(), which is implemented in assembly, detects that an n-sized chunk of memory has been written after being free'd. /* memcmp_zero_ptr_aligned() checks string s of n bytes contains all zeros. * Address and size of the string s must be pointer-aligned. * Return 0 if true, 1 otherwise. Also return 0 if n is 0. */ extern int memcmp_zero_ptr_aligned(const void *s, size_t n); Normally, KASAN would be resorted to to aid with that. The KDK README states that KASAN kernels won't load on Apple Silicon. Attempting to follow the instructions given in the README for Intel-based machines does result in a failure for me on Apple Silicon. I stumbled on the Pishi project. But the custom boot kernel collection that gets created doesn't have any of the KEXTs that were specified to kmutil(8) via the --explicit-only flag, so it can't be instrumented in Ghidra. Which is confirmed as well by running: % kmutil inspect -B boot.kc.kasan boot kernel collection at /Users/user/boot.kc.kasan (AEB8F757-E770-8195-458D-B87CADCAB062): Extension Information: I'd appreciate any pointers on how to tackle UAFs in kernel space.
5
0
421
Sep ’25
Blocking USB Devices on macOS – DriverKit or Other Recommended Approach
Hi Apple, We are working on a general USB device management solution on macOS for enterprise security. Our goal is to enforce policy-based restrictions on USB devices, such as: For USB storage devices: block mount, read, or write access. For other peripherals (e.g., USB headsets or microphones, raspberry pi, etc): block usage entirely. We know in past, kernel extension would be the way to go, but as kext has been deprecated. And DriverKit is the new advertised framework. At first, DriverKit looked like the right direction. However, after reviewing the documentation more closely, we noticed that using DriverKit for USB requires specific entitlements: DriverKit USB Transport – VendorID DriverKit USB Transport – VendorID and ProductID This raises a challenge: if our solution is meant to cover all types of USB devices, we would theoretically need entitlements for every VendorID/ProductID in existence. My questions are: Is DriverKit actually the right framework for this kind of general-purpose USB device control? If not, what framework or mechanism should we be looking at for enforcing these kinds of policies? We also developed an Endpoint Security product, but so far we haven’t found a relevant Endpoint Security event type that would allow us to achieve this. Any guidance on the correct technical approach would be much appreciated. Thanks in advance for your help.
6
0
314
Sep ’25
macos 15.6.1 - BSD sendto() fails for IPv4-mapped IPv6 addresses
There appears to be some unexplained change in behaviour in the recent version of macos 15.6.1 which is causing the BSD socket sendto() syscall to no longer send the data when the source socket is bound to a IPv4-mapped IPv6 address. I have attached a trivial native code which reproduces the issue. What this reproducer does is explained as a comment on that code's main() function: // Creates a AF_INET6 datagram socket, marks it as dual socket (i.e. IPV6_V6ONLY = 0), // then binds the socket to a IPv4-mapped IPv6 address (chosen on the host where this test runs). // // The test then uses sendto() to send some bytes. For the sake of this test, it uses the same IPv4-mapped // IPv6 address as the destination address to sendto(). The test then waits for (a maximum of) 15 seconds to // receive that sent message by calling recvfrom(). // // The test passes on macos (x64 and aarch64) hosts of versions 12.x, 13.x, 14.x and 15.x upto 15.5. // Only on macos 15.6.1 and the recent macos 26, the test fails. Specifically, the first message that is // sent using sendto() is never sent (and thus the recvfrom()) times out. sendto() however returns 0, // incorrectly indicating a successful send. Interesting, if you repeat sendto() a second message from the // same bound socket to the exact same destination address, the send message is indeed correctly sent and // received immediately by the recvfrom(). It's only the first message which goes missing (the test uses // unique content in each message to be sure which exact message was received and it has been observed that // only the second message is received and the first one lost). // // Logs collected using "sudo log collect --last 2m" (after the test program returns) shows the following log // message, which seem relevant: // ... // default kernel cfil_hash_entry_log:6088 <CFIL: Error: sosend_reinject() failed>: // [86868 a.out] <UDP(17) out so 59faaa5dbbcef55d 127846646561221313 127846646561221313 age 0> // lport 65051 fport 65051 laddr 192.168.1.2 faddr 192.168.1.2 hash 201AAC1 // default kernel cfil_service_inject_queue:4472 CFIL: sosend() failed 22 // ... // As noted, this test passes without issues on various macosx version (12 through 15.5), both x64 and aarch64 but always fails against 15.6.1. I have been told that it also fails on the recently released macos 26 but I don't have access to such host to verify it myself. The release notes don't usually contain this level of detail, so it's hard to tell if something changed intentionally or if this is a bug. Should I report this through the feedback assistant? Attached is the source of the reproducer, run it as: clang dgramsend.c ./a.out On macos 15.6.1, you will see that it will fail to send (and thus receive) the message on first attempt but the second one passes: ... created and bound a datagram dual socket to ::ffff:192.168.1.2:65055 ::ffff:192.168.1.2:65055 sendto() ::ffff:192.168.1.2:65055 ---- Attempt 1 ---- sending greeting "hello 1" sendto() succeeded, sent 8 bytes calling recvfrom() receive timed out --------------------- ---- Attempt 2 ---- sending greeting "hello 2" sendto() succeeded, sent 8 bytes calling recvfrom() received 8 bytes: "hello 2" --------------------- TEST FAILED ... The output "log collect --last 2m" contains a related error (and this log message consistently shows up every time you run that reproducer): ... default kernel cfil_hash_entry_log:6088 <CFIL: Error: sosend_reinject() failed>: [86248 a.out] <UDP(17) out so 59faaa5dbbcef55d 127846646561221313 127846646561221313 age 0> lport 65055 fport 65055 laddr 192.168.1.2 faddr 192.168.1.2 hash 201AAC1 default kernel cfil_service_inject_queue:4472 CFIL: sosend() failed 22 ... I don't know what it means though. dgramsend.c
2
0
325
Sep ’25
macOS 26 kernel open source?
Hi! I was wondering if there will be new XNU version for macOS 26 published open source? As far as I remember, previous version's source code was published the moment the OS was officially released, but not this time. If yes, when we can expect it to be published?
1
0
290
Sep ’25
How to connect to a IOUSBHostInterface
I have poked around the web looking for a good example to do this and I haven't found a working example. I need to connect to a USB Device, its multiple ports and supports what looks to be a root port and 4 other ports I am no expert in USB but I do know how to write a kext and client drivers, but thats really not the way to solve this. I need to display the serialized output from these USB ports for a development board. I would rather do this on my Mac than have to cobble up a Linux machine and mess around with Linux. Here is the output from ioreg MCHP-Debug@03100000 <class IOUSBHostDevice, id 0x105f6fdc2, registered, matched, active, busy 0 (20 ms), retain 27> MCHP-Debug@0 <class IOUSBHostInterface, id 0x105f6fdc8, registered, matched, active, busy 0 (13 ms), retain 5> +-o MCHP-Debug@0 <class IOUSBHostInterface, id 0x105f6fdc8, registered, matched, active, busy 0 (13 ms), retain 5> +-o MCHP-Debug@1 <class IOUSBHostInterface, id 0x105f6fdc9, registered, matched, active, busy 0 (11 ms), retain 5> +-o MCHP-Debug@2 <class IOUSBHostInterface, id 0x105f6fdcb, registered, matched, active, busy 0 (9 ms), retain 5> | | | | | +-o MCHP-Debug@3 <class IOUSBHostInterface, id 0x105f6fdcc, registered, matched, active, busy 0 (7 ms), retain 5> I have been able to open a inservice to the device at the top level, but I get an error when I use. usbHostInterface = [[IOUSBHostInterface alloc] initWithIOService:usbDevice options: IOUSBHostObjectInitOptionsNone queue: queue error: &error interestHandler: handler]; Error:Failed to create IOUSBHostInterface. with reason: Unable to obtain configuration descriptor. Assertion failed: (usbHostInterface), function main, file main.m, line 87. I started using DeviceKit but I received signing errors and I shouldn't have to go down that path just to dump data from a USB port? Any suggestions would be great, most of the Apple documentation on USB ports is like 20 years old and the new stuff pushes you towards DeviceKit.
1
0
540
Dec ’25
Entitlement for extension to have read-only access to host's task?
Hi all, I'm building an iOS app extension using ExtensionKit that works exclusively with its containing host app, presenting UI via EXHostViewController. I'd like the extension to have read-only access to the host's task for process introspection purposes. I'm aware this would almost certainly require a special entitlement. I know get-task-allow and the debugger entitlement exist, but those aren't shippable to the App Store. I'm looking for something that could realistically be distributed to end users. My questions: Does an entitlement exist (or is one planned) that would grant an extension limited, read-only access to its host's task—given the extension is already tightly coupled to the host? If not, is this something Apple would consider adding? The use case is an extension that needs to inspect host process state without the ability to modify it. Is there a path to request such an entitlement through the provisioning profile process, or is this fundamentally off the table for App Store distribution? It seems like a reasonable trust boundary given the extension already lives inside the host's app bundle, but I understand the security implications. Any insight appreciated. Thanks!
8
0
388
Jan ’26
macos 26 - socket() syscall causes ENOBUFS "No buffer space available" error
As part of the OpenJDK testing we run several regression tests, including for Java SE networking APIs. These APIs ultimately end up calling BSD socket functions. On macos, starting macos 26, including on recent 26.2 version, we have started seeing some unexplained but consistent exception from one of these BSD socket APIs. We receive a "ENOBUFS" errno (No buffer space available) when trying to construct a socket(). These exact same tests continue to pass on many other older versions of macos (including 15.7.x). After looking into this more, we have been able to narrow this down to a very trivial C code which is as follows (also attached): #include <stdio.h> #include <sys/socket.h> #include <string.h> #include <unistd.h> #include <sys/errno.h> static int create_socket(const int attempt_number) { const int fd = socket(AF_INET6, SOCK_STREAM, 0); if (fd < 0) { fprintf(stderr, "socket creation failed on attempt %d," " due to: %s\n", attempt_number, strerror(errno)); return fd; } return fd; } int main() { const unsigned int num_times = 250000; for (unsigned int i = 1; i <= num_times; i++) { const int fd = create_socket(i); if (fd < 0) { return -1; } close(fd); } fprintf(stderr, "successfully created and closed %d sockets\n", num_times); } The code very trivially creates a socket() and close()s it. It does this repeatedly in a loop for a certain number of iterations. Compiling this as: clang sockbufspaceerr.c -o sockbufspaceerr.o and running it as: ./sockbufspaceerr.o consistently generates an error as follows on macos 26.x: socket creation failed on attempt 160995, due to: No buffer space available The iteration number on which the socket() creation fails varies, but the issue does reproduce. Running the same on older versions of macos doesn't reproduce the issue and the program terminates normally after those many iterations. Looking at the xnu source that is made available for each macos release here https://opensource.apple.com/releases/, I see that for macos 26.x there have been changes in this kernel code and there appears to be some kind of memory accountability code introduced in this code path. However, looking at the reproducer/application code in question, I believe it uses the right set of functions to both create as well as release the resources, so I can't see why this should cause the above error in macos 26.x. Does this look like some issue that needs attention in the macos kernel and should I report it through feedback assitant tool?
4
0
356
Jan ’26
FSKit questions and clarifications
I work on EdenFS, an open-source Virtual Filesystem that runs on macOS, Linux, and Windows. My team is very interested in using FSKit as the basis for EdenFS on macOS, but have found the documentation to be lacking and contains some mixed messaging on the future of FSKit. Below are a few questions that don’t seem to be fully covered by the current documentation: Does FSKit support process attribution? Each FUSE request provides a requester Process ID (and other information) through the fuse_in_header structure. Does FSKit pass similar information along for each request? Does the reclaimItem API function similarly to FUSE’s forget operation? If not, what are the differences? See #1 below for why forget/reclaimItem matters to us. Is Apple committed to releasing and supporting FSKit? Is there any timeline for release that we can plan around? Does FSKit have known performance/scalability limitations? We provide alternative methods that clients can use to make bulk requests to EdenFS, but some clients will necessarily be unable to use those and stress the default filesystem APIs. Throughput (on the order of tens of thousands of filesystem requests per minute) and request size are the main concerns, followed closely by directory size restrictions. Why we’re interested in FSKit As mentioned above, my team supports EdenFS on 3 platforms. On Linux, we utilize FUSE; on Windows, we utilize ProjectedFS; and on macOS, we’ve utilized a few different solutions in the past. We first utilized the macFUSE kext, which was great while it lasted. Due to (understandable) changes in supporting kernel extensions, we were forced to move to NFS version 3. NFS has been lackluster in comparison (and our initial investigations show that NFS version 4(.2) would be similar). We have had numerous scalability and reliability issues, some listed below: NFS does not provide a forget API similar to FUSE. EdenFS is forced to remember all file handles that have been loaded because the kernel never informs us when all references to that file handle have been dropped. We can hackily infer that a file handle should never be referenced again in some cases, but a large number of file handles end up being remembered forever. Many of our algorithms scale with the number of file handles that Eden has to consider, and therefore performance issues are inevitable after some time. NFS does not provide information about clients (requesters). We cannot tell which processes are sending EdenFS requests. This attribution is important due to issue #1. We are forced to work with tool owners to modify their applications to be VFS-friendly. If we can’t track down which tools are behaving poorly, they will continue to load excess file handles and cause performance issues. NFS “Server connections interrupted:” dialog during heavy load. Under heavy load, either EdenFS or system-wide, our users experience this dialog pop-up and are confused as to how they should respond (Ignore or Disconnect All). They become blocked in their work, and will be further blocked if they click “Disconnect All” as that unmounts their EdenFS mount. This forces them to restart EdenFS or reboot their laptop to remediate the issue. The above issues make us extremely motivated to use FSKit and partner with Apple to flesh out the final version of the FSKit API. Our use case likely mirrors what other user-space filesystems will be looking for in the FSKit API (albeit at a larger scale than most), and we’re willing to collaborate to work out any issues in the current FSKit offerings.
4
1
1.8k
Jun ’25
FSKit caching by kernel and performance
I've faced with some performance issues developing my readonly filesystem using fskit. For below screenshot: enumerateDirectory returns two hardcoded items, compiled with release config 3000 readdirsync are done from nodejs. macos 15.5 (24F74) I see that getdirentries syscall takes avg 121us. Because all other variables are minimised, it seems like it's fskit<->kernel overhead. This itself seems like a big number. I need to compare it with fuse though to be sure. But what fuse has and fskit seams don't (I checked every page in fskit docs) is kernel caching. Fuse supports: caching lookups (entry_timeout) negative lookups (entry_timeout) attributes (attr_timeout) readdir (via opendir cache_readdir and keep_cache) read and write ops but thats another topic. And afaik it works for both readonly and read-write file systems, because kernel can assume (if client is providing this) that cache is valid until kernel do write operations on corresponding inodes (create, setattr, write, etc). Questions are: is 100+us reasonable overhead for fskit? is there any way to do caching by kernel. If not currently, any plans to implement? Also, additional performance optimisation could be done by providing lower level api when we can operate with raw inodes (Uint64), this will eliminate overhead from storing, removing and retrieving FSItems in hashmap.
2
1
274
Jul ’25
Instructions for debugging recent macos kernel versions?
Is there any recent and a bit authoritative documentation which explains how to debug recent versions of macos kernel? I have found some blog posts from other users but those are either outdated or don't work for some other reason. I am guessing kernel debugging is pretty common for developers working on macos itself, so I'm hoping someone in this forum would have some working instructions for that.
9
1
460
Oct ’25
Kernel panic when using fclonefileat from ES
Hi, I am developing instant snapshot backup solution for macOS using Endpoint Security. We have stumbled upon a Kernel Panic when using "fclonefileat" API. We are catching a kernel panic on customer machines when attempting to clone the file during ES sync callback: panic(cpu 0 caller 0xfffffe002c495508): "apfs_io_lock_exclusive : Recursive exclusive lock attempt" @fs_utils.c:435 I have symbolized the backtrace to know it is related to clone operation with the following backtrace: apfs_io_lock_exclusive apfs_clone_internal apfs_vnop_clonefile I made a minimal repro that boils down to the following operations: apfs_crash_stress - launch thread to do rsrc writes static void *rsrc_write_worker(void *arg) { int id = (int)(long)arg; char buf[8192]; long n = 0; fill_pattern(buf, sizeof(buf), 'W' + id); while (n < ITERATION_LIMIT) { int file_idx = n % NUM_SOURCE_FILES; int fd = open(g_src_rsrc[file_idx], O_WRONLY | O_CREAT, 0644); if (fd >= 0) { off_t off = ((n * 4096) % RSRC_DATA_SIZE); pwrite(fd, buf, sizeof(buf), off); if ((n & 0x7) == 0) fsync(fd); close(fd); } else { setxattr(g_src[file_idx], "com.apple.ResourceFork", buf, sizeof(buf), 0, 0); } n++; } printf("[rsrc_wr_%d] done (%ld ops)\n", id, n); return NULL; } apfs_crash_es - simple ES client that is cloning the file (error checking omitted for brevity) static std::string volfsPath(uint64_t devId, uint64_t vnodeId) { return "/.vol/" + std::to_string(devId) + "/" + std::to_string(vnodeId); } static void cloneAndScheduleDelete(const std::string& sourcePath, dispatch_queue_t queue, uint64_t devId, uint64_t vnodeId) { struct stat st; if (stat(sourcePath.c_str(), &st) != 0 || !S_ISREG(st.st_mode)) return; int srcFd = open(sourcePath.c_str(), O_RDONLY); const char* cloneDir = "/Users/admin/Downloads/_clone"; mkdir(cloneDir, 0755); const char* filename = strrchr(sourcePath.c_str(), '/'); filename = filename ? filename + 1 : sourcePath.c_str(); std::string cloneFilename = std::string(filename) + ".clone." + std::to_string(time(nullptr)) + "." + std::to_string(getpid()); std::string clonePath = std::string(cloneDir) + "/" + cloneFilename; fclonefileat(srcFd, AT_FDCWD, clonePath.c_str(), 0); { dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC), queue, ^{ if (unlink(clonePath.c_str()) == 0) { LOG("Deleted clone: %s", clonePath.c_str()); } else { LOG("Failed to delete clone: %s", clonePath.c_str()); } }); } close(srcFd); } static const es_file_t* file(const es_message_t* msg) { switch (msg->event_type) { case ES_EVENT_TYPE_AUTH_OPEN: return msg->event.open.file; case ES_EVENT_TYPE_AUTH_EXEC: return msg->event.exec.target->executable; case ES_EVENT_TYPE_AUTH_RENAME: return msg->event.rename.source; } return nullptr; } int main(void) { es_client_t* cli; auto ret = es_new_client(&cli, ^(es_client_t* client, const es_message_t * msgc) { if (msgc->process->is_es_client) { es_mute_process(client, &msgc->process->audit_token); return respond(client, msgc, true); } dispatch_async(esQueue, ^{ bool shouldClone = false; if (msgc->event_type == ES_EVENT_TYPE_AUTH_OPEN) { auto& ev = msgc->event.open; if (ev.fflag & (FWRITE | O_RDWR | O_WRONLY | O_TRUNC | O_APPEND)) { shouldClone = true; } } else if (msgc->event_type == ES_EVENT_TYPE_AUTH_UNLINK || msgc->event_type == ES_EVENT_TYPE_AUTH_RENAME) { shouldClone = true; } if (shouldClone) { if (auto f = ::file(msgc)) cloneAndScheduleDelete(f->path.data, cloneQueue, f->stat.st_dev, f->stat.st_ino); } respond(client, msgc, true); }); }); LOG("es_new_client -> %d", ret); es_event_type_t events[] = { ES_EVENT_TYPE_AUTH_OPEN, ES_EVENT_TYPE_AUTH_EXEC, ES_EVENT_TYPE_AUTH_RENAME, ES_EVENT_TYPE_AUTH_UNLINK, }; es_subscribe(cli, events, sizeof(events) / sizeof(*events)); } Create 2 terminal sessions and run the following commands: % sudo ./apfs_crash_es % sudo ./apfs_crash_stress ~/Downloads/test/ Machine will very quickly panic due to APFS deadlock. I expect that no userspace syscall should be able to cause kernel panic. It looks like a bug in APFS implementation and requires fix on XNU/kext side. We were able to reproduce this issue on macOS 26.3.1/15.6.1 on Intel/ARM machines. Here is the panic string: panic_string.txt Source code without XCode project: apfs_crash_es.cpp apfs_crash_stress.cpp Full XCode project + full panic is available at https://www.icloud.com/iclouddrive/0f215KkZffPOTLpETPo-LdaXw#apfs%5Fcrash%5Fes
3
0
103
1w
KDK for current stable version (26.1) missing
The current stable macOS version, 26.1 (build 25B78) is missing a corresponding Kernel Debug Kit (KDK) on the developer downloads page. This means I can't do any kernel-level development tasks currently. For example, if I try to build a new kernel collection with kmutil I get the message Missing Developer Kit: As of macOS 13.0, you will need to install a KDK matching your build 25B78 to rebuild kernel collections. but there is no build 25B78 KDK available to download. The latest 26.1 KDK on the download page is 25B5062e (from a beta I believe) and the final stable KDK for build 25B78 (which kernel development tools require) was never published. Is there any workaround for this to correctly do kernel-level development targeting the latest stable release, or a timeline for when the KDK will release? Thanks!
0
3
332
Nov ’25
Incorrect packet handling in SMBClient MacOS 26.
SMBClient-593 introduces a crtitical bug. When reading and writing data at high volume, the SMBClient no longer properly receives and handle responses from the server. In some cases, the client mishandles the response packet and the following errors are seen in the logs: 2025-12-02 21:36:04.774772-0700 localhost kernel[0]: (smbfs) smb2_smb_parse_write_one: Bad struct size: 0 2025-12-02 21:36:04.774776-0700 localhost kernel[0]: (smbfs) smb2_smb_write: smb2_smb_read_write_async failed with an error 72 2025-12-02 21:36:04.774777-0700 localhost kernel[0]: (smbfs) smbfs_do_strategy: file.txt: WRITE failed with an error of 72 In other cases, the client mishandles the response packet and becomes completely unresponsive, unable to send or receive additional messages, and a forced shutdown of the computer is required to recover. This bug is only present on macos 26. We believe the operative change is in the latest commit, SMBClient-593 beginning at line now 3011 in smb_iod.c. The issue seems to be a race, and occurs much more frequently once throughput exceeds around 10Gbps, and again more frequently above 20Gbps.
6
7
388
Jan ’26
How Can I Check if a FD is Guarded?
How does one check if a file descriptor is guarded? Is there any guarded FD numbers that are determinate? I've seen 12 being NPOLICY in a few things -- is there documentation for which FDs might be guarded? Thanks. The platform is Mac Catalyst.
Replies
1
Boosts
0
Views
220
Activity
May ’25
Kernel version in headers
Does macOS have anything like the FreeBSD macro _FreeBSD_version? That's a macro that gets bumped whenever kernel interfaces change, see https://docs.freebsd.org/en/books/porters-handbook/versions/
Replies
1
Boosts
0
Views
105
Activity
May ’25
Hardlinks reported as non-existing on macOS Sequoia for 3rd party FS
After creating a hardlink on a distributed filesystem of my own via: % ln f.txt hlf.txt Neither the original file, f.txt, nor the hardlink, hlf.txt, are immediately accessible, e.g. via cat(1) with ENOENT returned. A short time later though, both the original file and the hardlink are accessible. Both files can be stat(1)ed though, which confirms that vnop_getattr returns success for both files. Dtruss(1) indicates it's the open(2) syscall that fails: % sudo dtruss -f cat hlf.txt 2038/0x4f68: open("hlf.txt\0", 0x0, 0x0) = -1 Err#2 ;ENOENT 2038/0x4f68: write_nocancel(0x2, "cat: \0", 0x5) = 5 0 2038/0x4f68: write_nocancel(0x2, "hlf.txt\0", 0x7) = 7 0 2038/0x4f68: write_nocancel(0x2, ": \0", 0x2) = 2 0 2038/0x4f68: write_nocancel(0x2, "No such file or directory\n\0", 0x1A) = 26 0 Dtrace(1)ing my KEXT no longer works on macOS Sequoia, so based on the diagnostics print statements I inserted into my KEXT, the following sequence of calls is observed: vnop_lookup(hlf.txt) -&gt; EJUSTRETURN ;ln(1) vnop_link(hlf.txt) -&gt; KERN_SUCCESS ;ln(1) vnop_lookup(hlf.txt) -&gt; KERN_SUCCESS ;cat(1) vnop_open(/) ; I expected to see vnop_open(hlf.txt) here instead of the parent directory. Internally, hardlinks are created in vnop_link via a call to vnode_setmultipath with cache_purge_negatives called on the destination directory. On macOS Monterey for example, where the same code does result in hardlinks being accessible, the following calls are made: vnop_lookup(hlf.txt) -&gt; EJUSTRETURN ;ln(1) vnop_link(hlf.txt) -&gt; KERN_SUCCESS ;ln(1) vnop_lookup(hlf.txt) -&gt; KERN_SUCCESS ;cat(1) vnop_open(hlf.txt) -&gt; KERN_SUCCESS ;cat(1) Not sure how else to debug this. Perusing the kernel sources for uses of VISHARDLINK, VNOP_LINK and vnode_setmultipath call sites did not clear things up for me. Any pointers would be greatly appreciated.
Replies
3
Boosts
0
Views
295
Activity
Jul ’25
Support for Multi-Homed IPv6 Networks, esp. RFC 8028
Hi everyone, I’m running a dual-homed IPv6-mostly LAN where two on-link routers advertise distinct global Provider-Assigned prefixes (one per ISP). On Linux, the host stack appears to follow RFC 8028. It keeps one default route per prefix, and packets appear to leave through a router that recognises their source address and pass ISP BCP 38 (https://datatracker.ietf.org/doc/bcp38/) checks. On macOS Sequoia, I'm only seeing a single un-scoped default route. As a result, traffic sourced from prefix B often exits via router A and is dropped upstream. Questions: Is the single-default-per-interface model in macOS an intentional design choice or simply legacy behaviour that has not yet been updated to RFC 8028? Does the kernel perform any hidden next-hop selection that isn’t reflected in netstat -rn output? Are there any road-map items for fully adopting RFC 8028 in macOS? As a bonus, I'd be very interested in any info you might be able to provide on the status of implementation/support for https://datatracker.ietf.org/doc/html/rfc8978 (Reaction of IPv6 Stateless Address Autoconfiguration (SLAAC) to Flash-Renumbering Events).
Replies
2
Boosts
0
Views
160
Activity
Jul ’25
VNOP_MONITOR+vnode_notify() operation details
After perusing the sources of Apple's SMB and NFS clients' implementation of VNOP_MONITOR, my understanding of how VNOP_MONITOR+vnode_notify() operate is as follows: A user-space process advertises an interest in monitoring a file or directory via kqueue(2)/kevent(2). VFS calls the filesystem's implementation of VNOP_MONITOR. VNOP_MONITOR forwards the commencing or terminating of monitoring events request to the filesystem server. Network filesystem client nodes call vnode_notify() to notify the underlying VFS of a filesystem event, e.g. file/directory creation/removal, etc. What I'm still vague about is how does the server communicate back to client nodes that an event of interest has occurred? I'd appreciate being enlightened on the operation of `VNOP_MONITOR+vnode_notify()' in a network filesystem setting.
Replies
1
Boosts
0
Views
174
Activity
Aug ’25
Pinpointing dandling pointers in 3rd party KEXTs
I'm debugging the following kernel panic to do with my custom filesystem KEXT: panic(cpu 0 caller 0xfffffe004cae3e24): [kalloc.type.var4.128]: element modified after free (off:96, val:0x00000000ffffffff, sz:128, ptr:0xfffffe2e7c639600) My reading of this is that somewhere in my KEXT I'm holding a reference 0xfffffe2e7c639600 to a 128 byte zone that wrote 0x00000000ffffffff at offset 96 after that particular chunk of memory had been released and zeroed out by the kernel. The panic itself is emitted when my KEXT requests the memory chunk that's been tempered with via the following set of calls. zalloc_uaf_panic() __abortlike static void zalloc_uaf_panic(zone_t z, uintptr_t elem, size_t size) { ... (panic)("[%s%s]: element modified after free " "(off:%d, val:0x%016lx, sz:%d, ptr:%p)%s", zone_heap_name(z), zone_name(z), first_offs, first_bits, esize, (void *)elem, buf); ... } zalloc_validate_element() static void zalloc_validate_element( zone_t zone, vm_offset_t elem, vm_size_t size, zalloc_flags_t flags) { ... if (memcmp_zero_ptr_aligned((void *)elem, size)) { zalloc_uaf_panic(zone, elem, size); } ... } The panic is triggered if memcmp_zero_ptr_aligned(), which is implemented in assembly, detects that an n-sized chunk of memory has been written after being free'd. /* memcmp_zero_ptr_aligned() checks string s of n bytes contains all zeros. * Address and size of the string s must be pointer-aligned. * Return 0 if true, 1 otherwise. Also return 0 if n is 0. */ extern int memcmp_zero_ptr_aligned(const void *s, size_t n); Normally, KASAN would be resorted to to aid with that. The KDK README states that KASAN kernels won't load on Apple Silicon. Attempting to follow the instructions given in the README for Intel-based machines does result in a failure for me on Apple Silicon. I stumbled on the Pishi project. But the custom boot kernel collection that gets created doesn't have any of the KEXTs that were specified to kmutil(8) via the --explicit-only flag, so it can't be instrumented in Ghidra. Which is confirmed as well by running: % kmutil inspect -B boot.kc.kasan boot kernel collection at /Users/user/boot.kc.kasan (AEB8F757-E770-8195-458D-B87CADCAB062): Extension Information: I'd appreciate any pointers on how to tackle UAFs in kernel space.
Replies
5
Boosts
0
Views
421
Activity
Sep ’25
Blocking USB Devices on macOS – DriverKit or Other Recommended Approach
Hi Apple, We are working on a general USB device management solution on macOS for enterprise security. Our goal is to enforce policy-based restrictions on USB devices, such as: For USB storage devices: block mount, read, or write access. For other peripherals (e.g., USB headsets or microphones, raspberry pi, etc): block usage entirely. We know in past, kernel extension would be the way to go, but as kext has been deprecated. And DriverKit is the new advertised framework. At first, DriverKit looked like the right direction. However, after reviewing the documentation more closely, we noticed that using DriverKit for USB requires specific entitlements: DriverKit USB Transport – VendorID DriverKit USB Transport – VendorID and ProductID This raises a challenge: if our solution is meant to cover all types of USB devices, we would theoretically need entitlements for every VendorID/ProductID in existence. My questions are: Is DriverKit actually the right framework for this kind of general-purpose USB device control? If not, what framework or mechanism should we be looking at for enforcing these kinds of policies? We also developed an Endpoint Security product, but so far we haven’t found a relevant Endpoint Security event type that would allow us to achieve this. Any guidance on the correct technical approach would be much appreciated. Thanks in advance for your help.
Replies
6
Boosts
0
Views
314
Activity
Sep ’25
macos 15.6.1 - BSD sendto() fails for IPv4-mapped IPv6 addresses
There appears to be some unexplained change in behaviour in the recent version of macos 15.6.1 which is causing the BSD socket sendto() syscall to no longer send the data when the source socket is bound to a IPv4-mapped IPv6 address. I have attached a trivial native code which reproduces the issue. What this reproducer does is explained as a comment on that code's main() function: // Creates a AF_INET6 datagram socket, marks it as dual socket (i.e. IPV6_V6ONLY = 0), // then binds the socket to a IPv4-mapped IPv6 address (chosen on the host where this test runs). // // The test then uses sendto() to send some bytes. For the sake of this test, it uses the same IPv4-mapped // IPv6 address as the destination address to sendto(). The test then waits for (a maximum of) 15 seconds to // receive that sent message by calling recvfrom(). // // The test passes on macos (x64 and aarch64) hosts of versions 12.x, 13.x, 14.x and 15.x upto 15.5. // Only on macos 15.6.1 and the recent macos 26, the test fails. Specifically, the first message that is // sent using sendto() is never sent (and thus the recvfrom()) times out. sendto() however returns 0, // incorrectly indicating a successful send. Interesting, if you repeat sendto() a second message from the // same bound socket to the exact same destination address, the send message is indeed correctly sent and // received immediately by the recvfrom(). It's only the first message which goes missing (the test uses // unique content in each message to be sure which exact message was received and it has been observed that // only the second message is received and the first one lost). // // Logs collected using "sudo log collect --last 2m" (after the test program returns) shows the following log // message, which seem relevant: // ... // default kernel cfil_hash_entry_log:6088 <CFIL: Error: sosend_reinject() failed>: // [86868 a.out] <UDP(17) out so 59faaa5dbbcef55d 127846646561221313 127846646561221313 age 0> // lport 65051 fport 65051 laddr 192.168.1.2 faddr 192.168.1.2 hash 201AAC1 // default kernel cfil_service_inject_queue:4472 CFIL: sosend() failed 22 // ... // As noted, this test passes without issues on various macosx version (12 through 15.5), both x64 and aarch64 but always fails against 15.6.1. I have been told that it also fails on the recently released macos 26 but I don't have access to such host to verify it myself. The release notes don't usually contain this level of detail, so it's hard to tell if something changed intentionally or if this is a bug. Should I report this through the feedback assistant? Attached is the source of the reproducer, run it as: clang dgramsend.c ./a.out On macos 15.6.1, you will see that it will fail to send (and thus receive) the message on first attempt but the second one passes: ... created and bound a datagram dual socket to ::ffff:192.168.1.2:65055 ::ffff:192.168.1.2:65055 sendto() ::ffff:192.168.1.2:65055 ---- Attempt 1 ---- sending greeting "hello 1" sendto() succeeded, sent 8 bytes calling recvfrom() receive timed out --------------------- ---- Attempt 2 ---- sending greeting "hello 2" sendto() succeeded, sent 8 bytes calling recvfrom() received 8 bytes: "hello 2" --------------------- TEST FAILED ... The output "log collect --last 2m" contains a related error (and this log message consistently shows up every time you run that reproducer): ... default kernel cfil_hash_entry_log:6088 <CFIL: Error: sosend_reinject() failed>: [86248 a.out] <UDP(17) out so 59faaa5dbbcef55d 127846646561221313 127846646561221313 age 0> lport 65055 fport 65055 laddr 192.168.1.2 faddr 192.168.1.2 hash 201AAC1 default kernel cfil_service_inject_queue:4472 CFIL: sosend() failed 22 ... I don't know what it means though. dgramsend.c
Replies
2
Boosts
0
Views
325
Activity
Sep ’25
macOS 26 kernel open source?
Hi! I was wondering if there will be new XNU version for macOS 26 published open source? As far as I remember, previous version's source code was published the moment the OS was officially released, but not this time. If yes, when we can expect it to be published?
Replies
1
Boosts
0
Views
290
Activity
Sep ’25
No KDKs available for macOS 15.7.8 and 15.8
Any ideas where it could be published? Nothing on Developer Download, I presume they have been built but not publicly available.
Replies
1
Boosts
0
Views
156
Activity
Oct ’25
How to connect to a IOUSBHostInterface
I have poked around the web looking for a good example to do this and I haven't found a working example. I need to connect to a USB Device, its multiple ports and supports what looks to be a root port and 4 other ports I am no expert in USB but I do know how to write a kext and client drivers, but thats really not the way to solve this. I need to display the serialized output from these USB ports for a development board. I would rather do this on my Mac than have to cobble up a Linux machine and mess around with Linux. Here is the output from ioreg MCHP-Debug@03100000 <class IOUSBHostDevice, id 0x105f6fdc2, registered, matched, active, busy 0 (20 ms), retain 27> MCHP-Debug@0 <class IOUSBHostInterface, id 0x105f6fdc8, registered, matched, active, busy 0 (13 ms), retain 5> +-o MCHP-Debug@0 <class IOUSBHostInterface, id 0x105f6fdc8, registered, matched, active, busy 0 (13 ms), retain 5> +-o MCHP-Debug@1 <class IOUSBHostInterface, id 0x105f6fdc9, registered, matched, active, busy 0 (11 ms), retain 5> +-o MCHP-Debug@2 <class IOUSBHostInterface, id 0x105f6fdcb, registered, matched, active, busy 0 (9 ms), retain 5> | | | | | +-o MCHP-Debug@3 <class IOUSBHostInterface, id 0x105f6fdcc, registered, matched, active, busy 0 (7 ms), retain 5> I have been able to open a inservice to the device at the top level, but I get an error when I use. usbHostInterface = [[IOUSBHostInterface alloc] initWithIOService:usbDevice options: IOUSBHostObjectInitOptionsNone queue: queue error: &error interestHandler: handler]; Error:Failed to create IOUSBHostInterface. with reason: Unable to obtain configuration descriptor. Assertion failed: (usbHostInterface), function main, file main.m, line 87. I started using DeviceKit but I received signing errors and I shouldn't have to go down that path just to dump data from a USB port? Any suggestions would be great, most of the Apple documentation on USB ports is like 20 years old and the new stuff pushes you towards DeviceKit.
Replies
1
Boosts
0
Views
540
Activity
Dec ’25
Entitlement for extension to have read-only access to host's task?
Hi all, I'm building an iOS app extension using ExtensionKit that works exclusively with its containing host app, presenting UI via EXHostViewController. I'd like the extension to have read-only access to the host's task for process introspection purposes. I'm aware this would almost certainly require a special entitlement. I know get-task-allow and the debugger entitlement exist, but those aren't shippable to the App Store. I'm looking for something that could realistically be distributed to end users. My questions: Does an entitlement exist (or is one planned) that would grant an extension limited, read-only access to its host's task—given the extension is already tightly coupled to the host? If not, is this something Apple would consider adding? The use case is an extension that needs to inspect host process state without the ability to modify it. Is there a path to request such an entitlement through the provisioning profile process, or is this fundamentally off the table for App Store distribution? It seems like a reasonable trust boundary given the extension already lives inside the host's app bundle, but I understand the security implications. Any insight appreciated. Thanks!
Replies
8
Boosts
0
Views
388
Activity
Jan ’26
macos 26 - socket() syscall causes ENOBUFS "No buffer space available" error
As part of the OpenJDK testing we run several regression tests, including for Java SE networking APIs. These APIs ultimately end up calling BSD socket functions. On macos, starting macos 26, including on recent 26.2 version, we have started seeing some unexplained but consistent exception from one of these BSD socket APIs. We receive a "ENOBUFS" errno (No buffer space available) when trying to construct a socket(). These exact same tests continue to pass on many other older versions of macos (including 15.7.x). After looking into this more, we have been able to narrow this down to a very trivial C code which is as follows (also attached): #include <stdio.h> #include <sys/socket.h> #include <string.h> #include <unistd.h> #include <sys/errno.h> static int create_socket(const int attempt_number) { const int fd = socket(AF_INET6, SOCK_STREAM, 0); if (fd < 0) { fprintf(stderr, "socket creation failed on attempt %d," " due to: %s\n", attempt_number, strerror(errno)); return fd; } return fd; } int main() { const unsigned int num_times = 250000; for (unsigned int i = 1; i <= num_times; i++) { const int fd = create_socket(i); if (fd < 0) { return -1; } close(fd); } fprintf(stderr, "successfully created and closed %d sockets\n", num_times); } The code very trivially creates a socket() and close()s it. It does this repeatedly in a loop for a certain number of iterations. Compiling this as: clang sockbufspaceerr.c -o sockbufspaceerr.o and running it as: ./sockbufspaceerr.o consistently generates an error as follows on macos 26.x: socket creation failed on attempt 160995, due to: No buffer space available The iteration number on which the socket() creation fails varies, but the issue does reproduce. Running the same on older versions of macos doesn't reproduce the issue and the program terminates normally after those many iterations. Looking at the xnu source that is made available for each macos release here https://opensource.apple.com/releases/, I see that for macos 26.x there have been changes in this kernel code and there appears to be some kind of memory accountability code introduced in this code path. However, looking at the reproducer/application code in question, I believe it uses the right set of functions to both create as well as release the resources, so I can't see why this should cause the above error in macos 26.x. Does this look like some issue that needs attention in the macos kernel and should I report it through feedback assitant tool?
Replies
4
Boosts
0
Views
356
Activity
Jan ’26
FSKit questions and clarifications
I work on EdenFS, an open-source Virtual Filesystem that runs on macOS, Linux, and Windows. My team is very interested in using FSKit as the basis for EdenFS on macOS, but have found the documentation to be lacking and contains some mixed messaging on the future of FSKit. Below are a few questions that don’t seem to be fully covered by the current documentation: Does FSKit support process attribution? Each FUSE request provides a requester Process ID (and other information) through the fuse_in_header structure. Does FSKit pass similar information along for each request? Does the reclaimItem API function similarly to FUSE’s forget operation? If not, what are the differences? See #1 below for why forget/reclaimItem matters to us. Is Apple committed to releasing and supporting FSKit? Is there any timeline for release that we can plan around? Does FSKit have known performance/scalability limitations? We provide alternative methods that clients can use to make bulk requests to EdenFS, but some clients will necessarily be unable to use those and stress the default filesystem APIs. Throughput (on the order of tens of thousands of filesystem requests per minute) and request size are the main concerns, followed closely by directory size restrictions. Why we’re interested in FSKit As mentioned above, my team supports EdenFS on 3 platforms. On Linux, we utilize FUSE; on Windows, we utilize ProjectedFS; and on macOS, we’ve utilized a few different solutions in the past. We first utilized the macFUSE kext, which was great while it lasted. Due to (understandable) changes in supporting kernel extensions, we were forced to move to NFS version 3. NFS has been lackluster in comparison (and our initial investigations show that NFS version 4(.2) would be similar). We have had numerous scalability and reliability issues, some listed below: NFS does not provide a forget API similar to FUSE. EdenFS is forced to remember all file handles that have been loaded because the kernel never informs us when all references to that file handle have been dropped. We can hackily infer that a file handle should never be referenced again in some cases, but a large number of file handles end up being remembered forever. Many of our algorithms scale with the number of file handles that Eden has to consider, and therefore performance issues are inevitable after some time. NFS does not provide information about clients (requesters). We cannot tell which processes are sending EdenFS requests. This attribution is important due to issue #1. We are forced to work with tool owners to modify their applications to be VFS-friendly. If we can’t track down which tools are behaving poorly, they will continue to load excess file handles and cause performance issues. NFS “Server connections interrupted:” dialog during heavy load. Under heavy load, either EdenFS or system-wide, our users experience this dialog pop-up and are confused as to how they should respond (Ignore or Disconnect All). They become blocked in their work, and will be further blocked if they click “Disconnect All” as that unmounts their EdenFS mount. This forces them to restart EdenFS or reboot their laptop to remediate the issue. The above issues make us extremely motivated to use FSKit and partner with Apple to flesh out the final version of the FSKit API. Our use case likely mirrors what other user-space filesystems will be looking for in the FSKit API (albeit at a larger scale than most), and we’re willing to collaborate to work out any issues in the current FSKit offerings.
Replies
4
Boosts
1
Views
1.8k
Activity
Jun ’25
FSKit caching by kernel and performance
I've faced with some performance issues developing my readonly filesystem using fskit. For below screenshot: enumerateDirectory returns two hardcoded items, compiled with release config 3000 readdirsync are done from nodejs. macos 15.5 (24F74) I see that getdirentries syscall takes avg 121us. Because all other variables are minimised, it seems like it's fskit<->kernel overhead. This itself seems like a big number. I need to compare it with fuse though to be sure. But what fuse has and fskit seams don't (I checked every page in fskit docs) is kernel caching. Fuse supports: caching lookups (entry_timeout) negative lookups (entry_timeout) attributes (attr_timeout) readdir (via opendir cache_readdir and keep_cache) read and write ops but thats another topic. And afaik it works for both readonly and read-write file systems, because kernel can assume (if client is providing this) that cache is valid until kernel do write operations on corresponding inodes (create, setattr, write, etc). Questions are: is 100+us reasonable overhead for fskit? is there any way to do caching by kernel. If not currently, any plans to implement? Also, additional performance optimisation could be done by providing lower level api when we can operate with raw inodes (Uint64), this will eliminate overhead from storing, removing and retrieving FSItems in hashmap.
Replies
2
Boosts
1
Views
274
Activity
Jul ’25
Instructions for debugging recent macos kernel versions?
Is there any recent and a bit authoritative documentation which explains how to debug recent versions of macos kernel? I have found some blog posts from other users but those are either outdated or don't work for some other reason. I am guessing kernel debugging is pretty common for developers working on macos itself, so I'm hoping someone in this forum would have some working instructions for that.
Replies
9
Boosts
1
Views
460
Activity
Oct ’25
Kernel panic when using fclonefileat from ES
Hi, I am developing instant snapshot backup solution for macOS using Endpoint Security. We have stumbled upon a Kernel Panic when using "fclonefileat" API. We are catching a kernel panic on customer machines when attempting to clone the file during ES sync callback: panic(cpu 0 caller 0xfffffe002c495508): "apfs_io_lock_exclusive : Recursive exclusive lock attempt" @fs_utils.c:435 I have symbolized the backtrace to know it is related to clone operation with the following backtrace: apfs_io_lock_exclusive apfs_clone_internal apfs_vnop_clonefile I made a minimal repro that boils down to the following operations: apfs_crash_stress - launch thread to do rsrc writes static void *rsrc_write_worker(void *arg) { int id = (int)(long)arg; char buf[8192]; long n = 0; fill_pattern(buf, sizeof(buf), 'W' + id); while (n < ITERATION_LIMIT) { int file_idx = n % NUM_SOURCE_FILES; int fd = open(g_src_rsrc[file_idx], O_WRONLY | O_CREAT, 0644); if (fd >= 0) { off_t off = ((n * 4096) % RSRC_DATA_SIZE); pwrite(fd, buf, sizeof(buf), off); if ((n & 0x7) == 0) fsync(fd); close(fd); } else { setxattr(g_src[file_idx], "com.apple.ResourceFork", buf, sizeof(buf), 0, 0); } n++; } printf("[rsrc_wr_%d] done (%ld ops)\n", id, n); return NULL; } apfs_crash_es - simple ES client that is cloning the file (error checking omitted for brevity) static std::string volfsPath(uint64_t devId, uint64_t vnodeId) { return "/.vol/" + std::to_string(devId) + "/" + std::to_string(vnodeId); } static void cloneAndScheduleDelete(const std::string& sourcePath, dispatch_queue_t queue, uint64_t devId, uint64_t vnodeId) { struct stat st; if (stat(sourcePath.c_str(), &st) != 0 || !S_ISREG(st.st_mode)) return; int srcFd = open(sourcePath.c_str(), O_RDONLY); const char* cloneDir = "/Users/admin/Downloads/_clone"; mkdir(cloneDir, 0755); const char* filename = strrchr(sourcePath.c_str(), '/'); filename = filename ? filename + 1 : sourcePath.c_str(); std::string cloneFilename = std::string(filename) + ".clone." + std::to_string(time(nullptr)) + "." + std::to_string(getpid()); std::string clonePath = std::string(cloneDir) + "/" + cloneFilename; fclonefileat(srcFd, AT_FDCWD, clonePath.c_str(), 0); { dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC), queue, ^{ if (unlink(clonePath.c_str()) == 0) { LOG("Deleted clone: %s", clonePath.c_str()); } else { LOG("Failed to delete clone: %s", clonePath.c_str()); } }); } close(srcFd); } static const es_file_t* file(const es_message_t* msg) { switch (msg->event_type) { case ES_EVENT_TYPE_AUTH_OPEN: return msg->event.open.file; case ES_EVENT_TYPE_AUTH_EXEC: return msg->event.exec.target->executable; case ES_EVENT_TYPE_AUTH_RENAME: return msg->event.rename.source; } return nullptr; } int main(void) { es_client_t* cli; auto ret = es_new_client(&cli, ^(es_client_t* client, const es_message_t * msgc) { if (msgc->process->is_es_client) { es_mute_process(client, &msgc->process->audit_token); return respond(client, msgc, true); } dispatch_async(esQueue, ^{ bool shouldClone = false; if (msgc->event_type == ES_EVENT_TYPE_AUTH_OPEN) { auto& ev = msgc->event.open; if (ev.fflag & (FWRITE | O_RDWR | O_WRONLY | O_TRUNC | O_APPEND)) { shouldClone = true; } } else if (msgc->event_type == ES_EVENT_TYPE_AUTH_UNLINK || msgc->event_type == ES_EVENT_TYPE_AUTH_RENAME) { shouldClone = true; } if (shouldClone) { if (auto f = ::file(msgc)) cloneAndScheduleDelete(f->path.data, cloneQueue, f->stat.st_dev, f->stat.st_ino); } respond(client, msgc, true); }); }); LOG("es_new_client -> %d", ret); es_event_type_t events[] = { ES_EVENT_TYPE_AUTH_OPEN, ES_EVENT_TYPE_AUTH_EXEC, ES_EVENT_TYPE_AUTH_RENAME, ES_EVENT_TYPE_AUTH_UNLINK, }; es_subscribe(cli, events, sizeof(events) / sizeof(*events)); } Create 2 terminal sessions and run the following commands: % sudo ./apfs_crash_es % sudo ./apfs_crash_stress ~/Downloads/test/ Machine will very quickly panic due to APFS deadlock. I expect that no userspace syscall should be able to cause kernel panic. It looks like a bug in APFS implementation and requires fix on XNU/kext side. We were able to reproduce this issue on macOS 26.3.1/15.6.1 on Intel/ARM machines. Here is the panic string: panic_string.txt Source code without XCode project: apfs_crash_es.cpp apfs_crash_stress.cpp Full XCode project + full panic is available at https://www.icloud.com/iclouddrive/0f215KkZffPOTLpETPo-LdaXw#apfs%5Fcrash%5Fes
Replies
3
Boosts
0
Views
103
Activity
1w
KDK for current stable version (26.1) missing
The current stable macOS version, 26.1 (build 25B78) is missing a corresponding Kernel Debug Kit (KDK) on the developer downloads page. This means I can't do any kernel-level development tasks currently. For example, if I try to build a new kernel collection with kmutil I get the message Missing Developer Kit: As of macOS 13.0, you will need to install a KDK matching your build 25B78 to rebuild kernel collections. but there is no build 25B78 KDK available to download. The latest 26.1 KDK on the download page is 25B5062e (from a beta I believe) and the final stable KDK for build 25B78 (which kernel development tools require) was never published. Is there any workaround for this to correctly do kernel-level development targeting the latest stable release, or a timeline for when the KDK will release? Thanks!
Replies
0
Boosts
3
Views
332
Activity
Nov ’25
KDK for recent MacOS Sequoia versions (15.6, 15.7RC)
The most recent KDK for MacOS Sequoia that appears in the Downloads is for MacOS 15.5 (24F74), but the current version of MacOS Sequoia is 15.6 (24G84) and 15.7 (24G207) is in RC. Is there an ETA for the KDKs for 15.6 (24G84) and 15.7 (24G207) to be made available to download? Many thanks for any help.
Replies
6
Boosts
4
Views
1.6k
Activity
8h
Incorrect packet handling in SMBClient MacOS 26.
SMBClient-593 introduces a crtitical bug. When reading and writing data at high volume, the SMBClient no longer properly receives and handle responses from the server. In some cases, the client mishandles the response packet and the following errors are seen in the logs: 2025-12-02 21:36:04.774772-0700 localhost kernel[0]: (smbfs) smb2_smb_parse_write_one: Bad struct size: 0 2025-12-02 21:36:04.774776-0700 localhost kernel[0]: (smbfs) smb2_smb_write: smb2_smb_read_write_async failed with an error 72 2025-12-02 21:36:04.774777-0700 localhost kernel[0]: (smbfs) smbfs_do_strategy: file.txt: WRITE failed with an error of 72 In other cases, the client mishandles the response packet and becomes completely unresponsive, unable to send or receive additional messages, and a forced shutdown of the computer is required to recover. This bug is only present on macos 26. We believe the operative change is in the latest commit, SMBClient-593 beginning at line now 3011 in smb_iod.c. The issue seems to be a race, and occurs much more frequently once throughput exceeds around 10Gbps, and again more frequently above 20Gbps.
Replies
6
Boosts
7
Views
388
Activity
Jan ’26