Develop kernel-resident device drivers and kernel extensions using Kernel.

Posts under Kernel tag

47 Posts
Sort by:

Post

Replies

Boosts

Views

Activity

KEXT Code Signing Problems
On modern systems all KEXTs must be code signed with a Developer ID. Additionally, the Developer ID must be specifically enabled for KEXT development. You can learn more about that process on the Developer ID page. If your KEXT is having code signing problems, check that it’s signed with a KEXT-enabled Developer ID. Do this by looking at the certificate used to sign the KEXT. First, extract the certificates from the signed KEXT: % codesign -d --extract-certificates MyKEXT.kext Executable=/Users/quinn/Desktop/MyKEXT/build/Debug/MyKEXT.kext/Contents/MacOS/MyKEXT This creates a bunch of certificates of the form codesignNNN, where NNN is a number in the range from 0 (the leaf) to N (the root). For example: % ls -lh codesign* -rw-r--r--+ 1 quinn staff 1.4K 20 Jul 10:23 codesign0 -rw-r--r--+ 1 quinn staff 1.0K 20 Jul 10:23 codesign1 -rw-r--r--+ 1 quinn staff 1.2K 20 Jul 10:23 codesign2 Next, rename each of those certificates to include the .cer extension: % for i in codesign*; do mv $i $i.cer; done Finally, look at the leaf certificate (codesign0.cer) to see if it has an extension with the OID 1.2.840.113635.100.6.1.18. The easiest way to view the certificate is to use Quick Look in Finder. Note If you’re curious where these Apple-specific OIDs comes from, check out the documents on the Apple PKI page. In this specific case, look at section 4.11.3 Application and Kernel Extension Code Signing Certificates of the Developer ID CPS. If the certificate does have this extension, there’s some other problems with your KEXT’s code signing. In that case, feel free to create a new thread here on DevForums with your details. If the certificate does not have this extension, there are two possible causes: Xcode might be using an out-of-date signing certificate. Re-create your Developer ID signing certificate using the developer site and see if the extension shows up there. If so, you’ll have to investigate why Xcode is not using the most up-to-date signing certificate. If a freshly-created Developer ID signing certificate does not have this extension, you need to apply to get your Developer ID enabled for KEXT development per the instructions on the Developer ID page. Share and Enjoy — Quinn “The Eskimo!” @ Developer Technical Support @ Apple let myEmail = "eskimo" + "1" + "@" + "apple.com" Change history: 20 Jul 2016 — First published. 28 Mar 2019 — Added a link to the Apple PKI site. Other, minor changes. 15 Mar 2022 — Fixed the formatting. Updated the section number in the Developer ID CPS. Made other minor editorial changes.
0
0
6.2k
Mar ’22
The curious case of the "IOKit Driver" Xcode target.
Firstly, I realise that Kexts are deprecated. And for my needs, user-space IOKit from an application might be more than I’ll ever need, but I can’t help ensuring I’ve got all my ducks in a row while I’m designing an app. I noticed in the above Kext deprecation notice that it’s more about specific APIs that are deprecated rather than the kext mechanism itself (no mention of IOKit kernel APIs there). Along with the fact that a reboot would be required and various other policy changes. While reading up on System Extensions and the various newer tools, DriverKit, Endpoint Security etc, I’ve noticed there’s no mention of this “IOKit Driver” target/deliverable in the Xcode template chooser. It looks like, from the requirements, that DriverKit is aimed at hardware manufacturers, with a need to request the DriverKit entitlement for development. With respect to Apple’s internal Kext deprecation roadmap, how safe is it to build a product that relies on using an IOKit Driver and are there any requirements similar to DriverKit? Would a developer need to request an entitlement? If I developed an app using user-space IOKit and, for illustration purposes let’s say I also included an IOKit Driver in the app bundle. If I used the IOKit API, including header file constants only (no digging around in the IORegistryExplorer for non-public keys), as Apple intends them to be used, would this fall foul of any App Store rules that anyone is aware of? Put another way, would making use of an IOKit Driver disqualify my app from being distributed via the App Store, similar to an Endpoint Security extension? As an aside, there are a lot of API in the ES Framework that could be used to build apps that have nothing to do with Endpoint Security. File system related apps for example. It’s a shame there isn’t an enhanced middle ground between FSEvents and Endpoint Security framework.
3
0
165
1w
Kernel panic in mac_label_verify()
Accessing a directory on my custom distributed filesystem results in a kernel panic. According to the backtrace, the last function called before the panic is triggered is mac_label_verify(). See the backtrace file attached. mac_label_verify-panic.txt The panic manifests itself given the following conditions: Machine-a: make a directory in Finder. Machine-b: remove the directory created on machine-a in Finder. Machine-a: access the directory removed on machine-b in Finder. Kernel panic ensues. The panic is reproducible on both Apple Silicon and x86-64. The backtrace is for x86-64 as I wasn't able to symbolicate it on Apple Silicon. Not sure how to tackle this one. Any pointers would be much appreciated.
10
0
422
1w
Subsequent expansion of same archive fails due to name collision
Extracting an archive into the same directory on my custom filesystem more than once fails with the following message: Unable to finish expanding 'misc.tar.xz' into 'extractme'. Could not move 'misc' into destination directory. I.e. initial extraction succeeds with archive contents extracted into extractme/misc. Subsequent extraction fails to rename extractme.sb-db71cd27-lFjN1f/misc to extractme/misc 2. This behaviour is observed on macOS Monterey and Ventura. It does work as expected on macOS Sonoma though. Dtrace(1)-ing the archive being extracted over smbfs results in the following sequence of calls being made: 2 -> smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 nameiop:0 2 <- smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 -> 2 ;ENOENT 2 -> smbfs_vnop_lookup AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc nameiop:0x2 ;DELETE 2 <- smbfs_vnop_lookup AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc -> 0 2 -> smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 nameiop:0x3 ;RENAME 2 <- smbfs_vnop_lookup AUHelperService-2163 -> extractme/misc 2 -> EJUSTRETURN 1 -> smbfs_vnop_rename AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc -> extractme/nil 2 <- smbfs_vnop_rename AUHelperService-2163 -> extractme.sb-db71cd27-lFjN1f/misc -> extractme/nil -> 0 2 -> smbfs_vnop_lookup AUHelperService-2163 -> TheRooT/extractme/misc 2 nameiop:0 3 <- smbfs_vnop_lookup AUHelperService-2163 -> TheRooT/extractme/misc 2 -> 0 ;Successful lookup What I don't understand is what causes vnop_lookup to be called for misc to be removed from the temporary directory and renamed into 'misc 2' and placed in the destination directory, 'extractme' via vnop_rename? I had a look at smbfs_vnop_lookup and rename and didn't see anything that would cause 'misc 2' to come into being. Based on the output of the dtrace(1) script running on my custom filesystem, there are no vnop_lookup and vnop_rename calls being made to remove the 'misc' directory from the temporary directory and to rename it to 'misc 2' and place it in the destination directory at extractme. Archive extraction proceeds no further after extracting the archive contents into the temporary directory. What am I missing?
1
0
264
Oct ’24
FSKit questions and clarifications
I work on EdenFS, an open-source Virtual Filesystem that runs on macOS, Linux, and Windows. My team is very interested in using FSKit as the basis for EdenFS on macOS, but have found the documentation to be lacking and contains some mixed messaging on the future of FSKit. Below are a few questions that don’t seem to be fully covered by the current documentation: Does FSKit support process attribution? Each FUSE request provides a requester Process ID (and other information) through the fuse_in_header structure. Does FSKit pass similar information along for each request? Does the reclaimItem API function similarly to FUSE’s forget operation? If not, what are the differences? See #1 below for why forget/reclaimItem matters to us. Is Apple committed to releasing and supporting FSKit? Is there any timeline for release that we can plan around? Does FSKit have known performance/scalability limitations? We provide alternative methods that clients can use to make bulk requests to EdenFS, but some clients will necessarily be unable to use those and stress the default filesystem APIs. Throughput (on the order of tens of thousands of filesystem requests per minute) and request size are the main concerns, followed closely by directory size restrictions. Why we’re interested in FSKit As mentioned above, my team supports EdenFS on 3 platforms. On Linux, we utilize FUSE; on Windows, we utilize ProjectedFS; and on macOS, we’ve utilized a few different solutions in the past. We first utilized the macFUSE kext, which was great while it lasted. Due to (understandable) changes in supporting kernel extensions, we were forced to move to NFS version 3. NFS has been lackluster in comparison (and our initial investigations show that NFS version 4(.2) would be similar). We have had numerous scalability and reliability issues, some listed below: NFS does not provide a forget API similar to FUSE. EdenFS is forced to remember all file handles that have been loaded because the kernel never informs us when all references to that file handle have been dropped. We can hackily infer that a file handle should never be referenced again in some cases, but a large number of file handles end up being remembered forever. Many of our algorithms scale with the number of file handles that Eden has to consider, and therefore performance issues are inevitable after some time. NFS does not provide information about clients (requesters). We cannot tell which processes are sending EdenFS requests. This attribution is important due to issue #1. We are forced to work with tool owners to modify their applications to be VFS-friendly. If we can’t track down which tools are behaving poorly, they will continue to load excess file handles and cause performance issues. NFS “Server connections interrupted:” dialog during heavy load. Under heavy load, either EdenFS or system-wide, our users experience this dialog pop-up and are confused as to how they should respond (Ignore or Disconnect All). They become blocked in their work, and will be further blocked if they click “Disconnect All” as that unmounts their EdenFS mount. This forces them to restart EdenFS or reboot their laptop to remediate the issue. The above issues make us extremely motivated to use FSKit and partner with Apple to flesh out the final version of the FSKit API. Our use case likely mirrors what other user-space filesystems will be looking for in the FSKit API (albeit at a larger scale than most), and we’re willing to collaborate to work out any issues in the current FSKit offerings.
1
0
458
Oct ’24
Reasonable time for fix to easy-to-reproduce kernel panic?
Since I haven't heard so much as a peep from Apple on this, I thought I'd take a poll here on how long I could expect an easily reproducible (albeit possibly obscure) kernel panic to be fixed. I was under the impression that kernel panics were a big deal but it's been almost 2 months since I updated from macOS 14 to macOS 15.0 dev beta 7 / public beta 5 when I originally came across and reported a panic triggered while playing StarCraft II. I've been able to consistently trigger panics playing certain (maybe all) Co-op maps in SC2 and since my first report Aug 22, I've filed 8 additional bug reports, each automatically generated after hitting yet another panic. (I'm not sure exactly who is able to view these but for what it's worth, these are the reports I've filed so far: FB14886510, FB14905773, FB14960435, FB15304609, FB15391195, FB15467943, FB15468127, FB15491485, FB15491684.) A few other people have reported the issue to SC2's developer, Blizzard, and apparently Blizzard has acknowledged they're aware of the problem so it's safe to rule out the possibility of a hardware defect or other issue specific only to my computer. The logs point the blame at the AppleDCP driver, although I suppose the problem could technically be in the DCP firmware instead. Regardless, Apple's code is clearly at fault here. I'll admit the importance of a video game isn't exactly like keeping the power on at a hospital but I don't know why it would be deemed particularly unimportant either. At 53 days in, am I wrong to expect this to have been fixed by now or is Apple really being that slow?
1
1
276
Oct ’24
Understanding `EINTR`
I’ve talked about EINTR a bunch of times here on DevForums. Today I found myself talking about it again. On reading my other explanations, I didn’t think any of them were good enough to link to, so I decided to write it up properly. If you have questions or comments, please put them in a new thread here on DevForums. Use the App & System Services > Core OS topic area so that I see it. Share and Enjoy — Quinn “The Eskimo!” @ Developer Technical Support @ Apple let myEmail = "eskimo" + "1" + "@" + "apple.com" Understanding EINTR Many BSD-layer routines can fail with EINTR. To see this in action, consider the following program: import Darwin func main() { print("will read, pid: \(getpid())") var buf = [UInt8](repeating: 0, count: 1024) let bytesRead = read(STDIN_FILENO, &buf, buf.count) if bytesRead < 0 { let err = errno print("did not read, err: \(err)") } else { print("did read, count: \(bytesRead)") } } main() It reads some bytes from stdin and prints the result. Build this and run it in one Terminal window: % ./EINTRTest will read, pid: 13494 Then, in other window, stop and start the process by sending it the SIGSTOP and SIGCONT signals: % kill -STOP 13494 % kill -CONT 13494 In the original window you’ll see something like this: % ./EINTRTest will read, pid: 13494 zsh: suspended (signal) ./EINTRTest % did not read, err: 4 [1] + done ./EINTRTest When you send the SIGSTOP the process stops and the shell tells you that. But looks what happens when you continue the process. The read(…) call fails with error 4, that is, EINTR. The read man page explains this as: [EINTR] A read from a slow device was interrupted before any data arrived by the delivery of a signal. That’s true but unhelpful. You really want to know why this error happens and what you can do about it. There are other man pages that cover this topic in more detail — and you’ll find lots of info about it on the wider Internet — but the goal of this post is to bring that all together into one place. Signal and Interrupts In the beginning, Unix didn’t have threads. It implemented asynchronous event handling using signals. For more about signals, see the signal man page. The mechanism used to actually deliver a signal is highly dependent on the specific Unix implementation, but the general idea is that: The system decides on a specific process (or, nowadays, a thread) to run the signal handler. If that’s blocked inside the kernel waiting for a system call to complete [1], the system unblocks the system call by failing it with an EINTR error. Thus, every system call that can block [2] might fail with an EINTR. You see this listed as a potential error in the man pages for read, write, usleep, waitpid, and many others. [1] There’s some subtlety around the definition of system call. On traditional Unix systems, executables would make system calls directly. On Apple platforms that’s not supported. Rather, an executable calls a routine in the System framework which then makes the system call. In this context the term system call is a shortcut for a System framework routine that maps to a traditional Unix system call. [2] There’s also some subtlety around the definition of block. Pretty much every system call can block for some reason or another. In this context, however, a block means to enter an interruptible wait state, typically while waiting for I/O. This is what the above man page quote is getting at when it says slow device. Solutions This is an obvious pitfall and it would be nice if we could just get rid of it. However, that’s not possible due to compatibility concerns. And while there are a variety of mechanism to automatically retry a system call after a signal interrupt, none of them are universally applicable. If you’re working on a large scale program, like an app for Apple’s platforms, you only good option is to add code to retry any system call that can fail with EINTR. For example, to fix the program at the top of this post you might wrap the read(…) system call like so: func readQ(_ d: Int32, _ buf: UnsafeMutableRawPointer!, _ nbyte: Int) -> Int { repeat { let bytesRead = read(d, buf, nbyte) if bytesRead < 0 && errno == EINTR { continue } return bytesRead } while true } Note In this specific case you’d be better off using the read(into:retryOnInterrupt:) method from System framework. It retries by default (if that’s not appropriate, pass false to the retryOnInterrupt parameter). You can even implement the retry in a generic way. See the errnoQ(…) snippet in QSocket: System Additions. Library Code If you’re writing library code, it’s important that you handle EINTR so that your clients don’t have to. In some cases it might make sense to export a control for this, like the retryOnInterrupt parameter shown in the previous section, but it should default to retrying. If you’re using library code, you can reasonably expect it to handle EINTR for you. If it doesn’t, raise that issue with the library author. And you get this error back from an Apple framework, like Foundation or Network framework, please file a bug against the framework.
0
0
154
Oct ’24
Kernel Development Kit Missing
Hello, It seems like the Kernel Debug Kit for macOS 15.0.1 (24A348) is missing from the list of downloads at developer.apple.com. It would be great if you could add them to the list of available downloads. When trying to rebuild the kernel it fails with the following error message: Error Domain=KMErrorDomain Code=34 "Missing Developer Kit: As of macOS 13.0, you will need to install a KDK matching your build 24A348 to rebuild kernel collections." UserInfo={NSLocalizedDescription=Missing Developer Kit: As of macOS 13.0, you will need to install a KDK matching your build 24A348 to rebuild kernel collections.} But my macOS version is 15.0.1. Is there a workaround for this?
1
0
413
Oct ’24
Porting VFS kext to FSKit
So if one were to start the attempt of porting an existing kext VFS filesystem, to use the new FSKit (Since presumably kexts could go away), how would that look now? Is it ready? Are there any samples out there that already works (Filesystem using FSKit) ? How is the documentation? ChatGPT did not seem to know much at all. What would be Apple's reception to that? How flexible is FSKit ? Is it locked to the idea of a mount is connected to a physical device (or partition)? Or is it more virtual, in that I will have a pool of disks, and present 1, or many, mount points?
2
0
526
Oct ’24
What are the locking rules for VFS and VNOP operations?
I'm observing all sorts of race conditions occurring in various VNOPs my custom filesystem implements. I'm inclined to attribute this to my implementation not following the locking rules expected by the system of a 3rd party filesystem as well as it should. I've looked at how locking is done in Apple's own implementation of Samba and NFS clients. The Samba client uses read/write locks to protect its node from data races. While the NFS client uses mutex locks for the same purpose. I realised that I don't have a clear model in my head of how locking should be done properly. Thus my question, what are the locking rules for VFS and VNOP operations? Thanks.
4
0
410
Oct ’24
How to enable PCIDriverKit Bus Leader? (and Memory Space enable?)
I am porting a working kernel extension IOKit driver to a DriverKit system extension. Our device is a PCI device accessed through Thunderbolt. The change from IOPCIFamily to PCIDriverKit has some differences in approach, though. Namely, in IOKit / IOPCIFamily, this was the correct way to become Bus Leader: mPCIDevice->setBusLeadEnable(true); // setBusMasterEnable(..) deprecated in OS 12.4 but now, PCIDriverKit's IOPCIDevice does not have that function. Instead I am doing the following: // Set Bus Leader and Memory Space enable uint16_t commandRegister = 0; ivars->mPCIDevice->ConfigurationRead16(kIOPCIConfigurationOffsetCommand, &commandRegister); commandRegister |= (kIOPCICommandBusLead | kIOPCICommandMemorySpace); ivars->mPCIDevice->ConfigurationWrite16(kIOPCIConfigurationOffsetCommand, commandRegister); But I am not convinced this is working (I am still experiencing unexpected errors when attempting to DMA from our device, using the same steps that work for the kernel extension). The only hint I can find in the online documentation is here, which reads: Note The endpoint driver is responsible for enabling the Memory Space Enable and Bus Master Enable settings each time it configures the PCI device. When a crash occurs, or when the system unloads your driver, the system disables these features. ...but that does not state directly how to enable bus leader status. What is the "PCIDriverKit approved" way to become bus leader? Is there a way to verify/confirm that a device is bus leader? (This would be helpful to prove that bus leadership is not the issue for DMA errors, as well as to confirm that bus leadership was granted). Thanks in advance!
1
0
357
Oct ’24
Apps not quitting immediately
Hello all, Recently I observed a strange behaviour on macOS. Some apps with UI, after you quit them (right click on the Dock, select Quit or select Quit from the menubar), the apps are not actually quitting immediately, but in a few seconds (including in Activity Monitor the apps are staying alive). Also, if you open the apps again fast, the same PID is kept. Not all apps do this, some of them, for example WhatsApp. I'm not referring to closing all windows, but explicitly quitting. This was not the case in the past. Is there any reason for this? Is some kind of optimisation I'm not aware of? The actual issue is that in a Swift developed app events like NSWorkspace.didLaunchApplicationNotification or NSWorkspace.didTerminateApplicationNotification are not triggered. Is there any way to tell if an app was closed, even if macOS still keeps it around for a few more seconds? Thank you.
1
0
392
Sep ’24
vnop_lookup returning ENOENT aborts rm(1)
When recursively removing a directory with a large number of entries that resides on my custom filesystem, rm(1) aborts with ENOENT. % rm -Rv /Volumes/myfs/linux-kernel/linux-6.10.6 [...] /Volumes/myfs/linux-kernel/linux-6.10.6/include/drm/bridge/aux-bridge.h /Volumes/myfs/linux-kernel/linux-6.10.6/include/drm/bridge/dw_hdmi.h rm: fts_read: No such file or directory I'm observing the following sequence of calls being made. 2024-09-17 17:58:25 vnops.c:281 myfs_vnop_lookup: rm-936 -> dw_hdmi.h ;initial lookup call 2024-09-17 17:58:25 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h) 2024-09-17 17:58:25 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> 0 ;cache miss 2024-09-17 17:58:25 rpc.c:431 myfsLookup: rm-936 -> dw_hdmi.h ;do remote lookup 2024-09-17 17:58:25 rpc.c:500 myfsLookup: -> myfs_lookup_rpc(dw_hdmi.h) 2024-09-17 17:58:25 rpc.c:502 myfsLookup: <- myfs_lookup_rpc(dw_hdmi.h) -> 0 ;file found and added to vfs cache 2024-09-17 17:58:25 vnops.c:281 myfs_vnop_lookup: rm-936 -> dw_hdmi.h ;subsequent lookup call 2024-09-17 17:58:25 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h) 2024-09-17 17:58:25 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> -1 ;cache hit 2024-09-17 17:58:25 vnops.c:1478 myfs_vnop_remove: -> myfs_unlink(dw_hdmi.h) ;unlink sequence 2024-09-17 17:58:25 rpc.c:1992 myfs_unlink: -> myfs_unlink_rpc(dw_hdmi.h) 2024-09-17 17:58:25 rpc.c:1994 myfs_unlink: <- myfs_unlink_rpc(dw_hdmi.h) -> 0 ;remote unlink succeeded 2024-09-17 17:58:25 vnops.c:1480 myfs_vnop_remove: <- myfs_unlink(dw_hdmi.h) -> 0 2024-09-17 17:58:25 vnops.c:1487 myfs_vnop_remove: -> cache_purge(dw_hdmi.h) 2024-09-17 17:58:25 vnops.c:1489 myfs_vnop_remove: <- cache_purge(dw_hdmi.h) 2024-09-17 17:58:25 vnops.c:1499 myfs_vnop_remove: -> vnode_recycle(dw_hdmi.h) 2024-09-17 17:58:25 vnops.c:1501 myfs_vnop_remove: <- vnode_recycle(dw_hdmi.h) 2024-09-17 17:58:27 vnops.c:281 myfs_vnop_lookup: fseventsd-101 -> dw_hdmi.h ;another lookup; why? 2024-09-17 17:58:27 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h) 2024-09-17 17:58:27 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> 0 2024-09-17 17:58:27 rpc.c:431 myfsLookup: fseventsd-101 -> dw_hdmi.h 2024-09-17 17:58:27 rpc.c:500 myfsLookup: -> myfs_lookup_rpc(dw_hdmi.h) 2024-09-17 17:58:27 rpc.c:502 myfsLookup: <- myfs_lookup_rpc(dw_hdmi.h) -> ENOENT(2) 2024-09-17 17:58:27 vnops.c:371 myfs_vnop_lookup: SET(NNEGNCENTRIES): dw_hdmi.h 2024-09-17 17:58:27 vnops.c:373 myfs_vnop_lookup: ENOENT(2) <- shouldAddToNegativeNameCache(dw_hdmi.h) I checked the value of vnode's v_iocount when vnop_remove and vnop_reclaim are being called. Each vnop_remove and followed by vnop_reclaim with v_iocount set to 1 in both calls, as expected. What I don't understand is why after removing the file is there another lookup call being made, which returns ENOENT to rm(1), which causes it to abort. Any pointers on what could be amiss there would be much appreciated.
3
0
423
Sep ’24
macOS mmap / dlopen problem
We are having a problem in our C++ app with dlopen returning memory addresses which were previous reserved using mmap() with the MAP_ANON | MAP_PRIVATE | MAP_JIT flags. The mmap is memory is 4Kb page-aligned and returns normally, however sometime later dlopen() is returning an address within the mmap range when no munmap() has been performed. This looks like a bug in the macOS kernal memory manager. Back in July, I opened support ticket FB14442215 where one of our Engineers was able to create a similar and reproducible problem using Preview to load a large bitmap. This ticket has not yet been acted upon, still showing a status of "Open" . Any help or suggestions would be most welcome. Norm Green norm(dot)green(at)gemtalksystems(dot)com
1
0
391
Sep ’24
iOS writeback behavior for mmap(MAP_SHARED) dirty pages
I'm evaluating a technique to implement a sort of an event logger that uses MAP_SHARED mapping of a file in the app sandbox as an event ring buffer. The reason to use mapping instead of traditionally allocated memory is to achieve log persistence across app termination of any kind (crashes, sigkill, etc.) and keep logs fast by avoiding syscalls. By definition MAP_SHARED area must be coherent with any other RW operations in the system on that file slice which practically means that kernel has to use page cache that is used to serve RW requests. This in turn means that after app process terminates by any reason - content of that memory will not be discarded but rather will be available on next app start via open()/read() or mmap() for that file. msync() can be used to tell kernel to initiate "writeback" - to flush modified mapping pages to the corresponding locations in the non-volatile storage but I haven't found any description of what is the writeback policy if user opts to NOT use msync() at all. And similarly no means to control this. In my case it appears to be important as if kernel does some automatic writebacks on its own - intensive logger traffic would put unneeded IO load to a disk device. After some experiments I was able to figure out that e.g. Linux is able to issue periodical writebacks w/o explicit msync(). For OS X according to "fs_usage -f diskio" no writeback occurs until app terminates (better to say until last reference in the system to that MAP_SHARED area is dropped). I'm now interested to learn about iOS behavior. Is it the same as OS X (no automatic writebacks)? Alternatively I'd happy to hear if there are other techniques available for iOS app to "pin" some memory so its content could survive app termination. Shared memory with an associated "retainer" process would work on other platforms but here we are limited to a single process. Thanks.
5
0
564
Sep ’24
How to Symbolicate an Apple Silicon Panic?
Investigating a kernel panic, I discovered that Apple Silicon Panic traces are not working with how I know to symbolicate the panic information. I have not found proper documentation that corrects this situation. Attached file is an indentity-removed panic, received from causing an intentional panic (dereferencing nullptr), so that I know what functions to expect in the call stack. This is cut-and-pasted from the "Report To Apple" dialog that appears after the reboot: panic_1_4_21_b.txt To start, I download and install the matching KDK (in this case KDK_14.6.1_23G93.kdk), identified from this line: OS version: 23G93 Kernel version: Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:04 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8122 Then start lldb from Terminal, using this command: bash_prompt % lldb -arch arm64e /Library/Developer/KDKs/KDK_14.6.1_23G93.kdk/System/Library/Kernels/kernel.release.t8122 Next I load the remaining scripts per the instructions from lldb: (lldb) settings set target.load-script-from-symbol-file true I need to know what address to load my kext symbols to, which I read from this line of the panic log, after the @ symbol: com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]@0xfffffe001203bfd0->0xfffffe00120533ab I am using a debug build of my kext, so the DWARF symbols are part of the binary. I use this line to load the symbols into the lldb session: (lldb) addkext -F /Library/Extensions/KextName.kext/Contents/MacOS/KextName 0xfffffe001203bfd0 And now I should be able to use lldb image lookup to identify pointers on the stack that land within my kext. For example, the current PC at the moment of the crash lands within the kext (expected, because it was intentional): (lldb) image lookup -a 0xfffffe001203fe10 Which gives the following incorrect result: Address: KextName[0x0000000000003e40] (KextName.__TEXT.__cstring + 14456) Summary: "ffer has %d retains\n" That's not even a program instruction - that's within a cstring. No, that cstring isn't involved in anything pertaining to the intentional panic I am expecting to see. Can someone please explain what I'm doing wrong and provide instructions that will give symbol information from a panic trace on an Apple Silicon Mac? Disclaimers: Yes I know IOPCIFamily is deprecated, I am in process of transitioning to DriverKit Dext from IOKit kext. Until then I must maintain the kext. Terminal command "atos" provides similar incorrect results, and seems to not work with debug-built-binaries (only dSYM files) Yes this is an intentional panic so that I can verify the symbolicate process before I move on to investigating an unexpected panic I have set nvram boot-args to include keepsyms=1 I have tried (lldb) command script import lldb.macosx but get a result of error: no images in crash log (after the nvram settings)
4
0
883
Sep ’24
vnop_advlock not being called for my filesystem
When running AJA System Test for my custom filesystem, the write and read tests get stuck intermittently. I didn't observe any error codes being returned by my vnop_read/write or sock_receive/send functions. Dtrace(1)'ing the vnops being called by AJA System Test for smbfs revealed that amongst other things vnop_advlock is being called: 0 -> smbfs_vnop_advlock ajasystemtest -> smbfs_vnop_advlock(ajatest.dat, op: 0x2, fl->l_start: 0, fl->l_len: 0, fl->l_pid: 0, fl->l_type: 2, fl->l_whence: 0, flags: 0x40, timeout: 0) 0 <- smbfs_vnop_advlock ajasystemtest -> smbfs_vnop_advlock(ajatest.dat) -> -1934627947504 op: 0x2 #define F_SETFD 2 /* set file descriptor flags */ fl->l_len: 0 ;len = 0 means until end of file fl->l_type: 2 ;#define F_UNLCK 2 /* unlock */ fl->l_whence: 0 ;#define SEEK_SET 0 /* set file offset to offset */ flags: 0x40 ;#define F_POSIX 0x040 /* Use POSIX semantics for lock */ As my filesystem didn't implement vnop_advlock, I thought I'd explore that avenue. My vnop_advlock simply returns KERN_SUCCESS. Both f_capabilities.valid and f_capabilities.capabilities of struct vfs_attr have VOL_CAP_INT_ADVLOCK and VOL_CAP_INT_FLOCK set. Yet, vnop_advlock doesn't get called for my filesystem when running AJA System Test. Any tips on what could be amiss there would be much appreciated.
4
0
542
Aug ’24