On a MacBook Pro (M3, 16 GB RAM, 500 GB SSD, macOS Sequoia 15.7.1), I am running some python3 code in a conda environment that requires a lot of RAM. Sure enough, once physical memory is almost exhausted, swapfiles of about 1 GB each start being created, which I can see in /System/Volumes/VM. That folder has about 470 GB of available space at the start of the process (which I can see through Get Info). However, once about 40 swapfiles have been created, for a total of about 40 GB of swap in use (and thus still plenty of available space in /System/Volumes/VM), the python process responsible for the heavy RAM usage is killed (notably, another python process using only about 100 MB of RAM is not). The message shown in the tmux pane where the process's logging is printed is "zsh: killed".
All the documentation I was able to consult says that macOS is designed to use up to all available storage on the startup disk for swapping when physical RAM is not enough (and the startup disk is the one in question, since I have only one disk and the available space mentioned above reflects this). Why, then, is the process killed long before the swap area is exhausted? In contrast, the same process on a Linux machine (a plain python venv there) just keeps swapping, and never gets killed until the swap area is exhausted.
One last note: I do not have administrator rights on this device, so I could not run dmesg to retrieve more precise information; I can only watch with df -h as the swap area grows little by little. My employer's IT team confirmed that they do not mess with memory usage on managed profiles, so macOS is just doing its thing.
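For illustration, a minimal Python loop that polls the same information (swap usage via sysctl and the swapfile count in /System/Volumes/VM; neither needs admin rights) might look like this:

```python
import subprocess
import time
from pathlib import Path

VM_DIR = Path("/System/Volumes/VM")  # where macOS keeps its swapfiles (swapfile0, swapfile1, ...)

def swap_usage() -> str:
    # vm.swapusage reports total/used/free swap, e.g.
    # "total = 2048.00M  used = 1313.25M  free = 734.75M  (encrypted)"
    return subprocess.run(
        ["sysctl", "-n", "vm.swapusage"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

while True:
    swapfiles = list(VM_DIR.glob("swapfile*"))
    print(f"{len(swapfiles)} swapfiles | {swap_usage()}")
    time.sleep(10)
```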
Thanks for any insight you can share on this issue: is it a known bug (perhaps with conda/python environments) or is it expected behaviour? Is there a way to keep the process from being killed?
I see, thank you for pointing this out. So it is not a percentage, but an actual number of pages. Could you expand a little on how to interpret <overcommit pages> in your previous answer?
So, stepping back for a moment, the basic issue here is deciding "when should the kernel stop just blindly backing memory". It COULD (and, historically, did) just limit that to total available storage; however, in practice, that just means the machine grinds itself into a useless state without actually "failing". So, what macOS does is artificially limit the VM system to ensure that the machine always remains in a functional state.
The next question then becomes "how to implement that limit". There are lots of places you COULD limit the VM system, but the problem is that the VM system is complicated enough that many obvious metrics don't really work. For example, purgable memory[1] means that simply counting dirty pages doesn't necessarily "work" - a process could have a very large number of dirty pages, but if they're all purgable, they shouldn't really "count", since they'll never be written to disk. Similarly, memory compression means that there can be a very large difference between the size of memory and the size that's actually written to disk.
[1] Purgable is a Mach memory configuration which tells the VM system that the pages should be discarded instead of swapped; clients then lock/unlock the pages they're actively working with.
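To see how large that compression difference can be, here is a small Python illustration (my own sketch, not part of the original exchange): allocate the same amount of highly compressible versus effectively incompressible data, push the machine into memory pressure, and compare what the compressor and swapfiles actually end up holding. The sizes are just assumptions chosen to pressure a 16 GB machine.

```python
import random

GIB = 1024 ** 3
SIZE = 6 * GIB   # large enough (x2 below) to push a 16 GB machine into compressing/swapping

# Two buffers of the same nominal size that the compressor treats very
# differently: the zero-filled buffer compresses to almost nothing, while
# pseudo-random bytes barely compress at all, so they are what actually
# ends up being written to the swapfiles under memory pressure.
compressible = b"\x00" * SIZE              # trivially compressible
incompressible = random.randbytes(SIZE)    # effectively incompressible

input("Compare the 'Compressed' figure in Activity Monitor (or `vm_stat`), "
      "then press Enter to exit...")
```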
All of those issues mean that the check ends up being entangled with the memory compression system. More specifically, I think the actual limit here is "how much memory the compression system will swap to disk". You could set it to "none", at which point you basically end up with how iOS works. Memory compression still occurs, but we terminate processes instead of swapping data out.
In any case, all of this basically means that setting that to a bigger number means we'll swap more data to disk.
How does one find the available range?
I don't think there is any specific range as such. The ultimate upper limit would be available storage, but that's already inherently dynamic (because the rest of the system can be eating storage), so the kernel already has to deal with that anyway.
What does it mean to overcommit pages?
As general terminology, overcommit just refers to the fact that the VM system is handing out more memory than it actually "has". In this particular case, I think it's just "borrowing" the word to mean how much memory the compression system will use beyond its normal range of physical memory... which translates to how much memory it will swap to disk.
Ideally, I would try to get as close as possible to a memory overcommitment scenario. Would this correspond to an "infinite" number of overcommitted pages?
To be clear, you're already overcommitting - that's how a machine with 16 GB of RAM is running a process that's using 40 GB of memory. You want to overcommit more.
Also, to be clear, I think you also need to think through what "infinite" here actually means. In real-world usage, infinite overcommit just means you're enabling swap death. There are limited cases where increasing memory usage won't cause that, but all of those cases are inherently somewhat broken. Case in point, my test tool above (on its own) won't really cause swap death - it consumes memory and completely ignores it, which allows the VM system to stream it to disk... and then ignore it too. The problem is that real apps don't really work that way - the point of allocating memory is to "use it".
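The test tool referenced above isn't reproduced here, but a rough Python stand-in (my own sketch, with made-up chunk sizes) shows the two access patterns being contrasted: allocate-and-ignore lets the VM system stream everything to disk and forget about it, while continually re-touching the same memory is the pattern that actually drives swap death.

```python
import random
import time

GIB = 1024 ** 3
CHUNK = 256 * 1024 * 1024        # allocate in 256 MiB chunks
PAGE = 16 * 1024                 # 16 KB pages on Apple silicon

def allocate_and_ignore(total_bytes: int) -> list[bytes]:
    # Pages are dirtied once and never looked at again, so the VM system
    # is free to compress/swap them out and leave them there.
    return [random.randbytes(CHUNK) for _ in range(total_bytes // CHUNK)]

def keep_touching(chunks: list[bytes]) -> None:
    # Re-reading one byte per page forces any swapped-out pages back in,
    # over and over; this is the access pattern that causes swap death.
    while True:
        for chunk in chunks:
            _ = sum(chunk[i] for i in range(0, len(chunk), PAGE))
        time.sleep(1)

chunks = allocate_and_ignore(40 * GIB)   # well beyond 16 GB of physical RAM
# keep_touching(chunks)                  # uncomment to behave like a "real" app
```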
Is there a way to enter "infinite" in this parameter?
I don't think so. As a practical matter, this boot arg mostly exists to let the kernel team experiment with different scenarios, so "infinite" isn't really all that useful or necessary. If you really wanted to test that scenario, you'd just pass in a number larger than available storage.
Or there is a maximum number, which can change from machine to machine?
I don't think so. I believe this is just one constraint among many, so if you pass in a "large enough" number then those other constraints (like available storage) will determine what actually happens. You can easily see the reverse of this today - if you fill up your drive enough, you'll quickly see that the system won't let you use 40 GB of memory.
If I am interpreting correctly the direction in which this parameter has to move in order to get the desired behaviour, I need to retrieve this number, not just compute it roughly via the known 4 KB size of a page and the capacity of the disk.
FYI, the page size today is actually 16KB, not 4KB.
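For what it's worth, both numbers in that rough calculation are easy to read from Python; this is only the back-of-the-envelope estimate discussed above, not a way of retrieving any actual kernel limit.

```python
import mmap
import shutil

page_size = mmap.PAGESIZE          # 16384 for a native arm64 Python on Apple silicon
disk = shutil.disk_usage("/")      # startup disk capacity / free space

print(f"page size : {page_size} bytes")
print(f"disk free : {disk.free} bytes")
print(f"pages needed to cover the free space: {disk.free // page_size}")
```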
I don't see why you'd need to be that specific. Honestly, I'd probably just pick a number and then use my test tool to see what happens. The main risks here are:
- A really small number rendering the machine unusable, due to a lack of "usable" memory. I don't think this is actually possible, but it’s easy to avoid by just picking a really big number.
- A big number creating increased risk of swap death due to excessive overcommit.
Both of those risks are "real"; however, they're also relatively easy to control for. Just minimize what you actually "do" until you figure out how the boot arg has altered the system’s behavior.
Say the maximum number is 1200 pages. From the documentation,
First off, just to be clear, this is well outside the "documented" system. It isn't really secret (after all, the code is open source), but I don't want to give you the impression that this is something I'm really recommending. Notably, this isn't something I would ever change on another person’s machine or in some kind of broad deployment. It WILL create problems that would otherwise not occur.
This is also why my answers below are somewhat vague - if you're not comfortable testing and experimenting with this yourself, then I'm not sure this is something you should be messing around with.
I am supposed to boot in recovery mode, disable SIP, and then run sudo nvram boot-args="vm_compressor_limit=1200" and then restart to make the changes effective.
Haven't tried it, but sure, that sounds right.
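If you do go down this path, one low-effort check after the restart is to read the boot-args back and confirm the setting stuck; the arg name here is simply the one proposed above, and reading it back says nothing about whether the kernel actually honours it.

```python
import subprocess

# `nvram boot-args` prints the variable and its current value once the
# change has taken effect; reading NVRAM does not require admin rights.
out = subprocess.run(
    ["nvram", "boot-args"],
    capture_output=True, text=True,
).stdout

if "vm_compressor_limit" in out:
    print("boot-arg is set:", out.strip())
else:
    print("boot-arg not found; output was:", out.strip() or "<empty>")
```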
Do I need to keep SIP disabled, or can I re-enable it after the changes make effect?
I don't know, that's something you'd need to test yourself.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware