Zsh kills Python process with plenty of available VM

On a MacBook Pro (16 GB RAM, 500 GB SSD, macOS Sequoia 15.7.1, M3 chip), I am running some python3 code in a conda environment that requires a lot of RAM. Sure enough, once physical memory is almost exhausted, swapfiles of about 1 GB each start being created, which I can see in /System/Volumes/VM. That folder has about 470 GB of available space at the start of the process (visible through Get Info). However, once about 40 or so swapfiles have been created, for a total of about 40 GB of virtual memory occupied (and thus still plenty of available space in VM), zsh kills the Python process responsible for the RAM usage (notably, it does not kill another Python process using only about 100 MB of RAM). The message received is "zsh: killed" in the tmux pane where the process's logging is printed.

All the documentation I was able to consult says that macOS is designed to use up to all available storage on the startup disk (which is the one I am using, since I have only one disk, and the aforementioned available space reflects this) for swapping when physical RAM is not enough. Then why is the process killed long before the swapping area is exhausted? In contrast, the same process on a Linux machine (basic Python venv here) just keeps swapping, and never gets killed until the swap area is exhausted.

One last note: I do not have administrator rights on this device, so I could not run dmesg to retrieve more precise information; I can only check with df -h how the swap area grows little by little. My employer's IT team confirmed that they do not mess with memory usage on managed profiles, so macOS is just doing its thing.

Thanks for any insight you can share on this issue. Is it a known bug (perhaps with conda/Python environments) or is it expected behaviour? Is there a way to keep the process from being killed?

All the documentation I was able to consult says that macOS is designed to use up to all available storage on the startup disk (which is the one I am using, since I have only one disk, and the aforementioned available space reflects this) for swapping when physical RAM is not enough.

Sure, that's what the system will do. Strictly speaking, it will actually start warning the user and then automatically terminating processes as it approaches "full", but it will basically use "all" available storage.

However...

Then why is the process killed long before the swapping area is exhausted?

...the fact that the system is willing to use "all" available storage doesn't mean that it should let any random process do that. Every process on the system has its own memory limit (both address space and used pages) enforced by the kernel. I'm not sure what the default limit is...

once about 40 or so...

...however, 40 GB doesn't seem like a terrible default. Keep in mind that the point of the default isn't simply to prevent the drive from filling up, but is really about enforcing "reasonable" behavior. Most processes never get anywhere CLOSE to using 40 GB of memory, so in practice, this limit is a lot closer to "how much memory will the system let a broken process pointlessly leak". From that perspective, 40 GB is extremely generous.

In terms of determining the exact size, os_proc_available_memory() will tell you how far from the limit you actually are and is much easier to use than task_info(). I think getrlimit()/setrlimit() (see the man page for more info) would also work, though raising the limit requires superuser privileges.
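If you want to experiment with the getrlimit()/setrlimit() route from inside the Python process itself, the standard resource module wraps those calls. A minimal sketch (the particular constants checked here are just illustrative, and not every platform defines all of them):

```python
import resource

# getrlimit() returns (soft, hard); resource.RLIM_INFINITY means "no explicit cap".
for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS"):
    limit = getattr(resource, name, None)
    if limit is None:          # not every platform exposes every constant
        continue
    soft, hard = resource.getrlimit(limit)
    print(f"{name}: soft={soft} hard={hard}")

# A process may raise its own soft limit up to the hard limit; pushing the
# hard limit itself higher is the part that needs elevated privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
resource.setrlimit(resource.RLIMIT_DATA, (hard, hard))
```

Note that getrlimit()/setrlimit() act on the process that calls them (and its future children), which is why a sketch like this has to run inside the heavy Python process rather than being pointed at a PID from the shell.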

Thanks for any insight you can share on this issue. Is it a known bug (perhaps with conda/Python environments) or is it expected behaviour?

It is very much expected behaviour.

In contrast, the same process on a Linux machine (basic Python venv here) just keeps swapping, and never gets killed until the swap area is exhausted.

Yes. Well, everyone has made choices they're not proud of.

Is there a way to keep the process from being killed?

The limit itself is raisable. Have you tried using "ulimit" in the shell? Aside from that, I'm not sure mapped files[1] are tracked through the same limit, so you might be able to map a 50 GB file even though the VM system wouldn't let you allocate 40 GB.

[1] In practice, mapped I/O is why hitting this limit isn't common. Most applications that want to interact with large amounts of RAM also have some interest in preserving whatever it is they're manipulating.
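To make the mapped-file route above concrete, here is a rough sketch of what it could look like from Python (the path and size are placeholders, and whether file-backed pages are charged against the same per-process limit is exactly the open question):

```python
import mmap

# Hypothetical backing file; pick a real path with enough free space.
path = "scratch.bin"
size = 50 * 1024**3              # 50 GB

with open(path, "w+b") as f:
    f.truncate(size)             # creates a sparse file; nothing is written yet
    with mmap.mmap(f.fileno(), size) as mm:
        # Touching pages faults them in on demand; dirty pages are paged back
        # to this file rather than to the swapfiles in /System/Volumes/VM.
        mm[0:11] = b"hello world"
        mm.flush()
```

The working set still competes for physical RAM, but the backing store is the file you chose rather than the system swap.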

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thank you so much for your reply; now I have a picture of what is going on. Could you also share how to use these functions? The only documentation I could find does not have examples. Say I have, among others, this process running, labelled python3 with PID 33238. I tried writing os_proc_available_memory() in my terminal (bash shell), and all I get is a > prompt awaiting input. The same happens with getrlimit and setrlimit. I also tried os_proc_available_memory(33238) and so on, but I get error messages. The documentation keeps mentioning 'the current process', but there are many; how do I run these functions relative to a specific ongoing process?
