recommendedMaxWorkingSetSize - is there a way to use all of our unified memory for GPU / Metal?

It seems like Apple Silicon devices (even M1/M2 Max) can only use a certain percentage of their total unified memory for GPU/Metal. This seems to be a limitation related to recommendedMaxWorkingSetSize.

Which is quite odd, because even M1 Mac minis and MacBook Airs run totally fine with 8GB of total memory for both the OS and GPU, so why limit this in the first place?

Also seems like false advertising to me from Apple by not clearly stating this limitation.

I am asking this in regards to the following open source project (but of course more software will be impacted by the same limitation): https://github.com/ggerganov/llama.cpp/pull/1826

Another resource I've found: https://developer.apple.com/videos/play/tech-talks/10580/?time=546

If anyone has any ideas on how these limitations can be overcome and how to get apps to use more memory for the GPU (Metal), I (and the open source community) would be truly grateful! Thanks in advance!

Replies

This is an ongoing issue. I believe Apple has a hard-coded limit of 75% of physical memory.

More on this here (fast forwarded to the relevant part)

An example table of the limits here:

This function will tell you how much you can actively use on your own machine
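
For reference, here is a minimal Swift sketch of reading that value, assuming the function in question is Metal's `recommendedMaxWorkingSetSize` property on `MTLDevice` (the one named in the thread title). The helpers `workingSetFraction` and `gib` are hypothetical names made up for this example, not Apple APIs. It prints the recommended working set next to the physical memory so you can see what share your own machine gets.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Hypothetical helper (not an Apple API): the share of physical memory that a
/// working-set size represents, as a fraction between 0 and 1.
func workingSetFraction(workingSet: UInt64, physicalMemory: UInt64) -> Double {
    guard physicalMemory > 0 else { return 0 }
    return Double(workingSet) / Double(physicalMemory)
}

/// Hypothetical helper: converts bytes to GiB for printing.
func gib(_ bytes: UInt64) -> Double {
    return Double(bytes) / 1_073_741_824
}

#if canImport(Metal)
if let device = MTLCreateSystemDefaultDevice() {
    // Total RAM installed in the machine, in bytes.
    let physical = ProcessInfo.processInfo.physicalMemory
    // Metal's recommended upper bound for GPU-resident memory, in bytes.
    let recommended = device.recommendedMaxWorkingSetSize
    let fraction = workingSetFraction(workingSet: recommended, physicalMemory: physical)

    print("Device: \(device.name)")
    print(String(format: "Physical memory: %.1f GiB", gib(physical)))
    print(String(format: "recommendedMaxWorkingSetSize: %.1f GiB", gib(recommended)))
    print(String(format: "Share of physical memory: %.0f%%", fraction * 100))
} else {
    print("No Metal device found")
}
#else
print("Metal is not available on this platform")
#endif
```

Saved as `main.swift`, this should run on a Mac with `swift main.swift`. The printed share can then be compared against the 75% figure mentioned above; treat that figure as something to verify on your own hardware rather than a given.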

Apple's solution is to break up the jobs, but apparently that's not an easy task with LLMs -- more on that here:

There is a hack that lets you adjust the VRAM/RAM split, but it unfortunately requires keeping SIP disabled.

Hopefully someone at Apple will find this and give us a means to adjust it ourselves.