Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets
Summary
On an M5 Pro running macOS 26.5 (25F71), the Metal driver AGXMetalG17X 351.2 fails command buffers with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) during LLM inference, with working sets as small as ~1.5 GB. The machine has 24 GB of unified memory, and Apple Diagnostics confirms the hardware is fully functional.
Multiple tools are affected: MLX aborts with the error above, llama.cpp (Metal backend) hangs, and native apps using Metal for inference are affected as well.
System
| Component | Value |
|---|---|
| Model | MacBook Pro (Mac17,9) |
| Chip | Apple M5 Pro (applegpu_g17s) |
| GPU Cores | 16 |
| RAM | 24 GB LPDDR5 |
| macOS | 26.5 (25F71) |
| Metal | Metal 4 |
| GPU Driver | AGXMetalG17X 351.2 |
| Xcode | 26.5 (17F42) |
Reproduction
MLX (Python)
pip install mlx mlx-lm
python -m mlx_lm.generate \
--model mlx-community/Qwen2.5-3B-Instruct-4bit \
--max-tokens 10 \
--prompt "Hello"
Expected: Normal text generation
Actual: Crash with:
libc++abi: terminating due to uncaught exception of type std::runtime_error:
[METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
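When collecting results across several models, it can help to pull the status code and symbol out of the captured stderr instead of comparing whole messages. The following is a minimal sketch, not part of MLX: `parse_metal_error` and `MetalError` are hypothetical names, the pattern is based only on the message shown above, and reading the 8-digit status as hexadecimal is an assumption.

```python
import re
from typing import NamedTuple, Optional


class MetalError(NamedTuple):
    code: int    # numeric status, parsed as hexadecimal (assumption)
    symbol: str  # e.g. kIOGPUCommandBufferCallbackErrorOutOfMemory


# Matches the "(00000008:kIOGPUCommandBufferCallbackError...)" suffix
# seen in the MLX error message above.
_PATTERN = re.compile(
    r"\(([0-9A-Fa-f]{8}):(kIOGPUCommandBufferCallbackError\w+)\)"
)


def parse_metal_error(stderr_text: str) -> Optional[MetalError]:
    """Return the first Metal command buffer error found, or None."""
    match = _PATTERN.search(stderr_text)
    if match is None:
        return None
    return MetalError(code=int(match.group(1), 16), symbol=match.group(2))


# The exact message from this report, used as sample input.
sample = (
    "libc++abi: terminating due to uncaught exception of type "
    "std::runtime_error:\n"
    "[METAL] Command buffer execution failed: Insufficient Memory "
    "(00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)"
)
print(parse_metal_error(sample))
```

Keying on the symbol rather than the human-readable text ("Insufficient Memory") should make the check less sensitive to wording changes between MLX versions.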
llama.cpp
brew install llama.cpp
llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99
Expected: Fast GPU generation
Actual: Process hangs indefinitely
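Because the llama.cpp failure is a hang rather than a crash, a timeout wrapper makes it reproducible in a script and easy to tell apart from a slow run. This is a sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper, the timeout value is arbitrary, and `model.gguf` is the same placeholder path as in the command above.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], timeout_s: float) -> str:
    """Run cmd and classify it as 'ok', 'failed (exit N)', or 'hang (>Ns)'."""
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # never block waiting for interactive input
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,         # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return f"hang (>{timeout_s:g}s)"
    if proc.returncode != 0:
        return f"failed (exit {proc.returncode})"
    return "ok"


# The llama.cpp invocation from this report, for use on the affected machine:
#   run_with_timeout(LLAMA_GPU, 120)
LLAMA_GPU = ["llama-cli", "--model", "model.gguf", "--prompt", "Hello",
             "--n-predict", "20", "--n-gpu-layers", "99"]

if __name__ == "__main__":
    # Stand-in child processes, so the sketch runs without llama.cpp installed.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(sleeper, 0.5))
    print(run_with_timeout([sys.executable, "-c", "pass"], 10))
```

Closing stdin is a deliberate choice: it rules out the case where the process is simply waiting for user input, so a timeout here points at the GPU path rather than at the terminal.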
Test Results
| Tool | Model | Peak Memory | Result |
|---|---|---|---|
| MLX | Qwen2.5-0.5B-4bit | 0.36 GB | ✅ Works |
| MLX | Qwen2.5-1.5B-4bit | 0.98 GB | ✅ Works |
| MLX | Qwen3-1.7B-4bit | 1.01 GB | ✅ Works |
| MLX | Qwen2.5-3B-4bit | ~1.5 GB | ❌ Metal OOM crash |
| MLX | Qwen3-4B-4bit | ~2.1 GB | ❌ Metal OOM crash |
| MLX | Qwen3-8B-4bit | ~4.5 GB | ❌ Metal OOM crash |
| llama.cpp | Qwen2.5-0.5B GGUF | ~0.5 GB | ❌ Hangs with GPU |
| llama.cpp | Qwen2.5-0.5B GGUF | ~0.5 GB | ✅ Works with CPU only |
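The MLX rows suggest a failure threshold somewhere between the largest working model and the smallest failing one. A small sketch that derives that bracket from the numbers reported above (values marked "~" in the results are approximate, and `failure_bracket` is a hypothetical helper):

```python
# (model, peak memory in GB, works) for the MLX results in this report.
MLX_RESULTS = [
    ("Qwen2.5-0.5B-4bit", 0.36, True),
    ("Qwen2.5-1.5B-4bit", 0.98, True),
    ("Qwen3-1.7B-4bit", 1.01, True),
    ("Qwen2.5-3B-4bit", 1.5, False),
    ("Qwen3-4B-4bit", 2.1, False),
    ("Qwen3-8B-4bit", 4.5, False),
]


def failure_bracket(results):
    """Return (largest passing peak, smallest failing peak) in GB."""
    largest_ok = max(gb for _, gb, ok in results if ok)
    smallest_bad = min(gb for _, gb, ok in results if not ok)
    return largest_ok, smallest_bad


low, high = failure_bracket(MLX_RESULTS)
print(f"MLX failures begin between {low} GB and {high} GB peak memory")
```

This bracket applies to MLX only. The llama.cpp GPU run hangs at ~0.5 GB, below every failing MLX model, so the two tools may be hitting the problem through different paths.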
Key Evidence
Hardware is healthy — Apple Diagnostics passed all tests
Basic Metal works — matmul, array ops work fine
CPU inference works — llama.cpp with -ngl 0 runs correctly
The error does not appear to reflect actual memory exhaustion. The system reports 17.76 GB available for the Metal working set, while the smallest failing workload peaks at ~1.5 GB, so kIOGPUCommandBufferCallbackErrorOutOfMemory here seems to mean that the kernel rejected the Metal memory commit, not that physical memory was full.
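To put the last point in numbers, the sketch below computes how much of the reported limit the smallest failing workload uses, and makes a best-effort attempt to read the limit that MLX itself reports. The `mx.metal.device_info()` call and its `max_recommended_working_set_size` key are assumed to be available in the installed MLX version; the helper names are hypothetical, and the query returns `None` anywhere MLX or Metal is unavailable.

```python
from typing import Optional

GIB = 1024 ** 3


def mlx_working_set_limit_bytes() -> Optional[int]:
    """Best-effort query of the Metal working-set limit as reported by MLX.

    Assumes mx.metal.device_info() exists in the installed MLX version;
    returns None if MLX is missing or the call fails.
    """
    try:
        import mlx.core as mx
        info = mx.metal.device_info()
        return int(info["max_recommended_working_set_size"])
    except Exception:
        return None


def fraction_of_limit(peak_gb: float, limit_gb: float) -> float:
    """Share of the working-set limit used by a given peak."""
    return peak_gb / limit_gb


# Figures from this report: ~1.5 GB peak against 17.76 GB available.
print(f"{fraction_of_limit(1.5, 17.76):.1%} of the reported limit")

limit = mlx_working_set_limit_bytes()
if limit is None:
    print("MLX/Metal not available; using the figures from the report only")
else:
    print(f"MLX-reported limit: {limit / 1e9:.2f} GB ({limit / GIB:.2f} GiB)")
```

With the figures above, the failing workload uses well under a tenth of the reported limit. Printing both GB and GiB avoids ambiguity about which unit the 17.76 figure is expressed in.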
Crash Log Extract
Thread 31 Crashed:
0 libsystem_kernel.dylib __pthread_kill + 8
1 libsystem_pthread.dylib pthread_kill + 296
2 libsystem_c.dylib abort + 148
3 Metal MTLReportFailure.cold.1 + 48
4 Metal MTLReportFailure + 576
5 Metal -[_MTLCommandBuffer addCompletedHandler:] + 104
...
Exception Type: EXC_CRASH (SIGABRT)
Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6
Related Issues
ml-explore/mlx#3586 — Metal compiler regression on macOS 26.5
ml-explore/mlx#3534 — M5 float32 precision issue
ml-explore/mlx#3568 — M5 random divergence
ml-explore/mlx#3539 — Metal residency OOM (M4 Max)
Request
Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available).
Happy to provide full crash logs, sysdiagnose archives, or run additional tests.