-
MLX를 사용하여 Mac에서 로컬 에이전틱 AI 실행하기
개인정보 보호, 짧은 지연 시간, 오프라인 접근으로 AI 에이전트를 로컬로 실행하세요. MLX의 혁신 기술과 Mac 하드웨어가 어떻게 강력한 에이전틱 워크플로를 완전한 온디바이스 방식으로 구현할 수 있는지 자세히 알아보세요. OpenCode와 같은 코드 에이전트와 이들이 Xcode에 통합되는 방식을 살펴보고, 여러 Mac 확장을 위한 다양한 기법을 알아보며, Mac을 벗어나지 않고 도구를 원활하게 통합하는 방법을 확인해 보세요.
챕터
- 0:00 - Introduction
- 0:32 - The chat and agentic loop
- 2:42 - Local agentic AI stack
- 4:36 - Setting up your own agent
- 5:39 - Making agents fast
- 6:53 - Concurrency and distributed inference
- 9:20 - More examples
- 13:01 - Next steps
리소스
- MLX Swift LM on GitHub
- MLX Swift Examples
- MLX Examples
- MLX Swift
- MLX LM - Python API
- MLX Explore - Python API
- MLX Framework
- MLX
관련 비디오
WWDC26
WWDC25
-
비디오 검색…
-
-
4:40 - Set up MLX-LM and start the local server
# Step 1: Install MLX-LM pip install mlx-lm # Step 2: Start the server mlx_lm.server --model mlx-community/Qwen-3.5-4B-8bit # Step 3: Point your agent to the server curl -X POST \ http://127.0.0.1:8080/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model":"default_model","messages":[{"role":"user","content":"Hello!"}]}' -
5:18 - Configure an agent to use your local MLX server
{ "$schema": "https://opencode.ai/config.json", "model": "mlx/default_model", "small_model": "mlx/default_model", "provider": { "mlx": { "npm": "@ai-sdk/openai-compatible", "name": "MLX (local)", "options": { "baseURL": "http://127.0.0.1:8080/v1" }, "models": { "default_model": { "name": "Default MLX Model" } } } } } -
8:33 - Launch distributed inference with MLX
mlx.launch --hostfile hosts.json \ --backend jaccl \ /remote/path/to/mlx_lm.server \ --model mlx-community/Qwen-3.5-122B-A3B-8bit
-
-
- 0:00 - Introduction
Overview of building and running agentic AI workflows entirely on Mac using MLX — no cloud, no API keys, just your hardware.
- 0:32 - The chat and agentic loop
How traditional chat differs from the agentic loop: the model decides what to do, calls tools to run commands, read files, and hit APIs, observes the results, and iterates — all running locally for privacy and offline availability.
- 2:42 - Local agentic AI stack
A walkthrough of the four-layer stack powering local agentic AI on the Mac: MLX (array framework for Apple Silicon), MLX-LM (model loading, quantization, and fine-tuning), MLX-LM Server (OpenAI-compatible HTTP server), and the agent layer — including popular tools like Ollama, LM Studio, and vLLM.
- 4:36 - Setting up your own agent
Three steps to go from zero to a fully local agentic workflow: install MLX-LM with pip, start the server with a tool-calling model, and configure your agent to point at the local endpoint.
- 5:39 - Making agents fast
How MLX tackles the first challenge of agentic workloads — efficiently processing large contexts with hundreds of thousands of tokens — including how M5 Neural Accelerators accelerate prompt processing speed.
- 6:53 - Concurrency and distributed inference
How MLX handles continuous batching for concurrent multi-agent requests, and distributed inference to spread large models across multiple Macs over Thunderbolt.
- 9:20 - More examples
Two-part live demo building SwiftUI apps entirely on-device. First, using OpenCode with MLX to generate a complete SwiftUI project from a description; then, using Xcode's agentic coding capabilities to build and fix a SwiftUI app — all running locally.
- 13:01 - Next steps
Summary of the full local AI stack and practical steps to get started: install MLX-LM, launch the server, and connect your agent. All shown tools are open-source and available now.