Run agentic AI locally on Mac using MLX
Run AI agents locally with privacy, low latency, and offline access. Discover how advances in MLX and Mac hardware make powerful agentic workflows possible entirely on device. You'll explore coding agents such as OpenCode, see how they integrate with Xcode, learn techniques for scaling capacity across multiple Macs, and find out how to integrate tools seamlessly without leaving your machine.
Chapters
- 0:00 - Introduction
- 0:32 - The chat and agentic loop
- 2:42 - Local agentic AI stack
- 4:36 - Setting up your own agent
- 5:39 - Making agents fast
- 6:53 - Concurrency and distributed inference
- 9:20 - More examples
- 13:01 - Next steps
Resources
- MLX Swift LM on GitHub
- MLX Swift Examples
- MLX Examples
- MLX Swift
- MLX LM - Python API
- MLX Explore - Python API
- MLX Framework
- MLX
Related videos
WWDC26
- Explore numerical computing in Swift with MLX
- Explore distributed inference and training with MLX
WWDC25
4:40 - Set up MLX-LM and start the local server
```shell
# Step 1: Install MLX-LM
pip install mlx-lm

# Step 2: Start the server
mlx_lm.server --model mlx-community/Qwen-3.5-4B-8bit

# Step 3: Point your agent to the server
curl -X POST \
  http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default_model","messages":[{"role":"user","content":"Hello!"}]}'
```
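The curl call in Step 3 can also be made from a script. The following is a minimal sketch using only Python's standard library, assuming the server from Step 2 is listening on the address used in the curl command (`http://127.0.0.1:8080`). `build_chat_request` and `extract_reply` are hypothetical helper names, and the parsing assumes the OpenAI-style `choices[0].message.content` response shape that an OpenAI-compatible server returns.

```python
import json
import urllib.request

# Address of the server started in Step 2 (same as in the curl command).
BASE_URL = "http://127.0.0.1:8080/v1"


def build_chat_request(prompt: str, model: str = "default_model") -> urllib.request.Request:
    """Build the same POST request that the curl command in Step 3 sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]
```

With the server running, `extract_reply(urllib.request.urlopen(build_chat_request("Hello!")).read())` should return the model's reply. Building the request separately from sending it keeps the payload easy to inspect and test without a live server.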
5:18 - Configure an agent to use your local MLX server
```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "mlx/default_model",
  "small_model": "mlx/default_model",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "default_model": {
          "name": "Default MLX Model"
        }
      }
    }
  }
}
```
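An agent pointed at this endpoint runs the agentic loop from the 0:32 chapter: the model decides on an action, the agent executes the requested tool, appends the result to the conversation, and asks the model again until it answers without tool calls. The following is a minimal sketch of that loop, assuming the OpenAI-style `tool_calls` message format. `run_agent` and the `read_file` tool are hypothetical, `call_model` stands in for a request to the local server, and this does not show how OpenCode itself implements its loop.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Tool = Callable[..., str]


def read_file(path: str) -> str:
    """Hypothetical example tool: return the contents of a text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()


TOOLS: Dict[str, Tool] = {"read_file": read_file}


def run_agent(task: str, call_model: Callable[[List[Message]], Message],
              tools: Dict[str, Tool], max_steps: int = 10) -> str:
    """Decide, act, observe, repeat until the model answers without tool calls."""
    messages: List[Message] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)  # e.g. a chat-completions request to the local server
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""  # no further actions: final answer
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            try:
                result = tools[name](**args)
            except Exception as err:  # report failures so the model can adjust
                result = f"error: {err}"
            # Feed the observation back as a tool message tied to the call id.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("agent did not finish within max_steps")
```

Passing `call_model` in as a function keeps the loop independent of the transport. A scripted stand-in is enough to exercise the loop without a server, and the `max_steps` cap keeps a confused model from looping forever.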
8:33 - Launch distributed inference with MLX
```shell
mlx.launch --hostfile hosts.json \
  --backend jaccl \
  /remote/path/to/mlx_lm.server \
  --model mlx-community/Qwen-3.5-122B-A3B-8bit
```
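Why spread a model across Macs? Here is a back-of-the-envelope estimate. At 8 bits per weight, each parameter takes one byte, so the 122B-parameter model in the command above needs roughly 122 GB for its weights alone, while the 4B model from the setup step needs about 4 GB. Under Qwen's naming convention, "A3B" suggests about 3B parameters active per token, which helps speed, but all weights still have to be held in memory. The sketch assumes an even split of the weights across machines and ignores quantization scales, the KV cache, and runtime overhead, so real memory use is higher. The function names are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def per_machine_gb(num_params: float, bits_per_weight: float, num_machines: int) -> float:
    """Weight memory per Mac, assuming the weights are split evenly (an assumption)."""
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    return weight_memory_gb(num_params, bits_per_weight) / num_machines


if __name__ == "__main__":
    # 8-bit weights: one byte per parameter (quantization scales and overhead ignored).
    print(f"4B model: ~{weight_memory_gb(4e9, 8):.1f} GB")
    for machines in (1, 2, 4):
        gb = per_machine_gb(122e9, 8, machines)
        print(f"122B model on {machines} Mac(s): ~{gb:.1f} GB of weights per Mac")
```

Running it prints about 122.0, 61.0, and 30.5 GB per Mac for one, two, and four machines. This shows why a model of this size is a natural candidate for distributed inference.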
- 0:00 - Introduction
An overview of building and running agentic AI workflows entirely on a Mac with MLX: no cloud, no API keys, just your hardware.
- 0:32 - The chat and agentic loop
How traditional chat differs from the agentic loop: the model decides what to do, calls tools to run commands, read files, and call APIs, observes the results, and iterates, all running locally for privacy and offline availability.
- 2:42 - Local agentic AI stack
A walkthrough of the four-layer stack that powers local agentic AI on the Mac: MLX (the array framework for Apple Silicon), MLX-LM (model loading, quantization, and fine-tuning), MLX-LM Server (an OpenAI-compatible HTTP server), and the agent layer, along with popular tools such as Ollama, LM Studio, and vLLM.
- 4:36 - Setting up your own agent
Three steps to go from zero to a fully local agentic workflow: install MLX-LM with pip, start the server with a tool-calling model, and configure your agent to point at the local endpoint.
- 5:39 - Making agents fast
How MLX tackles the first challenge of agentic workloads, efficiently processing large contexts of hundreds of thousands of tokens, including how the Neural Accelerators in M5 speed up prompt processing.
- 6:53 - Concurrency and distributed inference
How MLX uses continuous batching to serve concurrent requests from multiple agents, and distributed inference to spread large models across multiple Macs connected over Thunderbolt.
- 9:20 - More examples
A two-part live demo that builds SwiftUI apps entirely on device. First, OpenCode with MLX generates a complete SwiftUI project from a description; then Xcode's agentic coding capabilities build and fix a SwiftUI app, all running locally.
- 13:01 - Next steps
Summary of the full local AI stack and practical steps to get started: install MLX-LM, launch the server, and connect your agent. All shown tools are open-source and available now.
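Continuous batching, mentioned in the 6:53 chapter, lets a new request join a running batch as soon as a slot frees up, instead of waiting for the slowest request in the batch to finish. The toy simulation below illustrates only the scheduling idea and assumes that each decode step emits one token per active request. It ignores prompt processing, memory limits, and variable step cost, and it is not MLX's implementation. The function names and token counts are hypothetical.

```python
from collections import deque
from typing import Dict, List


def _validate(lengths: List[int], max_batch: int) -> None:
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    if any(n < 1 for n in lengths):
        raise ValueError("every request needs at least one decode step")


def continuous_batching(lengths: List[int], max_batch: int) -> Dict[int, int]:
    """Return the decode step at which each request finishes.

    After every step, finished requests leave the batch and queued
    requests take the freed slots before the next step runs.
    """
    _validate(lengths, max_batch)
    queue = deque(range(len(lengths)))
    remaining: Dict[int, int] = {}
    finished_at: Dict[int, int] = {}
    step = 0
    while queue or remaining:
        # Fill free slots from the queue before the next batched step.
        while queue and len(remaining) < max_batch:
            i = queue.popleft()
            remaining[i] = lengths[i]
        step += 1  # one forward pass: one new token per active request
        for i in list(remaining):
            remaining[i] -= 1
            if remaining[i] == 0:
                del remaining[i]
                finished_at[i] = step
    return finished_at


def static_batching(lengths: List[int], max_batch: int) -> Dict[int, int]:
    """Baseline: the next batch starts only after every request in the current one is done."""
    _validate(lengths, max_batch)
    finished_at: Dict[int, int] = {}
    step = 0
    for start in range(0, len(lengths), max_batch):
        batch = range(start, min(start + max_batch, len(lengths)))
        for i in batch:
            finished_at[i] = step + lengths[i]
        step += max(lengths[i] for i in batch)
    return finished_at


if __name__ == "__main__":
    lengths = [8, 2, 2, 2]  # hypothetical token counts for four concurrent agent requests
    print("continuous:", continuous_batching(lengths, max_batch=2))
    print("static:    ", static_batching(lengths, max_batch=2))
```

With `lengths = [8, 2, 2, 2]` and `max_batch=2`, the continuous scheduler finishes requests 2 and 3 at steps 4 and 6. The static baseline makes both wait behind the long request 0 and finishes them at step 10.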