For assistants that need multi-step tool use (search → fetch → compare → respond), should third-party apps expose capabilities as App Intents for on-device model selection, or keep tool orchestration on the server and use on-device models only for speech and summarization? What breaks when the same action exists in both places?
The on-device model supports tool calling. I'd recommend using the Evaluations framework to determine whether a server model or an on-device model works best for your use case: https://developer.apple.com/documentation/evaluations
Here's a session introducing the framework: https://developer.apple.com/videos/play/wwdc2026/298/
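As a rough illustration of the comparison suggested above, here is a minimal, framework-neutral sketch in plain Python. It does not use the Evaluations framework API, which isn't shown in this thread. Every name in it (`Case`, `score`, `stub_backend_a`, `stub_backend_b`) is hypothetical, and the two stubs return canned tool sequences rather than real model output, so the numbers say nothing about how either model actually performs. The idea is only to run the same prompts through both orchestration paths and check whether each one produces the expected search → fetch → compare → respond sequence.

```python
from dataclasses import dataclass
from typing import Callable, List

# A trajectory is the ordered list of tool names a backend chose to call.
Trajectory = List[str]

# Assumed target sequence, taken from the multi-step flow in the question.
EXPECTED: Trajectory = ["search", "fetch", "compare", "respond"]


@dataclass
class Case:
    """One evaluation case: a user prompt and the tool sequence we expect."""
    prompt: str
    expected: Trajectory


def score(backend: Callable[[str], Trajectory], cases: List[Case]) -> float:
    """Return the fraction of cases where the backend's tool sequence
    exactly matches the expected one (0.0 for an empty case list)."""
    if not cases:
        return 0.0
    hits = sum(1 for case in cases if backend(case.prompt) == case.expected)
    return hits / len(cases)


def stub_backend_a(prompt: str) -> Trajectory:
    """Hypothetical stand-in for one orchestration path (for example, the
    on-device model). Canned behavior: drops the compare step on prompts
    longer than eight words. Replace with a real call."""
    steps = list(EXPECTED)
    if len(prompt.split()) > 8:
        steps.remove("compare")
    return steps


def stub_backend_b(prompt: str) -> Trajectory:
    """Hypothetical stand-in for the other path (for example, server-side
    orchestration). Canned behavior: always returns the full sequence."""
    return list(EXPECTED)


CASES = [
    Case("Find the cheapest flight to Lisbon", EXPECTED),
    Case(
        "Compare the two most recent reviews of this laptop "
        "and tell me which one is more positive overall",
        EXPECTED,
    ),
]

if __name__ == "__main__":
    print("backend A:", score(stub_backend_a, CASES))
    print("backend B:", score(stub_backend_b, CASES))
```

Exact-match scoring on the tool sequence is a deliberately strict choice. If the order of some steps doesn't matter for your assistant, a looser comparison (for example, checking only that required tools were called) may be more appropriate.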