JUST ENDED

Foundation Models Q&A

Connect with Apple engineers in the Foundation Models Q&A on the Apple Developer Forums.

Custom vocabulary for speech and entity resolution
Whisper and other STT APIs let you pass a custom vocabulary or initial_prompt to bias recognition toward domain-specific proper nouns. In the App Intents / Siri stack, is there an equivalent way to supply dynamic, per-user term lists — for example favorites or recently used items — to improve how spoken names are transcribed or resolved?
Replies: 1 | Boosts: 0 | Views: 25 | Activity: 1h

What is the proper way to intercept tool calls, modify them, or dynamically approve/reject them?
What is the proper way to intercept tool calls, modify them, or dynamically approve or reject them?
Replies: 4 | Boosts: 0 | Views: 93 | Activity: 12h

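The question above is about gating tool calls. A minimal sketch of one common pattern follows: a decorator that inspects each call's arguments and then approves, rewrites, or rejects the call before the real tool runs. It is plain Swift. `SimpleTool`, `GatedTool`, and the path-rewrite rule are hypothetical stand-ins, not the Foundation Models `Tool` protocol or any framework-provided interception hook, and a real tool would typically be `async`.

```swift
// Hypothetical stand-in for a tool type; NOT the Foundation Models `Tool` protocol.
protocol SimpleTool {
    var name: String { get }
    func call(arguments: [String: String]) throws -> String
}

// Outcome of inspecting a pending tool call.
enum ToolCallDecision {
    case approve
    case modify([String: String])
    case reject(reason: String)
}

struct ToolCallRejected: Error, Equatable {
    let toolName: String
    let reason: String
}

// Decorator: every call passes through `policy` before the wrapped tool runs.
struct GatedTool: SimpleTool {
    let wrapped: any SimpleTool
    let policy: (String, [String: String]) -> ToolCallDecision

    var name: String { wrapped.name }

    func call(arguments: [String: String]) throws -> String {
        switch policy(name, arguments) {
        case .approve:
            return try wrapped.call(arguments: arguments)
        case .modify(let newArguments):
            return try wrapped.call(arguments: newArguments)
        case .reject(let reason):
            throw ToolCallRejected(toolName: name, reason: reason)
        }
    }
}

// Example tool used only to demonstrate the gate (it does not touch the file system).
struct DeleteFileTool: SimpleTool {
    let name = "deleteFile"
    func call(arguments: [String: String]) throws -> String {
        "deleted \(arguments["path"] ?? "<none>")"
    }
}

// Hypothetical policy: reject protected paths, expand "~", approve everything else.
let gated = GatedTool(wrapped: DeleteFileTool()) { _, args in
    guard let path = args["path"] else { return .reject(reason: "missing path") }
    if path.hasPrefix("/System") { return .reject(reason: "protected location") }
    if path.hasPrefix("~") { return .modify(["path": "/Users/me" + String(path.dropFirst())]) }
    return .approve
}
```

Throwing from the gate surfaces the rejection to whatever invoked the tool; this sketch makes no assumption about whether a framework would pass that error back to the model or end the request.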
Framework Boundaries
Given that Foundation Models focus on native Swift APIs, is there any supported bridge for a WebKit-based app to access the Language Model Protocol?
Replies: 1 | Boosts: 0 | Views: 33 | Activity: 23h

Siri As Coding Agent
In the new Xcode we saw examples of Claude, OpenAI, and Google coding agents that you can start conversations with inside your project, giving them access to your project files as context. As far as I understand, this requires an API key for those models, and the processing runs on Anthropic, OpenAI, or Google servers, not locally or on Private Cloud Compute. Is it possible to instead use the LLM powering Foundation Models for a “Siri Code Agent” that operates in place of those models but runs on device or in Private Cloud Compute? I like how this works for Siri AI requests, and would love to have a coding assistant agent that operates in the same privacy-preserving way! Is this possible with any of the open source frameworks or the command line tools? If not, what is the best way to request this feature?
Replies: 1 | Boosts: 1 | Views: 44 | Activity: 1d

Improved Guardrails Error Handling
I work on an app called one sec, which helps people reduce the time they spend on social media by interrupting app openings with Shortcuts automations. With the release of iOS 26, we added a new Conversational Reflection interruption, a feature backed by Foundation Models, in which the user talks through their reasons for wanting to use social media. A significant fraction of our users have ADHD or ADD or struggle with their mental health, so we try to show crisis support banners in our conversation UI. We do this with structured output, asking the language model whether the user appears to be in deep distress. However, the guardrails are often triggered and we don’t receive a response from the language model, which means we can’t support our users and show them relevant resources when needed. We’d love to see more specific guardrail errors introduced in the framework so we can better support our users. Here’s a radar with further details: FB20828230. Thank you!
Replies: 1 | Boosts: 0 | Views: 33 | Activity: 1d

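For the guardrails post, one conservative app-side policy, while more specific errors are unavailable, is to treat a safety block on this kind of conversation as a reason to show the support banner rather than hide it. The sketch assumes the app maps framework failures (for example the Foundation Models `LanguageModelSession.GenerationError.guardrailViolation` case) into its own hypothetical `DistressCheckResult` type. The mapping step is omitted, and whether this policy fits a given app is a product decision, not something the framework prescribes.

```swift
// Hypothetical app-side result of asking the model whether the user seems distressed.
enum DistressCheckResult {
    case assessed(isInDistress: Bool)  // the structured output came back
    case blockedBySafetyFilter         // e.g. mapped from a guardrail-violation error
    case unavailable                   // model unavailable, timeout, or other failure
}

enum BannerPolicy {
    case hide, show
}

// Fail-safe policy: a blocked response never silently suppresses support resources.
func crisisBanner(for result: DistressCheckResult) -> BannerPolicy {
    switch result {
    case .assessed(let isInDistress):
        return isInDistress ? .show : .hide
    case .blockedBySafetyFilter:
        // Assumption: in this conversation type, a safety block is itself a signal worth acting on.
        return .show
    case .unavailable:
        // No signal either way; this sketch keeps the UI unchanged.
        return .hide
    }
}
```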
Confirmation, permissions, and reversibility for agentic actions
Apple demonstrated agentic behavior (e.g., the Passwords app changing credentials on the user's behalf), and Siri AI can now take systemwide actions in apps. Is there a first-class confirmation API for App Intents — a way to mark an action as requiring explicit user approval before execution, with a standard confirmation surface — or must developers build their own confirmation UI inside the intent? For irreversible or high-impact actions, what is Apple's recommended pattern to prevent the model from executing them autonomously, and can an intent declare a risk/sensitivity level the system respects? When Siri AI invokes an action, what authentication/authorization context is available to the intent (biometric gate, user-presence assertion), and how should an app require step-up auth for sensitive operations? Is there a supported audit trail for actions taken via Siri AI on the user's behalf, so an app can show the user what was done and when? How does the system handle an action that fails or partially completes during an agentic, multi-step flow?
Replies: 1 | Boosts: 1 | Views: 42 | Activity: 1d

Can the SpotlightSearchTool work with a custom model executor?
When SpotlightSearchTool is used with a custom LanguageModel backend (for example Apple’s ChatCompletionsLanguageModel from apple/foundation-models-utilities, pointed at an OpenAI-compatible server), the tool can never be invoked successfully. The model produces tool-call arguments that exactly match the format documented in the tool’s own description, but those arguments fail validation against the tool’s generated parameters JSON Schema, throwing LanguageModelSession.ToolCallError with the underlying error “Failed to parse generated content.” The root cause is a mismatch between two things the framework sends to the model in the same tool definition: the human-readable description (“Call format”), which presents the top-level arguments as { root, modelComposition, … }, and the parameters JSON Schema (FullArguments), which requires { "query": { "type": "search", "value": { root, modelComposition, … } } }. A model that follows the description is guaranteed to fail the schema. Secondary observation (possibly a separate issue, or intended): CoreSpotlightSource.fetchAttributes appears to have no effect on which attributes are returned to the model on this agentic-search path. Even with fetchAttributes: [.title, .contentDescription] set on the source, results contain only default metadata (kMDItemTitle, kMDItemDisplayName, dates, identifiers) and omit kMDItemDescription. The description is returned only when the in-query SearchArguments.fetchAttributes explicitly lists it. The searchableIndexDelegate was never invoked in any configuration tried (including .dynamic). If the source-level fetchAttributes is meant to drive the returned attributes, that also seems incorrect; otherwise, clarifying the docs would help. So my question is: is this simply not supported, or does the schema need an update? Or is there a different way this should be done?
Replies: 1 | Boosts: 0 | Views: 33 | Activity: 1d

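The shape mismatch described in the SpotlightSearchTool post can be reproduced with plain `Codable` types. Here `FullArguments` is a hand-written mirror of the shape quoted in the post, not the framework's generated type, and plain strings stand in for the unspecified contents of `root` and `modelComposition` (the other "…" fields are omitted).

```swift
import Foundation

// Placeholder payload: real field types are not given in the post.
struct SearchPayload: Decodable {
    let root: String
    let modelComposition: String
}

// Mirror of the schema's wrapper: { "query": { "type": "search", "value": { ... } } }
struct FullArguments: Decodable {
    struct Query: Decodable {
        let type: String
        let value: SearchPayload
    }
    let query: Query
}

// Arguments shaped the way the human-readable "Call format" describes them.
let followsDescription = Data(#"{"root": "r", "modelComposition": "m"}"#.utf8)

// Arguments shaped the way the parameters JSON Schema requires.
let followsSchema = Data(#"{"query": {"type": "search", "value": {"root": "r", "modelComposition": "m"}}}"#.utf8)

// True when the payload decodes against the schema-shaped type.
func validates(_ data: Data) -> Bool {
    (try? JSONDecoder().decode(FullArguments.self, from: data)) != nil
}
```

`validates(followsDescription)` is `false` (the top-level `query` key is missing) and `validates(followsSchema)` is `true`, which matches the failure mode the post reports.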
RAG boundary: static knowledge vs live data
Should static domain documentation live in on-device RAG (local embeddings + FM), while time-sensitive data always comes from network tools — and are there practical size/latency budgets for on-device embedding indexes?
Replies: 1 | Boosts: 0 | Views: 33 | Activity: 1d

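For the RAG sizing question, a back-of-envelope estimate for a flat (brute-force) embedding index is storage = chunks × dimensions × bytes per component, and one exhaustive query costs chunks × dimensions multiply-adds. The counts used below (10,000 chunks, 512 dimensions) are illustrative assumptions, not Apple figures. They give 20,480,000 bytes (about 19.5 MiB) in Float32 and half that in Float16, before any metadata or index-structure overhead.

```swift
// Size/compute budget for a flat on-device embedding index.
// Assumptions: dense vectors, no quantization, no ANN index structure.
struct IndexBudget {
    let chunkCount: Int
    let dimensions: Int
    let bytesPerComponent: Int  // 4 for Float32, 2 for Float16

    // Raw storage for the vectors alone.
    var vectorBytes: Int { chunkCount * dimensions * bytesPerComponent }

    // Multiply-adds for one exhaustive similarity scan.
    var multiplyAddsPerQuery: Int { chunkCount * dimensions }
}

// Exhaustive top-k by dot product (equal to cosine similarity for unit-length vectors).
func topK(_ k: Int, query: [Float], index: [[Float]]) -> [Int] {
    let scored: [(offset: Int, score: Float)] = index.enumerated().map { pair in
        var dot: Float = 0
        for i in 0..<min(query.count, pair.element.count) {
            dot += query[i] * pair.element[i]
        }
        return (offset: pair.offset, score: dot)
    }
    return scored.sorted { $0.score > $1.score }.prefix(k).map { $0.offset }
}

let float32 = IndexBudget(chunkCount: 10_000, dimensions: 512, bytesPerComponent: 4)
let float16 = IndexBudget(chunkCount: 10_000, dimensions: 512, bytesPerComponent: 2)
```

Real latency depends on the device and on whether the scan is vectorized, so the multiply-add count is only a relative measure for comparing index sizes.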
LLM search using Core Spotlight
If your app creates an App Entity that conforms to an Apple Intelligence schema, Siri AI can only reason over the schema-defined properties (see this thread). But as a developer, I can add more optional properties to my App Entity with additional metadata about the entity. If my app contributes these App Entities to Spotlight as indexed entities, is SpotlightSearchTool also limited to reasoning over just the schema-defined properties, or are these unrelated concepts? Will these additional optional properties on my App Entity enable a deeper, SpotlightSearchTool-powered search experience around these entities?
Replies: 2 | Boosts: 0 | Views: 117 | Activity: 1d

Tool calling: App Intents vs server-side orchestration
For assistants that need multi-step tool use (search → fetch → compare → respond), should third-party apps expose capabilities as App Intents for on-device model selection, or keep tool orchestration on the server and use on-device models only for speech and summarization? What breaks when the same action exists in both places?
Replies: 1 | Boosts: 0 | Views: 27 | Activity: 1d

Image size, format, and background vs other VLMs
Different VLMs support different image sizes and different background colors when padding is needed, and the iOS 27 AFM is the most flexible. The previous talk mentioned that context size suffers for this flexibility. So what’s the best format, size, and background for the app to pre-process images into, to minimize token use? Many thanks!
Replies: 0 | Boosts: 0 | Views: 11 | Activity: 1d

Visual Intelligence and screen/camera understanding for third-party apps
Visual Intelligence lets users ask Siri about what the camera or screen shows, and the screenshot tool can extract structured data into system apps. Can a third-party app contribute results or actions when the user invokes Visual Intelligence over the app's own content or a screenshot of it (analogous to how a schedule becomes calendar events), and what API surfaces that? For the Image Playground API, what are the content, rate, and style constraints, and can generated assets be used in commercial app contexts? Is there a supported way for an app to provide its own visual understanding to the system rather than relying solely on Apple's model — for domain-specific imagery the on-device model may not recognize?
Replies: 1 | Boosts: 0 | Views: 25 | Activity: 1d

Time Series Models
The Foundation Models framework is clearly designed around language, but there's a large class of on-device AI tasks that are not language tasks at all. Time series forecasting is one example: think energy consumption modeling or sensor anomaly detection. These models take sequences of numeric data and output probabilistic forecasts, with no text involved at any layer. Is there any intention to extend Foundation Models, or a sibling framework, to non-language modalities, specifically structured numeric and time series inference?
Replies: 1 | Boosts: 1 | Views: 28 | Activity: 1d

On Advanced Context Management
When using the 'Summarize History' modifier, can we configure the summarization prompt to specifically preserve certain metadata like tool call IDs so that a resumed conversation can still reference previously executed app actions?
Replies: 2 | Boosts: 0 | Views: 41 | Activity: 1d

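On the history-summarization question, one app-side approach that does not depend on the summarization prompt at all is to carry tool-call IDs forward as structured data next to the summary text. `HistoryEntry` and `compact(_:keepLast:summarize:)` are hypothetical, not the framework's transcript types or the 'Summarize History' modifier, and the `summarize` closure stands in for whatever model call produces the summary.

```swift
// Hypothetical app-side transcript model; not the framework's transcript types.
enum HistoryEntry: Equatable {
    case user(String)
    case assistant(String)
    case toolCall(id: String, name: String)
    case summary(text: String, toolCallIDs: [String])
}

// Replace all but the most recent `keepLast` entries with one summary entry.
// Tool-call IDs are collected structurally rather than trusted to survive in the summary text.
func compact(_ entries: [HistoryEntry], keepLast: Int,
             summarize: ([HistoryEntry]) -> String) -> [HistoryEntry] {
    guard entries.count > keepLast else { return entries }
    let older = Array(entries.dropLast(keepLast))
    let recent = Array(entries.suffix(keepLast))

    var ids: [String] = []
    for entry in older {
        switch entry {
        case .toolCall(let id, _):
            ids.append(id)
        case .summary(_, let previous):
            ids.append(contentsOf: previous)  // keep IDs from earlier compactions
        default:
            break
        }
    }
    return [HistoryEntry.summary(text: summarize(older), toolCallIDs: ids)] + recent
}
```

Because earlier summaries pass their IDs along, repeated compaction never drops an ID, regardless of what the summary text says.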
Is AFM 3 Core a CoreAI model?
Are the on-device Apple foundation models like AFM 3 Core shipped as CoreAI models or do they use some different technology? Is it possible to open them in the Core AI Debugger to understand them in detail?
Replies: 1 | Boosts: 0 | Views: 53 | Activity: 1d

Strict RAG implementation via .required tool calling and temp=0
Is there any guidance if we want the iOS 27 SystemLanguageModel to always defer to our app for all answers, rather than relying on its built-in training for responses?
Replies: 1 | Boosts: 0 | Views: 24 | Activity: 1d

Questions About Apple Foundation Models, Context Window Limits, and the New Core AI Framework
After reviewing the WWDC sessions on Foundation Models and Core AI, I had a few questions around the practical limits and architectural direction of the platform. From my understanding, on-device Foundation Models remain optimized for privacy, latency, and efficiency, which naturally introduces constraints around context length and agent complexity. Has anything changed regarding the effective context window available to developers, or should we still design around similar context-management constraints as before? Core AI appears to introduce a more structured approach to building AI-powered applications. For developers building sophisticated assistants, how should we think about the boundary between application-level orchestration and framework-level orchestration? For example, are advanced patterns such as sub-agents, hierarchical planning, dynamic tool availability, and workflow decomposition expected to remain developer-managed, or are these areas Core AI aims to support more directly over time? I am also curious about Apple's vision for model interoperability. While Foundation Models provide an excellent on-device experience, many production-grade agent systems combine multiple specialized models for planning, reasoning, retrieval, and execution. Does Apple envision future pathways for integrating external models into Core AI driven workflows while maintaining the privacy and performance principles of the platform? Finally, for teams pushing the limits of on-device AI assistants, what architectural patterns do you recommend for handling long-horizon tasks, large context requirements, evolving toolsets, and multi-step reasoning within the current Foundation Models ecosystem?
Replies: 0 | Boosts: 0 | Views: 21 | Activity: 1d

Structured intents vs free-form queries
For voice assistants with many capabilities, is it better to ship one generic ‘ask assistant’ intent with a natural-language parameter, or many typed intents (GetForecast, CompareLocations, etc.)? What are Siri’s limits on disambiguation and follow-up turns?
Replies: 1 | Boosts: 0 | Views: 22 | Activity: 1d

Questions About Apple Foundation Models, Context Window Limits and the New Core AI Framework
After reviewing the WWDC sessions on Foundation Models and Core AI, I had a few questions around the practical limits and architectural direction of the platform. From my understanding, on-device Foundation Models remain optimized for privacy, latency, and efficiency, which naturally introduces constraints around context length and agent complexity. Has anything changed regarding the effective context window available to developers, or should we still design around similar context-management constraints as before? Core AI appears to introduce a more structured approach to building AI-powered applications. For developers building sophisticated assistants, how should we think about the boundary between application-level orchestration and framework-level orchestration? For example, are advanced patterns such as sub-agents, hierarchical planning, dynamic tool availability, and workflow decomposition expected to remain developer-managed, or are these areas Core AI aims to support more directly over time? I am also curious about Apple's vision for model interoperability. While Foundation Models provide an excellent on-device experience, many production-grade agent systems combine multiple specialized models for planning, reasoning, retrieval, and execution. Does Apple envision future pathways for integrating external models into Core AI driven workflows while maintaining the privacy and performance principles of the platform? Finally, for teams pushing the limits of on-device AI assistants, what architectural patterns do you recommend for handling long-horizon tasks, large context requirements, evolving toolsets, and multi-step reasoning within the current Foundation Models ecosystem?
Replies: 0 | Boosts: 0 | Views: 24 | Activity: 1d

Disambiguation when multiple entities match
When a spoken phrase could match several entities in our catalog — same region, similar names, or partial matches — who is responsible for disambiguation: Siri via App Schemas and entity resolution, or the app via EntityStringQuery returning multiple candidates? What’s the recommended UX pattern for ‘Did you mean A or B?’
Replies: 5 | Boosts: 0 | Views: 42 | Activity: 1d
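For the disambiguation post, here is a minimal sketch of the app-side half: return every candidate tied for the best match so the caller can ask “Did you mean A or B?” instead of silently picking one. `Place`, `Resolution`, and the scoring rule are hypothetical and deliberately naive. This is not `EntityStringQuery`, and it says nothing about how Siri's own entity resolution ranks results.

```swift
import Foundation

// Hypothetical catalog item standing in for an app entity.
struct Place: Equatable {
    let id: String
    let name: String
    let region: String
}

enum Resolution: Equatable {
    case noMatch
    case unique(Place)
    case ambiguous([Place])  // caller presents "Did you mean A or B?"
}

// Naive relevance score, assumed for illustration only.
func score(_ place: Place, for phrase: String) -> Int {
    let name = place.name.lowercased()
    let query = phrase.lowercased()
    if name == query { return 3 }
    if name.hasPrefix(query) { return 2 }
    if name.contains(query) { return 1 }
    return 0
}

// Keep every candidate tied with the best score; more than one means ask the user.
func resolve(_ phrase: String, in catalog: [Place]) -> Resolution {
    let scored = catalog.map { ($0, score($0, for: phrase)) }.filter { $0.1 > 0 }
    guard let best = scored.map({ $0.1 }).max() else { return .noMatch }
    let top = scored.filter { $0.1 == best }.map { $0.0 }
    return top.count == 1 ? .unique(top[0]) : .ambiguous(top)
}
```

Showing `region` alongside `name` in the follow-up prompt is what lets the user tell two same-named entries apart.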