JUST ENDED | Foundation Models Q&A

Connect with Apple engineers in the Foundation Models Q&A on the Apple Developer Forums.

Machine Learning & AI Foundation Models

Siri As Coding Agent

In the new Xcode we saw examples of Claude, OpenAI, and Google coding agents that you can start conversations with inside your project, giving them access to your project's file context. As far as I understand, this requires an API key for those models, and the processing runs on Anthropic / Google servers, neither locally nor on Private Cloud Compute. Is it possible to instead use the LLM powering Foundation Models for a “Siri Code Agent” that operates in place of those models but runs on device or in Private Cloud Compute? I like how this works for Siri AI requests, and would love to have a coding assistant agent that can operate in the same privacy-preserving way! Is this possible with any of the open-source frameworks or the command-line tools? If not, what is the best way to request this feature?

Machine Learning & AI Foundation Models

Replies: 2
Boosts: 1
Views: 403
Activity: Jun ’26

Custom vocabulary for speech and entity resolution

Whisper and other STT APIs let you pass a custom vocabulary or initial_prompt to bias recognition toward domain-specific proper nouns. In the App Intents / Siri stack, is there an equivalent way to supply dynamic, per-user term lists — for example favorites or recently used items — to improve how spoken names are transcribed or resolved?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 333
Activity: Jun ’26

What is _the_ proper way to intercept tool calls, modify them, or dynamically approve/reject them?

What is the proper way to intercept tool calls, modify them, or dynamically approve/reject them?
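
One framework-agnostic way to picture the interception asked about in this post is a wrapper that sits between the requested call and the tool's real implementation. The sketch below is in Python purely for illustration. It does not use or describe the Foundation Models API, and every name in it (`Decision`, `guarded`, `policy`, `delete_file`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Decision:
    """Result of reviewing one requested tool call (hypothetical type)."""
    approved: bool
    arguments: Optional[dict] = None  # replacement arguments, if modified
    reason: str = ""


def guarded(tool: Callable[..., Any],
            review: Callable[[str, dict], Decision]) -> Callable[..., Any]:
    """Wrap `tool` so every call is reviewed before it runs."""
    def wrapper(**arguments: Any) -> Any:
        decision = review(tool.__name__, dict(arguments))
        if not decision.approved:
            # Return the refusal as tool output so the caller can relay it.
            return f"Tool call rejected: {decision.reason}"
        # Use the reviewer's rewritten arguments when it supplied any.
        final_args = decision.arguments if decision.arguments is not None else arguments
        return tool(**final_args)
    return wrapper


def delete_file(path: str, recursive: bool = False) -> str:
    """Stand-in for a real tool; it only reports what it would do."""
    return f"deleted {path} (recursive={recursive})"


def policy(name: str, arguments: dict) -> Decision:
    """Example policy: block paths outside /tmp/ and force recursive=False."""
    if not arguments.get("path", "").startswith("/tmp/"):
        return Decision(approved=False, reason="path outside /tmp")
    return Decision(approved=True, arguments={**arguments, "recursive": False})


safe_delete = guarded(delete_file, policy)
```

Returning the rejection as ordinary tool output, rather than raising, is a design choice in this sketch: it lets the conversation continue after a refusal. Whether a given framework offers a hook at this point is exactly what the post is asking.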

Machine Learning & AI Foundation Models

Replies: 4
Boosts: 0
Views: 414
Activity: Jun ’26

Framework Boundaries

Given that Foundation Models focus on native Swift APIs, is there any supported bridge for a WebKit-based app to access the Language Model Protocol?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 329
Activity: Jun ’26

Improved Guardrails Error Handling

I work on an app called one sec, which helps people reduce the amount of time they spend on social media by interrupting app openings with Shortcuts automations. With the release of iOS 26, we added a new Conversational Reflection interruption, a feature backed by Foundation Models. The user talks through their reasoning for wanting to use social media. A significant fraction of our users have ADHD or ADD, or struggle with mental health. As a result, we try to show crisis support banners in our conversation UI. We do so with structured outputs, asking the language model if the user appears to be in deep distress. However, the guardrails are often triggered and we don’t receive the response from the language model, meaning we can’t support our users and show them related resources if needed. We’d love to see more specific guardrail errors introduced in the framework to better support our users. Here’s a radar with further details: FB20828230. Thank you!

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 228
Activity: Jun ’26

Confirmation, permissions, and reversibility for agentic actions

Apple demonstrated agentic behavior (e.g., the Passwords app changing credentials on the user's behalf), and Siri AI can now take systemwide actions in apps. Is there a first-class confirmation API for App Intents — a way to mark an action as requiring explicit user approval before execution, with a standard confirmation surface — or must developers build their own confirmation UI inside the intent? For irreversible or high-impact actions, what is Apple's recommended pattern to prevent the model from executing them autonomously, and can an intent declare a risk/sensitivity level the system respects? When Siri AI invokes an action, what authentication/authorization context is available to the intent (biometric gate, user-presence assertion), and how should an app require step-up auth for sensitive operations? Is there a supported audit trail for actions taken via Siri AI on the user's behalf, so an app can show the user what was done and when? How does the system handle an action that fails or partially completes during an agentic, multi-step flow?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 1
Views: 283
Activity: Jun ’26

Can the SpotlightSearchTool work with a custom model executor?

When SpotlightSearchTool is used with a custom LanguageModel backend (for example Apple’s ChatCompletionsLanguageModel from apple/foundation-models-utilities, pointed at an OpenAI-compatible server), the tool can never be successfully invoked. The model produces tool-call arguments that exactly match the format documented in the tool’s own description, but those arguments fail validation against the tool’s generated parameters JSON Schema, throwing LanguageModelSession.ToolCallError with underlying error “Failed to parse generated content.” The root cause is a mismatch between two things the framework sends to the model in the same tool definition: the human-readable description (“Call format”), which presents the top-level arguments as { root, modelComposition, … }, and the parameters JSON Schema (FullArguments), which requires { "query": { "type": "search", "value": { root, modelComposition, … } } }. A model that follows the description is guaranteed to fail the schema. Secondary observation (may be a separate issue or intended): CoreSpotlightSource.fetchAttributes appears to have no effect on which attributes are returned to the model on this agentic-search path. Even with fetchAttributes: [.title, .contentDescription] set on the source, results contain only default metadata (kMDItemTitle, kMDItemDisplayName, dates, identifiers) and omit kMDItemDescription. The description is returned only when the in-query SearchArguments.fetchAttributes explicitly lists it. The searchableIndexDelegate was never invoked in any configuration tried (including .dynamic). If the source-level fetchAttributes is meant to drive returned attributes, that also seems incorrect; otherwise, clarifying the docs would help. So my question is: is this simply not supported, or does the schema need an update? Or is there a different way this should be done?
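
The mismatch described in this post can be made concrete with a small adapter. The sketch below, in Python for illustration only, wraps flat arguments in the envelope the post says the schema requires. The envelope shape and the key names `root` and `modelComposition` are copied from the post. The function name `wrap_search_arguments`, the placeholder values, and the idea that a custom backend could apply such a rewrite before validation are assumptions, not documented framework behavior.

```python
import json


def wrap_search_arguments(raw: str) -> str:
    """Wrap flat tool-call arguments in the {"query": {...}} envelope.

    Arguments that already carry a top-level "query" key are returned
    unchanged, so the adapter can be applied to every call.
    """
    arguments = json.loads(raw)
    if isinstance(arguments, dict) and "query" in arguments:
        return json.dumps(arguments)
    return json.dumps({"query": {"type": "search", "value": arguments}})


# Placeholder values; the real value types are not given in the post.
flat = json.dumps({"root": "example-root", "modelComposition": "example"})
wrapped = wrap_search_arguments(flat)
```

This only papers over the symptom on the client side. If the description and the schema disagree in the tool definition itself, the durable fix would have to come from the framework.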

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 256
Activity: Jun ’26

RAG boundary: static knowledge vs live data

Should static domain documentation live in on-device RAG (local embeddings + FM), while time-sensitive data always comes from network tools — and are there practical size/latency budgets for on-device embedding indexes?
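
On the size half of this question, a rough lower bound for a flat (brute-force) embedding index is simple to estimate: number of chunks times embedding dimension times bytes per value. The Python sketch below does that arithmetic. The chunk counts, the 384-dimension figure, and the float32/int8 choices are illustrative assumptions, not Apple-published budgets, and a real index adds overhead for identifiers, source text, and any approximate-nearest-neighbor structure.

```python
def index_size_bytes(num_chunks: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage for a flat index (vectors only, no metadata)."""
    return num_chunks * dim * bytes_per_value


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


# Hypothetical corpus sizes with 384-dimensional embeddings.
for chunks in (1_000, 10_000, 100_000):
    f32 = to_mib(index_size_bytes(chunks, 384, 4))  # float32 vectors
    i8 = to_mib(index_size_bytes(chunks, 384, 1))   # int8-quantized vectors
    print(f"{chunks:>7} chunks: {f32:7.1f} MiB float32, {i8:6.1f} MiB int8")
```

Latency depends on the device and the search strategy, so it is better measured than estimated.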

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 179
Activity: Jun ’26

LLM search using Core Spotlight

If your app creates an Apple Intelligence schema-conforming App Entity, Siri AI can only reason over the schema-defined properties (see this thread). But as a developer, I can add more optional properties on my App Entity with additional metadata about the entity. If my app contributes these App Entities to Spotlight as indexed entities, is SpotlightSearchTool also limited to reasoning over just the schema-defined properties, or are these unrelated concepts? Will these additional optional properties on my App Entity enable a deeper SpotlightSearchTool-powered search experience around these entities?

Machine Learning & AI Foundation Models

Replies: 2
Boosts: 0
Views: 371
Activity: Jun ’26

Tool calling: App Intents vs server-side orchestration

For assistants that need multi-step tool use (search → fetch → compare → respond), should third-party apps expose capabilities as App Intents for on-device model selection, or keep tool orchestration on the server and use on-device models only for speech and summarization? What breaks when the same action exists in both places?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 205
Activity: Jun ’26

Image size, format, and background vs other VLMs

Different VLMs support different image sizes and expect different background colors when padding is needed, and the iOS 27 AFM is the most flexible. The previous talk mentioned that context size suffers for this flexibility. What is the best format, size, and background for the app to pre-process images to in order to minimize token use? Many thanks.

Machine Learning & AI Foundation Models

Replies: 0
Boosts: 0
Views: 192
Activity: Jun ’26

Visual Intelligence and screen/camera understanding for third-party apps

Visual Intelligence lets users ask Siri about what the camera or screen shows, and the screenshot tool can extract structured data into system apps. Can a third-party app contribute results or actions when the user invokes Visual Intelligence over the app's own content or a screenshot of it (analogous to how a schedule becomes calendar events), and what API surfaces that? For the Image Playground API, what are the content, rate, and style constraints, and can generated assets be used in commercial app contexts? Is there a supported way for an app to provide its own visual understanding to the system rather than relying solely on Apple's model — for domain-specific imagery the on-device model may not recognize?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 221
Activity: Jun ’26

Time Series Models

The Foundation Models framework is clearly designed around language, but there's a large class of on-device AI tasks that are not language tasks at all. Time series forecasting is one example: think energy consumption modeling or sensor anomaly detection. These models take sequences of numeric data and output probabilistic forecasts. No text is involved at any layer. Is there any intention to extend Foundation Models, or a sibling framework, to non-language modalities, specifically structured numeric and time series inference?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 1
Views: 156
Activity: Jun ’26

On Advanced Context Management

When using the 'Summarize History' modifier, can we configure the summarization prompt to specifically preserve certain metadata like tool call IDs so that a resumed conversation can still reference previously executed app actions?

Machine Learning & AI Foundation Models

Replies: 2
Boosts: 0
Views: 227
Activity: Jun ’26

Is AFM 3 Core a Core AI model?

Are the on-device Apple foundation models like AFM 3 Core shipped as Core AI models, or do they use some different technology? Is it possible to open them in the Core AI Debugger to understand them in detail?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 301
Activity: Jun ’26

Strict RAG implementation via .required tool calling and temp=0

Is there any guidance for making the iOS 27 SystemLanguageModel always defer to our app for all answers, rather than relying on its built-in training for responses?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 181
Activity: Jun ’26

Questions About Apple Foundation Models, Context Window Limits, and the New Core AI Framework

After reviewing the WWDC sessions on Foundation Models and Core AI, I had a few questions around the practical limits and architectural direction of the platform. From my understanding, on-device Foundation Models remain optimized for privacy, latency, and efficiency, which naturally introduces constraints around context length and agent complexity. Has anything changed regarding the effective context window available to developers, or should we still design around similar context-management constraints as before? Core AI appears to introduce a more structured approach to building AI-powered applications. For developers building sophisticated assistants, how should we think about the boundary between application-level orchestration and framework-level orchestration? For example, are advanced patterns such as sub-agents, hierarchical planning, dynamic tool availability, and workflow decomposition expected to remain developer-managed, or are these areas Core AI aims to support more directly over time? I am also curious about Apple's vision for model interoperability. While Foundation Models provide an excellent on-device experience, many production-grade agent systems combine multiple specialized models for planning, reasoning, retrieval, and execution. Does Apple envision future pathways for integrating external models into Core AI driven workflows while maintaining the privacy and performance principles of the platform? Finally, for teams pushing the limits of on-device AI assistants, what architectural patterns do you recommend for handling long-horizon tasks, large context requirements, evolving toolsets, and multi-step reasoning within the current Foundation Models ecosystem?

Machine Learning & AI Foundation Models

Replies: 0
Boosts: 0
Views: 211
Activity: Jun ’26

Structured intents vs free-form queries

For voice assistants with many capabilities, is it better to ship one generic ‘ask assistant’ intent with a natural-language parameter, or many typed intents (GetForecast, CompareLocations, etc.)? What are Siri’s limits on disambiguation and follow-up turns?

Machine Learning & AI Foundation Models

Replies: 1
Boosts: 0
Views: 153
Activity: Jun ’26

Questions About Apple Foundation Models, Context Window Limits and the New Core AI Framework

After reviewing the WWDC sessions on Foundation Models and Core AI, I had a few questions around the practical limits and architectural direction of the platform. From my understanding, on-device Foundation Models remain optimized for privacy, latency, and efficiency, which naturally introduces constraints around context length and agent complexity. Has anything changed regarding the effective context window available to developers, or should we still design around similar context-management constraints as before? Core AI appears to introduce a more structured approach to building AI-powered applications. For developers building sophisticated assistants, how should we think about the boundary between application-level orchestration and framework-level orchestration? For example, are advanced patterns such as sub-agents, hierarchical planning, dynamic tool availability, and workflow decomposition expected to remain developer-managed, or are these areas Core AI aims to support more directly over time? I am also curious about Apple's vision for model interoperability. While Foundation Models provide an excellent on-device experience, many production-grade agent systems combine multiple specialized models for planning, reasoning, retrieval, and execution. Does Apple envision future pathways for integrating external models into Core AI driven workflows while maintaining the privacy and performance principles of the platform? Finally, for teams pushing the limits of on-device AI assistants, what architectural patterns do you recommend for handling long-horizon tasks, large context requirements, evolving toolsets, and multi-step reasoning within the current Foundation Models ecosystem?

Machine Learning & AI Foundation Models

Replies: 0
Boosts: 0
Views: 155
Activity: Jun ’26

Disambiguation when multiple entities match

When a spoken phrase could match several entities in our catalog — same region, similar names, or partial matches — who is responsible for disambiguation: Siri via App Schemas and entity resolution, or the app via EntityStringQuery returning multiple candidates? What’s the recommended UX pattern for ‘Did you mean A or B?’
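
Whichever side ends up owning the prompt, the app usually has to produce the candidate list. A minimal Python sketch of ranking candidates for a spoken phrase follows. It is not the App Intents API, and the names `rank_candidates` and `needs_disambiguation` and the sample catalog are hypothetical. It ranks exact matches first, then prefix matches, then substring matches, and reports whether a "Did you mean A or B?" prompt is needed.

```python
def rank_candidates(phrase: str, catalog: list[str]) -> list[str]:
    """Return catalog names matching `phrase`, best match first."""
    needle = phrase.casefold().strip()
    scored = []
    for name in catalog:
        hay = name.casefold()
        if hay == needle:
            score = 0   # exact match
        elif hay.startswith(needle):
            score = 1   # prefix match
        elif needle in hay:
            score = 2   # substring match
        else:
            continue    # no match, drop it
        scored.append((score, name))
    # Sort by score, then alphabetically for a stable presentation order.
    return [name for _, name in sorted(scored)]


def needs_disambiguation(matches: list[str]) -> bool:
    """Ask the user only when more than one candidate survives."""
    return len(matches) > 1


catalog = ["Springfield", "Springfield Heights", "West Springfield", "Shelbyville"]
matches = rank_candidates("springfield", catalog)
```

Asking even when an exact match exists is a conservative choice made for this sketch; an app could instead accept a lone exact match and offer the others as alternatives.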

Machine Learning & AI Foundation Models

Replies: 5
Boosts: 0
Views: 202
Activity: Jun ’26