Foundation Models


Discuss the Foundation Models framework, which provides access to Apple’s on-device large language model that powers Apple Intelligence, to help you perform intelligent tasks specific to your app.

Foundation Models Documentation

Posts under Foundation Models subtopic

Post · Replies · Boosts · Views · Activity

Foundation Models framework dyld symbol errors after macOS 26 Beta 2 - LanguageModelSession constructor missing
Foundation Models framework worked perfectly on macOS 26 Beta 2, but starting from Beta 3 and continuing through Beta 6 (latest), I get dyld symbol errors even with the exact code from Apple's documentation.
Environment: macOS 26.0 Beta 6 (25A5351b); Xcode 26 Beta 6; M4 Max MacBook Pro; Apple Intelligence enabled and downloaded.
Error Details:
dyld[Process]: Symbol not found: _$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC
Referenced from: /path/to/app.debug.dylib
Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Code Used (Exact from Documentation):
    import FoundationModels
    // This worked on Beta 2, crashes on Beta 3+
    let model = SystemLanguageModel.default
    let session = LanguageModelSession(model: model)
    let response = try await session.respond(to: "Hello")
What I've Verified: FoundationModels.framework exists in /System/Library/Frameworks/; the framework is properly linked in the Xcode project; Apple Intelligence is enabled and working; the same code works in older beta versions; the issue persists even with completely fresh Xcode projects.
Analysis: The dyld error suggests the LanguageModelSession(model:) constructor is missing. The symbol shows it's looking for a constructor with parameters (model:guardrails:tools:instructions:), but the documentation still shows the simple (model:) constructor.
Questions: Has the LanguageModelSession API changed since Beta 2? Should we now use the constructor with guardrails/tools/instructions parameters? Is this a known issue with recent betas? Are there updated code samples for the current API?
Additional Context: This affects both basic SystemLanguageModel usage AND custom adapter loading. The same dyld symbol errors occur when trying to create SystemLanguageModel(adapter: adapter) as well. Any guidance on the correct API usage for current betas would be greatly appreciated.
The documentation appears to be out of sync with the actual framework implementation.
Replies: 1 · Boosts: 0 · Views: 723 · Activity: Sep ’25
SkillActivation Framework Fails to Build in Xcode 26 When Using foundation-models-utilities
Hi Apple Team, I'm trying to use the SkillActivation framework from the Foundation Models Utilities repository: https://github.com/apple/foundation-models-utilities Environment: Xcode 26 Beta iPadOS/macOS 26 Beta Apple Intelligence enabled Foundation Models Utilities: latest version from GitHub Issue: As soon as I import or use SkillActivation-related APIs, Xcode reports build errors and the project fails to compile. The rest of the Foundation Models framework works correctly, but the problem appears specifically when SkillActivation is added. Steps to Reproduce: Create a new project. Add foundation-models-utilities via Swift Package Manager. Import SkillActivation / follow the sample implementation. Build the project. Expected Result: The project should compile successfully and SkillActivation should be available. Actual Result: Xcode reports compilation errors and the build fails. Questions: Is there any additional entitlement, capability, or configuration required for SkillActivation? Is SkillActivation currently supported in Xcode 26 Beta? Are there any known issues with the current version of foundation-models-utilities? Thank you.
Replies: 2 · Boosts: 0 · Views: 66 · Activity: 5d
Visual Intelligence and screen/camera understanding for third-party apps
Visual Intelligence lets users ask Siri about what the camera or screen shows, and the screenshot tool can extract structured data into system apps. Can a third-party app contribute results or actions when the user invokes Visual Intelligence over the app's own content or a screenshot of it (analogous to how a schedule becomes calendar events), and what API surfaces that? For the Image Playground API, what are the content, rate, and style constraints, and can generated assets be used in commercial app contexts? Is there a supported way for an app to provide its own visual understanding to the system rather than relying solely on Apple's model — for domain-specific imagery the on-device model may not recognize?
Replies: 1 · Boosts: 0 · Views: 84 · Activity: 1w
Questions About Apple Foundation Models, Context Window Limits and the New Core AI Framework
After reviewing the WWDC sessions on Foundation Models and Core AI, I had a few questions around the practical limits and architectural direction of the platform. From my understanding, on-device Foundation Models remain optimized for privacy, latency, and efficiency, which naturally introduces constraints around context length and agent complexity. Has anything changed regarding the effective context window available to developers, or should we still design around similar context-management constraints as before? Core AI appears to introduce a more structured approach to building AI-powered applications. For developers building sophisticated assistants, how should we think about the boundary between application-level orchestration and framework-level orchestration? For example, are advanced patterns such as sub-agents, hierarchical planning, dynamic tool availability, and workflow decomposition expected to remain developer-managed, or are these areas Core AI aims to support more directly over time? I am also curious about Apple's vision for model interoperability. While Foundation Models provide an excellent on-device experience, many production-grade agent systems combine multiple specialized models for planning, reasoning, retrieval, and execution. Does Apple envision future pathways for integrating external models into Core AI driven workflows while maintaining the privacy and performance principles of the platform? Finally, for teams pushing the limits of on-device AI assistants, what architectural patterns do you recommend for handling long-horizon tasks, large context requirements, evolving toolsets, and multi-step reasoning within the current Foundation Models ecosystem?
Replies: 0 · Boosts: 0 · Views: 57 · Activity: 1w
Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update
Hello! After the 26.4 update, I get a huge number of LanguageModelSession.GenerationError.refusal errors when using guided generation with Generables, for no apparent reason. These errors also occur when I request a Boolean response with 'generating: Bool.self'. The explanation attached to the error always looks like this: Response(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([])) All the prompts and Generables I use are definitely not profane. Before 26.4, these errors never occurred with the same prompts and Generables; the 26.4 update has made those features unusable for me. Is this a known bug, or am I doing something wrong?
Replies: 3 · Boosts: 0 · Views: 736 · Activity: Mar ’26
MLX, MLX LM, MLX LM Server -> Is there a bootstrap repo?
MLX, MLX LM, and MLX LM Server are all mentioned. Is there a bootstrap GitHub repo that can be used to set up an example of these directly and quickly, without the hassle of manual setup, a kind of bootstrap for "us mere mortals"? And how feasible is it to use these on an M3 Pro with 18 GB of memory? Can work be bounced between a local M3 Pro and a Tailscale-linked M2 Pro with 36 GB of memory? Do both need to be on macOS 27 for this to work?
Replies: 1 · Boosts: 0 · Views: 71 · Activity: 1w
Assert error breaking previews
This is a Foundation Models bug I keep running into during the preview phase of testing. The error never occurs or breaks the app when I test on the Simulator or on a device, but I sometimes hit it during a longer session in a preview. The error crashes the preview, and the warning is labeled: "Assert in LanguageModelFeedback.swift". I keep running into this throughout my project, where I have been using Foundation Models.
Replies: 2 · Boosts: 0 · Views: 649 · Activity: Feb ’26
ANE Performance for on-device Foundation model
I'm running macOS 26 Beta 5. I noticed that I can no longer reach 100% ANE usage with Apple's Foundation Models on-device model, as I could before. Has Apple activated some kind of throttling or power limiting of the ANE? Since upgrading, I cannot get above 3 W or 40% usage. I'm in High Power energy mode. Is there an API rate limit being applied? I have an M4 Pro Mac mini with 64 GB of memory.
Replies: 0 · Boosts: 0 · Views: 381 · Activity: Aug ’25
Context window 90% of adapter model full after single user prompt
I have been able to train an adapter on Google's Colaboratory. I am able to start a LanguageModelSession and load it with my adapter. The problem is that after one simple prompt, the context window is 90% full. If I start the session without the adapter, the same simple prompt consumes only 1% of the context window. Has anyone encountered this? I asked Claude AI and it seems to think that my training script needs adjusting. Grok on the other hand is (wrongly, I tried) convinced that I just need to tweak some parameters of LanguageModelSession or SystemLanguageModel. Thanks for any tips.
Replies: 13 · Boosts: 0 · Views: 3.6k · Activity: Feb ’26
SpotlightSearchTool arguments: description vs. JSON Schema mismatch → “Failed to parse generated content”
Using SpotlightSearchTool with a custom LanguageModel backend (Apple’s ChatCompletionsLanguageModel from foundation-models-utilities, pointed at an OpenAI-compatible server), every tool call fails with ToolCallError → "Failed to parse generated content." The model follows the tool’s documented "Call format" and emits { root, modelComposition, … }. But the generated parameters schema (FullArguments) requires { "query": { "type": "search", "value": { root, modelComposition, … } } }. Query is a QueryType union and a search must be wrapped in DiscriminatedSearch. Wrapping the args manually makes it parse and search correctly. So the description omits the query + type:"search" envelope the schema demands, which makes the tool uninvokable by any model that follows the documentation (it presumably works only with the on-device model trained on the real format). Is this a known issue / intended? Anyone gotten SpotlightSearchTool working with a non-Apple model? Secondary: CoreSpotlightSource.fetchAttributes seems to have no effect on returned attributes. kMDItemDescription only comes back when the in-query fetchAttributes requests it. Bug or expected?
Replies: 1 · Boosts: 0 · Views: 79 · Activity: 1w
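The post reports that wrapping the model's flat arguments by hand makes SpotlightSearchTool parse them. A minimal sketch of that transformation using JSONSerialization: `wrapSearchArguments` and `ArgumentWrapError` are hypothetical names, only the `query`, `type`, and `value` keys come from the post, and where such a hook would sit in a ChatCompletionsLanguageModel pipeline is not shown in the post, so this illustrates the data reshaping only.

```swift
import Foundation

/// Hypothetical error for this sketch.
enum ArgumentWrapError: Error {
    case notAJSONObject
}

/// Hypothetical workaround based on the post: wraps the flat argument object a
/// model emits when it follows the tool description ({ root, modelComposition, … })
/// in the envelope the generated schema requires:
/// { "query": { "type": "search", "value": { … } } }.
func wrapSearchArguments(_ flatJSON: Data) throws -> Data {
    // Decode the model's flat argument object.
    guard let flat = try JSONSerialization.jsonObject(with: flatJSON) as? [String: Any] else {
        throw ArgumentWrapError.notAJSONObject
    }
    // Assumption: a top-level "query" key means the envelope is already present.
    if flat["query"] != nil {
        return flatJSON
    }
    // Tag the payload as a search and nest it under "query".
    let envelope: [String: Any] = [
        "query": ["type": "search", "value": flat] as [String: Any]
    ]
    return try JSONSerialization.data(withJSONObject: envelope, options: [.sortedKeys])
}

// Placeholder values; the real fields are structured objects, not strings.
let flatArgs = Data(#"{"root":"placeholder","modelComposition":"placeholder"}"#.utf8)
let wrapped = try wrapSearchArguments(flatArgs)
print(String(decoding: wrapped, as: UTF8.self))
```

Passing already-wrapped input through unchanged makes the helper safe to apply to a model that does emit the full envelope.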
Approaching Custom VST GUI Automation: Combining local Vision OCR with the new FoundationModels framework for screen-grounding
Hello everyone, I’m working on a project to automate software controls inside non-standard macOS applications, specifically custom-drawn audio plugins (like the Roland TR-909 VST).
The Challenge: These VST interfaces do not expose their buttons, knobs, or dials via the standard macOS Accessibility tree (NSAccessibility / event taps). Because they are custom-rendered, standard automation tools are blind to them.
My Current Hybrid Approach: I am combining two of Apple's local machine learning technologies to solve this without sending data to the cloud:
Step 1: Text-Based Layout Mapping (Vision Framework). I capture a screenshot of the targeted window using Quartz Window Services and run a local VNRecognizeTextRequest to extract coordinates for all text labels. This works exceptionally well for text buttons like "OPTION" or "ABOUT".
Step 2: Contextual & Non-Text Element Interpretation (FoundationModels Framework). For controls that lack text labels (such as blank step sequencer buttons, parameter knobs, or toggle light states), I pass the screenshot as an Attachment into the new local LanguageModelSession. I ask the model to ground coordinates relative to the text landmarks mapped in Step 1.
Here is a simplified snippet of how I am feeding the visual context into the local model:
    import Foundation
    import FoundationModels
    import Cocoa

    func analyzePluginInterface(cgImage: CGImage) async {
        guard SystemLanguageModel.default.isAvailable else {
            print("Local model not downloaded or available.")
            return
        }
        let instructions = """
            You are a screen-aware assistant. Your job is to locate GUI controls on a custom 1024x802 VST window.
            """
        let session = LanguageModelSession(instructions: instructions)
        do {
            let response = try await session.respond {
                "Look at this screenshot of the VST window."
                Attachment(cgImage)
                "Locate the blank step-sequencer buttons located below the instrument channel labels."
                "What are the center coordinates (X, Y) for the first active step?"
            }
            print("Model Grounding Output: \(response.content)")
        } catch {
            print("Inference failed: \(error)")
        }
    }
My Questions for the Community:
Performance & Latency: The local LanguageModelSession.respond call takes several seconds to run on device. For real-time DAW automation, this is a bottleneck. Has anyone experimented with using a custom LoRA adapter or a smaller model profile to speed up spatial coordinate inference?
Coordinate Stability: Multimodal models can sometimes hallucinate coordinates (bounding box values). What strategies are you using to constrain the model output to precise pixel boundaries on varying display scaling configurations (Retina vs non-Retina)?
Alternative Solutions: Are there newer on-device vision APIs (perhaps in CoreML or Vision) that are better suited for bounding-box grounding of abstract graphics (like dials/knobs) than a general language model session?
Would love to hear how others are approaching screen-aware GUI interpretation with these new frameworks! Thanks!
Replies: 0 · Boosts: 0 · Views: 48 · Activity: 1w
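On the "Coordinate Stability" question, part of the problem can be handled deterministically rather than by the model: Vision reports text bounding boxes normalized to 0...1 with a lower-left origin, a window capture on a Retina display is typically at twice the window's point size, and synthesized mouse events are positioned in screen points. A sketch of those conversions in plain Swift, assuming the window origin comes from Quartz Window Services bounds (top-left origin, in points); the `Rect` type and both functions are hypothetical, and an app would more likely use CGRect.

```swift
/// Minimal rectangle type so the sketch has no framework dependency.
struct Rect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double

    /// Center point of the rectangle.
    var center: (x: Double, y: Double) { (x + width / 2, y + height / 2) }
}

/// Converts a Vision-style normalized box (values in 0...1, origin at the
/// lower-left corner) into a pixel rect with a top-left origin.
func pixelRect(fromNormalized box: Rect, imageWidth: Double, imageHeight: Double) -> Rect {
    Rect(x: box.x * imageWidth,
         // Flip the y axis: normalized y is measured up from the bottom edge.
         y: (1 - box.y - box.height) * imageHeight,
         width: box.width * imageWidth,
         height: box.height * imageHeight)
}

/// Converts a window-relative pixel position into screen points, given the
/// window's top-left origin in points and the display's backing scale factor
/// (for example 2.0 on a Retina display, 1.0 on a non-Retina one).
func screenPoint(pixelX: Double, pixelY: Double,
                 windowOriginX: Double, windowOriginY: Double,
                 scale: Double) -> (x: Double, y: Double) {
    (windowOriginX + pixelX / scale, windowOriginY + pixelY / scale)
}

// Example: a 1024x802-point window captured at 2x yields a 2048x1604-pixel image.
let label = Rect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let px = pixelRect(fromNormalized: label, imageWidth: 2048, imageHeight: 1604)
let c = px.center
let target = screenPoint(pixelX: c.x, pixelY: c.y, windowOriginX: 100, windowOriginY: 50, scale: 2)
print(target) // (x: 612.0, y: 350.75)
```

Keeping the model's job to "which landmark" and doing the pixel arithmetic in code like this means scaling differences never pass through the model at all.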
Clarifying the "Weight List"
In the WWDC26 AI Group Lab, it was mentioned as a 'spoiler alert' that the 'weight list applies only to Siri' and not to the Private Cloud Compute (PCC) language model. Could you clarify whether there is a technical path for a developer’s custom adapter, running via the Language Model Protocol, to ever be added to this weight list to handle system-originated Siri requests?
Replies: 0 · Boosts: 0 · Views: 24 · Activity: 1w
The standalone Siri app and cross-surface continuity
The new standalone Siri app keeps conversation history synced via iCloud across iPhone, iPad, and Mac. Can third-party content, results, or an app's agent surface appear inside the Siri app (e.g., as referenced sources or follow-up actions), and can the user deep-link from a Siri-app result back into the originating app with state intact? Is any conversation context from the Siri app exposed to a developer's intent when an action is invoked, so the app can act with the relevant context, and what are the privacy boundaries on that? When the same action is invoked from different surfaces (in-app, system Siri, the Siri app) and across synced devices, how should developers reason about execution location and idempotency to avoid duplicate side effects?
Replies: 0 · Boosts: 0 · Views: 13 · Activity: 1w
Structured intents vs free-form queries
For voice assistants with many capabilities, is it better to ship one generic ‘ask assistant’ intent with a natural-language parameter, or many typed intents (GetForecast, CompareLocations, etc.)? What are Siri’s limits on disambiguation and follow-up turns?
Replies: 1 · Boosts: 0 · Views: 48 · Activity: 1w
Mixed languages and foreign proper nouns
If the user’s device language is French but they speak English, or they use one language for the sentence and another for proper nouns, how does Siri handle transcription and entity resolution? Do we need per-locale entity indexing, aliases, or can semantic indexing work across languages?
Replies: 0 · Boosts: 0 · Views: 26 · Activity: 1w
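How Siri itself transcribes and resolves mixed-language entities is the open question in this post and is not answered here. If an app does keep its own alias table (for example, English and local names for the same place), a minimal version maps every normalized spelling to one canonical identifier. `AliasIndex` and the `city.munich` identifier are hypothetical; the normalization only trims whitespace and folds case, so accent-insensitive matching would need an extra step.

```swift
import Foundation

/// Hypothetical app-side alias index: maps every spelling a user might say,
/// in any language, to one canonical entity identifier.
struct AliasIndex {
    private var table: [String: String] = [:]

    /// Lookup key: surrounding whitespace trimmed, case folded.
    private static func key(_ text: String) -> String {
        text.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    }

    /// Registers one entity under several aliases (e.g. English and local names).
    mutating func register(_ entityID: String, aliases: [String]) {
        for alias in aliases {
            table[Self.key(alias)] = entityID
        }
    }

    /// Returns the entity identifier for a spoken or typed name, if known.
    func resolve(_ name: String) -> String? {
        table[Self.key(name)]
    }
}

var aliasIndex = AliasIndex()
aliasIndex.register("city.munich", aliases: ["Munich", "München", "Munich, Germany"])
print(aliasIndex.resolve("münchen") ?? "unknown")  // city.munich
print(aliasIndex.resolve("Paris") ?? "unknown")    // unknown
```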
Context Size Error But Size is Less Than Limit
Seeing this error from time to time: Context(debugDescription: "Content contains 4089 tokens, which exceeds the maximum allowed context size of 4096.", underlyingErrors: []) Of course, 4089 is less than 4096, so what is this telling me, and how do I work around it? Is the limit actually lower than 4096?
Replies: 2 · Boosts: 0 · Views: 259 · Activity: Sep ’25
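As we understand Apple's guidance for the on-device model, its 4096-token window is shared by instructions, prompts, tool definitions and outputs, and the generated response, so 4089 prompt-side tokens leave almost no room for a reply. Whether that, or tokens the framework adds internally, explains this particular message is not stated in the error. A minimal budgeting sketch in plain Swift; `ContextBudget` and the 512-token `reservedForResponse` headroom are hypothetical choices, not framework values.

```swift
/// Hypothetical helper: decides whether a request fits the context window,
/// assuming the window must hold the prompt-side tokens plus room for the reply.
struct ContextBudget {
    let contextSize: Int          // e.g. 4096, the limit quoted in the error
    let reservedForResponse: Int  // assumed headroom for generated tokens

    /// Tokens still available for instructions, prompt, and history.
    var promptBudget: Int { max(0, contextSize - reservedForResponse) }

    /// True if the prompt-side content leaves the reserved headroom intact.
    func fits(promptTokens: Int) -> Bool {
        promptTokens <= promptBudget
    }

    /// How many prompt-side tokens must be trimmed (0 if it already fits).
    func overflow(promptTokens: Int) -> Int {
        max(0, promptTokens - promptBudget)
    }
}

let budget = ContextBudget(contextSize: 4096, reservedForResponse: 512)
print(budget.fits(promptTokens: 4089))      // false
print(budget.overflow(promptTokens: 4089))  // 505
```

When `overflow` is positive, an app could summarize or drop older turns before starting a fresh session, rather than waiting for the error.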
FoundationModels tool calling doesn't get triggered
In the Playground, I'm trying to bias my LanguageModel to use a tool I registered, but I don't see it actually calling the tool. I'm following the landmarks itinerary-generation tutorial from the developer video almost verbatim. Is this a prompt-engineering issue I'm missing, or is it possible that I'm injecting my tool incorrectly?
Replies: 1 · Boosts: 0 · Views: 307 · Activity: Jul ’25
Speech generation by the new Foundation Model
During the Keynote (at 30:20), Craig Federighi mentions the second, "even more powerful version of our on-device model" and says that this model lets supported products understand and generate speech. Is there any public API for generating speech using this model?
Replies: 0 · Boosts: 0 · Views: 32 · Activity: 1w
Image size, format, and background vs other VLMs
Different VLMs support different image sizes and different background colors when padding is needed, and iOS 27 AFM is the most flexible. The previous talk mentioned that context size suffers for this flexibility. So what is the best format, size, and background for the app to pre-process images into, to minimize token use? Much thanks.
Replies: 0 · Boosts: 0 · Views: 55 · Activity: 1w
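Which size and padding background minimize token use for AFM is what this post asks, and is not answered here. If an app does pre-process images into a fixed square, the arithmetic is an aspect-fit scale plus centered padding. A sketch in plain Swift, where `letterbox`, `Letterbox`, and the 512-pixel side are hypothetical, and the padding color is left to whatever drawing code the app uses.

```swift
/// Result of fitting an image into a square canvas without distortion.
struct Letterbox: Equatable {
    let scaledWidth: Int
    let scaledHeight: Int
    let padX: Int  // padding on the left (and, rounded up, on the right)
    let padY: Int  // padding on the top (and, rounded up, on the bottom)
}

/// Aspect-fits a width x height image into a side x side square and reports
/// the padding needed to center it. `side` is a hypothetical target size.
func letterbox(width: Int, height: Int, side: Int) -> Letterbox {
    precondition(width > 0 && height > 0 && side > 0, "dimensions must be positive")
    // Largest uniform scale that keeps both dimensions inside the square.
    let scale = min(Double(side) / Double(width), Double(side) / Double(height))
    let w = min(side, max(1, Int((Double(width) * scale).rounded())))
    let h = min(side, max(1, Int((Double(height) * scale).rounded())))
    // Integer division puts any odd leftover pixel on the right/bottom edge.
    return Letterbox(scaledWidth: w, scaledHeight: h,
                     padX: (side - w) / 2, padY: (side - h) / 2)
}

let landscape = letterbox(width: 1920, height: 1080, side: 512)
print(landscape.scaledWidth, landscape.scaledHeight, landscape.padX, landscape.padY) // 512 288 0 112
```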
IPC error
While running the Apple Foundation Model in the iPhone Simulator, I got this error: IPC error: Underlying connection interrupted. What does this mean? Is it related to the Foundation Model?
Replies: 2 · Boosts: 0 · Views: 242 · Activity: Jul ’25