Apple Intelligence

RSS for tag

Apple Intelligence is the personal intelligence system that puts powerful generative models right at the core of your iPhone, iPad, and Mac and powers incredible new features to help users communicate, work, and express themselves.

Posts under Apple Intelligence subtopic

Post

Replies

Boosts

Views

Activity

Siri not calling my INExtension
Things I did: created an Intents Extension target added "Supported Intents" to both my main app target and the intent extension, with "INAddTasksIntent" and "INCreateNoteIntent" created the AppIntentVocabulary in my main app target created the handlers in the code in the Intents Extension target class AddTaskIntentHandler: INExtension, INAddTasksIntentHandling { func resolveTaskTitles(for intent: INAddTasksIntent) async -> [INSpeakableStringResolutionResult] { if let taskTitles = intent.taskTitles { return taskTitles.map { INSpeakableStringResolutionResult.success(with: $0) } } else { return [INSpeakableStringResolutionResult.needsValue()] } } func handle(intent: INAddTasksIntent) async -> INAddTasksIntentResponse { // my code to handle this... let response = INAddTasksIntentResponse(code: .success, userActivity: nil) response.addedTasks = tasksCreated.map { INTask( title: INSpeakableString(spokenPhrase: $0.name), status: .notCompleted, taskType: .completable, spatialEventTrigger: nil, temporalEventTrigger: intent.temporalEventTrigger, createdDateComponents: DateHelper.localCalendar().dateComponents([.year, .month, .day, .minute, .hour], from: Date.now), modifiedDateComponents: nil, identifier: $0.id ) } return response } } class AddItemIntentHandler: INExtension, INCreateNoteIntentHandling { func resolveTitle(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let title = intent.title { return INSpeakableStringResolutionResult.success(with: title) } else { return INSpeakableStringResolutionResult.needsValue() } } func resolveGroupName(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let groupName = intent.groupName { return INSpeakableStringResolutionResult.success(with: groupName) } else { return INSpeakableStringResolutionResult.needsValue() } } func handle(intent: INCreateNoteIntent) async -> INCreateNoteIntentResponse { do { // my code for handling this... let response = INCreateNoteIntentResponse(code: .success, userActivity: nil) response.createdNote = INNote( title: INSpeakableString(spokenPhrase: itemName), contents: itemNote.map { [INTextNoteContent(text: $0)] } ?? [], groupName: INSpeakableString(spokenPhrase: list.name), createdDateComponents: DateHelper.localCalendar().dateComponents([.day, .month, .year, .hour, .minute], from: Date.now), modifiedDateComponents: nil, identifier: newItem.id ) return response } catch { return INCreateNoteIntentResponse(code: .failure, userActivity: nil) } } } uninstalled my app restarted my physical device and simulator Yet, when I say "Remind me to buy dog food in Index" (Index is the name of my app), as stated in the examples of INAddTasksIntent, Siri proceeds to say that a list named "Index" doesn't exist in apple Reminders app, instead of processing the request in my app. Am I missing something?
1
0
103
4w
Parallel/Steam processing of Apple Intelligence
I have built a MAC-OS machine intelligence application that uses Apple Intelligence. A part of the application is to preprocess text. For longer text content I have implemented chunking to get around the token limit. However the application performance is now limited by the fact that Apple Intelligence is sequential in operation. This has a large impact on the application performance. Is there any approach to operate Apple Intelligence in a parallel mode or even a streaming interface. As Apple Intelligence has Private Cloud Services I was hoping to be able to send multiple chunks in parallel as that would significantly improve performance. Any suggestions would be welcome. This could also be considered a request for a future enhancement.
2
0
184
3w
CoreML Instrument Testing Native Clawbot using FM.SyML & OAIC & Diffusion
After running performance test on my CoreML qwen3 vision, I appreciated the update where results were viewable... ON Mac it mentions Ios18 and im not sure if or how to change.. that bottle neck lead to rebuilding CoreML view. I woke up and realized I have all the pieces together... and ended up with a swift package working demo of Clawbot.. the current issue is Im trying to use gguf 3b to code it.. I have become well aware that everything I create using the big models, they soon become the default themes /layouts for everyone else simply asking for this or that (I appoligise) so here I am asking (while looking to schedule meet with dev) if its possible to speak with anyone about th 1000s of Apple Intelligence PCC, Xcode, and vision reports and feedback ive sent , in terms of just general ways I can work more efficiently without the crash... ive already build a TUI for MLX but the tools for coreML while seems promising are not intuitive, but the vision format instruction was nice to see. Anyway my question is:
0
0
79
3w
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
2
0
549
1w
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
2
0
435
1w
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
0
0
509
6d
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
93
22h
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
0
0
94
1d
Warming Up Apple Intelligence
Whats to code to warm it up once? Saw this in a developer video but cannot find it. Prevent cold run within an application. Thank you in advance!
Replies
1
Boosts
0
Views
153
Activity
Feb ’26
Siri not calling my INExtension
Things I did: created an Intents Extension target added "Supported Intents" to both my main app target and the intent extension, with "INAddTasksIntent" and "INCreateNoteIntent" created the AppIntentVocabulary in my main app target created the handlers in the code in the Intents Extension target class AddTaskIntentHandler: INExtension, INAddTasksIntentHandling { func resolveTaskTitles(for intent: INAddTasksIntent) async -> [INSpeakableStringResolutionResult] { if let taskTitles = intent.taskTitles { return taskTitles.map { INSpeakableStringResolutionResult.success(with: $0) } } else { return [INSpeakableStringResolutionResult.needsValue()] } } func handle(intent: INAddTasksIntent) async -> INAddTasksIntentResponse { // my code to handle this... let response = INAddTasksIntentResponse(code: .success, userActivity: nil) response.addedTasks = tasksCreated.map { INTask( title: INSpeakableString(spokenPhrase: $0.name), status: .notCompleted, taskType: .completable, spatialEventTrigger: nil, temporalEventTrigger: intent.temporalEventTrigger, createdDateComponents: DateHelper.localCalendar().dateComponents([.year, .month, .day, .minute, .hour], from: Date.now), modifiedDateComponents: nil, identifier: $0.id ) } return response } } class AddItemIntentHandler: INExtension, INCreateNoteIntentHandling { func resolveTitle(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let title = intent.title { return INSpeakableStringResolutionResult.success(with: title) } else { return INSpeakableStringResolutionResult.needsValue() } } func resolveGroupName(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let groupName = intent.groupName { return INSpeakableStringResolutionResult.success(with: groupName) } else { return INSpeakableStringResolutionResult.needsValue() } } func handle(intent: INCreateNoteIntent) async -> INCreateNoteIntentResponse { do { // my code for handling this... let response = INCreateNoteIntentResponse(code: .success, userActivity: nil) response.createdNote = INNote( title: INSpeakableString(spokenPhrase: itemName), contents: itemNote.map { [INTextNoteContent(text: $0)] } ?? [], groupName: INSpeakableString(spokenPhrase: list.name), createdDateComponents: DateHelper.localCalendar().dateComponents([.day, .month, .year, .hour, .minute], from: Date.now), modifiedDateComponents: nil, identifier: newItem.id ) return response } catch { return INCreateNoteIntentResponse(code: .failure, userActivity: nil) } } } uninstalled my app restarted my physical device and simulator Yet, when I say "Remind me to buy dog food in Index" (Index is the name of my app), as stated in the examples of INAddTasksIntent, Siri proceeds to say that a list named "Index" doesn't exist in apple Reminders app, instead of processing the request in my app. Am I missing something?
Replies
1
Boosts
0
Views
103
Activity
4w
Parallel/Steam processing of Apple Intelligence
I have built a MAC-OS machine intelligence application that uses Apple Intelligence. A part of the application is to preprocess text. For longer text content I have implemented chunking to get around the token limit. However the application performance is now limited by the fact that Apple Intelligence is sequential in operation. This has a large impact on the application performance. Is there any approach to operate Apple Intelligence in a parallel mode or even a streaming interface. As Apple Intelligence has Private Cloud Services I was hoping to be able to send multiple chunks in parallel as that would significantly improve performance. Any suggestions would be welcome. This could also be considered a request for a future enhancement.
Replies
2
Boosts
0
Views
184
Activity
3w
CoreML Instrument Testing Native Clawbot using FM.SyML & OAIC & Diffusion
After running performance test on my CoreML qwen3 vision, I appreciated the update where results were viewable... ON Mac it mentions Ios18 and im not sure if or how to change.. that bottle neck lead to rebuilding CoreML view. I woke up and realized I have all the pieces together... and ended up with a swift package working demo of Clawbot.. the current issue is Im trying to use gguf 3b to code it.. I have become well aware that everything I create using the big models, they soon become the default themes /layouts for everyone else simply asking for this or that (I appoligise) so here I am asking (while looking to schedule meet with dev) if its possible to speak with anyone about th 1000s of Apple Intelligence PCC, Xcode, and vision reports and feedback ive sent , in terms of just general ways I can work more efficiently without the crash... ive already build a TUI for MLX but the tools for coreML while seems promising are not intuitive, but the vision format instruction was nice to see. Anyway my question is:
Replies
0
Boosts
0
Views
79
Activity
3w
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
Replies
2
Boosts
0
Views
549
Activity
1w
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
Replies
2
Boosts
0
Views
435
Activity
1w
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
Replies
0
Boosts
0
Views
509
Activity
6d
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
93
Activity
22h
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
Replies
0
Boosts
0
Views
94
Activity
1d