Recognize spoken words in recorded or live audio using Speech.

Posts under Speech tag

59 Posts

Post

Replies

Boosts

Views

Activity

How to record voice, auto-transcribe, translate (auto-detect input language), and play back translated audio on same device in iOS Swift?
Hi everyone 👋 I’m building an iOS app in Swift where I want to do the following: Record the user’s voice Transcribe the spoken sentence (speech-to-text) Auto-detect the spoken language Translate it to another language selected by the user (e.g., English → Spanish or Hindi → English) Speak back (text-to-speech) the translated text on the same device Is this possible to record via phone mic and play the transcribe voice into headphone's audio?
0
0
122
1w
Question about SpeechTranscriber availability on specific iPad models
Hello, I am a developer planning to build an application using Apple's new SpeechTranscriber technology. I am facing an issue where SpeechTranscriber is not available on my iPad Pro (11-inch, 2nd generation, model number: MXDC2J/A), even though I have updated it to iPadOS 26. I was under the impression that SpeechTranscriber would be available on any device running iPadOS 26. Could you please clarify if this is incorrect? Furthermore, I am planning to purchase a new iPad with an A16 chip for the development and deployment of this application. Can you confirm if SpeechTranscriber will be fully functional on an iPad equipped with the A16 chip? Thank you for your assistance.
2
0
93
2w
Improving Speech Analyzer Transcription for technical terms
I am developing an app with transcription and I am exploring ways to improve the transcription from the SpeechAnalyzer/Transcriber for technical terms. SFSpeech... recognition had the capability of being augmented by contextualStrings. Does something similar exist for SpeechAnalyzer/Transcriber? If so please point me towards the documentation and any sample code that may exist for this. If there are other options, please let me know.
1
0
196
2w
Clarification on SFSpeechRecognizer system alert message and service URLs for whitelisting
Hello Apple Engineers, I am developing a feature related to SpeechRecognizer (import Speech), and I have two questions: After adding the NSSpeechRecognitionUsageDescription key in my Info.plist, when I initialize an SFSpeechRecognizer instance, the system shows an authorization alert. The alert contains a pre-defined message from Apple: “Speech data from this app will be sent to Apple to process your requests. This will also help Apple improve its speech recognition technology.” Is it possible to remove or customize this message? My app runs in a network environment with a whitelist. I need to know which URL the SFSpeechRecognizer instance connects to, and which port it uses, so that I can add it to the whitelist. Thank you very much for your support! Best regards, Yu Cheng
0
0
41
3w
Failed to change the TTS language to CN or TW
I have some question about the TTS My device default language is zh-HK. (cantonese) my device is iPhone 16 Pro, IOS 18.6 I create a function speakMandarin I want the device to speak the zh-CN (putonghua) however the device only can speak zh-HK (cantonese). I already set the AVSpeechSynthesisVoice language as zh-CN func speakMandarin(text: String) { print("speakMandarin, \(text)") lastError = nil // Reset error // Stop any ongoing speech before starting new if synthesizer.isSpeaking { synthesizer.stopSpeaking(at: .immediate) } // Configure speech utterance let utterance = AVSpeechUtterance(ssmlRepresentation: text)! utterance.rate = 0.5 // Natural speaking speed utterance.pitchMultiplier = 1.0 utterance.volume = 1.0 utterance.voice = AVSpeechSynthesisVoice(language: "zh-CN") let preferredLanguages = [ "zh-CN" , "zh-TW"] var selectedVoice: AVSpeechSynthesisVoice? for lang in preferredLanguages { if let voice = AVSpeechSynthesisVoice(language: lang) { print(lang) selectedVoice = voice utterance.voice = voice break } } // If no Mandarin voice found, use system default if selectedVoice == nil { selectedVoice = AVSpeechSynthesisVoice(language: nil) lastError = "未偵測到普通話語音包,將使用系統預設語音" print (lastError) } utterance.voice = selectedVoice print(utterance) synthesizer.speak(utterance) } here is my log speakMandarin, <speak>你好!我們來聊聊喜歡的動物吧。你喜歡什麼動物呢?</speak> zh-CN [AVSpeechUtterance 0x1194efb80] String: 你好!我們來聊聊喜歡的動物吧。你喜歡什麼動物呢? Voice: [AVSpeechSynthesisVoice 0x104ceff90] Language: zh-CN, Name: Tingting, Quality: Default [com.apple.voice.compact.zh-CN.Tingting] Rate: 0.50 Volume: 1.00 Pitch Multiplier: 1.00 Delays: Pre: 0.00(s) Post: 0.00(s)
0
0
203
3w
SpeechAnalyzer speech to text wwdc sample app
I am using the sample app from: https://developer.apple.com/videos/play/wwdc2025/277/?time=763 I installed this on an Iphone 15 Pro with iOS 26 beta 1. I was able to get good transcription with it. The app did crash sometimes when transcribing and I was going to post here with the details. I then installed iOS beta 2 and uninstalled the sample app. Now every time I try to run the sample app on the 15 Pro I get this message: SpeechAnalyzer: Input loop ending with error: Error Domain=SFSpeechErrorDomain Code=10 "Cannot use modules with unallocated locales [en_US (fixed en_US)]" UserInfo={NSLocalizedDescription=Cannot use modules with unallocated locales [en_US (fixed en_US)]} I can't continue our our work towards using SpeechAnalyzer now with this error. I have set breakpoints on all the catch handlers and it doesn't catch this error. My phone region is "United States"
20
8
1.4k
3w
SpeechAnalyzer error "asset not found after attempted download" for certain languages
I am trying to use the new SpeechAnalyzer framework in my Mac app, and am running into an issue for some languages. When I call AssetInstallationRequest.downloadAndInstall() for some languages, it throws an error: Error Domain=SFSpeechErrorDomain Code=1 "transcription.ar asset not found after attempted download." The ".ar" appears to be the language code, which in this case was Arabic. When I call AssetInventory.status(forModules:) before attempting the download, it is giving me a status of "downloading" (perhaps from an earlier attempt?). If this language was completely unsupported, I would expect it to return a status of "unsupported", so I'm not sure what's going on here. For other languages (Polish, for example) SpeechTranscriber.supportedLocale(equivalentTo:) is returning nil, so that seems like a clearly unsupported language. But I can't tell if the languages I'm trying, like Arabic, are supported and something is going wrong, or if this error represents something I can work around. Here's the relevant section of code. The error is thrown from downloadAndInstall(), so I never even get as far as setting up the SpeechAnalyzer itself. private func setUpAnalyzer() async throws { guard let sourceLanguage else { throw Error.languageNotSpecified } guard let locale = await SpeechTranscriber.supportedLocale(equivalentTo: Locale(identifier: sourceLanguage.rawValue)) else { throw Error.unsupportedLanguage } let transcriber = SpeechTranscriber(locale: locale, preset: .progressiveTranscription) self.transcriber = transcriber let reservedLocales = await AssetInventory.reservedLocales if !reservedLocales.contains(locale) && reservedLocales.count == AssetInventory.maximumReservedLocales { if let oldest = reservedLocales.last { await AssetInventory.release(reservedLocale: oldest) } } do { let status = await AssetInventory.status(forModules: [transcriber]) print("status: \(status)") if let installationRequest = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) { try await installationRequest.downloadAndInstall() } } ...
5
0
204
4w
Japanese “Hattori” TTS voice missing from Settings > General > Read & Speak > Voices > Japanese on iOS 26
Japanese “Hattori” TTS voice missing from Settings > General > Read & Speak > Voices > Japanese on iOS 26 Steps: Open the path above → “Hattori” is not listed and cannot be downloaded Expected: Hattori is available to download and select Actual: Hattori is absent from the catalog Regression: Was available on iOS 18.x on the same device
0
0
342
4w
Speech Recognition Problem in iOS 18.0
It looks like Apple has added some new API(s) to SFSpeechRecognition My app, which is currently listed on App Store does feature speech recognition. Yet, trying to use it under iOS 18.0 throws errors: -[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)" What happens is that after several words are transcribed and displayed, the next sentence results in previous words disappearance. That's probably what that portion of the error text - "Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)" means. The problem occurs ONLY when the app is running under iOS 18.0 Even when it's compiled in Xcode 16.0 using iOS 17.5 everything works fine. Any suggestions?
39
9
7.5k
Sep ’25
Play Audio and Recognize Speech in Car
Hello, I'm trying to determine the best/recommended AVAudioSession configuration (i.e category, mode, and options) for the following use-case. Essentially, I'd like to switch between periods of playing an audio file and then recognizing speech. The audio file is typically speech and I don't intend for playback and speech recognition to occur simultaneously. I'd like for the user to sill be able to interact with Siri and I'd like for it to work with CarPlay where navigation prompts can occur. I would assume the category to use is 'playAndRecord', but I'm not sure if it's better to just set that once for the entire lifecycle, or set to 'playback' for audio file playback and then switch to 'playAndRecord' for speech recognition . I'm also not sure on the best 'mode' and 'options' to set. Any suggestions would be appreciated. Thanks.
0
0
448
Sep ’25
Crash when testing Speech sample app with FoundationModels on macOS 26.0 beta and iOS 26.0 beta
Hello, I am testing the sample project provided here: Bringing advanced speech-to-text capabilities to your app. On both macOS 26.0 beta and iOS 26.0 beta, the app crashes immediately on launch with a dyld "Symbol not found" error related to FoundationModels.framework. It feels like this may be related to testing primarily on newer Apple Silicon devices, as I am seeing consistent crashes on an Intel MacBook and on an older iPhone device. I would appreciate any insight, confirmation, or guidance on whether this is a known limitation or if there is a workaround. Is it planned to be resolved soon? Environment macOS: Device: MacBook Pro (Intel) Processor: 2 GHz Quad-Core Intel Core i5 Graphics: Intel Iris Plus Graphics 1536 MB Memory: 16 GB 3733 MHz LPDDR4X OS: macOS Tahoe Version 26.0 Beta (25A5338b) iOS: Device: iPhone 11 Model Number: MHDD3HN/A OS: iOS 26.0 Xcode: Version: 26.0 beta 3 (17A5276g) Crash (macOS) Abort signal received. Excerpt from crash dump: dyld`__abort_with_payload: 0x7ff80e3ad4a0 &lt;+0&gt;: movl $0x2000209, %eax 0x7ff80e3ad4a5 &lt;+5&gt;: movq %rcx, %r10 0x7ff80e3ad4a8 &lt;+8&gt;: syscall -&gt; 0x7ff80e3ad4aa &lt;+10&gt;: jae 0x7ff80e3ad4b4 Console: dyld[9819]: Symbol not found: _$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC Referenced from: /Users/userx/Library/Developer/Xcode/DerivedData/SwiftTranscriptionSampleApp-*/Build/Products/Debug/SwiftTranscriptionSampleApp.app/Contents/MacOS/SwiftTranscriptionSampleApp.debug.dylib Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels Crash (iOS) Abort signal received. Excerpt from crash dump: dyld`__abort_with_payload: 0x18f22b4b0 &lt;+0&gt;: mov x16, #0x209 0x18f22b4b4 &lt;+4&gt;: svc #0x80 -&gt; 0x18f22b4b8 &lt;+8&gt;: b.lo 0x18f22b4d8 Console dyld[2080]: Symbol not found: _$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC Referenced from: /private/var/containers/Bundle/Application/.../SwiftTranscriptionSampleApp.app/SwiftTranscriptionSampleApp.debug.dylib Expected in: /System/Library/Frameworks/FoundationModels.framework/FoundationModels Question Is this crash expected on Intel Macs and older iPhone models with the beta SDKs? Is there an official statement on whether macOS 26.x releases support Intel, or it exists only until macOS 26.1? Any suggested workarounds for testing this sample project on current hardware? Is this a known limitation for the 26.0 beta, and if so, should we expect a fix in 26.0 or only in subsequent releases? Attaching screenshots for reference. Thank you in advance.
4
0
485
Aug ’25
How to use the SpeechDetector Module
I am trying to use SpeechDetector Module in Speech framework along with SpeechTranscriber. and it is giving me an error Cannot convert value of type 'SpeechDetector' to expected element type 'Array.ArrayLiteralElement' (aka 'any SpeechModule') Below is how I am using it let speechDetector = Speech.SpeechDetector() let transcriber = SpeechTranscriber(locale: Locale.current, transcriptionOptions: [], reportingOptions: [.volatileResults], attributeOptions: [.audioTimeRange]) speechAnalyzer = try SpeechAnalyzer(modules: [transcriber,speechDetector])
4
2
322
Aug ’25
SensorKit Speech data question
We are a research team conducting a study collecting subject's SensorKit speech data, and we've encountered some questions we couldn't resolve ourselves or by looking up the online SensorKit documentation: Microphone Activation: In general, how is the microphone being turned on to capture a speech session? And how was each session determined to be an independent session? Negative Values: In the speech classification data, there are entries where some of the start and end values are negative (see screenshot below). How should we interpret and handle these values? Is it safe to filter them out? Duplicated sessions: From the same screenshot you can see there are multiple session identifiers linked to the same subject with the same timestamp - what does this represent? Another Negative Values: The same question for speech recognition data's average pause duration, what does the -1 mean and should we remove them as well? (Note that these screenshot got rid of subject IDs for privacy purposes but each screenshot was from one subject.) We greatly appreciate your time and help.
3
0
123
Aug ’25
[26] audioTimeRange would still be interesting for .volatileResults in SpeechTranscriber
So experimenting with the new SpeechTranscriber, if I do: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults], attributeOptions: [.audioTimeRange] ) only the final result has audio time ranges, not the volatile results. Is this a performance consideration? If there is no performance problem, it would be nice to have the option to also get speech time ranges for volatile responses. I'm not presenting the volatile text at all in the UI, I was just trying to keep statistics about the non-speech and the speech noise level, this way I can determine when the noise level falls under the noisefloor for a while. The goal here was to finalize the recording automatically, when the noise level indicate that the user has finished speaking.
4
0
308
Aug ’25
SpeechTranscriber/SpeechAnalyzer being relatively slow compared to FoundationModel and TTS
So, I've been wondering how fast a an offline STT -> ML Prompt -> TTS roundtrip would be. Interestingly, for many tests, the SpeechTranscriber (STT) takes the bulk of the time, compared to generating a FoundationModel response and creating the Audio using TTS. E.g. InteractionStatistics: - listeningStarted: 21:24:23 4480 2423 - timeTillFirstAboveNoiseFloor: 01.794 - timeTillLastNoiseAboveFloor: 02.383 - timeTillFirstSpeechDetected: 02.399 - timeTillTranscriptFinalized: 04.510 - timeTillFirstMLModelResponse: 04.938 - timeTillMLModelResponse: 05.379 - timeTillTTSStarted: 04.962 - timeTillTTSFinished: 11.016 - speechLength: 06.054 - timeToResponse: 02.578 - transcript: This is a test. - mlModelResponse: Sure! I'm ready to help with your test. What do you need help with? Here, between my audio input ending and the Text-2-Speech starting top play (using AVSpeechUtterance) the total response time was 2.5s. Of that time, it took the SpeechAnalyzer 2.1s to get the transcript finalized, FoundationModel only took 0.4s to respond (and TTS started playing nearly instantly). I'm already using reportingOptions: [.volatileResults, .fastResults] so it's probably as fast as possible right now? I'm just surprised the STT takes so much longer compared to the other parts (all being CoreML based, aren't they?)
2
0
519
Aug ’25
AVSpeechSynthesizer read Mandarin as Cantonese(iOS 26 beta 3))
In iOS 26, AVSpeechSynthesizer read Mandarin into Cantonese pronunciation. No matter how you set the language, and change the settings of my phone system, it doesn't work. let utterance = AVSpeechUtterance(string: "你好啊") //let voice = AVSpeechSynthesisVoice(language: "zh-CN") // not work let voice = AVSpeechSynthesisVoice(language: "zh-Hans") // not work too utterance.voice = voice et synth = AVSpeechSynthesizer() synth.speak(utterance)
1
0
213
Aug ’25
SpeechTranscriber extremely slow (14+ seconds) despite proper locale allocation and optimization
Using the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes 14+ seconds from audio input to first result, making it unusable for real-time applications. Environment iOS: 26.0 Beta Xcode: Beta 5 Device: iPhone 16 pro Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025 Configuration Tested Locale: en-US (properly allocated with AssetInventory.allocate(locale:)) and es-ES Setup: All optimizations applied (preheating, high priority, model retention) I started testing in my own app to replace SFSpeech API and include speech detection but after long fights with documentation (this part is quite terrible TBH) I tested the example (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw same results. I added some logs to check the specific time: 🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio! 🎙️ [20:30:41.532] Listening for transcription results... 🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false) Questions Is this expected performance for iOS 26 Beta, because old SFSpeech is far faster? Are there additional optimization steps for SpeechTranscriber? Should we expect significant performance improvements in later betas?
1
0
154
Aug ’25
SpeechTranscriber not providing audioTimeRange for most results
I started playing which transcription of audio files on macOS today, latest beta of Xcode and latest beta of Tahoe. Transcription itself works really well, but for some reason the majority of the results contain no audioTimeRange. I got 22 single-word results with time ranges, spread out all over total file of 53 minutes. Is there something I can do to improve this? To my understanding, I have followed sample code and instructions very closely, but the SwiftTranscriptionSampleApp and other examples I've seen lead me to believe I should be getting a lot more time ranges than I actually do.
3
0
104
Aug ’25