Recognize spoken words in recorded or live audio using Speech.

Posts under Speech tag

51 Posts


Question on using Apple TTS voice (commercial use and license)
Apple provides a way to render TTS output to a file (AVSpeechUtterance/AVSpeechSynthesizer), and a user could also record a video of the TTS playback and reuse that recording. I would like to know the scope of permitted use if I put this TTS voice into YouTube, TikTok, or other commercial videos:

- Is commercial use not permitted at all?
- Can I use it commercially if I indicate the source?
- Can I use it commercially without a separate source indication?
- Is there a difference in the commercial-use license between Siri voices and the regular TTS voices?
Replies: 3 · Boosts: 0 · Views: 1.2k · Activity: 2d

AVSpeechSynthesizer system voices (SLA clarification)
Hello, I am building an iOS-only, commercial app that uses AVSpeechSynthesizer with system voices, strictly using the APIs provided by Apple. Before distributing the app, I want to ensure that my current implementation does not conflict with the iOS Software License Agreement (SLA) and is aligned with Apple’s intended usage.

For a better playback experience (more accurate estimation of utterance duration and smoother skip forward/backward during playback), I currently synthesize speech using:
- AVSpeechSynthesizer.write(_:toBufferCallback:)
- Converting the received AVAudioPCMBuffer buffers into audio data
- Storing the audio inside the app sandbox
- Playing it back using AVAudioPlayer / AVAudioEngine

The cached audio is:
- Generated fully on-device using system voices
- Stored only inside the app’s private container
- Used only for internal playback controls (timeline, seek, skip ±5 seconds)
- Never shared, exported, uploaded, or exposed outside the app

The alternative approaches would be:
- Keeping the generated audio entirely in memory (RAM) for playback purposes, without writing it to the file system at any point
- Using AVSpeechSynthesizer.speak(_:) and playing speech strictly in real time, which has a poorer user experience compared to my approach

I have reviewed the current iOS Software License Agreement: https://www.apple.com/legal/sla/docs/iOS18_iPadOS18.pdf

In particular, section (f) mentions restrictions around System Characters, Live Captions, and Personal Voice, including the following excerpt: “…use … only for your personal, non-commercial use… No other creation or use of the System Characters, Live Captions, or Personal Voice is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution in a … commercial context.”

I do not see a specific reference in the SLA to system text-to-speech voices used via AVSpeechSynthesizer, and I want to be certain that temporarily caching synthesized speech for internal, non-exported playback is acceptable in a commercial app.

My question is: Is caching AVSpeechSynthesizer system-voice output inside the app sandbox for internal playback acceptable, or is Apple’s recommended approach to rely only on real-time playback (speak(_:)) or strictly in-memory buffering without file storage?

If this question falls outside DTS technical scope and is instead a policy or licensing matter, I would appreciate guidance on the authoritative Apple documentation or the correct Apple team/contact. Thank you.
Replies: 0 · Boosts: 1 · Views: 220 · Activity: 2d

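Below is a minimal sketch of the caching mechanics described in the post above (it does not answer the SLA question): synthesizing with write(_:toBufferCallback:) and appending the buffers to a file in the app sandbox. The SpeechCache name and the zero-length-buffer end-of-synthesis check are illustrative assumptions.

```swift
import AVFoundation

// Sketch only: synthesize an utterance and cache it as an audio file in the sandbox.
final class SpeechCache {
    private let synthesizer = AVSpeechSynthesizer()

    func cache(text: String, voice: AVSpeechSynthesisVoice?, to url: URL,
               completion: @escaping (Result<URL, Error>) -> Void) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = voice

        var file: AVAudioFile?
        synthesizer.write(utterance) { buffer in
            guard let pcm = buffer as? AVAudioPCMBuffer, pcm.frameLength > 0 else {
                // A zero-length buffer is treated here as the end of synthesis (assumption).
                completion(.success(url))
                return
            }
            do {
                if file == nil {
                    // Create the file lazily so it matches the synthesizer's output format.
                    file = try AVAudioFile(forWriting: url, settings: pcm.format.settings)
                }
                try file?.write(from: pcm)
            } catch {
                completion(.failure(error))
            }
        }
    }
}
```
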
SpeechAnalyzer error "asset not found after attempted download" for certain languages
I am trying to use the new SpeechAnalyzer framework in my Mac app, and am running into an issue for some languages. When I call AssetInstallationRequest.downloadAndInstall() for some languages, it throws an error:

Error Domain=SFSpeechErrorDomain Code=1 "transcription.ar asset not found after attempted download."

The ".ar" appears to be the language code, which in this case was Arabic. When I call AssetInventory.status(forModules:) before attempting the download, it is giving me a status of "downloading" (perhaps from an earlier attempt?). If this language was completely unsupported, I would expect it to return a status of "unsupported", so I'm not sure what's going on here.

For other languages (Polish, for example) SpeechTranscriber.supportedLocale(equivalentTo:) is returning nil, so that seems like a clearly unsupported language. But I can't tell if the languages I'm trying, like Arabic, are supported and something is going wrong, or if this error represents something I can work around.

Here's the relevant section of code. The error is thrown from downloadAndInstall(), so I never even get as far as setting up the SpeechAnalyzer itself.

```swift
private func setUpAnalyzer() async throws {
    guard let sourceLanguage else {
        throw Error.languageNotSpecified
    }
    guard let locale = await SpeechTranscriber.supportedLocale(
        equivalentTo: Locale(identifier: sourceLanguage.rawValue)
    ) else {
        throw Error.unsupportedLanguage
    }

    let transcriber = SpeechTranscriber(locale: locale, preset: .progressiveTranscription)
    self.transcriber = transcriber

    let reservedLocales = await AssetInventory.reservedLocales
    if !reservedLocales.contains(locale) && reservedLocales.count == AssetInventory.maximumReservedLocales {
        if let oldest = reservedLocales.last {
            await AssetInventory.release(reservedLocale: oldest)
        }
    }

    do {
        let status = await AssetInventory.status(forModules: [transcriber])
        print("status: \(status)")
        if let installationRequest = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
            try await installationRequest.downloadAndInstall()
        }
    }
    ...
```
Replies: 8 · Boosts: 0 · Views: 826 · Activity: 1w

SpeechAnalyzer > AnalysisContext lack of documentation
I'm using the new SpeechAnalyzer framework to detect certain commands and want to improve accuracy by giving context. AnalysisContext seems to be the solution for this, but I couldn't find any usage example, so I want to make sure I'm doing it right:

```swift
let context = AnalysisContext()
context.contextualStrings = [
    AnalysisContext.ContextualStringsTag("commands"): [
        "set speed level",
        "set jump level",
        "increase speed",
        "decrease speed",
        ...
    ],
    AnalysisContext.ContextualStringsTag("vocabulary"): [
        "speed",
        "jump",
        ...
    ]
]
try await analyzer.setContext(context)
```

With this implementation, it still gives outputs like "Set some speed level", "It's speed level", etc. Also, is it possible to make it expect a number after those commands, in order to eliminate results like "set some speed level to" (instead of "two")?
Replies: 0 · Boosts: 0 · Views: 256 · Activity: 2w

Can Critical Alerts Trigger Text-to-Speech and Vibration in Background & Terminated State?
Hello All, I want to implement Text-to-Speech (TTS) and vibration functionality when a push notification arrives. In my app, I am already using Critical Alerts, and the critical alert sound plays correctly in all app states. However, I need to confirm whether it is possible to trigger Text-to-Speech and custom vibration in all app states:
- Foreground
- Background
- Terminated (killed) state

My questions:
- Is it technically possible for iOS to run Text-to-Speech (using AVSpeechSynthesizer) when a critical alert notification arrives in the background or terminated state?
- Is it possible to trigger custom vibration patterns from a critical alert when the app is not running?
- If yes, can someone please provide guidance or sample code on how to implement this?
- If no, can Apple explain the limitation or provide documentation confirming that TTS and vibration cannot be triggered in the background or terminated state?

What works currently:
- TTS and vibration only work in the foreground when the app is active.
- The critical alert sound works correctly in all states.

I want confirmation on whether iOS supports background/terminated TTS and vibration, or whether this is a platform restriction even when using Critical Alerts. Thank you!
Replies: 1 · Boosts: 0 · Views: 314 · Activity: Dec ’25

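For reference, here is a minimal sketch of the foreground path the post above says already works: speaking the notification content when it arrives while the app is active. It does not demonstrate background or terminated-state behavior, which is the open question; the class name is illustrative.

```swift
import UIKit
import UserNotifications
import AVFoundation

// Foreground-only sketch: speak the notification body when a push arrives
// while the app is active, and give haptic feedback.
final class NotificationSpeaker: NSObject, UNUserNotificationCenterDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    func userNotificationCenter(_ center: UNUserNotificationCenter,
                                willPresent notification: UNNotification,
                                withCompletionHandler completionHandler: @escaping (UNNotificationPresentationOptions) -> Void) {
        let utterance = AVSpeechUtterance(string: notification.request.content.body)
        synthesizer.speak(utterance)

        // Simple haptic feedback; a true custom vibration pattern would need Core Haptics.
        UINotificationFeedbackGenerator().notificationOccurred(.warning)

        completionHandler([.banner, .sound])
    }
}
```
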
Video Audio + Speech To Text
Hello, I am wondering whether it is possible to have audio from my AirPods sent to my speech-to-text service while, at the same time, the built-in mic's audio input is sent to a video recording. I ask because I want my users to be able to say "CAPTURE" so that I start recording a video (with audio from the built-in mic), and then stop the recording when the user says "STOP".
Replies: 1 · Boosts: 0 · Views: 335 · Activity: Dec ’25

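A sketch of the voice-command half of the question above, using SFSpeechRecognizer on an AVAudioEngine input tap to listen for "CAPTURE" and "STOP". Routing a second, different microphone to the video recording at the same time is the open question and is not shown; the class name and callback are illustrative.

```swift
import Speech
import AVFoundation

// Sketch: listen continuously for the two commands and report them to a callback.
final class CommandListener {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let engine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onCommand: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.contextualStrings = ["CAPTURE", "STOP"] // bias recognition toward the commands
        self.request = request

        let input = engine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024, format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        try engine.start()

        task = recognizer?.recognitionTask(with: request) { result, _ in
            guard let text = result?.bestTranscription.formattedString.uppercased() else { return }
            if text.contains("CAPTURE") { onCommand("CAPTURE") }
            if text.contains("STOP") { onCommand("STOP") }
        }
    }
}
```
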
[26] audioTimeRange would still be interesting for .volatileResults in SpeechTranscriber
So, experimenting with the new SpeechTranscriber, if I do:

```swift
let transcriber = SpeechTranscriber(
    locale: locale,
    transcriptionOptions: [],
    reportingOptions: [.volatileResults],
    attributeOptions: [.audioTimeRange]
)
```

only the final result has audio time ranges, not the volatile results. Is this a performance consideration? If there is no performance problem, it would be nice to have the option to also get speech time ranges for volatile results.

I'm not presenting the volatile text in the UI at all; I was just trying to keep statistics about the non-speech and speech noise levels, so I can determine when the noise level falls under the noise floor for a while. The goal is to finalize the recording automatically when the noise level indicates that the user has finished speaking.
Replies: 6 · Boosts: 0 · Views: 705 · Activity: Nov ’25

SpeechTranscriber not supported
I've tried SpeechTranscriber on a lot of my devices (from the iPhone 12 series through the iPhone 17 series) without issues. However, the SpeechTranscriber.isAvailable value is false on my iPhone 11 Pro.
https://developer.apple.com/documentation/speech/speechtranscriber/isavailable

I'm curious why the iPhone 11 Pro is not supported. Is the entire iPhone 11 series unsupported intentionally, or is there a problem with my specific device? I've also checked supportedLocales, and the value is an empty array.
https://developer.apple.com/documentation/speech/speechtranscriber/supportedlocales
Replies: 4 · Boosts: 0 · Views: 622 · Activity: Nov ’25

SpeechTranscriber supported Devices
I have the new iOS 26 SpeechTranscriber working in my application. The issue I am facing is how to determine whether the device I am running on supports SpeechTranscriber. I was able to write code that tests whether the device supports transcription, but it takes a while to run, so the result is not available when the app launches.

What I am looking for is a list of iOS 26 devices it doesn't run on. I think it's safe to assume any new devices will support it, so a list of devices that can run iOS 26 but cannot do transcription would be much faster for the app. I have determined that it doesn't work on an iPhone SE (2nd generation); it works on the iPhone 12, iPhone SE (3rd generation), iPhone 14 Pro, and iPhone 15 Pro. Since SpeechTranscriber doesn't work in the simulator, I can't determine it that way. I have checked the docs and they don't list the devices it doesn't work on.
Replies: 1 · Boosts: 0 · Views: 374 · Activity: Nov ’25

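One way to handle the launch-timing problem described above is to kick the capability check off asynchronously at startup and cache the answer. This sketch assumes that the SpeechTranscriber.isAvailable and supportedLocales properties referenced elsewhere in this listing are async class properties; adjust the awaits if they turn out to be synchronous.

```swift
import Speech

// Launch-time capability check, cached so the UI can consult it later without re-running it.
enum TranscriptionSupport {
    private(set) static var isSupported: Bool?

    static func refreshAtLaunch() {
        Task {
            // Assumed async properties; the answer arrives shortly after launch.
            let available = await SpeechTranscriber.isAvailable
            let locales = await SpeechTranscriber.supportedLocales
            isSupported = available && !locales.isEmpty
        }
    }
}
```
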
SpeechAnalyzer speech to text wwdc sample app
I am using the sample app from: https://developer.apple.com/videos/play/wwdc2025/277/?time=763

I installed this on an iPhone 15 Pro with iOS 26 beta 1 and was able to get good transcription with it. The app did crash sometimes when transcribing, and I was going to post here with the details. I then installed iOS 26 beta 2 and uninstalled the sample app. Now every time I try to run the sample app on the 15 Pro I get this message:

SpeechAnalyzer: Input loop ending with error: Error Domain=SFSpeechErrorDomain Code=10 "Cannot use modules with unallocated locales [en_US (fixed en_US)]" UserInfo={NSLocalizedDescription=Cannot use modules with unallocated locales [en_US (fixed en_US)]}

I can't continue our work toward using SpeechAnalyzer with this error. I have set breakpoints on all the catch handlers and none of them catch this error. My phone region is "United States".
Replies: 21 · Boosts: 8 · Views: 1.8k · Activity: Nov ’25

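Not a confirmed fix, but one thing worth trying before starting the analyzer is to explicitly re-request the locale assets, using the AssetInventory calls shown in the SpeechAnalyzer post earlier in this listing, in case the beta upgrade left the en_US assets unallocated.

```swift
import Speech

// Sketch (not a confirmed fix): re-request installation of the assets backing
// the transcriber so the locale is reserved and installed again.
func reinstallAssets(for transcriber: SpeechTranscriber) async throws {
    if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await request.downloadAndInstall()
    }
}
```
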
Question about SpeechTranscriber availability on specific iPad models
Hello, I am a developer planning to build an application using Apple's new SpeechTranscriber technology. I am facing an issue where SpeechTranscriber is not available on my iPad Pro (11-inch, 2nd generation, model number: MXDC2J/A), even though I have updated it to iPadOS 26. I was under the impression that SpeechTranscriber would be available on any device running iPadOS 26. Could you please clarify if this is incorrect? Furthermore, I am planning to purchase a new iPad with an A16 chip for the development and deployment of this application. Can you confirm if SpeechTranscriber will be fully functional on an iPad equipped with the A16 chip? Thank you for your assistance.
Replies: 3 · Boosts: 0 · Views: 228 · Activity: Nov ’25

Speech Recognition Problem in iOS 18.0
It looks like Apple has added some new API(s) to SFSpeechRecognition. My app, which is currently listed on the App Store, features speech recognition. Yet, trying to use it under iOS 18.0 throws errors:

-[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"

What happens is that after several words are transcribed and displayed, the next sentence causes the previous words to disappear. That's probably what that portion of the error text ("Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"") means.

The problem occurs ONLY when the app is running under iOS 18.0. Even when it's compiled in Xcode 16.0, everything works fine under iOS 17.5. Any suggestions?
Replies: 40 · Boosts: 9 · Views: 8.1k · Activity: Oct ’25

How to record voice, auto-transcribe, translate (auto-detect input language), and play back translated audio on same device in iOS Swift?
Hi everyone 👋 I’m building an iOS app in Swift where I want to do the following:
- Record the user’s voice
- Transcribe the spoken sentence (speech-to-text)
- Auto-detect the spoken language
- Translate it to another language selected by the user (e.g., English → Spanish or Hindi → English)
- Speak back (text-to-speech) the translated text on the same device

Is it possible to record via the phone mic and play the translated speech back through the headphones' audio?
Replies: 0 · Boosts: 0 · Views: 223 · Activity: Oct ’25

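A rough sketch of the pipeline described above, under stated assumptions: SFSpeechRecognizer for speech-to-text and AVSpeechSynthesizer for the spoken reply, with translate(_:to:completion:) as a placeholder stub for whatever translation service is chosen (it is not a real API).

```swift
import Speech
import AVFoundation

// Sketch of: transcribe a recording, translate the text, then speak the result.
final class TranslationPipeline {
    private let synthesizer = AVSpeechSynthesizer() // keep a reference so speech isn't cut off

    func speakTranslation(of recordedURL: URL, targetLanguage: String) {
        let request = SFSpeechURLRecognitionRequest(url: recordedURL)
        let recognizer = SFSpeechRecognizer() // current locale; true auto-detection needs extra handling

        recognizer?.recognitionTask(with: request) { [weak self] result, _ in
            guard let self, let result, result.isFinal else { return }
            self.translate(result.bestTranscription.formattedString, to: targetLanguage) { translated in
                let utterance = AVSpeechUtterance(string: translated)
                utterance.voice = AVSpeechSynthesisVoice(language: targetLanguage)
                self.synthesizer.speak(utterance)
            }
        }
    }

    // Placeholder: swap in a real translation service here.
    private func translate(_ text: String, to language: String, completion: @escaping (String) -> Void) {
        completion(text)
    }
}
```
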
Improving Speech Analyzer Transcription for technical terms
I am developing an app with transcription, and I am exploring ways to improve the transcription of technical terms from SpeechAnalyzer/Transcriber. The SFSpeech… recognition APIs could be augmented with contextualStrings. Does something similar exist for SpeechAnalyzer/Transcriber? If so, please point me toward the documentation and any sample code that may exist for this. If there are other options, please let me know.
Replies: 1 · Boosts: 1 · Views: 243 · Activity: Sep ’25

Clarification on SFSpeechRecognizer system alert message and service URLs for whitelisting
Hello Apple Engineers, I am developing a feature related to SpeechRecognizer (import Speech), and I have two questions:

1. After adding the NSSpeechRecognitionUsageDescription key to my Info.plist, when I initialize an SFSpeechRecognizer instance, the system shows an authorization alert. The alert contains a pre-defined message from Apple: “Speech data from this app will be sent to Apple to process your requests. This will also help Apple improve its speech recognition technology.” Is it possible to remove or customize this message?

2. My app runs in a network environment with a whitelist. I need to know which URL the SFSpeechRecognizer instance connects to, and which port it uses, so that I can add it to the whitelist.

Thank you very much for your support!
Best regards, Yu Cheng
Replies: 0 · Boosts: 0 · Views: 83 · Activity: Sep ’25

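This does not answer the whitelisting question, but as a point of comparison, here is a sketch of requesting authorization and opting into on-device recognition where the locale supports it, which keeps those requests off the network. The authorization alert itself is presented by the system.

```swift
import Speech

// Sketch: request authorization, then prefer on-device recognition if supported.
func startOnDeviceRecognition(url: URL) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(),
              recognizer.supportsOnDeviceRecognition else { return }

        let request = SFSpeechURLRecognitionRequest(url: url)
        request.requiresOnDeviceRecognition = true // fail rather than send audio to a server

        recognizer.recognitionTask(with: request) { result, _ in
            if let result { print(result.bestTranscription.formattedString) }
        }
    }
}
```
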
Failed to change the TTS language to CN or TW
I have a question about TTS. My device's default language is zh-HK (Cantonese); the device is an iPhone 16 Pro running iOS 18.6. I created a function speakMandarin and I want the device to speak zh-CN (Putonghua), but it only speaks zh-HK (Cantonese), even though I already set the AVSpeechSynthesisVoice language to zh-CN.

```swift
func speakMandarin(text: String) {
    print("speakMandarin, \(text)")
    lastError = nil // Reset error

    // Stop any ongoing speech before starting new
    if synthesizer.isSpeaking {
        synthesizer.stopSpeaking(at: .immediate)
    }

    // Configure speech utterance
    let utterance = AVSpeechUtterance(ssmlRepresentation: text)!
    utterance.rate = 0.5 // Natural speaking speed
    utterance.pitchMultiplier = 1.0
    utterance.volume = 1.0
    utterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")

    let preferredLanguages = ["zh-CN", "zh-TW"]
    var selectedVoice: AVSpeechSynthesisVoice?
    for lang in preferredLanguages {
        if let voice = AVSpeechSynthesisVoice(language: lang) {
            print(lang)
            selectedVoice = voice
            utterance.voice = voice
            break
        }
    }

    // If no Mandarin voice found, use system default
    if selectedVoice == nil {
        selectedVoice = AVSpeechSynthesisVoice(language: nil)
        lastError = "未偵測到普通話語音包,將使用系統預設語音"
        print(lastError)
    }

    utterance.voice = selectedVoice
    print(utterance)
    synthesizer.speak(utterance)
}
```

Here is my log:

speakMandarin, <speak>你好!我們來聊聊喜歡的動物吧。你喜歡什麼動物呢?</speak>
zh-CN
[AVSpeechUtterance 0x1194efb80] String: 你好!我們來聊聊喜歡的動物吧。你喜歡什麼動物呢?
Voice: [AVSpeechSynthesisVoice 0x104ceff90] Language: zh-CN, Name: Tingting, Quality: Default [com.apple.voice.compact.zh-CN.Tingting]
Rate: 0.50 Volume: 1.00 Pitch Multiplier: 1.00
Delays: Pre: 0.00(s) Post: 0.00(s)
Replies: 0 · Boosts: 0 · Views: 257 · Activity: Sep ’25

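As a quick diagnostic for the situation above, it may help to list which Chinese voices are actually installed on the device, to confirm whether a Mandarin (zh-CN) voice is present at all.

```swift
import AVFoundation

// Diagnostic sketch: print every installed voice whose language code starts with "zh".
func dumpChineseVoices() {
    for voice in AVSpeechSynthesisVoice.speechVoices() where voice.language.hasPrefix("zh") {
        print(voice.language, voice.name, voice.identifier, voice.quality.rawValue)
    }
}
```
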
Japanese “Hattori” TTS voice missing from Settings > General > Read & Speak > Voices > Japanese on iOS 26
Steps: Open Settings > General > Read & Speak > Voices > Japanese → “Hattori” is not listed and cannot be downloaded
Expected: Hattori is available to download and select
Actual: Hattori is absent from the catalog
Regression: Was available on iOS 18.x on the same device
Replies: 0 · Boosts: 0 · Views: 405 · Activity: Sep ’25

Play Audio and Recognize Speech in Car
Hello, I'm trying to determine the best/recommended AVAudioSession configuration (i.e., category, mode, and options) for the following use case.

Essentially, I'd like to switch between periods of playing an audio file and recognizing speech. The audio file is typically speech, and I don't intend for playback and speech recognition to occur simultaneously. I'd like the user to still be able to interact with Siri, and I'd like it to work with CarPlay, where navigation prompts can occur.

I would assume the category to use is playAndRecord, but I'm not sure whether it's better to set that once for the entire lifecycle, or to set playback for audio-file playback and then switch to playAndRecord for speech recognition. I'm also not sure of the best mode and options to set. Any suggestions would be appreciated. Thanks.
Replies: 0 · Boosts: 0 · Views: 509 · Activity: Sep ’25

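One possible starting point for the use case above, not a definitive recommendation: keep a single .playAndRecord configuration for the whole session so the audio route doesn't change between playback and recognition. The mode and options here are judgment calls and may need tuning for CarPlay and Siri interaction.

```swift
import AVFoundation

// Sketch: configure one session for both playback and speech recognition.
func configureSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(
        .playAndRecord,
        mode: .spokenAudio,          // tuned for speech content
        options: [
            .duckOthers,             // lower other audio (e.g. nav prompts) instead of stopping it
            .allowBluetoothA2DP,     // allow Bluetooth / CarPlay-style output routes
            .defaultToSpeaker        // avoid routing playback to the receiver
        ]
    )
    try session.setActive(true)
}
```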