Recognize spoken words in recorded or live audio using Speech.

Posts under Speech tag

55 Posts

In the Speech framework, is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this when running in the iOS simulator and in a macOS app run from Xcode. I've also seen this behaviour with three different audio files. Nothing in the documentation says that the speechRecognitionMetadata property on an SFSpeechRecognitionResult will be nil until isFinal, but that's the behaviour I'm seeing. I've stripped my class down to the following:

private var isAuthed = false

// I call this in a .task {} in my SwiftUI View
public func requestSpeechRecognizerPermission() {
    SFSpeechRecognizer.requestAuthorization { authStatus in
        Task {
            self.isAuthed = authStatus == .authorized
        }
    }
}

public func transcribe(from url: URL) {
    guard isAuthed else { return }
    let locale = Locale(identifier: "en-US")
    let recognizer = SFSpeechRecognizer(locale: locale)
    let recognitionRequest = SFSpeechURLRecognitionRequest(url: url)
    // the behaviour occurs whether I set this to true or not; I recently set
    // it to true to see if it made a difference
    recognizer?.supportsOnDeviceRecognition = true
    recognitionRequest.shouldReportPartialResults = true
    recognitionRequest.addsPunctuation = true
    recognizer?.recognitionTask(with: recognitionRequest) { (result, error) in
        guard result != nil else { return }
        if result!.isFinal {
            // speechRecognitionMetadata is not nil
        } else {
            // speechRecognitionMetadata is nil
        }
    }
}

Further, and this isn't documented either, the SFTranscriptionSegment values don't have correct timestamp and duration values until isFinal. The values aren't all zero, but they don't align with the timing in the audio, and they change to accurate values when isFinal is true. The transcription otherwise "works", in that I get transcription text before isFinal, and if I wait for isFinal the segments are correct and speechRecognitionMetadata is filled with values. The context here is that I'm trying to generate a transcription whose spoken sections I can highlight as the audio plays, and I'm starting to think I'm simply trying to use the Speech framework in a way it doesn't work. I got my concept working by pre-processing the audio (i.e. running it through until isFinal and saving the results I need to JSON), but doing even a rougher version of it 'on the fly' - which requires segments to have the right timestamp/duration before isFinal - is perhaps impossible?
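For what it's worth, here is a minimal sketch of the pre-processing approach described above: once isFinal is true, the segment timings are stable, so they can be captured for highlighting during playback. This is illustrative only; the HighlightCue type is hypothetical, and only the SFSpeechRecognitionResult/SFTranscriptionSegment properties come from the framework.

import Speech

// Hypothetical cue type used only for this sketch.
struct HighlightCue: Codable {
    let text: String
    let start: TimeInterval
    let duration: TimeInterval
}

// Capture stable segment timings from a *final* result for later highlighting.
func cues(from result: SFSpeechRecognitionResult) -> [HighlightCue] {
    guard result.isFinal else { return [] } // timings only settle once isFinal is true
    return result.bestTranscription.segments.map { segment in
        HighlightCue(text: segment.substring,
                     start: segment.timestamp,
                     duration: segment.duration)
    }
}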
1 reply · 0 boosts · 85 views · Jul ’25
SFSpeechRecognizer is not working inside visionOS 2.4 simulator
I know there have been issues with SFSpeechRecognizer in iOS 17+ inside the simulator. I'm running into issues with speech not being recognised inside the visionOS 2.4 simulator as well (likely because it borrows from iOS frameworks). Just wondering if anyone has any workarounds or advice for this simulator issue. I can't test on device because I don't have an Apple Vision Pro. Using Swift 6 on Xcode 16.3. Below are the console logs and the code that I am using.

Console Logs

BACKGROUND SPATIAL TAP (hit BackgroundTapPlane)
SpeechToTextManager.startRecording() called
[0x15388a900|InputElement #0|Initialize] Number of channels = 0 in AudioChannelLayout does not match number of channels = 2 in stream format.
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
SpeechToTextManager.startRecording() completed successfully and recording is active.
GameManager.onTapToggle received. speechToTextManager.isAvailable: true, speechToTextManager.isRecording: true
GameManager received tap toggle callback. Tapped Object: None
BACKGROUND SPATIAL TAP (hit BackgroundTapPlane)
GESTURE MANAGER - User is already recording, stopping recording
SpeechToTextManager.stopRecording() called
GameManager.onTapToggle received. speechToTextManager.isAvailable: true, speechToTextManager.isRecording: false
Audio data size: 134400 bytes
Recognition task error: No speech detected <---

Code

private(set) var isRecording: Bool = false
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?

@MainActor
func startRecording() async throws {
    logger.debug("SpeechToTextManager.startRecording() called")
    guard !isRecording else {
        logger.warning("Cannot start recording: Already recording.")
        throw AppError.alreadyRecording
    }
    currentTranscript = ""
    processingError = nil
    audioBuffer = Data()
    isRecording = true
    do {
        try await configureAudioSession()
        try await Task.detached { [weak self] in
            guard let self = self else {
                throw AppError.internalError(description: "SpeechToTextManager instance deallocated during recording setup.")
            }
            try await self.audioProcessor.configureAudioEngine()
            let (recognizer, request) = try await MainActor.run { () -> (SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest) in
                guard let result = self.createRecognitionRequest() else {
                    throw AppError.configurationError(description: "Speech recognition not available or SFSpeechRecognizer initialization failed.")
                }
                return result
            }
            await MainActor.run {
                self.recognitionRequest = request
            }
            await MainActor.run {
                self.recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
                    guard let self = self else { return }
                    if let error = error {
                        // WE ENTER INTO THIS BLOCK, ALWAYS
                        self.logger.error("Recognition task error: \(error.localizedDescription)")
                        self.processingError = .speechRecognitionError(description: error.localizedDescription)
                        return
                    }
                    . . .
                }
            }
            . . .
        }.value
    } catch {
        . . .
    }
}

@MainActor
func stopRecording() {
    logger.debug("SpeechToTextManager.stopRecording() called")
    guard isRecording else {
        logger.debug("Not recording, nothing to do")
        return
    }
    isRecording = false
    Task.detached { [weak self] in
        guard let self = self else { return }
        await self.audioProcessor.stopEngine()
        let finalBuffer = await self.audioProcessor.getAudioBuffer()
        await MainActor.run {
            self.recognitionRequest?.endAudio()
            self.recognitionTask?.cancel()
        }
        . . .
    }
}
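Since the task always fails with "No speech detected", one thing worth double-checking is whether audio buffers are actually being appended to the SFSpeechAudioBufferRecognitionRequest - that part of the code is elided above. A minimal sketch of that wiring, assuming an AVAudioEngine; the helper name and parameters are mine, not from the snippet:

import AVFoundation
import Speech

// Feed engine buffers into the recognition request. Without append(_:) the
// recognizer receives no audio and reports "No speech detected".
func attach(_ request: SFSpeechAudioBufferRecognitionRequest, to audioEngine: AVAudioEngine) throws {
    let inputNode = audioEngine.inputNode
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        request.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    // When recording stops: remove the tap and call request.endAudio().
}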
0 replies · 0 boosts · 76 views · May ’25
Requesting user Permission for Speech Framework crashes visionOS simulator
When a new application runs on the visionOS 2.4 simulator and tries to access the Speech framework, prompting a request for authorisation to use Speech Recognition, the application freezes. Using Swift 6. Report Identifier: FB17666252

@MainActor
func checkAvailabilityAndPermissions() async {
    logger.debug("Checking speech recognition availability and permissions...")

    // 1. Verify that the speechRecognizer instance exists
    guard let recognizer = speechRecognizer else {
        logger.error("Speech recognizer is nil - speech recognition won't be available.")
        reportError(.configurationError(description: "Speech recognizer could not be created."), context: "checkAvailabilityAndPermissions")
        self.isAvailable = false
        return
    }

    // 2. Check recognizer availability (might change at runtime)
    if !recognizer.isAvailable {
        logger.error("Speech recognizer is not available for the current locale.")
        reportError(.configurationError(description: "Speech recognizer not available."), context: "checkAvailabilityAndPermissions")
        self.isAvailable = false
        return
    }
    logger.trace("Speech recognizer exists and is available.")

    // 3. Request Speech Recognition Authorization
    // IMPORTANT: Add `NSSpeechRecognitionUsageDescription` to Info.plist
    let speechAuthStatus = SFSpeechRecognizer.authorizationStatus() // FAILS HERE
    logger.debug("Current Speech Recognition authorization status: \(speechAuthStatus.rawValue)")

    if speechAuthStatus == .notDetermined {
        logger.info("Requesting speech recognition authorization...")
        // Use structured concurrency to wait for permission result
        let authStatus = await withCheckedContinuation { continuation in
            SFSpeechRecognizer.requestAuthorization { status in
                continuation.resume(returning: status)
            }
        }
        logger.debug("Received authorization status: \(authStatus.rawValue)")

        // Now handle the authorization result
        let speechAuthorized = (authStatus == .authorized)
        handleAuthorizationStatus(status: authStatus, type: "Speech Recognition")

        // If speech is granted, now check microphone
        if speechAuthorized {
            await checkMicrophonePermission()
        }
    } else {
        let speechAuthorized = (speechAuthStatus == .authorized)
        handleAuthorizationStatus(status: speechAuthStatus, type: "Speech Recognition")

        // If speech is already authorized, check microphone
        if speechAuthorized {
            await checkMicrophonePermission()
        }
    }
}
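Since this is filed as FB17666252, a stripped-down repro might help isolate the freeze. The sketch below is hypothetical scaffolding, not the poster's app; the only Speech calls are authorizationStatus() and requestAuthorization(_:), which is where the hang is reported to occur.

import SwiftUI
import Speech

struct AuthProbeView: View {
    @State private var status = "unknown"

    var body: some View {
        Text(status)
            .task {
                // Read the current status, then trigger the permission prompt.
                let current = SFSpeechRecognizer.authorizationStatus()
                status = "before request: \(current.rawValue)"
                let granted = await withCheckedContinuation { continuation in
                    SFSpeechRecognizer.requestAuthorization { continuation.resume(returning: $0) }
                }
                status = "after request: \(granted.rawValue)"
            }
    }
}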
1 reply · 0 boosts · 77 views · May ’25
Speech Recognition Entitlement Not Appearing in App ID Capabilities
Hello, I recently enrolled in the Apple Developer Program and created an App ID with the bundle ID com.echo.eyes.voice. I am trying to enable Speech Recognition in the App ID capabilities list, but the option does not appear — even after waiting over a week since my membership was activated.

I've already:
- Confirmed my Apple Developer account is active
- Checked the Identifiers section in the Developer portal
- Tried editing the App ID, but Speech Recognition is not listed
- Contacted both Developer Support and Developer Technical Support (Case #102594089120), but was told to post here for help

My app uses Capacitor + the @capacitor-community/speech-recognition plugin. I need the com.apple.developer.speech-recognition entitlement to appear so I can use native voice input in iOS. I would really appreciate help from an Apple engineer or anyone who has faced this issue.

Thank you,
— Daniel Colyer
3 replies · 0 boosts · 92 views · Jun ’25
Saving an audio file on iOS 18 vs. iOS 12
I'm able to get text-to-speech into an audio file using the following code on iOS 12 (iPhone 8) to create a .caf file:

audioFile = try AVAudioFile(
    forWriting: saveToURL,
    settings: pcmBuffer.format.settings,
    commonFormat: .pcmFormatInt16,
    interleaved: false)

where pcmBuffer.format.settings is:

[AVAudioFileTypeKey: kAudioFileMP3Type,
 AVSampleRateKey: 48000,
 AVEncoderBitRateKey: 128000,
 AVNumberOfChannelsKey: 2,
 AVFormatIDKey: kAudioFormatLinearPCM]

However, this code does not work when I run the app on iOS 18 on an iPhone 13 Pro Max. The audio file is created, but it doesn't sound right. It has a lot of static and the speech seems very low-pitched. Can anyone give me a hint or an answer?
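Static and a pitch shift are the classic symptoms of a sample-rate or sample-format mismatch between the buffers being written and the settings the AVAudioFile was created with. A hedged sketch, not a confirmed fix: derive the file's settings from the buffer the synthesizer actually delivers, so the two always agree. The function and parameter names here are assumptions, not the original code.

import AVFoundation

func writeSpeech(_ utterance: AVSpeechUtterance,
                 with synthesizer: AVSpeechSynthesizer,
                 to saveToURL: URL) {
    var audioFile: AVAudioFile?
    synthesizer.write(utterance) { buffer in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else { return }
        do {
            if audioFile == nil {
                // Match the file to the buffer's own format instead of hard-coding 48 kHz / Int16.
                audioFile = try AVAudioFile(forWriting: saveToURL,
                                            settings: pcmBuffer.format.settings,
                                            commonFormat: pcmBuffer.format.commonFormat,
                                            interleaved: pcmBuffer.format.isInterleaved)
            }
            try audioFile?.write(from: pcmBuffer)
        } catch {
            print("Failed to write buffer: \(error)")
        }
    }
}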
2 replies · 0 boosts · 56 views · Mar ’25
Spotlight search by keywords set up in NSUserActivity doesn't work
Hey there! I've run into an issue on iOS 18 and newer where Spotlight search doesn't show my app in the results. On older versions it works. Here is my code:

func configureUserActivitity(with id: String, keywords: [String]) {
    let activity = NSUserActivity(activityType: id)
    activity.contentAttributeSet = self.defaultAttributeSet
    activity.isEligibleForSearch = true
    activity.keywords = Set(keywords)
    activity.becomeCurrent()
    self.userActivity = activity
}

I haven't found any reason why it no longer works. Maybe I should report a bug?
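One way to narrow this down (a hedged suggestion, not a confirmed fix): index the same content directly through Core Spotlight, bypassing NSUserActivity. If CSSearchableItem results show up on iOS 18 but the activity-based ones don't, the regression is in the NSUserActivity path. The identifiers below are hypothetical.

import CoreSpotlight
import UniformTypeIdentifiers

func indexItem(id: String, title: String, keywords: [String]) {
    let attributes = CSSearchableItemAttributeSet(contentType: .content)
    attributes.title = title
    attributes.keywords = keywords
    let item = CSSearchableItem(uniqueIdentifier: id,
                                domainIdentifier: "com.example.searchable", // hypothetical domain
                                attributeSet: attributes)
    CSSearchableIndex.default().indexSearchableItems([item]) { error in
        if let error = error {
            print("Indexing failed: \(error)")
        }
    }
}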
6 replies · 0 boosts · 141 views · Apr ’25
Live Captions only partially works - help?
Hope it's okay to post here - I haven't gotten resolution anywhere else. Apple's iOS Live Captions is supposed to translate speech into written text, either speech on the phone itself (works like a charm!) or speech picked up via the microphone (think a meeting in a conference room). The microphone mode doesn't work anywhere, anytime, on a new iPhone 14 purchased in November 2024. Anyone out there want to fix this and help a lot of people who have trouble hearing? I'm part of an entire generation that didn't know we were supposed to protect our hearing at concerts and clubs and, worse, thought it was cool to snag a spot by the speakers...
3 replies · 1 boost · 199 views · Mar ’25
Question on using Apple TTS voice (commercial use and license)
Apple provides a way to render a TTS voice to an audio file (AVSpeechUtterance/AVSpeechSynthesizer). Alternatively, the user might record a video of the TTS playback and use that video. I wonder what the scope of use is if I use this TTS voice to make YouTube, TikTok, or commercial videos.
Is it impossible to use it commercially at all?
Can I use it commercially if the source is credited?
Can I use it commercially without crediting the source?
Is there a difference in the commercial-use license between Siri voices and regular TTS voices?
1 reply · 0 boosts · 358 views · Mar ’25
Unclear interimResults Web Speech API implementation in Safari iOS (WebKit)
Hi all! I have been working on a web speech recognition service using the Web Speech API. This service is intended to work on smartphones, primarily Chrome on Android and Safari (or WebKit WebView) on iOS. In my specific use case, I need to set the properties continuous = true and interimResults = true. However, I have noticed that interimResults = true does not always work as expected in WebKit. I understand that this setting should provide fast, native, on-device speech recognition with isFinal = false. However, at times, the recognition becomes throttled and slow, yielding isFinal = true and switching to cloud-based recognition. To confirm whether the recognition is cloud-based, I tested it by disabling the internet connection before starting speech recognition. In some cases, recognition fails entirely, which suggests that requiresOnDeviceRecognition = false is being applied. (Reference: SFSpeechRecognitionRequest.requiresOnDeviceRecognition) I believe this is not the expected behavior when setting interimResults = true.

I have researched the native services used by the Web Speech API on iOS devices, and the following links seem relevant:
• SFSpeechRecognizer
• SFSpeechRecognitionRequest.shouldReportPartialResults
• SFSpeechRecognizer.supportsOnDeviceRecognition
• Recognizing speech in live audio
• Apple Developer Forums Discussion

I found that setRequiresOnDeviceRecognition and setShouldReportPartialResults appear to be set correctly, but apparently they do not work as expected: WebKit Source Code
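For reference, this is roughly how those two Web Speech flags look on the native side that WebKit wraps; the sketch is illustrative only and is not WebKit's actual implementation. interimResults corresponds conceptually to shouldReportPartialResults, and forcing local recognition corresponds to requiresOnDeviceRecognition.

import Speech

func makeRequest(onDeviceOnly: Bool) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true          // ~ interimResults = true
    request.requiresOnDeviceRecognition = onDeviceOnly // fail rather than fall back to the server
    return request
}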
0 replies · 0 boosts · 333 views · Mar ’25
SFSpeechRecognizer throws User denied access to speech recognition
I have created an app where you can speak using SFSpeechRecognizer and it will recognize your speech into text, translate it, and then return it back using speech synthesis. All locales for SFSpeechRecognizer and switching between them work fine when the app is in the foreground, but after I turn off my screen (the app is still running, I just turned off the screen) and try to create a new recognitionTask, it receives this error inside the recognition task: "User denied access to speech recognition." The weird thing about this is that it only happens with some languages. The error happens with the Croatian or Hungarian locale for speech recognition, but doesn't with the English or Spanish locale.
1 reply · 0 boosts · 351 views · Mar ’25
How to uninstall/delete Voice Control on macOS?
How do I uninstall/delete Voice Control on macOS so that I can test my app for the case when the initial use of Voice Control causes it to be downloaded from Apple? Is there a folder in the macOS System or Library to delete to force a re-download of Voice Control? My macOS app uses the older NSSpeechRecognizer to handle speech commands, but using NSSpeechRecognizer requires authorization via [SFSpeechRecognizer requestAuthorization...]. I do this, and on a macOS system it can trigger a download of Voice Control, the macOS feature. An alert appears with: "A 390 MB download is required to use speech recognition features in MyApp. You may need to quit and open MyApp again after download completes."
0 replies · 0 boosts · 308 views · Jan ’25
Am I allowed to use Speech framework on Swift Student Challenge?
Hello! I would like to use the Speech framework in my App Playground for this year's challenge, but I still can't work out whether I'm allowed to use it while respecting the rule of "not rely on a network connection". Here's why: the Speech framework can use on-device speech recognition - no internet connection needed ✅. But it can ask to download Apple's native language package to use for this on-device recognition - and to get that, you need to be connected to the internet ❌. When I try to add the Speech Recognition capability to my App Playground, its description says: "Required to perform speech recognition using Apple's servers." (screenshot is attached). Does it mean that I won't be able to use on-device recognition in my App Playground - and therefore only the online version of this framework is available, so I can't use it to participate in the challenge successfully❓ If possible, could you please make this clearer? This framework is crucial for my App Playground and I really need it to work. Thanks for your help in advance! And have a good day!
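For context, this is the API-level switch the question is about - a small sketch with assumed names, not challenge guidance. requiresOnDeviceRecognition = true keeps recognition entirely local, but it only succeeds if the on-device model for that locale is already installed.

import Speech

func makeOfflineRequest(for url: URL,
                        locale: Locale = Locale(identifier: "en-US")) -> SFSpeechURLRecognitionRequest? {
    guard let recognizer = SFSpeechRecognizer(locale: locale),
          recognizer.supportsOnDeviceRecognition else { return nil }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = true // never contacts Apple's servers
    return request
}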
1 reply · 0 boosts · 489 views · Jan ’25
Any way to adjust the speechRecognitionMetadata pause duration?
Speech framework: I've been checking for SFSpeechRecognitionMetadata to determine the end of a sentence when using voice recognition. Yet it doesn't detect small pauses, only large ones, so I've transcribed basically an entire paragraph before it moves on to the next one. Besides implementing your own timer, are there any other ways to detect more natural pauses at the ends of sentences, similar to the browser's Web Speech recognition? Since that works in Safari, I assume there should be some equivalent feature available on macOS.
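A sketch of the timer approach mentioned above, with assumed names: restart a short "silence" timer on every partial result; if it fires before another result arrives, treat the current transcription as the end of a sentence. The 0.8-second value is an arbitrary example.

import Foundation
import Speech

final class SentenceSegmenter {
    private var silenceTimer: Timer?
    var onSentenceEnd: ((String) -> Void)?

    // Call this from the recognition task's result handler.
    func handle(_ result: SFSpeechRecognitionResult) {
        let text = result.bestTranscription.formattedString
        silenceTimer?.invalidate()
        silenceTimer = Timer.scheduledTimer(withTimeInterval: 0.8, repeats: false) { [weak self] _ in
            self?.onSentenceEnd?(text) // ~0.8 s without new words: call it a sentence
        }
    }
}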
0 replies · 0 boosts · 317 views · Dec ’24
Speech synthesis from Safari app extension
I'm making a Safari extension for learning languages. I need speech synthesis for any language the user chooses to learn. I initially tried to make this work within JavaScript, but Safari 18 doesn't reliably list voices for all languages on the web SpeechSynthesis API as described here: https://stackoverflow.com/questions/79179072/how-do-you-use-a-japanese-voice-with-speechsynthesis-in-safari-ios-18 As a workaround, I've had to use AVSpeechSynthesizer in SafariWebExtensionHandler (NSExtensionRequestHandling implementation for the extension). This works in the simulator but not on a real device. I've found this note from Apple in a StackOverflow reply: "Safari extensions are very short-lived, hence not fit for audio playback or speech synthesis. Not being able to validate an app extension in Xcode with a manually-added plist entry for background audio is the designed behavior. The general recommendation is to synthesize speech using JavaScript in conjunction with the Web Speech API." Unfortunately, the suggestion to use the Web Speech API is unsuitable as I just explained. Is there a way to set up a background process in the host app that can do speech synthesis? The app extension would need a way to communicate with this process, and start it if it's not running. Is that possible?
0 replies · 0 boosts · 563 views · Dec ’24
How to implement continuous speech recognition in the background?
Hi, I'd like to develop an app which keeps running speech recognition even after going into the background. I know I can accomplish this using the audio background mode and then processing the audio, but I am not sure if this workaround would get accepted into the App Store because of the processing limitations while in the background. How can I accomplish this while still being compliant with Apple's privacy policy and other restrictions? Thanks, Marek
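For what it's worth, this is a sketch of the audio-background-mode mechanism the question refers to; it says nothing about whether App Review will accept it. It assumes the "audio" value has been added to UIBackgroundModes in Info.plist; with an active .record session and a running AVAudioEngine, the recognition request can keep receiving buffers after the app is backgrounded.

import AVFoundation

func configureSessionForBackgroundCapture() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .measurement, options: .duckOthers)
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}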
0 replies · 0 boosts · 447 views · Dec ’24
Starting/restarting SFSpeechRecognizer?
Hello all, I'm working on a project that involves listening to a person speak off of a script, and I want to stop and then restart the recognitionTask between sections so I don't run afoul of keeping the recognitionTask running longer than it needs to. Also, I'd like to be able to flush the current input between sections so the input from the previous section doesn't roll over into the next one. This is based on the sample code for SFSpeechRecognizer, so there's a chance I might be misunderstanding something.

private func restartRecording() {
    let inputNode = audioEngine.inputNode
    audioEngine.stop()
    inputNode.removeTap(onBus: 0)
    recognitionRequest?.endAudio()
    recordingStarted = false
    recognitionTask?.cancel()
    do {
        try startRecording()
    } catch {
        print("Oopsie.")
    }
}

Here's my code. When I run it, the recognition task doesn't restart. Any ideas?
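A hedged sketch of one restart pattern, meant to slot into the same class as the snippet above (property names are assumed to match it): tear the old request and task down completely and nil them out, then let startRecording() build a fresh SFSpeechAudioBufferRecognitionRequest, reinstall the tap, and start a new recognitionTask. A request whose endAudio() has already been called won't deliver further results if reused.

private func restartRecording() {
    recognitionTask?.cancel()
    recognitionTask = nil
    recognitionRequest?.endAudio()
    recognitionRequest = nil

    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
    recordingStarted = false

    do {
        try startRecording() // must create a new request, install a new tap,
                             // then audioEngine.prepare() and audioEngine.start()
    } catch {
        print("Failed to restart recording: \(error)")
    }
}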
0 replies · 0 boosts · 554 views · Dec ’24
Speech Recognizer Clears Transcription After Pause on iOS 18.0
Description: I have encountered an issue with SFSpeechRecognizer on iOS 18.0. During live dictation, if a natural pause (e.g., 1-2 seconds) is introduced, the previously transcribed text is cleared, and the transcription starts over. This behavior makes it difficult to use the API for real-time speech recognition scenarios where pauses are expected.

Steps to Reproduce:
1. Open Apple's demo app "SpokenWord".
2. Start the dictation process using SFSpeechRecognizer.
3. Speak a few words, pause for 1-2 seconds, and then continue speaking.
4. Observe that the previously transcribed text is truncated, and the transcription starts anew.

Expected Behavior: The transcription should continue appending new results to the previous ones after a natural pause, maintaining a seamless user experience.

Observed Behavior: After a pause, the transcription resets, clearing previously transcribed text.

Impact: This behavior makes the SFSpeechRecognizer API unreliable for scenarios requiring continuous speech recognition with intermittent pauses.

Additional Information:
iOS Version: 18.0
Device: [Specify your device, e.g., iPhone 13 Pro]
Speech Recognizer Locale: [Specify locale, e.g., en-US]
App Behavior: Issue persists in both Apple's demo app ('SpokenWord') and custom implementations.
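A hedged workaround sketch, not a confirmed fix, with assumed names: keep the text that earlier results already finalized and prepend it to each new partial result, so a reset after a pause doesn't visibly clear the transcript in the UI.

import Speech

final class TranscriptAccumulator {
    private var committed = ""        // text from results that were isFinal
    private(set) var displayText = ""

    // Call this from the recognition task's result handler.
    func handle(_ result: SFSpeechRecognitionResult) {
        let current = result.bestTranscription.formattedString
        displayText = committed.isEmpty ? current : committed + " " + current
        if result.isFinal {
            committed = displayText   // commit before the recognizer starts over
        }
    }
}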
1 reply · 0 boosts · 609 views · Dec ’24
AVSpeechUtterance problem
I see this error in the debugger:

#FactoryInstall Unable to query results, error: 5
IPCAUClient.cpp:129 IPCAUClient: bundle display name is nil
Error in destroying pipe Error Domain=NSCocoaErrorDomain Code=4099 "The connection from pid 5476 on anonymousListener or serviceListener was invalidated from this process." UserInfo={NSDebugDescription=The connection from pid 5476 on anonymousListener or serviceListener was invalidated from this process.}

on this function:

func speakItem() {
    let utterance = AVSpeechUtterance(string: item.toString())
    utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
    try? AVAudioSession.sharedInstance().setCategory(.playback)
    utterance.rate = 0.3
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.speak(utterance)
}

When running without the debugger, it will (usually) speak once, then it won't speak unless I tap the button that calls this function many times. I know AVSpeech has problems that Apple has long been aware of, but I'm wondering if anyone has a workaround. I was thinking there might be a way to call the destructor for AVSpeechUtterance and generate a new object each time speech is needed, but utterance.deinit() shows: "Deinitializers cannot be accessed".
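One common cause of "speaks once, then goes silent" is that the AVSpeechSynthesizer is a local variable and can be deallocated before the utterance finishes. A hedged sketch, not a confirmed fix: keep one synthesizer alive as a stored property and reuse it; only the utterance is created per call. The Speaker type and speak(_:) name are mine, not the original code.

import AVFoundation

final class Speaker {
    private let synthesizer = AVSpeechSynthesizer() // retained for the object's lifetime

    func speak(_ text: String) {
        try? AVAudioSession.sharedInstance().setCategory(.playback)
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
        utterance.rate = 0.3
        synthesizer.speak(utterance)
    }
}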
0 replies · 1 boost · 613 views · Nov ’24
iOS 18.2 beta (22C5131E) Speech Recognition Discards Previously Transcribed Audio
I was testing SFSpeechRecognition on my real device running the iOS 18.2 beta and found that although the result's "final" field is true, the result itself does not contain the entire conversation's transcription. I came across some blog posts saying this was fixed in an 18.1 beta; is that not the case for the 18.2 beta? Example code:

recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
    guard let self = self else { return }
    if let error = error {
        DispatchQueue.main.async {
            self.errorMessage = "Transcription failed: \(error.localizedDescription)"
            self.isTranscribing = false
        }
    } else if let result = result, result.isFinal {
        // HERE!
    }
}
4 replies · 0 boosts · 763 views · Nov ’24