Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

70 Posts
Post not yet marked as solved
0 Replies
37 Views
I got this SSML from w3.org. AVSpeechUtterance(ssmlRepresentation:) is not complying with the contour; it doesn't change the pitch in Hz.

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">
    <prosody contour="(0%,+20Hz) (10%,+30%) (40%,+10Hz)">
        good morning
    </prosody>
</speak>

override func viewDidLoad() {
    super.viewDidLoad()
    guard let localUtterance = AVSpeechUtterance(ssmlRepresentation: self.speechSML) else {
        print("SML did not work.")
        return
    }
    self.utterance = localUtterance
    self.utterance.voice = self.voiceNoelle
    self.synthesizer.speak(self.utterance)
}
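In case it helps narrow things down, a minimal sketch of a non-SSML fallback, assuming the prosody contour attribute is simply not honored by the synthesizer: approximating the pitch change with pitchMultiplier on a plain-text utterance.

import AVFoundation

// Non-SSML fallback sketch: approximate a pitch rise with pitchMultiplier,
// since this utterance does not go through the SSML path at all.
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "good morning")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
utterance.pitchMultiplier = 1.2   // allowed range is 0.5...2.0; a coarse substitute for the contour
synthesizer.speak(utterance)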
Post not yet marked as solved
0 Replies
37 Views
Hello iOS Developer Community, I hope this message finds you healthy and happy. I am reaching out to seek your expertise and assistance with a particular challenge I’ve encountered while using the Speak Screen and Speak Selection features on iOS. As you may know, these features are incredibly useful for reading text aloud, but they sometimes struggle with the correct pronunciation of homographs: words that are spelled the same but have different meanings and pronunciations. An example of this is the word “live,” which can be pronounced differently based on the context of the sentence.

To enhance my user experience, I am looking to input corrections for the pronunciation of “live” in its “happening now” context, such as in “live broadcast” or “live event.” However, the current process requires manual entry for each phrase, which is quite labor-intensive. I am wondering if there is a way to automate or streamline this process, perhaps through a shortcut or script that allows for bulk input of these corrections.

Additionally, if anyone has already compiled a list of common phrases with homographs and their correct pronunciations, I would greatly appreciate it if you could share it or guide me on where to find such resources.

Your insights and guidance on this matter would be invaluable, and I believe any solutions could benefit not just myself but many other users facing similar issues. Thank you for your time and consideration. I look forward to any suggestions or advice you may have. Best regards, Alec
Post not yet marked as solved
0 Replies
117 Views
Hello! We have an app that utilises the SpeechKit framework, specifically the local on-device speech recognition of audio files in the user-selected language. Up until recently it worked as expected, but after updating one of our testing devices to iOS 17.4.1 we found that local recognition on it stopped working completely. The error we are getting has code 102 and its localised description reads: "Failed to access assets". That sounds just like a rare though known issue in previous iOS versions. The old workaround was inconvenient for our users, but at least it worked: they had to go to the system Settings and toggle the dictation setting in the keyboard section. Right now no tweaks of this sort appear to help us fix the situation. We even tried resetting the device settings (not a factory reset, though). The error persists. It appears on one of our devices 100% of the time, halting the local recognition process. It sometimes shows on other devices for some particular languages too, but not for other languages.

As it is a UX-breaking bug for our app, today I decided to check the logs in the Console app at the moment of the recognition attempt. There are lots of errors with code 1101, which from our research appear to be general notifications about some local recognition setup problem. Removing the lines about the 1101 error from the log, some interesting stuff remains that is (almost) never mentioned on any searchable webpage on the Internet. I assume these are the private API calls that the SpeechKit framework executes under the hood:

default localspeechrecognition -[UAFAssetSet assetNamed:]_block_invoke 9067C4F1-0B29-4A57-85DD-F8740DF7C344: No assets in asset set com.apple.siri.understanding
default localspeechrecognition -[UAFAssetSet assetNamed:] 9067C4F1-0B29-4A57-85DD-F8740DF7C344: Returning com.apple.siri.asr.assistant from source none
error localspeechrecognition -[SFEntitledAssetManager _assetWithAssetConfig:regionId:] No asset found with name: com.apple.siri.asr.assistant, asset set: com.apple.siri.understanding, usage: <private>
error localspeechrecognition +[LSRConnection modelRootWithLanguage:clientID:modelOverrideURL:returningAssetType:error:] Fetch asset error (null)
error localspeechrecognition -[LSRConnection prepareRecognizerWithLanguage:recognitionOverrides:modelOverrideURL:anyConfiguration:task:clientID:error:] modelRoot is nil (null)
default OurApp [0x113e96d40] invalidated because the current process cancelled the connection by calling xpc_connection_cancel()

It looks like some language-model-related problems appeared after the device was updated to 17.4.1. Settings -> General -> Keyboard -> Dictation Languages appears to be configured correctly and the dictation toggle is on. We tried tweaking all of these settings, rebooting the device and resetting the device settings, but the log lines still tell us that something is wrong with the private resources of the SpeechKit framework. We are very concerned, as speech recognition is the core of our application's logic, and we don't understand the scale of the possible impact of such faulty behaviour (rare occurrences / some users / all users?) or how we can fix it to provide our users with the desired behaviour.
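For context, a minimal sketch of the kind of on-device file recognition described above; the function name, locale handling and print statements are illustrative, not taken from the app in question:

import Speech

// Sketch: on-device recognition of an audio file in a user-selected language.
func transcribeLocally(fileURL: URL, localeIdentifier: String) -> SFSpeechRecognitionTask? {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)),
          recognizer.supportsOnDeviceRecognition else {
        print("On-device recognition unavailable for \(localeIdentifier)")
        return nil
    }
    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    request.requiresOnDeviceRecognition = true  // forces the local model; fails if its assets are missing
    return recognizer.recognitionTask(with: request) { result, error in
        if let error {
            print("Recognition error: \(error)")  // e.g. the code 102 "Failed to access assets" case
        } else if let result, result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}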
Post not yet marked as solved
0 Replies
134 Views
Problem Statement: The Siri intent for the "Next", "Previous" and "Repeat" commands is not working as expected with the Speech framework.

Steps to Reproduce:
1. Open the Speech framework application.
2. Tap the Siri button to activate voice input.
3. Say "Next" to trigger the intended action.
4. Observe that the action is not executed correctly.

Steps in our demo application:
1. Open Siri and speak: "Check In". In response, Siri opens a dialog: What does the user want? 1) One 2) Next 3) Yes 4) Goodbye.
2. Speak: "Next". In response, Siri repeats the same dialog (step 2).
3. Speak: "Yes", "One" or "Goodbye". In response, Siri goes to the next dialog.

Expected Behavior: The value "Next" should be received in the SiriKit intent or App Intent.
Actual Behavior: The intent receives the previous user input keyword, and the App Intent recursively repeats the dialog.

Device, Region and Language: device model iPhone 11, OS version 17.4.1, region US, language English (US).

Impact: Users cannot use an iterative dialog in one context.

Additional: How the different commands behave with the SiriKit intent and the App Intent on the different devices we tested (refer to the rows by the No column):

| No | Device and scenario | SiriKit intent | App Intent |
| 1 | ISG iPhone 11 - Next | Not working | Not working |
| 2 | ISG iPhone 11 - Yes | Not working | Yes (but using an enum) |
| 3 | ISG iPhone 11 - GoodBye | Not working | Yes (but using an enum) |
| 4 | ISG iPhone 11 - One | Yes | Yes |
| 5 | iPad - Next | Not working | Not working |
| 6 | iPad - One | Yes | Yes |
| 7 | iPad - GoodBye | Not working | Yes |
| 8 | iPad - Yes | Not working | Yes |
| 9 | Simulator - iPhone 15 - Next, Yes, One, GoodBye | Yes | Yes |

Please help me with this.
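For reference, a minimal sketch of the enum-based App Intent pattern the "Yes (but using an enum)" rows refer to; every name below is hypothetical:

import AppIntents

// Hypothetical enum of the spoken commands; names and raw values are illustrative.
enum DialogCommand: String, AppEnum, CaseIterable {
    case one, next, yes, goodbye

    static var typeDisplayRepresentation: TypeDisplayRepresentation = "Dialog Command"
    static var caseDisplayRepresentations: [DialogCommand: DisplayRepresentation] = [
        .one: "One", .next: "Next", .yes: "Yes", .goodbye: "Goodbye"
    ]
}

struct CheckInIntent: AppIntent {
    static var title: LocalizedStringResource = "Check In"

    @Parameter(title: "Command")
    var command: DialogCommand

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // Siri should deliver the matched enum case here rather than the previous input.
        return .result(dialog: "You said \(command.rawValue)")
    }
}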
Post not yet marked as solved
1 Replies
116 Views
Hello! I'm writing to the Apple developers to request the addition of an API for downloading premium voices directly within the app. Currently, this can only be done via the settings, which is not convenient for our users. As a developer for an application where this plays a crucial role, I ask you to take this into consideration. Thank you!
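As far as I know there is no public API for downloading voices in-app today. A minimal sketch of what an app can do in the meantime, assuming iOS 16 or later for the .premium quality value: detect whether an enhanced or premium voice is installed and, if not, direct the user to the Settings path.

import AVFoundation

// Sketch: check whether any enhanced or premium voice is installed for a language.
// Assumes iOS 16+ for AVSpeechSynthesisVoiceQuality.premium.
func hasHighQualityVoice(forLanguage language: String) -> Bool {
    AVSpeechSynthesisVoice.speechVoices().contains { voice in
        voice.language == language && (voice.quality == .enhanced || voice.quality == .premium)
    }
}

// Usage: if this returns false, the app can only instruct the user to install a voice in
// Settings > Accessibility > Spoken Content > Voices; there is no public download API.
print(hasHighQualityVoice(forLanguage: "en-US"))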
Post not yet marked as solved
0 Replies
134 Views
The application is developed in SwiftUI. Our application is responsible for recording audio, transcribing the audio file and uploading it to the backend. The two main components in the iOS application are AVAudioRecorder and SFSpeechRecognizer. The UI comprises a visual design that showcases the audio recording and uses a Text component to let the user know whether audio is being recorded.

Lately customers have been complaining that although the application says "Recording" in the UI, their audio is not being received at the backend. The customers tried restarting their device (iPad) and the application started working normally again. We haven't been able to reproduce the issue, but we suspect an intermittent failure in audio transmission or a potential UI freeze.

Note: I have used the Leaks instrument and did not encounter any memory leaks while using the application.

Is there a way to determine whether the issue lies with the audio recorder, the speech recognizer, or elsewhere in the app? Are there any known issues or limitations with the audio recorder on iOS lately that could be causing this behaviour? Please let me know if you have any suggestions to diagnose this issue, and do let me know if more information is required. Thank you in advance.
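One failure mode that matches the "UI still says Recording but nothing reaches the backend" description is an unhandled audio session interruption (a Siri activation, a phone call, etc.). A minimal diagnostic sketch, with names of my own choosing, that logs interruptions and route changes so they can be correlated with the missing uploads:

import AVFoundation

// Sketch: log audio session interruptions and route changes so intermittent
// recording stops can be correlated with gaps in the uploaded audio.
final class AudioSessionDiagnostics {
    private var observers: [NSObjectProtocol] = []

    func start() {
        let center = NotificationCenter.default
        let session = AVAudioSession.sharedInstance()
        observers.append(center.addObserver(forName: AVAudioSession.interruptionNotification,
                                            object: session, queue: .main) { note in
            guard let raw = note.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
                  let type = AVAudioSession.InterruptionType(rawValue: raw) else { return }
            // .began means the recorder has silently stopped receiving audio.
            print("Audio session interruption: \(type == .began ? "began" : "ended")")
        })
        observers.append(center.addObserver(forName: AVAudioSession.routeChangeNotification,
                                            object: session, queue: .main) { note in
            print("Audio route changed: \(String(describing: note.userInfo))")
        })
    }
}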
Post not yet marked as solved
0 Replies
198 Views
I would like to contact a developer on the SSML team regarding the possibility of creating a new downloadable voice in a language that is not yet supported. I don't mind making a free contribution. Creating custom voices does not seem to be a solution, since only English is supported when creating a custom voice.
Post not yet marked as solved
0 Replies
261 Views
I am using SpeechSynthesizer and SpeechRecognizer. After a recognition task completes, the SpeechSynthesizer stops producing audible output. I am using the latest SwiftUI in Xcode 15.2, deploying to an iPhone 14 Pro running iOS 17.3.1.

Here's my SpeechSynthesizer function:

func speak(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(identifier: self.appState.chatParameters.voiceIdentifer)
    utterance.rate = 0.5
    speechSynthesizer.speak(utterance)
}

And here's the code for setting up the SpeechRecognizer (borrowed from https://www.linkedin.com/pulse/transcribing-audio-text-swiftui-muhammad-asad-chattha):

private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
    print("prepareEngine()")
    let audioEngine = AVAudioEngine()

    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = false
    request.requiresOnDeviceRecognition = true

    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    let inputNode = audioEngine.inputNode

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        request.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()

    return (audioEngine, request)
}

SpeechSynthesizer works fine as long as I don't call prepareEngine(). Thanks in advance for any assistance.
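A possible cause, though this is only an assumption from the snippet: after the engine runs, the session is left in .playAndRecord and routed to the receiver, which can make synthesizer output hard to hear. A minimal sketch of tearing the tap down and restoring a playback-friendly session before speaking again:

import AVFoundation

// Sketch: after recognition ends, stop the engine, remove the tap, and restore
// a playback-friendly session before calling speak(_:).
func finishRecognition(audioEngine: AVAudioEngine) {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)

    let session = AVAudioSession.sharedInstance()
    try? session.setActive(false, options: .notifyOthersOnDeactivation)
    // Either switch back to .playback, or keep .playAndRecord but route to the speaker.
    try? session.setCategory(.playAndRecord, options: [.defaultToSpeaker])
    try? session.setActive(true)
}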
Post not yet marked as solved
0 Replies
303 Views
I want to develop an AI assistant iOS application using the Whisper and ChatGPT OpenAI APIs. I am implementing the following steps:

1. Use an audio engine to record the user's voice.
2. Send the audio chunk to Whisper for speech-to-text.
3. Send that text to the ChatGPT OpenAI API to get a response.
4. Send that response to the speech synthesizer to speak it through the built-in speaker.

In this process I don't want to disable the microphone, because the user should be able to interrupt the speech synthesizer at any time. It should be real-time and feel like a continuous call between the user and the AI assistant.

Problem: When the user speaks, the microphone takes the input and appends it to the audio engine's recording file. That chunk is sent to Whisper for transcription, the transcribed text is sent to the ChatGPT API to get a response, and the response is sent to the speech synthesizer, which produces output on the speaker. The issue is that the microphone then picks up the synthesizer's voice from the speaker, creating a loop.

What can I do to stop my microphone from picking up the output of the iPhone speaker? Talking Tom, Call Annie and many other iOS applications continuously use the microphone and generate output from the speaker without overlapping or looping. Please suggest possible approaches.

I have tried every audio engine category and setting combination I could think of, with record, playback, playAndRecord, etc. Nothing avoids the speaker's voice feeding into my microphone. As I see it, the microphone should never capture device-generated audio. What could be the possible solution? If my approach is wrong, I am also open to suggestions and guidance.
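The "continuous call" apps mentioned above rely on the system's echo cancellation rather than muting the microphone. A minimal sketch, assuming an AVAudioEngine-based capture path, of enabling voice processing on the input node (iOS 13+) together with a voice-chat session so the speaker output is filtered out of the microphone signal:

import AVFoundation

// Sketch: enable echo cancellation so the mic does not re-capture the speaker output.
func configureFullDuplexAudio(engine: AVAudioEngine) throws {
    let session = AVAudioSession.sharedInstance()
    // .voiceChat mode enables the system's voice processing for the session.
    try session.setCategory(.playAndRecord, mode: .voiceChat,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)

    // Voice processing on the input node filters the device's own playback out of the capture.
    try engine.inputNode.setVoiceProcessingEnabled(true)
}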
Post not yet marked as solved
1 Replies
378 Views
Hello, I’ve been trying to play system sounds in my app, but this hasn’t really been working. I am frequently switching between speech recognition (Speech framework) and sounds, so perhaps that’s where the issue lies. However, despite my best efforts, I haven't been able to solve the issue. I've been resetting the AVAudioSession category before playing a sound or starting speech recognition (as depicted in the code snippet below), to no avail. Has this happened to anyone else? Does anybody know how to fix the issue?

recognizer = nil

try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: [])
try? AVAudioSession.sharedInstance().setActive(true)
AudioServicesPlaySystemSound(1113)

try? AVAudioSession.sharedInstance().setCategory(.record, mode: .spokenAudio, options: [])
try? AVAudioSession.sharedInstance().setActive(true)
recognizer = SpeechRecognition(word: wordSheet)
recognizer!.startRecognition()

Thank you.
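One thing the snippet above does not do, and this is only my assumption about the cause, is deactivate the session before switching categories and wait for the sound to finish before flipping back to recording. A minimal sketch:

import AVFoundation
import AudioToolbox

// Sketch: deactivate before re-configuring so the category switch takes effect,
// then only return to the recording configuration after the sound has played.
func playSystemSoundBetweenRecognitions() {
    let session = AVAudioSession.sharedInstance()
    try? session.setActive(false, options: .notifyOthersOnDeactivation)
    try? session.setCategory(.playback, mode: .default, options: [])
    try? session.setActive(true)

    // 1113 is the system sound ID used in the original snippet.
    AudioServicesPlaySystemSoundWithCompletion(1113) {
        try? session.setActive(false, options: .notifyOthersOnDeactivation)
        try? session.setCategory(.record, mode: .spokenAudio, options: [])
        try? session.setActive(true)
    }
}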
Post not yet marked as solved
1 Replies
351 Views
I have AVSpeechSynthesizer built into 6 apps for iPad/iOS that were working fine until recently. Sometime between November 2023 and February 2024, they just quit speaking in all the apps for no apparent reason. There have been both Xcode and iOS updates in the interim, but I cannot be sure which caused it. It doesn't work either in the Xcode simulator or on devices. What did Apple change?

Xcode 15.2, iOS 17+, SwiftUI

let synth = AVSpeechSynthesizer()
var thisText = ""

func sayit(thisText: String) {
    let utterance = AVSpeechUtterance(string: thisText)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    utterance.rate = 0.4
    utterance.preUtteranceDelay = 0.1
    synth.speak(utterance)
}
Post not yet marked as solved
1 Replies
604 Views
I am trying to use the Speech Synthesizer to speak the pronunciation of a word in British English rather than play a local audio file, which I had before. However, I keep getting this in the debugger:

#FactoryInstall Unable to query results, error: 5
Unable to list voice folder
Unable to list voice folder
Unable to list voice folder
IPCAUClient.cpp:129 IPCAUClient: bundle display name is nil
Unable to list voice folder

Here is my code, any suggestions?

func playSampleAudio() {
    let speechSynthesizer = AVSpeechSynthesizer()
    let speechUtterance = AVSpeechUtterance(string: currentWord)

    // Search for a voice with a British English accent.
    let voices = AVSpeechSynthesisVoice.speechVoices()
    var foundBritishVoice = false
    for voice in voices {
        if voice.language == "en-GB" {
            speechUtterance.voice = voice
            foundBritishVoice = true
            break
        }
    }
    if !foundBritishVoice {
        print("British English voice not found. Using default voice.")
    }

    // Configure the utterance's properties as needed.
    speechUtterance.rate = AVSpeechUtteranceDefaultSpeechRate
    speechUtterance.pitchMultiplier = 1.0
    speechUtterance.volume = 1.0

    // Speak the word.
    speechSynthesizer.speak(speechUtterance)
}
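One common pitfall, and only an assumption here since the call site isn't shown: speechSynthesizer above is a local constant, so it can be deallocated before it finishes speaking. A minimal sketch that keeps the synthesizer alive and requests the en-GB voice directly:

import AVFoundation

final class WordSpeaker {
    // Keeping the synthesizer as a stored property prevents it from being
    // deallocated before the utterance has finished.
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ word: String) {
        let utterance = AVSpeechUtterance(string: word)
        // Requesting "en-GB" directly returns a British voice if one is installed.
        utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        synthesizer.speak(utterance)
    }
}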
Post not yet marked as solved
0 Replies
338 Views
Is there a way to extract the list of words recognized by the Speech framework? I'm trying to filter out words that won't appear in the transcription output, but to do that I'll need a list of words that can appear. SFSpeechLanguageModel.Configuration can be initialized with a vocabulary, but there doesn't seem to be a way to read it, and while there are ways to create custom vocabularies, I have yet to find a way to retrieve one. I added the Natural Language tag in case that framework might contribute to a solution.
Posted by wmk
Post not yet marked as solved
0 Replies
396 Views
I'm working with the new speech recognition APIs in iOS 17 and have encountered some confusion regarding the use of URLs in SFSpeechLanguageModel.prepareCustomLanguageModel and the SFSpeechLanguageModel.Configuration. In the SFSpeechLanguageModel.Configuration initializer, I provide a URL that points to a custom language model .bin file. However, there's also a URL parameter in the prepareCustomLanguageModel method. I'm unclear about the purpose of this second URL and how it differs from the one in the configuration. To add to the confusion, the documentation for these new APIs is not fully fleshed out at this point.

I've tried injecting both .bin files (for the custom language model and the one for prepareCustomLanguageModel) into the same URL, but the results haven't clarified their distinct roles. In experiments I conducted, I checked the confidence level of recognized phrases from the same audio file with and without the custom language model .bin file. Surprisingly, the confidence levels remained the same in both scenarios, leading me to question if the custom model is being utilized correctly.

Has anyone else worked with these new APIs and can provide clarity on:
1. The distinct roles of the URLs in SFSpeechLanguageModel.Configuration and prepareCustomLanguageModel.
2. Why there might be no noticeable difference in confidence levels when using a custom language model.

Any insights or experiences with these new aspects of the iOS 17 speech recognition API would be greatly appreciated.
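For what it's worth, my understanding of the intended flow, based on the WWDC23 custom-model material; treat the exact signatures and parameter names below as assumptions to verify against the current SDK:

import Speech

// Sketch of my reading of the two URLs; names mirror the post and should be checked
// against the SDK headers rather than taken as authoritative.
func prepareAndAttachCustomModel(trainedAssetURL: URL,     // the exported .bin training data
                                 compiledModelURL: URL,    // where the prepared model should live
                                 request: SFSpeechAudioBufferRecognitionRequest) async throws {
    // Configuration's URL: the location of the custom language model the recognizer loads.
    let configuration = SFSpeechLanguageModel.Configuration(languageModel: compiledModelURL)

    // prepareCustomLanguageModel's URL: the exported asset to be processed into that model.
    try await SFSpeechLanguageModel.prepareCustomLanguageModel(for: trainedAssetURL,
                                                               clientIdentifier: "com.example.app",
                                                               configuration: configuration)

    // The custom model only influences on-device recognition, which could explain
    // identical confidence values if recognition was running server-side.
    request.requiresOnDeviceRecognition = true
    request.customizedLanguageModel = configuration
}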
Post not yet marked as solved
1 Replies
285 Views
Hi, I'm a French developer and I downloaded the Recognizing Speech in Live Audio sample code from the Apple Developer website. I tried to execute the data generator command after changing the locale identifier from 'en_US' to 'fr' in the data generator's main file, but when I ran the command in Xcode, I got this error message: "Identifier 'fr' does not parse into two elements." I checked the XML files associated with the bin archive file and the identifiers are not correct (they keep the 'en-US' value). Thanks for your help!
Post not yet marked as solved
0 Replies
424 Views
I have a prototype web view (in a WKWebView) that uses webkitSpeechRecognition for getting short snippets of text from speech. I'm not thrilled with the quality of the "recognition" - the text generally isn't very accurate. I'm wondering if I'll get any more accuracy by using the "native" SFSpeechRecognizer. It seems to me that webkitSpeechRecognition is likely just a Javascript wrapper interface for SFSpeechRecognizer, and the quality of the speech recognition won't improve. Does anyone know for sure if this is the case? Does webKitSpeechRecognition on iOS use SFSpeechRecognizer under the hood? Or are they two completely different recognition systems, and one could be more accurate than the other?
Post not yet marked as solved
2 Replies
441 Views
The application is crashing on the AXSpeech thread with EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x000056f023efbeb0:

Crashed: AXSpeech
0  libobjc.A.dylib        0x4820   objc_msgSend + 32
1  libsystem_trace.dylib  0x6c34   _os_log_fmt_flatten_object + 116
2  libsystem_trace.dylib  0x5344   _os_log_impl_flatten_and_send + 1884
3  libsystem_trace.dylib  0x4bd0   _os_log + 152
4  libsystem_trace.dylib  0x9c48   _os_log_error_impl + 24
5  TextToSpeech           0xd0a8c  _pcre2_xclass_8
6  TextToSpeech           0x3bc04  TTSSpeechUnitTestingMode
7  TextToSpeech           0x3f128  TTSSpeechUnitTestingMode
8  AXCoreUtilities        0xad38   -[NSArray(AXExtras) ax_flatMappedArrayUsingBlock:] + 204
9  TextToSpeech           0x3eb18  TTSSpeechUnitTestingMode
10 TextToSpeech           0x3c948  TTSSpeechUnitTestingMode
11 TextToSpeech           0x48824  AXAVSpeechSynthesisVoiceFromTTSSpeechVoice
12 TextToSpeech           0x49804  AXAVSpeechSynthesisVoiceFromTTSSpeechVoice
13 Foundation             0xf6064  __NSThreadPerformPerform + 264
14 CoreFoundation         0x37acc  CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION + 28
15 CoreFoundation         0x36d48  __CFRunLoopDoSource0 + 176
16 CoreFoundation         0x354fc  __CFRunLoopDoSources0 + 244
17 CoreFoundation         0x34238  __CFRunLoopRun + 828
18 CoreFoundation         0x33e18  CFRunLoopRunSpecific + 608
19 Foundation             0x2d4cc  -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 212
20 TextToSpeech           0x24b88  TTSCFAttributedStringCreateStringByBracketingAttributeWithString
21 Foundation             0xb3154  NSThread__start + 732
22 libsystem_pthread.dylib 0x24d4  _pthread_start + 136
23 libsystem_pthread.dylib 0x1a10  thread_start + 8
Post not yet marked as solved
0 Replies
275 Views
Hi Apple Team, we have a technical query regarding one feature: audio recognition and live captioning. We are developing an app for the deaf community to avoid communication barriers. We want to know whether it is possible to recognize the sound from other applications on an iPhone and show live captions in our (iOS-based) application.
Post not yet marked as solved
1 Replies
839 Views
My app listens for the verbal commands "Roll" and "Skip". It was working well until I used it while listening to a podcast in another app. I am getting a crash with the error: Thread 1: "required condition is false: IsFormatSampleRateAndChannelCountValid(format)". It crashes when I am playing audio from Snipd (a podcast app) or the Apple Podcasts app. When I am playing audio from YouTube or Apple Music it does not crash.

This is the code for when I start listening for the commands:

// MARK: - Speech Recognition
func startListening() {
    do {
        try configureAudioSession()
        createRecognitionRequest()
        try prepareAudioEngine()
    } catch {
        print("Audio Engine error: \(error.localizedDescription)")
    }
}

private func configureAudioSession() throws {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .measurement,
                                 options: [.interruptSpokenAudioAndMixWithOthers, .duckOthers])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
}

private func createRecognitionRequest() {
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else { return }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: handleRecognitionResult)
}

private func prepareAudioEngine() throws {
    let inputNode = audioEngine.inputNode
    inputNode.removeTap(onBus: 0)
    let inputFormat = inputNode.inputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { [weak self] (buffer, _) in
        self?.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    isActuallyListening = true
}

Thanks
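A guess at the failure mode, not a confirmed diagnosis: while another app's audio owns the route, inputNode.inputFormat(forBus: 0) can report a 0 Hz, zero-channel format, and installing a tap with that format trips IsFormatSampleRateAndChannelCountValid. A minimal sketch of guarding against it:

import AVFoundation

// Sketch: validate the input format before installing the tap, instead of crashing.
func installTapSafely(on inputNode: AVAudioInputNode,
                      append: @escaping (AVAudioPCMBuffer) -> Void) -> Bool {
    let format = inputNode.inputFormat(forBus: 0)
    // A 0 Hz sample rate or 0 channels means the input is not currently usable,
    // for example while the session or route is still owned by another app's playback.
    guard format.sampleRate > 0, format.channelCount > 0 else {
        print("Input format not valid yet: \(format)")
        return false
    }
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        append(buffer)
    }
    return true
}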