Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

63 Posts
Post not yet marked as solved
1 Reply
82 Views
Hi, I'm facing an issue with AVSpeechSynthesizer after iOS 16.

Crashed: com.apple.TextToSpeech.SpeechThread
0 libobjc.A.dylib          0x3518  objc_release + 16
1 libobjc.A.dylib          0x3518  objc_release_x0 + 16
2 libobjc.A.dylib          0x15d8  AutoreleasePoolPage::releaseUntil(objc_object**) + 196
3 libobjc.A.dylib          0x4f40  objc_autoreleasePoolPop + 256
4 libobjc.A.dylib          0x329dc objc_tls_direct_base<AutoreleasePoolPage*, (tls_key)3, AutoreleasePoolPage::HotPageDealloc>::dtor_(void*) + 168
5 libsystem_pthread.dylib  0x1bd8  _pthread_tsd_cleanup + 620
6 libsystem_pthread.dylib  0x4674  _pthread_exit + 84
7 libsystem_pthread.dylib  0x16d8  _pthread_start + 160
8 libsystem_pthread.dylib  0xba4   thread_start + 8

I get many crash reports from my clients, but unfortunately I can't reproduce this on my test devices. Has anybody else seen this?
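For what it's worth, one pattern sometimes associated with release-related TTS crashes is keeping the synthesizer only as a local variable, so it can be deallocated while speech is still in flight. This is not a confirmed fix for the trace above, just a minimal precautionary sketch (the Speaker type and say(_:) method are my own names):

    import AVFoundation

    final class Speaker {
        // Keep a strong, long-lived reference; a synthesizer created inside a
        // function can be released while the utterance is still being spoken.
        private let synthesizer = AVSpeechSynthesizer()

        func say(_ text: String) {
            let utterance = AVSpeechUtterance(string: text)
            utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
            synthesizer.speak(utterance)
        }
    }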
Posted by ibatis. Last updated.
Post not yet marked as solved
0 Replies
42 Views
Using the write method from AVSpeechSynthesizer produces the following error:

[AXTTSCommon] TTSPlaybackEnqueueFullAudioQueueBuffer: error -66686 enqueueing buffer

This issue was first seen on iOS 16. More information and a code snippet: https://stackoverflow.com/questions/73716508/play-audio-buffers-generated-by-avspeechsynthesizer-directly
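For context, a minimal sketch of the write(_:toBufferCallback:) call that reportedly triggers this (the utterance text and the buffer handling are illustrative, not from the original report):

    import AVFoundation

    let synthesizer = AVSpeechSynthesizer()
    let utterance = AVSpeechUtterance(string: "Hello world")
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

    synthesizer.write(utterance) { buffer in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else {
            return // an empty buffer marks the end of synthesis
        }
        // The -66686 enqueue error is reportedly logged on iOS 16 while these buffers
        // are delivered; normally they would be scheduled on an AVAudioPlayerNode
        // or written out to a file here.
    }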
Posted by apascual. Last updated.
Post not yet marked as solved
11 Replies
832 Views
Anyone experiencing issues with Speech to Text in Beta 4? It was working absolutely fine in earlier iOS versions.

    let utterance = AVSpeechUtterance(string: "The quick brown fox jumped over the lazy dog.")
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    utterance.volume = 1
    utterance.rate = 0.1
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.speak(utterance)
Posted by kinjal. Last updated.
Post not yet marked as solved
8 Replies
719 Views
Setting a voice for AVSpeechSynthesizer leads to a heap buffer overflow. Turn on Address Sanitizer in Xcode 14 beta and run the following code. Is anybody else experiencing this problem, and is there any workaround?

    let synthesizer = AVSpeechSynthesizer()
    var synthVoice: AVSpeechSynthesisVoice?

    func speak() {
        let voices = AVSpeechSynthesisVoice.speechVoices()

        for voice in voices {
            if voice.name == "Daniel" {    // select e.g. Daniel voice
                synthVoice = voice
            }
        }

        let utterance = AVSpeechUtterance(string: "Test 1 2 3")

        if let synthVoice = synthVoice {
            utterance.voice = synthVoice
        }

        synthesizer.speak(utterance) // AddressSanitizer: heap-buffer-overflow
    }
Posted. Last updated.
Post not yet marked as solved
9 Replies
1k Views
I'm testing my app in the Xcode 14 beta (released with WWDC22) on iOS 16, and it seems that AVSpeechSynthesisVoice is not working correctly. The following call always returns an empty array:

    AVSpeechSynthesisVoice.speechVoices()

Additionally, attempting to initialize AVSpeechSynthesisVoice returns nil for all of the following:

    AVSpeechSynthesisVoice(language: AVSpeechSynthesisVoice.currentLanguageCode())
    AVSpeechSynthesisVoice(language: "en")
    AVSpeechSynthesisVoice(language: "en-US")
    AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
    AVSpeechSynthesisVoice.speechVoices().first
Posted by zabelc. Last updated.
Post not yet marked as solved
1 Reply
122 Views
Recently I updated to Xcode 14.0. I am building an iOS app to convert recorded audio into text, and I got an exception while testing the application in the Simulator (iOS 16.0):

[SpeechFramework] -[SFSpeechRecognitionTask handleSpeechRecognitionDidFailWithError:]_block_invoke Ignoring subsequent recongition error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
Error Domain=kAFAssistantErrorDomain Code=1107 "(null)"

I would like to know what these error codes mean and why the error occurred.
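I can't say definitively what codes 1101 and 1107 mean (anecdotally they tend to show up when on-device/offline recognition fails, for example in the Simulator), but for reference this is roughly where such errors surface. The transcribe(url:) helper and its error handling below are a sketch of my own, assuming a file-based request:

    import Speech

    // Sketch: transcribe a recorded audio file and surface any recognition error.
    func transcribe(url: URL) {
        guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
            print("Recognizer unavailable for the current locale")
            return
        }
        let request = SFSpeechURLRecognitionRequest(url: url)
        recognizer.recognitionTask(with: request) { result, error in
            if let error = error as NSError? {
                // kAFAssistantErrorDomain codes such as 1101/1107 arrive here.
                print("Recognition failed: \(error.domain) code \(error.code)")
                return
            }
            if let result = result, result.isFinal {
                print(result.bestTranscription.formattedString)
            }
        }
    }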
Posted. Last updated.
Post marked as solved
1 Reply
133 Views
Hello! I have a question about AVSpeechSynthesizer relating to privacy. I'm wondering whether the processing that happens in AVSpeechSynthesizer is local to the app I'm building, or whether some part of the audio/text is shared with Apple to improve the service or for some other purpose.
Posted by aargelius. Last updated.
Post not yet marked as solved
2 Replies
148 Views
I'm debugging a macOS application that uses SFSpeechRecognizer to transcribe a video or audio file into a text file. I think I'm doing something wrong when I call SFSpeechRecognizer. Here is the code:

    guard let myRecognizer = SFSpeechRecognizer() else {
        // A recognizer is not supported for the current locale
        print("Recognizer not supported for current locale")
        self.titleText = "Recognizer not supported for current locale"
        return
    }

    if !myRecognizer.isAvailable {
        // The recognizer is not available right now
        print("Recognizer is not available right now")
        self.titleText = "Recognizer is not available right now"
        return
    }

    let request = SFSpeechURLRecognitionRequest(url: url)
    myRecognizer.recognitionTask(with: request) { (result, error) in

And this is the relevant stack trace:

Application Specific Backtrace 0:
0  CoreFoundation           0x00000001b5e9d148 __exceptionPreprocess + 240
1  libobjc.A.dylib          0x00000001b5be7e04 objc_exception_throw + 60
2  AVFCore                  0x00000001c851701c -[AVAssetReaderAudioMixOutput initWithAudioTracks:audioSettings:] + 984
3  AVFCore                  0x00000001c8516c28 +[AVAssetReaderAudioMixOutput assetReaderAudioMixOutputWithAudioTracks:audioSettings:] + 52
4  Speech                   0x00000001e46f1348 __151-[SFSpeechURLRecognitionRequest _handlePreRecordedAudioWithAsset:audioTracks:narrowband:addSpeechDataBlock:stopSpeechBlock:cancelSpeechWithErrorBlock:]_block_invoke.227 + 380
5  Speech                   0x00000001e46f1130 __151-[SFSpeechURLRecognitionRequest _handlePreRecordedAudioWithAsset:audioTracks:narrowband:addSpeechDataBlock:stopSpeechBlock:cancelSpeechWithErrorBlock:]_block_invoke + 212
6  libdispatch.dylib        0x00000001b5b8a5f0 _dispatch_call_block_and_release + 32
7  libdispatch.dylib        0x00000001b5b8c1b4 _dispatch_client_callout + 20
8  libdispatch.dylib        0x00000001b5b8f2c8 _dispatch_queue_override_invoke + 784
9  libdispatch.dylib        0x00000001b5b9d8e8 _dispatch_root_queue_drain + 396
10 libdispatch.dylib        0x00000001b5b9e104 _dispatch_worker_thread2 + 164
11 libsystem_pthread.dylib  0x00000001b5d4c324 _pthread_wqthread + 228
12 libsystem_pthread.dylib  0x00000001b5d4b080 start_wqthread + 8

Full .ips file attached. Thanks for taking a look :)
Speech+Recognizer+2-2022-09-16-160824.txt
Posted by joestone. Last updated.
Post marked as solved
2 Replies
687 Views
I am using the following piece of code for TTS in iOS:

    let utterance = AVSpeechUtterance(string: "Hello World")
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.speak(utterance)

It works fine on iOS 15.6.1 and all lower versions, but the same code gives the exception below on iOS 16 beta (including the latest, beta 6):

[catalog] Unable to list voice folder

The required voices are present on the iPhone and work properly in VoiceOver and Spoken Content. Even the voice API AVSpeechSynthesisVoice.speechVoices() fetches all the voices, but I get the above exception at the line synthesizer.speak(utterance).
Posted by aakashJ. Last updated.
Post not yet marked as solved
2 Replies
205 Views
OK, so I have been trying to find the right way to develop a program that runs in the background on my MacBook Air (2021). All it will do is create read-only text transcript files of the speech coming out of the audio output: speakers, headphones, etc. In addition, the original speech from the audio/video file itself would be transcribed into a read-only text file, and further read-only files would be created to pinpoint the origin of the audio output and the origin of the file the audio is supposedly coming from. So each time I play a video or movie, whether it's YouTube, Netflix, Prime Video, Vimeo, etc., an instance occurs that creates four of these read-only text files. The directory could be created when the program is set up on my laptop, and I could change the directory if need be. I have read some things about programming in Swift, and it seems overwhelming in the sense that this program could take more time than I expect in order to be fully functional. Another thing is that I see no commercial value in this program/product, so it would essentially be an example of the many possibilities of Swift. I believe the program needs these specifications, but it could be written in a way I don't expect. If I could be pointed toward the best book for developing this program, covering all the jargon I need to learn, I would be forever grateful to the Swift development team. But this program seems rather unnecessary, I guess, so I am not expecting too much.
Posted. Last updated.
Post not yet marked as solved
4 Replies
256 Views
I am on Monterey 12.1 / Safari 15.2 and am seeing a problem with Web Speech boundary events: if my utterance text has spaces in it, the tracking index only reflects half of those spaces, rounded down. So if the first word is prefixed by 1 space there is no problem, but with 2 spaces it acts as if there were just 1, and 3-4 spaces => 2, 5-6 => 3 with respect to the character index during tracking. This is not an issue for English content (and didn't use to be an issue for Spanish content). I have attached a test page as an example: ssml.html. Load it up, choose a Spanish voice pack (e.g. Juan), type in 'Yo tengo un gato y un perro.' (yeah, I know my Spanish is impressive), and try clicking Speak and adding spaces. You'll note the character indices are the same for 1 or 2 spaces, etc. P.S. I can't find an appropriate tag for Speech Synthesis related issues.
Posted by areeve. Last updated.
Post not yet marked as solved
1 Reply
193 Views
Hello. I have an app that makes use of the Speech framework for speech to text. A couple of our testers have reported that speech to text does not work unless Dictation is enabled in the keyboard settings. This was reported by a tester on an iPhone 8 Plus running iOS 15.5 (the issue persisted after updating to 15.6), as well as another tester on an iPhone 12 running iOS 15.2.1. However, we were unable to reproduce this on our end on the following devices:
iPhone SE (2nd gen) running iOS 15.6
iPad Air running iOS 12.5.5
iPod touch (6th gen) running iOS 12.5.5
iPhone 13 Pro running iOS 15.6
STT works on these devices regardless of whether keyboard dictation is on or off. Why is it only required on a small handful of devices?
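Not an answer, but a small diagnostic sketch (a helper of my own, assuming an en-US locale) that may help narrow down what differs on the affected devices by logging authorization and availability before starting recognition:

    import Speech

    func logSpeechRecognitionReadiness() {
        SFSpeechRecognizer.requestAuthorization { status in
            print("Authorization status: \(status.rawValue)")
            guard status == .authorized,
                  let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")) else {
                return
            }
            // On the affected devices this reportedly stays false until keyboard Dictation is enabled.
            print("isAvailable: \(recognizer.isAvailable)")
            print("supportsOnDeviceRecognition: \(recognizer.supportsOnDeviceRecognition)")
        }
    }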
Posted by GNUGradyn. Last updated.
Post not yet marked as solved
1 Reply
253 Views
Hey all... I have read up on a lot of forums but have not yet found a way to implement this. I would like to have an app where I can record the sounds in the night; if a sound reaches a certain noise level, some music will play. We have already created this for Android but have not found a way to implement it for iOS. Android version: https://play.google.com/store/apps/details?id=com.sicgames.muziekindenacht&hl=nl&gl=US Does anyone have any suggestions on how to handle this?
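One possible approach on iOS (certainly not the only one) is to run an AVAudioRecorder with metering enabled and poll the input level, starting playback when it crosses a threshold. A rough sketch; the file name lullaby.mp3, the 0.5 s polling interval, and the -20 dB threshold are all placeholder choices, and the app would also need a microphone usage description in Info.plist:

    import AVFoundation

    final class NightListener {
        private var recorder: AVAudioRecorder!
        private var player: AVAudioPlayer?
        private var timer: Timer?

        func start() throws {
            let session = AVAudioSession.sharedInstance()
            try session.setCategory(.playAndRecord, options: [.defaultToSpeaker])
            try session.setActive(true)

            // Record to a throwaway file purely so the input level can be read.
            let url = FileManager.default.temporaryDirectory.appendingPathComponent("level.m4a")
            let settings: [String: Any] = [
                AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
                AVSampleRateKey: 44100,
                AVNumberOfChannelsKey: 1
            ]
            recorder = try AVAudioRecorder(url: url, settings: settings)
            recorder.isMeteringEnabled = true
            recorder.record()

            // Poll the average input power and start playback above the chosen threshold.
            timer = Timer.scheduledTimer(withTimeInterval: 0.5, repeats: true) { [weak self] _ in
                guard let self = self else { return }
                self.recorder.updateMeters()
                let level = self.recorder.averagePower(forChannel: 0) // dBFS, roughly -160...0
                if level > -20, self.player?.isPlaying != true {      // threshold is arbitrary
                    self.playMusic()
                }
            }
        }

        private func playMusic() {
            guard let musicURL = Bundle.main.url(forResource: "lullaby", withExtension: "mp3") else { return }
            player = try? AVAudioPlayer(contentsOf: musicURL)
            player?.play()
        }
    }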
Posted by Tommy030. Last updated.
Post not yet marked as solved
2 Replies
191 Views
I found the Scrumdinger sample application really helpful for understanding SwiftUI, but I have a question about the transcription example. Regardless of whether I use the "StartingProject" and do the tutorial section, or use the "Completed" project, the speech transcription works, but only for a small number of seconds. Is this a side effect of something else in the project? Should I expect a complete transcription of everything said while the MeetingView view is presented? This was done with Xcode 13.4 and Xcode 14 beta 4, on iOS 15 and iOS 16 (beta 4). Thanks for any assistance!
Posted. Last updated.
Post not yet marked as solved
0 Replies
173 Views
So I have the same problem as this one: https://developer.apple.com/forums/thread/69046. There is voice control in my application using SFSpeechAudioBufferRecognitionRequest, and it works fine when I'm not capturing my iPhone screen with QuickTime. While QuickTime iPhone mirroring is on, the recognition task's closure is not called at all.
Posted by TimofeyL. Last updated.
Post not yet marked as solved
0 Replies
266 Views
I'm comparing <sub> tags in SSML with the iOS 16 beta. AVSpeechSynthesizer adds a short pause before and after the tag, whereas Google TTS does not (https://cloud.google.com/text-to-speech/docs/ssml). It also pronounces "." as "period".

    let utterance = AVSpeechUtterance(ssmlRepresentation: "<speak> I can also substitute phrases, like the <sub alias=\"World Wide Web Consortium\">W3C</sub>. </speak>")

The same happens with <phoneme> tags. I don't think it should add the extra pauses.
Posted by Masatoshi. Last updated.
Post not yet marked as solved
2 Replies
516 Views
I'm building a game where the player is able to speak commands, so I want to enable speech-to-text capability. I've set up the required Info.plist property (for speech recognition privacy) as well as the App Sandbox hardware setting (for audio input). I've confirmed that the application is listening via the audio tap and sending audio buffers to the recognition request. However, the recognition task never executes. NOTE: This is for macOS, NOT iOS. Also, it works when I have this in a Playground, but when I try to do this in an actual application, the recognition task isn't called.

Specs: macOS 12.1, Xcode 13.2.1 (13C100), Swift 5.5.2.

Here is the code that I've placed in the AppDelegate of a freshly built SpriteKit application:

    //
    // AppDelegate.swift
    //

    import Cocoa
    import AVFoundation
    import Speech

    @main
    class AppDelegate: NSObject, NSApplicationDelegate {

      private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
      private let audioEngine = AVAudioEngine()
      private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
      private var recognitionTask: SFSpeechRecognitionTask?

      func applicationDidFinishLaunching(_ aNotification: Notification) {
        SFSpeechRecognizer.requestAuthorization(requestMicrophoneAccess)
      }

      func applicationWillTerminate(_ aNotification: Notification) {
        // Insert code here to tear down your application
      }

      func applicationShouldTerminateAfterLastWindowClosed(_ sender: NSApplication) -> Bool {
        return true
      }

      fileprivate func requestMicrophoneAccess(authStatus: SFSpeechRecognizerAuthorizationStatus) {
        OperationQueue.main.addOperation {
          switch authStatus {
          case .authorized:
            self.speechRecognizer.supportsOnDeviceRecognition = true
            if let speechRecognizer = SFSpeechRecognizer() {
              if speechRecognizer.isAvailable {
                do {
                  try self.startListening()
                } catch {
                  print(">>> ERROR >>> Listening Error: \(error)")
                }
              }
            }
          case .denied:
            print("Denied")
          case .restricted:
            print("Restricted")
          case .notDetermined:
            print("Undetermined")
          default:
            print("Unknown")
          }
        }
      }

      func startListening() throws {
        // Cancel the previous task if it's running.
        recognitionTask?.cancel()
        recognitionTask = nil

        let inputNode = audioEngine.inputNode

        // Configure the microphone input.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
          // Confirmed that the following line is executing continuously
          self.recognitionRequest?.append(buffer)
        }

        startRecognizing()

        audioEngine.prepare()
        try audioEngine.start()
      }

      func startRecognizing() {
        // Create a recognition task for the speech recognition session.
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequestInternal = recognitionRequest else {
          fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequestInternal.shouldReportPartialResults = true
        recognitionRequestInternal.requiresOnDeviceRecognition = true

        // Confirmed that the following line is executed,
        // however the closure given to 'recognitionTask' is never called
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequestInternal) { result, error in
          var isFinal = false

          if result != nil {
            let firstTranscriptionTimestamp = result!.transcriptions.first?.segments.first?.timestamp ?? TimeInterval.zero
            isFinal = result!.isFinal || (firstTranscriptionTimestamp != 0)
          }

          if error != nil {
            // Stop recognizing speech if there is a problem.
            print("\n>>> ERROR >>> Recognition Error: \(error)")
            self.audioEngine.stop()
            self.audioEngine.inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
          } else if isFinal {
            self.recognitionTask = nil
          }
        }
      }
    }
Posted. Last updated.
Post not yet marked as solved
0 Replies
152 Views
Hello, please help. In our application we are using the speech recognizer, and I have two questions. 1. How can I turn on the speech recognizer with a timer when the application is in the background? 2. How can I use the speech recognizer without the usage limits? Thanks in advance for your help.
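I can't speak to the background part (that would also need an appropriate background audio mode), but regarding the duration limit: live recognition tasks are limited to roughly one minute, and a commonly suggested workaround is to finish and restart the task periodically. A minimal sketch, with type and method names of my own choosing:

    import Speech
    import AVFoundation

    final class RestartingRecognizer {
        private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
        private let audioEngine = AVAudioEngine()
        private var request: SFSpeechAudioBufferRecognitionRequest?
        private var task: SFSpeechRecognitionTask?
        private var restartTimer: Timer?

        func start() throws {
            let inputNode = audioEngine.inputNode
            let format = inputNode.outputFormat(forBus: 0)
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, _ in
                self?.request?.append(buffer)
            }
            audioEngine.prepare()
            try audioEngine.start()
            beginTask()

            // Finish and restart the task before the roughly one-minute per-task limit is hit.
            restartTimer = Timer.scheduledTimer(withTimeInterval: 50, repeats: true) { [weak self] _ in
                self?.request?.endAudio()
                self?.task?.finish()
                self?.beginTask()
            }
        }

        private func beginTask() {
            let newRequest = SFSpeechAudioBufferRecognitionRequest()
            newRequest.shouldReportPartialResults = true
            request = newRequest
            task = recognizer.recognitionTask(with: newRequest) { result, _ in
                if let result = result {
                    print(result.bestTranscription.formattedString)
                }
            }
        }
    }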
Posted. Last updated.
Post not yet marked as solved
4 Replies
1.1k Views
Hi, I am facing a strange issue in my app: an intermittent crash. I am using AVSpeechSynthesizer for speech discovery and am not sure if that is causing the problem. The crash log has the below information:

Firebase crash log

Crashed: AXSpeech
0  CoreFoundation           0x197325d00 _CFAssertMismatchedTypeID + 112
1  CoreFoundation           0x197229188 CFRunLoopSourceIsSignalled + 314
2  Foundation               0x198686ca0 performQueueDequeue + 440
3  Foundation               0x19868641c __NSThreadPerformPerform + 112
4  CoreFoundation           0x19722c990 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28
5  CoreFoundation           0x19722c88c __CFRunLoopDoSource0 + 208
6  CoreFoundation           0x19722bbfc __CFRunLoopDoSources0 + 376
7  CoreFoundation           0x197225b70 __CFRunLoopRun + 820
8  CoreFoundation           0x197225308 CFRunLoopRunSpecific + 600
9  Foundation               0x198514d8c -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232
10 libAXSpeechManager.dylib 0x1c3ad0bbc -[AXSpeechThread main]
11 Foundation               0x19868630c __NSThread__start__ + 864
12 libsystem_pthread.dylib  0x1e2f20bfc _pthread_start + 320
13 libsystem_pthread.dylib  0x1e2f29758 thread_start + 8

Apple crash log: Crash log
Posted. Last updated.