Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

69 results found
Post not yet marked as solved
400 Views

SFSpeechRecognizer Broken in iPadOS 15.0?

I updated Xcode to Xcode 13 and iPadOS to 15.0. Now my previously working application using SFSpeechRecognizer fails to start, regardless of whether I'm using on-device mode or not. I use the delegate approach, and although the plist is set up correctly (the authorization succeeds and I get the orange circle indicating the microphone is on), the delegate method speechRecognitionTask(_:didFinishSuccessfully:) is always called with false, with no particular error message to go along with it. I also downloaded the official SpokenWord sample from Apple's SFSpeechRecognition example project page; unfortunately, it no longer works either. I'm working on a time-sensitive project and don't know where to go from here. How can we troubleshoot this? If it's an issue with Apple's API update, or something has changed in the initial setup, I really need to know as soon as possible. Thanks.
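For reference, a minimal sketch of the delegate approach described above, showing where the underlying error can be read when the task reports failure; the delegate methods are from SFSpeechRecognitionTaskDelegate, while the class name is illustrative:

import Speech

final class RecognizerDelegate: NSObject, SFSpeechRecognitionTaskDelegate {

    func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {
        if !successfully {
            // The success flag alone says little; the task's error property usually carries the domain and code.
            print("Recognition failed: \(String(describing: task.error))")
        }
    }

    func speechRecognitionTaskWasCancelled(_ task: SFSpeechRecognitionTask) {
        print("Recognition task was cancelled")
    }

    func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishRecognition recognitionResult: SFSpeechRecognitionResult) {
        print("Final transcription: \(recognitionResult.bestTranscription.formattedString)")
    }
}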
Post not yet marked as solved
85 Views

AVSpeechSynthesizer works incorrectly with the Russian language

Here is a simple app to demonstrate the problem:

import SwiftUI
import AVFoundation

struct ContentView: View {
    var synthVM = SpeakerViewModel()

    var body: some View {
        VStack {
            Text("Hello, world!")
                .padding()
            HStack {
                Button("Speak") {
                    if self.synthVM.speaker.isPaused {
                        self.synthVM.speaker.continueSpeaking()
                    } else {
                        self.synthVM.speak(text: "Привет на корабле! Кто это пришел к нам, чтобы посмотреть на это произведение?")
                    }
                }
                Button("Pause") {
                    if self.synthVM.speaker.isSpeaking {
                        self.synthVM.speaker.pauseSpeaking(at: .word)
                    }
                }
                Button("Stop") {
                    self.synthVM.speaker.stopSpeaking(at: .word)
                }
            }
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

class SpeakerViewModel: NSObject {
    var speaker = AVSpeechSynthesizer()

    override init() {
        super.init()
        self.speaker.delegate = self
    }

    func speak(text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "ru")
        speaker.speak(utterance)
    }
}

extension SpeakerViewModel: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("started")
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didPause utterance: AVSpeechUtterance) {
        print("paused")
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didContinue utterance: AVSpeechUtterance) {}
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {}
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        guard let rangeInString = Range(characterRange, in: utterance.speechString) else { return }
        print("Will speak: \(utterance.speechString[rangeInString])")
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("finished")
    }
}

On the simulator everything works fine, but on a real device many strange words appear in the synthesized speech, and the willSpeakRangeOfSpeechString output differs between the simulator and the device.

Simulator output:

started
Will speak: Привет
Will speak: на
Will speak: корабле!
Will speak: Кто
Will speak: это
Will speak: пришел
Will speak: к
Will speak: нам,
Will speak: чтобы
Will speak: посмотреть
Will speak: на
Will speak: это
Will speak: произведение?
finished

The iPhone output has errors:

2021-10-12 17:09:32.613273+0300 VoiceTest[9027:203522] [AXTTSCommon] Broken user rule: \b([234567890]+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0]) > Error Domain=NSCocoaErrorDomain Code=2048 "The value “\b([234567890]+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])” is invalid." UserInfo={NSInvalidValue=\b([234567890]+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])}
2021-10-12 17:09:32.613548+0300 VoiceTest[9027:203522] [AXTTSCommon] Broken user rule: \b(1\d+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0]) > Error Domain=NSCocoaErrorDomain Code=2048 "The value “\b(1\d+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])” is invalid." UserInfo={NSInvalidValue=\b(1\d+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])}
2021-10-12 17:09:32.613725+0300 VoiceTest[9027:203522] [AXTTSCommon] Broken user rule: \b2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0]) > Error Domain=NSCocoaErrorDomain Code=2048 "The value “\b2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])” is invalid." UserInfo={NSInvalidValue=\b2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])}
started
Will speak: Привет
Will speak: на
Will speak: ивет на корабле!
Will speak: Кто
Will speak: это
Will speak: Кто это пришел
Will speak: к
Will speak: нам,
Will speak: чтобы
Will speak: посмотреть
Will speak: на
Will speak: реть на это
Will speak: на это произведение?
finished

The error appears on iOS / iPadOS 15.0, 15.0.1, 15.0.2 and 14.7, but everything works fine on 14.8. It looks like an engine error. How can this issue be fixed?
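Not a confirmed fix (the broken rules look like they live inside the TTS engine itself), but one thing worth trying is pinning a specific Russian voice instead of relying on the generic "ru" language lookup. A sketch:

import AVFoundation

func makeRussianUtterance(_ text: String) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    // Prefer an enhanced-quality Russian voice if one is installed, otherwise fall back to the default "ru-RU" voice.
    let russianVoices = AVSpeechSynthesisVoice.speechVoices().filter { $0.language == "ru-RU" }
    utterance.voice = russianVoices.first { $0.quality == .enhanced } ?? AVSpeechSynthesisVoice(language: "ru-RU")
    return utterance
}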
Asked by sanctor.
Post not yet marked as solved
148 Views

watchOS dictation timeout

Hi! Great to see this forum! I'm new to developing watchOS apps, and I have a question about dictation. As far as I know, I can develop an independent Apple Watch app with dictation capabilities that connects to Apple services for speech recognition over WiFi, 4G, etc. I've noticed with another app that after about 30 seconds, live dictation cuts off and the "Done" button at the top disappears, leaving only the "Cancel" button. I'm not sure if this is an app-specific issue, but it results in loss of input because I'm forced to press the "Cancel" button. Apart from that, I like to prepare speeches while I take a jog, so I want to develop a practical app where I can keep speaking through dictation. My questions are: Is there a way to increase the live dictation timeout? Can we expect offline dictation anytime soon?
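For context, a sketch of how the system dictation sheet is typically presented from a WKInterfaceController. As far as I know, the roughly 30-second cutoff is system behavior with no public setting to extend it, so this only shows the invocation, not a workaround:

import WatchKit

class DictationController: WKInterfaceController {

    func startDictation() {
        // Presents the system text-input sheet; with no suggestions and .plain mode it goes straight to dictation.
        presentTextInputController(withSuggestions: nil,
                                   allowedInputMode: .plain) { results in
            guard let spoken = results?.first as? String else { return } // user cancelled or dictation timed out
            print("Dictated: \(spoken)")
        }
    }
}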
Post not yet marked as solved
170 Views

iOS 15: SFSpeechRecognizer error kLSRErrorDomain code 201

Hi, I use device-local speech recognition for speech input. Since upgrading to iOS 15, some devices return the new error domain/code kLSRErrorDomain, code 201 (previously the errors were mostly in kAFAssistantErrorDomain). Does anybody have an idea what it means and how to fix it? Thanks!
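For anyone else hitting this, a small sketch of logging the full NSError in the recognition callback so the domain, code, and userInfo are visible; the handler name is illustrative:

import Speech

func handleRecognition(result: SFSpeechRecognitionResult?, error: Error?) {
    if let error = error as NSError? {
        // On iOS 15 devices this reportedly shows kLSRErrorDomain instead of kAFAssistantErrorDomain.
        print("Recognition error domain: \(error.domain), code: \(error.code), info: \(error.userInfo)")
        return
    }
    if let result = result, result.isFinal {
        print(result.bestTranscription.formattedString)
    }
}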
Asked by TH0MAS.
Post not yet marked as solved
42 Views

STT in iOS Safari

Since version 14.2 we have been having issues with STT. In the past we were using Azure and it was working fine. Since the Speech Recognition API was partially implemented, things have gotten worse on iOS; there is no problem on macOS. It seems the recording we send to STT has very poor quality, and parts of sentences are missing. When I implement it on its own it works fine, but as soon as I play audio before opening the microphone it no longer works (or only works partially). Which brings me to the question: is there a solution while we wait for a working Speech Recognition API to be deployed?
Asked by dede_013.
Post not yet marked as solved
2.2k Views

Repeat After Me utility missing

Hi, I'm looking for the Repeat After Me application that used to be included in /Developer/Applications/Utilities/Speech. I can find mention of it (https://developer.apple.com/library/prerelease/content/documentation/UserExperience/Conceptual/SpeechSynthesisProgrammingGuide/FineTuning/FineTuning.html), but can't locate it with the most recent install of Xcode. Do I need to download something else? Has it been moved or removed? Thanks!
Asked by bputman.
Post not yet marked as solved
75 Views

iOS 15 - AVSpeechSynthesizerDelegate didCancel not getting called

In iOS 15, calling stopSpeaking on AVSpeechSynthesizer causes the didFinish delegate method to be called instead of didCancel, which works as expected on iOS 14 and earlier versions.
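One way to smooth over the difference (my assumption, not an Apple-recommended fix) is to remember that stopSpeaking(at:) was requested and treat the next didFinish as a cancellation. A sketch:

import AVFoundation

final class SpeechController: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var stopRequested = false

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func stop() {
        stopRequested = true
        synthesizer.stopSpeaking(at: .immediate)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if stopRequested {
            // On iOS 15 this path fires after stopSpeaking(at:) instead of didCancel.
            stopRequested = false
            handleCancelled(utterance)
        } else {
            print("finished normally")
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {
        stopRequested = false
        handleCancelled(utterance) // still fires on iOS 14 and earlier
    }

    private func handleCancelled(_ utterance: AVSpeechUtterance) {
        print("cancelled")
    }
}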
Post not yet marked as solved
123 Views

How does Apple transfer/store Siri (WebSpeechAPI) voice data?

I'm trying to find specific information on how Apple transfers and stores the voice data sent for speech recognition in Safari as part of the Web Speech API. All I keep finding are generic privacy documents that do not provide any detail. Can anyone point me in the right direction for an explanation of how customer data is used?
Asked by antonpug.
Post marked as solved
186 Views

kLSRErrorDomain Error 301

I'm getting a flood of these errors in a shipping speech recognition app since users started upgrading to iOS 15. It's usually returned by the speech recogniser a few seconds after recognition begins. I can't find any reference to it anywhere in Apple's documentation. What is it?

Code: 301
Domain: kLSRErrorDomain
Description: Recognition request was canceled
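Since the description above says the request was canceled, one option is to treat this specific error as a cancellation rather than a hard failure. A small sketch; the domain string and code are copied from the report above, and as far as I know there is no public constant for them:

import Foundation

func isLocalSpeechCancellation(_ error: Error) -> Bool {
    let nsError = error as NSError
    // Values observed above: domain "kLSRErrorDomain", code 301, "Recognition request was canceled".
    return nsError.domain == "kLSRErrorDomain" && nsError.code == 301
}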
Post not yet marked as solved
1.8k Views

Speech to text API per-app limits

Hello to all the kind developers out there. I'm currently developing an audio-messaging app for users to send short audio clips to each other. To make the communication experience as smooth and natural as possible, we are using the Speech framework to transcribe user input live. Since this feature is in high demand for some of our users, we are worried about unexpected quotas and limits.

(1) We know that individual recordings should be less than one minute.
(2) The answer here says about 1000 requests/hour per device: https://developer.apple.com/library/archive/qa/qa1951/_index.html#//apple_ref/doc/uid/DTS40017662
(3) The documentation says: "Individual devices may be limited in the number of recognitions that can be performed per day and an individual app may be throttled globally, based on the number of requests it makes per day."

We are well under the limits for (1) and (2), but there is no specific documentation for per-app limits, and being "throttled globally" sounds scary. Can anyone give us information about per-app limits, or any other kind of limit that might potentially put an end to our app? Thank you.
Asked by fkymy.
Post not yet marked as solved
225 Views

AVSpeechSynthesizer - how to run a callback on error

I use AVSpeechSynthesizer to pronounce some text in German. Sometimes it works just fine and sometimes it doesn't, for a reason unknown to me (there is no error, because the speak() method doesn't throw, and the only thing I am able to observe is the following message logged in the console):

_BeginSpeaking: couldn't begin playback

I tried to find some API in AVSpeechSynthesizerDelegate to register a callback when an error occurs, but I have found none. The closest match was this (but it appears to be available only on macOS, not iOS): https://developer.apple.com/documentation/appkit/nsspeechsynthesizerdelegate/1448407-speechsynthesizer?changes=_10

Below you can see how I initialize and use the speech synthesizer in my app:

class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    class func sharedInstance() -> Speaker {
        struct Singleton {
            static var sharedInstance = Speaker()
        }
        return Singleton.sharedInstance
    }

    let audioSession = AVAudioSession.sharedInstance()
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func initializeAudioSession() {
        do {
            try audioSession.setCategory(.playback, mode: .spokenAudio, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
        }
    }

    func speak(text: String, language: String = "de-DE") {
        guard !self.synth.isSpeaking else { return }
        let utterance = AVSpeechUtterance(string: text)
        let voice = AVSpeechSynthesisVoice.speechVoices().filter { $0.language == language }.first!
        utterance.voice = voice
        self.synth.speak(utterance)
    }
}

The audio session initialization is run just once, during app startup. Afterwards, speech is synthesized by running the following code:

Speaker.sharedInstance().speak(text: "Lederhosen")

The problem is that I have no way of knowing whether the speech synthesis succeeded: the UI shows a "speaking" state, but nothing is actually being spoken.
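Since iOS offers no error delegate here, one possible workaround (my assumption, not a documented API) is to treat the absence of the didStart callback within a short window after calling speak() as a failure. A minimal sketch; the WatchdogSpeaker name and the onError callback are illustrative:

import AVFoundation

final class WatchdogSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synth = AVSpeechSynthesizer()
    private var started = false
    var onError: (() -> Void)?   // hypothetical callback, invoked when playback apparently never began

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ text: String, language: String = "de-DE") {
        started = false
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synth.speak(utterance)
        // If didStart hasn't fired after 2 seconds, assume "_BeginSpeaking: couldn't begin playback" happened.
        DispatchQueue.main.asyncAfter(deadline: .now() + 2) { [weak self] in
            guard let self = self, !self.started else { return }
            self.onError?()
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        started = true
    }
}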
Post not yet marked as solved
210 Views

AXSpeech crash with CoreFoundation _CFGetNonObjCTypeID

Hi, I am facing a strange issue in my app: there is an intermittent crash. I am using AVSpeechSynthesizer for speech discovery; I'm not sure if that is causing the problem. The crash log has the information below.

Firebase crash log:

Crashed: AXSpeech
0  CoreFoundation          0x197325d00 _CFAssertMismatchedTypeID + 112
1  CoreFoundation          0x197229188 CFRunLoopSourceIsSignalled + 314
2  Foundation              0x198686ca0 performQueueDequeue + 440
3  Foundation              0x19868641c __NSThreadPerformPerform + 112
4  CoreFoundation          0x19722c990 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28
5  CoreFoundation          0x19722c88c __CFRunLoopDoSource0 + 208
6  CoreFoundation          0x19722bbfc __CFRunLoopDoSources0 + 376
7  CoreFoundation          0x197225b70 __CFRunLoopRun + 820
8  CoreFoundation          0x197225308 CFRunLoopRunSpecific + 600
9  Foundation              0x198514d8c -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232
10 libAXSpeechManager.dylib 0x1c3ad0bbc -[AXSpeechThread main]
11 Foundation              0x19868630c __NSThread__start__ + 864
12 libsystem_pthread.dylib  0x1e2f20bfc _pthread_start + 320
13 libsystem_pthread.dylib  0x1e2f29758 thread_start + 8

Apple crash log: Crash log
Post not yet marked as solved
184 Views

Can you perform two or more OFFLINE speech recognition tasks simultaneously?

Is this an SFSpeechRecognizer / SFSpeechURLRecognitionRequest offline limitation? I'm running on macOS Big Sur 11.5.2. I would like to perform two or more offline speech recognition tasks simultaneously. I've executed two tasks in the same application AND executed two different applications, both using offline recognition. Once I initiate the other thread or the other application, the first recognition stops. Since the computer supports multiple threads, I planned to make use of the concurrency.

Use cases:
#1 Multiple audio or video files that I wish to transcribe -- cuts down on the wait time.
#2 Split a single large file into multiple sections and stitch the results together -- again cuts down on the wait time.

I set on-device recognition to TRUE because my target files can be up to two hours in length. My test files are 15-30 minutes long and I have a number of them, so recognition must be done on the device.

func recognizeFile_Compact(url: NSURL) {
    let language = "en-US" // "en-GB"
    let recognizer = SFSpeechRecognizer(locale: Locale(identifier: language))!
    let recogRequest = SFSpeechURLRecognitionRequest(url: url as URL)

    recognizer.supportsOnDeviceRecognition = true    // ensure the DEVICE does the work -- don't send to cloud
    recognizer.defaultTaskHint = .dictation          // give a hint as dictation
    recogRequest.requiresOnDeviceRecognition = true  // don
    recogRequest.shouldReportPartialResults = false  // we don't want partial results

    var strCount = 0
    let recogTask = recognizer.recognitionTask(with: recogRequest, resultHandler: { (result, error) in
        guard let result = result else {
            print("Recognition failed, \(error!)")
            return
        }
        let text = result.bestTranscription.formattedString
        strCount += 1
        print("#\(strCount), Best: \(text)\n")
        if result.isFinal {
            print("WE ARE FINALIZED")
        }
    })
}
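For what it's worth, a sketch of a workaround under the assumption that only one on-device recognition can run at a time: queue the files and start the next recognition only when the previous task delivers its final result. The SequentialTranscriber type and its helpers are illustrative, not part of the Speech framework:

import Speech

final class SequentialTranscriber {
    private var pendingURLs: [URL]                      // files still to transcribe
    private var currentTask: SFSpeechRecognitionTask?
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

    init(urls: [URL]) { self.pendingURLs = urls }

    func start() { transcribeNext() }

    private func transcribeNext() {
        guard !pendingURLs.isEmpty else { return }
        let url = pendingURLs.removeFirst()
        let request = SFSpeechURLRecognitionRequest(url: url)
        request.requiresOnDeviceRecognition = true
        request.shouldReportPartialResults = false
        currentTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
            if let result = result, result.isFinal {
                print("Finished \(url.lastPathComponent): \(result.bestTranscription.formattedString)")
                self?.transcribeNext()                  // start the next file only after this one completes
            } else if let error = error {
                print("Recognition failed for \(url.lastPathComponent): \(error)")
                self?.transcribeNext()
            }
        }
    }
}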
Asked by MisterE.
Post not yet marked as solved
563 Views

TTS Alex voice is treated as available when it's actually not

Hello everybody! In my app I allow the user to change TTS voices, and the English Alex voice is one of the possible options. However, there are some cases when it's treated as available when it's actually not, which results in the TTS utterance being pronounced with another voice.

To prepare the list of available voices I use this code:

NSMutableArray *voices = [NSMutableArray new];
for (AVSpeechSynthesisVoice *voice in [AVSpeechSynthesisVoice speechVoices]) {
    [voices addObject:@{
        @"id": voice.identifier,
        @"name": voice.name,
        @"language": voice.language,
        @"quality": (voice.quality == AVSpeechSynthesisVoiceQualityEnhanced) ? @500 : @300
    }];
}

To start playback I use the following code (here it's simplified a bit):

AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:text];
utterance.voice = [AVSpeechSynthesisVoice voiceWithIdentifier:voice];
[AVSpeechSynthesizer speakUtterance:utterance];

Cases when AVSpeechSynthesisVoice returns Alex as available when it's not:

1. The easiest way to reproduce it is on the simulator. When I download Alex from the iOS settings, the download button disappears, but when I tap the voice nothing happens. As a result it seems to be downloaded, yet it can't be deleted.
2. In some cases Alex is downloaded correctly and is actually available in the app, but when I try to delete it, it looks like it's not fully deleted. As a result it's treated as available in my app, but in iOS settings it's shown as not downloaded.
3. If the iPhone storage is as close to full as possible and the Alex voice hasn't been used recently, it looks like it gets offloaded, so it's shown as available both in iOS settings and in my app, but in fact the utterance is pronounced by another voice.

For all the cases above, Alex looks available in my app, but when I pass it to the utterance it's pronounced with a different voice. Note that this happens only with this voice; I haven't seen such a case for others. Maybe this voice should be treated separately somehow?
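A defensive check that may help with some of these cases (an assumption on my part, and it likely won't catch the offloaded-voice case): resolve the identifier again right before speaking and fall back when the lookup fails. In Swift for brevity; AVSpeechSynthesisVoiceIdentifierAlex is the framework constant for the Alex voice:

import AVFoundation

// Returns a voice only if the system can resolve the identifier right now,
// otherwise falls back to the default voice for the given language.
func resolvedVoice(identifier: String, fallbackLanguage: String = "en-US") -> AVSpeechSynthesisVoice? {
    if let voice = AVSpeechSynthesisVoice(identifier: identifier) {
        return voice
    }
    return AVSpeechSynthesisVoice(language: fallbackLanguage)
}

let utterance = AVSpeechUtterance(string: "Hello from Alex, hopefully.")
utterance.voice = resolvedVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)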
Asked by grindos.
Post not yet marked as solved
404 Views

AVSpeechSynthesizer buffer conversion, write format bug?

Is the format description AVSpeechSynthesizer reports for the speech buffer correct? When I attempt to convert it, I get back noise from two different conversion methods. I am seeking to convert the speech buffer provided by the AVSpeechSynthesizer "func write(_ utterance: AVSpeechUtterance..." method. The goal is to convert the sample type, change the sample rate, and change from a mono to a stereo buffer. I later manipulate the buffer data and pass it through AVAudioEngine. For testing purposes, I have kept the sample rate at the original 22050.0.

What have I tried? I have a method that I've been using for years named "resampleBuffer" that does this. When I apply it to the speech buffer, I get back noise. When I attempt to manually convert the format and to stereo with "convertSpeechBufferToFloatStereo", I get back clipped output. I tested flipping the samples to address the big-endian signed integers, but that didn't work.

The speech buffer description is:

inBuffer description: <AVAudioFormat 0x6000012862b0: 1 ch, 22050 Hz, 'lpcm' (0x0000000E) 32-bit big-endian signed integer>

import Cocoa
import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
    }

    func resampleBuffer(inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer? {
        // resample and convert mono to stereo
        var error: NSError?
        let kChannelStereo = AVAudioChannelCount(2)
        let convertRate = newSampleRate / inSource.format.sampleRate
        let outFrameCount = AVAudioFrameCount(Double(inSource.frameLength) * convertRate)
        let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: kChannelStereo)!
        let avConverter = AVAudioConverter(from: inSource.format, to: outFormat)
        let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: outFrameCount)!
        let inputBlock: AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
            outStatus.pointee = AVAudioConverterInputStatus.haveData // very important, must have
            let audioBuffer: AVAudioBuffer = inSource
            return audioBuffer
        }
        avConverter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Mastering
        avConverter?.sampleRateConverterQuality = .max
        if let converter = avConverter {
            let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
            // print("\(status): \(status.rawValue)")
            if (status != .haveData) || (error != nil) {
                print("\(status): \(status.rawValue), error: \(String(describing: error))")
                return nil // conversion error
            }
        } else {
            return nil // converter not created
        }
        // print("success!")
        return outBuffer
    }

    func writeToFile(_ stringToSpeak: String, speaker: String) {
        var output: AVAudioFile?
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        let desktop = "~/Desktop"
        let fileName = "Utterance_Test.caf" // not in sandbox
        var tempPath = desktop + "/" + fileName
        tempPath = (tempPath as NSString).expandingTildeInPath

        let usingSampleRate = 22050.0 // 44100.0
        let outSettings = [
            AVFormatIDKey: kAudioFormatLinearPCM, // kAudioFormatAppleLossless
            AVSampleRateKey: usingSampleRate,
            AVNumberOfChannelsKey: 2,
            AVEncoderAudioQualityKey: AVAudioQuality.max.rawValue
        ] as [String: Any]

        // temporarily ignore the speaker and use the default voice
        let curLangCode = AVSpeechSynthesisVoice.currentLanguageCode()
        utterance.voice = AVSpeechSynthesisVoice(language: curLangCode)
        // utterance.volume = 1.0

        print("Int32.max: \(Int32.max), Int32.min: \(Int32.min)")

        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                // done
            } else {
                // append buffer to file
                var outBuffer: AVAudioPCMBuffer
                outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
                // outBuffer = self.convertSpeechBufferToFloatStereo(pcmBuffer) // doesn't work
                // outBuffer = pcmBuffer // original format does work

                if output == nil {
                    // var bufferSettings = utterance.voice?.audioFileSettings
                    // Audio files cannot be non-interleaved.
                    var outSettings = outBuffer.format.settings
                    outSettings["AVLinearPCMIsNonInterleaved"] = false

                    let inFormat = pcmBuffer.format
                    print("inBuffer description: \(inFormat.description)")
                    print("inBuffer settings: \(inFormat.settings)")
                    print("inBuffer format: \(inFormat.formatDescription)")
                    print("outBuffer settings: \(outSettings)\n")
                    print("outBuffer format: \(outBuffer.format.formatDescription)")

                    output = try! AVAudioFile(forWriting: URL(fileURLWithPath: tempPath), settings: outSettings)
                }
                try! output?.write(from: outBuffer)
                print("done")
            }
        }
    }
}

class ViewController: NSViewController {
    let speechDelivery = SpeakerTest()

    override func viewDidLoad() {
        super.viewDidLoad()

        let targetSpeaker = "Allison"
        var sentenceToSpeak = ""
        for indx in 1...10 {
            sentenceToSpeak += "This is sentence number \(indx). [[slnc 3000]] \n"
        }
        speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
    }
}

Three tests can be performed; the only one that works is to directly write the buffer to disk. Is this really "32-bit big-endian signed integer"? Am I addressing this correctly, or is this a bug? I'm on macOS 11.4.
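One workaround sketch, based on the observation above that writing the buffers in their original format does work: write the CAF in the synthesizer's native format first, then reopen it with AVAudioFile and read through its processingFormat (deinterleaved Float32), letting AVAudioFile perform the sample-type conversion; resampling and upmixing to stereo can then be done on a well-behaved float buffer. The function name is illustrative:

import AVFoundation

// Reads a finished speech file back as deinterleaved Float32 (AVAudioFile's processing format),
// sidestepping the direct int32 big-endian to float conversion. Illustrative sketch only;
// assumes the file is short enough for a single in-memory buffer.
func loadSpeechFileAsFloat(url: URL) throws -> AVAudioPCMBuffer? {
    let file = try AVAudioFile(forReading: url)      // processingFormat is standard float, non-interleaved
    let format = file.processingFormat
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: AVAudioFrameCount(file.length)) else { return nil }
    try file.read(into: buffer)                       // AVAudioFile performs the format conversion here
    return buffer                                     // resample / upmix to stereo from this float buffer
}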
Asked by MisterE.