Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

51 Posts
Post not yet marked as solved
0 Replies
57 Views
When we used SFSpeechRecognizer last week, the returned results were normal. However, this week we found that the results contain punctuation marks. For example, we say "yes" and the result comes back as "yes?".
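For anyone hitting the same behavior: on iOS 16 the recognizer adds punctuation by default, and SFSpeechRecognitionRequest exposes an addsPunctuation flag. A minimal sketch of turning it off, assuming iOS 16 or later:

```swift
import Speech

// On iOS 16+, automatic punctuation can be disabled per request.
let request = SFSpeechAudioBufferRecognitionRequest()
if #available(iOS 16, *) {
    request.addsPunctuation = false  // keep raw words; no "?" or "." insertion
}
```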
Posted
by
Post not yet marked as solved
0 Replies
99 Views
Hi, I have a question regarding the integration of the speech-to-text class SFSpeechRecognizer. I need SFSpeechRecognizer to recognize terms that are not present in the iOS dictionary, like medication names, chemistry terms, etc. I would have to add them, somehow, for SFSpeechRecognizer to be able to recognize them. Is this possible? Thanks
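One mechanism that may help here (a sketch, not a guarantee of recognition quality): SFSpeechRecognitionRequest.contextualStrings lets you bias the recognizer toward out-of-vocabulary phrases such as drug or chemistry names. The example terms below are illustrative:

```swift
import Speech

let request = SFSpeechAudioBufferRecognitionRequest()
// Phrases the recognizer should favor; useful for domain terms
// that are not in the system vocabulary.
request.contextualStrings = ["metoprolol", "acetylcysteine", "benzaldehyde"]
```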
Posted
by
Post not yet marked as solved
0 Replies
80 Views
Hi, I am trying to use the speech recognizer from Apple's official documentation in my application. I also wrapped the SFSpeechRecognizer call in a do/try/catch, but if the user triggers Siri at runtime, the whole application crashes the next time SFSpeechRecognizer is called. Has anyone encountered a similar problem? Here's the code from my application:

func transcribe() {
    DispatchQueue(label: "Speech Recognizer Queue", qos: .background).async { [weak self] in
        guard let self = self, let recognizer = self.recognizer, recognizer.isAvailable else {
            self?.speakError(RecognizerError.recognizerIsUnavailable)
            return
        }
        do {
            let (audioEngine, request) = try Self.prepareEngine()
            self.audioEngine = audioEngine
            self.request = request
            self.task = recognizer.recognitionTask(with: request, resultHandler: self.recognitionHandler(result:error:))
        } catch {
            self.reset()
            self.speakError(error)
        }
    }
}

private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
    let audioEngine = AVAudioEngine()
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        request.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    return (audioEngine, request)
}
Posted
by
Post not yet marked as solved
0 Replies
205 Views
I'm testing my app in the Xcode 14 beta (released at WWDC22) on iOS 16, and AVSpeechSynthesisVoice does not seem to be working correctly. The following call always returns an empty array:

AVSpeechSynthesisVoice.speechVoices()

Additionally, attempting to initialize AVSpeechSynthesisVoice returns nil for all of the following:

AVSpeechSynthesisVoice(language: AVSpeechSynthesisVoice.currentLanguageCode())
AVSpeechSynthesisVoice(language: "en")
AVSpeechSynthesisVoice(language: "en-US")
AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
AVSpeechSynthesisVoice.speechVoices().first
Posted
by
Post not yet marked as solved
2 Replies
242 Views
Hello to all devs. I'm currently finishing teaching myself Swift, and I've been thinking that once I'm done I'd like to get a 14" MacBook Pro to start working with, but I don't know which configuration would be most advisable: an M1 Pro with a 10-core CPU, 16-core GPU, 16-core Neural Engine, 16 GB of RAM, and a 512 GB SSD — or would you recommend increasing the RAM to 32 GB or the SSD to 1 TB? I'd like to make an investment that lasts me at least five years, so that I don't have to buy another machine because the configuration fell short after a few years. What do you recommend? Thanks for your answers and your time. Best regards.
Posted
by
Post not yet marked as solved
0 Replies
152 Views
Problem: AVSpeechSynthesizer sometimes describes words rather than just speaking them as a real person would. When speaking English, AVSpeechSynthesizer pronounces the word "A" on its own as "Capital A", while the phrase "A little test" is pronounced correctly. A workaround of lowercasing the speech string — so "A" becomes "a" — fixes this specific example. (I'm not yet sure whether lowercasing sentences could hurt pronunciation in other cases.) A more serious example: when speaking French, the word "allé" on its own is pronounced by AVSpeechSynthesizer as "allé — e accent aigu" (accent aigu = acute accent). And here the problem exists even when the word is part of a sentence! With "Je suis allé au cinéma" (I went to the cinema), AVSpeechSynthesizer says "Je suis allé e accent aigu au cinéma", which is clearly wrong and unhelpful. Is there a way to fix this?
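A possible workaround (a sketch, under the assumption that an explicit pronunciation sidesteps the letter-by-letter reading): AVSpeechUtterance can be built from an attributed string whose AVSpeechSynthesisIPANotationAttribute supplies the IPA pronunciation for a range. The IPA string below is illustrative:

```swift
import AVFoundation

// Attach an IPA pronunciation to "allé" so the synthesizer does not
// spell out the accent. "a.le" is an approximate, illustrative IPA value.
let text = "Je suis allé au cinéma"
let attributed = NSMutableAttributedString(string: text)
let range = (text as NSString).range(of: "allé")
attributed.addAttribute(
    NSAttributedString.Key(AVSpeechSynthesisIPANotationAttribute),
    value: "a.le",
    range: range
)
let utterance = AVSpeechUtterance(attributedString: attributed)
utterance.voice = AVSpeechSynthesisVoice(language: "fr-FR")
```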
Posted
by
Post not yet marked as solved
1 Reply
165 Views
@interface MineViewController ()
@property (nonatomic, strong) AVSpeechSynthesizer *speechSynthesizer;
@end

@implementation MineViewController

- (void)speak {
    // version 1
    self.speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    self.speechSynthesizer.delegate = self;
    AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:@"12345678"];
    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
    [utterance setVoice:voice];
    // (works: speakUtterance succeeds)
    [self.speechSynthesizer speakUtterance:utterance];

    // version 2 (second utterance/voice renamed so the method compiles)
    AVSpeechSynthesizer *speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    speechSynthesizer.delegate = self;
    AVSpeechUtterance *utterance2 = [[AVSpeechUtterance alloc] initWithString:@"12345678"];
    AVSpeechSynthesisVoice *voice2 = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
    [utterance2 setVoice:voice2];
    // (does not work: speakUtterance produces no response)
    [speechSynthesizer speakUtterance:utterance2];
}

@end
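A plausible explanation, not confirmed in the thread: in version 2 the synthesizer is a local variable, so ARC can release it when the method returns, before any audio has played; version 1 works because the property keeps a strong reference. A minimal Swift sketch of the working pattern:

```swift
import AVFoundation

final class Speaker {
    // Hold the synthesizer in a property so it outlives the call site;
    // a synthesizer deallocated mid-utterance never speaks.
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")
        synthesizer.speak(utterance)
    }
}
```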
Posted
by
Post not yet marked as solved
0 Replies
158 Views
I installed the last update and now my only phone is useless! I can't even make a phone call. No apps will open and I can't even restart it. I tried the suggestions and nothing worked. If you don't update, you are constantly reminded, which I try to ignore — my phone was working fine and now it is useless!
Posted
by
Post not yet marked as solved
0 Replies
184 Views
I have updated to macOS Monterey and my code for SFSpeechRecognizer just broke. I get this error if I try to configure an offline speech recognizer on macOS:

Error Domain=kLSRErrorDomain Code=102 "Failed to access assets" UserInfo={NSLocalizedDescription=Failed to access assets, NSUnderlyingError=0x6000003c5710 {Error Domain=kLSRErrorDomain Code=102 "No asset installed for language=es-ES" UserInfo={NSLocalizedDescription=No asset installed for language=es-ES}}}

Here is a code snippet from a demo project:

private func process(url: URL) throws {
    speech = SFSpeechRecognizer(locale: Locale(identifier: "es-ES"))
    speech.supportsOnDeviceRecognition = true
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = true
    request.shouldReportPartialResults = false
    speech.recognitionTask(with: request) { result, error in
        guard let result = result else {
            if let error = error {
                print(error)
            }
            return
        }
        if result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}

I have tried different languages (es-ES, en-US) and it reports the same error each time. Any idea how to install these assets or how to fix this?
Posted
by
Post not yet marked as solved
0 Replies
204 Views
Hello. My application records speech and converts it to text. The application also tells the user what action to perform using TTS (text-to-speech). When I start screen recording from Control Center, the app starts recording voice, and this works. But as soon as the TTS voice is played, the recorder stops capturing both my voice and the TTS audio. Please let me know what additional information is required from my side to debug this issue.
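One configuration worth checking (an assumption about the setup, since the post doesn't show code): recording and TTS playback can coexist only if the shared AVAudioSession uses a category that allows simultaneous input and output. A minimal sketch:

```swift
import AVFoundation

// Configure the shared session so recording and TTS playback can run together.
// .playAndRecord permits simultaneous input and output; .duckOthers lowers
// other audio instead of interrupting it.
func configureSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .default,
                            options: [.duckOthers, .defaultToSpeaker])
    try session.setActive(true)
}
```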
Posted
by
Post not yet marked as solved
0 Replies
264 Views
Hi, I'm trying to get this example working on macOS now that SFSpeechRecognizer is available for the platform. A few questions: Do I need to make an authorization request of the user if I intend to use on-device recognition? When I ask for authorization to use speech recognition, the dialog that pops up contains text that's not in my speech recognition usage description, indicating that recordings will be sent to Apple's servers. But that is not accurate if I am using on-device recognition (as far as I can tell). Is there a way to suppress that language if I am not using online speech recognition? Is there an updated version of the article I linked to that describes how to accomplish the same thing on macOS instead of iOS? My compiler is complaining that AVAudioSession is not available on macOS, and I'm not sure how to set things up for passing audio from the microphone to the speech recognizer. Thanks :-D Brian Duffy
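On macOS there is no AVAudioSession; audio can be fed to the recognizer straight from AVAudioEngine's input node. A sketch under that assumption (the function name and print handling are illustrative):

```swift
import AVFoundation
import Speech

// macOS: no AVAudioSession needed — tap the engine's input node directly
// and append buffers to the recognition request.
func startRecognition(recognizer: SFSpeechRecognizer) throws -> SFSpeechRecognitionTask {
    let engine = AVAudioEngine()
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true  // keep audio off Apple's servers

    let input = engine.inputNode
    let format = input.outputFormat(forBus: 0)
    input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        request.append(buffer)
    }
    engine.prepare()
    try engine.start()
    return recognizer.recognitionTask(with: request) { result, _ in
        if let result = result {
            print(result.bestTranscription.formattedString)
        }
    }
}
```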
Posted
by
Post not yet marked as solved
0 Replies
221 Views
We are creating an online book-reading app in which we initiate a group video call (using the Agora SDK). When the call is joined, we start book reading, highlight words on the other members' screens, and use SFSpeechRecognizer for recording/recognizing text. But whenever CallKit and the video call start, SFSpeechRecognizer's audio recording always fails on the other members' end. Can you please suggest a way to record audio during the video call?

//
// Speech.swift
// Edsoma
//
// Created by Kapil on 16/02/22.
//

import Foundation
import AVFoundation
import Speech

protocol SpeechRecognizerDelegate {
    func didSpoke(speechRecognizer: SpeechRecognizer, word: String?)
}

class SpeechRecognizer: NSObject {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")) // 1
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()
    var delegate: SpeechRecognizerDelegate?

    static let shared = SpeechRecognizer()
    var isOn = false

    func setup() {
        speechRecognizer?.delegate = self // 3
        SFSpeechRecognizer.requestAuthorization { (authStatus) in // 4
            var isButtonEnabled = false
            switch authStatus { // 5
            case .authorized:
                isButtonEnabled = true
            case .denied:
                isButtonEnabled = false
                print("User denied access to speech recognition")
            case .restricted:
                isButtonEnabled = false
                print("Speech recognition restricted on this device")
            case .notDetermined:
                isButtonEnabled = false
                print("Speech recognition not yet authorized")
            @unknown default:
                break
            }
            OperationQueue.main.addOperation {
                // self.microphoneButton.isEnabled = isButtonEnabled
            }
        }
    }

    func transcribeAudio(url: URL) {
        // create a new recognizer and point it at our audio
        let recognizer = SFSpeechRecognizer()
        let request = SFSpeechURLRecognitionRequest(url: url)
        // start recognition!
        recognizer?.recognitionTask(with: request) { [unowned self] (result, error) in
            // abort if we didn't get any transcription back
            guard let result = result else {
                print("There was an error: \(error!)")
                return
            }
            // if we got the final transcription back, print it
            if result.isFinal {
                // pull out the best transcription...
                print(result.bestTranscription.formattedString)
            }
        }
    }

    func startRecording() {
        isOn = true
        let inputNode = audioEngine.inputNode
        if recognitionTask != nil {
            inputNode.removeTap(onBus: 0)
            self.audioEngine.stop()
            self.recognitionRequest = nil
            self.recognitionTask = nil
            DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 1) {
                self.startRecording()
            }
            return
        }

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSession.Category.multiRoute)
            try audioSession.setMode(AVAudioSession.Mode.measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequest.shouldReportPartialResults = true
        recognitionRequest.taskHint = .search

        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
            if let result = result {
                self.delegate?.didSpoke(speechRecognizer: self, word: result.bestTranscription.formattedString)
                debugPrint(result.bestTranscription.formattedString)
            }
            if let error = error {
                debugPrint("Speech Error ====>", error)
                inputNode.removeTap(onBus: 0)
                self.audioEngine.stop()
                self.recognitionRequest = nil
                self.recognitionTask = nil
                if BookReadingSettings.isSTTEnable {
                    DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 1) {
                        self.startRecording()
                    }
                }
                // self.microphoneButton.isEnabled = true
            }
        })

        inputNode.removeTap(onBus: 0)
        let sampleRate = AVAudioSession.sharedInstance().sampleRate
        let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
        debugPrint("Say something, I'm listening!")
    }

    func stopRecording() {
        isOn = false
        debugPrint("Recording stopped")
        let inputNode = audioEngine.inputNode
        inputNode.removeTap(onBus: 0)
        self.audioEngine.stop()
        recognitionTask?.cancel()
        self.recognitionRequest = nil
        self.recognitionTask = nil
    }
}

extension SpeechRecognizer: SFSpeechRecognizerDelegate {
}
Posted
by
Post not yet marked as solved
0 Replies
307 Views
I'm building a playground book for the upcoming Swift Student Challenge and am confused by the rules about fetching data from the internet in a playground book. I have implemented the Speech framework in one chapter and have set it up so that recognition is done offline. The problem is that this is only possible on iPads with an A12 Bionic chip or newer, so older iPads, and even my Mac, show an error when Wi-Fi is turned off. I'm concerned about whether my submission will be accepted in such a case.
Posted
by
Post not yet marked as solved
0 Replies
306 Views
I'm building a playground book for the upcoming Swift Student Challenge and am confused by the rules against using the internet in a playground book. I have implemented the Speech framework in one chapter and am forcing recognition to be done offline. The problem is that this is only possible on iPads with an A12 Bionic chip or newer, so older iPads, and even my Mac, show an error when Wi-Fi is turned off. I'm concerned about whether my submission will be accepted in such a case.
Posted
by
Post not yet marked as solved
5 Replies
311 Views
It seems that voices with the same identifier behave differently on different OS versions and devices. How can I distinguish voices across OSes and devices? Is it safe to use a combination of voice identifier and OS version? Or is there a voice version code, or something better, for distinguishing voices that share an identifier?
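Lacking a documented voice version field (as far as I know), one pragmatic approach is to log what each device actually reports — identifier, language, and quality for every installed voice — and compare those tuples across OS versions. A sketch:

```swift
import AVFoundation

// Dump the installed voices so the same identifier can be compared
// across devices and OS versions.
for voice in AVSpeechSynthesisVoice.speechVoices() {
    print(voice.identifier, voice.language, voice.quality.rawValue)
}
```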
Posted
by
Post not yet marked as solved
0 Replies
352 Views
Apple added support for WebKit speech recognition in Safari 14.1. We're trying to use it in our web app and are facing some issues: the mic never stops after the user stops speaking, and we never get the recognized text on iPhone and iPad. Here is a simple web app to test: https://oiyw7.csb.app/
Posted
by
Post not yet marked as solved
0 Replies
307 Views
I'm building a game where the player can speak commands, so I want to enable speech-to-text capability. I've set up the required Info.plist property (for speech recognition privacy) as well as the App Sandbox hardware setting (for audio input). I've confirmed that the application is listening via the audio tap and sending audio buffers to the recognition request. However, the recognition task never executes. NOTE: This is for macOS, NOT iOS. It also works in a Playground, but in an actual application the recognition task isn't called. Specs: macOS 12.1, Xcode 13.2.1 (13C100), Swift 5.5.2. Here is the code I've placed in the AppDelegate of a freshly built SpriteKit application:

//
// AppDelegate.swift
//

import Cocoa
import AVFoundation
import Speech

@main
class AppDelegate: NSObject, NSApplicationDelegate {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        SFSpeechRecognizer.requestAuthorization(requestMicrophoneAccess)
    }

    func applicationWillTerminate(_ aNotification: Notification) {
        // Insert code here to tear down your application
    }

    func applicationShouldTerminateAfterLastWindowClosed(_ sender: NSApplication) -> Bool {
        return true
    }

    fileprivate func requestMicrophoneAccess(authStatus: SFSpeechRecognizerAuthorizationStatus) {
        OperationQueue.main.addOperation {
            switch authStatus {
            case .authorized:
                self.speechRecognizer.supportsOnDeviceRecognition = true
                if let speechRecognizer = SFSpeechRecognizer() {
                    if speechRecognizer.isAvailable {
                        do {
                            try self.startListening()
                        } catch {
                            print(">>> ERROR >>> Listening Error: \(error)")
                        }
                    }
                }
            case .denied:
                print("Denied")
            case .restricted:
                print("Restricted")
            case .notDetermined:
                print("Undetermined")
            default:
                print("Unknown")
            }
        }
    }

    func startListening() throws {
        // Cancel the previous task if it's running.
        recognitionTask?.cancel()
        recognitionTask = nil

        let inputNode = audioEngine.inputNode

        // Configure the microphone input.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            // Confirmed that the following line is executing continuously
            self.recognitionRequest?.append(buffer)
        }
        startRecognizing()
        audioEngine.prepare()
        try audioEngine.start()
    }

    func startRecognizing() {
        // Create a recognition task for the speech recognition session.
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequestInternal = recognitionRequest else {
            fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequestInternal.shouldReportPartialResults = true
        recognitionRequestInternal.requiresOnDeviceRecognition = true

        // Confirmed that the following line is executed,
        // however the closure given to 'recognitionTask' is never called
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequestInternal) { result, error in
            var isFinal = false

            if let result = result {
                let firstTranscriptionTimestamp = result.transcriptions.first?.segments.first?.timestamp ?? TimeInterval.zero
                isFinal = result.isFinal || (firstTranscriptionTimestamp != 0)
            }

            if let error = error {
                // Stop recognizing speech if there is a problem.
                print("\n>>> ERROR >>> Recognition Error: \(error)")
                self.audioEngine.stop()
                self.audioEngine.inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            } else if isFinal {
                self.recognitionTask = nil
            }
        }
    }
}
Posted
by
Post not yet marked as solved
0 Replies
209 Views
Hi there! A number of my apps are hanging on Big Sur (11.6.2). A spindump of two of them shows them waiting for NSSpeechSynthesizer to return from CountVoices. I looked in /System/Library/Speech/Voices and it was empty, so I went to System Preferences to see what I could see. Trying to look up voices in Accessibility caused it to hang, as did trying to enter the Siri control panel. So clearly I have a problem with speech synthesis. Why this should hang Chrome, bleep only knows. I just restored the system, so I'm not keen on doing it again knowing it won't work. What I want to know is: Can I install just the speech-synthesis part of macOS? Where would I find it, and how? Is it a kernel extension? I think when I first installed this OS I skipped Siri, something I never planned to use. Is this what caused the problem? Thanks!
Posted
by