Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

51 Posts
Post not yet marked as solved
0 Replies
322 Views
After upgrading to iOS 15, I discovered that calling stopSpeaking(at:) on AVSpeechSynthesizer did not trigger speechSynthesizer(_:didCancel:) as expected, but instead triggered speechSynthesizer(_:didFinish:), which broke some of my business logic. Solved.
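A minimal delegate sketch (an illustration, not the poster's code) that logs both callbacks, which makes the reported iOS 15 behavior change easy to observe after a call to stopSpeaking(at:):

```swift
import AVFoundation

final class SpeechDelegate: NSObject, AVSpeechSynthesizerDelegate {
    // Fired when an utterance plays all the way to completion.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        print("didFinish — utterance completed")
    }

    // Expected after stopSpeaking(at:); the post above reports that on iOS 15
    // didFinish is delivered instead.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        print("didCancel — utterance was stopped before completion")
    }
}

let synthesizer = AVSpeechSynthesizer()
let delegate = SpeechDelegate()   // keep a strong reference; the delegate property is weak
synthesizer.delegate = delegate
synthesizer.speak(AVSpeechUtterance(string: "Hello"))
synthesizer.stopSpeaking(at: .immediate)
```

Business logic that depends on the finish/cancel distinction may need to handle either callback defensively across OS versions.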
Posted
by
Post not yet marked as solved
1 Reply
370 Views
Hello, I am looking for information on TTS and STT. I am aware that it is possible to implement both offline and online. I am interested in knowing whether on-device TTS and STT can be enabled for a third-party app even when the device is online. Our use case is: while the app is online, we wish to do TTS and STT on device rather than on Apple's servers (privacy concerns). Please let me know if this is possible at all, or point me in the right direction. I really appreciate it and look forward to your reply.
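For the STT side, a short sketch of forcing on-device recognition even while the device is online (assuming a device and locale that support it; this is an illustration, not a guarantee of availability):

```swift
import Speech

// Sketch: request on-device speech recognition so audio never leaves the device.
// On-device support depends on hardware and on the locale's downloaded model.
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

if let recognizer = recognizer, recognizer.supportsOnDeviceRecognition {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true  // refuse server-side processing
    // ... feed audio buffers and start a recognition task as usual ...
} else {
    // Surface an error if server-side recognition is unacceptable for privacy.
    print("On-device recognition not available for this locale/device")
}
```

For TTS, AVSpeechSynthesizer synthesizes locally with the voices installed on the device, so it does not have an equivalent flag.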
Posted
by
Post not yet marked as solved
2 Replies
445 Views
Hello everyone, let me explain my problem. For macOS, iOS, and iPadOS, I need to make a 3D virtual assistant (a woman) able to listen to questions and provide answers: a kind of Siri, but with a 3D model of a girl on the screen that has to mimic speech (lip sync) and move according to the needs of the program. I would like to do it all with Xcode, SwiftUI, and SceneKit. I have already done some good experiments with the Speech framework for speech recognition, with good results. For the spoken part (TTS) I will use an external service.

Here I have a problem: the Speech framework listens and transcribes even while my app is speaking. I would like to be able to mute the microphone when the audio file is playing and unmute it when playback ends.

I also tried to create a 3D female model with Mixamo and exported some animations. I was able to import the animations into an Xcode project and get them to work (https://youtu.be/HJtbUHdPjzQ). Next I want to try to create a model using 3D Object Capture. I also saw video session 604 (https://developer.apple.com/videos/play/wwdc2017/604/), which clarified many doubts for me.

What I still haven't understood:
How can I blend multiple animations from code? For example, I could have an animation of the girl walking and an animation of the girl standing still and waving, and I would like to be able to join them so she waves while she walks.
If the character is completely rigged, how can I make the mouth, eyes, etc. move from code to create a kind of lip sync and facial expressions?
Do you know of any good tutorial, even a paid one, that can fill these gaps? I've searched Udemy but haven't found a SceneKit course similar to what I need. However, solutions for ARKit or RealityKit would also be fine.
Posted
by
Post marked as solved
1 Reply
301 Views
So sorry if I shouldn't be asking this here, but I'm trying to find a current-ish tutorial on how to make an app that converts speech to text in real time, transcribing speech to text as you're speaking. I've found a few on YouTube, but they are quite old, or only transcribe from a recorded file, etc. If anyone is aware of a good tutorial, paid or not, I would so appreciate a link. Thank you
Posted
by
Post marked as solved
2 Replies
661 Views
So I am watching a Speech To Text demo on YouTube, here: https://www.youtube.com/watch?v=SZJ8zjMGUcY There are no files, so I am typing from the screen, and immediately run into an error that confuses me, at: class ViewController : UIViewController, SFSpeechRecognizer { (here's a screenshot). Swift gives me an error indicating that multiple inheritance is not allowed. The programmer doesn't have files to download, and I like to start from scratch anyway, typing and copying so I am forced to read each line. Is there something I have to change in the project itself that allows multiple inheritance? This video is from last year, and is on Swift 5.0, so I don't think there could have been that much of a major change in Swift in that time. Thanks
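For context on this error: SFSpeechRecognizer is a class, so listing it after UIViewController reads to the compiler as inheriting from two classes, which Swift forbids. The tutorial almost certainly intended conformance to the SFSpeechRecognizerDelegate protocol. A sketch of the likely intended shape:

```swift
import UIKit
import Speech

// Conform to the delegate *protocol*, not the SFSpeechRecognizer *class*.
class ViewController: UIViewController, SFSpeechRecognizerDelegate {
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

    override func viewDidLoad() {
        super.viewDidLoad()
        speechRecognizer?.delegate = self
    }

    // SFSpeechRecognizerDelegate callback
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        print("Recognizer available: \(available)")
    }
}
```

Nothing in the project settings enables multiple class inheritance; the fix is in the class declaration itself.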
Posted
by
Post not yet marked as solved
2 Replies
1.1k Views
Hi, I've been working on a project that utilizes the Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API. However, I've noticed some strange behavior in the newest versions of Safari on iOS, iPadOS, and macOS. One issue that occurs regularly is that the text input will repeat after voice input has ended. This can be seen on this demo provided by Google: https://www.google.com/intl/en/chrome/demos/speech.html This was not happening when I tested on 14.1 (the version I upgraded from). Upon debugging, it appears the doubling of text is included in transcriptions that are not flagged as isFinal, as well as transcriptions that are, which makes me think that something isn't working properly in the implementation of the API. Anecdotally, the speech synthesis appears to be much less accurate now as well, and I've noticed some odd behavior when I set the continuous flag to false as well. The API delegates the actual speech synthesis work to Siri, so I'm wondering why there would be a different here compared to using dictation in other apps. My main question is: has anyone else run into problems like this? If so, how are you working around them?
Posted
by
Post marked as solved
1 Reply
516 Views
I'm developing a game that will use speech recognition to execute various commands. I am using code from Apple's Recognizing Speech in Live Audio documentation page. When I run this in a Swift Playground, it works just fine. However, when I make a SpriteKit game application (basic setup from Xcode's "New Project" menu option), I get the following error:

required condition is false: IsFormatSampleRateAndChannelCountValid(hwFormat)

Upon further research, it appears that my input node has no channels. The following is the relevant portion of my code, along with debug output:

```swift
let inputNode = audioEngine.inputNode
print("Number of inputs: \(inputNode.numberOfInputs)")
// 1
print("Input Format: \(inputNode.inputFormat(forBus: 0))")
// <AVAudioFormat 0x600001bcf200: 0 ch, 0 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved>
let channelCount = inputNode.inputFormat(forBus: 0).channelCount
print("Channel Count: \(channelCount)")
// 0 <== Agrees with the inputFormat output listed previously

// Configure the microphone input.
print("Number of outputs: \(inputNode.numberOfOutputs)")
// 1
let recordingFormat = inputNode.outputFormat(forBus: 0)
print("Output Format: \(recordingFormat)")
// <AVAudioFormat 0x600001bf3160: 2 ch, 44100 Hz, Float32, non-inter>

// NOTE: 'audioTap' is a function defined in this class, used instead of an inline closure.
inputNode.installTap(onBus: 0, bufferSize: 256, format: recordingFormat, block: audioTap)
// <== This is where the error occurs.
```

The code snippet is included in the game's AppDelegate class (which includes import statements for Cocoa, AVFoundation, and Speech), and executes during its applicationDidFinishLaunching function. I'm having trouble understanding why the Playground works but a game app doesn't. Do I need to do something specific to get the application to recognize the microphone?

NOTE: This is for macOS, NOT iOS. While the "How To" documentation cited earlier indicates iOS, Apple stated at WWDC19 that it is now supported on macOS.
NOTE: I have included the NSSpeechRecognitionUsageDescription key in the application's plist, and successfully acknowledged the authorization request for the microphone.
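One way to diagnose this (assuming the 0 ch / 0 Hz input format means the sandboxed app has no usable input device, rather than a framework bug) is to validate the hardware format before installing the tap. On macOS this situation typically points at the App Sandbox "Audio Input" capability (the com.apple.security.device.audio-input entitlement) or a denied microphone prompt:

```swift
import AVFoundation

// Sketch: check the hardware input format before installing a tap, so the
// IsFormatSampleRateAndChannelCountValid assertion never fires.
let audioEngine = AVAudioEngine()
let inputNode = audioEngine.inputNode
let hwFormat = inputNode.inputFormat(forBus: 0)

if hwFormat.channelCount == 0 || hwFormat.sampleRate == 0 {
    // No valid input device: check the App Sandbox Audio Input capability
    // and the microphone privacy permission for this app.
    print("No usable audio input — check entitlements and mic permission")
} else {
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        // forward buffers to the recognition request here
    }
}
```

A Playground runs outside the app's sandbox, which would explain why the same code works there but not in the packaged game.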
Posted
by
Post not yet marked as solved
0 Replies
458 Views
I am trying to run two SFSpeechRecognizer instances simultaneously with different languages. So I tried the following:

```swift
var speechRecognizer1: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "en-GB"))
var speechRecognizer2: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "it-IT"))
var speechAudioBufferRecognitionRequest = SFSpeechAudioBufferRecognitionRequest()
var speechRecognitionTask1: SFSpeechRecognitionTask!
var speechRecognitionTask2: SFSpeechRecognitionTask!

// ...

let node = self.audioEngine.inputNode
let recordingFormat = node.outputFormat(forBus: 0)
node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, time) in
    self.speechAudioBufferRecognitionRequest.append(buffer)
}
self.audioEngine.prepare()
do {
    try audioEngine.start()
} catch {
    print("Error")
    return
}
guard let myRecognition = SFSpeechRecognizer() else {
    print("Error")
    return
}
if !myRecognition.isAvailable {
    print("Error")
    return
}
self.speechRecognitionTask1 = self.speechRecognizer1?.recognitionTask(with: self.speechAudioBufferRecognitionRequest) { (response, error) in
    guard let response = response else {
        print("Error: \(String(describing: error))")
        return
    }
    let message = response.bestTranscription.formattedString
}
self.speechRecognitionTask2 = self.speechRecognizer2?.recognitionTask(with: self.speechAudioBufferRecognitionRequest) { (response, error) in
    guard let response = response else {
        print("Error: \(String(describing: error))")
        return
    }
    let message = response.bestTranscription.formattedString
}
```

This gave me the error:

SFSpeechAudioBufferRecognitionRequest cannot be re-used

So I tried to create two request instances and fed both from the tap:

```swift
self.speechAudioBufferRecognitionRequest1.append(buffer)
self.speechAudioBufferRecognitionRequest2.append(buffer)
```

But this also didn't work. There was no error, but one recognition just overwrote the other. I tried some other things, like changing the bus, but was not successful.
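The "cannot be re-used" error confirms that each SFSpeechAudioBufferRecognitionRequest is single-use and must be paired with its own recognizer and task. A sketch of that pairing (an illustration only; whether the OS actually allows two simultaneous recognition tasks may still be limited, which could explain one overwriting the other):

```swift
import AVFoundation
import Speech

// Sketch: one microphone tap feeding two independent request/recognizer/task triples.
let audioEngine = AVAudioEngine()
let recognizerEN = SFSpeechRecognizer(locale: Locale(identifier: "en-GB"))
let recognizerIT = SFSpeechRecognizer(locale: Locale(identifier: "it-IT"))
let requestEN = SFSpeechAudioBufferRecognitionRequest()
let requestIT = SFSpeechAudioBufferRecognitionRequest()

let node = audioEngine.inputNode
let format = node.outputFormat(forBus: 0)
node.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    requestEN.append(buffer)   // the same buffer goes to both requests
    requestIT.append(buffer)
}

let taskEN = recognizerEN?.recognitionTask(with: requestEN) { result, _ in
    if let result = result { print("EN: \(result.bestTranscription.formattedString)") }
}
let taskIT = recognizerIT?.recognitionTask(with: requestIT) { result, _ in
    if let result = result { print("IT: \(result.bestTranscription.formattedString)") }
}
```

If server-side limits are the problem, requiresOnDeviceRecognition on one or both requests might be worth testing.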
Posted
by
Post not yet marked as solved
0 Replies
215 Views
Hi, I am trying the speech recognition API. For live recognition I use SFSpeechAudioBufferRecognitionRequest, and it works fine in general. But when I try file recognition, I can't find a way to cancel it: cancelling or killing the request doesn't stop the callbacks to the result handler, so I have to wait for recognition to finish, when the result's isFinal flag is true. That is inconvenient if I start recognizing a long audio file and then decide to cancel. Is there a way to cancel file recognition directly? Thank you~~ Eric
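A hedged sketch of one workaround (assuming the stray callbacks after cancellation are the issue): cancel the SFSpeechRecognitionTask itself, and keep a flag so late result-handler invocations are ignored.

```swift
import Speech

// Sketch: cancel file-based recognition and drop any callbacks that arrive
// after cancel(). The handler may still fire once with a cancellation error.
var recognitionTask: SFSpeechRecognitionTask?
var isCancelled = false

func startFileRecognition(url: URL) {
    let recognizer = SFSpeechRecognizer()
    let request = SFSpeechURLRecognitionRequest(url: url)
    recognitionTask = recognizer?.recognitionTask(with: request) { result, error in
        guard !isCancelled else { return }  // ignore post-cancellation callbacks
        if let result = result, result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}

func cancelRecognition() {
    isCancelled = true
    recognitionTask?.cancel()   // requests cancellation of the in-flight task
    recognitionTask = nil
}
```

This does not make the framework stop faster, but it keeps the rest of the app from reacting to results that arrive after the user cancelled.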
Posted
by
Post not yet marked as solved
0 Replies
209 Views
Hi There! A number of my apps are hanging in Big Sur (11.6.2). A spindump of two of them shows them waiting for NSSpeechSynthesizer to return from CountVoices. I looked in /System/Library/Speech/Voices and it was empty. So I went to System Preferences to see what I could see. Trying to look up voices in Accessibility caused it to hang, as did trying to enter the Siri control panel. So clearly I have a problem with speech synthesis; why this should hang Chrome, bleep only knows. I just restored the system, so I'm not keen on doing it again knowing it won't work. What I want to know is: Can I install just the speech synthesis part of macOS? Where would I find it, and how? Is it a kernel extension? I think when I first installed this OS I skipped Siri, something I never planned to use. Is this what caused the problem? Thanks!
Posted
by
Post not yet marked as solved
0 Replies
307 Views
I'm building a game where the player is able to speak commands, so I want to enable speech-to-text capability. I've set up the required Info.plist property (for speech recognition privacy) as well as the App Sandbox hardware setting (for audio input). I've confirmed that the application is listening via the audio tap and sending audio buffers to the recognition request. However, the recognition task never executes.

NOTE: This is for macOS, NOT iOS. Also, it works when I have this in a Playground, but when I try to do this in an actual application, the recognition task isn't called.

Specs: macOS 12.1, Xcode 13.2.1 (13C100), Swift 5.5.2.

Here is the code that I've placed in the AppDelegate of a freshly built SpriteKit application:

```swift
//
// AppDelegate.swift
//
import Cocoa
import AVFoundation
import Speech

@main
class AppDelegate: NSObject, NSApplicationDelegate {
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        SFSpeechRecognizer.requestAuthorization(requestMicrophoneAccess)
    }

    func applicationWillTerminate(_ aNotification: Notification) {
        // Insert code here to tear down your application
    }

    func applicationShouldTerminateAfterLastWindowClosed(_ sender: NSApplication) -> Bool {
        return true
    }

    fileprivate func requestMicrophoneAccess(authStatus: SFSpeechRecognizerAuthorizationStatus) {
        OperationQueue.main.addOperation {
            switch authStatus {
            case .authorized:
                self.speechRecognizer.supportsOnDeviceRecognition = true
                if let speechRecognizer = SFSpeechRecognizer() {
                    if speechRecognizer.isAvailable {
                        do {
                            try self.startListening()
                        } catch {
                            print(">>> ERROR >>> Listening Error: \(error)")
                        }
                    }
                }
            case .denied:
                print("Denied")
            case .restricted:
                print("Restricted")
            case .notDetermined:
                print("Undetermined")
            default:
                print("Unknown")
            }
        }
    }

    func startListening() throws {
        // Cancel the previous task if it's running.
        recognitionTask?.cancel()
        recognitionTask = nil

        let inputNode = audioEngine.inputNode

        // Configure the microphone input.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            /**********
             * Confirmed that the following line is executing continuously
             **********/
            self.recognitionRequest?.append(buffer)
        }

        startRecognizing()

        audioEngine.prepare()
        try audioEngine.start()
    }

    func startRecognizing() {
        // Create a recognition task for the speech recognition session.
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequestInternal = recognitionRequest else {
            fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequestInternal.shouldReportPartialResults = true
        recognitionRequestInternal.requiresOnDeviceRecognition = true

        /**************
         * Confirmed that the following line is executed,
         * however the closure given to 'recognitionTask' is never called
         **************/
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequestInternal) { result, error in
            var isFinal = false

            if let result = result {
                let firstTranscriptionTimestamp = result.transcriptions.first?.segments.first?.timestamp ?? TimeInterval.zero
                isFinal = result.isFinal || (firstTranscriptionTimestamp != 0)
            }

            if error != nil {
                // Stop recognizing speech if there is a problem.
                print("\n>>> ERROR >>> Recognition Error: \(String(describing: error))")
                self.audioEngine.stop()
                self.audioEngine.inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            } else if isFinal {
                self.recognitionTask = nil
            }
        }
    }
}
```
Posted
by
Post not yet marked as solved
0 Replies
353 Views
Apple added support for WebKit speech recognition in Safari 14.1. We're trying to use it in our web app and facing some issues: the mic never stops after the user stops speaking, and we never get the recognized text on iPhone and iPad. Here is a simple web app to test: https://oiyw7.csb.app/
Posted
by
Post not yet marked as solved
5 Replies
313 Views
It seems that voices with the same id behave differently on different OS versions and devices. How can I distinguish voices across OS versions and devices? Is it safe to use a combination of voice id and OS version? Or is there a voice version code, or something better, for distinguishing voices with the same identifier?
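As far as I know there is no documented per-voice version field, so a composite key of voice identifier plus OS version is one plausible workaround. The key format below is an assumption for illustration, not an Apple convention:

```swift
import Foundation
import AVFoundation

// Sketch: build a composite key so "the same" voice rendered by different
// OS releases can be told apart (e.g. for invalidating cached audio).
func voiceKey(for voice: AVSpeechSynthesisVoice) -> String {
    let os = ProcessInfo.processInfo.operatingSystemVersion
    return "\(voice.identifier)#\(os.majorVersion).\(os.minorVersion).\(os.patchVersion)"
}

if let voice = AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex) {
    print(voiceKey(for: voice))  // e.g. "com.apple.speech.voice.Alex#15.2.1"
}
```

This only distinguishes voices across OS updates; it cannot detect a voice asset being re-downloaded or changed within the same OS version.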
Posted
by
Post not yet marked as solved
0 Replies
306 Views
I’m building a playground book for the upcoming Swift Student Challenge and am confused by the rules against using the internet in my playground book. I have implemented the Speech framework in one chapter and am forcing it to run offline. The problem is that this is only possible on iPads with an A12 Bionic chip or newer, so older iPads, and even my Mac, show an error when Wi-Fi is turned off. So I’m concerned whether my submission will be accepted in such a case.
Posted
by
Post not yet marked as solved
0 Replies
307 Views
I’m building a playground book for the upcoming Swift Student Challenge and am confused by the rules about fetching data from the internet in my playground book. I have implemented the Speech framework in one chapter and am implementing it so that it runs offline. The problem is that this is only possible on iPads with an A12 Bionic chip or newer, so older iPads, and even my Mac, show an error when Wi-Fi is turned off. So I’m concerned whether my submission will be accepted in such a case.
Posted
by
Post not yet marked as solved
0 Replies
222 Views
We are creating an online book-reading app in which we initiate a group video call (we use the Agora SDK for video calls). When the call is joined we start book reading and highlight words at the other members' end as well; for recording/recognizing text we use SFSpeechRecognizer. But whenever CallKit and the video call start, SFSpeechRecognizer's audio recording at the other end always fails. Can you please suggest a solution for recording audio during the video call?

```swift
//
//  Speech.swift
//  Edsoma
//
//  Created by Kapil on 16/02/22.
//

import Foundation
import AVFoundation
import Speech

protocol SpeechRecognizerDelegate {
    func didSpoke(speechRecognizer: SpeechRecognizer, word: String?)
}

class SpeechRecognizer: NSObject {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()
    var delegate: SpeechRecognizerDelegate?

    static let shared = SpeechRecognizer()
    var isOn = false

    func setup() {
        speechRecognizer?.delegate = self

        SFSpeechRecognizer.requestAuthorization { (authStatus) in
            var isButtonEnabled = false

            switch authStatus {
            case .authorized:
                isButtonEnabled = true
            case .denied:
                isButtonEnabled = false
                print("User denied access to speech recognition")
            case .restricted:
                isButtonEnabled = false
                print("Speech recognition restricted on this device")
            case .notDetermined:
                isButtonEnabled = false
                print("Speech recognition not yet authorized")
            @unknown default:
                break
            }

            OperationQueue.main.addOperation {
                // self.microphoneButton.isEnabled = isButtonEnabled
            }
        }
    }

    func transcribeAudio(url: URL) {
        // Create a new recognizer and point it at our audio file.
        let recognizer = SFSpeechRecognizer()
        let request = SFSpeechURLRecognitionRequest(url: url)

        // Start recognition.
        recognizer?.recognitionTask(with: request) { [unowned self] (result, error) in
            // Abort if we didn't get any transcription back.
            guard let result = result else {
                print("There was an error: \(error!)")
                return
            }
            // If we got the final transcription back, print it.
            if result.isFinal {
                print(result.bestTranscription.formattedString)
            }
        }
    }

    func startRecording() {
        isOn = true
        let inputNode = audioEngine.inputNode
        if recognitionTask != nil {
            inputNode.removeTap(onBus: 0)
            self.audioEngine.stop()
            self.recognitionRequest = nil
            self.recognitionTask = nil
            DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
                self.startRecording()
            }
            return
        }

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(.multiRoute)
            try audioSession.setMode(.measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequest.shouldReportPartialResults = true
        recognitionRequest.taskHint = .search

        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { (result, error) in
            if let result = result {
                self.delegate?.didSpoke(speechRecognizer: self, word: result.bestTranscription.formattedString)
                debugPrint(result.bestTranscription.formattedString)
            }

            if let error = error {
                debugPrint("Speech Error ====>", error)
                inputNode.removeTap(onBus: 0)
                self.audioEngine.stop()
                self.recognitionRequest = nil
                self.recognitionTask = nil
                if BookReadingSettings.isSTTEnable {
                    DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
                        self.startRecording()
                    }
                }
                // self.microphoneButton.isEnabled = true
            }
        }

        inputNode.removeTap(onBus: 0)
        let sampleRate = AVAudioSession.sharedInstance().sampleRate
        let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
        debugPrint("Say something, I'm listening!")
    }

    func stopRecording() {
        isOn = false
        debugPrint("Recording stopped")
        let inputNode = audioEngine.inputNode
        inputNode.removeTap(onBus: 0)
        self.audioEngine.stop()
        recognitionTask?.cancel()
        self.recognitionRequest = nil
        self.recognitionTask = nil
    }
}

extension SpeechRecognizer: SFSpeechRecognizerDelegate {
}
```
Posted
by
Post not yet marked as solved
0 Replies
265 Views
Hi, I'm trying to get this example working on macOS now that SFSpeechRecognizer is available for the platform. A few questions:
Do I need to make an authorization request of the user if I intend to use on-device recognition?
When I ask for authorization to use speech recognition, the dialog that pops up contains text that's not in my speech recognition usage description, indicating that recordings will be sent to Apple's servers. But that is not accurate if I am using on-device recognition (as far as I can tell). Is there a way to suppress that language if I am not using online speech recognition?
Is there an updated version of the article I linked to that describes how to accomplish the same thing on macOS instead of iOS? My compiler is complaining that AVAudioSession is not available on macOS, and I'm not sure how to set things up for passing audio from the microphone to the speech recognizer.
Thanks :-D Brian Duffy
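On the AVAudioSession point: that class is iOS-only, and on macOS the usual pattern is to skip session setup and pull audio straight from AVAudioEngine's input node. A hedged sketch (as far as I can tell, SFSpeechRecognizer.requestAuthorization is still required even for on-device use, and the dialog text is not customizable):

```swift
import AVFoundation
import Speech

// Sketch for macOS: no AVAudioSession; feed the input node's tap directly
// into an SFSpeechAudioBufferRecognitionRequest.
let audioEngine = AVAudioEngine()
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true  // keep audio off the network

let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    request.append(buffer)
}

try audioEngine.start()
let task = recognizer?.recognitionTask(with: request) { result, error in
    if let result = result {
        print(result.bestTranscription.formattedString)
    }
}
```

The app still needs the speech recognition usage description in its plist and, for sandboxed apps, the Audio Input capability.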
Posted
by
Post not yet marked as solved
0 Replies
205 Views
Hello, My application has functionality to record speech and convert the recorded speech to text. The application also tells the user what action to perform using TTS (text-to-speech). When I start screen recording from Control Centre, the app starts recording voice and this works. But as soon as the TTS voice is played, the recorder stops recording both my voice and the TTS output. Please let me know what additional information is required from my side to debug this issue.
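One hedged guess (an assumption, since the post doesn't show the audio session setup): the recorder stops because the session isn't configured for simultaneous playback and recording, so starting TTS playback interrupts the input route. A sketch of a session configuration that lets the two coexist:

```swift
import AVFoundation

// Sketch: configure the shared audio session so TTS playback and microphone
// recording can run at the same time instead of interrupting each other.
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playAndRecord,
                            mode: .default,
                            options: [.mixWithOthers, .defaultToSpeaker])
    try session.setActive(true)
} catch {
    print("Audio session error: \(error)")
}
```

Whether this also keeps the Control Centre screen recording capturing both streams would need testing on device.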
Posted
by