Post not yet marked as solved
Hi,
I am trying the speech recognition API.
For live recognition, I use
SFSpeechAudioBufferRecognitionRequest
It works fine in general.
But when I try to use file recognition with
SFSpeechAudioBufferRecognitionRequest,
I can't find a way to cancel the recognition.
Cancelling or killing the request doesn't stop the callbacks from the result handler, so I need to wait for the recognition to finish, i.e. until I receive a result whose isFinal flag is true. That's inconvenient when I'm recognizing a long audio file and decide partway through to cancel.
Are there some ways to cancel the file recognition directly?
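For reference, here is a minimal sketch of the pattern I'm using (the class and helper names are my own, and I'm showing the URL-based request variant); task.cancel() is the only cancellation API I've found, and in my testing the result handler keeps firing after calling it:

```swift
import Speech

// Sketch: start a file recognition task, keep a reference to it, and later
// call cancel(). In my tests, callbacks continue until isFinal arrives.
final class FileTranscriber {
    private(set) var task: SFSpeechRecognitionTask?

    func start(fileURL: URL, locale: Locale = Locale(identifier: "en-US")) {
        guard let recognizer = SFSpeechRecognizer(locale: locale) else { return }
        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        task = recognizer.recognitionTask(with: request) { result, error in
            if let result = result {
                print(result.bestTranscription.formattedString)
                if result.isFinal { print("done") }
            } else if let error = error {
                print("recognition error: \(error)")
            }
        }
    }

    func cancel() {
        // Expected to stop callbacks immediately; in practice they continue.
        task?.cancel()
        task = nil
    }
}
```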
Thank you~~
Eric
I am trying to run two SFSpeechRecognizer simultaneously with different languages.
So I tried the following:
var speechRecognizer1: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "en-GB"))
var speechRecognizer2: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "it-IT"))
var speechAudioBufferRecognitionRequest = SFSpeechAudioBufferRecognitionRequest()
var speechRecognitionTask1: SFSpeechRecognitionTask!
var speechRecognitionTask2: SFSpeechRecognitionTask!
...
let node = self.audioEngine.inputNode
let recordingFormat = node.outputFormat(forBus: 0)
node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat, block: { (buffer, time) in
    self.speechAudioBufferRecognitionRequest.append(buffer)
})
self.audioEngine.prepare()
do {
    try audioEngine.start()
} catch {
    print("Error")
    return
}
guard let myRecognition = SFSpeechRecognizer() else {
    print("Error")
    return
}
if !myRecognition.isAvailable {
    print("Error")
    return
}
self.speechRecognitionTask1 = self.speechRecognizer1?.recognitionTask(with: self.speechAudioBufferRecognitionRequest, resultHandler: { (response, error) in
    guard let response = response else {
        if let error = error {
            print("Error: \(error)")
        } else {
            print("Error")
        }
        return
    }
    let message = response.bestTranscription.formattedString
    // use message here
})
self.speechRecognitionTask2 = self.speechRecognizer2?.recognitionTask(with: self.speechAudioBufferRecognitionRequest, resultHandler: { (response, error) in
    guard let response = response else {
        if let error = error {
            print("Error: \(error)")
        } else {
            print("Error")
        }
        return
    }
    let message = response.bestTranscription.formattedString
    // use message here
})
This gave me the error: SFSpeechAudioBufferRecognitionRequest cannot be re-used
So I tried to create two instances and initialized them by:
self.speechAudioBufferRecognitionRequest1.append(buffer)
self.speechAudioBufferRecognitionRequest2.append(buffer)
})
But this also didn't work. There was no error, but one speechRecognition just overwrote the other...
I tried some other stuff like changing the bus etc. but was not successful...
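For anyone trying to reproduce this, here is the consolidated two-request variant I ended up with (the identifiers are my own): one tap on the input node feeds two independent buffer requests, and each request gets its own recognizer and task. It avoids the "cannot be re-used" error, but the two tasks still interfere as described above.

```swift
import AVFoundation
import Speech

// Sketch of the two-request attempt: a single tap appends the same buffers
// to two separate SFSpeechAudioBufferRecognitionRequest instances.
final class DualRecognizer {
    private let audioEngine = AVAudioEngine()
    private let request1 = SFSpeechAudioBufferRecognitionRequest()
    private let request2 = SFSpeechAudioBufferRecognitionRequest()
    private(set) var task1: SFSpeechRecognitionTask?
    private(set) var task2: SFSpeechRecognitionTask?

    func start() throws {
        let node = audioEngine.inputNode
        let format = node.outputFormat(forBus: 0)
        node.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            // Feed the same audio to both requests.
            self.request1.append(buffer)
            self.request2.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        if let r1 = SFSpeechRecognizer(locale: Locale(identifier: "en-GB")) {
            task1 = r1.recognitionTask(with: request1) { response, _ in
                if let response = response {
                    print("en-GB:", response.bestTranscription.formattedString)
                }
            }
        }
        if let r2 = SFSpeechRecognizer(locale: Locale(identifier: "it-IT")) {
            task2 = r2.recognitionTask(with: request2) { response, _ in
                if let response = response {
                    print("it-IT:", response.bestTranscription.formattedString)
                }
            }
        }
    }
}
```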
I'm developing a game that will use speech recognition to execute various commands. I am using code from Apple's Recognizing Speech in Live Audio documentation page.
When I run this in a Swift Playground, it works just fine. However, when I make a SpriteKit game application (basic setup from Xcode's "New Project" menu option), I get the following error:
required condition is false: IsFormatSampleRateAndChannelCountValid(hwFormat)
Upon further research, it appears that my input node has no channels. The following is the relevant portion of my code, along with debug output:
let inputNode = audioEngine.inputNode
print("Number of inputs: \(inputNode.numberOfInputs)")
// 1
print("Input Format: \(inputNode.inputFormat(forBus: 0))")
// <AVAudioFormat 0x600001bcf200: 0 ch, 0 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved>
let channelCount = inputNode.inputFormat(forBus: 0).channelCount
print("Channel Count: \(channelCount)")
// 0 <== Agrees with the inputFormat output listed previously
// Configure the microphone input.
print("Number of outputs: \(inputNode.numberOfOutputs)")
// 1
let recordingFormat = inputNode.outputFormat(forBus: 0)
print("Output Format: \(recordingFormat)")
// <AVAudioFormat 0x600001bf3160: 2 ch, 44100 Hz, Float32, non-inter>
inputNode.installTap(onBus: 0, bufferSize: 256, format: recordingFormat, block: audioTap) // <== This is where the error occurs.
// NOTE: 'audioTap' is a function defined in this class. Using this defined function instead of an inline, anonymous function.
The code snippet is included in the game's AppDelegate class (which includes import statements for Cocoa, AVFoundation, and Speech), and executes during its applicationDidFinishLaunching function. I'm having trouble understanding why Playground works, but a game app doesn't work. Do I need to do something specific to get the application to recognize the microphone?
NOTE: This is for macOS, NOT iOS. While the "How To" documentation cited earlier targets iOS, Apple stated at WWDC19 that speech recognition is now supported on macOS.
NOTE: I have included the NSSpeechRecognitionUsageDescription key in the application's plist, and successfully acknowledged the authorization request for the microphone.
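In the meantime I'm guarding against the crash with a defensive helper like the one below (my own helper, not Apple's sample code): verify that the input node actually reports a usable hardware format before installing the tap, so the IsFormatSampleRateAndChannelCountValid assertion can't fire.

```swift
import AVFoundation

// Sketch: only install the tap when the hardware input format is valid.
// A zero-channel format on macOS may indicate the app never got microphone
// access, e.g. a sandboxed app missing the audio-input entitlement.
func installTapIfInputAvailable(on engine: AVAudioEngine,
                                block: @escaping AVAudioNodeTapBlock) -> Bool {
    let inputNode = engine.inputNode
    let hwFormat = inputNode.inputFormat(forBus: 0)
    guard hwFormat.channelCount > 0, hwFormat.sampleRate > 0 else {
        return false  // no usable microphone input
    }
    inputNode.installTap(onBus: 0,
                         bufferSize: 256,
                         format: inputNode.outputFormat(forBus: 0),
                         block: block)
    return true
}
```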
Hello,
I am working on a healthcare app and want to integrate speech-to-text functionality into it. Before doing that, I wanted to make sure that the Speech framework provided by Apple is HIPAA compliant. Please help me find this out.
Hi, I've been working on a project that utilizes the Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API. However, I've noticed some strange behavior in the newest versions of Safari on iOS, iPadOS, and macOS.
One issue that occurs regularly is that the text input will repeat after voice input has ended. This can be seen on this demo provided by Google: https://www.google.com/intl/en/chrome/demos/speech.html
This was not happening when I tested on 14.1 (the version I upgraded from). Upon debugging, it appears the doubling of text is included in transcriptions that are not flagged as isFinal, as well as transcriptions that are, which makes me think that something isn't working properly in the implementation of the API.
Anecdotally, the speech recognition appears to be much less accurate now as well, and I've noticed some odd behavior when I set the continuous flag to false. The API delegates the actual recognition work to the system's dictation engine, so I'm wondering why there would be a difference here compared to using dictation in other apps.
My main question is: has anyone else run into problems like this? If so, how are you working around them?
So I am watching a Speech To Text demo on YouTube, here:
https://www.youtube.com/watch?v=SZJ8zjMGUcY
There are no downloadable files, so I am typing the code from the screen, and I immediately run into an error that confuses me,
at
class ViewController : UIViewController, SFSpeechRecognizer {
here's a screenshot:
Swift gives me an error indicating that Multiple Inheritance is not allowed.
The programmer doesn't have files to download, and I like to start from scratch anyway, typing and copying so I am forced to read each line.
Is there something I have to change in the project itself that allows multiple inheritance?
This video is from last year and uses Swift 5.0, so I don't think there could have been that much of a major change in Swift in that time.
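In case it helps anyone reading along: SFSpeechRecognizer is a class, so listing it after UIViewController reads as a second superclass, which Swift forbids. I suspect the tutorial meant the SFSpeechRecognizerDelegate protocol instead. A sketch of what I think was intended:

```swift
import UIKit
import Speech

// A class can only inherit from one superclass, but can adopt any number
// of protocols, such as SFSpeechRecognizerDelegate.
class ViewController: UIViewController, SFSpeechRecognizerDelegate {
    let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

    override func viewDidLoad() {
        super.viewDidLoad()
        recognizer?.delegate = self
    }

    // Delegate callback: fires when recognition availability changes.
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        print("Recognizer available: \(available)")
    }
}
```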
Thanks
Sorry if I shouldn't be asking this here, but I'm trying to find a current-ish tutorial on how to make an app that converts speech to text in real time, transcribing as you're speaking.
I've found a few on YouTube, but they are quite old, or only cover transcribing from a recorded file, etc.
If anyone is aware of a good tutorial, paid or not, I would so appreciate any link.
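For context, the kind of minimal live-transcription loop I'm hoping a tutorial would walk through looks something like this (pieced together from the Speech framework docs; treat it as a sketch, not working sample code, and it assumes microphone and speech-recognition permission have already been granted):

```swift
import AVFoundation
import Speech

// Minimal live speech-to-text loop: tap the microphone, stream buffers into
// a buffer request, and print partial transcriptions as they arrive.
func startLiveTranscription() throws {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.isAvailable else { return }

    let engine = AVAudioEngine()
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true  // stream text as you speak

    let input = engine.inputNode
    input.installTap(onBus: 0, bufferSize: 1024,
                     format: input.outputFormat(forBus: 0)) { buffer, _ in
        request.append(buffer)
    }
    engine.prepare()
    try engine.start()

    _ = recognizer.recognitionTask(with: request) { result, error in
        if let result = result {
            print(result.bestTranscription.formattedString)
        }
        if error != nil || (result?.isFinal ?? false) {
            engine.stop()
            input.removeTap(onBus: 0)
        }
    }
}
```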
Thank you
Hello everyone, let me explain my problem. I need to build, for macOS, iOS, and iPadOS, a 3D virtual assistant (a woman) able to listen to questions and provide answers: a kind of Siri, but with a 3D model of a girl on screen that has to mimic speech (lip sync) and move according to the needs of the program.
I would like to do it all with Xcode, SwiftUI, SceneKit.
I have already done some good experiments with the Speech Framework for speech recognition with good results. For the spoken part (TTS) I will use an external service.
Here I have a problem: the Speech Framework listens and transcribes even when my app speaks. I would like to be able to mute the microphone when the audio file is playing and unmute it when playback ends.
I also tried to create a 3D female model with Mixamo and exported some animations. I was able to import the animations into an Xcode project and get them to work (https://youtu.be/HJtbUHdPjzQ). Next I want to try to create a model using 3D Object Capture.
I also saw the video session 604 (https://developer.apple.com/videos/play/wwdc2017/604/) which clarified many doubts for me.
What I still haven't understood:
How can I blend multiple animations from code? For example, I could have an animation of the girl walking and another of her standing and waving, and I would like to combine them so that she waves while she walks.
If the character is fully rigged, how can I move its mouth, eyes, etc. from code to create a kind of lip sync and facial expressions?
Do you know if there is any good tutorial even for a fee that can fill these gaps? I've also searched Udemy but haven't found a SceneKit course similar to the one I need.
However, I think that the solutions for ARKit or RealityKit can also be fine.
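For the microphone problem in particular, the approach I'm experimenting with (a sketch with my own helper names, not a proven solution) is to tear the tap down right before playing the answer and reinstall it from the player's completion handler:

```swift
import AVFoundation

// Sketch: pause listening while our own audio plays, so the Speech framework
// doesn't transcribe the assistant's voice.
final class MicrophoneGate {
    private let engine: AVAudioEngine
    private let onBuffer: AVAudioNodeTapBlock
    private(set) var isListening = false

    init(engine: AVAudioEngine, onBuffer: @escaping AVAudioNodeTapBlock) {
        self.engine = engine
        self.onBuffer = onBuffer
    }

    func startListening() {
        guard !isListening else { return }
        let input = engine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024,
                         format: input.outputFormat(forBus: 0), block: onBuffer)
        isListening = true
    }

    // Call right before playing the spoken answer; call startListening()
    // again when playback ends.
    func stopListening() {
        guard isListening else { return }
        engine.inputNode.removeTap(onBus: 0)
        isListening = false
    }
}
```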
Hello,
I would like to know whether AVSpeechSynthesizer saves the speech transcript anywhere or shares it with Apple's servers.
I'd highly appreciate any replies.
Regards!
Hello,
I am looking for information on TTS and STT. I am aware that both can be implemented offline and online. I would like to know whether it is possible to force on-device TTS and STT for a third-party app, even when the device is online.
Our use case: while the app is online, we want to do TTS and STT on the device and not on Apple's servers (privacy concerns).
Please let me know if it is possible at all or point me in the right direction.
I really appreciate and look forward to your reply.
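For STT at least, the direction I've been looking at (a sketch using the Speech framework; I'd welcome corrections) is the requiresOnDeviceRecognition flag on the request:

```swift
import Speech

// Sketch: request strictly on-device recognition, even when online.
// If on-device support is unavailable for the locale, bail out rather than
// silently fall back to the server.
func makeOnDeviceRequest(for url: URL, locale: Locale) -> SFSpeechURLRecognitionRequest? {
    guard let recognizer = SFSpeechRecognizer(locale: locale),
          recognizer.supportsOnDeviceRecognition else {
        return nil  // on-device model not available for this locale
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = true  // audio never leaves the device
    return request
}
```

My understanding is that AVSpeechSynthesizer's TTS already runs on the device, but I'd appreciate confirmation on that as well.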
After upgrading to iOS 15, I discovered that calling stopSpeaking on AVSpeechSynthesizer no longer triggers speechSynthesizer(_:didCancel:); instead it triggers speechSynthesizer(_:didFinish:), which caused some errors in my business logic.
Here is a simple app to demonstrate the problem:
import SwiftUI
import AVFoundation

struct ContentView: View {
    var synthVM = SpeakerViewModel()

    var body: some View {
        VStack {
            Text("Hello, world!")
                .padding()
            HStack {
                Button("Speak") {
                    if self.synthVM.speaker.isPaused {
                        self.synthVM.speaker.continueSpeaking()
                    } else {
                        self.synthVM.speak(text: "Привет на корабле! Кто это пришел к нам, чтобы посмотреть на это произведение?")
                    }
                }
                Button("Pause") {
                    if self.synthVM.speaker.isSpeaking {
                        self.synthVM.speaker.pauseSpeaking(at: .word)
                    }
                }
                Button("Stop") {
                    self.synthVM.speaker.stopSpeaking(at: .word)
                }
            }
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

class SpeakerViewModel: NSObject {
    var speaker = AVSpeechSynthesizer()

    override init() {
        super.init()
        self.speaker.delegate = self
    }

    func speak(text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "ru")
        speaker.speak(utterance)
    }
}

extension SpeakerViewModel: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("started")
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didPause utterance: AVSpeechUtterance) {
        print("paused")
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didContinue utterance: AVSpeechUtterance) {}
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {}
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        guard let rangeInString = Range(characterRange, in: utterance.speechString) else { return }
        print("Will speak: \(utterance.speechString[rangeInString])")
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("finished")
    }
}
On the simulator everything works fine, but on a real device many strange words appear in the synthesized speech.
The willSpeakRangeOfSpeechString output also differs between the simulator and a real device.
Simulator:
started
Will speak: Привет
Will speak: на
Will speak: корабле!
Will speak: Кто
Will speak: это
Will speak: пришел
Will speak: к
Will speak: нам,
Will speak: чтобы
Will speak: посмотреть
Will speak: на
Will speak: это
Will speak: произведение?
finished
The iPhone output contains errors:
2021-10-12 17:09:32.613273+0300 VoiceTest[9027:203522] [AXTTSCommon] Broken user rule: \b([234567890]+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0]) > Error Domain=NSCocoaErrorDomain Code=2048 "The value “\b([234567890]+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])” is invalid." UserInfo={NSInvalidValue=\b([234567890]+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])}
2021-10-12 17:09:32.613548+0300 VoiceTest[9027:203522] [AXTTSCommon] Broken user rule: \b(1\d+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0]) > Error Domain=NSCocoaErrorDomain Code=2048 "The value “\b(1\d+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])” is invalid." UserInfo={NSInvalidValue=\b(1\d+)2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])}
2021-10-12 17:09:32.613725+0300 VoiceTest[9027:203522] [AXTTSCommon] Broken user rule: \b2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0]) > Error Domain=NSCocoaErrorDomain Code=2048 "The value “\b2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])” is invalid." UserInfo={NSInvalidValue=\b2 (мили|кварты|чашки|{столовых }ложки)(?=$|\s|[[:punct:]»\xa0])}
started
Will speak: Привет
Will speak: на
Will speak: ивет на корабле!
Will speak: Кто
Will speak: это
Will speak: Кто это пришел
Will speak: к
Will speak: нам,
Will speak: чтобы
Will speak: посмотреть
Will speak: на
Will speak: реть на это
Will speak: на это произведение?
finished
The error appears on iOS / iPadOS 15.0, 15.0.1, 15.0.2, and 14.7,
but everything works fine on 14.8.
It looks like an engine error. How can I fix this issue?
Since version 14.2 we have been having issues with STT. In the past we were using Azure and it worked fine. Since you partially implemented the Speech Recognition API, things have been getting worse on iOS. There is no problem on macOS.
It seems the recording we send to STT has very poor quality, and parts of sentences are missing. When I implement it on its own it works fine, but as soon as I play audio before opening the microphone it doesn't work anymore (or only partially).
Which brings me to my question: is there a workaround while we wait for you to deploy a fully working Speech Recognition API?
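In case the audio session is the culprit, the workaround I'm currently testing (a sketch; the category and options are my guesses) reconfigures the session for simultaneous playback and recording before installing the microphone tap:

```swift
import AVFoundation

// Sketch: one session configured for both playback and recording, so playing
// a prompt before opening the microphone doesn't leave the session in a
// playback-only state.
func configureSessionForSpeech() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .measurement,  // often suggested for recognition
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}
```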
In iOS 15, calling stopSpeaking on AVSpeechSynthesizer triggers the
didFinish delegate method instead of didCancel; this works correctly on iOS 14 and below.
Hi,
I use device-local speech recognition for speech input.
Now some devices upgraded to iOS 15 return a new error domain and code:
kLSRErrorDomain, code 201
(previously the errors were mostly in kAFAssistantErrorDomain). Does anybody have an idea what it means and how to fix it?
Thanks!
I’m getting a flood of these errors in a shipping speech recognition app since users started upgrading to iOS15. It’s usually being returned by the speech recogniser a few seconds after recognition begins.
I can’t find any reference to it anywhere in Apple’s documentation. What is it?
Code: 301
Domain: kLSRErrorDomain
Description: Recognition request was canceled
I updated Xcode to Xcode 13 and iPadOS to 15.0.
Now my previously working application using SFSpeechRecognizer fails to start, regardless of whether I'm using on device mode or not.
I use the delegate approach, and it looks like although the plist is set up correctly (the authorization succeeds and I get the orange circle indicating the microphone is on), the delegate method speechRecognitionTask(_:didFinishSuccessfully:) is always called with false, and there is no particular error message to go along with this.
I also downloaded the official example from Apple's documentation pages:
SpokenWord SFSpeechRecognition example project page
Unfortunately, it also does not work anymore.
I'm working on a time-sensitive project and don't know where to go from here. How can we troubleshoot this? If it's an issue with Apple's API update or something has changed in the initial setup, I really need to know as soon as possible.
Thanks.
I'm trying to find specific information on how Apple transfers & stores the voice data that's transferred for speech recognition in Safari as part of WebSpeechAPI.
All I keep seeing are generic privacy documents that do not provide any detail. Is anyone able to point me in the right direction of an explanation of how customer data is used?
I use AVSpeechSynthesizer to pronounce some text in German. Sometimes it works just fine, and sometimes it doesn't, for reasons unknown to me. There is no error, because the speak() method doesn't throw; the only thing I am able to observe is the following message logged in the console:
_BeginSpeaking: couldn't begin playback
I tried to find an API in AVSpeechSynthesizerDelegate to register a callback for when an error occurs, but I found none.
The closest match was this (but it appears to be only available for macOS, not iOS):
https://developer.apple.com/documentation/appkit/nsspeechsynthesizerdelegate/1448407-speechsynthesizer?changes=_10
Below you can find how I initialize and use the speech synthesizer in my app:
class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    class func sharedInstance() -> Speaker {
        struct Singleton {
            static var sharedInstance = Speaker()
        }
        return Singleton.sharedInstance
    }

    let audioSession = AVAudioSession.sharedInstance()
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func initializeAudioSession() {
        do {
            try audioSession.setCategory(.playback, mode: .spokenAudio, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
        }
    }

    func speak(text: String, language: String = "de-DE") {
        guard !self.synth.isSpeaking else { return }
        let utterance = AVSpeechUtterance(string: text)
        let voice = AVSpeechSynthesisVoice.speechVoices().first { $0.language == language }!
        utterance.voice = voice
        self.synth.speak(utterance)
    }
}
The audio session initialization runs just once, at app startup.
Afterwards, speech is synthesized by running the following code:
Speaker.sharedInstance().speak(text: "Lederhosen")
The problem is that I have no way of knowing if the speech synthesis succeeded—the UI is showing "speaking" state, but nothing is actually being spoken.
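Since there is no error callback, the workaround I'm sketching out (flag, timeout, and handler names are my own) is to watch the delegate myself: if didStart never fires shortly after speak(), treat the utterance as a failed synthesis.

```swift
import AVFoundation

// Sketch: detect "silent" failures by watching the delegate. If didStart
// hasn't fired within the timeout after speak(), report a failure.
final class ObservedSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synth = AVSpeechSynthesizer()
    private var started = false
    var onFailure: (() -> Void)?

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ text: String, language: String = "de-DE") {
        started = false
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synth.speak(utterance)
        DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) { [weak self] in
            guard let self = self, !self.started else { return }
            self.onFailure?()  // didStart never fired: playback didn't begin
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        started = true
    }
}
```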
Can you perform two or more OFFLINE speech recognition tasks simultaneously?
SFSpeechRecognizer, SFSpeechURLRecognitionRequest offline limitation?
Running on macOS Big Sur 11.5.2
I would like to be perform two or more offline speech recognition tasks simultaneously.
I've executed two tasks in the same application AND executed two different applications, both using offline recognition.
Once I initiate the other thread or other application, the first recognition stops.
Since the computer supports multiple threads, I planned to make use of that concurrency.
Use cases
#1 multiple Audio or video files that I wish to transcribe -- cuts down on the wait time.
#2 split a single large file up into multiple sections and stitch the results together -- again cuts down on the wait time.
I set on device recognition to TRUE because my target files can be up to two hours in length.
My test files are 15-30 minutes in length and I have a number of them, so recognition must be done on the device.
func recognizeFile_Compact(url: NSURL) {
    let language = "en-US" // "en-GB"
    let recognizer = SFSpeechRecognizer(locale: Locale(identifier: language))!
    let recogRequest = SFSpeechURLRecognitionRequest(url: url as URL)

    // supportsOnDeviceRecognition is read-only, so check it rather than assign it
    guard recognizer.supportsOnDeviceRecognition else {
        print("On-device recognition is not supported for \(language)")
        return
    }
    recognizer.defaultTaskHint = .dictation          // give a hint as dictation
    recogRequest.requiresOnDeviceRecognition = true  // ensure the DEVICE does the work -- don't send to the cloud
    recogRequest.shouldReportPartialResults = false  // we don't want partial results

    var strCount = 0
    let recogTask = recognizer.recognitionTask(with: recogRequest, resultHandler: { (result, error) in
        guard let result = result else {
            print("Recognition failed, \(error!)")
            return
        }
        let text = result.bestTranscription.formattedString
        strCount += 1
        print("#\(strCount), Best: \(text)\n")
        if result.isFinal { print("WE ARE FINALIZED") }
    })
    _ = recogTask
}
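Since concurrent on-device tasks seem to preempt each other, the fallback I'm considering for use case #1 (a sketch; the queueing helper is my own) is to serialize the files, which doesn't gain any concurrency but at least automates the waiting between files:

```swift
import Speech

// Sketch: transcribe files one at a time, collecting final transcriptions.
// Each file starts only after the previous one delivers its final result.
func transcribeSequentially(urls: [URL], locale: Locale,
                            completion: @escaping ([URL: String]) -> Void) {
    guard let recognizer = SFSpeechRecognizer(locale: locale) else {
        completion([:])  // recognizer unavailable for this locale
        return
    }
    var results: [URL: String] = [:]
    var remaining = urls[...]

    func next() {
        guard let url = remaining.popFirst() else {
            completion(results)
            return
        }
        let request = SFSpeechURLRecognitionRequest(url: url)
        request.requiresOnDeviceRecognition = true
        request.shouldReportPartialResults = false
        _ = recognizer.recognitionTask(with: request) { result, error in
            if let result = result, result.isFinal {
                results[url] = result.bestTranscription.formattedString
                next()
            } else if error != nil {
                next()  // skip files that fail
            }
        }
    }
    next()
}
```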