Post not yet marked as solved
Hello,
Please help.
In our application we are using the speech recognizer.
And I have 2 questions.
Please let me know how I can turn on the speech recognizer with a timer while the application is in the background.
Also, how can I use the speech recognizer without the time limit?
Thanks in advance for your help.
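On the second question: live recognition requests have historically been capped at roughly one minute, and the usual workaround is to periodically tear down and restart the recognition task. A hedged sketch of that restart loop; `startTask` and `stopTask` are illustrative placeholders for your own start/stop code, and the 50-second interval is an assumption:

```swift
import Speech

// Sketch: restart recognition periodically to work around the
// roughly one-minute cap on live recognition requests.
// `startTask`/`stopTask` are placeholders for your own code.
final class RestartingRecognizer {
    private var timer: Timer?

    func begin(startTask: @escaping () -> Void, stopTask: @escaping () -> Void) {
        startTask()
        // Restart well before the limit is hit (interval is an assumption).
        timer = Timer.scheduledTimer(withTimeInterval: 50, repeats: true) { _ in
            stopTask()
            startTask()
        }
    }

    func end(stopTask: () -> Void) {
        timer?.invalidate()
        timer = nil
        stopTask()
    }
}
```

Note this does not help with background execution, which is governed by the app's background modes.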
Post not yet marked as solved
When we used SFSpeechRecognizer last week, the returned results were normal. However, this week we found that the returned results contain punctuation marks. For example, we say "yes", and the result comes back as "yes?".
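If you are building against the iOS 16 SDK, one thing worth checking is the request's `addsPunctuation` flag, which controls automatic punctuation in transcripts. A minimal sketch, assuming an audio-buffer request:

```swift
import Speech

// Sketch: disable automatic punctuation on a recognition request.
// `addsPunctuation` is available from iOS 16 / macOS 13 onward.
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true
if #available(iOS 16, macOS 13, *) {
    request.addsPunctuation = false  // ask for raw, unpunctuated transcripts
}
```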
Post not yet marked as solved
Hi,
I have a question regarding the integration of the speech-to-text library SFSpeechRecognizer.
I need SFSpeechRecognizer to recognize terms that are not present in the iOS dictionary, like medication names, chemistry terms, etc.
I would have to add them somehow for SFSpeechRecognizer to be able to recognize them.
Is this possible?
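One supported option is the `contextualStrings` property on SFSpeechRecognitionRequest, which biases the recognizer toward phrases that are unlikely to be in its default vocabulary. A minimal sketch; the term list here is purely illustrative:

```swift
import Speech

// Sketch: bias recognition toward domain-specific terms.
// `contextualStrings` hints the recognizer toward out-of-vocabulary
// phrases such as medication names.
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["metformin", "ibuprofen", "acetylcholine"]  // illustrative terms
```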
Thanks
Post not yet marked as solved
Hi, I am trying to use the Speech Recognizer from Apple's official documentation in my application, and I have wrapped the SFSpeechRecognizer call in a do/try/catch expression. However, if the user triggers Siri during runtime, calling SFSpeechRecognizer again immediately crashes the whole application. Has anyone encountered a similar problem?
Here's the code from my application
func transcribe() {
    DispatchQueue(label: "Speech Recognizer Queue", qos: .background).async { [weak self] in
        guard let self = self,
              let recognizer = self.recognizer, recognizer.isAvailable
        else {
            self?.speakError(RecognizerError.recognizerIsUnavailable)
            return
        }
        do {
            let (audioEngine, request) = try Self.prepareEngine()
            self.audioEngine = audioEngine
            self.request = request
            self.task = recognizer.recognitionTask(with: request, resultHandler: self.recognitionHandler(result:error:))
        } catch {
            self.reset()
            self.speakError(error)
        }
    }
}

private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
    let audioEngine = AVAudioEngine()
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true

    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        request.append(buffer)
    }

    audioEngine.prepare()
    try audioEngine.start()
    return (audioEngine, request)
}
Post not yet marked as solved
I’m already a member of the beta program, have downloaded the profile through settings, have no pending software updates, restarted several times. Still, I can’t get live transcribe to appear as a feature under accessibility. Any ideas what to try?
Post not yet marked as solved
I'm testing my App in the Xcode 14 beta (released with WWDC22) on iOS 16, and it seems that AVSpeechSynthesisVoice is not working correctly.
The following code always returns an empty array:
AVSpeechSynthesisVoice.speechVoices()
Additionally, attempting to initialize AVSpeechSynthesisVoice returns nil for all of the following:
AVSpeechSynthesisVoice(language: AVSpeechSynthesisVoice.currentLanguageCode())
AVSpeechSynthesisVoice(language: "en")
AVSpeechSynthesisVoice(language: "en-US")
AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
AVSpeechSynthesisVoice.speechVoices().first
Post not yet marked as solved
Hello, fellow devs.
I'm currently finishing my self-taught Swift learning, and I was thinking that once I'm done I'd get a 14" MacBook Pro to start working with, but I don't know which configuration would be most advisable:
an M1 Pro with 10-core CPU, 16-core GPU, 16-core Neural Engine, 16 GB RAM and a 512 GB SSD, or would you recommend increasing the RAM to 32 GB or the SSD to 1 TB?
Note that I'd like to make an investment that lasts me at least 5 years, without having to buy another machine because the configuration fell short after a few years.
What do you recommend?
Thanks for your answers and your time.
Best regards
Post not yet marked as solved
Problem: AVSpeechSynthesizer sometimes describes words rather than just speaking them as a real person would.
When speaking English, AVSpeechSynthesizer pronounces the word "A" on its own as "Capital A", while the phrase "A little test" is pronounced correctly.
A workaround of lowercasing the speech string, so "A" becomes "a", fixes this specific example. (I'm not yet sure whether lowercasing sentences could hurt pronunciation in some instances.)
A more serious example: when speaking French, the word "allé" on its own is pronounced by AVSpeechSynthesizer as "allé - e accent aigu" (accent aigu = acute accent). And here the problem exists even when the word is part of a sentence!
With "Je suis allé au cinéma" (I went to the cinema),
AVSpeechSynthesizer says "Je suis allé e accent aigu au cinéma", which is clearly wrong and unhelpful.
Is there a way to fix this?
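One avenue worth trying (not a confirmed fix) is supplying an explicit pronunciation via the IPA notation attribute on an attributed utterance string. A hedged sketch for the French example; the IPA value "a.le" below is an assumption, not a verified transcription:

```swift
import AVFoundation

// Sketch: override the pronunciation of a single word with IPA notation.
// The attribute key AVSpeechSynthesisIPANotationAttribute is real; the
// IPA string for "allé" here is an assumption.
let text = NSMutableAttributedString(string: "Je suis allé au cinéma")
text.addAttribute(
    NSAttributedString.Key(AVSpeechSynthesisIPANotationAttribute),
    value: "a.le",
    range: NSRange(location: 8, length: 4)  // the word "allé"
)
let utterance = AVSpeechUtterance(attributedString: text)
utterance.voice = AVSpeechSynthesisVoice(language: "fr-FR")

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)
// Keep `synthesizer` alive (e.g. as a property) until speech finishes.
```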
Post not yet marked as solved
@interface MineViewController ()
@property (nonatomic, strong) AVSpeechSynthesizer *speechSynthesizer;
@end

@implementation MineViewController

- (void)speak {
    // Version 1: synthesizer stored in a strong property
    self.speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    self.speechSynthesizer.delegate = self;
    AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:@"12345678"];
    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
    [utterance setVoice:voice];
    // (worked: speakUtterance succeeded)
    [self.speechSynthesizer speakUtterance:utterance];

    // Version 2: synthesizer held only in a local variable
    AVSpeechSynthesizer *speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    speechSynthesizer.delegate = self;
    AVSpeechUtterance *utterance2 = [[AVSpeechUtterance alloc] initWithString:@"12345678"];
    AVSpeechSynthesisVoice *voice2 = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
    [utterance2 setVoice:voice2];
    // (did not work: speakUtterance produced no response)
    [speechSynthesizer speakUtterance:utterance2];
}

@end
Post not yet marked as solved
I installed the last update and now my only phone is useless! I can't even make a phone call. No apps will open and I can't even restart it. I tried the suggested fixes and nothing worked. If you don't update you are constantly reminded, which I tried to ignore because my phone was working fine; now it is useless!
Post not yet marked as solved
I have updated to macOS Monterey and my code for SFSpeechRecognizer just broke. I get this error when I try to configure an offline speech recognizer on macOS:
Error Domain=kLSRErrorDomain Code=102 "Failed to access assets" UserInfo={NSLocalizedDescription=Failed to access assets, NSUnderlyingError=0x6000003c5710 {Error Domain=kLSRErrorDomain Code=102 "No asset installed for language=es-ES" UserInfo={NSLocalizedDescription=No asset installed for language=es-ES}}}
Here is a code snippet from a demo project:
private func process(url: URL) throws {
    speech = SFSpeechRecognizer.init(locale: Locale(identifier: "es-ES"))
    speech.supportsOnDeviceRecognition = true
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = true
    request.shouldReportPartialResults = false
    speech.recognitionTask(with: request) { result, error in
        guard let result = result else {
            if let error = error {
                print(error)
            }
            return
        }
        if let error = error {
            print(error)
            return
        }
        if result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}
I have tried different languages (es-ES, en-US) and get the same error each time.
Any idea on how to install these assets or how to fix this?
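Until the asset issue is resolved, it may help to inspect what the recognizer itself reports before forcing on-device recognition. A hedged diagnostic sketch:

```swift
import Speech

// Sketch: inspect availability before requiring on-device recognition.
// Lists every locale the recognizer supports, then checks the one we want.
for locale in SFSpeechRecognizer.supportedLocales().sorted(by: { $0.identifier < $1.identifier }) {
    print(locale.identifier)
}
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "es-ES")) {
    print("available:", recognizer.isAvailable)
    print("supports on-device:", recognizer.supportsOnDeviceRecognition)
}
```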
Post not yet marked as solved
Hello,
My application has functionality to record speech and convert the recorded speech to text. The application also tells the user what action to perform using TTS (text-to-speech).
When I start screen recording from Control Center, the app starts recording voice. This works.
But as soon as the TTS voice is played, the recorder stops recording both my voice and the TTS audio.
Please let me know what additional information is required from my side to debug this issue.
Post not yet marked as solved
Hi,
I'm trying to get this example working on macOS now that SFSpeechRecognizer is available for the platform. A few questions:
Do I need to make an authorization request of the user if I intend to use on-device recognition?
When I ask for authorization to use speech recognition, the dialog that pops up contains text that's not in my speech recognition usage description, indicating that recordings will be sent to Apple's servers. But that is not accurate if I am using on-device recognition (as far as I can tell). Is there a way to suppress that language if I am not using online speech recognition?
Is there an updated example of the article I linked to that describes how to accomplish the same thing on macOS instead of iOS? My compiler is complaining that AVAudioSession() is not available on macOS, and I'm not sure how to set things up for passing audio from the microphone to the speech recognizer.
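On macOS there is no AVAudioSession; the microphone can be fed to the recognizer straight from AVAudioEngine's input node. A minimal sketch under that assumption (error handling and authorization omitted):

```swift
import AVFoundation
import Speech

// Sketch: microphone -> SFSpeechRecognizer on macOS.
// No AVAudioSession setup is needed; AVAudioEngine talks to the
// default input device directly.
let audioEngine = AVAudioEngine()
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true

let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    request.append(buffer)
}
audioEngine.prepare()
try audioEngine.start()

let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
recognizer?.recognitionTask(with: request) { result, error in
    if let result = result {
        print(result.bestTranscription.formattedString)
    }
}
```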
Thanks :-D
Brian Duffy
Post not yet marked as solved
We are creating an online book-reading app in which we initiate a group video call (using the Agora SDK). When a user joins the call, we start book reading, highlight words on the other members' ends, and use SFSpeechRecognizer for recording/recognizing text. But whenever CallKit and the video call start, SFSpeechRecognizer's audio recording on the other members' ends always fails. Can you please suggest any solution for recording audio during a video call?
//
//  Speech.swift
//  Edsoma
//
//  Created by Kapil on 16/02/22.
//

import Foundation
import AVFoundation
import Speech

protocol SpeechRecognizerDelegate {
    func didSpoke(speechRecognizer: SpeechRecognizer, word: String?)
}

class SpeechRecognizer: NSObject {
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()
    var delegate: SpeechRecognizerDelegate?
    static let shared = SpeechRecognizer()
    var isOn = false

    func setup() {
        speechRecognizer?.delegate = self
        SFSpeechRecognizer.requestAuthorization { (authStatus) in
            var isButtonEnabled = false
            switch authStatus {
            case .authorized:
                isButtonEnabled = true
            case .denied:
                isButtonEnabled = false
                print("User denied access to speech recognition")
            case .restricted:
                isButtonEnabled = false
                print("Speech recognition restricted on this device")
            case .notDetermined:
                isButtonEnabled = false
                print("Speech recognition not yet authorized")
            @unknown default:
                break
            }
            OperationQueue.main.addOperation {
                // self.microphoneButton.isEnabled = isButtonEnabled
            }
        }
    }

    func transcribeAudio(url: URL) {
        // Create a new recognizer and point it at our audio file.
        let recognizer = SFSpeechRecognizer()
        let request = SFSpeechURLRecognitionRequest(url: url)
        // Start recognition.
        recognizer?.recognitionTask(with: request) { [unowned self] (result, error) in
            // Abort if we didn't get any transcription back.
            guard let result = result else {
                print("There was an error: \(error!)")
                return
            }
            // If we got the final transcription back, print it.
            if result.isFinal {
                print(result.bestTranscription.formattedString)
            }
        }
    }

    func startRecording() {
        isOn = true
        let inputNode = audioEngine.inputNode
        if recognitionTask != nil {
            debugPrint("****** recognitionTask != nil *************")
            inputNode.removeTap(onBus: 0)
            self.audioEngine.stop()
            self.recognitionRequest = nil
            self.recognitionTask = nil
            DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 1) {
                self.startRecording()
            }
            return
        }
        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSession.Category.multiRoute)
            try audioSession.setMode(AVAudioSession.Mode.measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequest.shouldReportPartialResults = true
        recognitionRequest.taskHint = .search
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
            var isFinal = false
            if let result = result {
                self.delegate?.didSpoke(speechRecognizer: self, word: result.bestTranscription.formattedString)
                debugPrint(result.bestTranscription.formattedString)
                isFinal = result.isFinal
            }
            if let error = error {
                debugPrint("Speech Error ====>", error)
                inputNode.removeTap(onBus: 0)
                self.audioEngine.stop()
                self.recognitionRequest = nil
                self.recognitionTask = nil
                if BookReadingSettings.isSTTEnable {
                    DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 1) {
                        self.startRecording()
                    }
                }
                // self.microphoneButton.isEnabled = true
            }
        })
        inputNode.removeTap(onBus: 0)
        let sampleRate = AVAudioSession.sharedInstance().sampleRate
        let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
        debugPrint("Say something, I'm listening!")
    }

    func stopRecording() {
        isOn = false
        debugPrint("Recording stopped")
        let inputNode = audioEngine.inputNode
        inputNode.removeTap(onBus: 0)
        self.audioEngine.stop()
        recognitionTask?.cancel()
        self.recognitionRequest = nil
        self.recognitionTask = nil
    }
}

extension SpeechRecognizer: SFSpeechRecognizerDelegate {
}
Post not yet marked as solved
I’m building a Playground Book for the upcoming Swift Student Challenge and am confused about the rules on fetching data from the internet in my Playground Book.
I have implemented the Speech framework in one chapter and am forcing recognition to be done offline. The problem is that this is only possible on iPads with an A12 Bionic chip or newer, so older iPads (and even my Mac) show an error when Wi-Fi is turned off.
So I’m concerned about whether my submission will be accepted in such a case.
Post not yet marked as solved
It seems that voices with the same identifier behave differently on different OS versions and devices.
How can I distinguish voices across OS versions and devices?
Is it safe to use a combination of voice identifier and OS version? Or is there a voice version code or something better for distinguishing voices that share an identifier?
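One pragmatic approach, an assumption rather than an official versioning scheme, is to key on the voice identifier plus the OS version:

```swift
import AVFoundation

// Sketch: build a per-OS key for a voice, since identical identifiers
// may sound different across OS versions. The key format is ours, not Apple's.
func voiceKey(for voice: AVSpeechSynthesisVoice) -> String {
    let v = ProcessInfo.processInfo.operatingSystemVersion
    return "\(voice.identifier)@\(v.majorVersion).\(v.minorVersion).\(v.patchVersion)"
}
```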
Post not yet marked as solved
Apple added support for WebKit speech recognition in Safari 14.1. We're trying to use it in our web app and are facing some issues: the mic never stops after the user stops speaking, and we never get the recognized text on iPhone and iPad.
Here is a simple WebApp to test : https://oiyw7.csb.app/
Post not yet marked as solved
I'm building a game where the player can speak commands, so I want to enable speech-to-text capability. I've set up the required Info.plist property (for speech recognition privacy) as well as the App Sandbox hardware setting (for audio input). I've confirmed that the application is listening via the audio tap and sending audio buffers to the recognition request. However, the recognition task never executes.
NOTE: This is for macOS, NOT iOS. Also, it works when I have this in a Playground, but when I try to do this in an actual application, the recognition task isn't called.
Specs:
macOS: 12.1
Xcode: 13.2.1 (13C100)
Swift: 5.5.2
Here is the code that I've placed in the AppDelegate of a freshly built SpriteKit application:
//
//  AppDelegate.swift
//

import Cocoa
import AVFoundation
import Speech

@main
class AppDelegate: NSObject, NSApplicationDelegate {
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        SFSpeechRecognizer.requestAuthorization(requestMicrophoneAccess)
    }

    func applicationWillTerminate(_ aNotification: Notification) {
        // Insert code here to tear down your application
    }

    func applicationShouldTerminateAfterLastWindowClosed(_ sender: NSApplication) -> Bool {
        return true
    }

    fileprivate func requestMicrophoneAccess(authStatus: SFSpeechRecognizerAuthorizationStatus) {
        OperationQueue.main.addOperation {
            switch authStatus {
            case .authorized:
                self.speechRecognizer.supportsOnDeviceRecognition = true
                if let speechRecognizer = SFSpeechRecognizer() {
                    if speechRecognizer.isAvailable {
                        do {
                            try self.startListening()
                        } catch {
                            print(">>> ERROR >>> Listening Error: \(error)")
                        }
                    }
                }
            case .denied:
                print("Denied")
            case .restricted:
                print("Restricted")
            case .notDetermined:
                print("Undetermined")
            default:
                print("Unknown")
            }
        }
    }

    func startListening() throws {
        // Cancel the previous task if it's running.
        recognitionTask?.cancel()
        recognitionTask = nil

        let inputNode = audioEngine.inputNode
        // Configure the microphone input.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            /**********
             * Confirmed that the following line is executing continuously
             **********/
            self.recognitionRequest?.append(buffer)
        }

        startRecognizing()
        audioEngine.prepare()
        try audioEngine.start()
    }

    func startRecognizing() {
        // Create a recognition task for the speech recognition session.
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let recognitionRequestInternal = recognitionRequest else {
            fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object")
        }
        recognitionRequestInternal.shouldReportPartialResults = true
        recognitionRequestInternal.requiresOnDeviceRecognition = true

        /**************
         * Confirmed that the following line is executed,
         * however the function given to 'recognitionTask' is never called
         **************/
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequestInternal) { result, error in
            var isFinal = false
            if let result = result {
                let firstTranscriptionTimestamp = result.transcriptions.first?.segments.first?.timestamp ?? TimeInterval.zero
                isFinal = result.isFinal || (firstTranscriptionTimestamp != 0)
            }
            if let error = error {
                // Stop recognizing speech if there is a problem.
                print("\n>>> ERROR >>> Recognition Error: \(error)")
                self.audioEngine.stop()
                self.audioEngine.inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            } else if isFinal {
                self.recognitionTask = nil
            }
        }
    }
}
Post not yet marked as solved
Hi There!
A number of my apps are hanging on Big Sur (11.6.2). A spindump of two of them shows them waiting for NSSpeechSynthesizer to return from CountVoices. I looked in /System/Library/Speech/Voices and it was empty. So I went to System Preferences to see what I could see. Trying to look up voices in Accessibility caused it to hang, as did trying to enter the Siri control panel.
So clearly I have a problem with speech synthesis. Why this should hang Chrome, bleep only knows. I just restored the system, so I'm not keen on doing it again knowing it won't work. What I want to know is:
Can I install just the speech synthesis part of macOS? Where would I find it, and how? Is it a kernel extension? I think when I first installed this OS I skipped Siri, something I never planned to use. Is this what caused the problem? Thanks!