Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

70 results found
Post not yet marked as solved
86 Views

IPA notation AVSpeechSynthesizer is not working?

I am attempting to use an alternative pronunciation via the IPA notation attribute for AVSpeechSynthesizer on macOS (Big Sur 11.4). The attributed string is being ignored, so the feature does not work; on the iOS Simulator it works properly. The India English voice pronounces the word "shame" as shy-em, so I applied the correct pronunciation, but no change was heard. I then substituted the pronunciation of a completely different word, but again there was no change. Is there something else that must be done to make AVSpeechSynthesisIPANotationAttribute work?

Console output (each substitution is printed once per voice):

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "\U0283\U02c8e\U0361\U026am"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: ʃˈe͡ɪm

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "\U0283\U02c8e\U0361\U026am"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: ʃˈe͡ɪm

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "t\U0259.\U02c8me\U0361\U026a.do\U0361\U028a"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: tə.ˈme͡ɪ.do͡ʊ

Attributed String: It's a '{ }shame{ AVSpeechSynthesisIPANotationAttribute = "t\U0259.\U02c8me\U0361\U026a.do\U0361\U028a"; }' it didn't work out.{ }
Target Range: {8, 5}
Target String: shame, Substitution: tə.ˈme͡ɪ.do͡ʊ

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    func speakIPA_Substitution(subst: String, voice: AVSpeechSynthesisVoice) {
        let text = "It's a 'shame' it didn't work out."
        let mutAttrStr = NSMutableAttributedString(string: text)
        let range = NSString(string: text).range(of: "shame")
        let pronounceKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)
        mutAttrStr.setAttributes([pronounceKey: subst], range: range)
        let utterance = AVSpeechUtterance(attributedString: mutAttrStr)
        utterance.voice = voice
        utterance.postUtteranceDelay = 1.0
        let swiftRange = Range(range, in: text)!
        print("Attributed String: \(mutAttrStr)")
        print("Target Range: \(range)")
        print("Target String: \(text[swiftRange]), Substitution: \(subst)\n")
        synth.speak(utterance)
    }

    func customPronunciation() {
        let shame = "ʃˈe͡ɪm"          // substitute correct pronunciation
        let tomato = "tə.ˈme͡ɪ.do͡ʊ"  // completely different word pronunciation
        let britishVoice = AVSpeechSynthesisVoice(language: "en-GB")!
        let indiaVoice = AVSpeechSynthesisVoice(language: "en-IN")!
        speakIPA_Substitution(subst: shame, voice: britishVoice)  // already correct, no substitution needed
        // pronounced incorrectly, ignoring the corrected pronunciation from IPA notation
        speakIPA_Substitution(subst: shame, voice: indiaVoice)    // ignores substitution
        speakIPA_Substitution(subst: tomato, voice: britishVoice) // ignores substitution
        speakIPA_Substitution(subst: tomato, voice: indiaVoice)   // ignores substitution
    }
}
Asked by MisterE.
Post not yet marked as solved
84 Views

AVSpeechSynthesizer buffer conversion, write format bug?

Is the format description AVSpeechSynthesizer gives for the speech buffer correct? When I attempt to convert it, I get back noise from two different conversion methods. I am trying to convert the speech buffer provided by AVSpeechSynthesizer's "func write(_ utterance: AVSpeechUtterance..." method. The goal is to convert the sample type, change the sample rate, and change from a mono to a stereo buffer. I later manipulate the buffer data and pass it through AVAudioEngine. For testing purposes, I have kept the sample rate at the original 22050.0.

What have I tried? I have a method named "resampleBuffer" that I've been using for years to do this. When I apply it to the speech buffer, I get back noise. When I attempt to convert the format and channel count manually with "convertSpeechBufferToFloatStereo", I get back clipped output. I tried flipping the samples to address the big-endian signed-integer layout, but that didn't work. The speech buffer description is:

inBuffer description: <AVAudioFormat 0x6000012862b0: 1 ch, 22050 Hz, 'lpcm' (0x0000000E) 32-bit big-endian signed integer>

import Cocoa
import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
    }

    func resampleBuffer(inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer? {
        // resample and convert mono to stereo
        var error: NSError?
        let kChannelStereo = AVAudioChannelCount(2)
        let convertRate = newSampleRate / inSource.format.sampleRate
        let outFrameCount = AVAudioFrameCount(Double(inSource.frameLength) * convertRate)
        let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: kChannelStereo)!
        let avConverter = AVAudioConverter(from: inSource.format, to: outFormat)
        let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: outFrameCount)!
        let inputBlock: AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
            outStatus.pointee = AVAudioConverterInputStatus.haveData // very important, must have
            let audioBuffer: AVAudioBuffer = inSource
            return audioBuffer
        }
        avConverter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Mastering
        avConverter?.sampleRateConverterQuality = .max
        if let converter = avConverter {
            let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
            // print("\(status): \(status.rawValue)")
            if ((status != .haveData) || (error != nil)) {
                print("\(status): \(status.rawValue), error: \(String(describing: error))")
                return nil // conversion error
            }
        } else {
            return nil // converter not created
        }
        // print("success!")
        return outBuffer
    }

    func writeToFile(_ stringToSpeak: String, speaker: String) {
        var output: AVAudioFile?
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        let desktop = "~/Desktop"
        let fileName = "Utterance_Test.caf" // not in sandbox
        var tempPath = desktop + "/" + fileName
        tempPath = (tempPath as NSString).expandingTildeInPath
        let usingSampleRate = 22050.0 // 44100.0
        let outSettings = [
            AVFormatIDKey: kAudioFormatLinearPCM, // kAudioFormatAppleLossless
            AVSampleRateKey: usingSampleRate,
            AVNumberOfChannelsKey: 2,
            AVEncoderAudioQualityKey: AVAudioQuality.max.rawValue
        ] as [String: Any]
        // temporarily ignore the speaker and use the default voice
        let curLangCode = AVSpeechSynthesisVoice.currentLanguageCode()
        utterance.voice = AVSpeechSynthesisVoice(language: curLangCode)
        // utterance.volume = 1.0
        print("Int32.max: \(Int32.max), Int32.min: \(Int32.min)")
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if (pcmBuffer.frameLength == 0) {
                // done
            } else {
                // append buffer to file
                var outBuffer: AVAudioPCMBuffer
                outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
                // outBuffer = self.convertSpeechBufferToFloatStereo(pcmBuffer) // doesn't work
                // outBuffer = pcmBuffer // original format does work
                if (output == nil) {
                    // var bufferSettings = utterance.voice?.audioFileSettings
                    // Audio files cannot be non-interleaved.
                    var outSettings = outBuffer.format.settings
                    outSettings["AVLinearPCMIsNonInterleaved"] = false
                    let inFormat = pcmBuffer.format
                    print("inBuffer description: \(inFormat.description)")
                    print("inBuffer settings: \(inFormat.settings)")
                    print("inBuffer format: \(inFormat.formatDescription)")
                    print("outBuffer settings: \(outSettings)\n")
                    print("outBuffer format: \(outBuffer.format.formatDescription)")
                    output = try! AVAudioFile(forWriting: URL(fileURLWithPath: tempPath), settings: outSettings)
                }
                try! output?.write(from: outBuffer)
                print("done")
            }
        }
    }
}

class ViewController: NSViewController {
    let speechDelivery = SpeakerTest()

    override func viewDidLoad() {
        super.viewDidLoad()
        let targetSpeaker = "Allison"
        var sentenceToSpeak = ""
        for indx in 1...10 {
            sentenceToSpeak += "This is sentence number \(indx). [[slnc 3000]] \n"
        }
        speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
    }
}

Three tests can be performed. The only one that works is writing the buffer directly to disk. Is this really "32-bit big-endian signed integer"? Am I addressing this correctly, or is this a bug? I'm on macOS 11.4.
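A sketch of the conversion step the post is attempting (this is an assumption about the fix, not the poster's convertSpeechBufferToFloatStereo): if the bytes really are 32-bit big-endian signed-integer PCM as the format description claims, each sample must be byte-swapped into host order before scaling to Float32, otherwise the result is noise.

```swift
import Foundation

// Hypothetical helper: convert raw 32-bit big-endian signed-integer samples
// (as reported by the buffer's format description) into host-endian Float32
// in the range [-1.0, 1.0]. If the format description is wrong (the bug being
// asked about), this would still produce noise.
func bigEndianInt32ToFloat(_ rawSamples: [Int32]) -> [Float] {
    rawSamples.map { storedSample in
        // Reinterpret the stored big-endian bit pattern in host byte order...
        let hostSample = Int32(bigEndian: storedSample)
        // ...then scale the full integer range down to [-1.0, 1.0].
        return Float(hostSample) / Float(Int32.max)
    }
}
```

For mono-to-stereo, duplicating each converted sample into both channels of a standard-format (Float32, deinterleaved) AVAudioPCMBuffer would complete the conversion before handing the buffer to AVAudioConverter for resampling.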
Asked by MisterE.
Post not yet marked as solved
1.9k Views

Error Domain = kAFAssistantErrorDomain Code = 1700 "(null)"

My app supports Bluetooth, and I need to use it in background mode. I've already checked 'Uses Bluetooth LE accessories', so that part works in the background. I want to record, so I need speech recognition via the Speech framework. But I found that it only works in the foreground. Once my app goes to the background, it immediately shows this error:

Error Domain=kAFAssistantErrorDomain Code=1700 "(null)"

Is there anything I have missed, or is it a limitation, i.e. iOS doesn't support the Speech framework in background mode?
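For context, keeping any microphone-backed audio session alive in the background normally requires the audio background mode in addition to the Bluetooth one. The fragment below is a hypothetical Info.plist sketch, not a confirmed fix: whether server-based speech recognition itself is permitted in the background is exactly the open question here; this only addresses the audio-session prerequisite.

```xml
<!-- Hypothetical Info.plist fragment (assumption, not a documented fix for
     error 1700): the "audio" entry is the standard prerequisite for keeping
     a recording AVAudioSession active while the app is in the background. -->
<key>UIBackgroundModes</key>
<array>
    <string>audio</string>
    <string>bluetooth-central</string>
</array>
```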
Asked by jtain.
Post not yet marked as solved
51 Views

Is there an IPA notation guide for AVSpeechSynthesizer?

I am on macOS and would like to know how to create accurate or alternate pronunciations using AVSpeechSynthesizer. Is there a guide or document that lists the Unicode symbols that are used or accepted for the IPA notation? The only method I've found to create or obtain pronunciations is through an iPhone. References: AVSpeechSynthesisIPANotationAttribute https://developer.apple.com/videos/play/wwdc2018/236/?time=424 https://a11y-guidelines.orange.com/en/mobile/ios/wwdc/2018/236/ https://developer.apple.com/documentation/avfaudio/avspeechsynthesisipanotationattribute
Asked by MisterE.
Post not yet marked as solved
41 Views

Does Apple's Speech framework support speech recognition to work in the background of the App?

I wrote a class for speech recognition with the Speech framework.

// Cancel the previous task if it's running.
if (self.recognitionTask) {
    //[self.recognitionTask cancel]; // Will cause the system error and memory problems.
    [self.recognitionTask finish];
}
self.recognitionTask = nil;

// Configure the audio session for the app.
NSError *error = nil;
[AVAudioSession.sharedInstance setCategory:AVAudioSessionCategoryRecord
                               withOptions:AVAudioSessionCategoryOptionDuckOthers
                                     error:&error];
if (error) {
    [self stopWithError:error];
    return;
}
[AVAudioSession.sharedInstance setActive:YES
                             withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation
                                   error:&error];
if (error) {
    [self stopWithError:error];
    return;
}

// Create and configure the speech recognition request.
self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
self.recognitionRequest.taskHint = SFSpeechRecognitionTaskHintConfirmation;

// Keep speech recognition data on device
if (@available(iOS 13, *)) {
    self.recognitionRequest.requiresOnDeviceRecognition = NO;
}

// Create a recognition task for the speech recognition session.
// Keep a reference to the task so that it can be canceled.
__weak typeof(self) weakSelf = self;
self.recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest
                                                           resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
    // ...
    if (error != nil || result.final) {
    }
}];

I want to know if the Speech framework supports background tasks. If it does, how do I modify the iOS code?
Asked by Mengxiang.
Post not yet marked as solved
100 Views

Is the default voice using AVSpeechSynthesizer incorrect?

My understanding from the documentation is that an utterance will use the default voice for the current user locale, but that does not appear to be the case, or I am doing something wrong. Is this the correct way to obtain the default system voice with AVSpeechSynthesizer, or is the returned value incorrect? If it matters, I am on Big Sur 11.4, and I am not getting the correct default voice. What I get back is, coincidentally, the last voice in my Accessibility voice list. The default voice on my machine is currently "Kate". Using NSSpeechSynthesizer.defaultVoice, I get "Kate" as the default voice. Using AVSpeechSynthesisVoice, the default voice returned is "Albert", which is incorrect. My language code is en-US.

let userCode = AVSpeechSynthesisVoice.currentLanguageCode()
let usedVoice = AVSpeechSynthesisVoice(language: userCode) // should be the default voice
let voice = NSSpeechSynthesizer.defaultVoice
print("userCode: \(userCode)")
print("NSSpeechSynthesizer: \(voice)")
print("AVSpeechSynthesisVoice: \(usedVoice)")

Result:

userCode: en-US
NSSpeechSynthesizer: NSSpeechSynthesizerVoiceName(_rawValue: com.apple.speech.synthesis.voice.kate.premium) <--- this is the correct system default
AVSpeechSynthesisVoice: Optional([AVSpeechSynthesisVoice 0x6000000051a0] Language: en-US, Name: Albert, Quality: Enhanced [com.apple.speech.synthesis.voice.Albert])
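One hedged workaround sketch (an assumption, not a documented API contract): since NSSpeechSynthesizer.defaultVoice does return the identifier the user actually selected, that identifier can be looked up among AVSpeechSynthesisVoice's voices instead of trusting AVSpeechSynthesisVoice(language:). This relies on the two APIs sharing identifier namespaces, which may not hold for every voice.

```swift
import AppKit
import AVFoundation

// Sketch: resolve the user's chosen system voice into an AVSpeechSynthesisVoice
// by matching identifiers, falling back to the locale-based lookup.
func systemDefaultAVVoice() -> AVSpeechSynthesisVoice? {
    // The identifier NSSpeechSynthesizer reports as the user's default voice.
    let defaultID = NSSpeechSynthesizer.defaultVoice.rawValue
    // Try an exact identifier match among the AVSpeechSynthesisVoice list first.
    if let match = AVSpeechSynthesisVoice.speechVoices()
        .first(where: { $0.identifier == defaultID }) {
        return match
    }
    // Fall back to the (possibly incorrect) locale-based default.
    return AVSpeechSynthesisVoice(language: AVSpeechSynthesisVoice.currentLanguageCode())
}
```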
Asked by MisterE.
Post not yet marked as solved
85 Views

How to wait for the AVSpeechSynthesizer write method inline

How is it possible to wait inline for speech written to a buffer to complete before proceeding? I have a function that writes speech to a buffer, then resamples and manipulates the output for inclusion in an AVAudioEngine workflow, where speech is produced faster than real time.

func createSpeechToBuffer(stringToSpeak: String, sampleRate: Float) -> AVAudioPCMBuffer? {
    var outBuffer: AVAudioPCMBuffer? = nil
    let utterance = AVSpeechUtterance(string: stringToSpeak)
    var speechIsBusy = true
    utterance.voice = AVSpeechSynthesisVoice(language: "en-us")
    _speechSynth.write(utterance) { (buffer: AVAudioBuffer) in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
            fatalError("unknown buffer type: \(buffer)")
        }
        if (pcmBuffer.frameLength == 0) {
            print("buffer is empty")
        } else {
            print("buffer has content \(buffer)")
        }
        outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: sampleRate)
        speechIsBusy = false
    }
    // wait for completion of func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance)
    while (_speechSynth.isSpeaking) { /* arbitrary task waiting for write to complete */ }
    while (speechIsBusy) { /* arbitrary task waiting for write to complete */ }
    return outBuffer
}

After I wrote the method and it failed to produce the desired output inline, I realized that it returns before getting the results of the resampling. The callback is escaping, so the initial AVAudioBuffer from the callback arrives after createSpeechToBuffer has completed. The resampling does work; however, I currently must save the result and continue only after being notified by the delegate's "didFinish utterance".

func write(_ utterance: AVSpeechUtterance, toBufferCallback bufferCallback: @escaping AVSpeechSynthesizer.BufferCallback)

Attempts at waiting on _speechSynth.isSpeaking or the speechIsBusy flag are not working, and a dispatch queue or semaphore blocks the write method from completing. How is it possible to wait for the result inline, rather than recreating the workflow around the delegate's "didFinish utterance"? On macOS 11.4 (Big Sur).
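A sketch of one way to get inline waiting, under the assumption (suggested by the symptoms in the post) that the buffer callbacks are delivered on the main run loop: blocking the main thread with a semaphore then deadlocks, but waiting on a background thread leaves the main loop free to deliver the buffers. The function below is illustrative, not Apple API; the empty-buffer end marker follows the convention used in the posts above.

```swift
import AVFoundation

// Hypothetical blocking wrapper around AVSpeechSynthesizer.write(_:toBufferCallback:).
// Must be called OFF the main thread, or the callbacks never run and we deadlock.
func createSpeechToBufferBlocking(_ text: String,
                                  synth: AVSpeechSynthesizer) -> [AVAudioPCMBuffer] {
    precondition(!Thread.isMainThread,
                 "call from a background thread so the main run loop stays free")
    var buffers: [AVAudioPCMBuffer] = []
    let done = DispatchSemaphore(value: 0)
    DispatchQueue.main.async {
        let utterance = AVSpeechUtterance(string: text)
        synth.write(utterance) { buffer in
            guard let pcm = buffer as? AVAudioPCMBuffer else { return }
            if pcm.frameLength == 0 {
                done.signal() // empty buffer marks the end of the utterance
            } else {
                buffers.append(pcm)
            }
        }
    }
    done.wait() // safe here: we are not on the main thread
    return buffers
}
```

On Swift 5.5+, wrapping the same callback in `withCheckedContinuation` inside an `async` function would achieve the inline style without any busy-waiting.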
Asked by MisterE.
Post marked as solved
125 Views

How to make AVSpeechSynthesizer work for write and delegate (Big Sur)

I am unable to get AVSpeechSynthesizer to write or to acknowledge the delegate actions. I was informed this was resolved in macOS 11. I thought it was a lot to ask, but I am now running macOS 11.4 (Big Sur). My goal is to output speech faster than real time and drive the output through AVAudioEngine. First, I need to know why the write doesn't occur and why the delegates are never called, whether I am using write or simply uttering to the default speakers in "func speak(_ string: String)". What am I missing? Is there a workaround? Reference: https://developer.apple.com/forums/thread/678287

let sentenceToSpeak = "This should write to buffer and also call 'didFinish' and 'willSpeakRangeOfSpeechString' delegates."
SpeakerTest().writeToBuffer(sentenceToSpeak)
SpeakerTest().speak(sentenceToSpeak)

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Utterance didFinish")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        print("speaking range: \(characterRange)")
    }

    func speak(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        var usedVoice = AVSpeechSynthesisVoice(language: "en") // should be the default voice
        let voices = AVSpeechSynthesisVoice.speechVoices()
        let targetVoice = "Allison"
        for voice in voices {
            // print("\(voice.identifier) \(voice.name) \(voice.quality) \(voice.language)")
            if (voice.name.lowercased() == targetVoice.lowercased()) {
                usedVoice = AVSpeechSynthesisVoice(identifier: voice.identifier)
                break
            }
        }
        utterance.voice = usedVoice
        print("utterance.voice: \(utterance.voice)")
        synth.speak(utterance)
    }

    func writeToBuffer(_ string: String) {
        print("entering writeToBuffer")
        let utterance = AVSpeechUtterance(string: string)
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            print("executing synth.write")
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                print("buffer is empty")
            } else {
                print("buffer has content \(buffer)")
            }
        }
    }
}
Asked by MisterE.
Post not yet marked as solved
112 Views

Local Speech Recognition on watchOS

How do I implement local speech recognition exclusively on watchOS? I think there was mention of local ASR this year? The use case is a visually impaired person requesting information by raising their wrist and speaking a small number of words for a variety of intents. Because they would be walking with a white cane, the UI must be hands-free. This works for sighted folks as well. The intents are too diverse for creating individual Siri shortcuts. Ideally, the watch app would call something akin to SFSpeechAudioBufferRecognitionRequest. Is this available? Where can I find the documentation?
Asked by Muse.
Post not yet marked as solved
184 Views

AVSpeechUtterance with Dutch AVSpeechSynthesisVoice glitches on some words

Posting this here for visibility. Already opened a bug report, but maybe this helps other developers. In our app we use AVSpeechSynthesizer to speak navigation directions. Our Dutch users notice that many of the utterances glitch, where it seems to speak a random '-' in-between the text. This is reproducible by simply speaking this text with a Dutch AVSpeechSynthesisVoice: "Over 300 meter houd rechts aan richting Breda." (which means "In 300 meters, keep right towards Breda."). It glitches on the word 'aan'. Reproducible only on-device, as the Xcode simulator doesn't seem to have this issue. Tested on iOS 14.4 and 14.6; both have the issue. The issue is very obvious to hear. Texts that also have this issue: "Over 900 meter houd rechts aan en blijf op Muntweg." "Houd rechts aan." Reproducible on-device with the following code:

// This is Dutch for "In 300 meters, keep right towards Breda."
let reproducableSpeakText = "Over 300 meter houd rechts aan richting Breda."
let speechUtterance = AVSpeechUtterance(string: reproducableSpeakText)

// Configure a Dutch voice
let dutchVoice = AVSpeechSynthesisVoice(language: "nl-NL")
speechUtterance.voice = dutchVoice

// Speak the text
synthesizer.speak(speechUtterance)
Post not yet marked as solved
144 Views

Allow access to the user microphone in an active phone call (not the other person on the line, only local)

For my app I want to build a SharePlay experience that uses the Speech framework. During a FaceTime call I want to perform speech-to-text on the user's device. I tried to get this to work, but as soon as I'm on an active phone call and I try to configure the AVAudioSession, it doesn't work.

try AVAudioSession.sharedInstance().setCategory(.playAndRecord, options: .
try AVAudioSession.sharedInstance().setActive(true, options: [])

I get the error: The operation couldn't be completed. (OSStatus error 561017449.) Which basically means: "The app was not allowed to set the audio category because another app (Phone, etc.) is controlling it." It would be great if there were a way to get access to only the user's microphone (not even the other person on the line), so that it could be implemented in a privacy-first way. If there is a way to achieve this, please let me know. Thanks, Jordi
Asked by jordi.
Post not yet marked as solved
92 Views

Speech API in Safari still doesn't work properly

When we use the Speech API on our page, isFinal still returns false and the final result is never sent.
Asked by vAhy.
Post not yet marked as solved
212 Views

AVSpeechSynthesizer Pitch Bug on Mac Catalyst

When you adjust the pitch to something greater than the default, the voice becomes gradually more distorted. The only way to fix it is to create a new instance of AVSpeechSynthesizer, which ultimately leads to a crash due to a leak in system code.
Post marked as solved
145 Views

How do you archive mixed objects that conforms to NSSecureCoding for later retrieval?

How do you archive mixed objects that conform to NSSecureCoding (SFTranscription) for later retrieval? I am using SFSpeechRecognizer and attempting to save the results of the transcription for later analysis and processing. My issue isn't specifically with speech, but rather with archiving. Unfortunately, I haven't archived before, and after Googling, I have encountered challenges.

struct TranscriptionResults: Codable {
    var currTime : Double     // Running start time from beginning of file
    var currSegStart : Double // start from beginning of segment
    var currSegSecs : Double  // segment length in seconds
    var currSegEnd : Double   // end = currStart + segmentSecs; calculated, no need to save
    var elapsedTime : Double  // how much time to process to this point
    var fileName : String
    var fileURL : URL
    var fileLength : Int64
    var transcription : SFTranscription //* does not conform to Codable **
}

Type 'TranscriptionResults' does not conform to protocol 'Decodable'
Type 'TranscriptionResults' does not conform to protocol 'Encodable'

When I add the "var transcription : SFTranscription" property, I get the above errors. I looked it up, and SFTranscription is declared as:

open class SFTranscription : NSObject, NSCopying, NSSecureCoding {...}

My issue is with conforming to Codable, as it does not look like you can mix it with NSSecureCoding. I don't think my issue is specifically with SFTranscription, but with understanding how to save results that include a mix of NSSecureCoding objects to disk. How do you save the result for later retrieval?
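One common pattern (a sketch, with most of the post's fields omitted for brevity): keep the struct Codable, store the NSSecureCoding object as Data produced by NSKeyedArchiver, and unwrap it on demand. Both archiver calls below are real Foundation API; the struct shape is illustrative.

```swift
import Foundation
import Speech

// Sketch: the non-Codable SFTranscription is archived to Data, which IS
// Codable, so the whole struct can go through JSONEncoder/PropertyListEncoder.
struct TranscriptionResults: Codable {
    var currTime: Double
    var fileName: String
    var transcriptionData: Data // archived SFTranscription

    // Unarchive on demand; returns nil if the data is corrupt.
    var transcription: SFTranscription? {
        try? NSKeyedUnarchiver.unarchivedObject(ofClass: SFTranscription.self,
                                                from: transcriptionData)
    }

    init(currTime: Double, fileName: String, transcription: SFTranscription) throws {
        self.currTime = currTime
        self.fileName = fileName
        // Secure coding is required here, which SFTranscription supports.
        self.transcriptionData = try NSKeyedArchiver.archivedData(
            withRootObject: transcription, requiringSecureCoding: true)
    }
}
```

Encoding the struct with JSONEncoder and writing the resulting Data to disk then covers the "save for later retrieval" part.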
Asked by MisterE.
Post not yet marked as solved
1.6k Views

Speech to text API per-app limits

Hello to all kind developers out there. I'm currently developing an audio-messaging app for users to send short audio clips to each other. To make the communication experience as smooth and natural as possible, we are using the Speech framework to transcribe user input live. Since this feature is in high demand among some of our users, we are worried about unexpected quotas and limits. (1) We know that individual recordings should be less than one minute. (2) The answer here says about 1000 requests/hour per device: https://developer.apple.com/library/archive/qa/qa1951/_index.html#//apple_ref/doc/uid/DTS40017662 (3) The documentation says: "Individual devices may be limited in the number of recognitions that can be performed per day and an individual app may be throttled globally, based on the number of requests it makes per day". We are well under the limits for (1) and (2), but there is no specific documentation on per-app limits, and being "throttled globally" sounds scary. Can anyone give us information about per-app limits or any other kinds of limits that might potentially put an end to our lives? Thank you
Asked by fkymy.