Recognize spoken words in recorded or live audio using Speech.

Speech Documentation

Posts under Speech tag

51 Posts
Post not yet marked as solved
1 Reply
370 Views
Hello, I am looking for information on TTS and STT. I am aware that both can be implemented offline and online. I would like to know whether it is possible to enable on-device TTS and STT in a third-party app even when the device is online. Our use case is: while the app is online, we want to do TTS and STT on the device and not on Apple's servers (privacy concerns). Please let me know whether this is possible at all, or point me in the right direction. I appreciate it and look forward to your reply.
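For the STT half of the question above, a minimal sketch of forcing on-device recognition even while the device is online. The `supportsOnDeviceRecognition` and `requiresOnDeviceRecognition` properties are real Speech-framework API (iOS 13+), but whether a local model exists depends on the locale and device; the function name and locale default are illustrative. (For TTS, AVSpeechSynthesizer already synthesizes built-in voices on device.)

```swift
import Speech

// Sketch: build a recognition request that is guaranteed to stay on device,
// or return nil when the locale has no local model.
func makeOnDeviceRequest(for url: URL,
                         locale: Locale = Locale(identifier: "en-US")) -> SFSpeechURLRecognitionRequest? {
    guard let recognizer = SFSpeechRecognizer(locale: locale),
          recognizer.supportsOnDeviceRecognition else {
        return nil  // no on-device model for this locale/device
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = true  // audio is never sent to Apple servers
    return request
}
```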
Posted Last updated
.
Post not yet marked as solved
5 Replies
1k Views
Is the format description of the AVSpeechSynthesizer speech buffer correct? When I attempt to convert it, I get back noise from two different conversion methods. I am trying to convert the speech buffer provided by the AVSpeechSynthesizer "func write(_ utterance: AVSpeechUtterance..." method. The goal is to convert the sample type, change the sample rate, and change from a mono to a stereo buffer. I later manipulate the buffer data and pass it through AVAudioEngine. For testing purposes, I have kept the sample rate at the original 22050.0.

What have I tried? I have a method named "resampleBuffer" that I've been using for years to do this. When I apply it to the speech buffer, I get back noise. When I attempt to manually convert the format and channel count with "convertSpeechBufferToFloatStereo", I get back clipped output. I tested flipping the samples to address the big-endian signed-integer layout, but that didn't work. The speech buffer description is:

inBuffer description: <AVAudioFormat 0x6000012862b0: 1 ch, 22050 Hz, 'lpcm' (0x0000000E) 32-bit big-endian signed integer>

import Cocoa
import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
    }

    func resampleBuffer(inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer? {
        // resample and convert mono to stereo
        var error: NSError?
        let kChannelStereo = AVAudioChannelCount(2)
        let convertRate = newSampleRate / inSource.format.sampleRate
        let outFrameCount = AVAudioFrameCount(Double(inSource.frameLength) * convertRate)
        let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: kChannelStereo)!
        let avConverter = AVAudioConverter(from: inSource.format, to: outFormat)
        let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: outFrameCount)!
        let inputBlock: AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
            outStatus.pointee = AVAudioConverterInputStatus.haveData // very important, must have
            let audioBuffer: AVAudioBuffer = inSource
            return audioBuffer
        }
        avConverter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Mastering
        avConverter?.sampleRateConverterQuality = .max
        if let converter = avConverter {
            let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
            // print("\(status): \(status.rawValue)")
            if ((status != .haveData) || (error != nil)) {
                print("\(status): \(status.rawValue), error: \(String(describing: error))")
                return nil // conversion error
            }
        } else {
            return nil // converter not created
        }
        // print("success!")
        return outBuffer
    }

    func writeToFile(_ stringToSpeak: String, speaker: String) {
        var output: AVAudioFile?
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        let desktop = "~/Desktop"
        let fileName = "Utterance_Test.caf" // not in sandbox
        var tempPath = desktop + "/" + fileName
        tempPath = (tempPath as NSString).expandingTildeInPath
        let usingSampleRate = 22050.0 // 44100.0
        let outSettings = [
            AVFormatIDKey: kAudioFormatLinearPCM, // kAudioFormatAppleLossless
            AVSampleRateKey: usingSampleRate,
            AVNumberOfChannelsKey: 2,
            AVEncoderAudioQualityKey: AVAudioQuality.max.rawValue
        ] as [String: Any]
        // temporarily ignore the speaker and use the default voice
        let curLangCode = AVSpeechSynthesisVoice.currentLanguageCode()
        utterance.voice = AVSpeechSynthesisVoice(language: curLangCode)
        // utterance.volume = 1.0
        print("Int32.max: \(Int32.max), Int32.min: \(Int32.min)")
        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if (pcmBuffer.frameLength == 0) {
                // done
            } else {
                // append buffer to file
                var outBuffer: AVAudioPCMBuffer
                outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
                // outBuffer = self.convertSpeechBufferToFloatStereo(pcmBuffer) // doesn't work
                // outBuffer = pcmBuffer // original format does work
                if (output == nil) {
                    // var bufferSettings = utterance.voice?.audioFileSettings
                    // Audio files cannot be non-interleaved.
                    var outSettings = outBuffer.format.settings
                    outSettings["AVLinearPCMIsNonInterleaved"] = false
                    let inFormat = pcmBuffer.format
                    print("inBuffer description: \(inFormat.description)")
                    print("inBuffer settings: \(inFormat.settings)")
                    print("inBuffer format: \(inFormat.formatDescription)")
                    print("outBuffer settings: \(outSettings)\n")
                    print("outBuffer format: \(outBuffer.format.formatDescription)")
                    output = try! AVAudioFile(forWriting: URL(fileURLWithPath: tempPath), settings: outSettings)
                }
                try! output?.write(from: outBuffer)
                print("done")
            }
        }
    }
}

class ViewController: NSViewController {
    let speechDelivery = SpeakerTest()

    override func viewDidLoad() {
        super.viewDidLoad()
        let targetSpeaker = "Allison"
        var sentenceToSpeak = ""
        for indx in 1...10 {
            sentenceToSpeak += "This is sentence number \(indx). [[slnc 3000]] \n"
        }
        speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
    }
}

Three tests can be performed. The only one that works is to write the buffer directly to disk. Is this really "32-bit big-endian signed integer"? Am I addressing this correctly, or is this a bug? I'm on macOS 11.4.
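One possible explanation for the noise in the post above: the `inputBlock` always answers `.haveData` and returns the same source buffer, so whenever the converter needs more input than one call provides (which happens as soon as the sample rate changes), it re-reads the same samples. A sketch that feeds the buffer exactly once, then reports `.noDataNow`; the function name is illustrative and this is an assumption about the failure, not a confirmed diagnosis.

```swift
import AVFoundation

// Sketch: resample one AVAudioPCMBuffer, delivering the source to the
// converter exactly once so it cannot loop over stale input.
func resampleOnce(_ source: AVAudioPCMBuffer, to format: AVAudioFormat) -> AVAudioPCMBuffer? {
    guard let converter = AVAudioConverter(from: source.format, to: format) else { return nil }
    let ratio = format.sampleRate / source.format.sampleRate
    let capacity = AVAudioFrameCount(Double(source.frameLength) * ratio) + 1
    guard let out = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: capacity) else { return nil }
    var delivered = false
    var error: NSError?
    let status = converter.convert(to: out, error: &error) { _, outStatus in
        if delivered {
            outStatus.pointee = .noDataNow  // buffer already consumed
            return nil
        }
        delivered = true
        outStatus.pointee = .haveData
        return source
    }
    guard error == nil, status == .haveData || status == .inputRanDry else { return nil }
    return out
}
```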
Posted
by MisterE.
Last updated
.
Post not yet marked as solved
0 Replies
322 Views
After upgrading to iOS 15, I discovered that stopping speech via AVSpeechSynthesizer no longer triggers speechSynthesizer(_:didCancel:); instead it triggers speechSynthesizer(_:didFinish:), which broke some of my business logic. Has anyone else run into this, and has it been solved?
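A defensive sketch for the behavior change described above: track whether a stop was requested, and treat didFinish and didCancel uniformly, so the logic works whichever callback iOS delivers. The class and callback names are illustrative; the delegate methods are the real AVSpeechSynthesizerDelegate API.

```swift
import AVFoundation

// Sketch: funnel both termination callbacks into one handler.
final class SpeechObserver: NSObject, AVSpeechSynthesizerDelegate {
    var onStopped: ((_ wasCancelled: Bool) -> Void)?
    private var stopRequested = false

    func stop(_ synth: AVSpeechSynthesizer) {
        stopRequested = true
        synth.stopSpeaking(at: .immediate)
    }

    func speechSynthesizer(_ s: AVSpeechSynthesizer, didFinish u: AVSpeechUtterance) {
        // On iOS 15 a stopped utterance may land here instead of didCancel.
        onStopped?(stopRequested)
        stopRequested = false
    }

    func speechSynthesizer(_ s: AVSpeechSynthesizer, didCancel u: AVSpeechUtterance) {
        onStopped?(true)
        stopRequested = false
    }
}
```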
Posted
by yuxutao.
Last updated
.
Post not yet marked as solved
0 Replies
363 Views
Hi! Great to see this forum! I'm new to developing watchOS apps, and I have a question with regard to dictation. As far as I know, I can develop an independent Apple Watch app with dictation capabilities that connects to Apple services for speech recognition through WiFi, 4G, etc. I've seen in another app that after about 30 seconds, live dictation cuts off and the "Done" button at the top disappears, leaving only the "Cancel" button. I'm not sure whether this is an app-specific issue, but it results in loss of input because you are forced to press "Cancel". Apart from that, I like to prepare speeches while I take a jog, so I want to develop a practical app where I can keep speaking with dictation. My questions are: Is there a way to increase the live dictation timeout? Can we expect offline dictation anytime soon?
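As far as I know, the dictation timeout is not configurable through public API; one workaround for long speeches is to chain dictation sessions, re-presenting the system text-input controller after each chunk. A minimal sketch, assuming a plain WKInterfaceController subclass; `captureNextChunk` and `transcript` are illustrative names, while `presentTextInputController(withSuggestions:allowedInputMode:completion:)` is the real WatchKit API.

```swift
import WatchKit

// Sketch: capture a long speech as a series of dictation takes.
class DictationController: WKInterfaceController {
    private(set) var transcript: [String] = []

    func captureNextChunk() {
        presentTextInputController(withSuggestions: nil,
                                   allowedInputMode: .plain) { [weak self] results in
            guard let text = results?.first as? String else { return }  // user cancelled
            self?.transcript.append(text)
            self?.captureNextChunk()  // immediately reopen dictation for the next take
        }
    }
}
```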
Posted Last updated
.
Post not yet marked as solved
2 Replies
709 Views
Hi, I use device-local speech recognition for speech input. Now some devices upgraded to iOS 15 return errors in the new domain kLSRErrorDomain with code 201 (previously the errors were mostly in kAFAssistantErrorDomain). Does anybody have an idea what it means and how to fix it? Thanks!
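The kLSRErrorDomain name suggests the local (on-device) recognizer; one pragmatic response is to retry the same audio with server-based recognition when that domain comes back. A sketch under that assumption; the function name, completion shape, and the interpretation of code 201 are all guesses, not documented behavior.

```swift
import Speech

// Sketch: on-device first, server fallback when the local recognizer errors out.
func recognizeWithFallback(url: URL,
                           recognizer: SFSpeechRecognizer,
                           completion: @escaping (String?, Error?) -> Void) {
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.requiresOnDeviceRecognition = true
    _ = recognizer.recognitionTask(with: request) { result, error in
        if let nsError = error as NSError?, nsError.domain == "kLSRErrorDomain" {
            // Local recognizer failed (e.g. code 201): retry via the server
            // with a fresh request.
            let retry = SFSpeechURLRecognitionRequest(url: url)
            retry.requiresOnDeviceRecognition = false
            _ = recognizer.recognitionTask(with: retry) { result, error in
                if let result = result, result.isFinal {
                    completion(result.bestTranscription.formattedString, nil)
                } else if let error = error {
                    completion(nil, error)
                }
            }
        } else if let result = result, result.isFinal {
            completion(result.bestTranscription.formattedString, nil)
        } else if let error = error {
            completion(nil, error)
        }
    }
}
```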
Posted
by TH0MAS.
Last updated
.
Post not yet marked as solved
0 Replies
184 Views
Since version 14.2 we have been having issues with STT. In the past we were using Azure and it was working fine. Since you partially implemented the Speech Recognition API, things have gotten worse on iOS (no problem on macOS). It seems the recording we send to STT has very poor quality, with parts of sentences missing. When I implement it on its own it works fine, but as soon as I play audio before opening the microphone it no longer works (or only partially). So my question is: is there a workaround while we wait for a fully working Speech Recognition API to be deployed?
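One plausible cause of "works alone, breaks after playing audio" is the AVAudioSession being reconfigured between playback and recording; keeping a single session in a category that supports both avoids the mid-stream switch. A sketch, assuming iOS and the shared session; this is a guess at the failure mode, not a confirmed fix.

```swift
import AVFoundation

// Sketch: one session for both playback and the microphone, configured
// once before either is used.
func configureSessionForSpeech() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .spokenAudio,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}
```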
Posted
by dede_013.
Last updated
.
Post not yet marked as solved
1 Reply
365 Views
I'm trying to find specific information on how Apple transfers and stores the voice data sent for speech recognition in Safari as part of the Web Speech API. All I keep finding are generic privacy documents that do not provide any detail. Is anyone able to point me toward an explanation of how customer data is used?
Posted
by antonpug.
Last updated
.
Post not yet marked as solved
2 Replies
2.2k Views
Hello to all kind developers out there,

I'm currently developing an audio-messaging app for users to send short audio clips to each other. To make the communication experience as smooth and natural as possible, we are using the Speech framework to transcribe user input live. Since this feature is in high demand among some of our users, we are worried about unexpected quotas and limits.

(1) We know that individual recordings should be less than one minute.
(2) The answer here says about 1000 requests/hour per device: https://developer.apple.com/library/archive/qa/qa1951/_index.html#//apple_ref/doc/uid/DTS40017662
(3) The documentation says: "Individual devices may be limited in the number of recognitions that can be performed per day and an individual app may be throttled globally, based on the number of requests it makes per day."

We are well under the limits for (1) and (2), but there is no specific documentation on per-app limits, and being "throttled globally" sounds scary. Can anyone give us information about per-app limits, or any other kind of limit that might potentially put an end to our lives? Thank you
Posted
by fkymy.
Last updated
.
Post not yet marked as solved
0 Replies
441 Views
I use AVSpeechSynthesizer to pronounce some text in German. Sometimes it works just fine, and sometimes it doesn't, for some reason unknown to me (there is no error, because the speak() method doesn't throw, and the only thing I can observe is the following message logged in the console):

_BeginSpeaking: couldn't begin playback

I tried to find some API in AVSpeechSynthesizerDelegate to register a callback for when an error occurs, but I have found none. The closest match was this (but it appears to be available only on macOS, not iOS): https://developer.apple.com/documentation/appkit/nsspeechsynthesizerdelegate/1448407-speechsynthesizer?changes=_10

Below you can find how I initialize and use the speech synthesizer in my app:

class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    class func sharedInstance() -> Speaker {
        struct Singleton {
            static var sharedInstance = Speaker()
        }
        return Singleton.sharedInstance
    }

    let audioSession = AVAudioSession.sharedInstance()
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func initializeAudioSession() {
        do {
            try audioSession.setCategory(.playback, mode: .spokenAudio, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
        }
    }

    func speak(text: String, language: String = "de-DE") {
        guard !self.synth.isSpeaking else { return }
        let utterance = AVSpeechUtterance(string: text)
        let voice = AVSpeechSynthesisVoice.speechVoices().filter { $0.language == language }.first!
        utterance.voice = voice
        self.synth.speak(utterance)
    }
}

The audio session initialization is run just once, during app startup. Afterwards, speech is synthesized by running the following code:

Speaker.sharedInstance().speak(text: "Lederhosen")

The problem is that I have no way of knowing whether the speech synthesis succeeded: the UI shows a "speaking" state, but nothing is actually being spoken.
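Since AVSpeechSynthesizerDelegate on iOS has no error callback, one workaround for the silent failure above is to use didStart as a watchdog: if it hasn't fired shortly after speak(_:), treat the utterance as failed and reset the UI. A sketch; the class name, the 1-second timeout, and the recovery hook are arbitrary illustrative choices.

```swift
import AVFoundation

// Sketch: detect "couldn't begin playback" indirectly via a didStart watchdog.
final class WatchdogSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synth = AVSpeechSynthesizer()
    private(set) var started = false
    var onSilentFailure: (() -> Void)?

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ text: String, language: String = "de-DE") {
        started = false
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)  // nil voice falls back to default
        synth.speak(utterance)
        DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) { [weak self] in
            guard let self = self, !self.started else { return }
            self.synth.stopSpeaking(at: .immediate)
            self.onSilentFailure?()  // e.g. clear the "speaking" UI state
        }
    }

    func speechSynthesizer(_ s: AVSpeechSynthesizer, didStart u: AVSpeechUtterance) {
        started = true
    }
}
```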
Posted Last updated
.
Post not yet marked as solved
0 Replies
327 Views
To whom it may concern,

I am Nakata from Japan, in charge of software development. I am writing to you for the first time to enquire about the following.

[Questions]
1. What is the network connection used for?
2. Please tell me how to use it without connecting to the network.

[What I want to do]
I want to perform voice recognition offline and in Japanese.

[Current status]
I'm using SFSpeechRecognizer. When voice recognition is performed on a tablet that is not connected to the network, speech recognition is not available because the SFSpeechRecognitionResult is null. It works when connected to the network, with the application that implements voice recognition running.

[Development environment]
Xamarin, Xcode, C#, iPad mini with iOS 13.4.1
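Expressed in Swift (the Xamarin bindings mirror the same Speech API), a sketch of checking whether offline Japanese recognition is even possible before attempting it. On-device recognition needs iOS 13+ and a downloaded ja-JP model; if no local model is present, offline requests simply produce no result, which would match the null result described above (an assumption, not a confirmed explanation). The function name is illustrative.

```swift
import Speech

// Sketch: return a recognizer only when offline ja-JP recognition can work.
func offlineJapaneseRecognizer() -> SFSpeechRecognizer? {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja-JP")),
          recognizer.isAvailable else {
        return nil  // recognizer not usable at all right now
    }
    guard recognizer.supportsOnDeviceRecognition else {
        return nil  // no local ja-JP model: offline recognition will fail
    }
    return recognizer
}
```

Any request built for it should then set `requiresOnDeviceRecognition = true` so behavior is the same with or without a network.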
Posted
by nakata.
Last updated
.