Hi, Apple's engineer.
Hoping that you can reply to this one.
We're developing a Text-to-Speak app. Everything went well until the IOS got upgraded to 18.
AVSpeechSynthesisVoice(language: "zh-CN") is running well under IOS 16 AND IOS 17. It speaks Mandarin correctly.
In IOS 18, we noticed that Siri's Language setting interrupted the performance of AVSpeechSynthesisVoice. It plays Cantonese instead of Mandarin.
Buggy language setting in Siri that affects the AVSpeechSynthesisVoice :
Chinese (Cantonese - China mainland)
Chinese (Cantonese -Hong Kong)
Speech
RSS for tagRecognize spoken words in recorded or live audio using Speech.
Posts under Speech tag
55 Posts
Sort by:
Post
Replies
Boosts
Views
Activity
I am attempting to do batch Transcription of audio files exported from Voice Memos, and I am running into an interesting issue. If I only transcribe a single file it works every time, but if I try to batch it, only the last one works, and the others fail with No speech detected. I assumed it must be something about concurrency, so I implemented what I think should remove any chance of transcriptions running in parallel. And with a mocked up unit of work, everything looked good. So I added the transcription back in, and
1: It still fails on all but the last file. This happens if I am processing 10 files or just 2.
2: It no longer processes in order, any file can be the last one that succeeds. And it seems to not be related to file size. I have had paragraph sized notes finish last, but also a single short sentence that finishes last.
I left the mocked processFiles() for reference.
Any insights would be greatly appreciated.
import Speech
import SwiftUI
struct ContentView: View {
@State private var processing: Bool = false
@State private var fileNumber: String?
@State private var fileName: String?
@State private var files: [URL] = []
let locale = Locale(identifier: "en-US")
let recognizer: SFSpeechRecognizer?
init() {
self.recognizer = SFSpeechRecognizer(locale: self.locale)
}
var body: some View {
VStack {
if files.count > 0 {
ZStack {
ProgressView()
Text(fileNumber ?? "-")
.bold()
}
Text(fileName ?? "-")
} else {
Image(systemName: "folder.badge.minus")
Text("No audio files found")
}
}
.onAppear {
files = getFiles()
Task {
await processFiles()
}
}
}
private func getFiles() -> [URL] {
do {
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
let path = documentsURL.appendingPathComponent("Voice Memos").absoluteURL
let contents = try FileManager.default.contentsOfDirectory(at: path, includingPropertiesForKeys: nil, options: [])
let files = (contents.filter {$0.pathExtension == "m4a"}).sorted { url1, url2 in
url1.path < url2.path
}
return files
}
catch {
print(error.localizedDescription)
return []
}
}
private func processFiles() async {
var fileCount = files.count
for file in files {
fileNumber = String(fileCount)
fileName = file.lastPathComponent
await processFile(file)
fileCount -= 1
}
}
// private func processFile(_ url: URL) async {
// let seconds = Double.random(in: 2.0...10.0)
// await withCheckedContinuation { continuation in
// DispatchQueue.main.asyncAfter(deadline: .now() + seconds) {
// continuation.resume()
// print("\(url.lastPathComponent) \(seconds)")
// }
// }
// }
private func processFile(_ url: URL) async {
let recognitionRequest = SFSpeechURLRecognitionRequest(url: url)
recognitionRequest.requiresOnDeviceRecognition = false
recognitionRequest.shouldReportPartialResults = false
await withCheckedContinuation { continuation in
recognizer?.recognitionTask(with: recognitionRequest) { (transcriptionResult, error) in
guard transcriptionResult != nil else {
print("\(url.lastPathComponent.uppercased())")
print(error?.localizedDescription ?? "")
return
}
if ((transcriptionResult?.isFinal) == true) {
if let finalText: String = transcriptionResult?.bestTranscription.formattedString {
print("\(url.lastPathComponent.uppercased())")
print(finalText)
}
}
}
continuation.resume()
}
}
}
While running Swift's SpeechRecognition capabilities I get the error below. However, the app successfully transcribes the audio file.
So am not sure how worried I have to be, as well, would like to know that if when that error occurred, did that mean that the app went to the internet to transcribe that file? Yes, requiresOnDeviceRecognition is set to false.
Would like to know what that error meant, and how much I need to worry about it?
Received an error while accessing com.apple.speech.localspeechrecognition service: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
VoiceOver does not support the plist property CFBundleSpokenName. This is wrong and should be fixed.
Ultimately the issue I am dealing with is that our app name is UWCU, and instead of VoiceOver pronouncing each letter, it tries to read this as a word and horribly butchers our organization's/app's name.
Alternatives such as using U.W.C.U. and U W C U are not acceptable.
@Apple, I know you're first response is going to be "no, it is working perfectly," but quite frankly you are wrong. I know you feel strongly about this, given your response in posts like this:
https://forums.developer.apple.com/forums/thread/734545?answerId=760084022
HOWEVER, with iOS 18, your argument for "VoiceOver should read what's on the screen" doesn't hold water anymore. With iOS 18, you Apple have added a new feature that lets users customize their home screens and completely remove the name of apps. Here's your own guide:
https://support.apple.com/guide/iphone/customize-apps-and-widgets-on-the-home-screen-iph385473442/ios
Quoted from your guide:
Make the icons bigger: Tap Large. (In large size, the names of the apps disappear.)
With large icons + VoiceOver turned on, VoiceOver still reads the app name even though it has disappeared from the screen. So, your own argument "VoiceOver should read the text as it appears on the screen" is invalid, because there is NO text on the screen.
If you can't tell, I'm pretty peeved about all this. There's a reason why screen readers support aria attributes to help deliver the right accessible experience. It's a simple ask for VoiceOver to do the same thing.
Can anyone please guide me on how to use SFCustomLanguageModelData.CustomPronunciation?
I am following the below example from WWDC23
https://wwdcnotes.com/documentation/wwdcnotes/wwdc23-10101-customize-ondevice-speech-recognition/
While using this kind of custom pronunciations we need X-SAMPA string of the specific word.
There are tools available on the web to do the same
Word to IPA: https://openl.io/
IPA to X-SAMPA: https://tools.lgm.cl/xsampa.html
But these tools does not seem to produce the same kind of X-SAMPA strings used in demo, example - "Winawer" is converted to "w I n aU @r".
While using any online tools it gives - "/wI"nA:w@r/".
So, I'm trying to create my own text-to-speech setup. Problem I'm having is whenever I do a test run, the speech gets a bit choppy at the start kind of skipping over maybe a word or a few characters.
A few details:
I've essentially built a separate class for handling the speech events.
AVSpeechSynthesizer is set up as a private variable for the class so I don't expect deallocation to be the issue. Especially since it's a problem at the start.
I've got a queue set up for what it's worth so that shouldn't be a problem.
I'd appreciate any advice.
Hello.
I can't find anything about the SSML that is used in Apple's speech synthesis.
SSML from Google, Amazon and W3C either don't work or work incorrectly.
Where is Apple's documentation for their implementation of SSML?
Have there been any hints that Apple may offer speech diarization services (speech recongnition which recognizes multiple speakers) in the near future?
Here is the demo from Apple's site
This issues is specific to iOS 18.
When running this demo, we are getting new text when we have a gap in speaking, the recognitionTask(with:resultHandler:) provides new text which is only spoken after the gap and not the concatenation of old text and the new spoken text.
Hello,
I noticed that SFSpeechRecognizer is broken on iOS 18. During a recognition task, it keeps dropping the recognized text on every pause. For example, if you say "how are you fine", it will drop the "how are you" part and only give you "fine" as the result.
Say "how are you <pause> fine"
// iOS 17 ✅ (perfect final result)
How
How are
How are you
How are you.
How are you. Fine.
// iOS 18 ❌
How
How are
How are you
How are you
Fine
(the text before the pause is dropped, and fail to recognize the punctuations.)
Reproducing the issue:
Download the official sample project.
Run it on an iOS 18 device or simulator.
Say "how are you fine"
Only "fine" will be displayed.
It looks like Apple has added some new API(s) to SFSpeechRecognition
My app, which is currently listed on App Store does feature speech recognition.
Yet, trying to use it under iOS 18.0 throws errors:
-[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
What happens is that after several words are transcribed and displayed, the next sentence results in previous words disappearance.
That's probably what that portion of the error text - "Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)" means.
The problem occurs ONLY when the app is running under iOS 18.0
Even when it's compiled in Xcode 16.0 using iOS 17.5 everything works fine.
Any suggestions?
Hi everyone !
I'm getting random crashes when I'm using the Speech Recognizer functionality in my app.
This is an old bug (for 8 years on Apple Forums) and I will really appreciate if anyone from Apple will be able to find a fix for this crashes.
Can anyone also help me please to understand what could I do to keep the Speech Recognizer functionality still available in my app, but to avoid this crashes (if there is any other native library available or a CocoaPod library).
Here is my code and also the crash log for it.
Code:
func startRecording() {
startStopRecordBtn.setImage(UIImage(#imageLiteral(resourceName: "microphone_off")), for: .normal)
if UserDefaults.standard.bool(forKey: Constants.darkTheme) {
commentTextView.textColor = .white
} else {
commentTextView.textColor = .black
}
commentTextView.isUserInteractionEnabled = false
recordingLabel.text = Constants.recording
if recognitionTask != nil {
recognitionTask?.cancel()
recognitionTask = nil
}
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(AVAudioSession.Category.record)
try audioSession.setMode(AVAudioSession.Mode.measurement)
try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
} catch {
showAlertWithTitle(message: Constants.error)
}
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
let inputNode = audioEngine.inputNode
guard let recognitionRequest = recognitionRequest else {
fatalError(Constants.error)
}
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
var isFinal = false
if result != nil {
self.commentTextView.text = result?.bestTranscription.formattedString
isFinal = (result?.isFinal)!
}
if error != nil || isFinal {
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
self.startStopRecordBtn.isEnabled = true
}
})
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {[weak self] (buffer: AVAudioPCMBuffer, when: AVAudioTime) in // CRASH HERE
self?.recognitionRequest?.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
showAlertWithTitle(message: Constants.error)
}
}
Here is the crash log:
Thanks for very much for reading this !
Hi everyone, I might need some help with on-device recognition. It seems that the speech recognition task will discard whatever it has transcribed after a new sentence starts (or it believes it becomes a new sentence) during a single audio session, with requiresOnDeviceRecognition is set to true.
This doesn't happen with requiresOnDeviceRecognition set to false.
System environment: macOS 14 with Xcode 15, deploying to iOS 17
Thank you all!
Recently I updated to Xcode 14.0. I am building an iOS app to convert recorded audio into text. I got an exception while testing the application from the simulator(iOS 16.0).
[SpeechFramework] -[SFSpeechRecognitionTask handleSpeechRecognitionDidFailWithError:]_block_invoke Ignoring subsequent recongition error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"
Error Domain=kAFAssistantErrorDomain Code=1107 "(null)"
I have to know what does the error code means and why this error occurred.
in iOS 15, on stopSpeaking of AVSpeechSynthesizer,
didFinish delegate method getting called instead of didCancel which is working fine in iOS 14 and below version.