We recently started working on getting an iOS app to work on Macs with Apple Silicon as a "Designed for iPhone" app and are having issues with speech synthesis.
Specifically, voices retuned by AVSpeechSynthesisVoice.speechVoices() do not all work on the Mac. When we build an utterance and attempt to speak, the synthesizer falls back on a default voice and says some very odd text about voice parameters (that is not in the utterance speech text) before it does say the intended speech.
Here is some sample code to setup the utterance and speak:
func speak(_ text: String, _ settings: AppSettings) {
let utterance = AVSpeechUtterance(string: text)
if let voice = AVSpeechSynthesisVoice(identifier: settings.selectedVoiceIdentifier) {
utterance.voice = voice
print("speak: voice assigned \(voice.audioFileSettings)")
} else {
print("speak: voice error")
}
utterance.rate = settings.speechRate
utterance.pitchMultiplier = settings.speechPitch
do {
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(.playback, mode: .default, options: .duckOthers)
try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
self.synthesizer.speak(utterance)
return
} catch let error {
print("speak: Error setting up AVAudioSession: \(error.localizedDescription)")
}
}
When running the app on the Mac, this is the kind of error we get with "com.apple.eloquence.en-US.Rocko" as the selectedVoiceIdentifier:
speak: voice assgined [:]
2023-05-29 18:00:14.245513-0700 A.I.[9244:240554] [aqme] AQMEIO_HAL.cpp:742 kAudioDevicePropertyMute returned err 2003332927
2023-05-29 18:00:14.410477-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.412837-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.413774-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.414661-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.415544-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.416384-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.416804-0700 A.I.[9244:240554] [AXTTSCommon] Audio Unit failed to start after 5 attempts.
2023-05-29 18:00:14.416974-0700 A.I.[9244:240554] [AXTTSCommon] VoiceProvider: Could not start synthesis for request SSML Length: 140, Voice: [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages (
"en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null), converted from tts request [TTSSpeechRequest 0x600002c29590] <speak><voice name="com.apple.eloquence.en-US.Rocko">How much wood would a woodchuck chuck if a wood chuck could chuck wood?</voice></speak> language: en-US footprint: premium rate: 0.500000 pitch: 1.000000 volume: 1.000000
2023-05-29 18:00:14.428421-0700 A.I.[9244:240360] [VOTSpeech] Failed to speak request with error: Error Domain=TTSErrorDomain Code=-4010 "(null)". Attempting to speak again with fallback identifier: com.apple.voice.compact.en-US.Samantha
When we run AVSpeechSynthesisVoice.speechVoices(), the "com.apple.eloquence.en-US.Rocko" is absolutely in the list but fails to speak properly.
Notice that the line:
print("speak: voice assigned \(voice.audioFileSettings)")
Shows:
speak: voice assigned [:]
The .audioFileSettings being empty seems to be a common factor for the voices that do not work properly on the Mac.
For voices that do work, we see this kind of output and values in the .audioFileSettings:
speak: voice assigned ["AVFormatIDKey": 1819304813, "AVLinearPCMBitDepthKey": 16, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsFloatKey": 0, "AVSampleRateKey": 22050, "AVLinearPCMIsNonInterleaved": 0, "AVNumberOfChannelsKey": 1]
So we added a function to check the .audioFileSettings for each voice returned by AVSpeechSynthesisVoice.speechVoices():
//The voices are set in init():
var voices = AVSpeechSynthesisVoice.speechVoices()
...
func checkVoices() {
DispatchQueue.global().async { [weak self] in
guard let self = self else { return }
let checkedVoices = self.voices.map { ($0.0, $0.0.audioFileSettings.count) }
DispatchQueue.main.async {
self.voices = checkedVoices
}
}
}
That looks simple enough, and does work to identify which voices have no data in their .audioFileSettings. But we have to run it asynchronously because on a real iPhone device, it takes more than 9 seconds and produces a tremendous amount of error spew to the console.
2023-06-02 10:56:59.805910-0700 A.I.[17186:910118] [catalog] Query for com.apple.MobileAsset.VoiceServices.VoiceResources failed: 2
2023-06-02 10:56:59.971435-0700 A.I.[17186:910118] [catalog] Query for com.apple.MobileAsset.VoiceServices.VoiceResources failed: 2
2023-06-02 10:57:00.122976-0700 A.I.[17186:910118] [catalog] Query for com.apple.MobileAsset.VoiceServices.VoiceResources failed: 2
2023-06-02 10:57:00.144430-0700 A.I.[17186:910116] [AXTTSCommon] MauiVocalizer: 11006 (Can't compile rule): regularExpression=\Oviedo(?=, (\x1b\\pause=\d+\\)?Florida)\b, message=unrecognized character follows \, characterPosition=1
2023-06-02 10:57:00.147993-0700 A.I.[17186:910116] [AXTTSCommon] MauiVocalizer: 16038 (Resource load failed): component=ttt/re, uri=, contentType=application/x-vocalizer-rettt+text, lhError=88602000
2023-06-02 10:57:00.148036-0700 A.I.[17186:910116] [AXTTSCommon] Error loading rules: 2147483648
... This goes on and on and on ...
There must be a better way?