Computing the duration of an utterance

Is there a way to compute (or estimate) the duration of an AVSpeechUtterance? I would like to know how long the AVSpeechSynthesizer will take to speak a phrase without actually speaking it. Thanks.

Not that I know of. I wish there were, because I have a use case for it too.


It's not a terrible idea to estimate the duration from the number of characters. For example, if you're speaking to accompany an animation, a duration estimate works reasonably well for deciding when to start a later animation. You can also string partial utterances together by having them play sequentially (the synthesizer queues utterances by default). If you combine this with the delegate's completion method, you can probably do something that looks reasonably good.
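Here is a minimal Swift sketch of the character-count heuristic. The function name is hypothetical. The default rate of 14 characters per second and the 0.3 s pause per sentence are made-up placeholders, not values taken from AVSpeechSynthesizer, so calibrate them by timing a few real utterances with the voice and rate you actually use.

```swift
import Foundation

/// Hypothetical helper: rough speaking time estimated from character count.
/// ASSUMPTION: both constants are placeholders that must be calibrated.
func estimatedSpeechDuration(of text: String,
                             charactersPerSecond: Double = 14.0,
                             pausePerSentence: TimeInterval = 0.3) -> TimeInterval {
    // Count only letters and digits; spaces and punctuation are mostly silent.
    let spokenCharacters = text.filter { $0.isLetter || $0.isNumber }.count
    // Add a short pause for every sentence-ending mark.
    let sentenceBreaks = text.filter { ".!?".contains($0) }.count
    return Double(spokenCharacters) / charactersPerSecond
        + Double(sentenceBreaks) * pausePerSentence
}
```

Treat the result as a scheduling hint for the next animation and keep relying on the delegate's completion callback wherever timing must line up exactly. Note that each CJK ideograph counts as a single character here, which is where this heuristic breaks down.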

Thanks for the reply. I was actually using a similar approach to estimate the duration, but I've found that it became completely unreliable once I translated my app into Chinese, where each character can be a complete word.

Well, there is CFStringTransform with kCFStringTransformToLatin. For Asian languages, the transliterated Latin text will probably give better estimates than European languages do, since the transliteration is generally closer to phonetic spelling.
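A sketch of that suggestion, assuming an Apple platform where CoreFoundation's CFStringTransform is available: transliterate to Latin (Chinese becomes pinyin with tone marks), strip the diacritics, and count the remaining letters as a rough phonetic length to use in place of the raw character count. Both helper names are hypothetical, and the count still needs the same calibration as any character-based estimate.

```swift
import Foundation

/// Hypothetical helper: transliterate any script to plain Latin letters.
func latinTransliteration(of text: String) -> String {
    let mutable = NSMutableString(string: text)
    // Any script -> Latin, e.g. Han characters -> pinyin with tone marks.
    CFStringTransform(mutable as CFMutableString, nil, kCFStringTransformToLatin, false)
    // Remove tone marks and other diacritics so they are not counted.
    CFStringTransform(mutable as CFMutableString, nil, kCFStringTransformStripDiacritics, false)
    return mutable as String
}

/// Hypothetical helper: number of Latin letters/digits after transliteration,
/// a rough proxy for how much there is to pronounce.
func phoneticLength(of text: String) -> Int {
    latinTransliteration(of: text).filter { $0.isLetter || $0.isNumber }.count
}
```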

Thanks for the help, Quincey. The CFStringTransform/kCFStringTransformToLatin hint was useful, but in the end I had to write a separate app that I run offline to precompute all the durations and save them to a file. The app just plays the strings with the synthesizer and waits for the completion delegate method to be called. It turned out that for my particular case the estimate wasn't good enough and I needed the exact durations.
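A sketch of such an offline measuring tool, assuming macOS 10.14 or later (where AVSpeechSynthesizer is available on the Mac) and voice settings that match the shipping app. The class name, the JSON file format, and timing from the didStart to the didFinish callback are my own choices, not necessarily what the original tool did.

```swift
import AVFoundation

/// Hypothetical offline tool: speaks each string once and records the time
/// between the didStart and didFinish delegate callbacks.
final class UtteranceTimer: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var startTimes: [ObjectIdentifier: TimeInterval] = [:]
    private(set) var durations: [String: TimeInterval] = [:]

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    /// Queue a string; the synthesizer speaks queued utterances in order.
    func enqueue(_ text: String, language: String) {
        let utterance = AVSpeechUtterance(string: text)
        // Assumption: same voice (and default rate) as the shipping app.
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synthesizer.speak(utterance)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        startTimes[ObjectIdentifier(utterance)] = ProcessInfo.processInfo.systemUptime
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        guard let start = startTimes.removeValue(forKey: ObjectIdentifier(utterance)) else { return }
        durations[utterance.speechString] = ProcessInfo.processInfo.systemUptime - start
    }

    /// Save the table as JSON ({"string": seconds}) for the app to load at runtime.
    func save(to url: URL) throws {
        try JSONEncoder().encode(durations).write(to: url)
    }
}
```

In a command-line tool you will probably need to keep the main run loop running (for example with RunLoop.main.run(until:)) until every queued string has been reported, then call save(to:) and bundle the resulting file with the app.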

It's disappointing that there's no AVSpeechUtterance.duration property or something like it.


I'd like to make logic decisions based on the duration of some of the speech, but very little of it in my app is known at compile time. Are there any 3rd-party text-to-speech libraries that have this functionality?
