Can I use the speech framework to detect phonemes?

Hi,


I'm working on a language-tutoring app that teaches students learning English how to pronounce difficult words.


One of the things I'd like to do is ask the user to speak into the app and use the speech framework to evaluate their speech.

It would be helpful if I could read out the results of the speech recognition as phonemes instead of, or in addition to, rendered text. This would let the app pinpoint pronunciation errors more accurately.


Is there a way to get this information out of the speech framework? I would imagine that speech recognition starts by recognizing phonemes, which are eventually translated into words, so I think the information must be present.


Thanks,

Frank

Keep in mind that traditionally, there are two types of speech recognition. One is where the human trains the device, and the other is where the device trains the human.


Apple's Speech API is the second type. Recognition is confidence based, relying on capabilities that already exist. You can't train, alter, or expand what already exists on Apple's servers.
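To illustrate what "confidence based" buys you in practice: the finest granularity the public API exposes is word-level transcription segments, each with a confidence score; there is no phoneme output. A minimal Swift sketch, assuming you already have the user's recording at a file URL (the function name and `audioURL` parameter are hypothetical):

```swift
import Speech

// Hypothetical helper: prints per-word confidence for a recorded audio file.
// `audioURL` is assumed to point at a recording of the user's speech.
func evaluatePronunciation(of audioURL: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.isAvailable else { return }

    let request = SFSpeechURLRecognitionRequest(url: audioURL)

    recognizer.recognitionTask(with: request) { result, _ in
        guard let result = result, result.isFinal else { return }
        // Word-level segments are the finest granularity the API exposes;
        // each carries a confidence score but no phoneme breakdown.
        for segment in result.bestTranscription.segments {
            print("\(segment.substring): confidence \(segment.confidence)")
        }
    }
}
```

A low confidence score on a word hints that the user said it unexpectedly, but it can't tell you which phoneme was wrong, which is the gap the original question runs into.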


I think your approach would frustrate a user who expects your app to come to them, rather than the other way around.


Not that you can't make an app that meets your goals, but the onus is on you to come up with a solution. Very comprehensive solutions already exist, but I believe they aren't casually available on the open market.
