I am experimenting with SpeechTranscriber and am curious whether I can get quicker results by using buffered audio rather than a file. The use case is a voice ordering experience for a restaurant. In my testing so far, it takes about 3 seconds to get the faster results and 7-8 seconds to get the accurate results. Is there any way to bring this down a bit?
In this WWDC demo, the results appear nearly instantaneously, and I'm curious how to replicate that in my app. I presume DictationTranscriber is faster, but how is Siri detecting when the user stops speaking? Is it custom code, or is it using SpeechDetector? I tried using SpeechDetector with SpeechTranscriber, but the detector didn't emit any results and seemed to slow down the results from SpeechTranscriber. I also assumed SpeechTranscriber makes more sense than DictationTranscriber for this use case, but I want to confirm.
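
To make the "custom code" option concrete, this is roughly the kind of end-of-speech detection I have in mind: a minimal sketch in plain Swift that treats a run of quiet audio after some speech as the end of the utterance. It is not how Siri or SpeechDetector works as far as I know, and `SilenceEndpointer`, its threshold, and its timing values are all hypothetical names and guesses of mine, not part of the Speech framework.

```swift
import Foundation

/// Hypothetical helper (not an Apple API): flags end of speech once enough
/// trailing silence has followed at least one loud buffer.
struct SilenceEndpointer {
    /// RMS level below which a buffer counts as silence (assumed value, needs tuning).
    let rmsThreshold: Float
    /// Seconds of continuous silence required to declare the utterance finished.
    let requiredSilence: TimeInterval

    private var heardSpeech = false
    private var silenceSoFar: TimeInterval = 0

    init(rmsThreshold: Float = 0.02, requiredSilence: TimeInterval = 0.7) {
        self.rmsThreshold = rmsThreshold
        self.requiredSilence = requiredSilence
    }

    /// Feed one buffer of mono samples in the range -1...1.
    /// Returns true once the speaker appears to have stopped.
    mutating func process(samples: [Float], sampleRate: Double) -> Bool {
        guard !samples.isEmpty, sampleRate > 0 else { return false }

        // Root-mean-square energy of this buffer.
        let meanSquare = samples.reduce(Float(0)) { $0 + $1 * $1 } / Float(samples.count)
        let rms = meanSquare.squareRoot()

        if rms >= rmsThreshold {
            // Loud buffer: the user is speaking, so reset the silence timer.
            heardSpeech = true
            silenceSoFar = 0
            return false
        }

        // Ignore silence that comes before any speech has been heard.
        guard heardSpeech else { return false }

        // Accumulate the duration of this quiet buffer.
        silenceSoFar += Double(samples.count) / sampleRate
        return silenceSoFar >= requiredSilence
    }
}
```

The idea would be to call `process(samples:sampleRate:)` on each captured buffer and stop feeding the transcriber when it returns true. A fixed energy threshold like this would probably struggle in a noisy restaurant, which is part of why I'd prefer a built-in detector if one is meant for this.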