Consider using the SpeechDetector module in conjunction with SpeechTranscriber. SpeechDetector performs a similar voice activity detection function and integrates with SpeechTranscriber.
Thank you. I've been using SpeechDetector like so for a while:
let detector = SpeechDetector(detectionOptions: SpeechDetector.DetectionOptions(sensitivityLevel: .medium), reportResults: true)
if analyzer == nil {
    analyzer = SpeechAnalyzer(modules: [detector, transcriber], options: SpeechAnalyzer.Options(priority: .high, modelRetention: .processLifetime))
}
self.analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
(inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()
Task {
    // Log every result the detector reports.
    for try await result in detector.results {
        print("result: \(result.description)")
    }
}
recognizerTask = Task {
// ..
but I have never seen any "result:" line in the logs.
Is there any API where SpeechDetector would tell my app when it thinks the speech is over?
The docs say
This module asks “is there speech?” and provides you with the ability to gate transcription by the presence of voices, saving power otherwise used by attempting to transcribe what is likely to be silence.
but this seems to happen behind the scenes, without giving my app any direct feedback.
At the moment, I keep observing the input volume, and once it stays below my estimated noise floor for about 1 second, I stop the recording.
I do this so I can trigger the next event programmatically without cutting off the user's speech mid-sentence. The app's user flow does not involve a "start"/"stop" recording button, so I need to end recordings automatically to create a seamless flow.
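The silence-timeout workaround described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not code from the Speech framework: SilenceEndpointDetector, its method names, and the -45 dBFS default noise floor are hypothetical, and the level measurements are assumed to come from whatever audio tap the app already uses.

```swift
import Foundation

/// Hypothetical helper (not part of the Speech framework): reports that an
/// utterance has ended once the input level stays at or below a noise floor
/// for a minimum duration, after some speech has been heard.
struct SilenceEndpointDetector {
    let noiseFloorDB: Float            // assumed noise-floor estimate in dBFS
    let requiredSilence: TimeInterval  // how long the level must stay below it
    private var silenceStart: TimeInterval? = nil
    private var heardSpeech = false

    init(noiseFloorDB: Float = -45, requiredSilence: TimeInterval = 1.0) {
        self.noiseFloorDB = noiseFloorDB
        self.requiredSilence = requiredSilence
    }

    /// RMS level of a buffer in dBFS, for samples in the range -1...1.
    static func levelDB(of samples: [Float]) -> Float {
        guard !samples.isEmpty else { return -Float.infinity }
        let meanSquare = samples.reduce(Float(0)) { $0 + $1 * $1 } / Float(samples.count)
        // 10 * log10(mean square) equals 20 * log10(RMS).
        return 10 * log10(max(meanSquare, Float.leastNormalMagnitude))
    }

    /// Feed one level measurement with its timestamp (in seconds).
    /// Returns true when the recording should be stopped.
    mutating func process(levelDB: Float, at time: TimeInterval) -> Bool {
        if levelDB > noiseFloorDB {
            // Above the noise floor: treat as speech and reset the silence timer.
            heardSpeech = true
            silenceStart = nil
            return false
        }
        // Ignore silence before the user has said anything.
        guard heardSpeech else { return false }
        let start = silenceStart ?? time
        silenceStart = start
        return time - start >= requiredSilence
    }
}
```

Each incoming audio buffer would be converted with levelDB(of:) and passed to process(levelDB:at:). Requiring speech before the timer starts avoids ending the recording during the initial silence, but the heuristic still depends on a good noise-floor estimate, which is why a signal from SpeechDetector itself would be preferable.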