[26] audioTimeRange would still be interesting for .volatileResults in SpeechTranscriber

So experimenting with the new SpeechTranscriber, if I do:

let transcriber = SpeechTranscriber(
    locale: locale,
    transcriptionOptions: [],
    reportingOptions: [.volatileResults],
    attributeOptions: [.audioTimeRange]
)

only the final result has audio time ranges, not the volatile results.

Is this a performance consideration? If there is no performance problem, it would be nice to have the option to also get audio time ranges for volatile results. I'm not presenting the volatile text in the UI at all; I was just trying to keep statistics about the speech and non-speech noise levels, so that I can determine when the noise level falls under the noise floor for a while.

The goal here was to finalize the recording automatically when the noise level indicates that the user has finished speaking.

Turns out it was my bad: I had a bug in looking through the runs of the AttributedString. I have now found all the audioTimeRanges.
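In case anyone else trips over the same thing: the audioTimeRange attribute is only attached to some runs, so a loop that doesn't account for that can appear to find nothing. A minimal sketch of pulling the ranges out of a result's AttributedString (assuming the attribute is exposed via the `\.audioTimeRange` key path, matching the `.audioTimeRange` option above):

```swift
import Speech
import CoreMedia

// Sketch: collect the audio time range attached to each run of a
// transcriber result's AttributedString. Runs without the attribute
// yield nil, so unwrap before collecting.
func audioTimeRanges(in text: AttributedString) -> [CMTimeRange] {
    var ranges: [CMTimeRange] = []
    for (timeRange, _) in text.runs[\.audioTimeRange] {
        if let timeRange { ranges.append(timeRange) }
    }
    return ranges
}
```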

Consider using the SpeechDetector module in conjunction with SpeechTranscriber. SpeechDetector performs a similar voice activity detection function and integrates with SpeechTranscriber.

Thank you. I've been using SpeechDetector like so for a while:

let detector = SpeechDetector(
    detectionOptions: SpeechDetector.DetectionOptions(sensitivityLevel: .medium),
    reportResults: true
)

if analyzer == nil {
    analyzer = SpeechAnalyzer(modules: [detector, transcriber], options: SpeechAnalyzer.Options(priority: .high, modelRetention: .processLifetime))
}

self.analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
(inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()

Task {
    for try await result in detector.results {
        print("result: \(result.description)")
    }
}

recognizerTask = Task {
    // ..

but I have never seen any "result:" lines in the logs.
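For completeness, the elided part of my recognizer task is roughly the usual shape (a sketch only; `analyzer`, `inputSequence`, and `inputBuilder` are the properties set up above, and buffer conversion is omitted):

```swift
import Speech
import AVFoundation

// Sketch: start the analyzer on the input sequence, then feed audio
// buffers through the stream builder as they arrive from the tap.
recognizerTask = Task {
    try await analyzer?.start(inputSequence: inputSequence)
}

func process(buffer: AVAudioPCMBuffer) {
    // Buffers should already match analyzerFormat before yielding.
    inputBuilder.yield(AnalyzerInput(buffer: buffer))
}
```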

Is there any API where SpeechDetector would tell my app when it thinks the speech is over?

The docs say

This module asks “is there speech?” and provides you with the ability to gate transcription by the presence of voices, saving power otherwise used by attempting to transcribe what is likely to be silence.

but this seems to be happening behind the scenes, without getting direct feedback.

At the moment, I keep observing the input volume, and once it stays below my estimated noise floor for about 1 second, I stop the recording. I do this so I can trigger the next event programmatically without cutting off the user's speech mid-sentence. The app's user flow does not involve a "start"/"stop" recording button, so I need to end recordings automatically to create a seamless flow.
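The volume-based endpointing I described can be sketched like this (assumed names and thresholds; the noise floor is whatever your app estimates):

```swift
import AVFoundation

// Sketch: compute RMS power per buffer and report "stop" once the
// input has stayed below the noise floor for a full second.
final class SilenceEndpointer {
    var noiseFloor: Float = 0.01      // estimated RMS noise floor (placeholder)
    let holdTime: TimeInterval = 1.0  // quiet duration required before stopping
    private var quietSince: Date?

    /// Returns true when input has been below the noise floor for holdTime.
    func shouldStop(buffer: AVAudioPCMBuffer) -> Bool {
        guard let samples = buffer.floatChannelData?[0] else { return false }
        let n = Int(buffer.frameLength)
        guard n > 0 else { return false }

        var sum: Float = 0
        for i in 0..<n { sum += samples[i] * samples[i] }
        let rms = (sum / Float(n)).squareRoot()

        if rms < noiseFloor {
            if quietSince == nil { quietSince = Date() }
            return Date().timeIntervalSince(quietSince!) >= holdTime
        } else {
            quietSince = nil   // speech resumed; reset the timer
            return false
        }
    }
}
```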

I see. Yes, SpeechDetector currently does its work behind the scenes and doesn't provide that sort of information — the only use it makes of its results is to give an error — so keep on doing what you're doing.
