Am trying to distinguish the differences in volumes between background noise, and someone speaking in Swift.
Previously, I had come across a tutorial which had me looking at the power levels in each channel. It come out as the code listed in Sample One which I called within the installTap closure. It was ok, but the variance between background and the intended voice to record, wasn't that great. Sure, it could have been the math used to calculate it, but since I have no experience in audio data, it was like reading another language.
Then I came across another demo. It's code was much simpler, and the difference in values between background noise and speaking voice was much greater, therefore much more detectable. It's listed here in Sample Two, which I also call within the installTap closure.
My issue here is wanting to understand what is happening in the code. In all my experiences with other languages, voice was something I never dealt with before, so this is way over my head.
Not looking for someone to explain this to me line by line. But if someone could let me know where I can find decent documentation so I can better grasp what is going on, I would appreciate it.
Thank you
Sample One
func audioMetering(buffer:AVAudioPCMBuffer) {
// buffer.frameLength = 1024
let inNumberFrames:UInt = UInt(buffer.frameLength)
if buffer.format.channelCount > 0 {
let samples = (buffer.floatChannelData![0])
var avgValue:Float32 = 0
vDSP_meamgv(samples,1 , &avgValue, inNumberFrames)
var v:Float = -100
if avgValue != 0 {
v = 20.0 * log10f(avgValue)
}
self.averagePowerForChannel0 = (self.LEVEL_LOWPASS_TRIG*v) + ((1-self.LEVEL_LOWPASS_TRIG)*self.averagePowerForChannel0)
self.averagePowerForChannel1 = self.averagePowerForChannel0
}
if buffer.format.channelCount > 1 {
let samples = buffer.floatChannelData![1]
var avgValue:Float32 = 0
vDSP_meamgv(samples, 1, &avgValue, inNumberFrames)
var v:Float = -100
if avgValue != 0 {
v = 20.0 * log10f(avgValue)
}
self.averagePowerForChannel1 = (self.LEVEL_LOWPASS_TRIG*v) + ((1-self.LEVEL_LOWPASS_TRIG)*self.averagePowerForChannel1)
}
}
Sample Two
private func getVolume(from buffer: AVAudioPCMBuffer, bufferSize: Int) -> Float {
guard let channelData = buffer.floatChannelData?[0] else {
return 0
}
let channelDataArray = Array(UnsafeBufferPointer(start:channelData, count: bufferSize))
var outEnvelope = [Float]()
var envelopeState:Float = 0
let envConstantAtk:Float = 0.16
let envConstantDec:Float = 0.003
for sample in channelDataArray {
let rectified = abs(sample)
if envelopeState < rectified {
envelopeState += envConstantAtk * (rectified - envelopeState)
} else {
envelopeState += envConstantDec * (rectified - envelopeState)
}
outEnvelope.append(envelopeState)
}
// 0.007 is the low pass filter to prevent
// getting the noise entering from the microphone
if let maxVolume = outEnvelope.max(),
maxVolume > Float(0.015) {
return maxVolume
} else {
return 0.0
}
}