Shazam fails to find match

Hi,

I'm trying to convert a stream into an AVAudioPCMBuffer and then use ShazamKit to match it, but the match always fails. My theory is that ShazamKit ends up "listening" to the playback at double speed or more.

The relevant code starts here:

...
let format = audioEngine.outputNode.inputFormat(forBus: 0)
guard let pcmBuffer = format.toPCMBuffer(frame: currentFrame) else {
         return
}
session.matchStreamingBuffer(pcmBuffer, at: nil)
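
For reference, the documented ShazamKit streaming flow taps an AVAudioEngine node and forwards each buffer together with its timestamp. A minimal sketch (audioEngine and session are the names used above; the rest is illustrative, not the code from this project):

import AVFoundation
import ShazamKit

func startListening(audioEngine: AVAudioEngine, session: SHSession) throws {
    let inputNode = audioEngine.inputNode
    // Tap the node in its own format so the buffer's format matches the samples it carries.
    let tapFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 2048, format: tapFormat) { buffer, time in
        // Pass the buffer along with its capture time.
        session.matchStreamingBuffer(buffer, at: time)
    }
    audioEngine.prepare()
    try audioEngine.start()
}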

Where toPCMBuffer is:

extension AVAudioFormat {
    // Copies the decoder's planar float data into an AVAudioPCMBuffer described by this format.
    func toPCMBuffer(frame: AudioFrame) -> AVAudioPCMBuffer? {
        // Capacity is derived from the byte size of the first channel, not from frame.numberOfSamples.
        guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: self, frameCapacity: UInt32(frame.dataWrap.size[0]) / streamDescription.pointee.mBytesPerFrame) else {
            return nil
        }
        pcmBuffer.frameLength = pcmBuffer.frameCapacity
        // Copy each channel's samples as non-interleaved Float32.
        for i in 0 ..< min(Int(pcmBuffer.format.channelCount), frame.dataWrap.size.count) {
            frame.dataWrap.data[i]?.withMemoryRebound(to: Float.self, capacity: Int(pcmBuffer.frameCapacity)) { srcFloatsForChannel in
                pcmBuffer.floatChannelData?[i].assign(from: srcFloatsForChannel, count: Int(pcmBuffer.frameCapacity))
            }
        }
        return pcmBuffer
    }
}

AudioFrame is:

final class AudioFrame: MEFrame {

    var timebase = Timebase.defaultValue
    var duration: Int64 = 0
    var size: Int64 = 0
    var position: Int64 = 0
    var numberOfSamples = 0
    let dataWrap: ByteDataWrap

    public init(bufferSize: Int32, channels: Int32) {
        dataWrap = ObjectPool.share.object(class: ByteDataWrap.self, key: "AudioData_\(channels)") { ByteDataWrap() }

        if dataWrap.size[0] < bufferSize {
            dataWrap.size = Array(repeating: Int(bufferSize), count: Int(channels))
        }
    }
...

}

and MEFrame is:

extension MEFrame {
    public var seconds: TimeInterval { cmtime.seconds }
    public var cmtime: CMTime { timebase.cmtime(for: position) }

}

Replies

Hello

Could you share how you are building the AudioFrame? There could very well be a difference between the AVAudioFormat you are using and the actual audio you're dealing with.

You can confirm that your buffers are in the correct format by writing them to an AVAudioFile and playing them back.
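
For example, a minimal sketch of that check (the class name and file URL here are just illustrative):

import AVFoundation

final class BufferDumper {
    private var debugFile: AVAudioFile?

    func write(_ buffer: AVAudioPCMBuffer) {
        do {
            if debugFile == nil {
                let url = FileManager.default.temporaryDirectory.appendingPathComponent("shazam-debug.caf")
                // Create the file from the buffer's own settings; write(from:) expects buffers
                // in the file's processing format (deinterleaved Float32 by default).
                debugFile = try AVAudioFile(forWriting: url, settings: buffer.format.settings)
            }
            try debugFile?.write(from: buffer)
        } catch {
            print("Failed to write debug buffer: \(error)")
        }
    }
}

Call write(_:) with the same buffers you pass to matchStreamingBuffer, then play the resulting file back; if it sounds sped up or otherwise wrong, the format and the data don't agree.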

A couple of small things that may or may not be relevant:

  • This code assumes that the format contains float data; it could be one of the other common formats, so check the AVAudioFormat first.
  • Check for interleaved formats; that case doesn't appear to be handled here. ShazamKit also does not accept interleaved formats, so use a converter to get to a supported format (see the sketch after this list).
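
A minimal sketch of that conversion, assuming the sample rate is already one ShazamKit supports and only the layout needs to change (all names here are illustrative):

import AVFoundation

func deinterleavedFloatBuffer(from source: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                           sampleRate: source.format.sampleRate,
                                           channels: source.format.channelCount,
                                           interleaved: false),
          let converter = AVAudioConverter(from: source.format, to: targetFormat),
          let output = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: source.frameLength) else {
        return nil
    }
    do {
        // convert(to:from:) handles interleaving and sample-format changes when the rates match.
        try converter.convert(to: output, from: source)
        return output
    } catch {
        print("Conversion failed: \(error)")
        return nil
    }
}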

Thanks

Thank you.

So this is how the AudioFrame is being allocated:

// Copies decoded planar float samples from the current render into the output buffers,
// pulling new renders from the render source as needed and zero-filling any remainder.
private func audioPlayerShouldInputData(ioData: UnsafeMutableAudioBufferListPointer, numberOfFrames: UInt32) {
        var ioDataWriteOffset = 0
        var numberOfSamples = Int(numberOfFrames)
        while numberOfSamples > 0 {
            if currentRender == nil {
                currentRender = renderSource?.getAudioOutputRender()
            }
            guard let currentRender = currentRender else {
                break
            }
            let residueLinesize = currentRender.numberOfSamples - currentRenderReadOffset
            guard residueLinesize > 0 else {
                self.currentRender = nil
                continue
            }
            let framesToCopy = min(numberOfSamples, residueLinesize)
            let bytesToCopy = framesToCopy * MemoryLayout<Float>.size
            let offset = currentRenderReadOffset * MemoryLayout<Float>.size
            for i in 0 ..< min(ioData.count, currentRender.dataWrap.data.count) {
                (ioData[i].mData! + ioDataWriteOffset).copyMemory(from: currentRender.dataWrap.data[i]! + offset, byteCount: bytesToCopy)
            }
            numberOfSamples -= framesToCopy
            ioDataWriteOffset += bytesToCopy
            currentRenderReadOffset += framesToCopy
        }
        let sizeCopied = (Int(numberOfFrames) - numberOfSamples) * MemoryLayout<Float>.size
        for i in 0 ..< ioData.count {
            let sizeLeft = Int(ioData[i].mDataByteSize) - sizeCopied
            if sizeLeft > 0 {
                memset(ioData[i].mData! + sizeCopied, 0, sizeLeft)
            }
        }
    }

I've followed getAudioOutputRender, and I believe the AudioFrame is first generated here:

// Receives a decoded frame from FFmpeg, runs it through swresample (and an optional filter),
// and stamps it with the track's timebase, duration, and position before handing it off.
let result = avcodec_receive_frame(codecContext, coreFrame)
            if result == 0, let avframe = coreFrame {
                var frame = try swresample.transfer(avframe: filter?.filter(inputFrame: avframe) ?? avframe)
                frame.timebase = packet.assetTrack.timebase
                frame.duration = avframe.pointee.pkt_duration
                frame.size = Int64(avframe.pointee.pkt_size)
                if packet.assetTrack.mediaType == .audio {
                    bestEffortTimestamp = max(bestEffortTimestamp, avframe.pointee.pts)
                    frame.position = bestEffortTimestamp
                    if frame.duration == 0 {
                        frame.duration = Int64(avframe.pointee.nb_samples) * Int64(frame.timebase.den) / (Int64(avframe.pointee.sample_rate) * Int64(frame.timebase.num))
                    }
                    bestEffortTimestamp += frame.duration
                } else {
                    frame.position = avframe.pointee.best_effort_timestamp
                }
                delegate?.decodeResult(frame: frame)

Hello! I think I have the exact same problem. Were you able to fix this issue?