ShazamKit

This documentation is intended for Android developers who wish to integrate audio recognition into their applications.

Set up

ShazamKit comes in the form of an Android Archive (AAR) file. Once downloaded, place the file in the libs directory in the root of your project. You may need to create the directory if it does not already exist. In order for Gradle to recognize any dependencies coming from the libs directory, the following snippet needs to be included in the top-level build.gradle file:

allprojects {
    repositories {
        flatDir {
            dirs 'libs'
        }
    }
}

Lastly, specify the dependency in your app/build.gradle file within the dependencies block, along with the Kotlin Coroutines, OkHttp, and Retrofit libraries that ShazamKit depends on, like so:

dependencies {
    implementation(name: "shazamkit-android-release", ext: "aar")
    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-core:1.4.1'
    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.4.1'
    implementation 'com.squareup.okhttp3:okhttp:4.9.0'
    implementation 'com.squareup.retrofit2:retrofit:2.9.0'
    implementation 'com.squareup.retrofit2:converter-gson:2.9.0'
}

For more information on how to include an AAR file in your project, see the Android Developers documentation.

Basic audio recognition using Session

The following snippet performs basic audio recognition on pre-recorded audio data using the ShazamCatalog.

val signatureGenerator = (ShazamKit.createSignatureGenerator(AudioSampleRateInHz.SAMPLE_RATE_48000) as Success).data

signatureGenerator.append(bytes, meaningfulLengthInBytes, System.currentTimeMillis())
val signature = signatureGenerator.generateSignature()

val catalog = ShazamKit.createShazamCatalog(developerTokenProvider, selectedLocale.value)
val session = (ShazamKit.createSession(catalog) as Success).data
val matchResult = session.match(signature)

Recognition is performed as soon as a Signature is passed to a Session for matching. The snippet uses the SignatureGenerator to convert the recorded audio into a Signature.
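The returned MatchResult can then be inspected with a when expression. The following is a sketch only; it assumes MatchResult is a sealed type with Match, NoMatch, and Error cases, and that a matched MediaItem exposes title and artist properties:

```kotlin
// Hedged sketch: case and property names are assumptions about the SDK surface
when (matchResult) {
    is MatchResult.Match -> {
        // One or more catalog entries matched the signature
        matchResult.matchedMediaItems.forEach { item ->
            println("Matched: ${item.title} by ${item.artist}")
        }
    }
    is MatchResult.NoMatch -> println("No match found for this signature")
    is MatchResult.Error -> println("Match failed: ${matchResult.exception}")
}
```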

Depending on the sample rate of your recorded audio, choose the matching com.shazam.shazamkit.AudioSampleRateInHz value accordingly.

Note that in order to use the ShazamCatalog you need an Apple Developer token, which you provide through your own DeveloperTokenProvider implementation.
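A minimal sketch of such a provider is shown below; it assumes DeveloperTokenProvider exposes a single provideDeveloperToken() method returning a DeveloperToken. In a real application, fetch the token securely from your own backend rather than hard-coding it:

```kotlin
// Hedged sketch: the token string here is a placeholder, not a working token
val developerTokenProvider = object : DeveloperTokenProvider {
    override fun provideDeveloperToken(): DeveloperToken {
        // Normally fetched (and cached) from your own server
        return DeveloperToken("your-apple-developer-token")
    }
}
```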

Apps that intend to use the Shazam Catalog require Internet access. Make sure to include the INTERNET permission in your application's manifest file.
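The permission is declared in AndroidManifest.xml as:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```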

Error handling

All examples in this documentation handle the happy path only. In a real-world application you would want to carefully handle any case where a Failure is returned from a ShazamKit operation. A convenient way of doing so is using Kotlin's when expression to process the resulting ShazamKitResult.
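For example, catalog creation can be handled without the unchecked cast used in the snippets above. This sketch assumes ShazamKitResult is a sealed type with Success and Failure cases, where Failure carries a human-readable reason:

```kotlin
// Hedged sketch: the Failure properties are assumptions about the SDK surface
when (val result = ShazamKit.createShazamCatalog(developerTokenProvider)) {
    is ShazamKitResult.Success -> {
        val catalog = result.data
        // proceed to create a Session with the catalog
    }
    is ShazamKitResult.Failure -> {
        // Surface or log the error instead of crashing on an unchecked cast
        println("Catalog creation failed: ${result.reason}")
    }
}
```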

Continuous audio recognition using StreamingSession

ShazamKit supports continuous audio recognition by streaming audio into a StreamingSession. Here is how it looks:

val catalog = (ShazamKit.createShazamCatalog(developerTokenProvider) as Success).data
val currentSession = (ShazamKit.createStreamingSession(
    catalog,
    AudioSampleRateInHz.SAMPLE_RATE_48000,
    readBufferSize
) as Success).data

coroutineScope.launch {
    // record audio and flow it to the StreamingSession
    recordingFlow().collect { audioChunk ->
        currentSession?.matchStream(
            audioChunk.buffer,
            audioChunk.meaningfulLengthInBytes,
            audioChunk.timestamp
        )
    }
}

coroutineScope.launch {
    currentSession?.recognitionResults()?.collect { matchResult ->
        println("Received MatchResult: $matchResult")
    }
}

Audio recognition using a CustomCatalog

Developers can provide their own catalog instead of using the default Shazam Catalog. A CustomCatalog can be used with both a Session and a StreamingSession, via ShazamKit.createSession() or ShazamKit.createStreamingSession() respectively. There are no limitations on how or where you can store your Custom Catalog files.

The catalog in the following snippet is loaded from a local Uri retrieved using the ACTION_OPEN_DOCUMENT Intent action:

val inputStream = contentResolver.openInputStream(uri)
val customCatalog = ShazamKit.createCustomCatalog()
    .apply { addFromCatalog(inputStream) }

val session = (ShazamKit.createSession(customCatalog) as Success).data
val matchResult = session.match(signature)

Custom catalogs support:

  • Timed Media Items: Implement exact audio matching by specifying time ranges for when an event starts and stops. For a deeper overview, please refer to https://developer.apple.com/videos/play/wwdc2022/10028/

  • Frequency Skew Ranges: A range specifies, as a percentage, how much the audio differs from the original. A value of zero indicates the audio is unskewed, and a value of 0.01 indicates a 1 percent skew. For a deeper overview, please refer to https://developer.apple.com/videos/play/wwdc2022/10028/

Changelog

  • ShazamKit 2.0

    • Adds "Timed Media Items"

    • Adds "Frequency Skew Ranges"

  • ShazamKit 2.0.1

    • Updates link to "Create a media identifier" webpage

  • ShazamKit 2.0.2

    • Lowers the "minSdkVersion" to 21

Record from Microphone

Developers can provide the SDK with audio obtained from any source, as long as the audio format is PCM 16-bit mono at one of the following sample rates: 48000Hz, 44100Hz, 32000Hz, 16000Hz. For more details see ShazamKit.createStreamingSession() or ShazamKit.createSignatureGenerator().

Here is an example you can use as a starting point for audio recording on Android:

@RequiresPermission(Manifest.permission.RECORD_AUDIO)
@WorkerThread
private fun simpleMicRecording(catalog: Catalog): ByteArray {
    val audioSource = MediaRecorder.AudioSource.UNPROCESSED

    val audioFormat = AudioFormat.Builder()
        .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
        .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setSampleRate(48_000)
        .build()

    val audioRecord = AudioRecord.Builder()
        .setAudioSource(audioSource)
        .setAudioFormat(audioFormat)
        .build()

    // maximumQuerySignatureDurationInMs is in milliseconds; convert to seconds
    val seconds = (catalog.maximumQuerySignatureDurationInMs / 1000).toInt()

    // Final desired buffer size to hold the maximum query duration of audio
    val size = audioFormat.sampleRate * audioFormat.encoding.toByteAllocation() * seconds
    val destination = ByteBuffer.allocate(size)

    // Small buffer to retrieve chunks of audio
    val bufferSize = AudioRecord.getMinBufferSize(
        48_000,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT
    )

    // Make sure you are on a dedicated thread or thread pool for mic recording only and
    // elevate the priority to THREAD_PRIORITY_URGENT_AUDIO
    Process.setThreadPriority(Process.THREAD_PRIORITY_URGENT_AUDIO)

    audioRecord.startRecording()
    val readBuffer = ByteArray(bufferSize)
    while (destination.remaining() > 0) {
        val actualRead = audioRecord.read(readBuffer, 0, bufferSize)
        val byteArray = readBuffer.sliceArray(0 until actualRead)
        destination.putTrimming(byteArray)
    }
    audioRecord.release()
    return destination.array()
}

private fun Int.toByteAllocation(): Int {
    return when (this) {
        AudioFormat.ENCODING_PCM_16BIT -> 2
        else -> throw IllegalArgumentException("Unsupported encoding")
    }
}

fun ByteBuffer.putTrimming(byteArray: ByteArray) {
    if (byteArray.size <= this.capacity() - this.position()) {
        this.put(byteArray)
    } else {
        this.put(byteArray, 0, this.capacity() - this.position())
    }
}

For further details, see the Android Developers documentation.

Packages

com.shazam.shazamkit