Learn how to quickly and easily create Core ML models capable of classifying the sounds heard in audio files and live audio streams. In addition to letting you train and evaluate these models, the Create ML app allows you to test model performance in real time using the microphone on your Mac. Leverage these on-device models in your app using the new SoundAnalysis framework.
My name's Dan Klingler, and I'm a software engineer on the audio team here at Apple. Today, I'm really excited to talk to you about training sound classification models in Create ML.
So, before we start, we might want to ask the question: what is sound classification, and how might it be useful in your app? Sound classification is the task of taking a sound and placing it into one of many categories.
But if you think about it, there are many different ways that we can categorize sound. One way is to think about the object that makes the sound. In this example, we have the sound of a guitar versus the sound of drums, and it's really the acoustical properties of those objects that differ and allow us, as humans, to tell these sounds apart.

A second way to think about sound classification is where the sound comes from. If you've ever gone on a hike, or been in the middle of a busy city, you'll understand that the texture of the sound around you is very different, even though there's not one particular sound that necessarily stands out.

And a third way to think about sound classification is by looking at the attributes, or properties, of the sound itself. In this example, a baby's laugh versus a baby's cry: both come from the same source, but the properties of the sound are very different, and this allows us to tell the difference between them.
Now, as app developers, you all have different apps, and you might each have your own use case for sound classification. Wouldn't it be great if you could train your own model, tailor-made for your application's use case? You can do that today using the Create ML app, which is built right into Xcode. This is the simplest way to train a sound classifier model.
To train a sound classifier, you'll first provide labeled audio data to Create ML, in the form of audio files. Create ML will then train a sound classifier model on your data, and you can take that model and use it right in your application. I'd love to show you this process in action today with a demo.
So, to start, I'm going to launch the Create ML app, which you can find bundled with Xcode. We'll create a new document, select Sound from the template chooser, click Next, and name our project. Then we'll save the project.
Once the Create ML app launches, you'll see this home screen, with the Input tab selected on the left. This is where we'll provide our training data to the Create ML app in order to train our model. You'll also see some other tabs across the top, like Training, Validation, and Testing; these let us view statistics on the accuracy of our model at each of these stages of training. Finally, the Output tab is where we'll find our model after it's been trained, and where we can interact with it in real time.
Now, today, I'm going to be training a musical instrument classifier, and I've brought some instruments with me that we can try out. I have a TrainingData directory that we can open to take a look at some of the sound files inside. These contain recordings from an acoustic guitar, for example, or a cowbell, or a shaker. To train our model, all we need to do is drag this directory straight into Create ML. Create ML has figured out that we have a total of 49 sound files to use for training, spanning 7 different classes. All we need to do is press the Train button, and our model will begin training.
Now, the first thing Create ML does when training this model is walk through each of the sound files we provided and extract audio features across each entire file. Once it's collected all these audio features, it begins the process you're seeing now, where the model weights are updated iteratively. As the weights update, you can see that performance is increasing and our accuracy is moving toward 100%, which is a good sign that our model is converging.
Now, if you think about the sounds we've collected today, they're fairly distinct: a cowbell and an acoustic guitar sound quite different. So this particular model is able to do a really good job with these sounds, as you can see, on both the training and validation data. The Testing pane is a good place to provide a large data set that you might have for benchmarking.
The Create ML app allows you to train multiple models at the same time, potentially with different sets of training data. So the Testing pane is a great place to provide a common benchmark for all the different model configurations you're training.
Finally, as we make our way to the Output tab, you'll see a UI that shows how we can interact with our model. Now, I've collected one other file that I didn't include in my training set, and I've placed it in a separate directory. When I drag that directory into the UI, you can see it picks up the file inside. As we scroll through this file, it appears that Create ML has classified the first second or so as background noise, then speech for the next couple of seconds, and finally shaker. But let's find out if we agree with this classification; we can listen to the file right here in the UI.
Test, 1, 2, 3.
[ Shaker Playing ]
So, at least on the file we've collected, this model seems to be performing well. Now, what would be even better is if we could interact with this model live, and to do that, we've added a Record button here. Once I begin recording, my Mac will begin feeding the built-in microphone data into the model we've just trained. What you can see is that anytime I'm speaking, the model recognizes speech with high confidence, and as I quiet down, the model settles back into a background state.
And I brought a few instruments with me so that we can play along and see if the model can recognize them. Let's start with the shaker.
[ Playing Shaker ]
I've also brought my trusty cowbell.
[ Playing Cowbell ]
Well, the people have spoken.
There you have it.
[ Playing Cowbell ]
And I also brought my acoustic guitar with me here today, so we can try some of this as well. I can start with some picking.
[ Playing Guitar ]
And then, we can try some chords as well.
[ Playing Guitar ]
So, that seemed to be working pretty well, and I think that's something we can work with.
So, I can stop the recording here. In the Create ML app, I'm able to scroll back through this recording and take a look at any of the segments we've analyzed so far. This is a great place to check for anomalies, or things the model didn't get correct; we might even clip some parts of this file to add to our training set and improve the performance of our model.
And finally, when we're happy with how our model is performing, we can simply drag it to the desktop and integrate it into our application. And that's training a sound classifier in the Create ML app in under a minute, with zero lines of code.
So, you saw during the demo that there are some things to consider when collecting your training data. The first thing you'll notice is how I collected this data in directories: all the sounds that come from a guitar are placed in the Guitar directory, and likewise for a class like Drums or Background.
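As a sketch, the labeled directory layout described above might look like this (the file names here are purely illustrative):

```
TrainingData/
    Guitar/
        guitar-strum-01.wav
        guitar-pick-02.wav
    Drums/
        drums-01.wav
    Background/
        room-noise-01.wav
```

Create ML uses each directory name as the class label for the audio files inside it.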
Now, let's talk about the background class for a minute. Even though we're training a musical instrument classifier, you still need to consider what might happen when no musical instrument is being played. If you only trained your model on musical instruments but then fed it background noise, that's data it's never seen before. So when you're training a sound classifier, if you expect your model to work in situations where there's background noise, make sure to provide that as a class as well.
Now, suppose you had a file called sounds, and the file started with drums, then transitioned to background noise, and finally ended with guitar. This file, as is, is not going to be useful to drag directly into the Create ML app, because it contains multiple sound classes in one file. Remember, you have to use labeled directories to train your model, so the best thing to do in this situation is to split this file into three pieces and place them in the Drums, Background, and Guitar directories. Training will produce a much better-performing model if you split your files this way.
A few other considerations when collecting audio data. First, ensure that the data you're collecting matches a real-world audio scenario. Remember that if your app is intended to work in a variety of rooms or acoustic scenarios, you can either collect data in those scenarios, or even consider simulating those rooms, using a technique such as convolving your recordings with room impulse responses. Another important thing to consider is the on-device microphone processing. You might check out the AVAudioSession modes to select different modes of microphone processing in your application, and pick the one best suited to your app, or the one that most closely matches the training data you've collected.
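As a sketch of that last point, an iOS app could opt in to minimal microphone processing with the `.measurement` mode; whether that matches your model depends on how your training data was recorded:

```swift
import AVFoundation

// Sketch: request minimal system signal processing on the iOS microphone.
// .measurement reduces system-supplied processing, which may better match
// training data recorded without such processing (an assumption to verify
// against your own data).
let session = AVAudioSession.sharedInstance()
try session.setCategory(.record, mode: .measurement)
try session.setActive(true)
```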
And a final point: be aware of the model architecture. This is a sound classifier model, and it can do pretty well at classifying a variety of everyday sounds. But it's not something that would be suitable for, say, training a full-on speech recognizer; there are better tools for that task. So make sure you're always using the right tool for the job.
So, now you have this ML model; let's talk about how you can integrate it into your application. To make it as easy as possible to run sound classification models in your app, we're also releasing a new framework called SoundAnalysis.
SoundAnalysis is a new high-level framework for analyzing audio. It uses Core ML models, and it handles common audio operations internally, such as channel mapping, sample rate conversion, and buffering.
Let's take a look under the hood to see how SoundAnalysis works. The top section represents your application, and the bottom represents what's happening under the hood in SoundAnalysis. The first thing you'll do is provide the model you just trained in Create ML to SoundAnalysis. Then, your application will provide some audio to be analyzed. This audio first hits a channel-mapping step, which ensures that if your model expects one channel of audio, like ours did here, that's what's delivered to the model, even if you, as a client, are delivering stereo data, for example. The next step is sample rate conversion. The model we trained natively operates on 16-kilohertz audio, and this step ensures that the audio you provide gets converted to match the rate the model expects.
The final step SoundAnalysis performs is an audio buffering operation. Most of the models we're working with today require a fixed amount of audio data to process an analysis chunk, but the audio you have as a client often comes in arbitrarily sized buffers, and it's a lot of work to implement an efficient ring buffer that delivers correctly sized chunks of audio to your model. This step ensures that if the model expects around 1 second of audio data, that's always what's delivered to it.
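The buffering step can be pictured with a small sketch of my own (not SoundAnalysis code): accumulate arbitrarily sized input until a full analysis window is available, then slide forward by a hop smaller than the window so successive windows overlap.

```swift
// Illustrative sketch of windowed buffering, not the SoundAnalysis implementation.
// Input arrives in arbitrary-sized pieces; output is fixed-size, overlapping windows.
struct WindowBuffer {
    let windowSize: Int   // samples per analysis chunk
    let hop: Int          // advance per window; hop < windowSize gives overlap
    private var samples: [Float] = []

    init(windowSize: Int, hop: Int) {
        self.windowSize = windowSize
        self.hop = hop
    }

    // Append new samples; return any complete analysis windows now available.
    mutating func append(_ newSamples: [Float]) -> [[Float]] {
        samples.append(contentsOf: newSamples)
        var windows: [[Float]] = []
        while samples.count >= windowSize {
            windows.append(Array(samples.prefix(windowSize)))
            samples.removeFirst(hop)  // a real ring buffer would avoid this O(n) shift
        }
        return windows
    }
}
```

For example, with a window of 4 samples and a hop of 2, feeding the samples 1 through 6 yields the windows [1, 2, 3, 4] and [3, 4, 5, 6], each overlapping the previous by half.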
Then, finally, after the data is delivered to the model, your app will receive a callback containing the top classification results for that piece of audio. The good thing is, you don't really have to know any of this; just remember to take your audio, provide it to the SoundAnalysis framework, and then handle the results in your application. So let's talk a little more about the results you can expect to get from SoundAnalysis.
Audio is a stream, and it doesn't always have a beginning and an end the way images do. For this reason, the results we're working with might look a little different. Each result contains a time range, which corresponds to the block of audio that was analyzed for that result. The block size is specific to the model architecture; in this example, it's around 1 second, as you can see. As you continue providing audio data to the model, you'll continue to receive results containing the top classifications for each block of audio analyzed.
Now, you might notice that this second result overlaps the previous result by about 50%, and this is actually by design. You want to make sure that every piece of audio you provide has the opportunity to fall near the middle of an analysis window; otherwise, it might fall between two analysis windows, and model performance might suffer. So the default is 50% overlap on the analysis windows, although it's configurable in the API if you have a use case that requires otherwise. And as you continue providing audio data, you'll continue to get results, for as long as the audio stream continues.
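If your use case needs a different overlap, SNClassifySoundRequest exposes an overlapFactor property you can adjust. A minimal sketch; it uses the system's built-in classifier (macOS 12 / iOS 15 and later) purely so the snippet needs no custom model, and with your own Create ML model you'd create the request with init(mlModel:) instead:

```swift
import SoundAnalysis

// Sketch: adjusting the analysis-window overlap on a classify request.
// The built-in classifier identifier is used here only so the example is
// self-contained; with a trained model, use SNClassifySoundRequest(mlModel:).
let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
request.overlapFactor = 0.75   // default is 0.5 (50% overlap)
```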
Now, let's take a quick look at the API provided by SoundAnalysis. Let's say we have an audio file, and we want to analyze it using the classifier we've just trained here today. To start, we'll create an audio file analyzer and provide the URL of the file we'd like to analyze. Then, we'll create an SNClassifySoundRequest with the model we trained earlier. We'll add this request to our analyzer and provide an observer, which will handle the results that the model produces. Finally, we'll analyze the file, which will start scanning through the file and producing results.
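Put together, the steps just described might look like the following sketch. The file and model URLs are hypothetical, the observer here is deliberately minimal, and in an app you'd typically use the model class Xcode generates from your .mlmodel rather than loading one by URL:

```swift
import CoreML
import Foundation
import SoundAnalysis

// A minimal observer; a fuller implementation is discussed below.
final class MinimalObserver: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("\(top.identifier): \(top.confidence)")
    }
}

// Sketch of the file-analysis flow: analyzer -> request -> observer -> analyze.
func classifyFile(at fileURL: URL, usingModelAt modelURL: URL) throws {
    let analyzer = try SNAudioFileAnalyzer(url: fileURL)
    let model = try MLModel(contentsOf: modelURL)
    let request = try SNClassifySoundRequest(mlModel: model)

    let observer = MinimalObserver()   // keep a reference while analyzing
    try analyzer.add(request, withObserver: observer)
    analyzer.analyze()                 // scans the file, calling the observer
}
```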
Now, on your application side, you'll need to make sure that one of your classes implements the SNResultsObserving protocol; this is how you'll receive results from the framework. The first method you might implement is request(_:didProduce:). This method will be called many times, once for each new observation produced. You might consider grabbing the top classification result and the time range associated with it; this is where the logic to handle sound classification results would go in your application. Another method you'll be interested in is request(_:didFailWithError:). If analysis fails for any reason, this method will be called, and you shouldn't expect to receive any more results from that request. Or, if the stream completes successfully, at the end of the file, for example, you'll receive requestDidComplete(_:).
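The three callbacks described above might be implemented like this sketch (what you do with each result is up to your app; the printed format here is illustrative):

```swift
import CoreMedia
import Foundation
import SoundAnalysis

// Sketch of an SNResultsObserving conformer matching the callbacks above.
final class ResultsObserver: NSObject, SNResultsObserving {
    // Called once per analyzed window of audio.
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        let start = result.timeRange.start.seconds
        print(String(format: "%.2fs  %@ (%.0f%%)",
                     start, top.identifier, top.confidence * 100))
    }

    // Called if analysis fails; no further results will arrive.
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Analysis failed: \(error.localizedDescription)")
    }

    // Called when the stream completes successfully.
    func requestDidComplete(_ request: SNRequest) {
        print("Analysis complete")
    }
}
```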
So, let's recap what you've seen today. You saw how you can train a sound classifier in Create ML using your own audio data, and then take that model and run it on-device using the SoundAnalysis framework. For more information, check out our sound classification article; there you'll find an example of how to perform sound classification on your device's built-in microphone using AVAudioEngine, just like the musical instrument demo you saw today. Thank you all for listening, and I can't wait to see how you use sound classification in your apps.