Drawing Classification and One-Shot Object Detection in Turi Create
Apple's open source toolset, Turi Create, recently added tasks for Core ML model creation including Drawing Classification and One-Shot Object Detection. Learn how to quickly use these capabilities in your apps as well as new techniques for visualizing and evaluating the performance of your custom models.
Welcome to our session on Drawing Classification and One-Shot Object Detection in Turi Create.
My name is Sam and I'll be joined by my colleagues, Shantanu and Abhishek to talk to you about what we've been building to enable even richer experiences in your apps. Let's dive right in.
First, a quick refresher on Turi Create itself.
Turi Create is a Python library used for creating Core ML models.
It has a simple, easy to use task-oriented API.
Now, tasks are our term for a collection of complex, machine learning algorithms abstracted away into a simple, easy to use solution for you to build your models.
It's cross-platform, working on both Mac OS and Linux and we're extremely proud to be open source, working closely with our community on features and fixes.
We built Turi Create to enable you to create Core ML models for your intelligent applications.
Now to build these models, we integrate with a various of system frameworks, accepting data from frameworks like PencilKit and Core Motion.
Once we've built our model, we integrate with other system frameworks like Vision and Sound Analysis to create extremely compelling app experiences.
So say you'd want to build an application used to classify musical instruments.
Using microphone data, you can understand what instrument is being played and attempt to tune the instrument. Or say you'd want to use raw sensor data from the Apple Watch to detect what activities a user may be performing in the gym. Or say you'd want to use image data to be able to detect what ingredients a user may be holding and to suggest recipes using these ingredients.
All of these are possible because we have built an intelligent Core ML model.
And to build these models, we've identified a five-step pipeline to do so.
First up in our pipeline, we need to identify the task or what problem it is that we're trying to solve.
Next, we need to collect data and sometimes, lots of it.
After that, we train our model.
Next, we evaluate our model, attempting to understand the real-world performance characteristics and to potentially offer corrections.
After that, we deploy our model for use within Core ML and your application.
Now let's take a look at these five steps in code.
After we import turicreate we load our data from an SFrame on disk. An SFrame is something that Shantanu will cover in a little bit.
After this, we split our data into two sets, one used for testing and one used for training our model.
Next, we create our model with just a single line of code for running complicated machine learning algorithms.
Next, we gather metrics as to the performance of this model. And finally, we save it to disk for use within XCode and your app.
Now we think one of the great things about Turi Create is its simple and consistent API.
So no matter whether you're using an object detector, a sound classifier, an activity classifier, or any of our other tasks, the API remains consistent and easy to use. Now I've mentioned these tasks multiple times and I'd like to cover the full collection that Turi Create offers. We offer tasks that work with images and sound as well as user activities and text. We offer tasks that work with user preferences and numeric data.
And today we're extremely excited to share with you two brand new tasks. First, we have our One-Shot Object Detection task.
Now you may already be familiar with object detection.
Object detection is the ability to detect the presence and location of specific objects within a frame.
One-Shot Object Detection is a twist on this existing framework, which depending on the type of data that you're attempting to detect, can dramatically reduce the amount of data needed to train a model.
Next, we have our Drawing Classification task.
Using user input in the form of a finger on screen or the Apple Pencil, you can now attempt to classify and understand what your users have drawn into your app.
Now I'd like to take a deep dive into the One-Shot Object Detection task.
Say you'd want to build an application to take an image of a player's hand in a game of poker and to offer probabilities that this hand would occur.
Or say you'd want to build an application to take a picture of your car's dashboard and to attempt to understand what those obscure icons may mean.
Or say you'd want to build an app to detect company logos to provide your users with more information about this company.
What do all three of these applications have in common? They're all trying to detect 2D regular images.
One-Shot Object Detection is able to take advantage of these characteristics of regularity and consistency to dramatically reduce the amount of data needed to build an accurate model.
So let's use this card counting app, which Shantanu will build for us in just a minute, as a case study into how this works.
Originally in order to build an object detection model, you would need to collect vast amounts of data of your object in the real world.
In addition to this, you would also need to annotate each of these images with the location in the frame of your image.
As you can see, this is an incredibly data intensive and time-consuming task. So we built a better way.
With One-Shot Object Detection, you now need as little as one image per class that you're attempting to detect.
So instead of dozens of images of the Ace of Clubs, you now need a single image in order to build a reliable and accurate model.
So let's talk about how this works behind the scenes.
This works with a process called Synthetic Data Augmentation.
Turi Create ships with a collection of images of the real world.
We then take your source image that you have supplied and overlay onto these real-world images along with various color perturbations and distortions to simulate real world effects.
In addition to this, there's absolutely no need to annotate these images as our algorithms do that for you automatically.
Now One-Shot Object Detection works with 2D objects only.
It requires very little data and absolutely no annotation.
However, if you're wanting to detect an object that is potentially 3D or more irregular, you may be interested in our more traditional object detection framework.
Now let's take a look at this in code.
As you'd expect, our API is incredibly consistent with our other tasks using just a single line of code with our dot create method to train an incredibly accurate model. And now with code up on the screen, I'd like to invite my colleague, Shantanu to talk to you about our Three Card Poker App and to give you a demonstration of this in the real world. Shantanu.
Good morning everyone. Welcome to the last day of WWDC. Now I'm super excited to give you a first look into the One-Shot Object Detector.
The thing is, Sam and I play a lot of three card poker at work.
And to be honest, I'm getting a little too tired of winning all the time.
So I thought, you know today on stage, why don't we build an app for Sam that helps him, that tells him what his hand is, and you know, helps him make a better decision.
Now to build an app like that, we'd need a Core ML model that can detect all the cards in your playing deck.
So why don't we just go ahead and write some code to build that? Okay, so here I have a blank Jupyter Notebook which is an interactive Python environment to write some code and immediately see the output of your code after that. And we're going to use that to write some code today.
Now the first line of code that we write, and this is our favorite line of code, it's import turicreate as tc.
And I need to select the cell before I write that. And this will be your favorite line too after this session. Next, I'm going to switch to Finder to take a look at all our starter images and it looks like we have a bunch of cards here where the playing card spans the entire frame of the image. We have an Ace of Clubs, an Ace of Diamonds, Ace of Hearts, Five of Hearts, and we have a lot of images here.
So why don't we go ahead and load this up into an SFrame? An SFrame is Turi Create's scalable data structure. It's tabular and it's to represent all the data that you have.
So we're going to create a variable called starter images and Turi Create has this functionality to load images, where you just pass in the -- directly that contains all your images and it builds an SFrame out of that on its own.
Now let's take a look at what the starter image SFrame looks like.
Alright, so there are two columns in this SFrame. There's a path column and an image column. The path column contains the relative path to the image that is in the image column.
Looks good but the path column has this redundant .
/ and .png. So let's write a quick code snippet to get rid of that.
So we're going to create another column in the starter image's SFrame and call that label and we're going to write some code to extract the label out of the path. And we're going to use something like a Python slice applied to all the elements in the SFrame -- in the column of the SFrame.
So we do an element slice from the second index to the negative four index similar to a Python slice.
And next, let's explore this SFrame.
Turi Create has an explore functionality on the SFrame, which we're going to take a look at.
This pops up, a new visualization window, which gives us a way to interactively explore all the data that is in our SFrame.
And as we can see, there's three column here. There's a path, image, and label.
And it looks like we did successfully extract the label out of the path. We did get rid of the . / and .png.
We can also take a look at all the thumbnails of our images in this SFrame which we one day will do when we just printed it out.
Great. How many rows are in this SFrame? Let's take a look at that. So it looks like there are 52 rows and that's the number of cards in our playing card deck. So that looks good. But the important thing to note here is that it's one image per class of the object that we're trying to detect.
Now we looked at the data. We've looked at the task. We know the task is One-Shot Object Detection.
The next step in our five-step biplane is to take a look at the model creation process.
And consistent with all our other toolkits, the model creation process looks something like tc.oneshotobjectdetector.create and we pass in the starter images SFrame as well as the target column, which is in this case, is label.
But I won't run this line of code because this take a while and what it does behind the scenes is it not only generate all the synthetic data by applying different projections on your starter images, superimposing them on random backgrounds and applying color perturbations, but it also trains a model after that.
So instead, we're going to switch to a Cooking Show Mode, where we're going to load a model that we've already baked in the oven sometime.
So let's use the load model functionality and it's called carddetector.model. Let's see what this model looks like.
So it looks like it's a One-Shot Object Detector trained on 52 classes.
But hold on. The number of synthetically generated examples is 104,000. That's strange. We just provided 52 images.
The toolkit automatically synthetically generated 2,000 images per class and created a training set of 104,000. And that's not it.
It also synthetically generated bounding boxes for all the data that it generated because the toolkit generated the data itself.
That's pretty insane.
So we've looked at our model. The next step is let's small test this model on an image that the model hasn't seen before.
So I'm going to load up a test image. And I have a test directory which contains a test image.
And it's definitely not Sam's hand and you'll see why that it is when I show you what this test image is.
Yeah, Sam never gets this lucky. This is probably my hand. It's a Flush.
So anyway, we can see that there's an Eight of Clubs, a Nine of Clubs, and a Ten of Clubs in this image.
Let's see if the model agrees with us.
So we're going to call the predict method on the model and pass in our test image. And let's save this result in variable called prediction.
And next, let's print this prediction out. Okay. Let's take that in.
So it looks like there are three dictionaries in this list and each dictionary represents a bounding box that is part of the prediction.
Let's go through the fields in each prediction. It looks like there's a confidence, there's a coordinates, a label, and a type.
Confidence represents how confident the model is in this bounding box.
Coordinates represents the coordinates of the bounding box.
Label is what class label belongs to this bounding box. And type is what type of bounding box it is, which it's rectangle. That looks great.
And we can see that the model did detect the Ten of Clubs and Eight of Clubs and the Nine of Clubs.
But it doesn't really tell us what location the model detected it in.
So let's use a utility that the One-Shot Object Detector has to draw bounding boxes and we'll pass in the test image along with annotations, which in this case, is the prediction.
And let's call this annotated image.
And next, let's print this annotated image out.
Yeah that looks pretty good.
On an image that the model has never seen before, an Eight of Clubs with 70 percent confidence, a Ten of Clubs with 91 percent, and Nine of Clubs at 69 percent.
And notice how all the bounding boxes perfectly capture all the clubs.
So we've small tested this model on a test image that the model never saw before.
The last step is deployment on device.
And like all our other toolkits, the model has a method to export Core ML.
And let's call this Core ML model CardDetector.mlmodel.
And next, let's open this CardDetector.mlmodel in XCode. Alright. So we can see that we have a CardDetector model which is trained by Turi Create 5.6.
The inputs are image, which is a 416 x 416 color image and confidence is representative of how many classes the model was trained on and coordinates represents the coordinates of the bounding box of the classes.
This looks good. It looks like a very good Core ML model.
But why don't we have some fun and put this into action in an app that Sam can use next time he plays three card poker against me.
Alright. And here's my Three Card Poker app. So it says add an image to evaluate cards. Here I have a deck of cards and I'm just going to randomly draw from this. And this will be Sam's hand.
So let's take a photo right here.
So it looks like there's -- this looks like one of the best hands Sam has gotten in a while.
Alright. So it looks like the model thinks there's a Five of Clubs, a Three of Hearts, and a King of Hearts.
And it also detected it was a hard card. The probability of which is 74.39 percent.
Yeah. Now given that it's a high card, Sam should probably fold because it's pretty likely to get a high card. But just for fun, let's see what I got.
Yeah, that's -- well let's take a picture and then talk about my hand.
I have a Flush. Nine of Hearts, Six of Hearts, and Four of Hearts. So even though we built this app for Sam, we do sort of -- I'm still reaping the benefits of it.
Alright. So that was it for the Card Detector Demo. And to sort of recap, we started with one image per class.
We just called one line of code that automatically, synthetically generated a training data set for us. Trained a model.
We then small tested it on a random image the model had never seen before. And we then saw it in an app. And this is just one example of the kind of user experience that you can build.
We were thrilled with the sort of stuff you built last year with the Object Detector and that sort of led us to build this toolkit because we wanted to reduce your overhead on data collection and annotation.
That time has gone down from like, I don't know, a few days to like five minutes and only for 2D objects though.
And we can't wait to see what you build with the One-Shot Object Detector.
So feel free to come to our labs and try this demo out live. We will be really excited to have you. And with that, I'm going to move onto to our next toolkit for today. I'm really excited to talk to you about the Drawing Classifier.
So far Turi Create has enabled you to build apps with sound input, raw sensor data input, and images.
Today, we're adding a new form of input, the Apple Pencil.
Imagine building an app that let's your users create handsome greeting cards by recognizing their strokes and automatically replacing them with professional drawings.
Power to the user.
Or imagine building an app that breaks ground in education by helping professors grade their student's exams much faster by recognizing their strokes.
We're actually going to build this app. But before that, let's talk about the data.
The Drawing Classification task can accept bitmap-based drawings, represented as Turi Create images, but that's not all. The task can also accept stroke-based drawings, where every stroke is a time series data of x and y coordinates.
We wanted to give you as much flexibility in your data as possible. Model creation follows an API consistent with all our other toolkits, turicreate drawingclassifier.create with the training SFrame and the target column. The model that you get out of here is state of the art.
The Core ML model that you get is less than 500 kilobytes on device.
And this is GPU accelerated.
But we didn't stop there.
We trained a model on millions of drawings and published it to automatically one start your models. And not only does this help you create more accurate models, it also helps you train on as few as 30 drawings per class.
Thirty. Let that sink in. So what does the code look like? Five lines of code, five steps.
We start with importing turicreate. Looking at our data, calling model creation, evaluating our data, and then exploring to Core ML. And with that, remember that grading app that we saw which was like recognizing checks and crosses and doing some more fancy stuff? Why don't we go ahead and build that? So we're going to take a look at this app on the iPad, which helps us grade exams.
So let's say I'm a professor and I have my Apple Pencil and I want to grade my students' exams, but I want to grade them really quickly.
So let's add an image to begin grading. This is an exam of my student, Jane Appleseed and it's Math 101, which is pretty hard.
So let's take a look at the image and now let's start grading.
So one plus one is two, right? Last I checked that was still true.
So let's mark that as correct. And as you can see, the model detected the check and recorded ten points on the right-hand side, giving the student ten out of ten on this question.
One plus two is four. That's correct too. Oh I guess it's not. So erase that. I think I should go back to Math 101 maybe.
So let's mark that incorrect because the model made a mistake. So let's erase that again.
And to give you a preview into how you can improve your models, Abhishek is going to be up on stage in a few minutes.
And this app is also designed to take care of models failing. So if you want, you can check out the 11:00 AM session on Designing Apps for Machine Learning.
So let's keep grading.
So one plus two is four. That's incorrect. And the model detected the cross as incorrect. And it's a pretty serious mistake for Math 101 in college.
So let's give that a negative seven. And as you can see, the model registered the negative seven and gave the student three out of ten points on the right-hand side.
Three plus two is five. That's correct.
To give you an insight into what the model is that powers this app, this model is trained on check, cross, and ten negative scored deductions from negative one through negative ten.
So let's keep grading.
One plus 99 is.
It's a hundred, okay.
So that's incorrect. And that's another mistake the model made.
So let's correct that and mark that incorrect.
So let's give that maybe a negative five and the model registered the negative five and stored it on the right-hand side. And this last question is, it's hard but I don't know how the student this right, but they got one plus two wrong.
Anyway, let's give them a correct for this.
Okay. So this is the kind of experience that you can build with the Drawing Classifier. And why don't we go ahead and write some code to actually build a Core ML model that powered this app.
So here I have another blank Jupyter Notebook and we're going to go through that five-step pipeline again.
The first step being, the task. The task here is Drawing Classification. We've established that. So the next step is the data.
But before that, let's write our favorite line of code which is import turicreate as tc.
Now to look at the data, we're going to load up an SFrame of some data that we've already accumulated.
Let's call it train. And it's called train.sframe.
Next, let's use the explore functionality on the train SFrame and see what it looks like. Great. So it looks like we have a lot of data here. We have a negative three. That's labeled as a negative three.
A check labeled as a check.
Again, this SFrame has two columns, bitmap and label.
And for this demo, we're going to be using bitmap-based drawings, but you can also have stroke-based drawings instead of the bitmap column.
So we have a bunch of drawings here. If we scroll through it, we have a cross, another check, some negative score deductions.
Looks pretty good. How many examples are there, 4,436? Okay, that's quite a few examples. You might be wondering, "Well how did you collect all that data. How did you get all that data?" So three of us got into a conference room with iPads and Pencils and we just drew these strokes on our iPads for about an hour and then just converted it to an SFrame.
So data collection overhead for this toolkit is also pretty low.
So now that we've looked at our data, let's go ahead and look at what the model creation process looks like.
So we have drawingclassifier.create. We pass in the training SFrame and we pass in the target column, which in this case, is label.
And I'm also going to pass into a max iterations perimeter of one to just give you a feel of what the training process looks like.
Behind the scenes what this is doing is it's resizing all our images down to 28 x 28 and then parsing them through the neural network for one iteration.
And looks like that's done. But this model is not going to be good enough for our use case.
So again, let's switch to Cooking Show Mode and open the door of the oven and take out a model from it.
So we have the load model functionality and we load up the grading model.
Alright. So if we take a look at this model, we have a Drawing Classifier model that's trained on 12 classes with the feature column being bitmap, the target column being label, and it's trained for 200 iterations and only for 30 minutes.
So after training for only 30 minutes we got a model with a pretty high training accuracy. That's pretty good. So now that we have the model, the next step is to evaluate this model.
But I'm going to skip that because my colleague, Abhishek will be up here in a few minutes and not only will he show us how to visualize the model's mistakes, but he'll also show us how to establish a feedback loop of the model and improve it.
So let's skip ahead to exporting to Core ML.
And consistent with all our other toolkits, we have an export coreml method on the model and it's not CardDetector, it's Grading.
So let's call it Grading.mlmodel and then let's take a look in XCode.
Great. So this model has bitmap as its input which is a grayscale image of size 28 x 28 and the outputs are label probabilities and class label.
Label probabilities being the probabilities of all the classes that the model was trained on and class label being the class label of the top prediction.
So this looks good.
We worked really hard to bring the size of the Core ML down as much as we could because we wanted this model to be as portable as possible for your use case. And we brought it down to 396 kilobytes. That's kilobytes.
So now that we've taken a look at the Core ML model and we've taken a look at what the final app that we had was, let's also take a look at what the Swift code you need to write to integrate this model into your apps.
PencilKit released their API this past week and they have a Peek at Canvas view, which has a Peek at Drawing, which is the drawing data model.
You can extract the drawing out of the Canvas view as a UI image using that line of code. And next, you can crop it using again, API from PencilKit to crop it down to the bounds of the drawing where the user drew their strokes.
And now that you have this cropped, your image that is ready to be consumed by the Core ML model, you can use the Vision API and that's all you need.
To recap, we started with some drawings and their labels.
We interactively explored our data. We saw what the data looked like.
Then we trained a model or loaded it from the oven, exported it to Core ML, and then deployed using PencilKit and Vision.
But I did miss one thing. I did skip it on purpose.
I did skip evaluation and that's a pretty important part of your workflow.
And if you actually evaluate your models and make them better, you won't see errors like you saw in the app today.
So with that, I'm going to invite my colleague, Abhishek to take us through an exciting model evaluation journey. Abhishek.
Thank you. So today I'd like to talk to you about Model Evaluation and why it's really important in building the best possible user experiences.
But first, let's understand how evaluation works.
Let's take the Drawing Classifier model we trained in the previous demo.
And if we pass a drawing through it, we get a prediction.
Using ground truth data, we can identify whether this prediction is correct or sometimes incorrect.
But using a single data point to evaluate a model doesn't give us a full picture of what the model is doing.
So what do we do? Well, we use more data.
This is our test data set.
We can run the model across the data and get correct and incorrect predictions for each of the drawings.
By grouping the drawings together in correct and incorrect groups we get a sense for how well the model is performing.
This is analogous to the metric of accuracy, a measure of the correct predicted drawings over the total number of drawings in our data set.
And there are other metrics we can use to help us quantify our model's performance.
To access these in Turi Create, simply call the evaluate method available to use on the model object and pass in that test data set.
A dictionary of these metrics will then be returned to now evaluate our model. But this actually doesn't tell us the whole story.
For instance, take a model with an accuracy of 85 percent.
Initially that seems pretty good to us, doesn't it? Well, if we dig down into the per class accuracy, we see that the class of four has a much lower accuracy than the rest of the other classes.
And this is a case that you usually might run into.
Well, what's causing this low accuracy? One thing might be that there might be a low number of examples present for that particular class.
To correct for this we may want to add more examples.
But there are other things that can be going on.
For instance, we've noticed that in some data sets the annotations are incorrect.
In the drawing to your left, you see that the model has predicted this drawing to be one, but the annotation is seven.
And when visually inspecting this drawing, we can see that the annotation probably should be one.
So we can go and correct the annotation.
But in other instances such as the one to your right, the model has predicted it to be four and the annotation is one but it's unclear whether the annotation or the model is incorrect.
There also may be systematic biases present in your model.
For instance, in this example, the model has predicted sevens with bars to be incorrect as three but correctly predicts the sevens without bars. And to help you identify these issues, we've created an interactive visualization tool. Let me show you how it works. Alright. So let's go back to the Jupyter Notebook that Shantanu was showing us and load in a test data set that we can use to evaluate our model.
Now that we've loaded in our test data set, let's call that evaluate method passing in that test data set. So now that we have our metrics object, we can simply call the explorer method on that metric's object to now look at our interactive visualization.
And as you can see, a window popped up and it has three distinct sections, an Overview section, which gives us the overall accuracy of the model as well as the number of iterations in the model that we've trained, the Drawing Classifier model.
The second section is a Per Class breakdown of the metrics as well as a couple of example images that have been correctly been predicted by the model.
And lastly, there's an Error section that shows us all of the errors that the model has made. So let's take a look at the first class, negative four. We see that class has an accuracy of 20 percent.
If we click on it, we see that there are actually four errors that have been made by the model.
But something else that we see is that the number of elements that we've tested is five.
And that's a pretty low number.
So to kind of determine whether the model is performing improperly on this class or simply we have a low number of examples that we're testing, we might want to go back and add more examples to this class to get a better determination of whether the model is performing incorrectly or simply we don't have enough examples.
Let's go to class of negative two. We see something that stands out when we filter by this class.
There are 11 incorrect predictions of the same type, meaning that the actual annotation is negative two, but the model has predicted all of these examples to be negative seven.
But when we look through these examples that have been misclassified, we see that some of these examples are, in fact, negative seven.
And this is a case where the data set has some misannotated data points.
So we might want to go back and correct those annotations to get a better indication of our model's performance.
We've also included a number of other metrics such as F1 Score Precision and Recall which may be more relevant to your use cases when evaluating your models.
So, a recap.
We now allow you to interactively visualize model predictions.
This can help us easily spot annotation errors as well as identify those systematic biases we talked about.
But now that we've identified these issues, what steps can we take to improve the model? Well, let's take a look at our pipeline.
First, we have our test data set.
We've passed it through the Drawing Classifier model and used our interactive visualization tool to identify some issues.
And we may want to add more examples to correct for some of these issues.
We also may want to remove incorrectly trained data as well as maybe update incorrect annotations.
In Turi Create, we enable you to load data as well as sort and filter data if you want to remove incorrect training data.
If you want to add more data, simply use the append command.
And new in Turi Create 6.0 we've introduced an annotation tool. To use it, pass in the data set that you want to annotate, and a GUI pops up with two distinct sections.
To your left, you can see the image that you want to annotate and to your right you see a labels column.
You can add labels and annotate your data points.
But we also recognize that if you're going through a large data set it may be annoying to kind of spend that time to go through the entire data set. So we've included image similarity and we're going to help you quickly go through your data set.
We've also included a Table view that allows you to scroll through your data set, look at annotations, maybe identify those systematic biases we talked about, and add multiple annotations to multiple images.
So a summary of this session.
Sam and Shantanu showed us the new One-Shot Object Detector Toolkit.
Shantanu upped Sam's game with a Three Card Poker app and was able to build a robust model with only one example per class. Next, we looked at the new Drawing Classifier ToolKit.
In it, we saw the new integration with the Apple Pencil.
The interactive visualization tool showed us how to evaluate models interactively and hopefully showed us a process for building better experiences for our users.
And finally, we introduced a new annotation tool that not only allows you to label data, but quickly go through your data sets with the use of image similarity.
Please install Turicreate. We hope you've enjoyed this session and join us in the labs at 2:00 if you want to see any of the demos or have any questions.
We encourage you to check out these related sessions.
Thank you for coming to WWDC 2019.
Looking for something specific? Enter a topic above and jump straight to the good stuff.
An error occurred when submitting your query. Please check your Internet connection and try again.