It's now easy to create an app for fitness or sports coaching that takes advantage of machine learning — and to prove it, we built our own. Learn how we designed the Action & Vision app using Object Detection and Action Classification in Create ML along with the new Body Pose Estimation, Trajectory Detection, and Contour Detection features in the Vision framework. Explore how you can create an immersive application for gameplay or training from setup to analysis and feedback. And follow along in Xcode with a full sample project.
To get the most out of this session, you should have familiarity with the Vision framework and Create ML's Action Classifier tools. To learn more, we recommend watching “Build an Action Classifier with Create ML,” “Explore Computer Vision APIs,” and “Detect Body and Hand Pose with Vision.” We also recommend exploring the Action & Vision sample project to learn more about adopting these technologies.
Whether you are building a fitness coaching app, or exploring new ways of interacting, consider the incredible features that you can build by combining machine learning with the rich set of computer vision features. By bringing Create ML, Core ML, and Vision API together, there's almost no end to the magic you can bring to your app.
Hi, my name is Frank Doepke, and together with my colleague Brent Dimick, we're going to explore the Action & Vision application. The theme that we would like to set for today is that we use the phone as an observer and give feedback to our users. What do I mean by that? We already see that at plenty of sporting events, people are using their phones to record and film the action. But our thought is: can we use our iPhone or iPad to step in as a coach? We already have it with us when we go to the gym.
We want to use the camera and sensors so that, instead of us looking at and observing what's on the device, the device observes what we are doing and gives us some real-time feedback. You already have the perfect tool in your pocket.
We have a high quality camera, a fast CPU, GPU and Neural Engine. Now we have a set of comprehensive and cooperative APIs that make it easy for you to take advantage of all the hardware. Last but not least, all of this can happen on device. Now that is important for two reasons.
First, we want to make sure that we preserve the privacy of our users by keeping all the data on the device. And second, you don't incur the latency of analyzing something in the cloud. In sports and fitness today, we can see that analysis can really help everyone improve. From the billions of sports enthusiasts up to the professional athlete, everyone can benefit from sports analysis. But instead of just looking at the device, as we do today for so many things, and watching videos of how to do it, we want to take it to the next level.
We want to improve in our sport or fitness activity, and we think this can be done with some common kinds of analysis. First, what did our body do? How did we move? Then we might want to look at which objects are in motion - think of the ball in a soccer game or a tennis match. We need to understand the field of play - the tennis court, or the soccer goal. And then we need to give feedback to the user about what we actually saw and what happened.
For our session we picked an example of a sport that is simple and easy to understand, and then we wrote a fun little application around it that you can download the source code for and follow along.
So let me introduce the Action & Vision application. We picked the game of bean bag toss. It is a fun game for everyone to play. It's very simple.
We have two boards set up 25 feet apart. Each board has a regulation size of two feet by four feet, with a six-inch diameter hole right in the center.
Players take turns throwing the bean bags at the boards, scoring points for landing on the board and extra points for landing in the hole. Now, you might think of bean bag toss as just a pastime, but everybody is competitive and wants to win. So you might ask yourself: why did I miss my shot? To answer that, you want to see how the bag is flying, what your body pose was when you released the bag, and how fast your throw was. Of course, you need to keep score, and perhaps you want to show off in front of your friends by doing some different shots.
So let's head outside and actually play a game. All right. Here we have our phone already set up. We now go into the Live Action mode to record the session. The first thing we need to do is find our board, so we pan our camera over.
Now it's stable and we're waiting for the player, and now we go. We have our player. Let's see how I did.
You can see from the orange line that it detected how I was throwing.
And I could see the trajectory of my bean bag flying. Wow, I even got lucky on that shot. The other part I would like you to pay attention to is that we have a skeleton on top of the player, so we see all the key points of our movements. With that, we can understand the release angle when we're throwing, and we can also see what kind of throw we did - overhand, underhand, and even, as you see there, a trick shot where we are throwing under the leg.
You will also see the speed at which the bean bag was traveling.
And you have the score on the bottom left, and the kinds of throws that we did on the bottom right. Once we are done with all eight throws, we get a Summary View. I can see all my different trajectories and see what actually worked best for me. I see the average speed at which I was throwing, as well as the release angle, and of course the final score.
So that is the Action & Vision app. We hope that you will enjoy it. Now let's look at how we actually created this application. We have some key algorithms that have to play together to make all of this happen. We start with the prerequisite phase. As you saw, the camera was on a tripod; somehow we need to have it stabilized. Then we have the game setup - this is where we understand the playing field. And then of course comes the game play. The prerequisite is first to find the board; that was the panning move that you saw at the very beginning. Once we have the board, we need to ensure that we have scene stability. That means we know that the camera is either set on a tripod or otherwise stabilized. Now we're getting ready to play in the game setup part: we measure the board, then we find our player, and we are ready to roll. When the game play starts, we find all the throws, then we analyze the throw type - is it overhand, underhand, or under the leg? And last but not least, we measure the speed. Of course, we're interested in which algorithms we use. To find the boards, we trained a custom model and use a VNCoreMLRequest to run the inference on that model; it tells us where the board is. Once we have the board, we use the VNTranslationalImageRegistrationRequest to analyze for scene stability. We measure the board by running a VNDetectContoursRequest, which gives us the outline of the board itself, and then we use the VNDetectHumanBodyPoseRequest, which is new this year, to find the human. When we are ready to play, we use the VNDetectTrajectoriesRequest, which finds the throw of the bag, and then we have a new model that we trained in Create ML and run through Core ML to classify what kind of throw we have done. Last but not least, we use the measurements of the board together with the analysis of the trajectory to measure the speed of our throw.
Now, to guide you through all of this, we have an icon that will help you. You see that we have our prerequisites stage, the game setup stage, and the game play. Let's dive into the details. The first part is that we need to detect the boards and recognize them. So we created a custom Object Detection model. We used Create ML with its Object Detection template.
We brought along our own training data, where we had images with the boards in them and negatives where there is no board, and we trained the model. You will hear a little bit more later in the session about how we did the training. Once we have the model, we can run the inference through Vision. Now, we saw that we need to fixate the camera. Why do we have to do this? Some of our algorithms require a stable scene, but it also gives us some other advantages, because we only need to analyze the playing field once. We know that after that, it doesn't change.
Another neat part about this is that it shows that the user has a clear intent of actually capturing it. So we don't need a start button; you don't need to touch the screen to do anything. We're doing the scene stability through registration. For that, we're using the VNTranslationalImageRegistrationRequest. Now that is a mouthful, but what it does is analyze the movement from one frame to the next. What you saw when the camera was panning is that we had a movement of 10 pixels between each of the frames.
Once the camera came to rest, that movement went down to zero. So we are now below a certain threshold and we know that our scene is stable - the camera is not moving anymore. Next, we do the contour detection, and for that we use the VNDetectContoursRequest. We use the bounding box that we got from our object detection as the region of interest. Then we simplify these contours for their analysis. By using these two techniques together, we only look at the contours that we get from the board and not from the whole scene. If you want to learn more about the contour detection, you can look at our Explore Computer Vision APIs session. What we need next is our player, and for that we use the VNDetectHumanBodyPoseRequest.
It gives us the points of the body joints, like the elbows, the shoulders, the wrists, and the legs. We can use these points to analyze the angle between the joints, so we know, for instance, when our arm was bent. More details on how to use the body pose request can be found in the Detect Body and Hand Pose with Vision session. Also keep this in mind, because we're going to use it for the action classification as well. Now, after so many slides, you want to see some code. So let me hand it over to my colleague Brent, who will walk you through that. Thanks, Frank.
Hi, I'm Brent from the Core ML team. I'll be walking through some of the code of the app that Frank was talking about. And as Frank mentioned, not only did we build this app for a session today, we're also making it available for download. So if you'd like, you can pause here, download the app, and follow along with me. You can find it linked to this session in the resources section.
All right. Let's dive in. The first thing I'd like to show you is how the app progresses through various states of the game. The app uses a game manager to manage its state and communicate that state with the view controllers.
We'll see these states as we progress through the app, and the GameManager will notify its listener view controllers of state changes. Also note that the GameManager is a singleton, which is used throughout the app.
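As a rough idea of what that looks like, here is a minimal sketch of a state-notifying singleton. The state names and the observer API are assumptions based on the description above, not the sample project's exact code.

```swift
import Foundation

// Sketch of a state-managing singleton similar in spirit to the app's GameManager.
protocol GameStateChangeObserver: AnyObject {
    func gameManagerDidUpdate(state: GameManager.State)
}

final class GameManager {
    static let shared = GameManager()          // singleton used throughout the app

    enum State {
        case setup, detectedBoard, detectingPlayer, trackingThrows, showingSummary
    }

    private(set) var state: State = .setup {
        didSet { notifyObservers() }           // every transition is broadcast
    }

    private var observers = NSHashTable<AnyObject>.weakObjects()

    private init() {}

    func startObservingStateChanges(_ observer: GameStateChangeObserver) {
        observers.add(observer)
    }

    func transition(to newState: State) {
        state = newState
    }

    private func notifyObservers() {
        for case let observer as GameStateChangeObserver in observers.allObjects {
            observer.gameManagerDidUpdate(state: state)
        }
    }
}
```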
Next, let's take a look at the main storyboard. When a user launches the app it begins with the Start and Setup instructions screens.
Next, the Source Picker is brought up so that either the live camera or an uploaded video is used as input. The SourcePickerViewController handles the selection of these input options. After that, the app segues to the RootViewController. Now, the RootViewController is responsible for a couple of things in our app, so let's take a closer look at it. The first thing the RootViewController is responsible for is hosting the CameraViewController. When the RootViewController loads, it creates an instance of the CameraViewController to manage the buffers of frames coming from either the camera or the video.
The CameraViewController has an output delegate, which is used to pass those buffers to the appropriate delegate view controllers.
Once the RootViewController sets up the CameraViewController, it calls startObservingStateChanges, which registers it to be notified by the GameManager of game state changes.
This corresponds to the second responsibility of the RootViewController, which is to present and dismiss overlaying view controllers based on the game state. The RootViewController has an extension where it conforms to the GameStateChangeObserver protocol. As the GameManager notifies its observers of game state changes, the RootViewController listens to these state changes to determine which other view controller to present. This could be the SetupViewController, the GameViewController, or the SummaryViewController.
The SetupViewController and GameViewController classes have extensions to conform to both the GameStateChangeObserver and CameraViewControllerOutputDelegate protocols, which means when one of those view controllers is presented by the RootViewController, it is also added as a GameStateChangeObserver and it becomes the CameraViewController's outputDelegate.
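To make that buffer hand-off concrete, here is a hedged sketch of how such an output delegate might be declared; the delegate method signature is an assumption for illustration, not necessarily the project's exact definition.

```swift
import AVFoundation
import ImageIO
import UIKit

// Whichever view controller is currently presented becomes the output delegate
// and receives every frame from the camera or the video file.
protocol CameraViewControllerOutputDelegate: AnyObject {
    func cameraViewController(_ controller: CameraViewController,
                              didReceiveBuffer buffer: CMSampleBuffer,
                              orientation: CGImagePropertyOrientation)
}

final class CameraViewController: UIViewController {
    weak var outputDelegate: CameraViewControllerOutputDelegate?
    // Capture-session / AVPlayer setup elided.
}
```

When the RootViewController presents the SetupViewController or GameViewController, it simply points outputDelegate at the newly presented controller so subsequent buffers flow to it.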
We just talked about how the app progresses through game states and how it passes buffers to the view controllers. Next, let's take a closer look at some of the key functionality that Frank was talking about. We're going to jump into the SetupViewController, because this is the first view controller that the RootViewController presents. When the SetupViewController appears, it creates a VNCoreMLRequest using the object detection model that we created in Create ML to detect our boards. We used the new object detection transfer-learning algorithm with the Object Detection template.
Then, as the SetupViewController starts receiving buffers from the CameraViewController in its CameraViewControllerOutputDelegate extension, it starts performing these Vision requests on each buffer in the detectBoard function. Here, the app takes the results from the requests, which are the detected objects - in our case, our game boards - and filters out low-confidence results.
If it finds a result with a high enough confidence, it draws a bounding box on the screen around the detected object and then progresses from detecting the board to detecting the board placement. The app then instructs the user to align the bounding box with the board location guide, which is already present on the screen. Once the object bounding box is placed within the board location guide, the app progresses to determining scene stability.
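Here is a simplified sketch of that detection step. The GameBoardDetector model class name and the 0.9 confidence threshold are placeholders, not the sample project's exact values.

```swift
import CoreML
import CoreMedia
import ImageIO
import Vision

// Wrap the Create ML object detection model in a Vision request (GameBoardDetector
// stands in for the Xcode-generated model class; force-try for brevity).
let boardDetectionRequest: VNCoreMLRequest = {
    let coreMLModel = try! GameBoardDetector(configuration: MLModelConfiguration()).model
    let visionModel = try! VNCoreMLModel(for: coreMLModel)
    return VNCoreMLRequest(model: visionModel)
}()

func detectBoard(in buffer: CMSampleBuffer,
                 orientation: CGImagePropertyOrientation) throws -> VNRecognizedObjectObservation? {
    let handler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
    try handler.perform([boardDetectionRequest])

    // Keep only confidently detected boards.
    let results = boardDetectionRequest.results as? [VNRecognizedObjectObservation] ?? []
    return results.first { $0.confidence > 0.9 }
}
```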
One thing to note is that the app will only guide the user to move the board into the boardLocationGuide when the user is using Live Camera Mode. It will not guide the user to move the board during video playback. During video playback, the app assumes that the board is placed on the right side of the video. We just saw how the app detects our game boards and guides our users to place the board in the expected location in the camera frames.
Next, let's take a look at how the app determines that the scene is stable.
Let's go back and look at the CameraViewControllerOutputDelegate extension of the SetupViewController.
It uses a VNSequenceRequestHandler, because there will be Vision requests performed across a series of frames to determine the scene stability. The app iterates over fifteen frames in order to make sure the scene is actually stable.
As the SetupViewController receives buffers from the CameraViewController, it performs a VNTranslationalImageRegistrationRequest for each buffer against the previous buffer to determine how aligned those two buffers are.
When the view controller receives the results from the request, it appends the point from the transform to the sceneStabilityHistoryPoints array and then updates the setup state again. When the setup state is in the detecting-player state, as it is now, the view controller uses a read-only computed property called sceneStability to calculate whether the scene is stable or not. This property calculates the moving average of the points stored in the sceneStabilityHistoryPoints array. If the distance of the moving average is less than 10 pixels, the app considers the scene to be stable.
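A minimal sketch of that approach might look like the following; the property names and the 15-frame window come from the description above, while the exact math in the sample project may differ.

```swift
import CoreGraphics
import Vision

private let sequenceHandler = VNSequenceRequestHandler()
private var sceneStabilityHistoryPoints = [CGPoint]()

// Register the current frame against the previous one and record how far it moved.
func analyzeSceneStability(previousBuffer: CVPixelBuffer, currentBuffer: CVPixelBuffer) throws {
    let request = VNTranslationalImageRegistrationRequest(targetedCVPixelBuffer: currentBuffer)
    try sequenceHandler.perform([request], on: previousBuffer)
    if let alignment = request.results?.first as? VNImageTranslationAlignmentObservation {
        let transform = alignment.alignmentTransform
        sceneStabilityHistoryPoints.append(CGPoint(x: transform.tx, y: transform.ty))
    }
}

// Scene is considered stable once the moving average of recent frame-to-frame
// movement is less than roughly 10 pixels.
var sceneStability: Bool {
    guard sceneStabilityHistoryPoints.count > 15 else { return false }
    let recent = sceneStabilityHistoryPoints.suffix(15)
    let sum = recent.reduce(CGPoint.zero) { CGPoint(x: $0.x + $1.x, y: $0.y + $1.y) }
    let mean = CGPoint(x: sum.x / CGFloat(recent.count), y: sum.y / CGFloat(recent.count))
    let distance = (mean.x * mean.x + mean.y * mean.y).squareRoot()
    return distance < 10
}
```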
Once scene stability is found, the app can progress to detecting the contours of our game board. So now, let's take a look at how the app does that. We'll look back at the CameraViewControllerOutputDelegate extension of the SetupViewController. This time, when the view controller receives a buffer, the setup state is detecting board contours, so the view controller calls detectBoardContours. This function uses the new VNDetectContoursRequest.
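As a rough sketch of what a detectBoardContours function could look like (the contrast value is illustrative, and the contour post-processing is elided):

```swift
import CoreGraphics
import Vision

// Contour detection limited to the board's bounding box found by object detection.
// boardBoundingBox is a normalized rect in Vision's coordinate space.
func detectBoardContours(in buffer: CVPixelBuffer, boardBoundingBox: CGRect) throws -> VNContoursObservation? {
    let request = VNDetectContoursRequest()
    request.contrastAdjustment = 2.0            // illustrative value
    request.regionOfInterest = boardBoundingBox // only analyze the board region

    let handler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:])
    try handler.perform([request])
    return request.results?.first as? VNContoursObservation
}
```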
Notice that the boardBoundingBox that was found earlier, when we were detecting our boards, is used to set a region of interest for this request. This will cause the request to only be performed in that region. The app then performs an analysis of those contours to find the edge of the board and the hole of the board. Once the app finishes detecting contours, the game state moves to the detected-board state. Since the SetupViewController is also a GameStateChangeObserver, the following code will be run on game state changes.
In this case, the game state is the detected-board state, so the app lets the user know that a board has been detected, and the game state is changed to the detecting-player state. At this point, the app has found our game board, made sure it's placed correctly, determined that our scene is stable, and found the contours on the game board. That completes the responsibilities of the SetupViewController, so we can move on to the next view controller. We'll take a quick look back at the RootViewController, where we can see that since the game state is now the detecting-player state, the next view controller that will be presented is the GameViewController. This means that the GameViewController will be added as a GameStateChangeObserver and it will become the CameraViewControllerOutputDelegate. Since the GameViewController is now the CameraViewControllerOutputDelegate, it will receive buffers from the CameraViewController and execute the following code on each buffer.
The GameViewController will perform its detectPlayerRequest, which is an instance of the VNDetectHumanBodyPoseRequest. When the view controller receives the results from this request, it passes them to the humanBoundingBox function. This function filters out low-confidence observations and returns the bounding box of the person who enters the frame. Once this happens, the app moves its game state to the next phase. Let's remember this humanBoundingBox function, because we'll hear about it again a little later.
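Here is a hedged sketch of deriving that bounding box from a body pose observation; the 0.2 confidence threshold is illustrative, and the real humanBoundingBox also stores the observations for the later action classification.

```swift
import CoreGraphics
import Vision

let detectPlayerRequest = VNDetectHumanBodyPoseRequest()

// Compute a person's bounding box from the body-pose joints, ignoring joints
// Vision isn't confident about.
func humanBoundingBox(for observation: VNHumanBodyPoseObservation) -> CGRect? {
    guard let joints = try? observation.recognizedPoints(.all) else { return nil }
    let confidentPoints = joints.values.filter { $0.confidence > 0.2 }
    guard !confidentPoints.isEmpty else { return nil }

    let xs = confidentPoints.map { $0.location.x }
    let ys = confidentPoints.map { $0.location.y }
    return CGRect(x: xs.min()!, y: ys.min()!,
                  width: xs.max()! - xs.min()!,
                  height: ys.max()! - ys.min()!)
}
```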
Next, Frank is going to tell you about detecting trajectories of the bean bags while the game is being played.
Frank? All right, now we have our game play. So let's look at the trajectory detection.
The VNDetectTrajectoriesRequest finds objects that move through the scene, and it also filters out the noise from movements that we might not be interested in. But to actually use it, we need a bit of a better understanding of how it works. So let's look at this. Here we will see a throw.
But I want to peel back the cover a little bit so that you can see, under the hood, what we are actually using for the analysis. So this was our throw.
But this is not what the algorithm looks at. It looks at what we call a frame differential, which makes it much easier to highlight the objects that are moving, because they change from frame to frame. There is a bit of noise that we filter out. So what we're going to do is take a whole sequence, and we see in this section the bean bag flying. We use our new VNDetectTrajectoriesRequest, but it's a bit of a special request that's new in Vision this year; it's what we call a stateful request. That means we need to keep this request around - which is not mandatory for other requests in Vision - but here we need to do it because it builds state over time. So we feed it the first frame and nothing happens. We continue feeding frames - the third, the fourth - and now we get to the fifth frame and we actually get a trajectory detected, because we now have enough evidence. We cannot see from a single frame whether something is moving; we need a bit of evidence over time, and that's why it's a stateful request. The throw, of course, gets reported back in a trajectory observation.
But it started much earlier. I can use the timeRange of the observation to know when my throw started. So now we continue feeding frames to our request, and our trajectory gets refined over time. Let's look a little bit more at the trajectory observation that we are getting back.
Here I have composited together all the frame differentials of the whole throw. What we are getting back are the points, which are the centroids of these objects - the detected points. On top of that, we also get the projected points. These are five points that perfectly describe the parabola on which the object has travelled. In addition, you get the equation coefficients that describe the parabola, which is y = ax² + bx + c. If I use these parts, I can go and create a nice smooth parabola for a nice visualization. Now, you noticed there's a second parabola here on the bottom. This was created by the shadow of the bag flying.
I can actually get multiple trajectories at the same time and differentiate between them by using the UUID. Now, we know that this one is down at foot level, so it's unlikely to be the bag we care about; we can ignore it and only focus on the top one. So how do we use the VNDetectTrajectoriesRequest? When we create it, we give it a frame analysis spacing. Again, let's look at our graphic. When we set the spacing initially to a time of zero, we're going to analyze all the frames, but you can set that spacing to something else.
And by doing that, we are going to analyze fewer frames. That helps by reducing the computational cost, which is important, particularly on older devices.
Next, I can also specify the trajectory length that I'm looking for. That allows me, for instance, to filter out small, spurious movements that I'm not interested in. And then we have our completion handler, which is, as usual, the place in Vision where we deal with our results. In addition to that, we have two properties. Looking at the objects we see in the scene, they have different sizes: on the left we have the arm throwing, then we have our bean bag flying, and there's some noise on the very right-hand side. I indicated that by showing the enclosing circle of each object. By setting the minimum object size, I can filter out the noise of the very small parts, because I know the size of the bean bag that I expect in the scene. And on the other side, I can also use the maximum object size to filter out objects that are much larger and that I don't care about. So I would never get a trajectory from them and can really focus, in this case, purely on the bean bag. Now, a few things to keep in mind when we use the trajectory detection. It requires a stable scene - hence the prerequisite that we have the phone stabilized on a tripod or otherwise fixated. Objects have to travel on some kind of a parabola; now, a straight line is also a parabola, and that allows us to filter out spurious movements that we might not be interested in. You also need to feed in sample buffers with timestamps, because we are using the timestamps in our analysis. If your ball, for instance, bounces or leaves the frame, we get a new trajectory. Any time this happens, you have to combine those trajectories, and you can do this easily by checking, for instance, whether the last point of the previous trajectory matches up with the first point of the new trajectory.
It helps to use the region of interest. If you know where you expect the movement to happen, you can filter out a lot of the background noise that otherwise happens around it. Last, but not least, use your business logic.
For instance, in our example we knew that the bean bags would only travel from the player toward the board. We have not yet encountered a board that throws the bag back at the player.
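Putting those pieces together, configuring the request might look roughly like this. The spacing, length, radius, and confidence values are illustrative, and the object-size property names have varied across SDK releases.

```swift
import CoreMedia
import Vision

// Stateful request: keep this single instance alive and feed it every frame.
let detectTrajectoryRequest: VNDetectTrajectoriesRequest = {
    let request = VNDetectTrajectoriesRequest(
        frameAnalysisSpacing: .zero,     // analyze every frame; use a larger interval on older devices
        trajectoryLength: 15             // minimum number of points before a trajectory is reported
    ) { request, _ in
        guard let observations = request.results as? [VNTrajectoryObservation] else { return }
        for trajectory in observations where trajectory.confidence > 0.9 {
            // detectedPoints are the centroids seen so far, projectedPoints are the
            // five points of the fitted parabola, equationCoefficients hold
            // y = ax² + bx + c, timeRange tells us when the throw started, and the
            // uuid distinguishes simultaneous trajectories (e.g. the bag vs. its shadow).
            print(trajectory.projectedPoints,
                  trajectory.equationCoefficients,
                  trajectory.timeRange.start.seconds,
                  trajectory.uuid)
        }
    }
    // Ignore moving objects much smaller or larger than a bean bag
    // (normalized to the image dimensions).
    request.objectMinimumNormalizedRadius = 0.003
    request.objectMaximumNormalizedRadius = 0.5
    return request
}()
```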
Now, again, I would like to hand it over to Brent to look at the code. Brent? Thanks, Frank. Let's take a look at how we're detecting trajectories. The app performs the trajectory request on each buffer from the CameraViewController. We can see this happening in the GameViewController. For each buffer, the GameViewController performs its detectTrajectoryRequest, which is an instance of the VNDetectTrajectoriesRequest.
It's important to note that the detectTrajectoryRequest is performed on its own queue, separate from the camera output queue. Once the app receives the results from the detectTrajectoryRequest, it processes them in the processTrajectoryObservations function.
It's here in this function that the app tracks information about each trajectory, like the duration and the points detected. It also updates the trajectory region of interest and checks whether the trajectory is still in flight. If the detected trajectory points are outside the region of interest for more than 20 frames, the app considers the throw to be complete. It's important to note that the points property has a didSet observer which calls the updatePathLayer function. This function updates the trajectory path in the trajectory view and checks whether the trajectory points are in the region of interest. This function also calculates the release speed of the trajectory, but we'll see more on that later. Right now, Frank's going to tell you about detecting the type of throw a player is making.
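A minimal sketch of performing the stateful request off the camera's output queue, assuming a delegate method shape like the one shown earlier:

```swift
import CoreMedia
import ImageIO
import Vision

// Dedicated queue so trajectory analysis never blocks the camera output queue.
let trajectoryQueue = DispatchQueue(label: "com.example.trajectory", qos: .userInteractive)

// Called from the CameraViewControllerOutputDelegate method for each frame.
func process(_ buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) {
    let handler = VNImageRequestHandler(cmSampleBuffer: buffer, orientation: orientation, options: [:])
    trajectoryQueue.async {
        do {
            // Reuse the same stateful request instance (configured above) for every
            // buffer so it can accumulate evidence across frames.
            try handler.perform([detectTrajectoryRequest])
        } catch {
            print("Trajectory detection failed: \(error)")
        }
    }
}
```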
Thank you, Brent. The next part we need to do is look at how we identify the type of throw. We created a custom Action Classification model using Create ML, for which we used our own training data. We collected videos of the throw types we wanted to classify, but also videos of just walking or picking up the bags, so that we can filter those out. Brent will later talk about some details of how we trained the model. The action classification uses body pose through Vision.
Just like the trajectory detection, it builds evidence over time. Let's look at how this works. Here's the sequence of the throwing action, where we first have the player's body movement.
Then, we detect the moment of the throw from our trajectory detection. We accumulate body poses from the VNDetectHumanBodyPoseRequest and take 45 frames around the point where the throw is detected. That window encapsulates the full throw movement. We merge the 45 body poses into one MLMultiArray and feed it into the Core ML model. From that we get a label, which is the type of throw, and a confidence value. Let me hand it back to Brent to show you how this all looks in the code. Thanks, Frank.
We're going to look at how we determine the last throw a player made. We'll keep looking at the CameraViewControllerOutputDelegate extension of the GameViewController.
For each buffer received, not only is the GameViewController tracking the trajectory, it's also detecting the key points of the player with a VNDetectHumanBodyPoseRequest and storing those points as observations. I mentioned earlier that we'd hear about the humanBoundingBox function again, because it's there that the app stores the body pose observations. As Frank said, these body pose observations are the input to the Create ML action classification model that was trained to predict the player's throw type. Once a throw is finished, the updatePlayerStats function is called. It's in this function that getLastThrowType is called on the playerStats object. The playerStats object is an instance of the PlayerStats struct, which keeps track of stats about the player during the game. A closer look at the getLastThrowType function shows us that it prepares the input for the action classification model using the body pose observations that were being stored. The prepareInputWithObservations function is a helper function that gets the body pose observations into the input format required by the action classification model.
This helper function also sets the number of frames required to capture a full throw action, so those frames can be passed to the model for classification. With the input ready, the app makes the action classification prediction.
And the throw type with the highest probability is returned. Now Frank is going to tell you more about the metrics we calculate on each trajectory.
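Here is a hedged sketch of that preparation and prediction step. The ThrowActionClassifier class name, the poses input name, and the 45-frame window are assumptions based on the description, not the sample project's exact code.

```swift
import CoreML
import Vision

// Turn stored body-pose observations into the fixed-length MLMultiArray window
// an action classifier expects, then predict the throw type.
func throwType(from poses: [VNHumanBodyPoseObservation]) throws -> String {
    // One pose per frame; the classifier was trained on a fixed-length window.
    let window = Array(poses.suffix(45))

    // Each observation converts to a per-frame keypoints multi-array; concatenate
    // them along the time axis to form the model input.
    let frames = try window.map { try $0.keypointsMultiArray() }
    let input = MLMultiArray(concatenating: frames, axis: 0, dataType: .float)

    let model = try ThrowActionClassifier(configuration: MLModelConfiguration())
    let prediction = try model.prediction(poses: input)
    return prediction.label        // e.g. "Overhand", "Underhand", "UnderLeg", or "Other"
}
```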
Thank you Brent.
The next part we need to do is measure our playing field. We know the physical size of the board. It's a regulation size board. Once we measure it out by using the contours we know how many pixels in our image correspond to the four foot by two foot size of the board. Knowing that we have now a correspondence from our image to the real world. Now the trajectory when we throw actually happens in the same plane where the board is. So now we can simply calculate the speed because we have the trajectory how long it took, and we know our size in the real world. The other part that we want to measure is the release angle. So we are kind of looking for where is the bodyPose at the beginning of the throw, when I was actually throwing the bag, and are comparing now what is the angle of the elbow to the wrist with my lower arm in comparison to the horizon. Again, let me hand it back over to Brent to show you this in the code. Brent? Thanks Frank. In addition to getting the last throw type a player makes the updatePlayerStats function also gets the release speed of the trajectory and the release angle of the throw. I mentioned earlier that the release speed of the trajectory is calculated in the update path of your function.
When the app has a first observation of your trajectory, it calculates the length of that trajectory in pixels which can be converted to actual distance using the game board length as a reference. The ap then divides that length by the duration of the trajectory observation to get the release speed of the trajectory. The angle of release for a throw is calculated in the get releaseAngle function. This function uses the wrist and elbow points from the bodyPose observation found in the buffer with the bean bag was released to determine this angle.
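As an illustration of those two calculations, a simplified version might look like this; it approximates the trajectory length with a straight line and assumes a right-handed throw, so the sample project's math will differ in detail.

```swift
import CoreGraphics
import CoreMedia
import Foundation
import Vision

// The regulation board is 4 feet long, which gives us a pixels-to-feet scale.
func releaseSpeed(trajectory: VNTrajectoryObservation,
                  boardLengthInPixels: CGFloat,
                  imageSize: CGSize) -> Double {
    guard let first = trajectory.projectedPoints.first,
          let last = trajectory.projectedPoints.last else { return 0 }
    // Trajectory points are normalized; convert the span to pixels
    // (straight-line approximation of the arc length).
    let dx = (last.x - first.x) * Double(imageSize.width)
    let dy = (last.y - first.y) * Double(imageSize.height)
    let lengthInPixels = (dx * dx + dy * dy).squareRoot()

    let feetPerPixel = 4.0 / Double(boardLengthInPixels)
    let distanceInFeet = lengthInPixels * feetPerPixel
    let duration = trajectory.timeRange.duration.seconds
    guard duration > 0 else { return 0 }
    return (distanceInFeet / duration) * 0.681818   // convert ft/s to mph
}

// Angle of the lower arm relative to the horizon, in degrees, at the release frame.
func releaseAngle(of pose: VNHumanBodyPoseObservation) -> Double? {
    guard let wrist = try? pose.recognizedPoint(.rightWrist),
          let elbow = try? pose.recognizedPoint(.rightElbow),
          wrist.confidence > 0.2, elbow.confidence > 0.2 else { return nil }
    let dy = Double(wrist.location.y - elbow.location.y)
    let dx = Double(wrist.location.x - elbow.location.x)
    return atan2(dy, dx) * 180 / .pi
}
```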
We've looked at a number of components of this app, and we've talked about two specific machine learning models that we created with Create ML. The first was the object detection model used to detect the board, and the second was the action classification model used to detect the type of throw the player used.
In each case while we were creating these models we noted down some important points that we wanted to share with you.
We'll start with the object detection model. We wanted to train the model with data from the conditions it's expected to operate in. Our data was captured with an iPhone, because that's where our app will run and the model will be making predictions on frames from the iPhone camera. We included images of our board in our data because we knew it was the type of board we wanted our app to work with. We also included images of boards outside, because we expected we would play the game outside. We noticed, however, that a few things initially threw the model off. All of our original data was collected without people or bean bags in the images.
The first model sometimes had difficulty detecting boards when people and bean bags were in the frame.
Adding images that included people and bean bags improved the model.
Also, after our initial round of data collection, we added additional images from a range of distances and angles, which helped improve the model when the phone wasn't directly perpendicular to the board. Next, we'll look at the action classification model. Again, we want to train the model with data from the conditions it's expected to operate in. Our action classification data was also captured with an iPhone. In addition, we included videos captured at varying distances and angles, to account for the iPhone being placed differently when we played the game compared to when we captured the data for the model.
One point we want to share about our first iteration of the model is that it was initially created with just three classes: underhand, overhand, and under-the-leg shots. However, that meant that all actions, including things like picking up a bean bag, were recognized as one of those three actions.
To account for these additional situations, we added another class - a negative or "other" class - with people performing a variety of actions that weren't any of these three shots. Doing this helped the model perform much better.
One other point we wanted to share was about setting the correct prediction window.
The prediction window needs to be set so that it includes the entire target action. Some types of actions, like throwing a bean bag, may take more or fewer frames to capture than other types of actions. For the model to perform well, the prediction window should be able to capture the full action.
Additionally, when using the model in the app, we need to determine when and how often to perform the prediction. For this app, we didn't want to continuously perform predictions. Instead, we wanted to pick times when we could send the portion of the video where the throw happened to the model, to classify the action in that duration. We do this once the throw has completed. We have a predetermined event - the end of the throw - at which time we send the frames from the beginning of the throw to the model in order to classify the action performed.
Now, Frank is going to talk about best practices for Live Processing.
All right, thank you, Brent. Now, after seeing all that, there are a few things that we want to keep in mind for this application. It's all about real-time feedback, so we need to follow some best practices for live processing.
When we deal with live streams, there are a few challenges. Our camera only has a finite set of buffers. While you hold on to a buffer to analyze it, it's not available to the camera anymore, so we can easily starve the camera of buffers.
So it is important that we actually give those buffers back to the camera as soon as possible so that the camera has them available for its work.
Now, you might think you know how long your algorithm takes, and it's less than a frame duration, so we should all be good. But that's not always the case, because the load on the system can vary. For instance, you might get a notification in the background, or some other network traffic has to happen, and your algorithms might take a bit longer. If you have difficulties keeping up with the frame rate, it really helps to use the capture output's didDrop delegate method to get a notification of why the camera was not able to deliver a buffer to you. When we deal with live streams, we also want to split up our work. We know that we have multiple things to analyze on a frame, so we use different queues. We feed our frame into different queues and run them in parallel, while the camera can do its work. You also don't want to wait until you render the results of your analysis; that normally has to happen on the main queue, so release the buffer beforehand and asynchronously render on the main view. Now the camera has all its buffers available again and we will not starve the camera feed. Next, when we deal with live playback, the challenges are similar: we need to make sure that the video continues playing while we do our analysis, so it doesn't stutter, and again we have to make sure that we can process all the frames correctly. If you do a post-analysis, you would go frame by frame. But here, in the live playback case, we do not want to go frame by frame. So we use the AVPlayerItemVideoOutput together with a CADisplayLink.
It tells us when we have a new CVPixelBuffer available for a given time, and we might actually ask for a time slightly in the future so it synchronizes with the actual video frame arriving on the screen. Then you simply copy the pixel buffer and do the analysis based on that.
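A minimal sketch of that playback-driven analysis loop might look like the following; the class and the analyze(_:at:) stub are illustrative.

```swift
import AVFoundation
import UIKit

// Drive analysis from video playback with AVPlayerItemVideoOutput + CADisplayLink.
final class VideoFrameReader {
    private let videoOutput = AVPlayerItemVideoOutput(pixelBufferAttributes: nil)
    private var displayLink: CADisplayLink?
    private let player: AVPlayer

    init(url: URL) {
        let item = AVPlayerItem(url: url)
        item.add(videoOutput)
        player = AVPlayer(playerItem: item)
    }

    func start() {
        displayLink = CADisplayLink(target: self, selector: #selector(displayLinkFired))
        displayLink?.add(to: .main, forMode: .default)
        player.play()
    }

    @objc private func displayLinkFired(_ link: CADisplayLink) {
        // Ask for the frame that will be on screen at the next refresh.
        let nextFrameTime = link.timestamp + link.duration
        let itemTime = videoOutput.itemTime(forHostTime: nextFrameTime)
        guard videoOutput.hasNewPixelBuffer(forItemTime: itemTime),
              let pixelBuffer = videoOutput.copyPixelBuffer(forItemTime: itemTime,
                                                            itemTimeForDisplay: nil) else { return }
        analyze(pixelBuffer, at: itemTime)
    }

    private func analyze(_ buffer: CVPixelBuffer, at time: CMTime) {
        // Perform Vision requests here (trajectory, body pose, etc.).
    }
}
```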
All right, let's wrap things up. I hope you've seen that analyzing actions in sports can be really exciting. It works not just for the bean bag game; we can use it for something like tennis, and although the ball is really small and hard to see, our algorithm can detect it. Or think of playing soccer - again, we can see the trajectory of how the ball was flying. Or perhaps you want to coach the next generation of cricket players, and we can nicely see the trajectory of the cricket ball. Now, think about what else you can build with these technologies and what insights you can bring to your users. I can't wait to see all the great applications that you come up with and the innovations that you can build on top of our technologies. Thank you all for attending our session.