Go beyond detecting cats and dogs in images. We'll show you how to use Vision to detect the individual joints and poses of these animals as well — all in real time — and share how you can enable exciting features like animal tracking for a camera app, creative embellishment on an animal photo, and more. We'll also explore other important enhancements to Vision and share best practices.
To learn even more about what's new in the Vision framework, watch "Explore 3D body pose and person segmentation in Vision” and "Lift subjects from images in your app.” And to learn more about building live camera-tracking experiences, check out "Integrate with motorized iPhone stands using DockKit"
♪ ♪ Nadia Zouba: Hello, everyone, and welcome. My name is Nadia Zouba and I work on the Vision team here at Apple. Today, I will be talking about a new amazing API in Vision: Animal Body Pose. I will also go over some important updates in Vision. Let’s start with Animal Body Pose. Animal Body Pose can be used in many possible applications. For example, imagine you left your cat and dog at home alone, and you spent the day at work. When you are back from work, you find this mess in your house. Don’t worry. The Vision Framework can help you figure out what happened, what your pets were doing the whole day, and who is making the mess. But before I dig into that, let’s talk about poses. Three years ago, Vision introduced Human Body Pose to detect human poses. The request generates a collection of human body landmarks by detecting up to 19 body joints. Developers around the world used the API to create many useful applications for health, fitness, et cetera. And since Vision interacts with the real world, we don’t only care about humans; we also care about animals. Vision has already a request for animal recognition that detects and recognizes cats and dogs. The request generates a bounding box with a label for the recognized animal and a confidence level. That’s a great API if you are trying to locate and identify the animal, but how about if you want to know more about the animal? It can be challenging to infer what the animal is doing. For example, when I dog-sit for my neighbor, I might like to know the specific pose of my neighbor’s dog when he wants a snack or needs a walk. Guess what? Vision expanded the body pose to animals. That’s awesome.
Animal Body Pose is a new API in Vision. It's offered in Vision via DetectAnimalBodyPoseRequest. This request, once processed, returns an observation for each animal which contains a collection of animal body joint locations. The request supports cats and dogs, and detects 25 animal body landmarks that includes the tail and the ears.
The Animal Body Pose API is available in Vision starting in iOS 17, iPadOS 17, tvOS 17, and macOS Sonoma. The input to Animal Body Pose can be an image or a video. After creating and processing the request in Vision, the request will produce a collection of joints which will define the skeleton of the animal. For Animal Body Pose, six joint groups have been defined: The Head group contains the ears, the eyes, and the nose.
The Forelegs group includes the front legs. You guessed it; here comes the Hindlegs group for the back legs. Trunk group refers to the neck, and the Tail group contains three tail joints. Finally, there is the All group, which is composed by all the joints.
To demonstrate everything I’ve talked about so far, I have a sample app that draws the skeleton of the animal using the location of the landmarks. I have this cute little chihuahua toy dog sitting on my desk. Since this dog can walk, I will use it to show the results of the sample app. I take the dog and put it in front of my phone’s camera and I start the sample app. The app draws a skeleton on top of the animal. Let’s turn on the dog so he can walk.
The skeleton follows the animal in his walk. Oops! The dog is walking away from the camera. Let’s move it back to keep it in front of the phone’s camera.
The skeleton is still following the dog. That’s awesome. Now I will go into the code to show you how the sample app was done.
We start with the capture output where we are receiving CMSampleBuffers from the camera stream. The first step is to create the request. I will use VNDetectAnimalBodyPoseRequest.
The next step is to create a request handler using imageRequestHandler.
Then I provide the request to the handler via a call to perform. If perform request succeed, VNAnimalBodyPoseObservations will be returned in the request results property. Each will contain the location of the joints. To access these joints from Animal Pose Observation, I will request a dictionary of recognized points in the group by calling .recognizedPoints. Since I needed all the joints to draw the skeleton of the animal, I choose to use the All group. You can use another group if you need to only access some of the animal joints.
And finally, to draw the skeleton of the animal, I iterate over all the recognized points and connect the joints.
Here is an example of how the head joints were connected to draw the head skeleton.
There are some considerations to keep in mind. Using the new Animal Body Pose, you can detect up to two animals in an image. The input image size should be at least 64 pixels on each side. And by using the neural engine, the performance can keep up with live capture. Let’s go through a few examples of what can be done with Animal Pose. Using the new Animal Pose API with still images, you can develop your own analysis on the joints to recognize interesting poses of your animal, such as, stretching after waking up...
standing, begging for a treat...
running away from a dog...
or, curling up to take a nap.
As I mentioned before, Animal Recognition allows you to localize and recognize the animal, and Animal Body Pose returns the full body landmarks of the animal. Combining those two requests will allow you to know what type of animal was detected, the location, and the pose. Now you can know who is messing up with the dining table. This will give you a lot of opportunities to develop interesting applications for your pets; maybe something for a dog treat dispenser activated on animal recognition and animal pose detection. The Animal Pose API can also be used with videos. You can bring your own algorithms to your app to analyze the motion, and determine what type of activity your animal is doing. You may even go further with your analysis to understand your animal behavior by tracking the poses over time. Until Animal Body Pose, I thought all those marks in the wall were from my kids, but it was the cat trying to catch a laser pointer.
And whoa, this dog can skateboard better than me! Another use is to track the animal with the camera. To learn more about that kind of tracking, please refer to the session “Integrate with motorized iPhone stands with DockKit.” You can also write funny apps for your pets. For example, putting a hat and sunglasses on a dog. Wouldn’t it be super fun to create a cute card for your animal’s birthday, and send it to family and friends? Happy birthday, Mr. Frenchie! I do this by placing emojis where I find relevant Animal Body Pose joints. Let me demonstrate the emoji app using this cute toy dog I have with me. I will use the same sample app and switch from the skeleton view to the emoji view.
Since this little dog is walking slowly, I added some skates emojis on top of his paw joints to speed him up a little bit. Oh, wait, safety first! Let’s go back to the code and give him a helmet. Here I have already added the skates emojis on top of the paw joints. Let’s add the helmet emoji on top of the ear joints. I choose the size and the position of the emoji. Let’s also add some glasses to skate in style.
I run the app again after taking care of the dog's safety. Better be safe than sorry. Let’s switch to the emoji view. The dog now has the gears he needed to continue skating safely. Looks cool, right? Go, doggy, go! You nailed it! That was it for Animal Body Pose. I can’t wait for you to bring to your app all the amazing things you will create using the new Animal Pose API. There are more updates in Vision that you may find helpful, so let me introduce them to you. There are new Stateful Requests. VNTargetedImage-based requests are available as Stateful Requests in Vision.
Vision has three new derived Stateful Requests, all named with the Track verb. This makes it easier to use for tracking. I am thrilled to announce that Vision supports MLComputeDevice. The new Compute Device API allows you to query where requests get executed and specify which device to use. Core ML and Create ML Multilabel Classification is now compatible with Vision. This allows you to train your own classifier that supports more than one label. Please refer to the session “Discover machine learning enhancements in Create ML” for more information. There are also new revisions in existing requests that bring great improvements. For Barcodes, there is a new revision, Revision 4. The new revision includes a new MSIPlessey symbology and supports color inverted QR codes. And by the way, Revision 1 will be deprecated.
Text Recognition is adding support for Thai and Vietnamese.
And finally, there is a new revision for FaceCaptureQuality, Revision 3, to improve the quality and the accuracy. For additional information about all those new updates in Vision, please check the developer documentation. Today, I talked about the new Animal Body Pose and all the amazing things that can be done with this new API. I also presented some important APIs updates and other enhancements in Vision that I hope will be helpful in your developments. But wait, there is more. Please check the session “Explore 3D body pose and person segmentation in Vision” to learn about the new 3D body pose and person segmentation APIs in Vision. And if you want to segment any selected foreground object, please refer to the session “Lift subjects from images in your app”. Thank you for your attention, and we can’t wait for you to get your paws on Animal Pose.
Looking for something specific? Enter a topic above and jump straight to the good stuff.
An error occurred when submitting your query. Please check your Internet connection and try again.