Combining Machine Learning Models

Hello, all!

I have just started out in the wondrous world of machine learning, and I can already tell that I have a lot to learn. My first project will try to detect gestures and hand positions. For example, if a person were to hold up a peace sign, then a fist, then wave at the camera, it would output the correct label for each. Now, some of these gestures are stationary, like pointing or a peace sign, whereas others are in motion, like a wave. I am considering a dual-level machine learning model that combines an image classifier and an action classifier in order to capture both stationary and dynamic movements. Is there a way to combine those two Create ML models to accomplish this goal? If so, how would I go about it? Or would it simply be easier to use only the action classifier and generate videos of people pointing and making peace signs to feed to the model?

Thanks for the help!

To clarify, Action Classification in Create ML won't be of benefit to you for this goal. It is intended specifically for human body pose classification and won't work with hand pose as input. If you want to build a hand pose classification model or hand gesture recognizer, you'd need to train in another environment like TensorFlow or PyTorch and then convert the result to Core ML using the Core ML Tools package. There are some readily available gesture recognition models, already trained on large datasets, that may work for you, allowing you to focus only on the conversion work.
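
If you do go the conversion route, the converted model can then be driven through Vision like any other Core ML classifier. Here's a minimal sketch, assuming the conversion produces an image classification model; HandGestureClassifier is a hypothetical name for the class Xcode generates from the converted .mlmodel:

```swift
import CoreML
import Vision

// Runs a single frame through a converted Core ML gesture classifier.
// "HandGestureClassifier" is a hypothetical generated model class.
func classifyGesture(in image: CGImage) throws {
    let coreMLModel = try HandGestureClassifier(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Assumes the converted model outputs classification labels.
        guard let top = (request.results as? [VNClassificationObservation])?.first else { return }
        print("Gesture: \(top.identifier) (confidence \(top.confidence))")
    }

    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
}
```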

That said, for your goals, you may find that a machine learning model isn't even necessary. Recognizing something like a peace sign vs. an open hand, for example, could be handled with a simple heuristic that looks at the relative position of each fingertip in the detected hand pose. The sample code available in Detecting Hand Poses with Vision is a great starting point for this. For detecting something like a wave or swipe, you could first watch for an open hand and then use the Object Tracking API in Vision to observe the movement of the fingertips over time.
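
To make the heuristic idea concrete, here is a rough sketch that classifies a VNHumanHandPoseObservation (as produced by VNDetectHumanHandPoseRequest) by treating a finger as "extended" when its tip is farther from the wrist than its PIP joint. The confidence threshold and label strings are illustrative, not part of any API:

```swift
import CoreGraphics
import Vision

// A finger counts as "extended" when its tip is farther from the wrist
// than its PIP joint. The 0.3 confidence cutoff is an arbitrary choice.
func isExtended(tip: VNRecognizedPoint?, pip: VNRecognizedPoint?,
                wrist: VNRecognizedPoint?) -> Bool {
    guard let tip = tip, let pip = pip, let wrist = wrist,
          tip.confidence > 0.3, pip.confidence > 0.3, wrist.confidence > 0.3
    else { return false }
    func distance(_ a: CGPoint, _ b: CGPoint) -> CGFloat {
        hypot(a.x - b.x, a.y - b.y)
    }
    return distance(tip.location, wrist.location) > distance(pip.location, wrist.location)
}

// Classifies one hand pose observation as a peace sign, an open hand,
// or nil for anything the heuristic doesn't cover.
func classify(_ hand: VNHumanHandPoseObservation) throws -> String? {
    let points = try hand.recognizedPoints(.all)
    let wrist = points[.wrist]

    let index  = isExtended(tip: points[.indexTip],  pip: points[.indexPIP],  wrist: wrist)
    let middle = isExtended(tip: points[.middleTip], pip: points[.middlePIP], wrist: wrist)
    let ring   = isExtended(tip: points[.ringTip],   pip: points[.ringPIP],   wrist: wrist)
    let little = isExtended(tip: points[.littleTip], pip: points[.littlePIP], wrist: wrist)

    switch (index, middle, ring, little) {
    case (true, true, false, false): return "peace sign"
    case (true, true, true, true):   return "open hand"
    default:                         return nil
    }
}
```

And for the wave, one possible sketch: once an open hand is detected, feed the wrist's (or a fingertip's) horizontal position from each frame into a small detector that counts direction reversals over a short window. The window size and movement threshold below are illustrative values you'd tune for your camera and frame rate:

```swift
import CoreGraphics

// Accumulates horizontal positions (normalized coordinates) frame by
// frame and reports a wave once the motion reverses direction twice
// within the window.
struct WaveDetector {
    private var xs: [CGFloat] = []
    private let window = 30                 // roughly 1 second at 30 fps
    private let minSwing: CGFloat = 0.05    // ignore jitter below this

    mutating func add(x: CGFloat) -> Bool {
        xs.append(x)
        if xs.count > window { xs.removeFirst() }
        return reversals() >= 2             // two direction changes = wave
    }

    private func reversals() -> Int {
        var count = 0
        var lastDirection: CGFloat = 0
        for pair in zip(xs, xs.dropFirst()) {
            let delta = pair.1 - pair.0
            // Skip per-frame movements too small to be meaningful.
            guard abs(delta) > minSwing / CGFloat(window) else { continue }
            let direction: CGFloat = delta > 0 ? 1 : -1
            if lastDirection != 0, direction != lastDirection { count += 1 }
            lastDirection = direction
        }
        return count
    }
}
```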