Copyright 2021 The TensorFlow Authors.
Human Pose Classification with MoveNet and TensorFlow Lite
This notebook teaches you how to train a pose classification model using MoveNet and TensorFlow Lite. The result is a new TensorFlow Lite model that accepts the output from the MoveNet model as its input, and outputs a pose classification, such as the name of a yoga pose.
The procedure in this notebook consists of 3 parts:
Part 1: Preprocess the pose classification training data into a CSV file that specifies the landmarks (body keypoints) detected by the MoveNet model, along with the ground truth pose labels.
Part 2: Build and train a pose classification model that takes the landmark coordinates from the CSV file as input, and outputs the predicted labels.
Part 3: Convert the pose classification model to TFLite.
By default, this notebook uses an image dataset with labeled yoga poses, but we've also included a section in Part 1 where you can upload your own image dataset of poses.
Preparation
In this section, you'll import the necessary libraries and define several functions to preprocess the training images into a CSV file that contains the landmark coordinates and ground truth labels.
Nothing observable happens here, but you can expand the hidden code cells to see the implementation for some of the functions we'll be calling later on.
If you only want to create the CSV file without knowing all the details, just run this section and proceed to Part 1.
Code to run pose estimation using MoveNet
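To give a sense of what the hidden cells do, here's a minimal sketch of running MoveNet on a single image using the TF Hub release of the model. The notebook itself wraps a TFLite version of MoveNet in helper classes, so treat the file name and exact setup here as illustrative:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the MoveNet SinglePose Thunder model from TF Hub.
module = hub.load('https://tfhub.dev/google/movenet/singlepose/thunder/4')
movenet = module.signatures['serving_default']

# Load an image, pad/resize it to the model's expected 256x256 input,
# and add a batch dimension. ('pose.jpg' is a placeholder file name.)
image = tf.image.decode_jpeg(tf.io.read_file('pose.jpg'))
image = tf.image.resize_with_pad(tf.expand_dims(image, axis=0), 256, 256)

# Run inference. The output has shape [1, 1, 17, 3]: 17 keypoints,
# each with (y, x, confidence_score) in normalized coordinates.
outputs = movenet(tf.cast(image, dtype=tf.int32))
keypoints = outputs['output_0']
```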
Part 1: Preprocess the input images
Because the input for our pose classifier is the output landmarks from the MoveNet model, we need to generate our training dataset by running labeled images through MoveNet and then capturing all the landmark data and ground truth labels into a CSV file.
The dataset we've provided for this tutorial is a CG-generated yoga pose dataset. It contains images of multiple CG-generated models doing 5 different yoga poses. The dataset is already split into a `train` dataset and a `test` dataset.
So in this section, we'll download the yoga dataset and run it through MoveNet to capture all the landmarks into a CSV file. However, it takes about 15 minutes to feed the yoga dataset to MoveNet and generate this CSV file, so as an alternative, you can download a pre-existing CSV file for the yoga dataset by setting the `is_skip_step_1` parameter below to **True**. That way, you'll skip this step and instead download the same CSV file that would be created in this preprocessing step.
On the other hand, if you want to train the pose classifier with your own image dataset, you need to upload your images and run this preprocessing step (leave `is_skip_step_1` as **False**). Follow the instructions below to upload your own pose dataset.
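Concretely, the configuration cell this section refers to uses boolean flags like these. The names come from this tutorial; the defaults shown are assumptions:

```python
# Set to True to skip the preprocessing below and download the
# ready-made CSV files instead.
is_skip_step_1 = False

# Set to True to train on your own uploaded dataset (see the next section).
use_custom_dataset = False

# Set to True if your custom dataset is already split into
# "train" and "test" folders.
dataset_is_split = False
```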
(Optional) Upload your own pose dataset
If you want to train the pose classifier with your own labeled poses (they can be any poses, not just yoga poses), follow these steps:
1. Set the `use_custom_dataset` option above to **True**.

2. Prepare an archive file (ZIP, TAR, or other) that includes a folder with your images dataset. The folder must include sorted images of your poses, following the first example layout shown after this list.

   If you've already split your dataset into train and test sets, set `dataset_is_split` to **True**. That is, your images folder must include "train" and "test" directories, following the second example layout shown after this list.

3. Click the Files tab on the left (folder icon) and then click Upload to session storage (file icon).

4. Select your archive file and wait until it finishes uploading before you proceed.

5. Edit the following code block to specify the name of your archive file and images directory. (By default, we expect a ZIP file, so you'll need to also modify that part if your archive is another format.)

6. Now run the rest of the notebook.
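For reference, here's the kind of folder layout the preprocessing code expects. The class and file names are just examples. If your dataset is not yet split:

```
yoga_poses/
|__ downdog/
    |__ 00000128.jpg
    |__ 00000181.jpg
    |__ ...
|__ goddess/
    |__ 00000243.jpg
    |__ ...
```

And if you've already split it into train and test sets:

```
yoga_poses/
|__ train/
    |__ downdog/
        |__ ...
    |__ goddess/
        |__ ...
|__ test/
    |__ downdog/
        |__ ...
    |__ goddess/
        |__ ...
```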
Note: If you're using `split_into_train_test()` to split the dataset, it expects all images to be PNG, JPEG, or BMP; it ignores other file types.
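As a hypothetical example of that helper (its exact signature lives in the hidden cells earlier in the notebook, so adjust accordingly), splitting an unsplit dataset with a 20% test share might look like:

```python
# Assumed signature: source folder, destination folder, test fraction.
split_into_train_test('yoga_poses', 'yoga_data', test_split=0.2)
```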
Download the yoga dataset
Preprocess the `TRAIN` dataset
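The preprocessing cell for each split follows the same pattern. Here's a sketch for the `TRAIN` split, assuming the `MoveNetPreprocessor` helper defined in the hidden cells above; the argument names and paths are assumptions, and the `TEST` cell is analogous:

```python
# Run MoveNet over every training image and write one CSV row per image:
# the file name, the 17 detected landmarks, and the ground truth label.
preprocessor = MoveNetPreprocessor(
    images_in_folder='yoga_poses/train',
    images_out_folder='poses_images_out_train',
    csvs_out_path='train_data.csv',
)
preprocessor.process(per_pose_class_limit=None)
```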
Preprocess the `TEST` dataset
Part 2: Train a pose classification model that takes the landmark coordinates as input and outputs the predicted labels
You'll build a TensorFlow model that takes the landmark coordinates and predicts the pose class that the person in the input image performs. The model consists of two submodels:
Submodel 1 calculates a pose embedding (a.k.a. feature vector) from the detected landmark coordinates.
Submodel 2 feeds the pose embedding through several `Dense` layers to predict the pose class.
You'll then train the model on the dataset that was preprocessed in part 1.
(Optional) Download the preprocessed dataset if you didn't run part 1
Load the preprocessed CSVs into `TRAIN` and `TEST` datasets.
Load and split the original `TRAIN` dataset into `TRAIN` (85% of the data) and `VALIDATE` (the remaining 15%).
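A sketch of that loading-and-splitting step, assuming the CSV layout produced in Part 1 (one row per image: file name, 51 landmark values, class index, class name). The helper name and column names are assumptions:

```python
import pandas as pd
from tensorflow import keras
from sklearn.model_selection import train_test_split

def load_pose_landmarks(csv_path):
  """Loads a CSV from Part 1 into features X and one-hot labels y."""
  df = pd.read_csv(csv_path)
  df.drop(columns=['file_name'], inplace=True)
  class_names = df.pop('class_name').unique()
  y = keras.utils.to_categorical(df.pop('class_no'))
  X = df.astype('float64').values
  return X, y, class_names

X, y, class_names = load_pose_landmarks('train_data.csv')
X_test, y_test, _ = load_pose_landmarks('test_data.csv')

# Hold out 15% of TRAIN as the VALIDATE set, as described above.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15)
```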
Define functions to convert the pose landmarks to a pose embedding (a.k.a. feature vector) for pose classification
Next, convert the landmark coordinates to a feature vector by:
Moving the pose center to the origin.
Scaling the pose so that the pose size becomes 1.
Flattening these coordinates into a feature vector.
Then use this feature vector to train a neural-network-based pose classifier.
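Here's a simplified sketch of those three steps, assuming MoveNet's keypoint ordering (hips at indexes 11 and 12) and dropping the confidence scores. The notebook's actual version also factors torso size into the scaling:

```python
import tensorflow as tf

def landmarks_to_embedding(landmarks_and_scores):
  """Converts 17 x (x, y, score) landmark values into a 34-value embedding."""
  landmarks = tf.reshape(landmarks_and_scores, (17, 3))[:, :2]
  # 1. Move the pose center (the midpoint of the hips) to the origin.
  hips_center = (landmarks[11] + landmarks[12]) / 2
  landmarks = landmarks - hips_center
  # 2. Scale the pose so that the pose size becomes 1. Here, pose size is
  #    the maximum distance of any keypoint from the pose center.
  pose_size = tf.reduce_max(tf.norm(landmarks, axis=1))
  landmarks = landmarks / pose_size
  # 3. Flatten the coordinates into a feature vector.
  return tf.reshape(landmarks, (34,))
```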
Define a Keras model for pose classification
Our Keras model takes the detected pose landmarks, then calculates the pose embedding and predicts the pose class.
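A sketch of that model, reusing the embedding function above. The layer sizes and training hyperparameters here are illustrative, not the notebook's exact values:

```python
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(51,))
# Map each 51-value landmark row to its 34-value pose embedding.
embedding = keras.layers.Lambda(
    lambda batch: tf.map_fn(landmarks_to_embedding, batch))(inputs)

# Classifier head: a small stack of Dense layers over the embedding.
x = keras.layers.Dense(128, activation='relu')(embedding)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(64, activation='relu')(x)
outputs = keras.layers.Dense(len(class_names), activation='softmax')(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=200, batch_size=16)
```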
Draw the confusion matrix to better understand the model performance
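One way to do this, assuming scikit-learn and Matplotlib are available and `X_test`/`y_test` come from the TEST CSV loaded earlier:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Compare the model's predictions against the ground truth labels.
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=class_names).plot(cmap='Blues')
plt.title('Confusion matrix of the pose classifier')
plt.show()
```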
(Optional) Investigate incorrect predictions
You can look at the poses from the `TEST` dataset that were incorrectly predicted to see whether the model accuracy can be improved.
Note: This only works if you have run Part 1, because you need the pose image files on your local machine to display them.
Part 3: Convert the pose classification model to TensorFlow Lite
You'll convert the Keras pose classification model to the TensorFlow Lite format so that you can deploy it to mobile apps, web browsers and edge devices. When converting the model, you'll apply dynamic range quantization to reduce the pose classification TensorFlow Lite model size by about 4 times with insignificant accuracy loss.
Note: TensorFlow Lite supports multiple quantization schemes. See the documentation if you're interested in learning more.
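The conversion itself is the standard TFLite converter flow; setting `tf.lite.Optimize.DEFAULT` enables dynamic range quantization:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
tflite_model = converter.convert()

with open('pose_classifier.tflite', 'wb') as f:
  f.write(tflite_model)
print('Model size: %dKB' % (len(tflite_model) / 1024))
```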
Then you'll write the label file, which contains a mapping from the class indexes to the human-readable class names.
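A minimal way to write it, assuming `class_names` is ordered by class index:

```python
# One class name per line; the line number is the class index.
with open('pose_labels.txt', 'w') as f:
  f.write('\n'.join(class_names))
```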
As you've applied quantization to reduce the model size, let's evaluate the quantized TFLite model to check whether the accuracy drop is acceptable.
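Here's a sketch of that evaluation using the TFLite `Interpreter`, running the quantized model over the TEST set one sample at a time:

```python
import numpy as np
import tensorflow as tf

def evaluate_tflite(tflite_model, X, y_one_hot):
  """Returns the accuracy of a TFLite classifier on (X, y_one_hot)."""
  interpreter = tf.lite.Interpreter(model_content=tflite_model)
  interpreter.allocate_tensors()
  input_index = interpreter.get_input_details()[0]['index']
  output_index = interpreter.get_output_details()[0]['index']

  y_pred = []
  for sample in X:
    sample = np.expand_dims(sample, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, sample)
    interpreter.invoke()
    y_pred.append(np.argmax(interpreter.get_tensor(output_index)[0]))
  return np.mean(np.array(y_pred) == np.argmax(y_one_hot, axis=1))

print('Accuracy of the quantized model:',
      evaluate_tflite(tflite_model, X_test, y_test))
```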
Now you can download the TFLite model (`pose_classifier.tflite`) and the label file (`pose_labels.txt`) to classify custom poses. See the Android and Python/Raspberry Pi sample apps for an end-to-end example of how to use the TFLite pose classification model.