Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
tensorflow
GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/pt-br/lite/tutorials/pose_classification.ipynb
25118 views
Kernel: Python 3
#@title Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # https://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License.

Classificação de poses humanas com o MoveNet e o TensorFlow Lite

Este notebook ensina a você como treinar um modelo de classificação de poses usando o MoveNet e o TensorFlow Lite. O resultado é um novo modelo do TensorFlow Lite que aceita a saída do modelo MoveNet como sua entrada e gera uma classificação da pose, como o nome de uma pose de yoga.

O procedimento neste notebook consiste em 3 partes:

  • Parte 1: pré-processe os dados de treinamento da classificação de poses em um arquivo CSV que especifica os landmarks (pontos-chave corporais) detectados pelo modelo MoveNet, além dos rótulos de pose de verdade absoluta.

  • Parte 2: crie e treine um modelo de classificação de poses que aceita coordenadas de landmarks do arquivo CSV como entrada e gera os rótulos previstos.

  • Parte 3: converta o modelo de classificação de poses para o TFLite.

Por padrão, este notebook usa um dataset de imagens com poses de yoga rotuladas, mas também incluímos uma seção na Parte 1 onde você pode fazer upload do seu próprio dataset de imagens de poses.

Preparação

Nesta seção, você importará as bibliotecas necessárias e definirá várias funções para pré-processar as imagens de treinamento em um arquivo CSV com as coordenadas dos landmarks e os rótulos de verdade absoluta.

Não acontece nada observável aqui, mas você pode expandir as células de código ocultas para ver a implementação de algumas funções que serão chamadas depois.

Se você só quiser criar o arquivo CSV sem saber todos os detalhes, basta executar esta seção e seguir para a Parte 1.

!pip install -q opencv-python
import csv import cv2 import itertools import numpy as np import pandas as pd import os import sys import tempfile import tqdm from matplotlib import pyplot as plt from matplotlib.collections import LineCollection import tensorflow as tf import tensorflow_hub as hub from tensorflow import keras from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Código para executar a estimativa de pose usando o MoveNet

#@title Functions to run pose estimation with MoveNet #@markdown You'll download the MoveNet Thunder model from [TensorFlow Hub](https://www.google.com/url?sa=D&q=https%3A%2F%2Ftfhub.dev%2Fs%3Fq%3Dmovenet), and reuse some inference and visualization logic from the [MoveNet Raspberry Pi (Python)](https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/raspberry_pi) sample app to detect landmarks (ear, nose, wrist etc.) from the input images. #@markdown *Note: You should use the most accurate pose estimation model (i.e. MoveNet Thunder) to detect the keypoints and use them to train the pose classification model to achieve the best accuracy. When running inference, you can use a pose estimation model of your choice (e.g. either MoveNet Lightning or Thunder).* # Download model from TF Hub and check out inference code from GitHub !wget -q -O movenet_thunder.tflite https://tfhub.dev/google/lite-model/movenet/singlepose/thunder/tflite/float16/4?lite-format=tflite !git clone https://github.com/tensorflow/examples.git pose_sample_rpi_path = os.path.join(os.getcwd(), 'examples/lite/examples/pose_estimation/raspberry_pi') sys.path.append(pose_sample_rpi_path) # Load MoveNet Thunder model import utils from data import BodyPart from ml import Movenet movenet = Movenet('movenet_thunder') # Define function to run pose estimation using MoveNet Thunder. # You'll apply MoveNet's cropping algorithm and run inference multiple times on # the input image to improve pose estimation accuracy. def detect(input_tensor, inference_count=3): """Runs detection on an input image. Args: input_tensor: A [height, width, 3] Tensor of type tf.float32. Note that height and width can be anything since the image will be immediately resized according to the needs of the model within this function. inference_count: Number of times the model should run repeatly on the same input image to improve detection accuracy. Returns: A Person entity detected by the MoveNet.SinglePose. """ image_height, image_width, channel = input_tensor.shape # Detect pose using the full input image movenet.detect(input_tensor.numpy(), reset_crop_region=True) # Repeatedly using previous detection result to identify the region of # interest and only croping that region to improve detection accuracy for _ in range(inference_count - 1): person = movenet.detect(input_tensor.numpy(), reset_crop_region=False) return person
#@title Functions to visualize the pose estimation results. def draw_prediction_on_image( image, person, crop_region=None, close_figure=True, keep_input_size=False): """Draws the keypoint predictions on image. Args: image: An numpy array with shape [height, width, channel] representing the pixel values of the input image. person: A person entity returned from the MoveNet.SinglePose model. close_figure: Whether to close the plt figure after the function returns. keep_input_size: Whether to keep the size of the input image. Returns: An numpy array with shape [out_height, out_width, channel] representing the image overlaid with keypoint predictions. """ # Draw the detection result on top of the image. image_np = utils.visualize(image, [person]) # Plot the image with detection results. height, width, channel = image.shape aspect_ratio = float(width) / height fig, ax = plt.subplots(figsize=(12 * aspect_ratio, 12)) im = ax.imshow(image_np) if close_figure: plt.close(fig) if not keep_input_size: image_np = utils.keep_aspect_ratio_resizer(image_np, (512, 512)) return image_np
#@title Code to load the images, detect pose landmarks and save them into a CSV file class MoveNetPreprocessor(object): """Helper class to preprocess pose sample images for classification.""" def __init__(self, images_in_folder, images_out_folder, csvs_out_path): """Creates a preprocessor to detection pose from images and save as CSV. Args: images_in_folder: Path to the folder with the input images. It should follow this structure: yoga_poses |__ downdog |______ 00000128.jpg |______ 00000181.bmp |______ ... |__ goddess |______ 00000243.jpg |______ 00000306.jpg |______ ... ... images_out_folder: Path to write the images overlay with detected landmarks. These images are useful when you need to debug accuracy issues. csvs_out_path: Path to write the CSV containing the detected landmark coordinates and label of each image that can be used to train a pose classification model. """ self._images_in_folder = images_in_folder self._images_out_folder = images_out_folder self._csvs_out_path = csvs_out_path self._messages = [] # Create a temp dir to store the pose CSVs per class self._csvs_out_folder_per_class = tempfile.mkdtemp() # Get list of pose classes and print image statistics self._pose_class_names = sorted( [n for n in os.listdir(self._images_in_folder) if not n.startswith('.')] ) def process(self, per_pose_class_limit=None, detection_threshold=0.1): """Preprocesses images in the given folder. Args: per_pose_class_limit: Number of images to load. As preprocessing usually takes time, this parameter can be specified to make the reduce of the dataset for testing. detection_threshold: Only keep images with all landmark confidence score above this threshold. """ # Loop through the classes and preprocess its images for pose_class_name in self._pose_class_names: print('Preprocessing', pose_class_name, file=sys.stderr) # Paths for the pose class. images_in_folder = os.path.join(self._images_in_folder, pose_class_name) images_out_folder = os.path.join(self._images_out_folder, pose_class_name) csv_out_path = os.path.join(self._csvs_out_folder_per_class, pose_class_name + '.csv') if not os.path.exists(images_out_folder): os.makedirs(images_out_folder) # Detect landmarks in each image and write it to a CSV file with open(csv_out_path, 'w') as csv_out_file: csv_out_writer = csv.writer(csv_out_file, delimiter=',', quoting=csv.QUOTE_MINIMAL) # Get list of images image_names = sorted( [n for n in os.listdir(images_in_folder) if not n.startswith('.')]) if per_pose_class_limit is not None: image_names = image_names[:per_pose_class_limit] valid_image_count = 0 # Detect pose landmarks from each image for image_name in tqdm.tqdm(image_names): image_path = os.path.join(images_in_folder, image_name) try: image = tf.io.read_file(image_path) image = tf.io.decode_jpeg(image) except: self._messages.append('Skipped ' + image_path + '. Invalid image.') continue else: image = tf.io.read_file(image_path) image = tf.io.decode_jpeg(image) image_height, image_width, channel = image.shape # Skip images that isn't RGB because Movenet requires RGB images if channel != 3: self._messages.append('Skipped ' + image_path + '. Image isn\'t in RGB format.') continue person = detect(image) # Save landmarks if all landmarks were detected min_landmark_score = min( [keypoint.score for keypoint in person.keypoints]) should_keep_image = min_landmark_score >= detection_threshold if not should_keep_image: self._messages.append('Skipped ' + image_path + '. No pose was confidentlly detected.') continue valid_image_count += 1 # Draw the prediction result on top of the image for debugging later output_overlay = draw_prediction_on_image( image.numpy().astype(np.uint8), person, close_figure=True, keep_input_size=True) # Write detection result into an image file output_frame = cv2.cvtColor(output_overlay, cv2.COLOR_RGB2BGR) cv2.imwrite(os.path.join(images_out_folder, image_name), output_frame) # Get landmarks and scale it to the same size as the input image pose_landmarks = np.array( [[keypoint.coordinate.x, keypoint.coordinate.y, keypoint.score] for keypoint in person.keypoints], dtype=np.float32) # Write the landmark coordinates to its per-class CSV file coordinates = pose_landmarks.flatten().astype(np.str).tolist() csv_out_writer.writerow([image_name] + coordinates) if not valid_image_count: raise RuntimeError( 'No valid images found for the "{}" class.' .format(pose_class_name)) # Print the error message collected during preprocessing. print('\n'.join(self._messages)) # Combine all per-class CSVs into a single output file all_landmarks_df = self._all_landmarks_as_dataframe() all_landmarks_df.to_csv(self._csvs_out_path, index=False) def class_names(self): """List of classes found in the training dataset.""" return self._pose_class_names def _all_landmarks_as_dataframe(self): """Merge all per-class CSVs into a single dataframe.""" total_df = None for class_index, class_name in enumerate(self._pose_class_names): csv_out_path = os.path.join(self._csvs_out_folder_per_class, class_name + '.csv') per_class_df = pd.read_csv(csv_out_path, header=None) # Add the labels per_class_df['class_no'] = [class_index]*len(per_class_df) per_class_df['class_name'] = [class_name]*len(per_class_df) # Append the folder name to the filename column (first column) per_class_df[per_class_df.columns[0]] = (os.path.join(class_name, '') + per_class_df[per_class_df.columns[0]].astype(str)) if total_df is None: # For the first class, assign its data to the total dataframe total_df = per_class_df else: # Concatenate each class's data into the total dataframe total_df = pd.concat([total_df, per_class_df], axis=0) list_name = [[bodypart.name + '_x', bodypart.name + '_y', bodypart.name + '_score'] for bodypart in BodyPart] header_name = [] for columns_name in list_name: header_name += columns_name header_name = ['file_name'] + header_name header_map = {total_df.columns[i]: header_name[i] for i in range(len(header_name))} total_df.rename(header_map, axis=1, inplace=True) return total_df
#@title (Optional) Code snippet to try out the Movenet pose estimation logic #@markdown You can download an image from the internet, run the pose estimation logic on it and plot the detected landmarks on top of the input image. #@markdown *Note: This code snippet is also useful for debugging when you encounter an image with bad pose classification accuracy. You can run pose estimation on the image and see if the detected landmarks look correct or not before investigating the pose classification logic.* test_image_url = "https://cdn.pixabay.com/photo/2017/03/03/17/30/yoga-2114512_960_720.jpg" #@param {type:"string"} !wget -O /tmp/image.jpeg {test_image_url} if len(test_image_url): image = tf.io.read_file('/tmp/image.jpeg') image = tf.io.decode_jpeg(image) person = detect(image) _ = draw_prediction_on_image(image.numpy(), person, crop_region=None, close_figure=False, keep_input_size=True)

Parte 1: pré-processe as imagens de entrada

Como a entrada do nosso classificador de poses são os landmarks de saída do modelo MoveNet, precisamos gerar o dataset de treinamento ao executar imagens rotuladas através do MoveNet e depois capturar todos os dados de landmarks e rótulos de verdade absoluta em um arquivo CSV.

O dataset que fornecemos para este tutorial é um dataset de poses de yoga gerado por CG. Ele contém imagens de vários modelos gerados por CG fazendo 5 poses de yoga diferentes. O diretório já está dividido em datasets train e test.

Portanto, nesta seção, vamos baixar o dataset de yoga e executá-lo pelo MoveNet para que possamos capturar todos os landmarks em um arquivo CSV... No entanto, leva cerca de 15 minutos para alimentar nosso dataset de yoga ao MoveNet e gerar esse arquivo CSV. Então, como alternativa, você pode baixar um arquivo CSV pré-existente para o dataset de yoga ao definir o parâmetro is_skip_step_1 abaixo como True. Assim, você pulará este passo e baixará o mesmo arquivo CSV criado nesta etapa de pré-processamento.

Por outro lado, se você quiser treinar o classificador de poses com seu próprio dataset de imagens, precisará fazer upload das imagens e realizar este passo de pré-processamento (deixe is_skip_step_1 como False): siga as instruções abaixo para carregar seu próprio dataset de poses.

is_skip_step_1 = False #@param ["False", "True"] {type:"raw"}

(Opcional) Faça upload do seu próprio dataset de poses

use_custom_dataset = False #@param ["False", "True"] {type:"raw"} dataset_is_split = False #@param ["False", "True"] {type:"raw"}

Se você quiser treinar o classificador de poses com suas próprias poses rotuladas (pode ser de qualquer tipo, e não só poses de yoga), siga estas etapas:

  1. Defina a opção use_custom_dataset acima como True.

  2. Prepare um arquivo (ZIP, TAR etc.) que inclua uma pasta com seu dataset de imagens. Essa pasta precisa ter imagens classificadas das suas poses da maneira a seguir.

Se você já dividiu seu dataset em conjuntos de treinamento e teste, defina dataset_is_split como True. Isto é, sua pasta de imagens precisa incluir os diretórios "train" e "teste" assim:

yoga_poses/ |__ train/ |__ downdog/ |______ 00000128.jpg |______ ... |__ test/ |__ downdog/ |______ 00000181.jpg |______ ...

Or, if your dataset is NOT split yet, then set `dataset_is_split` to **False** and we'll split it up based on a specified split fraction. That is, your uploaded images folder should look like this:

yoga_poses/ |__ downdog/ |______ 00000128.jpg |______ 00000181.jpg |______ ... |__ goddess/ |______ 00000243.jpg |______ 00000306.jpg |______ ...

  1. Clique na guia Files, "Arquivos", à esquerda (ícone de pasta) e depois em Upload to session storage, "Fazer upload no armazenamento da sessão", (ícone de arquivo).

  2. Selecione seu arquivo e espere até que o upload seja concluído antes de continuar.

  3. Edite o bloco de código a seguir para especificar o nome do arquivo e do diretório de imagens. (Por padrão, esperamos um arquivo ZIP, então você também precisará modificar essa parte se o seu arquivo estiver em outro formato.)

  4. Agora, execute o resto do notebook.

#@markdown Be sure you run this cell. It's hiding the `split_into_train_test()` function that's called in the next code block. import os import random import shutil def split_into_train_test(images_origin, images_dest, test_split): """Splits a directory of sorted images into training and test sets. Args: images_origin: Path to the directory with your images. This directory must include subdirectories for each of your labeled classes. For example: yoga_poses/ |__ downdog/ |______ 00000128.jpg |______ 00000181.jpg |______ ... |__ goddess/ |______ 00000243.jpg |______ 00000306.jpg |______ ... ... images_dest: Path to a directory where you want the split dataset to be saved. The results looks like this: split_yoga_poses/ |__ train/ |__ downdog/ |______ 00000128.jpg |______ ... |__ test/ |__ downdog/ |______ 00000181.jpg |______ ... test_split: Fraction of data to reserve for test (float between 0 and 1). """ _, dirs, _ = next(os.walk(images_origin)) TRAIN_DIR = os.path.join(images_dest, 'train') TEST_DIR = os.path.join(images_dest, 'test') os.makedirs(TRAIN_DIR, exist_ok=True) os.makedirs(TEST_DIR, exist_ok=True) for dir in dirs: # Get all filenames for this dir, filtered by filetype filenames = os.listdir(os.path.join(images_origin, dir)) filenames = [os.path.join(images_origin, dir, f) for f in filenames if ( f.endswith('.png') or f.endswith('.jpg') or f.endswith('.jpeg') or f.endswith('.bmp'))] # Shuffle the files, deterministically filenames.sort() random.seed(42) random.shuffle(filenames) # Divide them into train/test dirs os.makedirs(os.path.join(TEST_DIR, dir), exist_ok=True) os.makedirs(os.path.join(TRAIN_DIR, dir), exist_ok=True) test_count = int(len(filenames) * test_split) for i, file in enumerate(filenames): if i < test_count: destination = os.path.join(TEST_DIR, dir, os.path.split(file)[1]) else: destination = os.path.join(TRAIN_DIR, dir, os.path.split(file)[1]) shutil.copyfile(file, destination) print(f'Moved {test_count} of {len(filenames)} from class "{dir}" into test.') print(f'Your split dataset is in "{images_dest}"')
if use_custom_dataset: # ATTENTION: # You must edit these two lines to match your archive and images folder name: # !tar -xf YOUR_DATASET_ARCHIVE_NAME.tar !unzip -q YOUR_DATASET_ARCHIVE_NAME.zip dataset_in = 'YOUR_DATASET_DIR_NAME' # You can leave the rest alone: if not os.path.isdir(dataset_in): raise Exception("dataset_in is not a valid directory") if dataset_is_split: IMAGES_ROOT = dataset_in else: dataset_out = 'split_' + dataset_in split_into_train_test(dataset_in, dataset_out, test_split=0.2) IMAGES_ROOT = dataset_out

Observação: se você estiver usando o split_into_train_test() para dividir o dataset, ele espera que todas as imagens sejam PNG, JPEG ou BMP e ignora os outros tipos de arquivo.

Baixe o dataset de yoga

if not is_skip_step_1 and not use_custom_dataset: !wget -O yoga_poses.zip http://download.tensorflow.org/data/pose_classification/yoga_poses.zip !unzip -q yoga_poses.zip -d yoga_cg IMAGES_ROOT = "yoga_cg"

Faça o pré-processamento do dataset TRAIN

if not is_skip_step_1: images_in_train_folder = os.path.join(IMAGES_ROOT, 'train') images_out_train_folder = 'poses_images_out_train' csvs_out_train_path = 'train_data.csv' preprocessor = MoveNetPreprocessor( images_in_folder=images_in_train_folder, images_out_folder=images_out_train_folder, csvs_out_path=csvs_out_train_path, ) preprocessor.process(per_pose_class_limit=None)

Faça o pré-processamento do dataset TEST

if not is_skip_step_1: images_in_test_folder = os.path.join(IMAGES_ROOT, 'test') images_out_test_folder = 'poses_images_out_test' csvs_out_test_path = 'test_data.csv' preprocessor = MoveNetPreprocessor( images_in_folder=images_in_test_folder, images_out_folder=images_out_test_folder, csvs_out_path=csvs_out_test_path, ) preprocessor.process(per_pose_class_limit=None)

Parte 2: treine um modelo de classificação de poses que aceita coordenadas de landmarks como entrada e gera rótulos previstos

Você criará um modelo do TensorFlow que aceita coordenadas de landmark e prevê a classe da pose em que está a pessoa na imagem de entrada. O modelo consiste em dois submodelos:

  • O submodelo 1 calcula um embedding da pose (ou seja, um vetor de características) a partir das coordenadas de landmark detectadas.

  • O submodelo 2 alimenta o embedding da pose por várias camadas Dense para prever a classe dela.

Depois, você treinará o modelo com base no dataset pré-processado na parte 1.

(Opcional) Baixe o dataset pré-processado se você não executou a parte 1

# Download the preprocessed CSV files which are the same as the output of step 1 if is_skip_step_1: !wget -O train_data.csv http://download.tensorflow.org/data/pose_classification/yoga_train_data.csv !wget -O test_data.csv http://download.tensorflow.org/data/pose_classification/yoga_test_data.csv csvs_out_train_path = 'train_data.csv' csvs_out_test_path = 'test_data.csv' is_skipped_step_1 = True

Carregue os CSVs pré-processados nos datasets TRAIN e TEST

def load_pose_landmarks(csv_path): """Loads a CSV created by MoveNetPreprocessor. Returns: X: Detected landmark coordinates and scores of shape (N, 17 * 3) y: Ground truth labels of shape (N, label_count) classes: The list of all class names found in the dataset dataframe: The CSV loaded as a Pandas dataframe features (X) and ground truth labels (y) to use later to train a pose classification model. """ # Load the CSV file dataframe = pd.read_csv(csv_path) df_to_process = dataframe.copy() # Drop the file_name columns as you don't need it during training. df_to_process.drop(columns=['file_name'], inplace=True) # Extract the list of class names classes = df_to_process.pop('class_name').unique() # Extract the labels y = df_to_process.pop('class_no') # Convert the input features and labels into the correct format for training. X = df_to_process.astype('float64') y = keras.utils.to_categorical(y) return X, y, classes, dataframe

Carregue e divida o dataset TRAIN original em TRAIN (85% dos dados) e VALIDATE (os 15% restantes).

# Load the train data X, y, class_names, _ = load_pose_landmarks(csvs_out_train_path) # Split training data (X, y) into (X_train, y_train) and (X_val, y_val) X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15)
# Load the test data X_test, y_test, _, df_test = load_pose_landmarks(csvs_out_test_path)

Defina funções para converter os landmarks das poses em um embedding (ou seja, um vetor de características) para a classificação

Em seguida, converta as coordenadas de landmark em um vetor de características ao:

  1. Mover o centro da pose para a origem.

  2. Dimensionar a pose para que o tamanho dela seja 1.

  3. Achatar essas coordenadas em um vetor de características.

Em seguida, use esse vetor de características para treinar um classificador de poses baseado em rede neural.

def get_center_point(landmarks, left_bodypart, right_bodypart): """Calculates the center point of the two given landmarks.""" left = tf.gather(landmarks, left_bodypart.value, axis=1) right = tf.gather(landmarks, right_bodypart.value, axis=1) center = left * 0.5 + right * 0.5 return center def get_pose_size(landmarks, torso_size_multiplier=2.5): """Calculates pose size. It is the maximum of two values: * Torso size multiplied by `torso_size_multiplier` * Maximum distance from pose center to any pose landmark """ # Hips center hips_center = get_center_point(landmarks, BodyPart.LEFT_HIP, BodyPart.RIGHT_HIP) # Shoulders center shoulders_center = get_center_point(landmarks, BodyPart.LEFT_SHOULDER, BodyPart.RIGHT_SHOULDER) # Torso size as the minimum body size torso_size = tf.linalg.norm(shoulders_center - hips_center) # Pose center pose_center_new = get_center_point(landmarks, BodyPart.LEFT_HIP, BodyPart.RIGHT_HIP) pose_center_new = tf.expand_dims(pose_center_new, axis=1) # Broadcast the pose center to the same size as the landmark vector to # perform substraction pose_center_new = tf.broadcast_to(pose_center_new, [tf.size(landmarks) // (17*2), 17, 2]) # Dist to pose center d = tf.gather(landmarks - pose_center_new, 0, axis=0, name="dist_to_pose_center") # Max dist to pose center max_dist = tf.reduce_max(tf.linalg.norm(d, axis=0)) # Normalize scale pose_size = tf.maximum(torso_size * torso_size_multiplier, max_dist) return pose_size def normalize_pose_landmarks(landmarks): """Normalizes the landmarks translation by moving the pose center to (0,0) and scaling it to a constant pose size. """ # Move landmarks so that the pose center becomes (0,0) pose_center = get_center_point(landmarks, BodyPart.LEFT_HIP, BodyPart.RIGHT_HIP) pose_center = tf.expand_dims(pose_center, axis=1) # Broadcast the pose center to the same size as the landmark vector to perform # substraction pose_center = tf.broadcast_to(pose_center, [tf.size(landmarks) // (17*2), 17, 2]) landmarks = landmarks - pose_center # Scale the landmarks to a constant pose size pose_size = get_pose_size(landmarks) landmarks /= pose_size return landmarks def landmarks_to_embedding(landmarks_and_scores): """Converts the input landmarks into a pose embedding.""" # Reshape the flat input into a matrix with shape=(17, 3) reshaped_inputs = keras.layers.Reshape((17, 3))(landmarks_and_scores) # Normalize landmarks 2D landmarks = normalize_pose_landmarks(reshaped_inputs[:, :, :2]) # Flatten the normalized landmark coordinates into a vector embedding = keras.layers.Flatten()(landmarks) return embedding

Defina um modelo do Keras para a classificação de poses

Nosso modelo do Keras recebe os landmarks detectados da pose, calcula o embedding e prevê a classe dela.

# Define the model inputs = tf.keras.Input(shape=(51)) embedding = landmarks_to_embedding(inputs) layer = keras.layers.Dense(128, activation=tf.nn.relu6)(embedding) layer = keras.layers.Dropout(0.5)(layer) layer = keras.layers.Dense(64, activation=tf.nn.relu6)(layer) layer = keras.layers.Dropout(0.5)(layer) outputs = keras.layers.Dense(len(class_names), activation="softmax")(layer) model = keras.Model(inputs, outputs) model.summary()
model.compile( optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'] ) # Add a checkpoint callback to store the checkpoint that has the highest # validation accuracy. checkpoint_path = "weights.best.hdf5" checkpoint = keras.callbacks.ModelCheckpoint(checkpoint_path, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max') earlystopping = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=20) # Start training history = model.fit(X_train, y_train, epochs=200, batch_size=16, validation_data=(X_val, y_val), callbacks=[checkpoint, earlystopping])
# Visualize the training history to see whether you're overfitting. plt.plot(history.history['accuracy']) plt.plot(history.history['val_accuracy']) plt.title('Model accuracy') plt.ylabel('accuracy') plt.xlabel('epoch') plt.legend(['TRAIN', 'VAL'], loc='lower right') plt.show()
# Evaluate the model using the TEST dataset loss, accuracy = model.evaluate(X_test, y_test)

Desenhe a matriz de confusão para entender melhor o desempenho do modelo

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues): """Plots the confusion matrix.""" if normalize: cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] print("Normalized confusion matrix") else: print('Confusion matrix, without normalization') plt.imshow(cm, interpolation='nearest', cmap=cmap) plt.title(title) plt.colorbar() tick_marks = np.arange(len(classes)) plt.xticks(tick_marks, classes, rotation=55) plt.yticks(tick_marks, classes) fmt = '.2f' if normalize else 'd' thresh = cm.max() / 2. for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])): plt.text(j, i, format(cm[i, j], fmt), horizontalalignment="center", color="white" if cm[i, j] > thresh else "black") plt.ylabel('True label') plt.xlabel('Predicted label') plt.tight_layout() # Classify pose in the TEST dataset using the trained model y_pred = model.predict(X_test) # Convert the prediction result to class name y_pred_label = [class_names[i] for i in np.argmax(y_pred, axis=1)] y_true_label = [class_names[i] for i in np.argmax(y_test, axis=1)] # Plot the confusion matrix cm = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1)) plot_confusion_matrix(cm, class_names, title ='Confusion Matrix of Pose Classification Model') # Print the classification report print('\nClassification Report:\n', classification_report(y_true_label, y_pred_label))

(Opcional) Investigue previsões incorretas

Você pode analisar as poses do dataset TEST que foram previstas incorretamente para ver se a exatidão do modelo pode ser melhorada.

Observação: isso só funciona se você executou a etapa 1, porque precisa dos arquivos de imagem das poses na sua máquina local para exibi-los.

if is_skip_step_1: raise RuntimeError('You must have run step 1 to run this cell.') # If step 1 was skipped, skip this step. IMAGE_PER_ROW = 3 MAX_NO_OF_IMAGE_TO_PLOT = 30 # Extract the list of incorrectly predicted poses false_predict = [id_in_df for id_in_df in range(len(y_test)) \ if y_pred_label[id_in_df] != y_true_label[id_in_df]] if len(false_predict) > MAX_NO_OF_IMAGE_TO_PLOT: false_predict = false_predict[:MAX_NO_OF_IMAGE_TO_PLOT] # Plot the incorrectly predicted images row_count = len(false_predict) // IMAGE_PER_ROW + 1 fig = plt.figure(figsize=(10 * IMAGE_PER_ROW, 10 * row_count)) for i, id_in_df in enumerate(false_predict): ax = fig.add_subplot(row_count, IMAGE_PER_ROW, i + 1) image_path = os.path.join(images_out_test_folder, df_test.iloc[id_in_df]['file_name']) image = cv2.imread(image_path) plt.title("Predict: %s; Actual: %s" % (y_pred_label[id_in_df], y_true_label[id_in_df])) plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) plt.show()

Parte 3: converta o modelo de classificação de poses para o TensorFlow Lite

Você converterá o modelo de classificação de poses do Keras para o formato do TensorFlow Lite para implantá-lo em aplicativos móveis, navegadores da Web e dispositivos de borda. Ao converter o modelo, você aplicará a quantização de intervalo dinâmico para reduzir o tamanho do modelo do TensorFlow Lite para classificação de poses em 4 vezes com uma perda de exatidão insignificante.

Observação: o TensorFlow Lite é compatível com vários esquemas de quantização. Veja a documentação se você tiver interesse em saber mais.

converter = tf.lite.TFLiteConverter.from_keras_model(model) converter.optimizations = [tf.lite.Optimize.DEFAULT] tflite_model = converter.convert() print('Model size: %dKB' % (len(tflite_model) / 1024)) with open('pose_classifier.tflite', 'wb') as f: f.write(tflite_model)

Em seguida, você escreverá o arquivo de rótulos com o mapeamento de índices de classes para nomes de classes legíveis por humanos.

with open('pose_labels.txt', 'w') as f: f.write('\n'.join(class_names))

Como você aplicou a quantização para reduzir o tamanho do modelo, vamos avaliar o modelo do TFLite quantizado para verificar se a queda na exatidão é aceitável.

def evaluate_model(interpreter, X, y_true): """Evaluates the given TFLite model and return its accuracy.""" input_index = interpreter.get_input_details()[0]["index"] output_index = interpreter.get_output_details()[0]["index"] # Run predictions on all given poses. y_pred = [] for i in range(len(y_true)): # Pre-processing: add batch dimension and convert to float32 to match with # the model's input data format. test_image = X[i: i + 1].astype('float32') interpreter.set_tensor(input_index, test_image) # Run inference. interpreter.invoke() # Post-processing: remove batch dimension and find the class with highest # probability. output = interpreter.tensor(output_index) predicted_label = np.argmax(output()[0]) y_pred.append(predicted_label) # Compare prediction results with ground truth labels to calculate accuracy. y_pred = keras.utils.to_categorical(y_pred) return accuracy_score(y_true, y_pred) # Evaluate the accuracy of the converted TFLite model classifier_interpreter = tf.lite.Interpreter(model_content=tflite_model) classifier_interpreter.allocate_tensors() print('Accuracy of TFLite model: %s' % evaluate_model(classifier_interpreter, X_test, y_test))

Agora, você pode baixar o modelo do TFLite (pose_classifier.tflite) e o arquivo de rótulos (pose_labels.txt) para classificar as poses personalizadas. Veja o aplicativo de amostra Android e Python/Raspberry Pi para um exemplo completo de como usar o modelo de classificação de poses do TFLite.

!zip pose_classifier.zip pose_labels.txt pose_classifier.tflite
# Download the zip archive if running on Colab. try: from google.colab import files files.download('pose_classifier.zip') except: pass