
Copyright 2021 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Human pose classification with MoveNet and TensorFlow Lite

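This notebook teaches you how to train a pose classification model using MoveNet and TensorFlow Lite. The result is a new TensorFlow Lite model that accepts the output of the MoveNet model as its input and outputs a pose classification, such as the name of a yoga pose.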

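The procedure in this notebook consists of three parts:

Part 1: Preprocess the pose classification training data into a CSV file that specifies the landmarks (body keypoints) detected by the MoveNet model, along with the ground truth pose labels.
Part 2: Build and train a pose classification model that takes the landmark coordinates from the CSV file as input and outputs the predicted labels.
Part 3: Convert the pose classification model to TFLite.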

By default, this notebook uses an image dataset of labeled yoga poses, but Part 1 also includes a section where you can upload your own image dataset of poses.

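Preparation

In this section, you'll import the necessary libraries and define several functions to preprocess the training images into a CSV file that contains the landmark coordinates and ground truth labels.

Nothing observable happens here, but you can expand the hidden code cells to see the implementation of some of the functions you'll call later on.

If you only want to create the CSV file without knowing all the details, just run this section and proceed to Part 1.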

In [ ]:

!pip install -q opencv-python

In [ ]:

import csv
import cv2
import itertools
import numpy as np
import pandas as pd
import os
import sys
import tempfile
import tqdm

from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Code to run pose estimation using MoveNet

In [ ]:

#@title Functions to run pose estimation with MoveNet

#@markdown You'll download the MoveNet Thunder model from [TensorFlow Hub](https://www.google.com/url?sa=D&q=https%3A%2F%2Ftfhub.dev%2Fs%3Fq%3Dmovenet), and reuse some inference and visualization logic from the [MoveNet Raspberry Pi (Python)](https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/raspberry_pi) sample app to detect landmarks (ear, nose, wrist etc.) from the input images.

#@markdown *Note: You should use the most accurate pose estimation model (i.e. MoveNet Thunder) to detect the keypoints and use them to train the pose classification model to achieve the best accuracy. When running inference, you can use a pose estimation model of your choice (e.g. either MoveNet Lightning or Thunder).*

# Download model from TF Hub and check out inference code from GitHub
!wget -q -O movenet_thunder.tflite https://tfhub.dev/google/lite-model/movenet/singlepose/thunder/tflite/float16/4?lite-format=tflite
!git clone https://github.com/tensorflow/examples.git
pose_sample_rpi_path = os.path.join(os.getcwd(), 'examples/lite/examples/pose_estimation/raspberry_pi')
sys.path.append(pose_sample_rpi_path)

# Load MoveNet Thunder model
import utils
from data import BodyPart
from ml import Movenet
movenet = Movenet('movenet_thunder')

# Define function to run pose estimation using MoveNet Thunder.
# You'll apply MoveNet's cropping algorithm and run inference multiple times on
# the input image to improve pose estimation accuracy.
def detect(input_tensor, inference_count=3):
  """Runs detection on an input image.
 
  Args:
    input_tensor: A [height, width, 3] Tensor of type tf.float32.
      Note that height and width can be anything since the image will be
      immediately resized according to the needs of the model within this
      function.
    inference_count: Number of times the model should run repeatedly on the
      same input image to improve detection accuracy.

  Returns:
    A Person entity detected by the MoveNet.SinglePose.
  """
  # Detect pose using the full input image. Keep the result so that
  # inference_count=1 still returns a detection.
  person = movenet.detect(input_tensor.numpy(), reset_crop_region=True)

  # Repeatedly use the previous detection result to identify the region of
  # interest and crop only that region to improve detection accuracy.
  for _ in range(inference_count - 1):
    person = movenet.detect(input_tensor.numpy(),
                            reset_crop_region=False)

  return person
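The loop in `detect()` relies on MoveNet's cropping algorithm: each pass uses the previous keypoints to choose a region of interest, so later passes see the person at a higher effective resolution. The real algorithm lives in the Raspberry Pi sample's `Movenet` class and is more elaborate. The sketch below is only a simplified illustration of the idea, assuming keypoints normalized to [0, 1] and given as (y, x, score) tuples; `CropRegion` and `next_crop_region` are hypothetical names, not part of the sample's API.

```python
from dataclasses import dataclass


@dataclass
class CropRegion:
  """Crop box in normalized coordinates (fractions of the image size)."""
  ymin: float
  xmin: float
  ymax: float
  xmax: float


def next_crop_region(keypoints, score_threshold=0.2, margin=0.2):
  """Hypothetical helper: derive the next crop box from a previous detection.

  Args:
    keypoints: List of (y, x, score) tuples with y and x in [0, 1].
    score_threshold: Keypoints below this score are ignored.
    margin: Fraction of the box height/width added on each side.

  Returns:
    A CropRegion; the full image if no keypoint is confident enough.
  """
  confident = [(y, x) for y, x, score in keypoints if score >= score_threshold]
  if not confident:
    # Nothing reliable to crop around, so fall back to the whole image.
    return CropRegion(0.0, 0.0, 1.0, 1.0)
  ys = [y for y, _ in confident]
  xs = [x for _, x in confident]
  height = max(ys) - min(ys)
  width = max(xs) - min(xs)
  # Grow the tight bounding box by the margin and clamp it to the image.
  return CropRegion(
      ymin=max(0.0, min(ys) - margin * height),
      xmin=max(0.0, min(xs) - margin * width),
      ymax=min(1.0, max(ys) + margin * height),
      xmax=min(1.0, max(xs) + margin * width),
  )
```

Low-confidence keypoints are dropped before computing the box, which keeps a single spurious detection from stretching the crop across the whole frame.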

In [ ]:

#@title Functions to visualize the pose estimation results.

def draw_prediction_on_image(
    image, person, crop_region=None, close_figure=True,
    keep_input_size=False):
  """Draws the keypoint predictions on image.
 
  Args:
    image: A numpy array with shape [height, width, channel] representing the
      pixel values of the input image.
    person: A person entity returned from the MoveNet.SinglePose model.
    crop_region: Currently unused by this function.
    close_figure: Whether to close the plt figure after the function returns.
    keep_input_size: Whether to keep the size of the input image. If False,
      the output is resized to a 512x512 target while keeping the aspect ratio.
 
  Returns:
    A numpy array with shape [out_height, out_width, channel] representing the
    image overlaid with keypoint predictions.
  """
  # Draw the detection result on top of the image.
  image_np = utils.visualize(image, [person])
  
  # Plot the image with detection results.
  height, width, channel = image.shape
  aspect_ratio = float(width) / height
  fig, ax = plt.subplots(figsize=(12 * aspect_ratio, 12))
  im = ax.imshow(image_np)
 
  if close_figure:
    plt.close(fig)
 
  if not keep_input_size:
    image_np = utils.keep_aspect_ratio_resizer(image_np, (512, 512))

  return image_np

In [ ]:

#@title Code to load the images, detect pose landmarks and save them into a CSV file

class MoveNetPreprocessor(object):
  """Helper class to preprocess pose sample images for classification."""
 
  def __init__(self,
               images_in_folder,
               images_out_folder,
               csvs_out_path):
    """Creates a preprocessor to detection pose from images and save as CSV.

    Args:
      images_in_folder: Path to the folder with the input images. It should
        follow this structure:
        yoga_poses
        |__ downdog
            |______ 00000128.jpg
            |______ 00000181.bmp
            |______ ...
        |__ goddess
            |______ 00000243.jpg
            |______ 00000306.jpg
            |______ ...
        ...
      images_out_folder: Path to write the images overlay with detected
        landmarks. These images are useful when you need to debug accuracy
        issues.
      csvs_out_path: Path to write the CSV containing the detected landmark
        coordinates and label of each image that can be used to train a pose
        classification model.
    """
    self._images_in_folder = images_in_folder
    self._images_out_folder = images_out_folder
    self._csvs_out_path = csvs_out_path
    self._messages = []

    # Create a temp dir to store the pose CSVs per class
    self._csvs_out_folder_per_class = tempfile.mkdtemp()
 
    # Get the list of pose class names (one subfolder per class)
    self._pose_class_names = sorted(
        [n for n in os.listdir(self._images_in_folder) if not n.startswith('.')]
        )
    
  def process(self, per_pose_class_limit=None, detection_threshold=0.1):
    """Preprocesses images in the given folder.
    Args:
      per_pose_class_limit: Number of images to load per pose class. As
        preprocessing usually takes time, you can set this parameter to reduce
        the dataset size for testing.
      detection_threshold: Only keep images whose landmark confidence scores
        are all at or above this threshold.
    """
    # Loop through the classes and preprocess its images
    for pose_class_name in self._pose_class_names:
      print('Preprocessing', pose_class_name, file=sys.stderr)

      # Paths for the pose class.
      images_in_folder = os.path.join(self._images_in_folder, pose_class_name)
      images_out_folder = os.path.join(self._images_out_folder, pose_class_name)
      csv_out_path = os.path.join(self._csvs_out_folder_per_class,
                                  pose_class_name + '.csv')
      if not os.path.exists(images_out_folder):
        os.makedirs(images_out_folder)
 
      # Detect landmarks in each image and write it to a CSV file
      with open(csv_out_path, 'w') as csv_out_file:
        csv_out_writer = csv.writer(csv_out_file, 
                                    delimiter=',', 
                                    quoting=csv.QUOTE_MINIMAL)
        # Get list of images
        image_names = sorted(
            [n for n in os.listdir(images_in_folder) if not n.startswith('.')])
        if per_pose_class_limit is not None:
          image_names = image_names[:per_pose_class_limit]

        valid_image_count = 0
 
        # Detect pose landmarks from each image
        for image_name in tqdm.tqdm(image_names):
          image_path = os.path.join(images_in_folder, image_name)

          try:
            image = tf.io.read_file(image_path)
            image = tf.io.decode_jpeg(image)
          except Exception:
            self._messages.append('Skipped ' + image_path + '. Invalid image.')
            continue

          # The image was decoded successfully, so read its shape.
          image_height, image_width, channel = image.shape

          # Skip images that aren't RGB because MoveNet requires RGB images
          if channel != 3:
            self._messages.append('Skipped ' + image_path +
                                  '. Image isn\'t in RGB format.')
            continue
          person = detect(image)
          
          # Save landmarks if all landmarks were detected
          min_landmark_score = min(
              [keypoint.score for keypoint in person.keypoints])
          should_keep_image = min_landmark_score >= detection_threshold
          if not should_keep_image:
            self._messages.append('Skipped ' + image_path +
                                  '. No pose was confidently detected.')
            continue

          valid_image_count += 1

          # Draw the prediction result on top of the image for debugging later
          output_overlay = draw_prediction_on_image(
              image.numpy().astype(np.uint8), person, 
              close_figure=True, keep_input_size=True)
        
          # Write detection result into an image file
          output_frame = cv2.cvtColor(output_overlay, cv2.COLOR_RGB2BGR)
          cv2.imwrite(os.path.join(images_out_folder, image_name), output_frame)
        
          # Get landmarks and scale it to the same size as the input image
          pose_landmarks = np.array(
              [[keypoint.coordinate.x, keypoint.coordinate.y, keypoint.score]
                for keypoint in person.keypoints],
              dtype=np.float32)

          # Write the landmark coordinates to its per-class CSV file
          coordinates = pose_landmarks.flatten().astype(str).tolist()
          csv_out_writer.writerow([image_name] + coordinates)

        if not valid_image_count:
          raise RuntimeError(
              'No valid images found for the "{}" class.'
              .format(pose_class_name))
      
    # Print the error message collected during preprocessing.
    print('\n'.join(self._messages))

    # Combine all per-class CSVs into a single output file
    all_landmarks_df = self._all_landmarks_as_dataframe()
    all_landmarks_df.to_csv(self._csvs_out_path, index=False)

  def class_names(self):
    """List of classes found in the training dataset."""
    return self._pose_class_names
  
  def _all_landmarks_as_dataframe(self):
    """Merge all per-class CSVs into a single dataframe."""
    total_df = None
    for class_index, class_name in enumerate(self._pose_class_names):
      csv_out_path = os.path.join(self._csvs_out_folder_per_class,
                                  class_name + '.csv')
      per_class_df = pd.read_csv(csv_out_path, header=None)
      
      # Add the labels
      per_class_df['class_no'] = [class_index]*len(per_class_df)
      per_class_df['class_name'] = [class_name]*len(per_class_df)

      # Append the folder name to the filename column (first column)
      per_class_df[per_class_df.columns[0]] = (os.path.join(class_name, '') 
        + per_class_df[per_class_df.columns[0]].astype(str))

      if total_df is None:
        # For the first class, assign its data to the total dataframe
        total_df = per_class_df
      else:
        # Concatenate each class's data into the total dataframe
        total_df = pd.concat([total_df, per_class_df], axis=0)
 
    list_name = [[bodypart.name + '_x', bodypart.name + '_y', 
                  bodypart.name + '_score'] for bodypart in BodyPart] 
    header_name = []
    for columns_name in list_name:
      header_name += columns_name
    header_name = ['file_name'] + header_name
    header_map = {total_df.columns[i]: header_name[i] 
                  for i in range(len(header_name))}
 
    total_df.rename(header_map, axis=1, inplace=True)

    return total_df

In [ ]:

#@title (Optional) Code snippet to try out the Movenet pose estimation logic

#@markdown You can download an image from the internet, run the pose estimation logic on it and plot the detected landmarks on top of the input image. 

#@markdown *Note: This code snippet is also useful for debugging when you encounter an image with bad pose classification accuracy. You can run pose estimation on the image and see if the detected landmarks look correct or not before investigating the pose classification logic.*

test_image_url = "https://cdn.pixabay.com/photo/2017/03/03/17/30/yoga-2114512_960_720.jpg" #@param {type:"string"}
!wget -O /tmp/image.jpeg {test_image_url}

if len(test_image_url):
  image = tf.io.read_file('/tmp/image.jpeg')
  image = tf.io.decode_jpeg(image)
  person = detect(image)
  _ = draw_prediction_on_image(image.numpy(), person, crop_region=None, 
                               close_figure=False, keep_input_size=True)

Part 1: Preprocess the input images

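Because the input for the pose classifier is the output landmarks from the MoveNet model, you need to generate your training dataset by running labeled images through MoveNet and then capturing all the landmark data and ground truth labels into a CSV file.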
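To make the CSV layout concrete, the following sketch builds the same columns as the final CSV produced by `MoveNetPreprocessor`: `file_name`, then `_x`, `_y`, and `_score` for each of the 17 keypoints, then `class_no` and `class_name`. It assumes the keypoint names follow the order of the `BodyPart` enum (MoveNet's COCO ordering) and uses a hypothetical `make_row` helper that applies the same minimum-score rule as `detection_threshold`. It uses only the standard library, so it runs without MoveNet.

```python
import csv
import io

# Assumption: keypoint names in MoveNet's COCO order, matching the order of the
# BodyPart enum in the Raspberry Pi sample's data.py.
KEYPOINT_NAMES = [
    'NOSE', 'LEFT_EYE', 'RIGHT_EYE', 'LEFT_EAR', 'RIGHT_EAR',
    'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 'RIGHT_ELBOW',
    'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_HIP', 'RIGHT_HIP',
    'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE',
]


def csv_header():
  """file_name, then x/y/score for each keypoint, then the two label columns."""
  header = ['file_name']
  for name in KEYPOINT_NAMES:
    header += [name + '_x', name + '_y', name + '_score']
  return header + ['class_no', 'class_name']


def make_row(file_name, keypoints, class_no, class_name, threshold=0.1):
  """Hypothetical helper: one CSV row, or None if any keypoint is too uncertain.

  Args:
    keypoints: 17 (x, y, score) tuples in KEYPOINT_NAMES order.
  """
  if min(score for _, _, score in keypoints) < threshold:
    return None  # Same rule as detection_threshold in the preprocessor.
  flat = [value for keypoint in keypoints for value in keypoint]
  return [file_name] + flat + [class_no, class_name]


# Write the header and one illustrative row to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(csv_header())
row = make_row('downdog/00000128.jpg', [(10.0, 20.0, 0.9)] * 17, 0, 'downdog')
writer.writerow(row)
```

Each row therefore has 1 + 17 × 3 + 2 = 54 fields, and the 51 numeric fields in the middle are exactly what the classifier receives as input.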

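The dataset provided for this tutorial is a CG-generated yoga pose dataset. It contains images of multiple CG-generated models doing 5 different yoga poses. The directory is already split into a `train` dataset and a `test` dataset.

So in this section, you could download the yoga dataset and run it through MoveNet to capture all the landmarks into a CSV file. However, feeding the yoga dataset to MoveNet and generating this CSV file takes about 15 minutes. As an alternative, you can download a pre-existing CSV file for the yoga dataset by setting the `is_skip_step_1` parameter below to **True**. That way, you skip this step and instead download the same CSV file that this preprocessing step would create.

On the other hand, if you want to train the pose classifier with your own image dataset, you need to upload your images and run this preprocessing step (leave `is_skip_step_1` set to **False**). Follow the instructions below to upload your own pose dataset.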

In [ ]:

is_skip_step_1 = False #@param ["False", "True"] {type:"raw"}

(Optional) Upload your own pose dataset

In [ ]:

use_custom_dataset = False #@param ["False", "True"] {type:"raw"}

dataset_is_split = False #@param ["False", "True"] {type:"raw"}

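If you want to train the pose classifier with your own labeled poses (they can be any poses, not just yoga poses), follow these steps:

Set the above `use_custom_dataset` option to **True**.
Prepare an archive file (ZIP, TAR, or other) that includes a folder with your image dataset. The folder must include sorted images of your poses as follows.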

If you've already split your dataset into train and test sets, then set `dataset_is_split` to **True**. That is, your images folder must include "train" and "test" directories like this:

yoga_poses/
|__ train/
    |__ downdog/
        |______ 00000128.jpg
        |______ ...
|__ test/
    |__ downdog/
        |______ 00000181.jpg
        |______ ...

Or, if your dataset is NOT split yet, then set
`dataset_is_split` to **False** and we'll split it up based
on a specified split fraction. That is, your uploaded images
folder should look like this:

yoga_poses/
|__ downdog/
    |______ 00000128.jpg
    |______ 00000181.jpg
    |______ ...
|__ goddess/
    |______ 00000243.jpg
    |______ 00000306.jpg
    |______ ...

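Click the Files tab on the left (folder icon) and then click Upload to session storage (file icon).
Select your archive file and wait until it finishes uploading before you proceed.
Edit the following code block to specify the names of your archive file and images directory. (By default, it expects a ZIP file, so you'll also need to modify that part if your archive is in another format.)
Now run the rest of the notebook.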
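If you'd rather not edit the `!unzip`/`!tar` lines by hand, Python's `shutil.unpack_archive()` infers the archive format (ZIP, TAR, gzipped TAR, and so on) from the file extension. The sketch below pairs it with a hypothetical `is_split_dataset()` helper that checks for `train` and `test` subfolders, which is one way to decide how to set `dataset_is_split`. Both function names are illustrative, not part of this notebook.

```python
import os
import shutil


def extract_dataset(archive_path, extract_dir='.'):
  """Extracts a ZIP or TAR archive; the format is inferred from the extension."""
  shutil.unpack_archive(archive_path, extract_dir)


def is_split_dataset(dataset_dir):
  """Hypothetical helper: True if dataset_dir has `train` and `test` subfolders."""
  subdirs = {d for d in os.listdir(dataset_dir)
             if os.path.isdir(os.path.join(dataset_dir, d))}
  return {'train', 'test'} <= subdirs
```

For example, after `extract_dataset('YOUR_DATASET_ARCHIVE_NAME.zip')`, you could set `dataset_is_split = is_split_dataset('YOUR_DATASET_DIR_NAME')` instead of choosing the value manually.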

In [ ]:

#@markdown Be sure you run this cell. It's hiding the `split_into_train_test()` function that's called in the next code block.

import os
import random
import shutil

def split_into_train_test(images_origin, images_dest, test_split):
  """Splits a directory of sorted images into training and test sets.

  Args:
    images_origin: Path to the directory with your images. This directory
      must include subdirectories for each of your labeled classes. For example:
      yoga_poses/
      |__ downdog/
          |______ 00000128.jpg
          |______ 00000181.jpg
          |______ ...
      |__ goddess/
          |______ 00000243.jpg
          |______ 00000306.jpg
          |______ ...
      ...
    images_dest: Path to a directory where you want the split dataset to be
      saved. The results looks like this:
      split_yoga_poses/
      |__ train/
          |__ downdog/
              |______ 00000128.jpg
              |______ ...
      |__ test/
          |__ downdog/
              |______ 00000181.jpg
              |______ ...
    test_split: Fraction of data to reserve for test (float between 0 and 1).
  """
  _, dirs, _ = next(os.walk(images_origin))

  TRAIN_DIR = os.path.join(images_dest, 'train')
  TEST_DIR = os.path.join(images_dest, 'test')
  os.makedirs(TRAIN_DIR, exist_ok=True)
  os.makedirs(TEST_DIR, exist_ok=True)

  for dir in dirs:
    # Get all filenames for this dir, filtered by filetype
    filenames = os.listdir(os.path.join(images_origin, dir))
    filenames = [os.path.join(images_origin, dir, f) for f in filenames if (
        f.endswith('.png') or f.endswith('.jpg') or f.endswith('.jpeg') or f.endswith('.bmp'))]
    # Shuffle the files, deterministically
    filenames.sort()
    random.seed(42)
    random.shuffle(filenames)
    # Divide them into train/test dirs
    os.makedirs(os.path.join(TEST_DIR, dir), exist_ok=True)
    os.makedirs(os.path.join(TRAIN_DIR, dir), exist_ok=True)
    test_count = int(len(filenames) * test_split)
    for i, file in enumerate(filenames):
      if i < test_count:
        destination = os.path.join(TEST_DIR, dir, os.path.split(file)[1])
      else:
        destination = os.path.join(TRAIN_DIR, dir, os.path.split(file)[1])
      shutil.copyfile(file, destination)
    print(f'Copied {test_count} of {len(filenames)} images from class "{dir}" into test.')
  print(f'Your split dataset is in "{images_dest}"')

In [ ]:

if use_custom_dataset:
  # ATTENTION:
  # You must edit these two lines to match your archive and images folder name:
  # !tar -xf YOUR_DATASET_ARCHIVE_NAME.tar
  !unzip -q YOUR_DATASET_ARCHIVE_NAME.zip
  dataset_in = 'YOUR_DATASET_DIR_NAME'

  # You can leave the rest alone:
  if not os.path.isdir(dataset_in):
    raise Exception("dataset_in is not a valid directory")
  if dataset_is_split:
    IMAGES_ROOT = dataset_in
  else:
    dataset_out = 'split_' + dataset_in
    split_into_train_test(dataset_in, dataset_out, test_split=0.2)
    IMAGES_ROOT = dataset_out

Note: If you use `split_into_train_test()` to split the dataset, it expects all images to be PNG, JPEG, or BMP, and it ignores other file types.
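The per-class logic of `split_into_train_test()` can be isolated into a small pure function, which makes its deterministic behavior easy to check: filter by extension, sort, shuffle with a fixed seed, and send the first `int(len(files) * test_split)` files to the test set. `split_filenames` is a hypothetical name. It uses a local `random.Random(42)` instead of reseeding the global generator, which yields the same shuffle order without touching global state.

```python
import random

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp')


def split_filenames(filenames, test_split, seed=42):
  """Deterministically splits one class's files into (train, test) lists.

  Follows split_into_train_test(): keep image extensions only, sort, shuffle
  with a fixed seed, then put the first int(len * test_split) files in test.
  """
  images = sorted(f for f in filenames if f.endswith(IMAGE_EXTENSIONS))
  rng = random.Random(seed)  # Local RNG, so the global random state is untouched.
  rng.shuffle(images)
  test_count = int(len(images) * test_split)
  return images[test_count:], images[:test_count]
```

Because `int()` truncates, a class with only 4 images and `test_split=0.2` contributes no test images at all (`int(0.8) == 0`), so very small classes may need more data or a larger split fraction.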

Download the yoga dataset

In [ ]:

if not is_skip_step_1 and not use_custom_dataset:
  !wget -O yoga_poses.zip http://download.tensorflow.org/data/pose_classification/yoga_poses.zip
  !unzip -q yoga_poses.zip -d yoga_cg
  IMAGES_ROOT = "yoga_cg"

Preprocess the `TRAIN` dataset

In [ ]:

if not is_skip_step_1:
  images_in_train_folder = os.path.join(IMAGES_ROOT, 'train')
  images_out_train_folder = 'poses_images_out_train'
  csvs_out_train_path = 'train_data.csv'

  preprocessor = MoveNetPreprocessor(
      images_in_folder=images_in_train_folder,
      images_out_folder=images_out_train_folder,
      csvs_out_path=csvs_out_train_path,
  )

  preprocessor.process(per_pose_class_limit=None)

Preprocess the `TEST` dataset

In [ ]:

if not is_skip_step_1:
  images_in_test_folder = os.path.join(IMAGES_ROOT, 'test')
  images_out_test_folder = 'poses_images_out_test'
  csvs_out_test_path = 'test_data.csv'

  preprocessor = MoveNetPreprocessor(
      images_in_folder=images_in_test_folder,
      images_out_folder=images_out_test_folder,
      csvs_out_path=csvs_out_test_path,
  )

  preprocessor.process(per_pose_class_limit=None)

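Part 2: Train a pose classification model that takes the landmark coordinates as input and outputs the predicted labels

You'll build a TensorFlow model that takes the landmark coordinates and predicts the pose class performed by the person in the input image. The model consists of two submodels:

Submodel 1 calculates a pose embedding (a.k.a. feature vector) from the detected landmark coordinates.
Submodel 2 feeds the pose embedding through Dense layers to predict the pose class.

You'll then train the model on the dataset that was preprocessed in Part 1.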

(Optional) Download the preprocessed dataset if you didn't run Part 1

In [ ]:

# Download the preprocessed CSV files which are the same as the output of step 1
if is_skip_step_1:
  !wget -O train_data.csv http://download.tensorflow.org/data/pose_classification/yoga_train_data.csv
  !wget -O test_data.csv http://download.tensorflow.org/data/pose_classification/yoga_test_data.csv

  csvs_out_train_path = 'train_data.csv'
  csvs_out_test_path = 'test_data.csv'

Load the preprocessed CSVs into `TRAIN` and `TEST` datasets.

In [ ]:

def load_pose_landmarks(csv_path):
  """Loads a CSV created by MoveNetPreprocessor.
  
  Returns:
    X: Detected landmark coordinates and scores of shape (N, 17 * 3)
    y: Ground truth labels of shape (N, label_count)
    classes: The list of all class names found in the dataset
    dataframe: The CSV loaded as a Pandas dataframe features (X) and ground
      truth labels (y) to use later to train a pose classification model.
  """

  # Load the CSV file
  dataframe = pd.read_csv(csv_path)
  df_to_process = dataframe.copy()

  # Drop the file_name columns as you don't need it during training.
  df_to_process.drop(columns=['file_name'], inplace=True)

  # Extract the list of class names
  classes = df_to_process.pop('class_name').unique()

  # Extract the labels
  y = df_to_process.pop('class_no')

  # Convert the input features and labels into the correct format for training.
  X = df_to_process.astype('float64')
  y = keras.utils.to_categorical(y)

  return X, y, classes, dataframe
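`keras.utils.to_categorical()` turns each integer `class_no` into a one-hot row. A pure-Python sketch of the same idea shows one detail worth knowing: when `num_classes` is not given, the number of columns is inferred as `max(labels) + 1`, so a set of labels that happens to lack the highest class produces narrower rows. `to_one_hot` and `argmax` are illustrative helpers, not Keras APIs.

```python
def to_one_hot(labels, num_classes=None):
  """Pure-Python sketch of one-hot encoding integer class labels.

  Like keras.utils.to_categorical, num_classes defaults to max(labels) + 1.
  """
  if num_classes is None:
    num_classes = max(labels) + 1
  return [[1.0 if i == label else 0.0 for i in range(num_classes)]
          for label in labels]


def argmax(row):
  """Index of the largest value, which undoes the one-hot encoding."""
  return max(range(len(row)), key=row.__getitem__)
```

Passing `num_classes` explicitly whenever the labels come from predictions or a subset of the data keeps the encoded shape aligned with `y`.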

Load the original `TRAIN` dataset and split it into `TRAIN` (85% of the data) and `VALIDATE` (the remaining 15%).

In [ ]:

# Load the train data
X, y, class_names, _ = load_pose_landmarks(csvs_out_train_path)

# Split training data (X, y) into (X_train, y_train) and (X_val, y_val)
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.15)

In [ ]:

# Load the test data
X_test, y_test, _, df_test = load_pose_landmarks(csvs_out_test_path)

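Define functions to convert the pose landmarks to a pose embedding (a.k.a. feature vector) for pose classification

Next, convert the landmark coordinates to a feature vector by:

Moving the pose center (the midpoint of the hips) to the origin.
Scaling the pose so that the pose size becomes 1.
Flattening these coordinates into a feature vector.

Then use this feature vector to train a neural-network-based pose classifier.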
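For a single pose, the three steps reduce to a few lines of arithmetic. The sketch below is a per-sample, pure-Python illustration of what `normalize_pose_landmarks()` and `landmarks_to_embedding()` are meant to compute, following the definition in `get_pose_size()`'s docstring; it is not a replacement for the TensorFlow version used inside the model. It assumes MoveNet's COCO keypoint indices (shoulders 5 and 6, hips 11 and 12) and the same `torso_size_multiplier` of 2.5.

```python
import math

# Assumption: MoveNet's COCO keypoint indices, matching the BodyPart enum.
LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP, RIGHT_HIP = 5, 6, 11, 12


def midpoint(a, b):
  return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def pose_embedding(landmarks, torso_size_multiplier=2.5):
  """Normalizes one pose of 17 (x, y) points and flattens it to 34 floats."""
  # 1. Move the pose center (midpoint of the hips) to the origin.
  cx, cy = midpoint(landmarks[LEFT_HIP], landmarks[RIGHT_HIP])
  centered = [(x - cx, y - cy) for x, y in landmarks]

  # 2. Pose size = max(torso size * multiplier, farthest landmark distance).
  shoulders = midpoint(centered[LEFT_SHOULDER], centered[RIGHT_SHOULDER])
  torso_size = math.hypot(*shoulders)  # The hips center is now (0, 0).
  max_dist = max(math.hypot(x, y) for x, y in centered)
  pose_size = max(torso_size * torso_size_multiplier, max_dist)

  # 3. Scale to pose size 1 and flatten into a feature vector.
  return [v / pose_size for point in centered for v in point]
```

Because both the offset and the scale are removed, translating or uniformly scaling the whole pose leaves the embedding unchanged; that is what lets the classifier ignore where the person stands in the frame and how large they appear.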

In [ ]:

def get_center_point(landmarks, left_bodypart, right_bodypart):
  """Calculates the center point of the two given landmarks."""

  left = tf.gather(landmarks, left_bodypart.value, axis=1)
  right = tf.gather(landmarks, right_bodypart.value, axis=1)
  center = left * 0.5 + right * 0.5
  return center


def get_pose_size(landmarks, torso_size_multiplier=2.5):
  """Calculates the pose size of each pose in the batch.

  It is the maximum of two values:
    * Torso size multiplied by `torso_size_multiplier`
    * Maximum distance from pose center to any pose landmark

  Args:
    landmarks: Tensor of shape (batch, 17, 2) with the landmark coordinates.
    torso_size_multiplier: Factor applied to the torso size.

  Returns:
    A tensor of shape (batch, 1, 1), so it broadcasts against `landmarks`.
  """
  # Hips center, shape (batch, 2)
  hips_center = get_center_point(landmarks, BodyPart.LEFT_HIP,
                                 BodyPart.RIGHT_HIP)

  # Shoulders center, shape (batch, 2)
  shoulders_center = get_center_point(landmarks, BodyPart.LEFT_SHOULDER,
                                      BodyPart.RIGHT_SHOULDER)

  # Torso size as the minimum body size, computed per pose: shape (batch,)
  torso_size = tf.linalg.norm(shoulders_center - hips_center, axis=-1)

  # Offset of every landmark from the pose center (hips center):
  # shape (batch, 17, 2)
  d = landmarks - tf.expand_dims(hips_center, axis=1)
  # Max distance from the pose center to any landmark: shape (batch,)
  max_dist = tf.reduce_max(tf.linalg.norm(d, axis=-1), axis=-1)

  # Normalize scale
  pose_size = tf.maximum(torso_size * torso_size_multiplier, max_dist)

  # Add two axes so the result divides landmarks of shape (batch, 17, 2)
  return tf.reshape(pose_size, [-1, 1, 1])


def normalize_pose_landmarks(landmarks):
  """Normalizes the landmarks translation by moving the pose center to (0,0) and
  scaling it to a constant pose size.
  """
  # Move landmarks so that the pose center becomes (0,0)
  pose_center = get_center_point(landmarks, BodyPart.LEFT_HIP, 
                                 BodyPart.RIGHT_HIP)
  pose_center = tf.expand_dims(pose_center, axis=1)
  # Broadcast the pose center to the same size as the landmark vector to perform
  # subtraction
  pose_center = tf.broadcast_to(pose_center, 
                                [tf.size(landmarks) // (17*2), 17, 2])
  landmarks = landmarks - pose_center

  # Scale the landmarks to a constant pose size
  pose_size = get_pose_size(landmarks)
  landmarks /= pose_size

  return landmarks


def landmarks_to_embedding(landmarks_and_scores):
  """Converts the input landmarks into a pose embedding."""
  # Reshape the flat input into a matrix with shape=(17, 3)
  reshaped_inputs = keras.layers.Reshape((17, 3))(landmarks_and_scores)

  # Normalize landmarks 2D
  landmarks = normalize_pose_landmarks(reshaped_inputs[:, :, :2])

  # Flatten the normalized landmark coordinates into a vector
  embedding = keras.layers.Flatten()(landmarks)

  return embedding

Define a Keras model for pose classification

The Keras model takes the detected pose landmarks, then calculates the pose embedding and predicts the pose class.
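Since the embedding has 17 × 2 = 34 values, the trainable parameter count of the classifier follows from simple arithmetic: each `Dense` layer has `inputs × units` weights plus `units` biases, while `Reshape`, `Flatten`, `Dropout`, and the normalization ops add none. The total that `model.summary()` reports should match. The sketch assumes 5 classes, matching the 5 yoga poses in the default dataset; `dense_params` and `classifier_params` are illustrative helpers.

```python
def dense_params(inputs, units):
  """Trainable parameters of a Dense layer: a weight matrix plus a bias vector."""
  return inputs * units + units


def classifier_params(num_classes, embedding_size=17 * 2):
  """Counts parameters of Dense(128) -> Dense(64) -> Dense(num_classes)."""
  return (dense_params(embedding_size, 128)
          + dense_params(128, 64)
          + dense_params(64, num_classes))


print(classifier_params(num_classes=5))  # 13061
```

With 4,480 + 8,256 + 325 parameters the model is tiny, which is one reason it trains in minutes and stays small after TFLite conversion.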

In [ ]:

# Define the model
inputs = tf.keras.Input(shape=(51,))  # 17 keypoints x (x, y, score)
embedding = landmarks_to_embedding(inputs)

layer = keras.layers.Dense(128, activation=tf.nn.relu6)(embedding)
layer = keras.layers.Dropout(0.5)(layer)
layer = keras.layers.Dense(64, activation=tf.nn.relu6)(layer)
layer = keras.layers.Dropout(0.5)(layer)
outputs = keras.layers.Dense(len(class_names), activation="softmax")(layer)

model = keras.Model(inputs, outputs)
model.summary()

In [ ]:

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Add a checkpoint callback to store the checkpoint that has the highest
# validation accuracy.
checkpoint_path = "weights.best.hdf5"
checkpoint = keras.callbacks.ModelCheckpoint(checkpoint_path,
                                             monitor='val_accuracy',
                                             verbose=1,
                                             save_best_only=True,
                                             mode='max')
earlystopping = keras.callbacks.EarlyStopping(monitor='val_accuracy', 
                                              patience=20)

# Start training
history = model.fit(X_train, y_train,
                    epochs=200,
                    batch_size=16,
                    validation_data=(X_val, y_val),
                    callbacks=[checkpoint, earlystopping])

In [ ]:

# Visualize the training history to see whether you're overfitting.
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['TRAIN', 'VAL'], loc='lower right')
plt.show()

In [ ]:

# Evaluate the model using the TEST dataset
loss, accuracy = model.evaluate(X_test, y_test)

Draw the confusion matrix to better understand the model performance

In [ ]:

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
  """Plots the confusion matrix."""
  if normalize:
    cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print("Normalized confusion matrix")
  else:
    print('Confusion matrix, without normalization')

  plt.imshow(cm, interpolation='nearest', cmap=cmap)
  plt.title(title)
  plt.colorbar()
  tick_marks = np.arange(len(classes))
  plt.xticks(tick_marks, classes, rotation=55)
  plt.yticks(tick_marks, classes)
  fmt = '.2f' if normalize else 'd'
  thresh = cm.max() / 2.
  for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, format(cm[i, j], fmt),
             horizontalalignment="center",
             color="white" if cm[i, j] > thresh else "black")

  plt.ylabel('True label')
  plt.xlabel('Predicted label')
  plt.tight_layout()

# Classify pose in the TEST dataset using the trained model
y_pred = model.predict(X_test)

# Convert the prediction result to class name
y_pred_label = [class_names[i] for i in np.argmax(y_pred, axis=1)]
y_true_label = [class_names[i] for i in np.argmax(y_test, axis=1)]

# Plot the confusion matrix
cm = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1))
plot_confusion_matrix(cm,
                      class_names,
                      title='Confusion Matrix of Pose Classification Model')

# Print the classification report
print('\nClassification Report:\n', classification_report(y_true_label,
                                                          y_pred_label))
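Each row of the confusion matrix counts the test samples of one true class, split by predicted class, and `normalize=True` divides every row by its total so the diagonal becomes per-class recall. This pure-Python sketch follows the same convention (rows are true labels, as in scikit-learn's `confusion_matrix`); `confusion_counts` and `normalize_rows` are illustrative names.

```python
def confusion_counts(y_true, y_pred, num_classes):
  """Rows are true classes, columns are predicted classes."""
  cm = [[0] * num_classes for _ in range(num_classes)]
  for true_class, predicted_class in zip(y_true, y_pred):
    cm[true_class][predicted_class] += 1
  return cm


def normalize_rows(cm):
  """Divides each row by its total; rows with no samples become all zeros."""
  normalized = []
  for row in cm:
    total = sum(row)
    normalized.append([count / total if total else 0.0 for count in row])
  return normalized
```

Unlike the NumPy expression in `plot_confusion_matrix()`, which yields NaN for a class with no test samples, `normalize_rows` returns zeros for such a row.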

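(Optional) Investigate incorrect predictions

You can look at the poses from the `TEST` dataset that were incorrectly predicted to see whether the model accuracy can be improved.

Note: This only works if you have run step 1, because the cell displays the pose image files that step 1 saved to the local file system.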
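The selection and layout logic of the next cell does not depend on TensorFlow, so it can be sketched on plain label lists. `misclassified_indices` keeps the first `limit` positions where the predicted and true class names differ, and `grid_rows` uses ceiling division so that, for example, 6 images at 3 per row need exactly 2 rows. Both helpers are hypothetical.

```python
def misclassified_indices(y_true_label, y_pred_label, limit=30):
  """Indices where the predicted class name differs from the true one."""
  wrong = [i for i, (true, pred) in enumerate(zip(y_true_label, y_pred_label))
           if true != pred]
  return wrong[:limit]


def grid_rows(num_images, per_row=3):
  """Rows needed to show num_images in a grid (ceiling division, at least 1)."""
  return max(1, -(-num_images // per_row))
```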

In [ ]:

# This cell needs the images written in step 1, so stop if step 1 was skipped.
if is_skip_step_1:
  raise RuntimeError('You must have run step 1 to run this cell.')

IMAGE_PER_ROW = 3
MAX_NO_OF_IMAGE_TO_PLOT = 30

# Extract the list of incorrectly predicted poses
false_predict = [id_in_df for id_in_df in range(len(y_test))
                 if y_pred_label[id_in_df] != y_true_label[id_in_df]]
if len(false_predict) > MAX_NO_OF_IMAGE_TO_PLOT:
  false_predict = false_predict[:MAX_NO_OF_IMAGE_TO_PLOT]

# Plot the incorrectly predicted images; ceiling division avoids an empty
# trailing row when the count is a multiple of IMAGE_PER_ROW.
row_count = max(1, -(-len(false_predict) // IMAGE_PER_ROW))
fig = plt.figure(figsize=(10 * IMAGE_PER_ROW, 10 * row_count))
for i, id_in_df in enumerate(false_predict):
  ax = fig.add_subplot(row_count, IMAGE_PER_ROW, i + 1)
  image_path = os.path.join(images_out_test_folder,
                            df_test.iloc[id_in_df]['file_name'])

  image = cv2.imread(image_path)
  plt.title("Predict: %s; Actual: %s"
            % (y_pred_label[id_in_df], y_true_label[id_in_df]))
  plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()

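Part 3: Convert the pose classification model to TensorFlow Lite

You'll convert the Keras pose classification model to the TensorFlow Lite format so that you can deploy it to mobile apps, web browsers, and edge devices. When converting the model, you'll apply dynamic range quantization, which reduces the size of the pose classification TensorFlow Lite model by about 4 times at the cost of a slight drop in accuracy.

Note: TensorFlow Lite supports multiple quantization schemes. See the documentation to learn more.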
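Dynamic range quantization stores the weights as 8-bit integers instead of 32-bit floats, which is where the roughly 4x size reduction comes from (1 byte instead of 4 per weight). The sketch below illustrates a simplified symmetric, per-tensor scheme on a plain list: choose a scale so that the largest absolute weight maps to 127, round, and multiply back by the scale when the weights are used. It is not TFLite's exact implementation, which can, for example, use per-channel scales; `quantize_int8` and `dequantize` are illustrative names.

```python
def quantize_int8(weights):
  """Simplified symmetric per-tensor int8 quantization (illustration only)."""
  max_abs = max(abs(w) for w in weights) or 1.0  # Avoid a zero scale.
  scale = max_abs / 127
  # Round to the nearest integer step and clamp to the int8 range used here.
  q = [max(-127, min(127, round(w / scale))) for w in weights]
  return q, scale


def dequantize(q, scale):
  """Maps the stored integers back to approximate float weights."""
  return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
```

The rounding error per weight is at most half the scale, which is why the accuracy loss is usually small for a model like this one.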

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

print('Model size: %dKB' % (len(tflite_model) / 1024))

with open('pose_classifier.tflite', 'wb') as f:
  f.write(tflite_model)

Then write the label file, which contains the mapping from the class indexes to the human-readable class names.

In [ ]:

with open('pose_labels.txt', 'w') as f:
  f.write('\n'.join(class_names))
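On the device, the label file is read back so that output index `i` of the classifier maps to line `i` of `pose_labels.txt`. A minimal sketch of that lookup, assuming the classifier's softmax output is available as a plain list of probabilities; `load_labels` and `top_label` are hypothetical helpers, not functions from the sample apps.

```python
def load_labels(path):
  """Reads pose_labels.txt: line i holds the class name for output index i."""
  with open(path) as f:
    return [line.strip() for line in f if line.strip()]


def top_label(probabilities, labels):
  """Returns (class_name, probability) for the most likely class."""
  best = max(range(len(probabilities)), key=probabilities.__getitem__)
  return labels[best], probabilities[best]
```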

Because you applied quantization to reduce the model size, evaluate the quantized TFLite model to check whether the accuracy drop is acceptable.

In [ ]:

def evaluate_model(interpreter, X, y_true):
  """Evaluates the given TFLite model and return its accuracy."""
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on all given poses.
  y_pred = []
  for i in range(len(y_true)):
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = X[i: i + 1].astype('float32')
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the class with highest
    # probability.
    output = interpreter.tensor(output_index)
    predicted_label = np.argmax(output()[0])
    y_pred.append(predicted_label)

  # Compare prediction results with ground truth labels to calculate accuracy.
  y_pred = keras.utils.to_categorical(y_pred)
  return accuracy_score(y_true, y_pred)

# Evaluate the accuracy of the converted TFLite model
classifier_interpreter = tf.lite.Interpreter(model_content=tflite_model)
classifier_interpreter.allocate_tensors()
print('Accuracy of TFLite model: %s' %
      evaluate_model(classifier_interpreter, X_test, y_test))

Now you can download the TFLite model (`pose_classifier.tflite`) and the label file (`pose_labels.txt`) to classify custom poses. See the Android and Python/Raspberry Pi sample apps for an end-to-end example of how to use the TFLite pose classification model.

In [ ]:

!zip pose_classifier.zip pose_labels.txt pose_classifier.tflite

In [ ]:

# Download the zip archive if running on Colab.
try:
  from google.colab import files
  files.download('pose_classifier.zip')
except ImportError:
  # Not running on Colab; the zip file stays in the working directory.
  pass
