GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ja/model_optimization/guide/quantization/training_example.ipynb
²⁵¹¹⁸ views

Kernel: Python 3

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Keras の例による量子化認識トレーニング

概要

量子化認識トレーニングのエンドツーエンドの例へようこそ。

その他のページ

量子化認識トレーニングの紹介、および認識トレーニングを使用すべきかどうかの判定（サポート情報も含む）については、概要ページをご覧ください。

ユースケースに合った API を素早く特定するには（8 ビットのモデルの完全量子化を超えるユースケース）、総合ガイドをご覧ください。

要約

このチュートリアルでは、次について説明しています。

MNIST の tf.keras モデルを最初からトレーニングする。
量子化認識トレーニング API を適用してモデルをファインチューニングし、精度を確認して量子化認識モデルをエクスポートする。
このモデルを使用して、TFLite バックエンドのために実際に量子化されたモデルを作成する。
TFLite および 1/4 のモデルの精度の永続性を確認する。モバイルでのレイテンシーのメリットを確認するには、TFLite アプリリポジトリ内の TFLite の例を試してみてください。

セットアップ

In [ ]:

! pip install -q tensorflow
! pip install -q tensorflow-model-optimization

In [ ]:

import tempfile
import os

import tensorflow as tf

from tensorflow import keras

量子化認識トレーニングを使用せずに、MNIST のモデルをトレーニングする

In [ ]:

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_split=0.1,
)

量子化認識トレーニングを使用して、事前トレーニング済みモデルをクローンおよびファインチューニングする

モデルを定義する

量子化認識トレーニングをモデル全体に適用し、これをモデルの要約で確認します。すべてのレイヤーにプレフィックス "quant" が付いているはずです。

結果のモデルは量子化認識モデルですが、量子化はされていないので注意してください（例えば、重みは int8 ではなく float32 です）。次のセクションでは、量子化認識モデルから量子化モデルを作成する方法を示します。

総合ガイドでは、モデルの精度を改善するために、一部のレイヤーを量子化する方法をご覧いただけます。

In [ ]:

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

q_aware_model.summary()

モデルをベースラインに対してトレーニングおよび評価する

モデルを 1 エポックだけトレーニングした後のファインチューニングを実証するために、トレーニングデータのサブセットに対して、量子化認識トレーニングを使用してファインチューニングを行います。

In [ ]:

train_images_subset = train_images[0:1000] # out of 60000
train_labels_subset = train_labels[0:1000]

q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=1, validation_split=0.1)

この例では、ベースラインと比較し、量子化認識トレーニング後のテスト精度の損失は、最小限あるいはゼロです。

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

_, q_aware_model_accuracy = q_aware_model.evaluate(
   test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)

TFLite バックエンドの量子化モデルを作成する

この後に、重み int8 と活性化関数 uint8 を持つ、実際に量子化されたモデルが出来上がります。

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_tflite_model = converter.convert()

TF から TFLite への精度の永続性を確認する

テストデータセットで TFLite モデルを評価するヘルパー関数を定義します。

In [ ]:

import numpy as np

def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print('Evaluated on {n} results so far.'.format(n=i))
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

量子化されたモデルを評価し、TensorFlow の精度が TFLite バックエンドに持続されていることを確認します。

In [ ]:

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
interpreter.allocate_tensors()

test_accuracy = evaluate_model(interpreter)

print('Quant TFLite test_accuracy:', test_accuracy)
print('Quant TF test accuracy:', q_aware_model_accuracy)

量子化でモデルが 1/4 になることを確認する

浮動小数点数の TFLite モデルを作成して、量子化された TFLite モデルが 1/4 になっていることを確認します。

In [ ]:

# Create float TFLite model.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()

# Measure sizes of models.
_, float_file = tempfile.mkstemp('.tflite')
_, quant_file = tempfile.mkstemp('.tflite')

with open(quant_file, 'wb') as f:
  f.write(quantized_tflite_model)

with open(float_file, 'wb') as f:
  f.write(float_tflite_model)

print("Float model in Mb:", os.path.getsize(float_file) / float(2**20))
print("Quantized model in Mb:", os.path.getsize(quant_file) / float(2**20))

結論

このチュートリアルでは、TensorFlow Model Optimization Toolkit API を使用して量子化認識モデルを作成し、TFLite バックエンドの量子化モデルを作成する方法を紹介しました。

MNIST のモデルでは、精度の違いを最小限に抑えながらモデルサイズを 1/4 に圧縮できることを示しましたた。モバイルでのレイテンシーのメリットを確認するには、TFLite アプリリポジトリ内の TFLite の例を試してみてください。

この新しい機能をぜひお試しください。リソースが制限される環境でのデプロイにおいて、特に重要となります。