GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ja/model_optimization/guide/combine/pqat_example.ipynb

Kernel: Python 3

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


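Pruning preserving quantization aware training (PQAT) Keras example

Overview

This is an end-to-end example showing the usage of the pruning preserving quantization aware training (PQAT) API, part of the TensorFlow Model Optimization Toolkit's collaborative optimization pipeline.

Other pages

For an introduction to the pipeline and other available techniques, see the collaborative optimization overview page.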

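Contents

In the tutorial, you will:

Train a tf.keras model for the MNIST dataset from scratch.
Fine-tune the model with pruning, using the sparsity API, and check the accuracy.
Apply QAT and observe the loss of sparsity.
Apply PQAT and confirm that the sparsity applied earlier is preserved.
Generate a TFLite model and observe the effects of applying PQAT on it.
Compare the achieved PQAT model accuracy with a model quantized using post-training quantization.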

Setup

You can run this Jupyter notebook in a local virtualenv or in Colab. For details on setting up dependencies, refer to the installation guide.

In [ ]:

! pip install -q tensorflow-model-optimization

In [ ]:

import tensorflow as tf

import numpy as np
import tempfile
import zipfile
import os

Train a tf.keras model for MNIST without pruning

In [ ]:

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images  = test_images / 255.0

model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3),
                         activation=tf.nn.relu),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    validation_split=0.1,
    epochs=10
)
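
As a sanity check on the architecture above, the layer output sizes and parameter counts can be worked out by hand. The sketch below assumes Keras defaults that the cell does not state explicitly (Conv2D with 'valid' padding and stride 1, MaxPooling2D with a stride equal to its pool size); the helper names are hypothetical and not part of the tutorial.

```python
# Hand computation of shapes and parameter counts for the small MNIST model.
# Assumptions (Keras defaults, not stated in the tutorial cell):
#   Conv2D: padding='valid', strides=1; MaxPooling2D: strides == pool_size.

def conv2d_valid_output(size, kernel):
    """Spatial output size of a stride-1 'valid' convolution."""
    return size - kernel + 1


def model_parameter_summary():
    height = 28
    in_channels, filters, kernel, classes = 1, 12, 3, 10

    conv_hw = conv2d_valid_output(height, kernel)
    # One kernel per (input channel, filter) pair plus one bias per filter.
    conv_params = kernel * kernel * in_channels * filters + filters

    pool_hw = conv_hw // 2
    flat_units = pool_hw * pool_hw * filters

    # Fully connected weights plus one bias per class.
    dense_params = flat_units * classes + classes

    return {
        "conv_output": (conv_hw, conv_hw, filters),
        "pool_output": (pool_hw, pool_hw, filters),
        "flat_units": flat_units,
        "conv_params": conv_params,
        "dense_params": dense_params,
        "total_params": conv_params + dense_params,
    }


summary = model_parameter_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions almost all parameters sit in the Dense layer, so that layer dominates whatever sparsity and compression gains are measured later.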

Evaluate the baseline model and save it for later usage

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
print('Saving model to: ', keras_file)
tf.keras.models.save_model(model, keras_file, include_optimizer=False)

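Prune and fine-tune the model to 50% sparsity

Apply the prune_low_magnitude() API to prune the whole pre-trained model. Zipping the result then shows how effectively pruning reduces the model size while accuracy is maintained. For how best to use the API to achieve the best compression rate while maintaining your target accuracy, refer to the pruning comprehensive guide.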
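
The idea behind magnitude-based pruning can be illustrated without TensorFlow. The following is a minimal sketch, not the tfmot implementation (which works on tensors with masks and thresholds inside wrapper layers); `prune_by_magnitude` is a hypothetical helper that zeroes the requested fraction of weights with the smallest absolute values.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0."""
    num_to_prune = int(round(len(weights) * sparsity))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)
```

With a target sparsity of 0.5, the four weights closest to zero are removed and the four largest-magnitude weights are left untouched.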

Define the model and apply the sparsity API

The model needs to be pre-trained before using the sparsity API.

In [ ]:

import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

pruning_params = {
      'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0, frequency=100)
  }

callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep()
]

pruned_model = prune_low_magnitude(model, **pruning_params)

# Use smaller learning rate for fine-tuning
opt = tf.keras.optimizers.Adam(learning_rate=1e-5)

pruned_model.compile(
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  optimizer=opt,
  metrics=['accuracy'])

pruned_model.summary()
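
To get a feel for what begin_step=0 and frequency=100 mean in the ConstantSparsity schedule above, the following rough sketch counts how often the pruning mask would be recomputed during fine-tuning. It rests on several assumptions: the update rule is modeled as "inside the pruning range and (step - begin_step) divisible by frequency", steps are counted from 0, MNIST's 60,000 training images are reduced to 54,000 by validation_split=0.1, and the Keras default batch size of 32 is used. The function names are hypothetical, and the exact step bookkeeping inside tfmot may differ by one.

```python
import math


def is_mask_update_step(step, begin_step=0, frequency=100, end_step=-1):
    """Assumed rule: update when in range and aligned with `frequency`."""
    # end_step < 0 is treated as "prune until training ends".
    in_range = step >= begin_step and (end_step < 0 or step <= end_step)
    return in_range and (step - begin_step) % frequency == 0


def count_mask_updates(num_examples, batch_size, epochs, **schedule):
    """Count mask updates over a whole training run (steps numbered from 0)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return sum(is_mask_update_step(s, **schedule) for s in range(total_steps))


# 54,000 training examples, batch size 32 (assumed default), 3 epochs.
updates = count_mask_updates(54_000, 32, 3)
print("Mask updates during fine-tuning:", updates)
```

A smaller frequency value recomputes the mask more often, at the cost of extra work per training run.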

Fine-tune the model and evaluate the accuracy against baseline

Fine-tune the model with pruning for 3 epochs.

In [ ]:

# Fine-tune model
pruned_model.fit(
  train_images,
  train_labels,
  epochs=3,
  validation_split=0.1,
  callbacks=callbacks)

Define a helper function to calculate and print the sparsity of the model.

In [ ]:

def print_model_weights_sparsity(model):

    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            # ignore auxiliary quantization weights
            if "quantize_layer" in weight.name:
                continue
            weight_size = weight.numpy().size
            zero_num = np.count_nonzero(weight == 0)
            print(
                f"{weight.name}: {zero_num/weight_size:.2%} sparsity ",
                f"({zero_num}/{weight_size})",
            )

Check that the model was pruned correctly. The pruning wrappers need to be stripped first.

In [ ]:

stripped_pruned_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

print_model_weights_sparsity(stripped_pruned_model)

For this example, there is minimal loss in test accuracy after pruning, compared to the baseline.

In [ ]:

_, pruned_model_accuracy = pruned_model.evaluate(
  test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Pruned test accuracy:', pruned_model_accuracy)

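Apply QAT and PQAT and check the effect on model sparsity in both cases

Next, apply both QAT and pruning-preserving QAT (PQAT) to the pruned model and observe that PQAT preserves the sparsity of the pruned model. Note that the pruning wrappers were stripped from the model with tfmot.sparsity.keras.strip_pruning before applying the PQAT API.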

In [ ]:

# QAT
qat_model = tfmot.quantization.keras.quantize_model(stripped_pruned_model)

qat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train qat model:')
qat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

# PQAT
quant_aware_annotate_model = tfmot.quantization.keras.quantize_annotate_model(
              stripped_pruned_model)
pqat_model = tfmot.quantization.keras.quantize_apply(
              quant_aware_annotate_model,
              tfmot.experimental.combine.Default8BitPrunePreserveQuantizeScheme())

pqat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train pqat model:')
pqat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

In [ ]:

print("QAT Model sparsity:")
print_model_weights_sparsity(qat_model)
print("PQAT Model sparsity:")
print_model_weights_sparsity(pqat_model)
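
A toy illustration may help explain why ordinary fine-tuning can destroy sparsity while a sparsity-preserving scheme keeps it. The sketch below is not how Default8BitPrunePreserveQuantizeScheme is implemented; it only assumes that a preserving scheme keeps a mask of the weights that were zero before training and reapplies it after each update. All names are hypothetical.

```python
def sgd_step(weights, grads, lr=0.1):
    """Plain gradient step: every weight, including zeros, is updated."""
    return [w - lr * g for w, g in zip(weights, grads)]


def sparsity_preserving_step(weights, grads, lr=0.1):
    """Gradient step that keeps previously pruned (zero) weights at zero."""
    mask = [0.0 if w == 0.0 else 1.0 for w in weights]
    updated = sgd_step(weights, grads, lr)
    return [u * m for u, m in zip(updated, mask)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


pruned_weights = [0.8, 0.0, 0.0, -0.9]
grads = [0.1, -0.2, 0.3, 0.05]

plain = sgd_step(pruned_weights, grads)
preserved = sparsity_preserving_step(pruned_weights, grads)

print("Sparsity after plain step:     ", sparsity(plain))
print("Sparsity after preserving step:", sparsity(preserved))
```

In the plain step the pruned weights receive nonzero gradients and drift away from zero, which mirrors the sparsity loss that the QAT model is expected to show.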

See compression benefits of the PQAT model

Define a helper function to get the size of the zipped model file.

In [ ]:

def get_gzipped_model_size(file):
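  # Returns the size of the zipped (DEFLATE-compressed) model file in kilobytes.

  fd, zipped_file = tempfile.mkstemp('.zip')
  os.close(fd)  # mkstemp returns an open file descriptor; close it to avoid a leak.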
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(file)

  return os.path.getsize(zipped_file)/1000

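Since this is a small model, the difference between the two models is not very noticeable. Applying pruning and PQAT to a larger production model would yield more significant compression.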

In [ ]:

# QAT model
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = converter.convert()
qat_model_file = 'qat_model.tflite'
# Save the model.
with open(qat_model_file, 'wb') as f:
    f.write(qat_tflite_model)
    
# PQAT model
converter = tf.lite.TFLiteConverter.from_keras_model(pqat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
pqat_tflite_model = converter.convert()
pqat_model_file = 'pqat_model.tflite'
# Save the model.
with open(pqat_model_file, 'wb') as f:
    f.write(pqat_tflite_model)
    
print("QAT model size: ", get_gzipped_model_size(qat_model_file), ' KB')
print("PQAT model size: ", get_gzipped_model_size(pqat_model_file), ' KB')
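
The reason sparsity shows up as a smaller zipped file is that long runs of identical zero bytes are easy for DEFLATE to encode. The following self-contained sketch uses synthetic random values rather than real model weights, and float32 storage rather than the int8 storage of a quantized TFLite file, so it illustrates only the trend. `compressed_size` is a hypothetical helper.

```python
import random
import struct
import zlib


def compressed_size(values):
    """Size in bytes of the float32-packed values after DEFLATE compression."""
    raw = struct.pack(f"{len(values)}f", *values)
    return len(zlib.compress(raw))


rng = random.Random(0)  # Fixed seed so the comparison is reproducible.
dense = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
# Zero out every second value to emulate 50% sparsity.
sparse = [v if i % 2 == 0 else 0.0 for i, v in enumerate(dense)]

print("Dense compressed bytes: ", compressed_size(dense))
print("Sparse compressed bytes:", compressed_size(sparse))
```

Both lists occupy the same number of bytes before compression; only the compressed sizes differ.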

See the persistence of accuracy from TF to TFLite

Define a helper function to evaluate the TFLite model on the test dataset.

In [ ]:

def eval_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print(f"Evaluated on {i} results so far.")
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy
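
The accuracy computation at the end of eval_model can be seen in isolation on a few made-up outputs. Because the model's final Dense layer has no softmax (the loss is configured with from_logits=True), the interpreter returns logits rather than probabilities; taking the argmax still selects the most probable digit, since softmax preserves ordering. The values and helper names below are illustrative only.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits_per_example, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    predictions = [argmax(logits) for logits in logits_per_example]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Three made-up examples with three classes each.
example_logits = [
    [0.1, 2.0, -1.0],
    [3.0, 0.2, 0.1],
    [0.0, 0.1, 0.2],
]
example_labels = [1, 0, 1]

print("Accuracy:", accuracy(example_logits, example_labels))
```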

Evaluate the model, which has been pruned and quantized, and confirm that the accuracy from TensorFlow persists in the TFLite backend.

In [ ]:

interpreter = tf.lite.Interpreter(pqat_model_file)
interpreter.allocate_tensors()

pqat_test_accuracy = eval_model(interpreter)

print('Pruned and quantized TFLite test_accuracy:', pqat_test_accuracy)
print('Pruned TF test accuracy:', pruned_model_accuracy)

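Apply post-training quantization and compare to the PQAT model

Next, use normal post-training quantization (no fine-tuning) on the pruned model and check its accuracy against the PQAT model. This demonstrates why you would need to use PQAT to improve the quantized model's accuracy.

First, define a generator for the calibration dataset from the first 1000 training images.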

In [ ]:

def mnist_representative_data_gen():
  for image in train_images[:1000]:  
    image = np.expand_dims(image, axis=0).astype(np.float32)
    yield [image]
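
What a calibration dataset is used for can be sketched as follows: observed value ranges are turned into a scale and zero point that map floats to 8-bit integers. This is a simplified min/max illustration under stated assumptions (a single tensor, asymmetric int8 range, the range widened to include 0.0), not the TFLite converter's actual calibration procedure; all function names are hypothetical.

```python
def calibrate(samples, num_bits=8):
    """Derive asymmetric quantization parameters from observed values."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(0.0, min(samples))
    hi = max(0.0, max(samples))
    if hi == lo:
        raise ValueError("Calibration samples must not all be zero.")
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point, qmin, qmax


def quantize(x, scale, zero_point, qmin, qmax):
    """Map a float to the nearest representable integer, clamped to range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return scale * (q - zero_point)


# Pixel values normalized to [0, 1], as in the tutorial's input images.
calibration_values = [0.0, 0.25, 0.5, 0.75, 1.0]
scale, zero_point, qmin, qmax = calibrate(calibration_values)

for x in (0.0, 0.2, 0.77, 1.0):
    q = quantize(x, scale, zero_point, qmin, qmax)
    print(x, "->", q, "->", dequantize(q, scale, zero_point))
```

Values inside the calibrated range round-trip with an error of at most half a quantization step; values outside it are clamped, which is why calibration data should resemble the inputs seen at inference time.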

Quantize the model and compare its accuracy to the previously acquired PQAT model. Note that the model quantized with fine-tuning achieves higher accuracy.

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(stripped_pruned_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = mnist_representative_data_gen
post_training_tflite_model = converter.convert()
post_training_model_file = 'post_training_model.tflite'
# Save the model.
with open(post_training_model_file, 'wb') as f:
    f.write(post_training_tflite_model)
    
# Compare accuracy
interpreter = tf.lite.Interpreter(post_training_model_file)
interpreter.allocate_tensors()

post_training_test_accuracy = eval_model(interpreter)

print('PQAT TFLite test_accuracy:', pqat_test_accuracy)
print('Post-training (no fine-tuning) TFLite test accuracy:', post_training_test_accuracy)

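Conclusion

In this tutorial, you learned how to create a model, prune it using the sparsity API, and apply sparsity-preserving quantization aware training (PQAT) to preserve sparsity while using QAT. The final PQAT model was compared to the QAT model to show that sparsity is preserved in the former and lost in the latter. Next, the models were converted to TFLite to show the compression benefits of chaining pruning and PQAT model optimization techniques, and the TFLite model was evaluated to confirm that the accuracy persists in the TFLite backend. Finally, the PQAT model was compared to a pruned model quantized with the post-training quantization API, showing that PQAT loses less accuracy than normal quantization.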
