GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ja/model_optimization/guide/combine/cqat_example.ipynb
²⁵¹¹⁸ views

Kernel: Python 3

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TensorFlow.org で表示

Google Colab で実行

GitHub でソースを表示

ノートブックをダウンロード

Keras でのクラスタ化を保持する量子化認識トレーニング（CQAT）の例

概要

このチュートリアルでは、TensorFlow モデル最適化ツールキットの協調最適化パイプラインの一部である**クラスタ化を保持する量子化認識トレーニング（CQAT）**API の使用法を実演します。

その他のページ

パイプラインとその他の利用可能な手法の概要については、協調最適化の概要ページを参照してください。

内容

チュートリアルでは、次について説明しています。

MNIST データセットの tf.keras モデルを最初からトレーニングする
クラスタ化でモデルを微調整し、精度を確認する。
QAT を適用し、クラスタの損失を観察する。
CQAT を適用し、前に適用されたクラスタ化が保持されていることを確認する。
TFLite モデルを生成し、それに CQAT を適用した場合の影響を観察する。
達成した CQAT モデルの精度を、ポストトレーニング量子化を使用して量子化されたモデルと比較する。

セットアップ

この Jupyter ノートブックは、ローカルの virtualenv または Colab で実行できます。依存関係のセットアップに関する詳細は、インストールガイドをご覧ください。

In [ ]:

! pip install -q tensorflow-model-optimization

In [ ]:

import tensorflow as tf

import numpy as np
import tempfile
import zipfile
import os

クラスタを使用せずに、MNIST の tf.keras モデルをトレーニングする

In [ ]:

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images  = test_images / 255.0

model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3),
                         activation=tf.nn.relu),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    validation_split=0.1,
    epochs=10
)

ベースラインモデルを評価して後で使用できるように保存する

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
print('Saving model to: ', keras_file)
tf.keras.models.save_model(model, keras_file, include_optimizer=False)

8 つのクラスタでモデルをクラスタ化して微調整する

cluster_weights() API を適用して、事前にトレーニングされたモデル全体をクラスタ化します。zip を適用し、モデルサイズを縮小する際に精度が効果的に維持されることを観察してください。API を使用して、モデルの精度を維持しながら最高の圧縮率を達成するための最適な方法については、クラスタ化総合ガイドを参照してください。

モデルを定義してクラスタリング API を適用する

クラスタリング API を使用する前に、モデルを事前にトレーニングする必要があります。

In [ ]:

import tensorflow_model_optimization as tfmot

cluster_weights = tfmot.clustering.keras.cluster_weights
CentroidInitialization = tfmot.clustering.keras.CentroidInitialization

clustering_params = {
  'number_of_clusters': 8,
  'cluster_centroids_init': CentroidInitialization.KMEANS_PLUS_PLUS,
  'cluster_per_channel': True,
}

clustered_model = cluster_weights(model, **clustering_params)

# Use smaller learning rate for fine-tuning
opt = tf.keras.optimizers.Adam(learning_rate=1e-5)

clustered_model.compile(
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  optimizer=opt,
  metrics=['accuracy'])

clustered_model.summary()

モデルを微調整し、ベースラインに対する精度を評価する

3 エポックのクラスタ化でモデルを微調整します。

In [ ]:

# Fine-tune model
clustered_model.fit(
  train_images,
  train_labels,
  epochs=3,
  validation_split=0.1)

モデルの各カーネルのクラスタ化の数を計算して出力するヘルパー関数を定義します。

In [ ]:

def print_model_weight_clusters(model):

    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            # ignore auxiliary quantization weights
            if "quantize_layer" in weight.name:
                continue
            if "kernel" in weight.name:
                unique_count = len(np.unique(weight))
                print(
                    f"{layer.name}/{weight.name}: {unique_count} clusters "
                )

モデルのカーネルが正しくクラスタ化されていることを確認します。最初にクラスタ化のラッパーを削除する必要があります。

In [ ]:

stripped_clustered_model = tfmot.clustering.keras.strip_clustering(clustered_model)

print_model_weight_clusters(stripped_clustered_model)

この例では、ベースラインと比較し、クラスタリング後のテスト精度に最小限の損失があります。

In [ ]:

_, clustered_model_accuracy = clustered_model.evaluate(
  test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Clustered test accuracy:', clustered_model_accuracy)

QAT と CQAT を適用し、モデルクラスタへの影響を確認する

次に、クラスタ化されたモデルに QAT とクラスタ化を保持する QAT（CQAT）の両方を適用し、CQAT がクラスタ化されたモデルの重みクラスタを保持することを確認します。CQAT API を適用する前に、tfmot.clustering.keras.strip_clustering を使用してモデルからクラスタ化のラッパーを削除したことに注意してください。

In [ ]:

# QAT
qat_model = tfmot.quantization.keras.quantize_model(stripped_clustered_model)

qat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train qat model:')
qat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

# CQAT
quant_aware_annotate_model = tfmot.quantization.keras.quantize_annotate_model(
              stripped_clustered_model)
cqat_model = tfmot.quantization.keras.quantize_apply(
              quant_aware_annotate_model,
              tfmot.experimental.combine.Default8BitClusterPreserveQuantizeScheme())

cqat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train cqat model:')
cqat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

In [ ]:

print("QAT Model clusters:")
print_model_weight_clusters(qat_model)
print("CQAT Model clusters:")
print_model_weight_clusters(cqat_model)

CQAT モデルの圧縮の利点を確認する

zip 形式のモデルファイルを取得するためのヘルパー関数を定義します。

In [ ]:

def get_gzipped_model_size(file):
  # It returns the size of the gzipped model in kilobytes.

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(file)

  return os.path.getsize(zipped_file)/1000

これは小規模なモデルであることに注意してください。クラスタ化と CQAT をより大規模な本番モデルに適用すると、より大きな圧縮が得られます。

In [ ]:

# QAT model
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = converter.convert()
qat_model_file = 'qat_model.tflite'
# Save the model.
with open(qat_model_file, 'wb') as f:
    f.write(qat_tflite_model)
    
# CQAT model
converter = tf.lite.TFLiteConverter.from_keras_model(cqat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
cqat_tflite_model = converter.convert()
cqat_model_file = 'cqat_model.tflite'
# Save the model.
with open(cqat_model_file, 'wb') as f:
    f.write(cqat_tflite_model)
    
print("QAT model size: ", get_gzipped_model_size(qat_model_file), ' KB')
print("CQAT model size: ", get_gzipped_model_size(cqat_model_file), ' KB')

TF から TFLite への精度の永続性を確認する

テストデータセットで TFLite モデルを評価するヘルパー関数を定義します。

In [ ]:

def eval_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print(f"Evaluated on {i} results so far.")
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

クラスタ化および量子化されたモデルを評価し、TensorFlow の精度が TFLite バックエンドに持続することを確認します。

In [ ]:

interpreter = tf.lite.Interpreter(cqat_model_file)
interpreter.allocate_tensors()

cqat_test_accuracy = eval_model(interpreter)

print('Clustered and quantized TFLite test_accuracy:', cqat_test_accuracy)
print('Clustered TF test accuracy:', clustered_model_accuracy)

ポストトレーニング量子化を適用し、CQAT モデルと比較する

次に、クラスタ化されたモデルでポストトレーニング量子化（微調整なし）を使用し、CQAT モデルと比較した精度を確認します。量子化モデルの精度を向上させるために CQAT を使用する必要がある理由がお分かりになると思います。MNIST モデルは非常に小規模でパラメータが多いため、違いはあまり目立たないかもしれません。

まず、最初の 1000 個のトレーニング画像からキャリブレーションデータセットのジェネレータを定義します。

In [ ]:

def mnist_representative_data_gen():
  for image in train_images[:1000]:  
    image = np.expand_dims(image, axis=0).astype(np.float32)
    yield [image]

モデルを定量化し、以前に取得した CQAT モデルと精度を比較します。微調整で量子化されたモデルは、より高い精度を実現することに注目してください。

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(stripped_clustered_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = mnist_representative_data_gen
post_training_tflite_model = converter.convert()
post_training_model_file = 'post_training_model.tflite'
# Save the model.
with open(post_training_model_file, 'wb') as f:
    f.write(post_training_tflite_model)
    
# Compare accuracy
interpreter = tf.lite.Interpreter(post_training_model_file)
interpreter.allocate_tensors()

post_training_test_accuracy = eval_model(interpreter)

print('CQAT TFLite test_accuracy:', cqat_test_accuracy)
print('Post-training (no fine-tuning) TF test accuracy:', post_training_test_accuracy)

結論

このチュートリアルでは、モデルを作成し、cluster_weights() API を使用してクラスタ化し、クラスタ化を保持する量子化認識トレーニング（CQAT）を適用して、QAT を使用する際にクラスタを保持する方法を学習しました。最終的な CQAT モデルを QAT モデルと比較し、クラスタが前者で保持され、後者で失われることを示しました。続いて、モデルを TFLite に変換して、連鎖クラスタリングと CQAT モデル最適化手法の圧縮の利点を示し、TFLite モデルを評価して、TFLite バックエンドで精度が維持されることを確認しました。最後に、CQAT モデルを、ポストトレーニング量子化 API を使用したクラスタ化された量子化モデルと比較し、CQAT を使用すると通常の量子化よりも精度の低下を抑えられることを示しました。