
GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/model_optimization/guide/combine/pcqat_example.ipynb

Kernel: Python 3

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Sparsity and cluster preserving quantization aware training (PCQAT) Keras example

Overview

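This is an end-to-end example showing the usage of the sparsity and cluster preserving quantization aware training (PCQAT) API, part of the TensorFlow Model Optimization Toolkit's collaborative optimization pipeline.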

Other pages

For an introduction to the pipeline and the other available techniques, see the collaborative optimization overview page.

Contents

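In this tutorial, you will:

Train a tf.keras model for the MNIST dataset from scratch.
Fine-tune the model with pruning, check its accuracy, and observe that the model was successfully pruned.
Apply sparsity preserving clustering on the pruned model and observe that the sparsity applied earlier has been preserved.
Apply QAT and observe the loss of sparsity and clusters.
Apply PCQAT and observe that both the sparsity and the clustering applied earlier have been preserved.
Generate a TFLite model and observe the effects of applying PCQAT on it.
Compare the sizes of the different models to observe the compression benefits of applying sparsity, followed by the collaborative optimization techniques of sparsity preserving clustering and PCQAT.
Compare the accuracy of the fully optimized model with the accuracy of the unoptimized baseline model.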

Setup

You can run this Jupyter Notebook in your local virtualenv or in Colab. For details on setting up dependencies, please refer to the installation guide.

In [ ]:

! pip install -q tensorflow-model-optimization

In [ ]:

import tensorflow as tf

import numpy as np
import tempfile
import zipfile
import os

Train a tf.keras model for MNIST to be pruned and clustered

In [ ]:

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images  = test_images / 255.0

model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3),
                         activation=tf.nn.relu),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Train the digit classification model
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    validation_split=0.1,
    epochs=10
)

Evaluate the baseline model and save it for later usage

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
print('Saving model to: ', keras_file)
tf.keras.models.save_model(model, keras_file, include_optimizer=False)

Prune and fine-tune the model to 50% sparsity

Apply the prune_low_magnitude() API to get the pruned model that will be clustered in the next step. Refer to the pruning comprehensive guide for more information on the pruning API.
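As a rough intuition for what prune_low_magnitude() with ConstantSparsity(0.5, ...) aims for, the numpy sketch below zeros out the half of a kernel that has the smallest absolute values. It is a simplified, hypothetical illustration: `magnitude_prune` is not part of tfmot, and the real API maintains pruning masks inside the training loop and updates them on the pruning schedule.

```python
import numpy as np

def magnitude_prune(weights, target_sparsity=0.5):
    """Zero the smallest-magnitude entries so that about `target_sparsity`
    of the tensor is zero. Toy illustration, not the tfmot implementation."""
    magnitudes = np.abs(weights).ravel()
    k = int(round(target_sparsity * magnitudes.size))  # how many weights to zero
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude acts as the pruning threshold.
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    # Keep weights strictly above the threshold, zero everything else.
    return np.where(np.abs(weights) > threshold, weights, 0.0)

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 1, 12))  # same shape as the Conv2D kernel above
pruned = magnitude_prune(kernel, target_sparsity=0.5)
print(f"sparsity: {np.mean(pruned == 0):.2%}")
```

Because the surviving weights keep their original values, fine-tuning afterwards lets the network compensate for the removed half, which is why the tutorial fine-tunes with a small learning rate.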

Define the model and apply the sparsity API

Note that the pre-trained baseline model is used.

In [ ]:

import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

pruning_params = {
      'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0, frequency=100)
  }

callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep()
]

pruned_model = prune_low_magnitude(model, **pruning_params)

# Use smaller learning rate for fine-tuning
opt = tf.keras.optimizers.Adam(learning_rate=1e-5)

pruned_model.compile(
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  optimizer=opt,
  metrics=['accuracy'])

Fine-tune the model, check sparsity, and evaluate the accuracy against the baseline

Fine-tune the model with pruning for 3 epochs.

In [ ]:

# Fine-tune model
pruned_model.fit(
  train_images,
  train_labels,
  epochs=3,
  validation_split=0.1,
  callbacks=callbacks)
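To get a feel for how often the pruning mask is updated during this fine-tuning, the arithmetic below estimates the number of training steps. It assumes Keras' default `batch_size` of 32 (the `fit()` call above does not set one) and that, with `begin_step=0`, `ConstantSparsity` prunes on steps where `step - begin_step` is a multiple of `frequency`; the exact step count tfmot records may differ slightly.

```python
import math

# Assumptions: the 60,000-image MNIST training split and Keras' default
# batch_size of 32, since the fit() call above does not pass one.
num_train_images = 60_000
validation_split = 0.1
batch_size = 32
epochs = 3
begin_step, frequency = 0, 100  # from ConstantSparsity(0.5, begin_step=0, frequency=100)

train_images_used = round(num_train_images * (1 - validation_split))  # 54,000
steps_per_epoch = math.ceil(train_images_used / batch_size)
total_steps = steps_per_epoch * epochs
# Count the steps at which (step - begin_step) is a multiple of frequency.
mask_updates = len(range(begin_step, total_steps, frequency))
print(f"{steps_per_epoch} steps/epoch, {total_steps} steps total, "
      f"~{mask_updates} mask updates")
```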

Define helper functions to calculate and print the sparsity and the number of clusters in the model.

In [ ]:

def print_model_weights_sparsity(model):
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            if "kernel" not in weight.name or "centroid" in weight.name:
                continue
            weight_size = weight.numpy().size
            zero_num = np.count_nonzero(weight == 0)
            print(
                f"{weight.name}: {zero_num/weight_size:.2%} sparsity ",
                f"({zero_num}/{weight_size})",
            )

def print_model_weight_clusters(model):
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            # ignore auxiliary quantization weights
            if "quantize_layer" in weight.name:
                continue
            if "kernel" in weight.name:
                unique_count = len(np.unique(weight))
                print(
                    f"{layer.name}/{weight.name}: {unique_count} clusters "
                )
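Stripped of the layer-iteration logic, the two helpers above compute two numbers per kernel: the fraction of exact zeros and the number of distinct values. The toy version below (with hypothetical names `sparsity` and `cluster_count`) shows this on a small hand-written array. Note that `np.unique` counts zero as one of the distinct values, just as `print_model_weight_clusters` does.

```python
import numpy as np

def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return np.count_nonzero(weights == 0) / weights.size

def cluster_count(weights):
    """Number of distinct values; zero counts as one of them."""
    return len(np.unique(weights))

toy_kernel = np.array([[0.0, 0.25, 0.0],
                       [-0.5, 0.25, 0.0],
                       [0.0, -0.5, 0.25]])
print(f"{sparsity(toy_kernel):.2%} sparsity, {cluster_count(toy_kernel)} clusters")
# prints "44.44% sparsity, 3 clusters"
```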

First strip the pruning wrapper, then check that the model kernels were correctly pruned.

In [ ]:

stripped_pruned_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

print_model_weights_sparsity(stripped_pruned_model)

Apply sparsity preserving clustering and check its effect on model sparsity and clusters

Next, apply sparsity preserving clustering to the pruned model, observe the number of clusters, and check that the sparsity is preserved.
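Conceptually, sparsity preserving clustering keeps the zeros produced by pruning fixed and shares values only among the remaining non-zero weights. The numpy sketch below illustrates that idea with plain 1-D k-means (Lloyd's algorithm) and linear centroid initialization. It is a hypothetical stand-in, not tfmot's implementation, which uses the configured `CentroidInitialization` (here `KMEANS_PLUS_PLUS`) and fine-tunes the centroids during training.

```python
import numpy as np

def cluster_preserving_sparsity(weights, number_of_clusters=8, iterations=20):
    """Toy sparsity-preserving clustering of a weight tensor.

    Zero weights stay zero; non-zero weights are replaced by the nearest of
    `number_of_clusters` centroids found with 1-D k-means.
    """
    flat = weights.astype(np.float64).ravel()
    nonzero = flat != 0
    values = flat[nonzero]
    if values.size == 0:
        return weights.copy()
    # Linear initialization between the smallest and largest non-zero weight.
    centroids = np.linspace(values.min(), values.max(), number_of_clusters)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every non-zero weight.
        assignment = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for k in range(number_of_clusters):
            members = values[assignment == k]
            if members.size:
                centroids[k] = members.mean()
    assignment = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    clustered = flat.copy()
    clustered[nonzero] = centroids[assignment]  # zeros are never touched
    return clustered.reshape(weights.shape)

# Usage on a random kernel that has been pruned to roughly 50% sparsity.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 1, 12))
kernel[np.abs(kernel) < np.median(np.abs(kernel))] = 0.0
clustered = cluster_preserving_sparsity(kernel)
print("zeros preserved:", np.array_equal(clustered[kernel == 0], kernel[kernel == 0]))
print("distinct non-zero values:", len(np.unique(clustered[kernel != 0])))
```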

In [ ]:

import tensorflow_model_optimization as tfmot
# This tutorial uses the experimental cluster_weights, which accepts the
# preserve_sparsity argument set in clustering_params below.
from tensorflow_model_optimization.python.core.clustering.keras.experimental import (
    cluster,
)

cluster_weights = cluster.cluster_weights
CentroidInitialization = tfmot.clustering.keras.CentroidInitialization

clustering_params = {
  'number_of_clusters': 8,
  'cluster_centroids_init': CentroidInitialization.KMEANS_PLUS_PLUS,
  'preserve_sparsity': True
}

sparsity_clustered_model = cluster_weights(stripped_pruned_model, **clustering_params)

sparsity_clustered_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

print('Train sparsity preserving clustering model:')
sparsity_clustered_model.fit(train_images, train_labels, epochs=3, validation_split=0.1)

First strip the clustering wrapper, then check that the model is correctly pruned and clustered.

In [ ]:

stripped_clustered_model = tfmot.clustering.keras.strip_clustering(sparsity_clustered_model)

print("Model sparsity:\n")
print_model_weights_sparsity(stripped_clustered_model)

print("\nModel clusters:\n")
print_model_weight_clusters(stripped_clustered_model)

Apply QAT and PCQAT and check the effect on model clusters and sparsity

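Next, apply both QAT and PCQAT to the sparse clustered model and observe that PCQAT preserves the weight sparsity and clusters in your model, whereas QAT does not. Note that the stripped model is passed to the QAT and PCQAT APIs.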
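The difference between the two results can be pictured with a toy update step. In plain QAT, each weight receives its own gradient update, so shared values and zeros drift apart; in the conceptual PCQAT-style update below, gradients are accumulated per centroid and pruned positions stay at zero. The sketch also applies a simple symmetric per-tensor 8-bit fake quantization (an assumption chosen for illustration, not necessarily the scheme tfmot uses) to show that quantization itself maps zero to zero and cannot add new distinct values. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_weights, lr = 100, 0.01
centroids = np.array([-0.3, -0.1, 0.1, 0.3])
cluster_index = rng.integers(0, len(centroids), size=num_weights)
zero_mask = rng.random(num_weights) < 0.5  # about half the weights are pruned
weights = np.where(zero_mask, 0.0, centroids[cluster_index])
grad = rng.normal(size=num_weights)  # stand-in for a real loss gradient

# QAT-style step: every weight moves independently.
qat_weights = weights - lr * grad

# Conceptual PCQAT-style step: gradients are summed per centroid, the shared
# centroids move, and pruned positions are kept at zero.
centroid_grad = np.array([grad[(cluster_index == k) & ~zero_mask].sum()
                          for k in range(len(centroids))])
new_centroids = centroids - lr * centroid_grad
pcqat_weights = np.where(zero_mask, 0.0, new_centroids[cluster_index])

def fake_quant_int8(w):
    """Symmetric per-tensor 8-bit quantize, then dequantize."""
    scale = np.max(np.abs(w)) / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

for name, w in [("QAT", qat_weights), ("PCQAT", pcqat_weights)]:
    print(f"{name}: {np.mean(w == 0):.0%} zeros, {len(np.unique(w))} distinct values")

quantized = fake_quant_int8(pcqat_weights)
print("PCQAT zeros kept after 8-bit fake quantization:", np.all(quantized[zero_mask] == 0))
```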

In [ ]:

# QAT
qat_model = tfmot.quantization.keras.quantize_model(stripped_clustered_model)

qat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train qat model:')
qat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

# PCQAT
quant_aware_annotate_model = tfmot.quantization.keras.quantize_annotate_model(
              stripped_clustered_model)
pcqat_model = tfmot.quantization.keras.quantize_apply(
              quant_aware_annotate_model,
              tfmot.experimental.combine.Default8BitClusterPreserveQuantizeScheme(preserve_sparsity=True))

pcqat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train pcqat model:')
pcqat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

In [ ]:

print("QAT Model clusters:")
print_model_weight_clusters(qat_model)
print("\nQAT Model sparsity:")
print_model_weights_sparsity(qat_model)
print("\nPCQAT Model clusters:")
print_model_weight_clusters(pcqat_model)
print("\nPCQAT Model sparsity:")
print_model_weights_sparsity(pcqat_model)

See the compression benefits of the PCQAT model

Define a helper function to get the size of the zipped model file.

In [ ]:

def get_gzipped_model_size(file):
  # Returns the size of the zip-compressed (DEFLATE) model file in kilobytes.

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(file)

  return os.path.getsize(zipped_file)/1000
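The zipped-size comparison works because DEFLATE (the algorithm behind both `zipfile.ZIP_DEFLATED` and Python's `zlib`) exploits repeated byte patterns: a tensor with many exact zeros and only a few distinct values contains far more repetition than a dense float tensor. The sketch below compares the two on synthetic, hypothetical data rather than on the tutorial's models.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
num_weights = 4096

# Dense float32 weights, roughly what an unoptimized kernel looks like.
dense = rng.normal(size=num_weights).astype(np.float32)

# Hypothetical optimized weights: 8 shared values, with about half of the
# entries set to zero, mimicking pruning followed by clustering.
shared_values = np.linspace(-1.0, 1.0, 8).astype(np.float32)
optimized = shared_values[rng.integers(0, 8, size=num_weights)]
optimized[rng.random(num_weights) < 0.5] = 0.0

for name, tensor in [("dense", dense), ("pruned + clustered", optimized)]:
    raw = tensor.tobytes()
    compressed = zlib.compress(raw, 9)
    print(f"{name}: {len(raw)} bytes raw -> {len(compressed)} bytes compressed")
```

Both tensors occupy the same number of raw bytes; only the compressed size differs, which mirrors why the tutorial compares gzipped sizes rather than raw file sizes.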

Observe that applying sparsity, clustering, and PCQAT to a model yields significant compression benefits.

In [ ]:

# QAT model
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = converter.convert()
qat_model_file = 'qat_model.tflite'
# Save the model.
with open(qat_model_file, 'wb') as f:
    f.write(qat_tflite_model)

# PCQAT model
converter = tf.lite.TFLiteConverter.from_keras_model(pcqat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
pcqat_tflite_model = converter.convert()
pcqat_model_file = 'pcqat_model.tflite'
# Save the model.
with open(pcqat_model_file, 'wb') as f:
    f.write(pcqat_tflite_model)

print("QAT model size: ", get_gzipped_model_size(qat_model_file), ' KB')
print("PCQAT model size: ", get_gzipped_model_size(pcqat_model_file), ' KB')

See the persistence of accuracy from TF to TFLite

Define a helper function to evaluate the TFLite model on the test dataset.

In [ ]:

def eval_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print(f"Evaluated on {i} results so far.")
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

Evaluate the pruned, clustered, and quantized model, and check that the accuracy measured in TensorFlow persists in the TFLite backend.

In [ ]:

interpreter = tf.lite.Interpreter(pcqat_model_file)
interpreter.allocate_tensors()

pcqat_test_accuracy = eval_model(interpreter)

print('Pruned, clustered and quantized TFLite test_accuracy:', pcqat_test_accuracy)
print('Baseline TF test accuracy:', baseline_model_accuracy)

Conclusion

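In this tutorial, you learned how to create a model, prune it using the prune_low_magnitude() API, and apply sparsity preserving clustering using the cluster_weights() API to preserve sparsity while clustering the weights.

Next, sparsity and cluster preserving quantization aware training (PCQAT) was applied to preserve model sparsity and clusters while using QAT. The final PCQAT model was compared to the QAT model to show that sparsity and clusters are preserved in the former and lost in the latter.

Next, the models were converted to TFLite to show the compression benefits of chaining the sparsity, clustering, and PCQAT model optimization techniques, and the TFLite model was evaluated to ensure that the accuracy persists in the TFLite backend.

Finally, the PCQAT TFLite model accuracy was compared to the pre-optimization baseline model accuracy to show that the collaborative optimization techniques achieved the compression benefits while maintaining accuracy similar to that of the original model.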