GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/model_optimization/guide/combine/pqat_example.ipynb

Kernel: Python 3

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

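Pruning preserving quantization aware training (PQAT) Keras example

Overview

This is an end-to-end example showing how to use the pruning preserving quantization aware training (PQAT) API, which is part of the TensorFlow Model Optimization Toolkit's collaborative optimization pipeline.

Other pages

For an introduction to the pipeline and the other available techniques, see the collaborative optimization overview page.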

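Contents

In this tutorial, you will:

Train a tf.keras model for the MNIST dataset from scratch.
Fine-tune the model with pruning using the sparsity API, and check the accuracy.
Apply QAT and observe the loss of sparsity.
Apply PQAT and observe that the sparsity applied earlier has been preserved.
Generate a TFLite model and observe the effects of applying PQAT to it.
Compare the accuracy of the PQAT model with that of a model quantized using post-training quantization.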

Setup

You can run this Jupyter notebook in a local virtualenv or in Colab. For details on setting up dependencies, refer to the installation guide.

In [ ]:

! pip install -q tensorflow-model-optimization

In [ ]:

import tensorflow as tf

import numpy as np
import tempfile
import zipfile
import os

Train a tf.keras model for MNIST without pruning

In [ ]:

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images  = test_images / 255.0

model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3),
                         activation=tf.nn.relu),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    validation_split=0.1,
    epochs=10
)
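To get a feel for how small this model is, the layer shapes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for the layers above ("valid" padding and stride 1 for Conv2D, and a pooling stride equal to the pool size); the helper name conv2d_valid_output is hypothetical and used only for illustration.

```python
def conv2d_valid_output(size, kernel):
    """Spatial output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

input_size = 28          # MNIST images are 28x28 with 1 channel
filters = 12
kernel = 3

conv_out = conv2d_valid_output(input_size, kernel)   # 28 - 3 + 1
pooled = conv_out // 2                               # 2x2 max pooling
flat_features = pooled * pooled * filters            # Flatten output length

# Parameters = weights + biases for each trainable layer.
conv_params = kernel * kernel * 1 * filters + filters
dense_params = flat_features * 10 + 10
total_params = conv_params + dense_params

print("conv output:", (conv_out, conv_out, filters))
print("flattened features:", flat_features)
print("total parameters:", total_params)
```

Almost all of the parameters sit in the final Dense layer, so that is where pruning has the most room to act in this example.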

Evaluate the baseline model and save it for later use

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
print('Saving model to: ', keras_file)
tf.keras.models.save_model(model, keras_file, include_optimizer=False)

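Prune and fine-tune the model to 50% sparsity

Apply the prune_low_magnitude() API to prune the whole pre-trained model, and observe how effectively this reduces the model size under zip compression while keeping accuracy close to the baseline. For how best to use the API to achieve the highest compression rate while maintaining your target accuracy, refer to the pruning comprehensive guide.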
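As a rough illustration of the idea behind magnitude-based pruning, the following NumPy sketch zeroes the smallest-magnitude half of a weight matrix. The helper prune_by_magnitude is hypothetical and is not part of tfmot; the real API prunes gradually during training according to a schedule, which this one-shot sketch does not attempt to reproduce.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Return (pruned_weights, mask); hypothetical helper, not a tfmot API.

    The `sparsity` fraction of entries with the smallest absolute value
    is set to zero; `mask` is True where a weight was kept.
    """
    flat = np.abs(weights).ravel()
    num_to_prune = int(round(sparsity * flat.size))
    mask = np.ones(flat.size, dtype=bool)
    if num_to_prune > 0:
        # Indices of the smallest-magnitude entries.
        prune_idx = np.argsort(flat, kind="stable")[:num_to_prune]
        mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 6))           # stand-in for a layer kernel
pruned_w, mask = prune_by_magnitude(w, sparsity=0.5)

print("zeros:", np.count_nonzero(pruned_w == 0), "of", pruned_w.size)
```

Every surviving weight has a magnitude at least as large as every removed one, which is why a pre-trained model can often tolerate this kind of pruning with little loss in accuracy.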

Define the model and apply the sparsity API

The model must be pre-trained before you apply the sparsity API.

In [ ]:

import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

pruning_params = {
      'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0, frequency=100)
  }

callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep()
]

pruned_model = prune_low_magnitude(model, **pruning_params)

# Use smaller learning rate for fine-tuning
opt = tf.keras.optimizers.Adam(learning_rate=1e-5)

pruned_model.compile(
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  optimizer=opt,
  metrics=['accuracy'])

pruned_model.summary()

Fine-tune the model and evaluate the accuracy against the baseline

Fine-tune the model with pruning for 3 epochs.

In [ ]:

# Fine-tune model
pruned_model.fit(
  train_images,
  train_labels,
  epochs=3,
  validation_split=0.1,
  callbacks=callbacks)

Define a helper function to calculate and print the sparsity of the model.

In [ ]:

def print_model_weights_sparsity(model):

    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            # ignore auxiliary quantization weights
            if "quantize_layer" in weight.name:
                continue
            weight_size = weight.numpy().size
            zero_num = np.count_nonzero(weight == 0)
            print(
                f"{weight.name}: {zero_num/weight_size:.2%} sparsity ",
                f"({zero_num}/{weight_size})",
            )
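The sparsity figure printed by the helper is simply the fraction of entries that are exactly zero. A minimal NumPy sketch of the same calculation on hand-written arrays follows; the function name sparsity_ratio and the sample values are made up for illustration.

```python
import numpy as np

def sparsity_ratio(array):
    """Fraction of entries in `array` that are exactly zero."""
    array = np.asarray(array)
    return np.count_nonzero(array == 0) / array.size

# Toy stand-ins for a pruned kernel and an unpruned bias.
kernel = np.array([[0.0, 0.3, 0.0, -1.2],
                   [0.7, 0.0, 0.0, 0.1]])
bias = np.array([0.05, -0.02])

for name, values in {"kernel": kernel, "bias": bias}.items():
    print(f"{name}: {sparsity_ratio(values):.2%} sparsity")
```

When reading the real output, expect the kernels to sit near the 50% target while the biases stay dense, since by default pruning is applied to the kernels of the built-in layers.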

Check that the model was pruned correctly. The pruning wrappers must be stripped first.

In [ ]:

stripped_pruned_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

print_model_weights_sparsity(stripped_pruned_model)

For this example, there is minimal loss in test accuracy after pruning, compared to the baseline.

In [ ]:

_, pruned_model_accuracy = pruned_model.evaluate(
  test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Pruned test accuracy:', pruned_model_accuracy)

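Apply QAT and PQAT and check the effect on model sparsity in both cases

Next, apply both QAT and pruning-preserving QAT (PQAT) to the pruned model and observe that PQAT preserves the sparsity of the pruned model. Note that the pruning wrappers were stripped from the pruned model with tfmot.sparsity.keras.strip_pruning before the PQAT API is applied.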
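Why does ordinary fine-tuning lose sparsity? A gradient update generally moves zeroed weights away from zero unless something holds them there. The NumPy sketch below is a conceptual illustration only: it assumes a plain SGD step and a fixed binary mask, and the names sgd_step and mask are hypothetical. It is not a description of how the prune-preserving quantization scheme is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy weight vector that is already 50% sparse.
weights = np.array([0.0, 0.8, 0.0, -0.5, 0.0, 1.1, 0.0, 0.3])
mask = weights != 0            # sparsity pattern we want to keep

def sgd_step(w, grad, lr=0.1):
    """One plain gradient-descent update."""
    return w - lr * grad

grad = rng.normal(size=weights.shape)   # stand-in for a real gradient

# Ordinary fine-tuning: pruned positions receive updates too.
plain = sgd_step(weights, grad)

# Sparsity-preserving fine-tuning: re-apply the mask after the update.
preserved = sgd_step(weights, grad) * mask

print("zeros after plain update:    ", np.count_nonzero(plain == 0))
print("zeros after preserved update:", np.count_nonzero(preserved == 0))
```

The surviving weights receive identical updates in both cases; the only difference is whether the pruned positions are allowed to drift away from zero.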

In [ ]:

# QAT
qat_model = tfmot.quantization.keras.quantize_model(stripped_pruned_model)

qat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train qat model:')
qat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

# PQAT
quant_aware_annotate_model = tfmot.quantization.keras.quantize_annotate_model(
              stripped_pruned_model)
pqat_model = tfmot.quantization.keras.quantize_apply(
              quant_aware_annotate_model,
              tfmot.experimental.combine.Default8BitPrunePreserveQuantizeScheme())

pqat_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print('Train pqat model:')
pqat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

In [ ]:

print("QAT Model sparsity:")
print_model_weights_sparsity(qat_model)
print("PQAT Model sparsity:")
print_model_weights_sparsity(pqat_model)

See the compression benefits of the PQAT model

Define a helper function to get the size of the zipped model file.

In [ ]:

def get_gzipped_model_size(file):
  # It returns the size of the gzipped model in kilobytes.

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(file)

  return os.path.getsize(zipped_file)/1000
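Sparse weights only reduce file size once the file is compressed, because runs of zero bytes are cheap to encode. The standard-library sketch below packs random float32 values (synthetic data, not real model weights) and compares their DEFLATE-compressed sizes with and without roughly half of the values set to zero. It uses zlib, which implements the same DEFLATE algorithm that ZIP_DEFLATED uses.

```python
import random
import struct
import zlib

random.seed(0)
num_weights = 10_000

# Synthetic "dense" weights, and a copy with about half the entries zeroed.
dense = [random.gauss(0.0, 1.0) for _ in range(num_weights)]
sparse = [w if random.random() < 0.5 else 0.0 for w in dense]

def deflated_size(values):
    """Size in bytes of the values packed as float32 and DEFLATE-compressed."""
    raw = struct.pack(f"{len(values)}f", *values)
    return len(zlib.compress(raw))

dense_size = deflated_size(dense)
sparse_size = deflated_size(sparse)

print("dense compressed bytes: ", dense_size)
print("sparse compressed bytes:", sparse_size)
```

Before compression both buffers are exactly the same size, which is why the tutorial measures the zipped files rather than the raw ones.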

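Because this is a small model, the difference between the two models is not very noticeable. Applying pruning and PQAT to a larger production model would yield more significant compression.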

In [ ]:

# QAT model
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = converter.convert()
qat_model_file = 'qat_model.tflite'
# Save the model.
with open(qat_model_file, 'wb') as f:
    f.write(qat_tflite_model)
    
# PQAT model
converter = tf.lite.TFLiteConverter.from_keras_model(pqat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
pqat_tflite_model = converter.convert()
pqat_model_file = 'pqat_model.tflite'
# Save the model.
with open(pqat_model_file, 'wb') as f:
    f.write(pqat_tflite_model)
    
print("QAT model size: ", get_gzipped_model_size(qat_model_file), ' KB')
print("PQAT model size: ", get_gzipped_model_size(pqat_model_file), ' KB')

See the persistence of accuracy from TF to TFLite

Define a helper function to evaluate the TFLite model on the test dataset.

In [ ]:

def eval_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print(f"Evaluated on {i} results so far.")
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

Evaluate the pruned and quantized model, then check that the accuracy obtained in TensorFlow persists in the TFLite backend.

In [ ]:

interpreter = tf.lite.Interpreter(pqat_model_file)
interpreter.allocate_tensors()

pqat_test_accuracy = eval_model(interpreter)

print('Pruned and quantized TFLite test_accuracy:', pqat_test_accuracy)
print('Pruned TF test accuracy:', pruned_model_accuracy)

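Apply post-training quantization and compare to the PQAT model

Next, apply normal post-training quantization (no fine-tuning) to the pruned model and check its accuracy against the PQAT model. This demonstrates why you would use PQAT to improve the accuracy of the quantized model.

First, define a generator for the calibration dataset from the first 1,000 training images.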

In [ ]:

def mnist_representative_data_gen():
  for image in train_images[:1000]:  
    image = np.expand_dims(image, axis=0).astype(np.float32)
    yield [image]
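A representative dataset lets the converter observe the range of values flowing through the model so that it can pick quantization parameters. The NumPy sketch below shows the general idea with a simplified min/max calibration and an affine int8 mapping. All function names are hypothetical, the data is random rather than MNIST, and the converter's actual calibration is more involved than this.

```python
import numpy as np

def calibrate_range(batches):
    """Observed (min, max) over all batches, widened to include 0.0."""
    lo = min(float(np.min(b)) for b in batches)
    hi = max(float(np.max(b)) for b in batches)
    return min(lo, 0.0), max(hi, 0.0)

def affine_params(lo, hi, qmin=-128, qmax=127):
    """Scale and zero point mapping [lo, hi] onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
# Random stand-ins for normalized images with values in [0, 1).
calibration_batches = [
    rng.uniform(0.0, 1.0, size=(1, 28, 28)).astype(np.float32)
    for _ in range(10)
]

lo, hi = calibrate_range(calibration_batches)
scale, zero_point = affine_params(lo, hi)

sample = calibration_batches[0]
restored = dequantize(quantize(sample, scale, zero_point), scale, zero_point)
max_error = float(np.max(np.abs(sample - restored)))

print("scale:", scale, "zero point:", zero_point)
print("max round-trip error:", max_error)
```

For values inside the calibrated range, the round-trip error is bounded by about half a quantization step. Post-training quantization accepts this error as is, whereas QAT and PQAT fine-tune the weights with the quantization effect in the loop.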

Quantize the model and compare its accuracy with the PQAT model obtained earlier. Note that the model quantized with fine-tuning achieves higher accuracy.

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(stripped_pruned_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = mnist_representative_data_gen
post_training_tflite_model = converter.convert()
post_training_model_file = 'post_training_model.tflite'
# Save the model.
with open(post_training_model_file, 'wb') as f:
    f.write(post_training_tflite_model)
    
# Compare accuracy
interpreter = tf.lite.Interpreter(post_training_model_file)
interpreter.allocate_tensors()

post_training_test_accuracy = eval_model(interpreter)

print('PQAT TFLite test_accuracy:', pqat_test_accuracy)
print('Post-training (no fine-tuning) TFLite test accuracy:', post_training_test_accuracy)

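Conclusion

In this tutorial, you learned how to create a model, prune it using the sparsity API, and apply pruning preserving quantization aware training (PQAT) to preserve sparsity while using QAT. The final PQAT model was compared with the QAT model to show that sparsity is preserved in the former and lost in the latter. Next, the models were converted to TFLite to show the compression benefits of chaining the pruning and PQAT model optimization techniques, and the TFLite model was evaluated to confirm that accuracy persists in the TFLite backend. Finally, the PQAT model was compared with a quantized pruned model produced by the post-training quantization API, demonstrating the advantage of PQAT in recovering the accuracy lost to normal quantization.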