Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

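Quantization aware training in Keras example

Overview

Welcome to an end-to-end example for quantization aware training.

Other pages

For an introduction to what quantization aware training is and to determine whether you should use it (including what is supported), see the overview page.

To quickly find the APIs you need for your use case (beyond fully quantizing a model to 8 bits), see the comprehensive guide.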

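Summary

In this tutorial, you will:

Train a tf.keras model for MNIST from scratch.
Fine-tune the model by applying the quantization aware training API, check the accuracy, and export a quantization aware model.
Use the model to create an actually quantized model for the TFLite backend.
See the persistence of accuracy in TFLite and a 4x smaller model. To see the latency benefits on mobile, try out the TFLite examples in the TFLite app repository.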

Setup

In [ ]:

! pip install -q tensorflow
! pip install -q tensorflow-model-optimization

In [ ]:

import tempfile
import os

import tensorflow as tf

from tensorflow import keras

Train a model for MNIST without quantization aware training

In [ ]:

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_split=0.1,
)

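Clone and fine-tune the pre-trained model with quantization aware training

Define the model

You will apply quantization aware training to the whole model and see this in the model summary. All layers are now prefixed by "quant".

Note that the resulting model is quantization aware but not quantized (e.g. the weights are float32 instead of int8). The following sections show how to create a quantized model from the quantization aware one.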
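To make the "quantization aware but not quantized" distinction concrete, here is a minimal NumPy sketch of fake quantization: values are rounded onto an 8-bit grid and immediately mapped back to float32. The helper `fake_quantize` is a hypothetical illustration written for this page, not the toolkit's internal implementation, and the min/max range selection is an assumption.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Round x onto a num_bits integer grid, then map it back to float32.

    Hypothetical helper for illustration only; the range is taken from the
    min/max of x and widened to include zero.
    """
    x = np.asarray(x, dtype=np.float32)
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        # All values are zero, so there is nothing to round.
        return x.copy()
    zero_point = int(round(qmin - lo / scale))
    # Quantize: real value -> integer grid index, clipped to the 8-bit range.
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    # Dequantize: grid index -> float32 value.
    return ((q - zero_point) * scale).astype(np.float32)

weights = np.linspace(-1.0, 1.0, 1001, dtype=np.float32)
fq = fake_quantize(weights)

print("dtype after fake quantization:", fq.dtype)
print("distinct values:", len(np.unique(fq)))
print("max rounding error:", float(np.abs(fq - weights).max()))
```

The output array is still float32, but it takes at most 256 distinct values. Training against this rounding is what lets the model adapt before the weights are actually stored as integers.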

In the comprehensive guide, you can see how to quantize only some layers to improve model accuracy.

In [ ]:

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])

q_aware_model.summary()

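Train and evaluate the model against the baseline

To demonstrate fine-tuning after training the model for just one epoch, fine-tune with quantization aware training on a subset of the training data.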

In [ ]:

train_images_subset = train_images[0:1000] # out of 60000
train_labels_subset = train_labels[0:1000]

q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=1, validation_split=0.1)

For this example, there is minimal to no loss in test accuracy after quantization aware training, compared to the baseline.

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

_, q_aware_model_accuracy = q_aware_model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)

Create a quantized model for the TFLite backend

After this, you have an actually quantized model with int8 weights and uint8 activations.
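As a rough illustration of what storing int8 weights means, the following NumPy sketch quantizes a float32 weight tensor to int8 with a single scale factor and then recovers an approximation of the original values. The helpers `quantize_symmetric_int8` and `dequantize` are hypothetical names, and the per-tensor symmetric scheme is a simplifying assumption; the converter's actual scheme may differ (for example, per-channel scales).

```python
import numpy as np

def quantize_symmetric_int8(w):
    """Map float32 weights to int8 using one scale for the whole tensor.

    Hypothetical helper; per-tensor symmetric quantization is assumed here
    for simplicity.
    """
    w = np.asarray(w, dtype=np.float32)
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * np.float32(scale)

# Random stand-in weights with the shape of the tutorial's Conv2D kernel.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(3, 3, 1, 12)).astype(np.float32)

q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)

print("stored dtype:", q.dtype)
print("float32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)
print("max abs error:", float(np.abs(w - w_hat).max()), "half a step:", scale / 2)
```

Each weight now occupies one byte instead of four, and the reconstruction error stays within about half a quantization step.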

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_tflite_model = converter.convert()

See persistence of accuracy from TF to TFLite

Define a helper function to evaluate the TF Lite model on the test dataset.

In [ ]:

import numpy as np

def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print('Evaluated on {n} results so far.'.format(n=i))
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with the
    # highest score (the model outputs logits, not probabilities).
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy
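The post-processing step in `evaluate_model` can be checked in isolation. The model ends in `Dense(10)` with no softmax, so its outputs are logits; because softmax preserves ordering, taking `argmax` of the logits picks the same digit as taking `argmax` of the probabilities. The sketch below uses made-up logits and labels purely for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up logits for four samples and three classes (illustrative only).
logits = np.array([[2.0, -1.0, 0.5],
                   [0.1, 0.2, 3.0],
                   [1.5, 1.4, -2.0],
                   [-0.3, 0.9, 0.8]], dtype=np.float32)
labels = np.array([0, 2, 1, 1])

pred_from_logits = np.argmax(logits, axis=-1)
pred_from_probs = np.argmax(softmax(logits), axis=-1)

# Same accuracy formula as in evaluate_model: fraction of matching predictions.
accuracy = (pred_from_logits == labels).mean()

print("predictions:", pred_from_logits)
print("same as softmax argmax:", np.array_equal(pred_from_logits, pred_from_probs))
print("accuracy:", accuracy)
```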

Evaluate the quantized model and see that the accuracy from TensorFlow persists to the TFLite backend.

In [ ]:

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
interpreter.allocate_tensors()

test_accuracy = evaluate_model(interpreter)

print('Quant TFLite test_accuracy:', test_accuracy)
print('Quant TF test accuracy:', q_aware_model_accuracy)

See a 4x smaller model from quantization

Create a float TFLite model and then see that the quantized TFLite model is 4x smaller.
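As a back-of-the-envelope check on the 4x figure, the parameter count of the model defined earlier can be worked out by hand. The sketch assumes weights are stored as 1-byte int8 values and biases stay at 4 bytes (int32), and it ignores file-format overhead such as metadata and quantization parameters, so the measured file sizes will differ somewhat. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Return (weight count, bias count) for a Conv2D layer."""
    return kernel_h * kernel_w * in_channels * filters, filters

def dense_params(inputs, units):
    """Return (weight count, bias count) for a Dense layer."""
    return inputs * units, units

# Conv2D(filters=12, kernel_size=(3, 3)) on a 28x28x1 input.
conv_w, conv_b = conv2d_params(3, 3, 1, 12)

# A 3x3 'valid' convolution gives 26x26x12; 2x2 max pooling gives 13x13x12.
flat = 13 * 13 * 12
dense_w, dense_b = dense_params(flat, 10)

weights = conv_w + dense_w
biases = conv_b + dense_b

float_bytes = 4 * (weights + biases)   # every parameter stored as float32
quant_bytes = 1 * weights + 4 * biases  # assumption: int8 weights, int32 biases
ratio = float_bytes / quant_bytes

print("total parameters:", weights + biases)
print("float32 bytes:", float_bytes)
print("quantized bytes (estimate):", quant_bytes)
print("estimated size ratio:", round(ratio, 2))
```

Under these assumptions the model has 20,410 parameters and the estimated ratio comes out just under 4x, since nearly all of the parameters are weights that shrink from four bytes to one.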

In [ ]:

# Create float TFLite model.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()

# Measure sizes of models.
_, float_file = tempfile.mkstemp('.tflite')
_, quant_file = tempfile.mkstemp('.tflite')

with open(quant_file, 'wb') as f:
  f.write(quantized_tflite_model)

with open(float_file, 'wb') as f:
  f.write(float_tflite_model)

print("Float model in MiB:", os.path.getsize(float_file) / float(2**20))
print("Quantized model in MiB:", os.path.getsize(quant_file) / float(2**20))

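Conclusion

In this tutorial, you saw how to create quantization aware models with the TensorFlow Model Optimization Toolkit API and then quantized models for the TFLite backend.

You saw a 4x reduction in model size for an MNIST model with minimal accuracy difference. To see the latency benefits on mobile, try out the TFLite examples in the TFLite app repository.

We encourage you to try this new capability, which can be particularly important for deployment in resource-constrained environments.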