GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/lite/performance/post_training_quant.ipynb
³⁸⁵²³ views

Kernel: Python 3

Copyright 2019 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

훈련 후 동적 범위 양자화

개요

TensorFlow Lite은 이제 tensorflow graphdefs에서 TensorFlow Lite의 플랫 버퍼 형식으로의 모델 변환의 일부로써 가중치를 8bit 정밀도로 변환하는 것을 지원합니다. 동적 범위 양자화는 모델 크기를 4배 줄입니다. 또한 TFLite는 활성화의 즉석 양자화 및 역양자화를 지원하여 다음을 허용합니다.

가능한 경우 더 빠른 구현을 위해 양자화된 커널 사용하기
그래프의 다른 부분에 대해 부동 소수점 커널과 양자화된 커널을 혼합하기

활성화는 항상 부동 소수점에 저장됩니다. 양자화된 커널을 지원하는 연산의 경우 활성화는 처리 전에 동적으로 8bit 정밀도로 양자화되고 처리 후 float 정밀도로 역양자화됩니다. 따라서 변환되는 모델에 따라 이 방식으로 순수한 부동 소수점 계산보다 속도를 높일 수 있습니다.

양자화 인식 훈련과 달리 가중치는 훈련 후 양자화되고 활성화는 이 메서드의 추론에서 동적으로 양자화됩니다. 따라서 모델 가중치는 양자화로 인한 오류를 보상하기 위해 재훈련되지 않습니다. 저하가 허용 가능한지 확인하기 위해 양자화된 모델의 정확성을 확인하는 것이 중요합니다.

이 가이드에서는 MNIST 모델을 처음부터 훈련하고 TensorFlow에서 정확성을 확인한 다음 동적 범위 양자화를 사용하여 모델을 Tensorflow Lite flatbuffer로 변환합니다. 마지막으로 변환된 모델의 정확성을 확인하고 원본 float 모델과 비교합니다.

MNIST 모델 빌드하기

설정

In [ ]:

import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

TensorFlow 모델 훈련하기

In [ ]:

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_data=(test_images, test_labels)
)

예를 들어, 단일 epoch에 대해서만 모델을 훈련했기 때문에 최대 96%의 정확성으로만 훈련됩니다.

TensorFlow Lite 모델로 변환하기

TensorFlow Lite Converter를 사용하여 이제 훈련된 모델을 TensorFlow Lite 모델로 변환할 수 있습니다.

TFLiteConverter를 사용하여 모델을 로드합니다.

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

tflite 파일에 작성합니다.

In [ ]:

tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

In [ ]:

tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)

내보낼 때 모델을 양자화하려면 optimizations 플래그를 지정하여 크기를 최적화합니다.

In [ ]:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
tflite_model_quant_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_quant_model)

결과 파일의 약 1/4 크기인지 확인하세요.

In [ ]:

!ls -lh {tflite_models_dir}

TFLite 모델 실행하기

Python TensorFlow Lite 인터프리터를 사용하여 TensorFlow Lite 모델을 실행합니다.

인터프리터에 모델 로드하기

In [ ]:

interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

In [ ]:

interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter_quant.allocate_tensors()

하나의 이미지에서 모델 테스트하기

In [ ]:

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_index)

In [ ]:

import matplotlib.pylab as plt

plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true= str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)

모델 평가하기

In [ ]:

# A helper function to evaluate the TF Lite model using "test" dataset.
def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  # Compare prediction results with ground truth labels to calculate accuracy.
  accurate_count = 0
  for index in range(len(prediction_digits)):
    if prediction_digits[index] == test_labels[index]:
      accurate_count += 1
  accuracy = accurate_count * 1.0 / len(prediction_digits)

  return accuracy

In [ ]:

print(evaluate_model(interpreter))

다음을 얻기 위해 동적 범위 양자화 모델에 대한 평가를 반복합니다.

In [ ]:

print(evaluate_model(interpreter_quant))

이 예에서 압축된 모델은 정확성에 차이가 없습니다.

기존 모델 최적화하기

사전 활성화 레이어 (Resnet-v2)가 있는 Resnet은 비전 애플리케이션에 널리 사용됩니다. resnet-v2-101에 대해 사전 훈련된 고정 그래프는 Tensorflow Hub에서 사용할 수 있습니다.

다음과 같이 양자화를 사용하여 고정된 그래프를 TensorFLow Lite flatbuffer로 변환할 수 있습니다.

In [ ]:

import tensorflow_hub as hub

resnet_v2_101 = tf.keras.Sequential([
  keras.layers.InputLayer(input_shape=(224, 224, 3)),
  hub.KerasLayer("https://tfhub.dev/google/imagenet/resnet_v2_101/classification/4")
])

converter = tf.lite.TFLiteConverter.from_keras_model(resnet_v2_101)

In [ ]:

# Convert to TF Lite without quantization
resnet_tflite_file = tflite_models_dir/"resnet_v2_101.tflite"
resnet_tflite_file.write_bytes(converter.convert())

In [ ]:

# Convert to TF Lite with quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
resnet_quantized_tflite_file = tflite_models_dir/"resnet_v2_101_quantized.tflite"
resnet_quantized_tflite_file.write_bytes(converter.convert())

In [ ]:

!ls -lh {tflite_models_dir}/*.tflite

모델 크기가 171MB에서 43MB로 줄어듭니다. imagenet에서 이 모델의 정확성은 TFLite 정확성 측정을 위해 제공된 스크립트를 사용하여 평가할 수 있습니다.

최적화된 모델 상위 1개 정확성은 부동 소수점 모델과 같은 76.8입니다.