GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/model_optimization/guide/quantization/training_example.ipynb

Kernel: Python 3

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Quantization-aware training in Keras: an example

Overview

Welcome to an end-to-end example of quantization-aware training.

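Other pages

For an introduction to what quantization-aware training is, and to decide whether you should use it (including what is supported), see the overview page.

To quickly find the APIs you need for your use case (beyond fully quantizing a model with 8 bits), see the comprehensive guide.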

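Summary

In this tutorial, you will:

1. Train a tf.keras model for MNIST from scratch.
2. Fine-tune the model by applying the quantization-aware training API, check the accuracy, and export a quantization-aware model.
3. Use the model to create an actually quantized model for the TFLite backend.
4. See the persistence of accuracy in TFLite and a 4x smaller model. To see the latency benefits on mobile, try out the TFLite examples in the TFLite app repository.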

Setup

In [ ]:

! pip install -q tensorflow
! pip install -q tensorflow-model-optimization

In [ ]:

import tempfile
import os

import tensorflow as tf

from tensorflow import keras

Train a model for MNIST without quantization-aware training

In [ ]:

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_split=0.1,
)

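Clone and fine-tune the pre-trained model with quantization-aware training

Define the model

Apply quantization-aware training to the whole model and inspect the result in the model summary. All layers are now prefixed with "quant".

Note that the resulting model is quantization aware but not quantized (for example, the weights are float32 instead of int8). The sections below show how to create a quantized model from the quantization-aware one.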
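To make the distinction between "quantization aware" and "quantized" concrete, here is a minimal sketch of the general idea of simulated ("fake") quantization: values are rounded to an 8-bit integer grid and immediately mapped back to floats, so the stored values stay floating point while carrying the rounding error. The function name `fake_quantize`, the symmetric per-tensor scheme, and the sample weights are assumptions chosen for illustration; this is not the toolkit's actual implementation.

```python
def fake_quantize(values, num_bits=8):
    """Round floats to a symmetric signed integer grid, then map them back.

    Illustrative sketch only (hypothetical helper, not the tfmot internals).
    Returns (simulated_floats, integer_codes, scale).
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes that a truly quantized model would store.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # A quantization-aware model keeps floats, but only values on this grid.
    simulated = [code * scale for code in codes]
    return simulated, codes, scale


# Made-up example weights.
weights = [0.51, -0.26, 0.003, 1.27]
simulated, int_codes, scale = fake_quantize(weights)

print("integer codes:   ", int_codes)
print("simulated floats:", simulated)
```

The simulated values are still Python floats, which mirrors why the quantization-aware model above reports float32 weights: the integer codes only become the stored representation after conversion.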

In the comprehensive guide, you can see how to quantize only some layers to improve model accuracy.

In [ ]:

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

q_aware_model.summary()

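Train and evaluate the model against the baseline

To demonstrate fine-tuning after training the model for just one epoch, fine-tune with quantization-aware training on a subset of the training data.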

In [ ]:

train_images_subset = train_images[0:1000] # out of 60000
train_labels_subset = train_labels[0:1000]

q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=1, validation_split=0.1)

For this example, there is minimal to no loss in test accuracy after quantization-aware training, compared to the baseline.

In [ ]:

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

_, q_aware_model_accuracy = q_aware_model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)

Create a quantized model for the TFLite backend

After this step, you have an actually quantized model with int8 weights and uint8 activations.

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_tflite_model = converter.convert()
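As a rough illustration of what "actually quantized" means, TFLite's quantization scheme represents a real value as `real = scale * (q - zero_point)`, where `q` is the stored integer. The sketch below applies that mapping to an unsigned 8-bit range. The helper names and the example range of -1.0 to 3.0 are assumptions made for illustration; the converter chooses its own parameters for each tensor.

```python
def choose_affine_params(rmin, rmax, qmin=0, qmax=255):
    """Pick a scale and zero point so that [rmin, rmax] maps onto [qmin, qmax].

    Hypothetical helper for illustration; not the converter's algorithm.
    """
    # The representable range must contain 0.0 so that zero maps exactly.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # Degenerate all-zero range.
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to its clamped integer code."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the real value an integer code stands for."""
    return scale * (q - zero_point)


# Assumed activation range, chosen only for this example.
scale, zero_point = choose_affine_params(-1.0, 3.0)

for x in (-1.0, 0.0, 1.0, 3.0):
    q = quantize(x, scale, zero_point)
    print(x, "->", q, "->", dequantize(q, scale, zero_point))
```

Each value survives the round trip to within half a quantization step, which is the kind of error that the earlier fine-tuning lets the model adapt to.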

See persistence of accuracy from TF to TFLite

Define a helper function to evaluate the TFLite model on the test dataset.

In [ ]:

import numpy as np

def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print('Evaluated on {n} results so far.'.format(n=i))
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

Evaluate the quantized model and check that the accuracy from TensorFlow persists to the TFLite backend.

In [ ]:

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
interpreter.allocate_tensors()

test_accuracy = evaluate_model(interpreter)

print('Quant TFLite test_accuracy:', test_accuracy)
print('Quant TF test accuracy:', q_aware_model_accuracy)

See a 4x smaller model from quantization

Create a float TFLite model, then compare sizes: the quantized TFLite model is about 4x smaller.

In [ ]:

# Create float TFLite model.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()

# Measure sizes of models.
_, float_file = tempfile.mkstemp('.tflite')
_, quant_file = tempfile.mkstemp('.tflite')

with open(quant_file, 'wb') as f:
  f.write(quantized_tflite_model)

with open(float_file, 'wb') as f:
  f.write(float_tflite_model)

print("Float model in MB:", os.path.getsize(float_file) / float(2**20))
print("Quantized model in MB:", os.path.getsize(quant_file) / float(2**20))
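The roughly 4x figure can be sanity-checked with back-of-the-envelope arithmetic: a float32 parameter takes 4 bytes and an int8 parameter takes 1. The sketch below counts the parameters of the architecture defined earlier, assuming the Keras defaults of 'valid' padding for Conv2D and a stride equal to the pool size for MaxPooling2D. It is only an estimate: real .tflite files also contain graph metadata and quantization parameters, and biases are typically stored at higher precision, so measured sizes will not match these numbers exactly.

```python
FLOAT32_BYTES = 4
INT8_BYTES = 1

# Conv2D(filters=12, kernel_size=(3, 3)) on a 28x28x1 input, 'valid' padding.
conv_out = 28 - 3 + 1                     # 26x26 spatial output
conv_params = 3 * 3 * 1 * 12 + 12         # kernel weights + biases

# MaxPooling2D(pool_size=(2, 2)) halves each spatial dimension.
pooled = conv_out // 2                    # 13x13

# Flatten followed by Dense(10).
dense_inputs = pooled * pooled * 12
dense_params = dense_inputs * 10 + 10     # weights + biases

total_params = conv_params + dense_params
float_bytes = total_params * FLOAT32_BYTES
int8_bytes = total_params * INT8_BYTES

print("Total parameters:", total_params)
print("Estimated float32 parameter bytes:", float_bytes)
print("Estimated int8 parameter bytes:", int8_bytes)
print("Estimated ratio:", float_bytes / int8_bytes)
```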

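Conclusion

In this tutorial, you saw how to create quantization-aware models with the TensorFlow Model Optimization Toolkit API, and then how to create quantized models for the TFLite backend.

You saw a 4x reduction in model size for an MNIST model, with minimal difference in accuracy. To see the latency benefits on mobile, try out the TFLite examples in the TFLite app repository.

We encourage you to try this new capability, which can be particularly important for deployment in resource-constrained environments.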