GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/pt-br/lite/performance/post_training_quant.ipynb

Kernel: Python 3

Copyright 2019 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Post-training dynamic range quantization

Overview

TensorFlow Lite now supports converting weights to 8-bit precision as part of model conversion from TensorFlow GraphDefs to TensorFlow Lite's flatbuffer format. Dynamic range quantization achieves a 4x reduction in model size. In addition, TFLite supports on-the-fly quantization and dequantization of activations to allow for:

Using quantized kernels for a faster implementation, when available.
Mixing floating-point kernels with quantized kernels for different parts of the graph.

Activations are always stored in floating point. For operations that support quantized kernels, the activations are dynamically quantized to 8 bits of precision before processing and dequantized to float precision after processing. Depending on the model being converted, this can give a speedup over pure floating-point computation.
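The flow described above (weights quantized ahead of time, activations quantized just before an operation, result dequantized afterwards) can be sketched in plain Python. This is only an illustration under stated assumptions: it uses a simplified symmetric scheme with one scale per list, and the helper names `quantize_symmetric` and `dynamic_range_dot` are hypothetical. The actual TFLite kernels may differ in their details.

```python
def quantize_symmetric(values):
    """Map floats to integers in [-127, 127] using one scale for the whole list."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dynamic_range_dot(weights_q, weight_scale, activations):
    """Dot product with pre-quantized weights and activations quantized at call time."""
    # Activations arrive as floats; their scale is derived from the current values.
    acts_q, act_scale = quantize_symmetric(activations)
    # Integer multiply-accumulate.
    acc = sum(w * a for w, a in zip(weights_q, acts_q))
    # Dequantize so the next operation receives a float again.
    return acc * weight_scale * act_scale


# Illustrative values only (not taken from the MNIST model).
weights = [0.5, -1.0, 0.25, 0.75]
activations = [2.0, 0.5, -1.5, 1.0]

# Weight quantization happens once, at conversion time.
weights_q, weight_scale = quantize_symmetric(weights)

approx = dynamic_range_dot(weights_q, weight_scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(f"float result: {exact:.4f}, dynamic-range result: {approx:.4f}")
```

The two results should be close but not identical. The gap between them is the quantization error that the rest of this tutorial checks at the level of model accuracy.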

In contrast to quantization-aware training, in this method the weights are quantized after training and the activations are quantized dynamically at inference time. The model weights are therefore not retrained to compensate for quantization-induced errors. It is important to check the accuracy of the quantized model to ensure that the degradation is acceptable.
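To get a feel for the error that post-training weight quantization introduces, the sketch below quantizes a list of random weights to 8 bits and measures the round-trip error. It assumes a simplified symmetric, per-tensor scheme. The helper names and the Gaussian weight distribution are illustrative choices, not part of the TFLite converter.

```python
import random


def quantize_weights(weights):
    """Quantize floats to integers in [-127, 127] with a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize_weights(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


random.seed(0)
# Hypothetical weights; a trained layer would supply real values.
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]

quantized, scale = quantize_weights(weights)
restored = dequantize_weights(quantized, scale)

# Rounding to the nearest step means each weight moves by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale: {scale:.6f}, max round-trip error: {max_error:.6f}")

# Storage: 4 bytes per float32 weight versus 1 byte per int8 weight.
float_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes")
```

Because nothing is retrained, these small per-weight errors stay in the model, which is why the accuracy check later in the tutorial matters.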

In this tutorial, you train an MNIST model from scratch, check its accuracy in TensorFlow, and then convert the model into a TensorFlow Lite flatbuffer with dynamic range quantization. Finally, you check the accuracy of the converted model and compare it to the original float model.

Build an MNIST model

Setup

In [ ]:

import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

Train a TensorFlow model

In [ ]:

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_data=(test_images, test_labels)
)

Because the model is trained for only a single epoch in this example, it reaches only about 96% accuracy.

Convert to a TensorFlow Lite model

Using the TensorFlow Lite Converter, you can convert the trained model into a TensorFlow Lite model.

Now load the model using the TFLiteConverter:

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

Write it out to a .tflite file:

In [ ]:

tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

In [ ]:

tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)

To quantize the model on export, set the converter's optimizations flag to optimize for size:

In [ ]:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
tflite_model_quant_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_quant_model)

The resulting file is approximately 1/4 the size.

In [ ]:

!ls -lh {tflite_models_dir}
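As an alternative to reading the sizes off the directory listing, the ratio can be computed directly with pathlib. The sketch below is a minimal, self-contained illustration: `size_ratio` is a hypothetical helper, and the two temporary files are stand-ins with made-up sizes, not real models.

```python
import pathlib
import tempfile


def size_ratio(original_path, compressed_path):
    """Return the compressed file size divided by the original file size."""
    original_size = pathlib.Path(original_path).stat().st_size
    compressed_size = pathlib.Path(compressed_path).stat().st_size
    return compressed_size / original_size


# Stand-in files with made-up sizes so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    float_file = pathlib.Path(tmp_dir) / "float_stand_in.bin"
    quant_file = pathlib.Path(tmp_dir) / "quant_stand_in.bin"
    float_file.write_bytes(bytes(4000))
    quant_file.write_bytes(bytes(1000))

    ratio = size_ratio(float_file, quant_file)
    print(f"quantized / float size: {ratio:.2f}")
```

In this notebook, the same helper could be called as size_ratio(tflite_model_file, tflite_model_quant_file) once both files have been written.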

Run the TFLite models

Run the TensorFlow Lite model using the Python TensorFlow Lite Interpreter.

Load the model into an interpreter

In [ ]:

interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

In [ ]:

interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter_quant.allocate_tensors()

Test the model on one image

In [ ]:

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_index)
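The model's last layer is Dense(10) and it was trained with from_logits=True, so the values in predictions are unnormalized logits rather than probabilities. Taking the argmax gives the same digit either way, but if probabilities are wanted, a softmax can be applied. The sketch below uses made-up logits. The `softmax` helper is illustrative and not part of the TFLite API.

```python
import math


def softmax(logits):
    """Convert a sequence of logits into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the ten digit classes (not real model output).
example_logits = [-3.1, 0.2, 1.5, 0.0, -2.0, 0.3, -4.0, 9.7, -1.2, 0.8]

probabilities = softmax(example_logits)
predicted_digit = probabilities.index(max(probabilities))
print(f"predicted digit: {predicted_digit}, "
      f"probability: {probabilities[predicted_digit]:.4f}")
```

In the notebook, the same function could be applied to predictions[0], the row of logits for the single test image.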

In [ ]:

import matplotlib.pyplot as plt

# Show the test image with its true label and the model's prediction.
plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true=str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)

Evaluate the models

In [ ]:

# A helper function to evaluate the TF Lite model using "test" dataset.
def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with the
    # highest score (the model outputs logits, not probabilities).
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()

  return accuracy

In [ ]:

print(evaluate_model(interpreter))

Repeat the evaluation on the dynamic range quantized model to obtain:

In [ ]:

print(evaluate_model(interpreter_quant))

In this example, the compressed model shows no difference in accuracy.
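Equal accuracy does not necessarily mean the two models give identical predictions on every image. A complementary check is the fraction of inputs on which both models agree. The sketch below uses short made-up lists of labels and predictions. The helper names are hypothetical, and in practice the predicted digits would have to be collected from each interpreter.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def agreement(predictions_a, predictions_b):
    """Fraction of inputs on which two models predict the same class."""
    same = sum(1 for a, b in zip(predictions_a, predictions_b) if a == b)
    return same / len(predictions_a)


# Made-up data for illustration only.
labels = [7, 2, 1, 0, 4, 1, 4, 9]
float_preds = [7, 2, 1, 0, 4, 1, 4, 4]  # wrong on the last image
quant_preds = [7, 2, 1, 0, 4, 1, 9, 9]  # wrong on the second-to-last image

float_acc = accuracy(float_preds, labels)
quant_acc = accuracy(quant_preds, labels)
agree = agreement(float_preds, quant_preds)
print(f"float: {float_acc}, quantized: {quant_acc}, agreement: {agree}")
```

Here both hypothetical models reach the same accuracy while disagreeing on two of the eight images, which an accuracy comparison alone would not reveal.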

Optimizing an existing model

ResNets with pre-activation layers (ResNet-v2) are widely used for vision applications. A pre-trained resnet-v2-101 model is available on TensorFlow Hub.

You can convert this model to a TensorFlow Lite flatbuffer with quantization as follows:

In [ ]:

import tensorflow_hub as hub

resnet_v2_101 = tf.keras.Sequential([
  keras.layers.InputLayer(input_shape=(224, 224, 3)),
  hub.KerasLayer("https://tfhub.dev/google/imagenet/resnet_v2_101/classification/4")
])

converter = tf.lite.TFLiteConverter.from_keras_model(resnet_v2_101)

In [ ]:

# Convert to TF Lite without quantization
resnet_tflite_file = tflite_models_dir/"resnet_v2_101.tflite"
resnet_tflite_file.write_bytes(converter.convert())

In [ ]:

# Convert to TF Lite with quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
resnet_quantized_tflite_file = tflite_models_dir/"resnet_v2_101_quantized.tflite"
resnet_quantized_tflite_file.write_bytes(converter.convert())

In [ ]:

!ls -lh {tflite_models_dir}/*.tflite

The model size is reduced from 171 MB to 43 MB. The accuracy of this model on ImageNet can be evaluated using the scripts provided for TFLite accuracy measurement.

The top-1 accuracy of the optimized model is 76.8%, the same as that of the floating-point model.