GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/es-419/lite/performance/post_training_quant.ipynb

Kernel: Python 3

Copyright 2019 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Post-training dynamic range quantization

Overview

TensorFlow Lite now supports converting weights to 8-bit precision as part of model conversion from TensorFlow graphdefs to the TensorFlow Lite flatbuffer format. Dynamic range quantization achieves a 4x reduction in model size. In addition, TFLite supports on-the-fly quantization and dequantization of activations to allow:

Using quantized kernels for faster execution when they are available.
Mixing floating-point kernels with quantized kernels for different parts of the graph.
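The 4x figure follows from storing each weight in 1 byte (int8) instead of 4 bytes (float32). The sketch below illustrates this with a simplified symmetric, per-tensor scheme in plain Python. The scheme and the helper names quantize_weights_int8 and dequantize are illustrative assumptions, not the converter's actual implementation, which may, for example, use per-channel scales.

```python
import array

def quantize_weights_int8(weights):
    """Hypothetical helper: symmetric per-tensor quantization to int8."""
    max_abs = max(abs(w) for w in weights)
    # One scale maps the largest magnitude onto 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Hypothetical helper: map int8 values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.003, 0.9, -0.41, 1.1]
q, scale = quantize_weights_int8(weights)

# Storage cost: array "f" is a 4-byte float, array "b" is a 1-byte signed int.
float_bytes = array.array("f", weights).itemsize * len(weights)
int8_bytes = array.array("b", q).itemsize * len(q)
print(float_bytes / int8_bytes)  # 4.0
```

Because the scale is shared by the whole tensor, each dequantized weight differs from the original by at most half a quantization step (scale / 2).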

Activations are always stored in floating point. For ops that support quantized kernels, the activations are dynamically quantized to 8-bit precision before processing and dequantized back to float precision afterwards. Depending on the model being converted, this can give a speedup over pure floating-point computation.
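To make the runtime behavior concrete, here is a minimal pure-Python sketch of one dot product in such a hybrid kernel: the weights are quantized once (as at conversion time), the float activations are quantized on each call with a freshly computed scale, the products are accumulated as integers, and the result is dequantized. The symmetric per-tensor scales and the names quantize_symmetric and hybrid_dot are assumptions for illustration; this is not TFLite kernel code.

```python
def quantize_symmetric(values):
    """Hypothetical helper: quantize floats to int8 with one symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale

def hybrid_dot(activations, q_weights, w_scale):
    """Hypothetical hybrid op: float in, int8 math inside, float out."""
    # 1. Quantize activations on the fly; the scale depends on this input.
    q_act, a_scale = quantize_symmetric(activations)
    # 2. Integer dot product (Python ints stand in for an int32 accumulator).
    acc = sum(a * w for a, w in zip(q_act, q_weights))
    # 3. Dequantize the accumulator back to float precision.
    return acc * a_scale * w_scale

weights = [0.25, -0.5, 0.75, 1.0]
activations = [1.5, 0.2, -0.7, 3.0]

# Weights are quantized once, ahead of inference.
q_weights, w_scale = quantize_symmetric(weights)

exact = sum(a * w for a, w in zip(activations, weights))
approx = hybrid_dot(activations, q_weights, w_scale)
print(exact, approx)
```

The two printed values are close but not identical; that small gap is the quantization error that the accuracy checks later in this tutorial are meant to catch.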

Unlike quantization-aware training, this method quantizes the weights after training and quantizes the activations dynamically at inference time. The model weights are therefore not retrained to compensate for quantization-induced errors, so it is important to check the accuracy of the quantized model to make sure the degradation is acceptable.
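One way to make "acceptable" explicit is to compare the float and quantized models on the same labeled data, reporting both accuracies and how often their predictions agree. The sketch below is a hypothetical helper operating on plain lists of predicted digits; the 0.01 maximum drop is an arbitrary example threshold, and the predictions are made-up values.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference values."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

def check_quantized_accuracy(float_preds, quant_preds, labels, max_drop=0.01):
    """Hypothetical check: returns (float_acc, quant_acc, agreement, acceptable)."""
    float_acc = accuracy(float_preds, labels)
    quant_acc = accuracy(quant_preds, labels)
    # Agreement measures how often the two models predict the same class.
    agreement = accuracy(quant_preds, float_preds)
    return float_acc, quant_acc, agreement, (float_acc - quant_acc) <= max_drop

# Made-up predictions for ten examples.
labels      = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]
float_preds = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]
quant_preds = [7, 2, 1, 0, 4, 1, 4, 4, 6, 9]
print(check_quantized_accuracy(float_preds, quant_preds, labels))
```

Here the quantized model loses 10 points of accuracy on the toy data, so the check reports it as not acceptable under the example threshold.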

This tutorial trains an MNIST model from scratch, checks its accuracy in TensorFlow, and then converts the model into a TensorFlow Lite flatbuffer with dynamic range quantization. Finally, it checks the accuracy of the converted model and compares it to the original float model.

Build an MNIST model

Setup

In [ ]:

import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

Train a TensorFlow model

In [ ]:

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_data=(test_images, test_labels)
)

Because this example trains the model for only a single epoch, it reaches only about 96% accuracy.

Convert to a TensorFlow Lite model

Using the TensorFlow Lite Converter, you can now convert the trained model into a TensorFlow Lite model.

Create a TFLiteConverter from the Keras model and run the conversion:

In [ ]:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

Write it out to a .tflite file:

In [ ]:

tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

In [ ]:

tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)

To quantize the model on export, set the optimizations flag to [tf.lite.Optimize.DEFAULT]. Without a representative dataset, this applies dynamic range quantization to reduce the model size:

In [ ]:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
tflite_model_quant_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_quant_model)

Note that the resulting file is approximately 1/4 the size.
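The expected ratio can be estimated from the parameter count of the architecture defined earlier. This back-of-the-envelope sketch assumes every weight tensor is stored as int8 while biases stay float32, and it ignores flatbuffer overhead; the converter may also leave some small tensors in float, so the real file sizes will differ somewhat.

```python
# Architecture from this notebook: 28x28x1 input, Conv2D(12 filters, 3x3),
# MaxPooling2D(2x2), Flatten, Dense(10). Keras defaults: 'valid' padding, stride 1.
conv_kernel = 3 * 3 * 1 * 12          # 108 weights
conv_bias = 12
conv_out = 28 - 3 + 1                 # 26x26 feature map
pooled = conv_out // 2                # 13x13 after 2x2 pooling
flat = pooled * pooled * 12           # 2028 features into the Dense layer
dense_kernel = flat * 10              # 20280 weights
dense_bias = 10

weights = conv_kernel + dense_kernel
biases = conv_bias + dense_bias

float_bytes = 4 * (weights + biases)  # all parameters as float32
quant_bytes = 1 * weights + 4 * biases  # assumption: int8 weights, float32 biases
ratio = float_bytes / quant_bytes
print(weights + biases, float_bytes, quant_bytes, round(ratio, 2))
```

The estimate comes out just under 4x because the biases, a tiny fraction of the parameters, are not compressed.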

In [ ]:

!ls -lh {tflite_models_dir}

Run the TFLite models

Run the TensorFlow Lite models using the Python TensorFlow Lite Interpreter.

Load the models into interpreters

In [ ]:

interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

In [ ]:

interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter_quant.allocate_tensors()

Test the model on one image

In [ ]:

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_index)

In [ ]:

import matplotlib.pyplot as plt

# Show the first test image with its true label and the float model's prediction.
plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true=str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)

Evaluate the models

In [ ]:

# A helper function to evaluate the TF Lite model using "test" dataset.
def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  # Compare prediction results with ground truth labels to calculate accuracy.
  accurate_count = 0
  for index in range(len(prediction_digits)):
    if prediction_digits[index] == test_labels[index]:
      accurate_count += 1
  accuracy = accurate_count * 1.0 / len(prediction_digits)

  return accuracy
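The model's last layer is Dense(10) with no activation (the loss uses from_logits=True), so the interpreter returns logits rather than probabilities. Taking argmax of the logits still picks the most probable digit, because softmax is monotonic. The small pure-Python check below illustrates this with made-up logits; softmax and argmax here are local helpers, not TensorFlow functions.

```python
import math

def softmax(logits):
    """Convert logits to probabilities; subtract the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Made-up output of a 10-class Dense layer.
logits = [1.2, -0.3, 4.1, 0.0, 2.7, -1.5, 0.4, 3.9, -0.8, 1.0]
probs = softmax(logits)
print(argmax(logits), argmax(probs))  # 2 2
```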

In [ ]:

print(evaluate_model(interpreter))

Repeat the evaluation on the dynamic range quantized model:

In [ ]:

print(evaluate_model(interpreter_quant))

In this example, the compressed model shows no difference in accuracy.

Optimize an existing model

ResNets with pre-activation layers (ResNet-v2) are widely used for vision applications. A pre-trained ResNet-v2-101 classification model is available on TensorFlow Hub.

You can wrap it in a Keras model and convert it to a quantized TensorFlow Lite flatbuffer as follows:

In [ ]:

import tensorflow_hub as hub

resnet_v2_101 = tf.keras.Sequential([
  keras.layers.InputLayer(input_shape=(224, 224, 3)),
  hub.KerasLayer("https://tfhub.dev/google/imagenet/resnet_v2_101/classification/4")
])

converter = tf.lite.TFLiteConverter.from_keras_model(resnet_v2_101)

In [ ]:

# Convert to TF Lite without quantization
resnet_tflite_file = tflite_models_dir/"resnet_v2_101.tflite"
resnet_tflite_file.write_bytes(converter.convert())

In [ ]:

# Convert to TF Lite with quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
resnet_quantized_tflite_file = tflite_models_dir/"resnet_v2_101_quantized.tflite"
resnet_quantized_tflite_file.write_bytes(converter.convert())

In [ ]:

!ls -lh {tflite_models_dir}/*.tflite

The model size shrinks from 171 MB to 43 MB. The accuracy of this model on ImageNet can be evaluated with the scripts provided for TFLite accuracy measurement.
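As a quick sanity check, the reported sizes match the ratio expected from storing each weight in 1 byte instead of 4:

```latex
\frac{171\ \text{MB}}{43\ \text{MB}} \approx 3.98 \approx \frac{4\ \text{bytes (float32)}}{1\ \text{byte (int8)}}
```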

The top-1 accuracy of the optimized model is 76.8%, the same as that of the floating-point model.