Copyright 2020 The TensorFlow Authors.
Quantization aware training in Keras example
Overview
Welcome to an end-to-end example for quantization aware training.
Other pages
For an introduction to what quantization aware training is and to determine if you should use it (including what's supported), see the overview page.
To quickly find the APIs you need for your use case (beyond fully quantizing a model to 8 bits), see the comprehensive guide.
Summary
In this tutorial, you will:
1. Train a tf.keras model for MNIST from scratch.
2. Fine tune the model by applying the quantization aware training API, see the accuracy, and export a quantization aware model.
3. Use the model to create an actually quantized model for the TFLite backend.
4. See the persistence of accuracy in TFLite and a 4x smaller model. To see the latency benefits on mobile, try out the TFLite examples in the TFLite app repository.
Setup
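A minimal setup sketch (the exact package pins may differ): install the tensorflow-model-optimization package alongside TensorFlow, then import the modules used throughout the tutorial.

```python
# Install the toolkit if needed (run in a shell or notebook cell):
#   pip install -q tensorflow tensorflow-model-optimization

import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow import keras
```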
Train a model for MNIST without quantization aware training
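The baseline is an ordinary Keras model. As a sketch, assuming a small convolutional classifier (the exact architecture shown here is illustrative), you might load MNIST, normalize it, and train for a single epoch:

```python
# Load MNIST and normalize pixel values to [0, 1].
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# A small convolutional classifier; the exact architecture is illustrative.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# One epoch is enough for the fine-tuning demonstration that follows.
model.fit(train_images, train_labels, epochs=1, validation_split=0.1)
```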
Clone and fine-tune pre-trained model with quantization aware training
Define the model
You will apply quantization aware training to the whole model and see this in the model summary. All layers are now prefixed by "quant".
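One way to do this, as a sketch, is to wrap the whole model with tfmot.quantization.keras.quantize_model and recompile it before inspecting the summary:

```python
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# "q_aware" stands for quantization aware.
q_aware_model = quantize_model(model)

# quantize_model requires a recompile before training or evaluation.
q_aware_model.compile(optimizer='adam',
                      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])

# Layers now appear with a "quant" prefix in the summary.
q_aware_model.summary()
```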
Note that the resulting model is quantization aware but not quantized (e.g. the weights are float32 instead of int8). The following sections show how to create a quantized model from the quantization aware one.
In the comprehensive guide, you can see how to quantize some layers for model accuracy improvements.
Train and evaluate the model against baseline
To demonstrate fine tuning after training the model for just an epoch, fine tune with quantization aware training on a subset of the training data.
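A sketch of this step, assuming an illustrative subset of 1,000 training examples, followed by an evaluation of both models on the test set:

```python
# Fine-tune the quantization aware model on a small subset of the training data.
train_images_subset = train_images[0:1000]
train_labels_subset = train_labels[0:1000]

q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=1, validation_split=0.1)

# Compare test accuracy of the baseline and the quantization aware model.
_, baseline_model_accuracy = model.evaluate(test_images, test_labels, verbose=0)
_, q_aware_model_accuracy = q_aware_model.evaluate(test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)
```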
For this example, there is minimal to no loss in test accuracy after quantization aware training, compared to the baseline.
Create quantized model for TFLite backend
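As a sketch, you convert the quantization aware model with the TFLite converter and the default optimizations enabled:

```python
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_tflite_model = converter.convert()
```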
After this, you have an actually quantized model with int8 weights and uint8 activations.
See persistence of accuracy from TF to TFLite
Define a helper function to evaluate the TF Lite model on the test dataset.
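A sketch of such a helper, running the TFLite Interpreter directly over the test images (the name evaluate_tflite_model is illustrative):

```python
def evaluate_tflite_model(tflite_model):
  """Runs a TFLite model over the MNIST test images and returns accuracy."""
  interpreter = tf.lite.Interpreter(model_content=tflite_model)
  interpreter.allocate_tensors()

  input_index = interpreter.get_input_details()[0]['index']
  output_index = interpreter.get_output_details()[0]['index']

  correct = 0
  for i, test_image in enumerate(test_images):
    # The interpreter expects a batch dimension and float32 input.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)
    interpreter.invoke()
    output = interpreter.get_tensor(output_index)
    if np.argmax(output[0]) == test_labels[i]:
      correct += 1

  return correct / len(test_images)
```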
You evaluate the quantized model and see that the accuracy from TensorFlow persists to the TFLite backend.
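For example, using the helper sketched above:

```python
test_accuracy = evaluate_tflite_model(quantized_tflite_model)

print('Quant TFLite test accuracy:', test_accuracy)
print('Quant TF test accuracy:', q_aware_model_accuracy)
```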
See 4x smaller model from quantization
You create a float TFLite model and then see that the quantized TFLite model is 4x smaller.
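A sketch of the comparison: convert the baseline model without optimizations, write both models to disk, and compare file sizes.

```python
# Create a float (non-quantized) TFLite model from the baseline for comparison.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()

# Write both models to temporary files and compare their sizes on disk.
_, float_file = tempfile.mkstemp('.tflite')
_, quant_file = tempfile.mkstemp('.tflite')

with open(float_file, 'wb') as f:
  f.write(float_tflite_model)
with open(quant_file, 'wb') as f:
  f.write(quantized_tflite_model)

print('Float model size: %.2f KB' % (os.path.getsize(float_file) / 1024))
print('Quantized model size: %.2f KB' % (os.path.getsize(quant_file) / 1024))
```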
Conclusion
In this tutorial, you saw how to create quantization aware models with the TensorFlow Model Optimization Toolkit API and then create quantized models for the TFLite backend.
You saw a 4x model size compression benefit for a model for MNIST, with minimal accuracy difference. To see the latency benefits on mobile, try out the TFLite examples in the TFLite app repository.
We encourage you to try this new capability, which can be particularly important for deployment in resource-constrained environments.