Copyright 2021 The TensorFlow Authors.
Sparsity and cluster preserving quantization aware training (PCQAT) Keras example
Overview
This is an end-to-end example showing the usage of the sparsity and cluster preserving quantization aware training (PCQAT) API, part of the TensorFlow Model Optimization Toolkit's collaborative optimization pipeline.
Other pages
For an introduction to the pipeline and other available techniques, see the collaborative optimization overview page.
Contents
In the tutorial, you will:
Train a tf.keras model for the MNIST dataset from scratch.
Fine-tune the model with pruning, check the accuracy, and observe that the model was successfully pruned.
Apply sparsity preserving clustering on the pruned model and observe that the sparsity applied earlier has been preserved.
Apply QAT and observe the loss of sparsity and clusters.
Apply PCQAT and observe that both sparsity and clustering applied earlier have been preserved.
Generate a TFLite model and observe the effects of applying PCQAT on it.
Compare the sizes of the different models to observe the compression benefits of applying sparsity followed by the collaborative optimization techniques of sparsity preserving clustering and PCQAT.
Compare the accuracy of the fully optimized model with the un-optimized baseline model accuracy.
Setup
You can run this Jupyter notebook in your local virtualenv or in Colab. For details on setting up dependencies, please refer to the installation guide.
Train a tf.keras model for MNIST to be pruned and clustered
Evaluate the baseline model and save it for later usage
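The training, evaluation, and save steps can be sketched as below. This is a minimal sketch, not the exact notebook code: training is shortened to one epoch for brevity, and the `baseline_model.h5` filename is a hypothetical choice.

```python
import tensorflow as tf

# Load MNIST and normalize pixel values to [0, 1].
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# A small convolutional classifier for MNIST.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# One epoch keeps the sketch fast; train longer for a stronger baseline.
model.fit(train_images, train_labels, validation_split=0.1, epochs=1)

_, baseline_model_accuracy = model.evaluate(test_images, test_labels, verbose=0)
print('Baseline test accuracy:', baseline_model_accuracy)

# Save the baseline for the size and accuracy comparisons later on.
model.save('baseline_model.h5')
```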
Prune and fine-tune the model to 50% sparsity
Apply the prune_low_magnitude() API to obtain the pruned model that will be clustered in the next step. Refer to the pruning comprehensive guide for more information on the pruning API.
Define the model and apply the sparsity API
Note that the pre-trained model is used.
Fine-tune the model, check sparsity, and evaluate the accuracy against baseline
Fine-tune the model with pruning for 3 epochs.
Define helper functions to calculate and print the sparsity and clusters of the model.
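One possible implementation of these helpers, close to what the notebook defines (returning the computed values as well as printing them is an addition here for convenience):

```python
import numpy as np
import tensorflow as tf

def print_model_weights_sparsity(model):
    """Print (and return) the fraction of zero weights in each kernel."""
    results = {}
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            # Skip biases and clustering centroid variables.
            if "kernel" not in weight.name or "centroid" in weight.name:
                continue
            values = weight.numpy()
            sparsity = np.count_nonzero(values == 0) / values.size
            results[weight.name] = sparsity
            print(f"{weight.name}: {sparsity:.2%} sparsity")
    return results

def print_model_weight_clusters(model):
    """Print (and return) the number of unique values in each kernel."""
    results = {}
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            if "kernel" not in weight.name:
                continue
            unique_count = len(np.unique(weight.numpy()))
            results[weight.name] = unique_count
            print(f"{layer.name}/{weight.name}: {unique_count} clusters")
    return results
```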
Let's strip the pruning wrapper first, then check that the model kernels were correctly pruned.
Apply sparsity preserving clustering and check its effect on model sparsity in both cases
Next, apply sparsity preserving clustering on the pruned model and observe the number of clusters and check that the sparsity is preserved.
Strip the clustering wrapper first, then check that the model is correctly pruned and clustered.
Apply QAT and PCQAT and check effect on model clusters and sparsity
Next, apply both QAT and PCQAT on the sparse clustered model and observe that PCQAT preserves weight sparsity and clusters in your model. Note that the stripped model is passed to the QAT and PCQAT API.
See compression benefits of PCQAT model
Define helper function to get zipped model file.
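A minimal version of this helper, following the pattern used across the model optimization tutorials:

```python
import os
import tempfile
import zipfile

def get_gzipped_model_size(file_path):
    """Return the size, in bytes, of `file_path` after zip compression."""
    _, zipped_file = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zipped_file, 'w',
                         compression=zipfile.ZIP_DEFLATED) as f:
        f.write(file_path)
    return os.path.getsize(zipped_file)
```

Comparing the zipped sizes matters because sparse, clustered weights are highly redundant: the zeros and repeated centroid values compress far better than dense float weights.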
Observe that applying sparsity, clustering and PCQAT to a model yields significant compression benefits.
See the persistence of accuracy from TF to TFLite
Define a helper function to evaluate the TFLite model on the test dataset.
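One possible shape for this helper, a sketch that runs a TFLite flatbuffer image-by-image with `tf.lite.Interpreter` and returns top-1 accuracy (the function name is a hypothetical choice):

```python
import numpy as np
import tensorflow as tf

def eval_tflite_model(tflite_model, test_images, test_labels):
    """Return top-1 accuracy of a TFLite flatbuffer on the given test set."""
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']
    correct = 0
    for image, label in zip(test_images, test_labels):
        # TFLite expects a batch dimension and float32 input.
        interpreter.set_tensor(
            input_index, np.expand_dims(image, 0).astype(np.float32))
        interpreter.invoke()
        prediction = np.argmax(interpreter.get_tensor(output_index)[0])
        correct += int(prediction == label)
    return correct / len(test_images)
```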
Evaluate the model, which has been pruned, clustered and quantized, and then see that the accuracy from TensorFlow persists in the TFLite backend.
Conclusion
In this tutorial, you learned how to create a model, prune it using the prune_low_magnitude() API, and apply sparsity preserving clustering using the cluster_weights() API to preserve sparsity while clustering the weights.
Next, sparsity and cluster preserving quantization aware training (PCQAT) was applied to preserve model sparsity and clusters while using QAT. The final PCQAT model was compared to the QAT one to show that sparsity and clusters are preserved in the former and lost in the latter.
Next, the models were converted to TFLite to show the compression benefits of chaining sparsity, clustering, and PCQAT model optimization techniques and the TFLite model was evaluated to ensure that the accuracy persists in the TFLite backend.
Finally, the PCQAT TFLite model accuracy was compared to the pre-optimization baseline model accuracy to show that collaborative optimization techniques managed to achieve the compression benefits while maintaining a similar accuracy compared to the original model.