Copyright 2020 The TensorFlow Authors.
Weight clustering in Keras example
Overview
Welcome to the end-to-end example for weight clustering, part of the TensorFlow Model Optimization Toolkit.
Other pages
For an introduction to what weight clustering is and to determine if you should use it (including what's supported), see the overview page.
To quickly find the APIs you need for your use case (beyond fully clustering a model with 16 clusters), see the comprehensive guide.
Contents
In the tutorial, you will:
Train a tf.keras model for the MNIST dataset from scratch.
Fine-tune the model by applying the weight clustering API and see the accuracy.
Create 6x smaller TF and TFLite models from clustering.
Create an 8x smaller TFLite model from combining weight clustering and post-training quantization.
See the persistence of accuracy from TF to TFLite.
Setup
You can run this Jupyter Notebook in your local virtualenv or in Colab. For details on setting up dependencies, please refer to the installation guide.
Train a tf.keras model for MNIST without clustering
Evaluate the baseline model and save it for later usage
Fine-tune the pre-trained model with clustering
Apply the cluster_weights() API to a whole pre-trained model to demonstrate its effectiveness in reducing the zipped model size while keeping decent accuracy. For how best to balance accuracy and compression rate for your use case, please refer to the per-layer example in the comprehensive guide.
Define the model and apply the clustering API
Before you pass the model to the clustering API, make sure it is trained and shows some acceptable accuracy.
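A minimal sketch of this step, under stated assumptions: `cluster_whole_model` and `pretrained_model` are hypothetical names, and the linear centroid initialization, the small learning rate, and the logits-based loss are illustrative choices rather than values stated on this page. It also assumes a TensorFlow and tensorflow_model_optimization pairing in which the toolkit accepts `tf.keras` models; newer TensorFlow releases may require the legacy Keras package instead.

```python
# Default clustering configuration: 16 clusters, as mentioned in the overview.
CLUSTERING_PARAMS = {"number_of_clusters": 16}


def cluster_whole_model(pretrained_model, number_of_clusters=16):
    """Wrap an already trained Keras model for clustering and recompile it.

    Imports are done lazily so that the helper can be defined even where
    TensorFlow is not installed.
    """
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    clustered_model = tfmot.clustering.keras.cluster_weights(
        pretrained_model,
        number_of_clusters=number_of_clusters,
        # Assumption: evenly spaced initial centroids.
        cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.LINEAR,
    )

    # Assumption: a small learning rate for fine-tuning, and a model whose
    # last layer outputs raw logits for the 10 MNIST classes.
    clustered_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return clustered_model
```

The returned model can then be fine-tuned with the usual `fit` call on the training data.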
Fine-tune the model and evaluate the accuracy against baseline
Fine-tune the model with clustering for 1 epoch.
For this example, there is minimal loss in test accuracy after clustering, compared to the baseline.
Create 6x smaller models from clustering
Both strip_clustering and applying a standard compression algorithm (e.g. via gzip) are necessary to see the compression benefits of clustering.
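To illustrate why the compression step matters, here is a toy sketch using only the Python standard library. It is not the toolkit's algorithm: it simply snaps synthetic weights to 16 evenly spaced centroids (no fine-tuning) and compares the DEFLATE-compressed sizes. All function names and the synthetic weight distribution are assumptions made for the illustration.

```python
import random
import struct
import zlib


def linear_centroids(values, number_of_clusters):
    """Return centroids evenly spaced between the min and max of values."""
    low, high = min(values), max(values)
    step = (high - low) / (number_of_clusters - 1)
    return [low + i * step for i in range(number_of_clusters)]


def snap_to_centroids(values, centroids):
    """Replace every value with its nearest centroid."""
    return [min(centroids, key=lambda c: abs(c - v)) for v in values]


def compressed_size(values):
    """Bytes needed after packing as float32 and applying DEFLATE."""
    raw = struct.pack(f"{len(values)}f", *values)
    return len(zlib.compress(raw, 9))


# Synthetic stand-in for a layer's weights (assumption: roughly Gaussian).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(20_000)]
clustered = snap_to_centroids(weights, linear_centroids(weights, 16))

dense_size = compressed_size(weights)
clustered_size = compressed_size(clustered)
print(f"unclustered: {dense_size} bytes, clustered: {clustered_size} bytes")
```

Both arrays occupy the same number of bytes before compression; the clustered one only becomes smaller once a compressor can exploit the repeated values.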
First, create a compressible model for TensorFlow. Here, strip_clustering removes all variables (e.g. tf.Variable for storing the cluster centroids and the indices) that clustering only needs during training, which would otherwise add to model size during inference.
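A short sketch of the stripping step, assuming the clustered model was produced by the toolkit's clustering API. `strip_and_save` and `output_path` are hypothetical names, and dropping the optimizer state when saving is an illustrative choice to keep the file small.

```python
def strip_and_save(clustered_model, output_path):
    """Remove the training-only clustering variables and save the result.

    The import is done lazily so that the helper can be defined even where
    the toolkit is not installed.
    """
    import tensorflow_model_optimization as tfmot

    final_model = tfmot.clustering.keras.strip_clustering(clustered_model)
    # Assumption: the optimizer state is not needed for inference.
    final_model.save(output_path, include_optimizer=False)
    return final_model
```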
Then, create compressible models for TFLite. You can convert the clustered model to a format that's runnable on your targeted backend. TensorFlow Lite is an example you can use to deploy to mobile devices.
Define a helper function to actually compress the models via gzip and measure the zipped size.
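One possible shape for such a helper, using only the standard library. The name `get_gzipped_model_size` and the choice of a DEFLATE-compressed zip archive are assumptions for this sketch; any standard compressor would serve the same purpose.

```python
import os
import tempfile
import zipfile


def get_gzipped_model_size(file_path):
    """Return the size in bytes of file_path after DEFLATE compression."""
    fd, zipped_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)  # ZipFile reopens the path itself.
    try:
        with zipfile.ZipFile(
            zipped_path, "w", compression=zipfile.ZIP_DEFLATED
        ) as archive:
            archive.write(file_path, arcname=os.path.basename(file_path))
        return os.path.getsize(zipped_path)
    finally:
        os.remove(zipped_path)  # Do not leave temporary archives behind.
```

Calling it on the baseline file and on the clustered file gives the two numbers needed for the size comparison.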
Compare and see that the models are 6x smaller from clustering
Create an 8x smaller TFLite model from combining weight clustering and post-training quantization
You can apply post-training quantization to the clustered model for additional benefits.
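As a sketch, conversion with and without post-training quantization can share one helper. `convert_to_tflite`, `write_model_file`, and the `quantize` flag are hypothetical names; the default optimization setting used here is one common way to request post-training quantization and is an assumption, not necessarily the only option.

```python
def convert_to_tflite(keras_model, quantize=False):
    """Convert a (stripped) Keras model to a TFLite flatbuffer, as bytes.

    The import is done lazily so that the helper can be defined even where
    TensorFlow is not installed.
    """
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    if quantize:
        # Assumption: the default optimization set enables
        # post-training quantization of the weights.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()


def write_model_file(model_bytes, path):
    """Write the converted model to disk and return the path."""
    with open(path, "wb") as model_file:
        model_file.write(model_bytes)
    return path
```

The written file can be passed to the same zipped-size measurement used for the other models.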
See the persistence of accuracy from TF to TFLite
Define a helper function to evaluate the TFLite model on the test dataset.
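A possible version of this helper is sketched below. The function names are hypothetical, and it assumes the model takes one float32 image per call and returns one score per digit class. It uses the `tf.lite.Interpreter` class, which may be superseded by a separate runtime package in newer TensorFlow releases.

```python
def top1_accuracy(predictions, labels):
    """Return the fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(int(p) == int(y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def evaluate_tflite_model(tflite_model_bytes, test_images, test_labels):
    """Run every test image through the TFLite model and report accuracy.

    Imports are done lazily so that the helpers can be defined even where
    TensorFlow is not installed.
    """
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_content=tflite_model_bytes)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    predictions = []
    for image in test_images:
        # Add a batch dimension and match the assumed float32 input type.
        batch = np.expand_dims(image, axis=0).astype(np.float32)
        interpreter.set_tensor(input_index, batch)
        interpreter.invoke()
        scores = interpreter.get_tensor(output_index)
        predictions.append(int(np.argmax(scores[0])))

    return top1_accuracy(predictions, test_labels)
```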
Evaluate the model, which has been clustered and quantized, and see that the accuracy from TensorFlow persists to the TFLite backend.
Conclusion
In this tutorial, you saw how to create clustered models with the TensorFlow Model Optimization Toolkit API. More specifically, you've been through an end-to-end example for creating an 8x smaller model for MNIST with minimal accuracy difference. We encourage you to try this new capability, which can be particularly important for deployment in resource-constrained environments.