GitHub Repository: codebasics/deep-learning-keras-tf-tutorial
Path: blob/master/49_quantization/quantization.ipynb
Kernel: Python 3

Quantization Tutorial

Quantization is a technique to shrink a trained model so that you can deploy it on edge devices. In this tutorial we will:

(1) Train a handwritten-digits model

(2) Export it to disk and check the model's size

(3) Apply two quantization techniques: (1) post-training quantization and (2) quantization-aware training
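
For intuition before we start: TFLite quantization maps each float value to an 8-bit integer using a scale and a zero point. Below is a minimal sketch of that affine mapping (quantize_int8 is a hypothetical helper written for illustration, not a TensorFlow API):

import numpy as np

def quantize_int8(x, scale, zero_point):
    # Affine mapping: q = round(x / scale) + zero_point, clipped to the int8 range.
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

weights = np.array([-0.51, 0.0, 0.27, 0.98], dtype=np.float32)
scale = (weights.max() - weights.min()) / 255.0   # one float step per int8 step
zero_point = int(np.round(-128 - weights.min() / scale))
print(quantize_int8(weights, scale, zero_point))  # [-128  -41    5  127]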

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
(X_train, y_train) , (X_test, y_test) = keras.datasets.mnist.load_data()
len(X_train)
60000
len(X_test)
10000
X_train[0].shape
(28, 28)
plt.matshow(X_train[0])
<matplotlib.image.AxesImage at 0x255fe7bb760>
[Image: matshow plot of X_train[0], the digit 5]
y_train[0]
5
X_train = X_train / 255
X_test = X_test / 255
X_train_flattened = X_train.reshape(len(X_train), 28*28)
X_test_flattened = X_test.reshape(len(X_test), 28*28)
X_train_flattened.shape
(60000, 784)

We use a Flatten layer so that we don't have to call .reshape on the input dataset

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5)
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2734 - accuracy: 0.9226
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1245 - accuracy: 0.9635
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0866 - accuracy: 0.9745
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0652 - accuracy: 0.9800
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0516 - accuracy: 0.9847
<tensorflow.python.keras.callbacks.History at 0x25bbb173820>
model.evaluate(X_test,y_test)
313/313 [==============================] - 1s 2ms/step - loss: 0.0888 - accuracy: 0.9708
[0.08878576755523682, 0.97079998254776]
model.save("./saved_model/")
INFO:tensorflow:Assets written to: ./saved_model/assets
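
To see the baseline footprint, you can total the bytes under the SavedModel directory. A quick sketch using only the standard library (assumes the ./saved_model directory written by the cell above):

import os

total = sum(
    os.path.getsize(os.path.join(root, name))
    for root, _, files in os.walk("./saved_model")
    for name in files
)
print(f"SavedModel size: {total / 1024:.1f} KB")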

(1) Post-training quantization

Without quantization

converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
tflite_model = converter.convert()

With quantization

converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
len(tflite_model)
319792
len(tflite_quant_model)
84752

You can see above that the quantized model is roughly one-quarter the size of the non-quantized model. That makes sense: with tf.lite.Optimize.DEFAULT and no representative dataset, the converter performs dynamic-range quantization, storing the weights as 8-bit integers instead of 32-bit floats, which accounts for the ~4x reduction.

with open("tflite_model.tflite", "wb") as f: f.write(tflite_model)
with open("tflite_quant_model.tflite", "wb") as f: f.write(tflite_quant_model)

Once you have the above files saved to disk, check their sizes. The quantized model will be obviously smaller.
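
For example, a quick check from Python, assuming both files were written to the current working directory:

import os

for path in ["tflite_model.tflite", "tflite_quant_model.tflite"]:
    print(path, os.path.getsize(path), "bytes")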

(2) Quantization-aware training

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])

q_aware_model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= quantize_layer (QuantizeLaye (None, 28, 28) 3 _________________________________________________________________ quant_flatten (QuantizeWrapp (None, 784) 1 _________________________________________________________________ quant_dense (QuantizeWrapper (None, 100) 78505 _________________________________________________________________ quant_dense_1 (QuantizeWrapp (None, 10) 1015 ================================================================= Total params: 79,524 Trainable params: 79,510 Non-trainable params: 14 _________________________________________________________________
q_aware_model.fit(X_train, y_train, epochs=1)
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0438 - accuracy: 0.9866
<tensorflow.python.keras.callbacks.History at 0x255fe86ba30>
q_aware_model.evaluate(X_test, y_test)
313/313 [==============================] - 1s 2ms/step - loss: 0.0802 - accuracy: 0.9755
[0.08016839623451233, 0.9754999876022339]
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_qaware_model = converter.convert()
WARNING:absl:Found untraced functions such as flatten_layer_call_fn, flatten_layer_call_and_return_conditional_losses, dense_layer_call_fn, dense_layer_call_and_return_conditional_losses, dense_1_layer_call_fn while saving (showing 5 of 15). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: C:\Users\dhava\AppData\Local\Temp\tmpqnsx4bvx\assets
len(tflite_qaware_model)
82376
with open("tflite_qaware_model.tflite", 'wb') as f: f.write(tflite_qaware_model)