Customizing Quantization with QuantizationConfig
Author: Jyotinder Singh
Date created: 2025/12/18
Last modified: 2025/12/18
Description: Guide on using QuantizationConfig for weight-only quantization and custom quantizers.
Introduction
This guide explores the flexible QuantizationConfig API in Keras, which gives you granular control over how your models are quantized. While model.quantize("int8") provides a strong default, you often need more control: for example, to perform weight-only quantization (common for LLMs) or to apply a custom quantization scheme (such as percentile-based clipping).
We will cover:
1. Customizing INT8 Quantization: modifying the default parameters (e.g., a custom value range).
2. Weight-Only Quantization (INT4): quantizing weights to 4-bit while keeping activations in float, using Int4QuantizationConfig.
3. Custom Quantizers: implementing a completely custom quantizer (e.g., PercentileQuantizer) and using it with QuantizationConfig.
Setup
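The original notebook's setup cell is not reproduced in this extract. The runnable sketches that follow only need NumPy (the actual guide would also import keras); a minimal, assumed setup:

```python
import numpy as np

# Seeded generator so the examples below are reproducible.
rng = np.random.default_rng(seed=0)
```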
1. Customizing INT8 Quantization
By default, model.quantize("int8") uses AbsMaxQuantizer for both weights and activations, with a default value range of [-127, 127]. You may want to specify different parameters, such as a restricted value range (if you know your activations fall within a narrower interval). You can do this by creating an Int8QuantizationConfig.
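The Keras code for this step is not included in the extract, but the arithmetic that AbsMaxQuantizer performs, including a configurable value range, can be sketched in plain NumPy. The function name and `value_range` argument below are illustrative, not the Keras API:

```python
import numpy as np

def absmax_quantize(x, value_range=(-127, 127)):
    """Symmetric abs-max quantization: one scale for the whole tensor,
    chosen so the largest |x| maps to the top of the integer range."""
    qmin, qmax = value_range
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)
    return q, scale

x = np.array([0.1, -0.5, 0.25, 1.0], dtype=np.float32)

q, scale = absmax_quantize(x)                         # default [-127, 127]
x_hat = q.astype(np.float32) * scale                  # dequantize

# A restricted range, of the kind you might configure via Int8QuantizationConfig:
q2, scale2 = absmax_quantize(x, value_range=(-100, 100))
```

Note how the scale is derived entirely from the value range and the tensor's absolute maximum; the dequantized `x_hat` differs from `x` by at most half a quantization step.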
2. Weight-Only Quantization (INT4)
By default, model.quantize("int4") quantizes weights to INT4 and activations to INT8. For large language models and other memory-constrained settings, weight-only quantization is a popular alternative: it shrinks the model significantly (weights stored in 4-bit) while keeping activations at higher precision.
To achieve this, we set activation_quantizer=None in the Int4QuantizationConfig.
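Numerically, weight-only INT4 quantization looks like the following NumPy sketch (not the Keras implementation; the signed 4-bit range is taken symmetrically as [-7, 7] here): weights are stored as 4-bit integers plus a float scale, and are dequantized on the fly while activations stay in float throughout.

```python
import numpy as np

def quantize_weights_int4(w, qmax=7):
    """Abs-max quantization of weights into the symmetric 4-bit range [-qmax, qmax]."""
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(seed=0)
w = rng.standard_normal((8, 4)).astype(np.float32)
w_q, w_scale = quantize_weights_int4(w)

# Weight-only inference: dequantize the stored integer weights,
# but leave the activations `x` in float precision.
x = rng.standard_normal((2, 8)).astype(np.float32)
y = x @ (w_q.astype(np.float32) * w_scale)
```

In the Keras API described above, this behavior corresponds to passing activation_quantizer=None to Int4QuantizationConfig.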
3. Custom Quantizers: Implementing a Percentile Quantizer
Sometimes, standard absolute-max quantization isn't enough. You might want to be robust to outliers by using percentile-based quantization. Keras allows you to define your own quantizer by subclassing keras.quantizers.Quantizer.
Below is an implementation of a PercentileQuantizer that sets the scale based on a specified percentile of the absolute values.
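The guide's PercentileQuantizer class itself is not included in this extract. Its core idea can be sketched as a stand-alone NumPy function (a hypothetical helper, not the actual keras.quantizers.Quantizer subclass): the scale is derived from a percentile of the absolute values rather than the maximum, so rare outliers saturate instead of inflating the quantization step for everything else.

```python
import numpy as np

def percentile_quantize(x, percentile=99.9, qmax=127):
    """Quantize with a scale set by a percentile of |x| instead of max(|x|).
    Values beyond the percentile threshold are clipped to the integer range."""
    threshold = np.percentile(np.abs(x), percentile)
    scale = threshold / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(seed=0)
x = np.append(rng.uniform(-1.0, 1.0, size=999), 100.0)  # one extreme outlier

q_pct, scale_pct = percentile_quantize(x)
scale_absmax = np.max(np.abs(x)) / 127  # what plain abs-max would have used
```

With abs-max, the single outlier would dictate the scale and crush the other 999 values into a few integer levels; the percentile-based scale keeps fine resolution for the bulk of the distribution and simply clips the outlier.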
Now we can use this PercentileQuantizer in our configuration.
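The configuration cell is also elided here. As a purely hypothetical sketch of the wiring (the guide names QuantizationConfig, Int8QuantizationConfig, and the activation_quantizer argument, but the other kwarg names and the config-accepting form of quantize() below are assumptions, not confirmed API):

```python
# Hypothetical config fragment -- kwarg names are assumptions.
config = Int8QuantizationConfig(
    weight_quantizer=PercentileQuantizer(percentile=99.9),  # assumed kwarg
    activation_quantizer=None,  # weight-only, as in section 2
)
model.quantize(config)  # assumed: quantize() also accepts a config object
```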
Conclusion
With QuantizationConfig, you are no longer limited to stock quantization options. Whether you need weight-only quantization or custom quantizers for specialized hardware or research, Keras provides the modularity to build exactly what you need.