GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/model_optimization/guide/combine/pqat_example.ipynb
Kernel: Python 3

Copyright 2021 The TensorFlow Authors.

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

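Pruning-preserving quantization-aware training (PQAT) Keras example

Overview

This is an end-to-end example showing how to use the pruning-preserving quantization-aware training (PQAT) API, which is part of the TensorFlow Model Optimization Toolkit's collaborative optimization pipeline.

Other pages

For an introduction to the pipeline and the other available techniques, see the collaborative optimization overview page.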

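Contents

In this tutorial, you will:

  1. Train a tf.keras model for the MNIST dataset from scratch.

  2. Fine-tune the model with pruning, using the sparsity API, and check the accuracy.

  3. Apply QAT and observe that the sparsity is lost.

  4. Apply PQAT and observe that the sparsity applied earlier has been preserved.

  5. Generate a TFLite model and observe the effects of applying PQAT on it.

  6. Compare the accuracy of the PQAT model with that of a model quantized using post-training quantization.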

Setup

You can run this Jupyter notebook in your local virtualenv or in Colab. For details on setting up the dependencies, refer to the installation guide.

! pip install -q tensorflow-model-optimization

import tensorflow as tf

import numpy as np
import tempfile
import zipfile
import os

Train a tf.keras model for MNIST without pruning

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3),
                           activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    validation_split=0.1,
    epochs=10
)

Evaluate the baseline model and save it for later use

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
print('Saving model to: ', keras_file)
tf.keras.models.save_model(model, keras_file, include_optimizer=False)

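Prune and fine-tune the model to 50% sparsity

Apply the prune_low_magnitude() API to prune the whole pre-trained model. This demonstrates that pruning reduces the model size once the file is zipped while maintaining good accuracy. For how best to use the API to reach the highest compression rate while keeping your target accuracy, refer to the pruning comprehensive guide.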
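To make the idea of magnitude-based pruning concrete, here is a minimal NumPy sketch. It is not the tfmot implementation: `magnitude_prune` is a hypothetical helper written for illustration, and it assumes a single one-shot pruning pass over one weight array rather than pruning during training.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries so that a `sparsity` fraction is 0.

    Hypothetical helper for illustration only; returns (pruned_weights, mask).
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to zero out
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # Indices of the k entries with the smallest absolute value.
    prune_idx = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(flat.size, dtype=bool)
    mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 6))
pruned, mask = magnitude_prune(w, 0.5)
achieved_sparsity = float(np.mean(pruned == 0))
print(f"sparsity: {achieved_sparsity:.2%}")
```

Every surviving weight has a magnitude at least as large as every removed one, which is why a well-trained model can often tolerate this kind of pruning with little accuracy loss.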

Define the model and apply the sparsity API

The model needs to be pre-trained before you use the sparsity API.

import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0, frequency=100)
}

callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep()
]

pruned_model = prune_low_magnitude(model, **pruning_params)

# Use smaller learning rate for fine-tuning
opt = tf.keras.optimizers.Adam(learning_rate=1e-5)

pruned_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=opt,
    metrics=['accuracy'])

pruned_model.summary()
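The schedule above uses a target sparsity of 0.5 with begin_step=0 and frequency=100. As a rough mental model, the sketch below shows at which training steps such a schedule would refresh the pruning mask. It is a simplified approximation, not the library code: `constant_sparsity_schedule` is a hypothetical helper, and the assumption that a negative end_step means "never stop" is labeled in the code.

```python
def constant_sparsity_schedule(step, target_sparsity=0.5, begin_step=0,
                               end_step=-1, frequency=100):
    """Return (should_update_mask, sparsity) for one training step.

    Hypothetical, simplified model of a constant-sparsity schedule.
    Assumption: a negative end_step means pruning continues indefinitely.
    """
    in_range = step >= begin_step and (end_step < 0 or step <= end_step)
    on_turn = (step - begin_step) % frequency == 0
    return (in_range and on_turn), target_sparsity

# Steps in the first 500 at which the mask would be recomputed.
update_steps = [s for s in range(0, 501) if constant_sparsity_schedule(s)[0]]
print(update_steps)
```

Under these assumptions the target sparsity stays fixed for the whole run, and only the set of weights chosen for pruning is re-evaluated every 100 steps.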

Fine-tune the model and evaluate the accuracy against the baseline

Fine-tune the model with pruning for 3 epochs.

# Fine-tune model
pruned_model.fit(
    train_images,
    train_labels,
    epochs=3,
    validation_split=0.1,
    callbacks=callbacks)

Define a helper function to calculate and print the sparsity of the model.

def print_model_weights_sparsity(model):
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Wrapper):
            weights = layer.trainable_weights
        else:
            weights = layer.weights
        for weight in weights:
            # ignore auxiliary quantization weights
            if "quantize_layer" in weight.name:
                continue
            weight_size = weight.numpy().size
            zero_num = np.count_nonzero(weight == 0)
            print(
                f"{weight.name}: {zero_num/weight_size:.2%} sparsity ",
                f"({zero_num}/{weight_size})",
            )
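The sparsity figure reported by the helper is simply the fraction of weights that are exactly zero. The following standalone sketch computes the same quantity on a small hand-written array so the arithmetic is easy to check; `array_sparsity` and `kernel` are illustrative names, not part of the tutorial.

```python
import numpy as np

def array_sparsity(weights):
    """Fraction of entries that are exactly zero (illustrative helper)."""
    weights = np.asarray(weights)
    return np.count_nonzero(weights == 0) / weights.size

# Toy 2x3 kernel with three zeros out of six entries.
kernel = np.array([[0.0, 0.3, 0.0],
                   [-0.7, 0.0, 0.1]])
kernel_sparsity = array_sparsity(kernel)
print(f"{kernel_sparsity:.2%} sparsity")
```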

Check that the model was correctly pruned. The pruning wrappers need to be stripped first.

stripped_pruned_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

print_model_weights_sparsity(stripped_pruned_model)

For this example, the loss in test accuracy after pruning is minimal compared to the baseline.

_, pruned_model_accuracy = pruned_model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Pruned test accuracy:', pruned_model_accuracy)

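Apply QAT and PQAT and check the effect on model sparsity in both cases

Next, we apply both QAT and pruning-preserving QAT (PQAT) to the pruned model and observe that PQAT preserves the sparsity of the pruned model. Note that we stripped the pruning wrappers from the model with tfmot.sparsity.keras.strip_pruning before applying the PQAT API.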

# QAT
qat_model = tfmot.quantization.keras.quantize_model(stripped_pruned_model)

qat_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
print('Train qat model:')
qat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

# PQAT
quant_aware_annotate_model = tfmot.quantization.keras.quantize_annotate_model(
    stripped_pruned_model)
pqat_model = tfmot.quantization.keras.quantize_apply(
    quant_aware_annotate_model,
    tfmot.experimental.combine.Default8BitPrunePreserveQuantizeScheme())

pqat_model.compile(optimizer='adam',
                   loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                   metrics=['accuracy'])
print('Train pqat model:')
pqat_model.fit(train_images, train_labels, batch_size=128, epochs=1, validation_split=0.1)

print("QAT Model sparsity:")
print_model_weights_sparsity(qat_model)
print("PQAT Model sparsity:")
print_model_weights_sparsity(pqat_model)
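One way to build intuition for why plain QAT loses sparsity while PQAT keeps it: ordinary gradient updates move pruned weights away from zero, whereas re-applying a sparsity mask after each update pins them at zero. The NumPy sketch below illustrates that idea only. It is a conceptual assumption about the mechanism, not the tfmot internals, and `sgd_step` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Start from a weight matrix pruned to 50% sparsity (keep the larger half).
weights = rng.normal(size=(4, 4))
mask = np.abs(weights) >= np.median(np.abs(weights))
weights = weights * mask

def sgd_step(w, grad, lr=0.1, mask=None):
    """One gradient step; optionally re-apply a sparsity mask afterwards."""
    w = w - lr * grad
    if mask is not None:
        w = w * mask  # pruned positions are forced back to zero
    return w

grad = rng.normal(size=weights.shape)  # stand-in for a real gradient
plain = sgd_step(weights, grad)              # analogous to plain fine-tuning
preserved = sgd_step(weights, grad, mask=mask)  # analogous to mask-preserving fine-tuning

plain_sparsity = float(np.mean(plain == 0))
preserved_sparsity = float(np.mean(preserved == 0))
print(f"without mask: {plain_sparsity:.2%}, with mask: {preserved_sparsity:.2%}")
```

A single unmasked step is already enough to fill in every pruned position, which matches the qualitative difference you should see between the two sparsity reports above.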

See the compression benefits of the PQAT model

Define a helper function to get the size of the zipped model file.

def get_gzipped_model_size(file):
    # It returns the size of the gzipped model in kilobytes.

    _, zipped_file = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
        f.write(file)

    return os.path.getsize(zipped_file)/1000

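Since this is a small model, the difference between the two models is not very noticeable. Applying pruning and PQAT to a larger production model would yield more significant compression.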

# QAT model
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = converter.convert()
qat_model_file = 'qat_model.tflite'
# Save the model.
with open(qat_model_file, 'wb') as f:
    f.write(qat_tflite_model)

# PQAT model
converter = tf.lite.TFLiteConverter.from_keras_model(pqat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
pqat_tflite_model = converter.convert()
pqat_model_file = 'pqat_model.tflite'
# Save the model.
with open(pqat_model_file, 'wb') as f:
    f.write(pqat_tflite_model)

print("QAT model size: ", get_gzipped_model_size(qat_model_file), ' KB')
print("PQAT model size: ", get_gzipped_model_size(pqat_model_file), ' KB')
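The size difference comes from how well DEFLATE, the algorithm behind ZIP_DEFLATED, handles long runs and frequent repeats of zero bytes. As a rough, self-contained illustration, the sketch below compresses two synthetic int8 buffers with zlib: one dense and one with about half of its entries set to zero. The random data is an assumption standing in for real quantized weights, so the absolute numbers are not meaningful, only the direction of the difference.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for dense int8 weights.
dense = rng.integers(-127, 128, size=10_000).astype(np.int8)

# Same values with roughly 50% of the entries zeroed out.
sparse = dense.copy()
sparse[rng.random(sparse.size) < 0.5] = 0

dense_bytes = len(zlib.compress(dense.tobytes()))
sparse_bytes = len(zlib.compress(sparse.tobytes()))
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
```

Real weight tensors are less random than this toy data, so actual ratios will differ, but sparse tensors should still compress to fewer bytes than their dense counterparts.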

See that the accuracy persists from TF to TFLite

Define a helper function to evaluate the TFLite model on the test dataset.

def eval_model(interpreter):
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    # Run predictions on every image in the "test" dataset.
    prediction_digits = []
    for i, test_image in enumerate(test_images):
        if i % 1000 == 0:
            print(f"Evaluated on {i} results so far.")
        # Pre-processing: add batch dimension and convert to float32 to match with
        # the model's input data format.
        test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
        interpreter.set_tensor(input_index, test_image)

        # Run inference.
        interpreter.invoke()

        # Post-processing: remove batch dimension and find the digit with highest
        # probability.
        output = interpreter.tensor(output_index)
        digit = np.argmax(output()[0])
        prediction_digits.append(digit)

    print('\n')
    # Compare prediction results with ground truth labels to calculate accuracy.
    prediction_digits = np.array(prediction_digits)
    accuracy = (prediction_digits == test_labels).mean()
    return accuracy
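The post-processing step relies on the fact that the model outputs raw logits, and the index of the largest logit is also the index of the most probable class, so no softmax is needed to compute accuracy. A tiny sketch with made-up logits and labels (both are illustrative values, not model output):

```python
import numpy as np

# Made-up logits for 4 examples and 3 classes.
logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 3.0],
                   [0.9, 0.8, 0.7]])
labels = np.array([1, 0, 2, 1])

predictions = np.argmax(logits, axis=1)      # predicted class per example
accuracy = (predictions == labels).mean()    # fraction of correct predictions
print(predictions, accuracy)
```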

Evaluate the pruned and quantized model, and you will see that the accuracy obtained in TensorFlow persists in the TFLite backend.

interpreter = tf.lite.Interpreter(pqat_model_file)
interpreter.allocate_tensors()

pqat_test_accuracy = eval_model(interpreter)

print('Pruned and quantized TFLite test_accuracy:', pqat_test_accuracy)
print('Pruned TF test accuracy:', pruned_model_accuracy)

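Apply post-training quantization and compare to the PQAT model

Next, we apply ordinary post-training quantization (no fine-tuning) to the pruned model and check its accuracy against the PQAT model. This demonstrates why PQAT is needed to improve the accuracy of the quantized model.

First, define a generator for the calibration dataset from the first 1000 training images.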

def mnist_representative_data_gen():
    for image in train_images[:1000]:
        image = np.expand_dims(image, axis=0).astype(np.float32)
        yield [image]
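A calibration dataset is needed because post-training quantization has to estimate the value range of each tensor before it can map floats to 8-bit integers. The sketch below shows the general affine (scale and zero-point) idea with NumPy. It is a simplified illustration, not the TFLite converter's actual calibration procedure: `calibrate_int8`, `quantize`, and `dequantize` are hypothetical helpers, and the random array is an assumed stand-in for normalized pixel data.

```python
import numpy as np

def calibrate_int8(samples):
    """Derive (scale, zero_point) for asymmetric int8 from the observed range.

    Assumes the samples are not all identical, so the range is non-empty.
    """
    lo = min(0.0, float(np.min(samples)))  # the range must contain 0.0
    hi = max(0.0, float(np.max(samples)))
    scale = (hi - lo) / 255.0
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
# Stand-in for 1000 normalized 28x28 images with values in [0, 1).
calibration = rng.random((1000, 28, 28)).astype(np.float32)

scale, zero_point = calibrate_int8(calibration)
x = calibration[0]
x_hat = dequantize(quantize(x, scale, zero_point), scale, zero_point)
max_error = float(np.max(np.abs(x - x_hat)))
print(f"scale={scale:.6f}, zero_point={zero_point}, max error={max_error:.6f}")
```

In this scheme the value 0.0 maps exactly onto the zero point and back, and the round-trip error of in-range values is bounded by about half the scale.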

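Quantize the model and compare its accuracy to that of the PQAT model obtained earlier. Note that the model quantized with fine-tuning achieves higher accuracy.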

converter = tf.lite.TFLiteConverter.from_keras_model(stripped_pruned_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = mnist_representative_data_gen
post_training_tflite_model = converter.convert()
post_training_model_file = 'post_training_model.tflite'
# Save the model.
with open(post_training_model_file, 'wb') as f:
    f.write(post_training_tflite_model)

# Compare accuracy
interpreter = tf.lite.Interpreter(post_training_model_file)
interpreter.allocate_tensors()

post_training_test_accuracy = eval_model(interpreter)

print('PQAT TFLite test_accuracy:', pqat_test_accuracy)
print('Post-training (no fine-tuning) TF test accuracy:', post_training_test_accuracy)

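Conclusion

In this tutorial, you learned how to create a model, prune it using the sparsity API, and apply pruning-preserving quantization-aware training (PQAT) to preserve sparsity while using QAT. The final PQAT model was compared to the QAT model to show that sparsity is preserved in the former and lost in the latter. Next, the models were converted to TFLite to show the compression benefits of chaining pruning and PQAT, and the TFLite model was evaluated to confirm that the accuracy persists in the TFLite backend. Finally, the PQAT model was compared to a quantized pruned model produced with the post-training quantization API to demonstrate the advantage of PQAT in recovering the accuracy lost through ordinary quantization.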