GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/model_optimization/guide/pruning/comprehensive_guide.ipynb

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

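Pruning comprehensive guide

Welcome to the comprehensive guide for Keras weight pruning.

This page documents various use cases and shows how to use the API for each one. Once you know which APIs you need, find the parameters and the low-level details in the API docs.

If you want to see the benefits of pruning and what's supported, see the overview.
For a single end-to-end example, see the pruning example.

The following use cases are covered:

Define and train a pruned model.
- Sequential and Functional models
- Keras model.fit and custom training loops
Checkpoint and deserialize a pruned model.
Deploy a pruned model and see the compression benefits.

For configuration of the pruning algorithm, refer to the tfmot.sparsity.keras.prune_low_magnitude API docs.

Setup

To find the APIs you need and understand their purpose, you can run the following code without reading this section.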

In [ ]:

! pip install -q tensorflow-model-optimization

import tensorflow as tf
import numpy as np
import tensorflow_model_optimization as tfmot

%load_ext tensorboard

import tempfile

input_shape = [20]
x_train = np.random.randn(1, 20).astype(np.float32)
y_train = tf.keras.utils.to_categorical(np.random.randint(20, size=1), num_classes=20)

def setup_model():
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(20, input_shape=input_shape),
      tf.keras.layers.Flatten()
  ])
  return model

def setup_pretrained_weights():
  model = setup_model()

  model.compile(
      loss=tf.keras.losses.categorical_crossentropy,
      optimizer='adam',
      metrics=['accuracy']
  )

  model.fit(x_train, y_train)

  _, pretrained_weights = tempfile.mkstemp('.tf')

  model.save_weights(pretrained_weights)

  return pretrained_weights

def get_gzipped_model_size(model):
  # Returns size of gzipped model, in bytes.
  import os
  import zipfile

  _, keras_file = tempfile.mkstemp('.h5')
  model.save(keras_file, include_optimizer=False)

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(keras_file)

  return os.path.getsize(zipped_file)

setup_model()
pretrained_weights = setup_pretrained_weights()

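Define model

Prune whole model (Sequential and Functional)

Tips for better model accuracy:

Try "Prune some layers" to skip pruning the layers that reduce accuracy the most.
It's generally better to fine-tune with pruning than to train from scratch.

To make the whole model train with pruning, apply tfmot.sparsity.keras.prune_low_magnitude to the model.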

In [ ]:

base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended.

model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

model_for_pruning.summary()
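
To give a feel for what "low magnitude" pruning means, here is a minimal NumPy sketch of the underlying idea: zero out the weights with the smallest absolute values until a target fraction is zero. The function name `prune_by_magnitude` and the 0.5 sparsity value are assumptions chosen for illustration. This is not the library's implementation, which applies masks gradually during training.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Hypothetical helper: return a copy of `weights` with the
    smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to zero out (0.0 to 1.0).
    """
    flat = np.abs(weights).ravel()
    num_to_prune = int(round(sparsity * flat.size))
    if num_to_prune == 0:
        return weights.copy()
    # Indices of the `num_to_prune` entries with the smallest magnitude.
    prune_idx = np.argpartition(flat, num_to_prune - 1)[:num_to_prune]
    mask = np.ones(flat.size, dtype=bool)
    mask[prune_idx] = False
    return weights * mask.reshape(weights.shape)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((20, 20)).astype(np.float32)  # same shape as the Dense kernel above
pruned_kernel = prune_by_magnitude(kernel, sparsity=0.5)
print("fraction of zeros:", np.mean(pruned_kernel == 0))
```

Every surviving weight has a magnitude at least as large as every removed one, which is why fine-tuning from pretrained weights tends to preserve more accuracy than pruning a freshly initialized model.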

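Prune some layers (Sequential and Functional)

Pruning a model can have a negative effect on accuracy. You can selectively prune layers of a model to explore the trade-off between accuracy, speed, and model size.

Tips for better model accuracy:

It's generally better to fine-tune with pruning than to train from scratch.
Try pruning the later layers instead of the first layers.
Avoid pruning critical layers (e.g. attention mechanism).

More tips:

The tfmot.sparsity.keras.prune_low_magnitude API docs provide details on how to vary the pruning configuration per layer.

In the example below, only the Dense layers are pruned.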

In [ ]:

# Create a base model
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy

# Helper function uses `prune_low_magnitude` to make only the 
# Dense layers train with pruning.
def apply_pruning_to_dense(layer):
  if isinstance(layer, tf.keras.layers.Dense):
    return tfmot.sparsity.keras.prune_low_magnitude(layer)
  return layer

# Use `tf.keras.models.clone_model` to apply `apply_pruning_to_dense` 
# to the layers of the model.
model_for_pruning = tf.keras.models.clone_model(
    base_model,
    clone_function=apply_pruning_to_dense,
)

model_for_pruning.summary()

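While this example uses the type of the layer to decide what to prune, the easiest way to prune a particular layer is to set its name property and look for that name in the clone_function.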

In [ ]:

print(base_model.layers[0].name)
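
The name-based selection described above could be sketched as follows. The helper `make_name_based_clone_function`, its parameters `names_to_prune` and `wrap_fn`, and the layer names `"dense"` and `"flatten"` are all hypothetical. The demo uses plain stand-in objects with a `name` attribute so that the selection logic can be run without building a model.

```python
from types import SimpleNamespace

def make_name_based_clone_function(names_to_prune, wrap_fn):
    """Hypothetical helper: build a clone_function that applies `wrap_fn`
    only to layers whose `name` is listed in `names_to_prune`."""
    names = set(names_to_prune)

    def clone_function(layer):
        # Wrap the selected layers; return every other layer unchanged.
        return wrap_fn(layer) if layer.name in names else layer

    return clone_function

# Stand-in "layers": only the `name` attribute matters for the selection.
layers = [SimpleNamespace(name="dense"), SimpleNamespace(name="flatten")]
clone_fn = make_name_based_clone_function(
    {"dense"},
    wrap_fn=lambda layer: ("pruned", layer.name),  # placeholder for a real wrapper
)
result = [clone_fn(layer) for layer in layers]
print(result[0])
```

In this notebook, the same pattern would presumably pass `tfmot.sparsity.keras.prune_low_magnitude` as `wrap_fn` and the returned function as the `clone_function` argument of `tf.keras.models.clone_model`, using the real name printed above.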

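More readable but potentially lower model accuracy

This is not compatible with fine-tuning with pruning, so it may be less accurate than the examples above, which support fine-tuning.

While prune_low_magnitude can be applied when defining the initial model, loading the weights afterwards does not work in the examples below.

Functional example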

In [ ]:

# Use `prune_low_magnitude` to make the `Dense` layer train with pruning.
i = tf.keras.Input(shape=(20,))
x = tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(10))(i)
o = tf.keras.layers.Flatten()(x)
model_for_pruning = tf.keras.Model(inputs=i, outputs=o)

model_for_pruning.summary()

Sequential example

In [ ]:

# Use `prune_low_magnitude` to make the `Dense` layer train with pruning.
model_for_pruning = tf.keras.Sequential([
  tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(20, input_shape=input_shape)),
  tf.keras.layers.Flatten()
])

model_for_pruning.summary()

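Prune custom Keras layer or modify parts of layer to prune

Common mistake: pruning the bias usually harms model accuracy too much.

tfmot.sparsity.keras.PrunableLayer serves two use cases:

Prune a custom Keras layer
Modify parts of a built-in Keras layer to prune

For example, the API defaults to pruning only the kernel of the Dense layer. The example below prunes the bias as well.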

In [ ]:

class MyDenseLayer(tf.keras.layers.Dense, tfmot.sparsity.keras.PrunableLayer):

  def get_prunable_weights(self):
    # Prune bias also, though that usually harms model accuracy too much.
    return [self.kernel, self.bias]

# Use `prune_low_magnitude` to make the `MyDenseLayer` layer train with pruning.
model_for_pruning = tf.keras.Sequential([
  tfmot.sparsity.keras.prune_low_magnitude(MyDenseLayer(20, input_shape=input_shape)),
  tf.keras.layers.Flatten()
])

model_for_pruning.summary()

Train model

Model.fit

Call the tfmot.sparsity.keras.UpdatePruningStep callback during training.

To help debug training, use the tfmot.sparsity.keras.PruningSummaries callback.

In [ ]:

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

log_dir = tempfile.mkdtemp()
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    # Log sparsity and other metrics in Tensorboard.
    tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir)
]

model_for_pruning.compile(
      loss=tf.keras.losses.categorical_crossentropy,
      optimizer='adam',
      metrics=['accuracy']
)

model_for_pruning.fit(
    x_train,
    y_train,
    callbacks=callbacks,
    epochs=2,
)

#docs_infra: no_execute
%tensorboard --logdir={log_dir}

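For non-Colab users, you can see the results of a previous run of this code block on TensorBoard.dev.

Custom training loop

Call the tfmot.sparsity.keras.UpdatePruningStep callback during training.

To help debug training, use the tfmot.sparsity.keras.PruningSummaries callback.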

In [ ]:

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

# Boilerplate
loss = tf.keras.losses.categorical_crossentropy
optimizer = tf.keras.optimizers.Adam()
log_dir = tempfile.mkdtemp()
unused_arg = -1
epochs = 2
batches = 1 # example is hardcoded so that the number of batches cannot change.

# Non-boilerplate.
model_for_pruning.optimizer = optimizer
step_callback = tfmot.sparsity.keras.UpdatePruningStep()
step_callback.set_model(model_for_pruning)
log_callback = tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir) # Log sparsity and other metrics in Tensorboard.
log_callback.set_model(model_for_pruning)

step_callback.on_train_begin() # run pruning callback
for _ in range(epochs):
  log_callback.on_epoch_begin(epoch=unused_arg) # run pruning callback
  for _ in range(batches):
    step_callback.on_train_batch_begin(batch=unused_arg) # run pruning callback

    with tf.GradientTape() as tape:
      logits = model_for_pruning(x_train, training=True)
      loss_value = loss(y_train, logits)
    # Compute and apply the gradients outside the tape context.
    grads = tape.gradient(loss_value, model_for_pruning.trainable_variables)
    optimizer.apply_gradients(zip(grads, model_for_pruning.trainable_variables))

  step_callback.on_epoch_end(batch=unused_arg) # run pruning callback

#docs_infra: no_execute
%tensorboard --logdir={log_dir}

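For non-Colab users, you can see the results of a previous run of this code block on TensorBoard.dev.

Improve pruned model accuracy

First, look at the tfmot.sparsity.keras.prune_low_magnitude API docs to understand what a pruning schedule is and the math of each type of pruning schedule.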
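
As a rough illustration of what a pruning schedule computes, the sketch below ramps the target sparsity from an initial to a final value along a polynomial curve. The function `polynomial_decay_sparsity`, its parameter names, and the example values are assumptions made for this sketch. The API docs remain the authoritative source for the exact definition of each schedule.

```python
def polynomial_decay_sparsity(step, initial_sparsity, final_sparsity,
                              begin_step, end_step, power=3):
    """Hypothetical helper: target sparsity at `step` for a
    polynomial-decay style schedule."""
    # Fraction of the pruning window that has elapsed, clipped to [0, 1].
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(1.0, max(0.0, progress))
    # Starts at initial_sparsity and flattens out as it nears final_sparsity.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power

for step in (0, 250, 500, 750, 1000):
    sparsity = polynomial_decay_sparsity(step, 0.0, 0.8, begin_step=0, end_step=1000)
    print(step, round(sparsity, 4))
```

With a curve of this shape, most of the weights are removed early, while the model still has many training steps left to recover, and the final few percent are removed slowly.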

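Tips:

Use a learning rate that is neither too high nor too low while the model is pruning. Consider the pruning schedule to be a hyperparameter.
As a quick test, try pruning the model to the final sparsity at the start of training by using a tfmot.sparsity.keras.ConstantSparsity schedule with begin_step set to 0. You might get lucky with good results.
Do not prune very frequently, so that the model has time to recover. The pruning schedule provides a decent default frequency.
For general ideas on improving model accuracy, look for the tips for your use case under "Define model".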
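
The tips about begin_step and pruning frequency can be made concrete with a small sketch of how a schedule might decide on which steps to update the pruning masks. The functions `should_prune` and `constant_sparsity`, along with the convention that `end_step=-1` means "no end", are assumptions for illustration and may differ from the library's internal logic.

```python
def should_prune(step, begin_step, end_step, frequency):
    """Hypothetical helper: whether the masks are updated at `step`.

    `end_step=-1` is treated as "keep pruning until training ends".
    """
    if step < begin_step:
        return False
    if end_step != -1 and step > end_step:
        return False
    # Only update every `frequency` steps so the model can recover in between.
    return (step - begin_step) % frequency == 0

def constant_sparsity(step, target_sparsity, begin_step):
    """Hypothetical helper: the full target applies as soon as pruning begins."""
    return target_sparsity if step >= begin_step else 0.0

prune_steps = [s for s in range(0, 501)
               if should_prune(s, begin_step=0, end_step=-1, frequency=100)]
print(prune_steps)  # [0, 100, 200, 300, 400, 500]
print(constant_sparsity(0, target_sparsity=0.8, begin_step=0))  # 0.8
```

With begin_step set to 0, the very first training step already prunes to the final sparsity, which is what makes this configuration a quick test.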

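Checkpoint and deserialize

You must preserve the optimizer step during checkpointing. This means that while you can use Keras HDF5 models for checkpointing, you cannot use Keras HDF5 weights.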

In [ ]:

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

_, keras_model_file = tempfile.mkstemp('.h5')

# Checkpoint: saving the optimizer is necessary (include_optimizer=True is the default).
model_for_pruning.save(keras_model_file, include_optimizer=True)

The above applies generally. The code below is only needed for the HDF5 model format (not for HDF5 weights or other formats).

In [ ]:

# Deserialize model.
with tfmot.sparsity.keras.prune_scope():
  loaded_model = tf.keras.models.load_model(keras_model_file)

loaded_model.summary()

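Deploy pruned model

Export model with size compression

Common mistake: both strip_pruning and applying a standard compression algorithm (e.g. via gzip) are necessary to see the compression benefits of pruning.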

In [ ]:

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

# Typically you train the model here.

model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

print("final model")
model_for_export.summary()

print("\n")
print("Size of gzipped pruned model without stripping: %.2f bytes" % (get_gzipped_model_size(model_for_pruning)))
print("Size of gzipped pruned model with stripping: %.2f bytes" % (get_gzipped_model_size(model_for_export)))
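
The compression half of that statement can be checked in isolation. The sketch below is an illustration that is independent of Keras, with an arbitrary 256x256 array and an 80% sparsity level chosen as assumptions. It shows that a pruned array takes exactly as many raw bytes as a dense one, and only becomes smaller once a standard compressor is applied.

```python
import gzip
import numpy as np

rng = np.random.default_rng(0)
dense = rng.standard_normal((256, 256)).astype(np.float32)

# Zero out roughly the 80% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(dense), 0.8)
pruned = np.where(np.abs(dense) >= threshold, dense, 0).astype(np.float32)

# Zeros still occupy 4 bytes each, so the raw sizes are identical.
raw_dense, raw_pruned = dense.tobytes(), pruned.tobytes()
# Compression is what turns the zeros into a smaller size.
gz_dense, gz_pruned = gzip.compress(raw_dense), gzip.compress(raw_pruned)

print("raw bytes:    ", len(raw_dense), len(raw_pruned))
print("gzipped bytes:", len(gz_dense), len(gz_pruned))
```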

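Hardware-specific optimizations

Once different backends enable pruning to improve latency, using block sparsity can improve latency for certain hardware.

Increasing the block size will decrease the peak sparsity that is achievable for a target model accuracy. Despite this, latency can still improve.

For details on what's supported for block sparsity, see the tfmot.sparsity.keras.prune_low_magnitude API docs.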

In [ ]:

base_model = setup_model()

# For using intrinsics on a CPU with 128-bit registers, together with 8-bit
# quantized weights, a 1x16 block size is nice because the block perfectly
# fits into the register.
pruning_params = {'block_size': [1, 16]}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model, **pruning_params)

model_for_pruning.summary()
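
To show what block sparsity means for the weights themselves, here is a minimal NumPy sketch that scores each block by its mean magnitude and zeroes whole blocks at a time. The helper `prune_blocks`, the 8x64 kernel shape, and the 0.5 sparsity are assumptions for illustration. The library's own pooling and thresholding may differ in detail.

```python
import numpy as np

def prune_blocks(weights, block_size, sparsity):
    """Hypothetical helper: zero whole blocks with the lowest mean magnitude.

    Assumes both matrix dimensions are exact multiples of the block size.
    """
    rows, cols = weights.shape
    bh, bw = block_size
    # View the matrix as a grid of (bh x bw) blocks and score each block.
    blocks = np.abs(weights).reshape(rows // bh, bh, cols // bw, bw)
    scores = blocks.mean(axis=(1, 3))
    num_to_prune = int(round(sparsity * scores.size))
    block_mask = np.ones(scores.size, dtype=bool)
    if num_to_prune > 0:
        lowest = np.argpartition(scores.ravel(), num_to_prune - 1)[:num_to_prune]
        block_mask[lowest] = False
    block_mask = block_mask.reshape(scores.shape)
    # Expand the per-block decision back to per-weight resolution.
    mask = np.repeat(np.repeat(block_mask, bh, axis=0), bw, axis=1)
    return weights * mask

rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 64)).astype(np.float32)
# A 1x16 block of 8-bit weights is 16 * 8 = 128 bits, i.e. one 128-bit register.
pruned = prune_blocks(kernel, block_size=(1, 16), sparsity=0.5)
zero_counts = (pruned.reshape(8, 4, 16) == 0).sum(axis=2)
print(zero_counts)
```

Each entry of `zero_counts` is either 0 or 16: a block is kept or dropped as a unit. Because a block may contain a few large weights that get removed along with the small ones, larger blocks tend to cost more accuracy at the same sparsity.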