GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/addons/tutorials/optimizers_cyclicallearningrate.ipynb

Kernel: Python 3

Copyright 2021 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TensorFlow Addons Optimizers: CyclicalLearningRate

Overview

This tutorial demonstrates how to use the cyclical learning rate from the Addons package.

Cyclical Learning Rates

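Adjusting the learning rate while training a neural network is widely considered beneficial. It can help in several ways, such as escaping saddle points and avoiding numerical instabilities that can arise during backpropagation. But how much should the learning rate change at a given point in training? In 2015, Leslie Smith observed that the learning rate should be raised to traverse the loss landscape quickly, but also lowered as training approaches convergence. To put this idea into practice, he proposed Cyclical Learning Rates (CLR), which vary the learning rate according to the cycles of a periodic function. This blog post provides an intuitive visual demonstration. CLR is now available as a TensorFlow API (in TensorFlow Addons). For more details, read the original paper.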

Setup

In [ ]:

!pip install -q -U tensorflow_addons

In [ ]:

from tensorflow.keras import layers
import tensorflow_addons as tfa
import tensorflow as tf

import numpy as np
import matplotlib.pyplot as plt

tf.random.set_seed(42)
np.random.seed(42)

Load and prepare the dataset

In [ ]:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

Set hyperparameters

In [ ]:

BATCH_SIZE = 64
EPOCHS = 10
INIT_LR = 1e-4
MAX_LR = 1e-2

Define model-building and model-training utilities

In [ ]:

def get_training_model():
    model = tf.keras.Sequential(
        [
            layers.InputLayer((28, 28, 1)),
            layers.Rescaling(scale=1./255),  # map pixel values from [0, 255] to [0, 1]
            layers.Conv2D(16, (5, 5), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(32, (5, 5), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.SpatialDropout2D(0.2),
            layers.GlobalAvgPool2D(),
            layers.Dense(128, activation="relu"),
            layers.Dense(10, activation="softmax"),
        ]
    )
    return model

def train_model(model, optimizer):
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])
    history = model.fit(x_train,
                        y_train,
                        batch_size=BATCH_SIZE,
                        validation_data=(x_test, y_test),
                        epochs=EPOCHS)
    return history

For reproducibility, the initial model is serialized to disk, and every experiment below starts from these same initial weights.

In [ ]:

initial_model = get_training_model()
initial_model.save("initial_model")

Train a model without CLR

In [ ]:

standard_model = tf.keras.models.load_model("initial_model")
# Baseline: plain SGD with Keras' default constant learning rate (0.01)
no_clr_history = train_model(standard_model, optimizer="sgd")

Set up the CLR schedule

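tfa.optimizers.CyclicalLearningRate returns a schedule that can be passed directly to an optimizer. The schedule takes the step number as input and returns a value computed with the CLR formula given in the paper.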
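To make the formula concrete, here is a minimal NumPy sketch of the CLR computation, assuming the triangular-wave form from the paper with an amplitude multiplier `scale_fn`. The function name `clr_schedule` is hypothetical, and its `scale_mode` argument is meant to mirror the option of the same name on the tfa class as we understand it ("cycle" passes the 1-based cycle index to `scale_fn`, anything else passes the raw step). It is an illustration, not the tfa source.

```python
import numpy as np


def clr_schedule(step, initial_lr, max_lr, step_size, scale_fn, scale_mode="cycle"):
    """Cyclical learning rate for one step or an array of steps (sketch)."""
    step = np.asarray(step, dtype=np.float64)
    # 1-based index of the cycle each step belongs to.
    cycle = np.floor(1 + step / (2 * step_size))
    # Distance from the peak: 1 at cycle boundaries, 0 at the peak.
    x = np.abs(step / step_size - 2 * cycle + 1)
    # scale_fn sees either the cycle index or the raw step.
    mode_step = cycle if scale_mode == "cycle" else step
    return initial_lr + (max_lr - initial_lr) * np.maximum(0.0, 1 - x) * scale_fn(mode_step)


# triangular2: the peak height is halved after every cycle.
lr = clr_schedule(np.arange(0, 41, 10), 1e-4, 1e-2, step_size=10,
                  scale_fn=lambda c: 1 / (2.0 ** (c - 1)))
print(lr)
```

Once the next cell has defined `clr`, calling `clr_schedule(step, INIT_LR, MAX_LR, 2 * steps_per_epoch, lambda x: 1/(2.**(x-1)))` on the same `step` array should agree with `clr(step)` up to float32 rounding, since tfa computes in the dtype of the initial learning rate.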

In [ ]:

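steps_per_epoch = len(x_train) // BATCH_SIZE
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=INIT_LR,
    maximal_learning_rate=MAX_LR,
    # triangular2 policy: halve the cycle amplitude after every cycle
    scale_fn=lambda x: 1/(2.**(x-1)),
    # half a cycle (the rising or the falling part) lasts two epochs
    step_size=2 * steps_per_epoch,
)
optimizer = tf.keras.optimizers.SGD(clr)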

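Here you specify the lower and upper bounds of the learning rate, and the schedule oscillates within that range ([1e-4, 1e-2] in this case). scale_fn scales the amplitude of the oscillation; with the default scale mode it receives the cycle index, so the lambda above halves the peak after every cycle. step_size is the number of iterations in half a cycle, so a full cycle takes 2 * step_size iterations: a step_size of 2, for example, means one complete cycle takes 4 iterations. The recommended setting for step_size is: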

factor * steps_per_epoch, where factor lies in [2, 8].
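The arithmetic behind step_size can be checked in plain Python. The sketch below uses a hypothetical helper, `triangular_lr`, that implements the unscaled triangular policy for integer steps, to show that the schedule repeats every 2 * step_size iterations. It then evaluates the recommended step_size range for this tutorial's hyperparameters, assuming the 60,000-image Fashion-MNIST training set.

```python
def triangular_lr(step, base_lr, max_lr, step_size):
    """Plain triangular CLR (no amplitude scaling) for an integer step."""
    cycle_len = 2 * step_size          # one full cycle = rising half + falling half
    pos = step % cycle_len             # position inside the current cycle
    if pos <= step_size:               # rising half
        frac = pos / step_size
    else:                              # falling half
        frac = (cycle_len - pos) / step_size
    return base_lr + (max_lr - base_lr) * frac


# With step_size = 2 the schedule repeats every 4 iterations.
print([triangular_lr(s, 0.0, 1.0, step_size=2) for s in range(9)])
# [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5, 0.0]

# Recommended step_size range for this tutorial's hyperparameters.
NUM_TRAIN = 60_000  # len(x_train) for Fashion-MNIST
BATCH_SIZE = 64
EPOCHS = 10

steps_per_epoch = NUM_TRAIN // BATCH_SIZE                     # 937
step_size_range = (2 * steps_per_epoch, 8 * steps_per_epoch)  # (1874, 7496)

# The tutorial uses factor = 2, i.e. one full cycle every 4 epochs.
step_size = 2 * steps_per_epoch
num_cycles = EPOCHS * steps_per_epoch / (2 * step_size)       # 2.5 cycles in 10 epochs
print(steps_per_epoch, step_size_range, num_cycles)
```

One small caveat: `model.fit` with `batch_size=64` runs ceil(60000 / 64) = 938 batches per epoch, including a final partial batch, so the real step count drifts from `steps_per_epoch` by one step per epoch. Over 10 epochs this is negligible.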

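In the same CLR paper, Leslie also presented a simple, elegant method for choosing the bounds of the learning rate range. We recommend checking it out; this blog post gives a good introduction to it.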
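The core idea of that range test is to increase the learning rate linearly during a short run and watch where training starts to break down. The sketch below is a toy version on the one-dimensional quadratic f(w) = w², where plain gradient descent is known to diverge once the learning rate exceeds 1. It is an illustration under that assumption, not Smith's exact procedure, which trains the real model for a few epochs and reads both bounds off the accuracy curve. `lr_range_test` is a hypothetical helper.

```python
import numpy as np


def lr_range_test(min_lr, max_lr, num_steps, w0=5.0):
    """Toy LR range test: gradient descent on f(w) = w**2 with a linear LR ramp."""
    lrs = np.linspace(min_lr, max_lr, num_steps)  # linearly increasing learning rate
    w = w0
    losses = []
    for lr in lrs:
        w = w - lr * 2 * w   # gradient of w**2 is 2 * w
        losses.append(w ** 2)
    return lrs, np.array(losses)


lrs, losses = lr_range_test(1e-3, 2.0, 200)

# First step at which the loss rises instead of falling: a candidate upper bound.
diverge_idx = int(np.argmax(np.diff(losses) > 0)) + 1
print(f"loss starts rising at lr ~ {lrs[diverge_idx]:.3f}")
```

Because each update multiplies w by (1 - 2·lr), the loss shrinks while lr < 1 and grows once lr > 1, so the printed value should land just above 1. On a real network the curve is noisy, which is why the paper inspects a plot rather than a single threshold.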

Below, visualize what the clr schedule looks like.

In [ ]:

step = np.arange(0, EPOCHS * steps_per_epoch)
lr = clr(step)
plt.plot(step, lr)
plt.xlabel("Steps")
plt.ylabel("Learning Rate")
plt.show()

To see the effect of CLR more clearly, plot the schedule over a larger number of steps.

In [ ]:

step = np.arange(0, 100 * steps_per_epoch)
lr = clr(step)
plt.plot(step, lr)
plt.xlabel("Steps")
plt.ylabel("Learning Rate")
plt.show()

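The function used in this tutorial is called triangular2 in the CLR paper. The paper also discusses two other policies: triangular and exp_range (an exponential variant).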
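The three policies differ only in how they scale the amplitude of each cycle. The sketch below compares the peak learning rate of the first four cycles under each policy, assuming this tutorial's bounds, a step_size of 1874 (2 * steps_per_epoch), and a hypothetical decay factor `GAMMA` for exp_range, which the paper scales by gamma raised to the iteration number. The helper names are illustrative and not part of tfa.

```python
INIT_LR, MAX_LR = 1e-4, 1e-2
STEP_SIZE = 1874   # 2 * steps_per_epoch in this tutorial
GAMMA = 0.99994    # hypothetical decay factor for exp_range


# Amplitude scaling for each policy, given the 1-based cycle index and the iteration.
def scale_triangular(cycle, iteration):
    return 1.0                        # constant amplitude


def scale_triangular2(cycle, iteration):
    return 1.0 / 2.0 ** (cycle - 1)   # amplitude halves every cycle


def scale_exp_range(cycle, iteration):
    return GAMMA ** iteration         # amplitude decays every iteration


def peak_lr(scale_fn, cycle):
    # The peak of cycle k (1-based) is reached at iteration (2k - 1) * STEP_SIZE.
    iteration = (2 * cycle - 1) * STEP_SIZE
    return INIT_LR + (MAX_LR - INIT_LR) * scale_fn(cycle, iteration)


for name, fn in [("triangular", scale_triangular),
                 ("triangular2", scale_triangular2),
                 ("exp_range", scale_exp_range)]:
    peaks = [peak_lr(fn, k) for k in range(1, 5)]
    print(name, ["%.5f" % p for p in peaks])
```

triangular keeps hitting MAX_LR, triangular2 halves the distance to INIT_LR each cycle, and exp_range decays smoothly at a rate set by `GAMMA`.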

Train a model with CLR

In [ ]:

clr_model = tf.keras.models.load_model("initial_model")
clr_history = train_model(clr_model, optimizer=optimizer)

As expected, the loss starts out higher than usual and then stabilizes as the cycles progress. The plots below confirm this visually.

Visualize the losses

In [ ]:

(fig, ax) = plt.subplots(2, 1, figsize=(10, 8))

ax[0].plot(no_clr_history.history["loss"], label="train_loss")
ax[0].plot(no_clr_history.history["val_loss"], label="val_loss")
ax[0].set_title("No CLR")
ax[0].set_xlabel("Epochs")
ax[0].set_ylabel("Loss")
ax[0].set_ylim([0, 2.5])
ax[0].legend()

ax[1].plot(clr_history.history["loss"], label="train_loss")
ax[1].plot(clr_history.history["val_loss"], label="val_loss")
ax[1].set_title("CLR")
ax[1].set_xlabel("Epochs")
ax[1].set_ylabel("Loss")
ax[1].set_ylim([0, 2.5])
ax[1].legend()

fig.tight_layout(pad=3.0)
plt.show()

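Although this small example does not show the full strength of CLR, note that it is one of the main ingredients behind super-convergence and can have a substantial impact when training at large scale.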