GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ja/addons/tutorials/optimizers_cyclicallearningrate.ipynb

Kernel: Python 3

Copyright 2021 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TensorFlow Addons Optimizers: CyclicalLearningRate

Overview

This tutorial demonstrates how to use Cyclical Learning Rate from the Addons package.

Cyclical Learning Rates

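It has been shown that adjusting the learning rate of a neural network as training progresses is beneficial. The benefits range from escaping saddle points to preventing numerical instabilities that can arise during backpropagation. But how do you know how much to adjust the learning rate at a particular training step? In 2015, Leslie Smith observed that you want to increase the learning rate to traverse the loss landscape faster, but you also want to reduce it when approaching convergence. To realize this idea, Smith proposed Cyclical Learning Rates (CLR), in which the learning rate is adjusted according to the cycles of a function, oscillating between a lower and an upper bound. For a visual demonstration, you can check out this blog post. CLR is now available as a TensorFlow API. For more details, see the original paper, "Cyclical Learning Rates for Training Neural Networks".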
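For reference, the basic CLR computation from the paper can be written as below. Here t is the training step, s is the step_size (half a cycle), η_min and η_max are the lower and upper bounds, and f is an amplitude-scaling function. The f factor is an assumption taken from the form used by tfa.optimizers.CyclicalLearningRate with its default scale_mode="cycle"; for the plain triangular policy, f ≡ 1.

$$
\text{cycle} = \left\lfloor 1 + \frac{t}{2s} \right\rfloor, \qquad
x = \left| \frac{t}{s} - 2\,\text{cycle} + 1 \right|
$$

$$
\eta_t = \eta_{\min} + (\eta_{\max} - \eta_{\min}) \cdot \max(0,\ 1 - x) \cdot f(\text{cycle})
$$

At t = 0 and t = 2s the learning rate equals η_min, and at t = s it peaks at η_min + (η_max − η_min) f(1). One full cycle therefore spans 2s steps.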

Build the Fashion-MNIST model

In [ ]:

!pip install -q -U tensorflow_addons

In [ ]:

from tensorflow.keras import layers
import tensorflow_addons as tfa
import tensorflow as tf

import numpy as np
import matplotlib.pyplot as plt

tf.random.set_seed(42)
np.random.seed(42)

Load and prepare the dataset

In [ ]:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Add a trailing channel axis, (N, 28, 28) -> (N, 28, 28, 1), for the Conv2D layers.
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

Define the hyperparameters

In [ ]:

BATCH_SIZE = 64
EPOCHS = 10
INIT_LR = 1e-4
MAX_LR = 1e-2

Define the model-building and model-training utilities

In [ ]:

def get_training_model():
    model = tf.keras.Sequential(
        [
            layers.InputLayer((28, 28, 1)),
            layers.Rescaling(scale=1./255),  # scale pixel values to [0, 1]
            layers.Conv2D(16, (5, 5), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(32, (5, 5), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.SpatialDropout2D(0.2),
            layers.GlobalAvgPool2D(),
            layers.Dense(128, activation="relu"),
            layers.Dense(10, activation="softmax"),
        ]
    )
    return model

def train_model(model, optimizer):
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=optimizer,
        metrics=["accuracy"],
    )
    history = model.fit(
        x_train,
        y_train,
        batch_size=BATCH_SIZE,
        validation_data=(x_test, y_test),
        epochs=EPOCHS,
    )
    return history

To ensure reproducibility, the initial model, including its weights, is serialized to disk, and both experiments below start from it.

In [ ]:

initial_model = get_training_model()
initial_model.save("initial_model")

Train the model without CLR

In [ ]:

standard_model = tf.keras.models.load_model("initial_model")
# "sgd" selects Keras's SGD optimizer with its default constant learning rate (0.01).
no_clr_history = train_model(standard_model, optimizer="sgd")

Define the CLR schedule

The tfa.optimizers.CyclicalLearningRate class returns a learning rate schedule that can be passed directly to an optimizer. The schedule takes a step as input and outputs a value computed with the CLR formula described in the paper.
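To make the formula concrete, here is a minimal NumPy sketch of the computation the schedule performs, assuming the default scale_mode="cycle" (the scale function receives the 1-based cycle number). The function name cyclical_lr is hypothetical and this is an illustrative re-implementation, not the Addons source. It uses step_size=2 so the 4-iteration cycle can be checked by hand.

```python
import numpy as np


def cyclical_lr(step, initial_lr, maximal_lr, step_size, scale_fn):
    """Hypothetical NumPy version of the CLR formula (scale applied per cycle)."""
    step = np.asarray(step, dtype=np.float64)
    # 1-based index of the cycle that this step belongs to.
    cycle = np.floor(1 + step / (2 * step_size))
    # Distance from the cycle's peak: 1 at the cycle boundaries, 0 at the peak.
    x = np.abs(step / step_size - 2 * cycle + 1)
    return initial_lr + (maximal_lr - initial_lr) * np.maximum(0.0, 1 - x) * scale_fn(cycle)


# Same scale function as in this tutorial: amplitude halves every cycle.
triangular2 = lambda c: 1 / (2.0 ** (c - 1))

steps = np.array([0, 2, 4, 6, 8])
print(cyclical_lr(steps, 1e-4, 1e-2, step_size=2, scale_fn=triangular2))
# approximately [1e-4, 1e-2, 1e-4, 5.05e-3, 1e-4]
```

The learning rate returns to the lower bound every 4 steps, and the second peak (step 6) is only half as high above the lower bound as the first.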

In [ ]:

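steps_per_epoch = len(x_train) // BATCH_SIZE
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=INIT_LR,  # lower bound of the oscillation
    maximal_learning_rate=MAX_LR,   # upper bound of the oscillation
    scale_fn=lambda x: 1 / (2.0 ** (x - 1)),  # halve the amplitude every cycle
    step_size=2 * steps_per_epoch,  # half a cycle: 2 epochs up, 2 epochs down
)
# The schedule object is passed in place of a constant learning rate.
optimizer = tf.keras.optimizers.SGD(clr)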

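Here, you specify the lower and upper bounds of the learning rate, and the schedule oscillates within that range ([1e-4, 1e-2] in this case). scale_fn scales the amplitude of the oscillation from cycle to cycle: with the default scale_mode="cycle" it receives the current cycle number (starting at 1), so the lambda above keeps the full amplitude in the first cycle and halves it in each subsequent one. step_size defines half the length of a cycle: with a step_size of 2, one full cycle takes 4 iterations in total. The recommended value for step_size is:

factor * steps_per_epoch, where factor lies in the range [2, 8].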
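Plugging this tutorial's numbers into that recommendation gives the arithmetic below. The only value not taken from the notebook's cells is the size of the Fashion-MNIST training split (60,000 images).

```python
# Values from this tutorial; 60,000 is the size of the Fashion-MNIST training split.
NUM_TRAIN = 60_000
BATCH_SIZE = 64
EPOCHS = 10

steps_per_epoch = NUM_TRAIN // BATCH_SIZE  # 937, as computed in the notebook
step_size = 2 * steps_per_epoch            # factor = 2 -> 1874 steps per half cycle
cycle_length = 2 * step_size               # 3748 steps, i.e. 4 epochs per full cycle
total_steps = EPOCHS * steps_per_epoch     # 9370 steps in 10 epochs

# Recommended step_size range for factor in [2, 8].
low, high = 2 * steps_per_epoch, 8 * steps_per_epoch

print(f"step_size={step_size}, cycle_length={cycle_length}")
print(f"cycles in {EPOCHS} epochs: {total_steps / cycle_length}")
print(f"recommended step_size range: [{low}, {high}]")
```

So the 10-epoch run covers 2.5 cycles. Note that model.fit keeps the final partial batch, so it actually runs 938 steps per epoch (60,000 / 64 = 937.5, rounded up); the schedule drifts by about one step per epoch relative to the epoch boundaries, which is negligible here.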

In the same CLR paper, Smith also describes a simple and elegant method for choosing the learning rate bounds, which is worth checking out. This blog post covers the basics of the method.

Below, you visualize what the clr schedule looks like.

In [ ]:

step = np.arange(0, EPOCHS * steps_per_epoch)
lr = clr(step)
plt.plot(step, lr)
plt.xlabel("Steps")
plt.ylabel("Learning Rate")
plt.show()

To better visualize the effect of CLR, you can plot the schedule over a larger number of steps.

In [ ]:

step = np.arange(0, 100 * steps_per_epoch)
lr = clr(step)
plt.plot(step, lr)
plt.xlabel("Steps")
plt.ylabel("Learning Rate")
plt.show()

The function used in this tutorial is called the triangular2 method in the CLR paper. The paper also explores two other policies: triangular and exp_range (an exponentially decaying variant).
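The three policies differ only in how the amplitude of each cycle is scaled. The sketch below computes the peak learning rate of the first few cycles under each policy, using this tutorial's bounds and step_size. GAMMA is a hypothetical decay factor, and the exp_range form (amplitude multiplied by GAMMA raised to the step count) is an assumption that follows common CLR implementations rather than code from this notebook.

```python
INIT_LR, MAX_LR = 1e-4, 1e-2
STEP_SIZE = 1874  # 2 * steps_per_epoch in this tutorial
GAMMA = 0.9999    # hypothetical decay factor for exp_range


def peak_lr(cycle, policy):
    """Learning rate at the top of a given (1-based) cycle for each CLR policy."""
    amplitude = MAX_LR - INIT_LR
    if policy == "triangular":
        scale = 1.0                              # same amplitude in every cycle
    elif policy == "triangular2":
        scale = 1.0 / 2.0 ** (cycle - 1)         # amplitude halves each cycle
    elif policy == "exp_range":
        peak_step = (2 * cycle - 1) * STEP_SIZE  # iteration at which this cycle peaks
        scale = GAMMA ** peak_step               # decays with the iteration count
    else:
        raise ValueError(f"unknown policy: {policy}")
    return INIT_LR + amplitude * scale


for policy in ("triangular", "triangular2", "exp_range"):
    peaks = [peak_lr(c, policy) for c in (1, 2, 3)]
    print(policy, [f"{p:.5f}" for p in peaks])
```

triangular keeps hitting the upper bound, triangular2 halves the distance to the lower bound each cycle, and exp_range shrinks it smoothly with every step.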

Train a model with CLR

In [ ]:

clr_model = tf.keras.models.load_model("initial_model")
clr_history = train_model(clr_model, optimizer=optimizer)

As expected, the loss starts higher than usual and then stabilizes as the cycles progress. You can confirm this visually with the plots below.
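One plausible contributor to the higher initial loss is that the CLR run starts at the lower bound (1e-4), while the baseline's optimizer="sgd" trains at Keras's default constant learning rate of 0.01. The sketch below re-implements the triangular2 schedule in NumPy (an illustrative re-implementation, not the Addons code; the helper name clr_triangular2 is hypothetical) and averages it over each epoch, assuming 937 steps per epoch as computed in the notebook.

```python
import numpy as np

INIT_LR, MAX_LR = 1e-4, 1e-2
BATCH_SIZE, EPOCHS = 64, 10
steps_per_epoch = 60_000 // BATCH_SIZE  # 937
step_size = 2 * steps_per_epoch
SGD_DEFAULT_LR = 0.01                   # learning rate used by optimizer="sgd"


def clr_triangular2(step):
    """Hypothetical NumPy re-implementation of the triangular2 CLR schedule."""
    cycle = np.floor(1 + step / (2 * step_size))
    x = np.abs(step / step_size - 2 * cycle + 1)
    return INIT_LR + (MAX_LR - INIT_LR) * np.maximum(0.0, 1 - x) / 2.0 ** (cycle - 1)


steps = np.arange(EPOCHS * steps_per_epoch, dtype=np.float64)
# One row per epoch, then average the learning rate within each epoch.
mean_lr_per_epoch = clr_triangular2(steps).reshape(EPOCHS, steps_per_epoch).mean(axis=1)

for epoch, lr in enumerate(mean_lr_per_epoch, start=1):
    print(f"epoch {epoch:2d}: mean CLR lr = {lr:.5f} (baseline SGD lr = {SGD_DEFAULT_LR})")
```

In the first epoch the average CLR learning rate is roughly a quarter of the baseline's, so slower early progress in the CLR run is unsurprising.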

Visualize the losses

In [ ]:

(fig, ax) = plt.subplots(2, 1, figsize=(10, 8))

ax[0].plot(no_clr_history.history["loss"], label="train_loss")
ax[0].plot(no_clr_history.history["val_loss"], label="val_loss")
ax[0].set_title("No CLR")
ax[0].set_xlabel("Epochs")
ax[0].set_ylabel("Loss")
ax[0].set_ylim([0, 2.5])
ax[0].legend()

ax[1].plot(clr_history.history["loss"], label="train_loss")
ax[1].plot(clr_history.history["val_loss"], label="val_loss")
ax[1].set_title("CLR")
ax[1].set_xlabel("Epochs")
ax[1].set_ylabel("Loss")
ax[1].set_ylim([0, 2.5])
ax[1].legend()

fig.tight_layout(pad=3.0)
plt.show()

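Although CLR does not show much effect in this toy example, note that it is one of the main ingredients behind super-convergence and can have a substantial impact when training in large-scale settings.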