GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/en-snapshot/addons/tutorials/average_optimizers_callback.ipynb
²⁵¹¹⁸ views

Kernel: Python 3

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Model Averaging

View on TensorFlow.org

Run in Google Colab

View source on GitHub

Download notebook

Overview

This notebook demonstrates how to use Moving Average Optimizer along with the Model Average Checkpoint from tensorflow addons package.

Moving Averaging

The advantage of Moving Averaging is that they are less prone to rampant loss shifts or irregular data representation in the latest batch. It gives a smooothened and a more general idea of the model training until some point.

Stochastic Averaging

Stochastic Weight Averaging converges to wider optima. By doing so, it resembles geometric ensembeling. SWA is a simple method to improve model performance when used as a wrapper around other optimizers and averaging results from different points of trajectory of the inner optimizer.

Model Average Checkpoint

callbacks.ModelCheckpoint doesn't give you the option to save moving average weights in the middle of training, which is why Model Average Optimizers required a custom callback. Using the update_weights parameter, ModelAverageCheckpoint allows you to:

Assign the moving average weights to the model, and save them.
Keep the old non-averaged weights, but the saved model uses the average weights.

Setup

In [ ]:

!pip install -U tensorflow-addons

In [ ]:

import tensorflow as tf
import tensorflow_addons as tfa

In [ ]:

import numpy as np
import os

Build Model

In [ ]:

def create_model(opt):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),                         
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=opt,
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

    return model

Prepare Dataset

In [ ]:

#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()

images, labels = train
images = images/255.0
labels = labels.astype(np.int32)

fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)

test_images, test_labels = test

We will be comparing three optimizers here:

Unwrapped SGD
SGD with Moving Average
SGD with Stochastic Weight Averaging

And see how they perform with the same model.

In [ ]:

#Optimizers 
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)

Both MovingAverage and StochasticAverage optimizers use ModelAverageCheckpoint.

In [ ]:

#Callback 
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir, 
                                                    update_weights=True)

Train Model

Vanilla SGD Optimizer

In [ ]:

#Build Model
model = create_model(sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])

In [ ]:

#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)

Moving Average SGD

In [ ]:

#Build Model
model = create_model(moving_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])

In [ ]:

#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)

Stocastic Weight Average SGD

In [ ]:

#Build Model
model = create_model(stocastic_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])

In [ ]:

#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)