GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/addons/tutorials/optimizers_conditionalgradient.ipynb
²⁵¹¹⁸ views

Kernel: Python 3

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TensorFlow 애드온 옵티마이저: ConditionalGradient

개요

이 노트북은 애드온 패키지에서 Conditional Graident 옵티마이저를 사용하는 방법을 보여줍니다.

ConditionalGradient

신경망의 매개변수를 제한하면 기본적인 정규화 효과로 인해 훈련에 유익한 것으로 나타났습니다. 종종 매개변수는 소프트 페널티(제약 조건 만족을 보장하지 않음) 또는 프로젝션 연산(계산적으로 비쌈)을 통해 제한됩니다. 반면에 CG(Conditional Gradient) 옵티마이저는 값 비싼 프로젝션 단계 없이 제약 조건을 엄격하게 적용합니다. 제약 조건 세트 내에서 목표의 선형 근사치를 최소화하여 동작합니다. 이 노트북의 MNIST 데이터세트에서 CG 옵티마이저를 통해 Frobenius norm 제약 조건의 적용을 보여줍니다. CG는 이제 tensorflow API로 사용 가능합니다. 옵티마이저에 대한 자세한 내용은 https://arxiv.org/pdf/1803.06453.pdf를 참조하세요.

설정

In [ ]:

!pip install -U tensorflow-addons

In [ ]:

import tensorflow as tf
import tensorflow_addons as tfa
from matplotlib import pyplot as plt

In [ ]:

# Hyperparameters
batch_size=64
epochs=10

모델 빌드하기

In [ ]:

model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])

데이터 준비하기

In [ ]:

# Load MNIST dataset as NumPy arrays
dataset = {}
num_validation = 10000
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

사용자 정의 콜백 함수 정의하기

In [ ]:

def frobenius_norm(m):
    """This function is to calculate the frobenius norm of the matrix of all
    layer's weight.
  
    Args:
        m: is a list of weights param for each layers.
    """
    total_reduce_sum = 0
    for i in range(len(m)):
        total_reduce_sum = total_reduce_sum + tf.math.reduce_sum(m[i]**2)
    norm = total_reduce_sum**0.5
    return norm

In [ ]:

CG_frobenius_norm_of_weight = []
CG_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda batch, logs: CG_frobenius_norm_of_weight.append(
        frobenius_norm(model_1.trainable_weights).numpy()))

훈련 및 평가: CG를 옵티마이저로 사용하기

일반적인 keras 옵티마이저를 새로운 tfa 옵티마이저로 간단히 교체합니다.

In [ ]:

# Compile the model
model_1.compile(
    optimizer=tfa.optimizers.ConditionalGradient(
        learning_rate=0.99949, lambda_=203),  # Utilize TFA optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

history_cg = model_1.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    validation_data=(x_test, y_test),
    epochs=epochs,
    callbacks=[CG_get_weight_norm])

훈련 및 평가: SGD를 옵티마이저로 사용하기

In [ ]:

model_2 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])

In [ ]:

SGD_frobenius_norm_of_weight = []
SGD_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda batch, logs: SGD_frobenius_norm_of_weight.append(
        frobenius_norm(model_2.trainable_weights).numpy()))

In [ ]:

# Compile the model
model_2.compile(
    optimizer=tf.keras.optimizers.SGD(0.01),  # Utilize SGD optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

history_sgd = model_2.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    validation_data=(x_test, y_test),
    epochs=epochs,
    callbacks=[SGD_get_weight_norm])

가중치의 Frobenius Norm: CG vs SGD

CG 옵티마이저의 현재 구현은 Frobenius Norm을 대상 함수의 regularizer로 고려하여 Frobenius Norm을 기반으로 합니다. 따라서 CG의 정규화 효과를 Frobenius Norm regularizer를 부과하지 않은 SGD optimizer와 비교합니다.

In [ ]:

plt.plot(
    CG_frobenius_norm_of_weight,
    color='r',
    label='CG_frobenius_norm_of_weights')
plt.plot(
    SGD_frobenius_norm_of_weight,
    color='b',
    label='SGD_frobenius_norm_of_weights')
plt.xlabel('Epoch')
plt.ylabel('Frobenius norm of weights')
plt.legend(loc=1)

훈련 및 검증 정확성: CG vs SGD

In [ ]:

plt.plot(history_cg.history['accuracy'], color='r', label='CG_train')
plt.plot(history_cg.history['val_accuracy'], color='g', label='CG_test')
plt.plot(history_sgd.history['accuracy'], color='pink', label='SGD_train')
plt.plot(history_sgd.history['val_accuracy'], color='b', label='SGD_test')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc=4)