Copyright 2021 The TensorFlow Authors.
Migrate single-worker multiple-GPU training
This guide demonstrates how to migrate the single-worker multiple-GPU workflows from TensorFlow 1 to TensorFlow 2.
To perform synchronous training across multiple GPUs on one machine:
In TensorFlow 1, you use the tf.estimator.Estimator APIs with tf.distribute.MirroredStrategy.
In TensorFlow 2, you can use Keras Model.fit or a custom training loop with tf.distribute.MirroredStrategy. Learn more in the Distributed training with TensorFlow guide.
Setup
Start with imports and a simple dataset for demonstration purposes:
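A minimal sketch of this setup, assuming TensorFlow 2.x is installed. The feature and label values are illustrative toy data chosen for this sketch, not a real dataset, and the names features, labels, eval_features, and eval_labels are hypothetical.

```python
import tensorflow as tf

# Hypothetical toy regression data (illustrative values only):
# each example has two features and one target value.
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]
eval_features = [[4., 4.5], [5., 5.5], [6., 6.5]]
eval_labels = [[0.8], [0.9], [1.]]

# Wrap the Python lists in tf.data pipelines (batching is done later,
# once the number of replicas is known).
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
eval_dataset = tf.data.Dataset.from_tensor_slices((eval_features, eval_labels))

# MirroredStrategy uses all visible GPUs by default, or the CPU if none are found.
gpus = tf.config.list_physical_devices("GPU")
print(f"Visible GPUs: {len(gpus)}")
```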
TensorFlow 1: Single-worker distributed training with tf.estimator.Estimator
The canonical TensorFlow 1 workflow for single-worker multiple-GPU training sets the distribution strategy (tf.distribute.MirroredStrategy) through the config parameter of tf.estimator.Estimator, that is, by passing a tf.estimator.RunConfig whose train_distribute (and, optionally, eval_distribute) argument is the strategy.
TensorFlow 2: Single-worker training with Keras
When migrating to TensorFlow 2, you can use the Keras APIs with tf.distribute.MirroredStrategy.
If you use the tf.keras APIs for model building and Keras Model.fit for training, the main difference is that you instantiate the Keras model, optimizer, and metrics inside Strategy.scope, instead of defining a config for tf.estimator.Estimator.
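A minimal sketch of the Keras workflow under stated assumptions: the toy data, the Adagrad optimizer with a 0.05 learning rate, the mean squared error loss, the "mae" metric, and the hypothetical PER_REPLICA_BATCH_SIZE constant are illustrative choices, not requirements of the API. The batch size given to the dataset is the global batch, which MirroredStrategy splits among the replicas.

```python
import tensorflow as tf

# Hypothetical toy regression data (illustrative values only)
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]
eval_features = [[4., 4.5], [5., 5.5], [6., 6.5]]
eval_labels = [[0.8], [0.9], [1.]]

# Use all visible GPUs (or the CPU if none are found)
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas: {strategy.num_replicas_in_sync}")

# Model.fit receives the *global* batch; scale the per-replica size by the
# number of replicas. PER_REPLICA_BATCH_SIZE is a hypothetical choice.
PER_REPLICA_BATCH_SIZE = 1
GLOBAL_BATCH_SIZE = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync

dataset = tf.data.Dataset.from_tensor_slices(
    (features, labels)).batch(GLOBAL_BATCH_SIZE)
eval_dataset = tf.data.Dataset.from_tensor_slices(
    (eval_features, eval_labels)).batch(GLOBAL_BATCH_SIZE)

# Create the model, optimizer, and metrics inside the strategy scope so that
# their variables are mirrored on every replica.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)
    model.compile(optimizer=optimizer, loss="mse", metrics=["mae"])

# Training and evaluation calls are the same as in single-device Keras code
history = model.fit(dataset, epochs=1, verbose=0)
eval_result = model.evaluate(eval_dataset, return_dict=True, verbose=0)
print(eval_result)
```

Only object creation moves into strategy.scope(); the fit and evaluate calls do not change.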
If you need to use a custom training loop, check out the Using tf.distribute.Strategy with custom training loops guide.
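For orientation, a rough sketch of a custom training loop with tf.distribute.MirroredStrategy, assuming illustrative toy data, a hypothetical PER_REPLICA_BATCH_SIZE constant, and an Adagrad optimizer. It distributes the dataset with Strategy.experimental_distribute_dataset, runs a per-replica step with Strategy.run, and combines the per-replica losses with Strategy.reduce. The explicit optimizer.build call assumes the Keras optimizers shipped with TensorFlow 2.11 or later.

```python
import tensorflow as tf

# Hypothetical toy regression data (illustrative values only)
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]

strategy = tf.distribute.MirroredStrategy()
PER_REPLICA_BATCH_SIZE = 1  # hypothetical value for the toy data
GLOBAL_BATCH_SIZE = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync

dataset = tf.data.Dataset.from_tensor_slices(
    (features, labels)).batch(GLOBAL_BATCH_SIZE)
# Split each global batch across the replicas
dist_dataset = strategy.experimental_distribute_dataset(dataset)

# Model weights and optimizer state must be created under the scope
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)
    # Assumption: TF 2.11+ optimizers expose build() to create state up front
    optimizer.build(model.trainable_variables)

def train_step(inputs):
    """Runs on each replica with that replica's slice of the global batch."""
    x, y = inputs
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        per_example_loss = tf.reduce_mean(tf.square(y - predictions), axis=-1)
        # Average over the *global* batch, because gradients are summed
        # across replicas when the optimizer applies them.
        loss = tf.nn.compute_average_loss(
            per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function
def distributed_train_step(inputs):
    per_replica_losses = strategy.run(train_step, args=(inputs,))
    # Each replica's loss is already scaled by the global batch, so sum them
    return strategy.reduce(
        tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

losses = [float(distributed_train_step(batch)) for batch in dist_dataset]
print(losses)
```

Scaling by the global batch size matters here: averaging only over each replica's slice would make the summed gradient roughly num_replicas_in_sync times too large.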
Next steps
To learn more about distributed training with tf.distribute.MirroredStrategy in TensorFlow 2, check out the following documentation:
The Distributed training on one machine with Keras tutorial
The Distributed training on one machine with a custom training loop tutorial
The Distributed training with TensorFlow guide
The Using multiple GPUs guide
The Optimize the performance on the multi-GPU single host (with the TensorFlow Profiler) guide