Copyright 2021 The TensorFlow Authors.
Migrate from TPU embedding_columns to TPUEmbedding layer
This guide demonstrates how to migrate embedding training on TPUs from TensorFlow 1's embedding_column API with TPUEstimator to TensorFlow 2's TPUEmbedding layer API with TPUStrategy.
Embeddings are (large) matrices. They are lookup tables that map from a sparse feature space to dense vectors. Embeddings provide efficient and dense representations, capturing complex similarities and relationships between features.
TensorFlow includes specialized support for training embeddings on TPUs. This TPU-specific embedding support allows you to train embeddings that are larger than the memory of a single TPU device, and to use sparse and ragged inputs on TPUs.
In TensorFlow 1, tf.compat.v1.estimator.tpu.TPUEstimator is a high-level API that encapsulates training, evaluation, prediction, and exporting for serving with TPUs. It has special support for tf.compat.v1.tpu.experimental.embedding_column.
To implement this in TensorFlow 2, use the TensorFlow Recommenders' tfrs.layers.embedding.TPUEmbedding layer. For training and evaluation, use a TPU distribution strategy, tf.distribute.TPUStrategy, which is compatible with the Keras APIs for model building (tf.keras.Model), optimizers (tf.keras.optimizers.Optimizer), and training with Model.fit or a custom training loop with tf.function and tf.GradientTape.
For additional information, refer to the tfrs.layers.embedding.TPUEmbedding layer's API documentation, as well as the tf.tpu.experimental.embedding.TableConfig and tf.tpu.experimental.embedding.FeatureConfig docs. For an overview of tf.distribute.TPUStrategy, check out the Distributed training guide and the Use TPUs guide. If you're migrating from TPUEstimator to TPUStrategy, check out the TPU migration guide.
Setup
Start by installing TensorFlow Recommenders and importing some necessary packages:
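A minimal sketch of this setup, assuming a notebook environment where shell commands can be run with a leading exclamation mark:

```python
# Install the TensorFlow Recommenders package (run in a notebook cell).
!pip install tensorflow-recommenders

import tensorflow as tf
import tensorflow.compat.v1 as tf1
import tensorflow_recommenders as tfrs
```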
And prepare a simple dataset for demonstration purposes:
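Below is an illustrative sketch of such a dataset, assuming one dense feature, one integer-valued sparse feature (IDs into a vocabulary of size 10, stored as SparseTensor indices/values), and a scalar label. The exact values are arbitrary and used only for demonstration:

```python
# Training data: a single example with a dense feature, a sparse feature
# (given as SparseTensor indices/values), and a label.
features = [[1., 1.5]]
embedding_features_indices = [[0, 0], [0, 1]]
embedding_features_values = [0, 5]
labels = [[0.3]]

# Evaluation data with the same structure.
eval_features = [[4., 4.5]]
eval_embedding_features_indices = [[0, 0], [0, 1]]
eval_embedding_features_values = [4, 3]
eval_labels = [[0.8]]
```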
TensorFlow 1: Train embeddings on TPUs with TPUEstimator
In TensorFlow 1, you set up TPU embeddings using the tf.compat.v1.tpu.experimental.embedding_column API and train/evaluate the model on TPUs with tf.compat.v1.estimator.tpu.TPUEstimator.
The inputs are integers ranging from zero to the vocabulary size for the TPU embedding table. Begin by encoding the inputs to categorical IDs with tf.feature_column.categorical_column_with_identity. Use "sparse_feature" for the key parameter, since the input features are integer-valued, while num_buckets is the vocabulary size for the embedding table (10).
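For example, a sketch using the vocabulary size of 10 assumed above:

```python
# Map the integer-valued "sparse_feature" input to categorical IDs in [0, 10).
embedding_id_column = tf1.feature_column.categorical_column_with_identity(
    key="sparse_feature", num_buckets=10)
```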
Next, convert the sparse categorical inputs to a dense representation with tpu.experimental.embedding_column, where dimension is the width of the embedding table. It will store an embedding vector for each of the num_buckets.
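For example, with an assumed embedding width of 5:

```python
# Wrap the categorical column in a TPU embedding column of width 5, so each
# of the 10 IDs gets a 5-dimensional embedding vector.
tpu_embedding_column = tf1.tpu.experimental.embedding_column(
    embedding_id_column, dimension=5)
```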
Now, define the TPU-specific embedding configuration via tf.estimator.tpu.experimental.EmbeddingConfigSpec. You will pass it later to tf.estimator.tpu.TPUEstimator as an embedding_config_spec parameter.
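A sketch of this configuration, assuming Adagrad with a learning rate of 0.05 as the embedding optimizer:

```python
# Configure how the embedding tables are trained on the TPU.
embedding_config_spec = tf1.estimator.tpu.experimental.EmbeddingConfigSpec(
    feature_columns=(tpu_embedding_column,),
    optimization_parameters=tf1.tpu.experimental.AdagradParameters(0.05))
```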
Next, to use a TPUEstimator, define the following (a minimal sketch of all three follows this list):
An input function for the training data
An evaluation input function for the evaluation data
A model function instructing the TPUEstimator how the training op is defined with the features and labels
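The sketch below builds on the toy data and the tpu_embedding_column defined above. The model itself (a single dense output unit trained with mean squared error and an Adagrad optimizer wrapped in CrossShardOptimizer) is illustrative only:

```python
def _input_fn(params):
  # TPUEstimator passes the per-core batch size in params['batch_size'].
  dataset = tf1.data.Dataset.from_tensor_slices((
      {"dense_feature": features,
       "sparse_feature": tf1.SparseTensor(
           embedding_features_indices,
           embedding_features_values, [1, 2])},
      labels))
  dataset = dataset.repeat()
  return dataset.batch(params['batch_size'], drop_remainder=True)

def _eval_input_fn(params):
  dataset = tf1.data.Dataset.from_tensor_slices((
      {"dense_feature": eval_features,
       "sparse_feature": tf1.SparseTensor(
           eval_embedding_features_indices,
           eval_embedding_features_values, [1, 2])},
      eval_labels))
  dataset = dataset.repeat()
  return dataset.batch(params['batch_size'], drop_remainder=True)

def _model_fn(features, labels, mode, params):
  # Look up the embeddings for the sparse feature and concatenate them
  # with the dense feature before a single-unit dense head.
  embedding_features = tf1.keras.layers.DenseFeatures(
      tpu_embedding_column)(features)
  concatenated_features = tf1.keras.layers.Concatenate(axis=1)(
      [embedding_features, features["dense_feature"]])
  logits = tf1.layers.Dense(1)(concatenated_features)
  loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)
  optimizer = tf1.train.AdagradOptimizer(0.05)
  optimizer = tf1.tpu.CrossShardOptimizer(optimizer)
  train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())
  return tf1.estimator.tpu.TPUEstimatorSpec(mode, loss=loss, train_op=train_op)
```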
With those functions defined, create a tf.distribute.cluster_resolver.TPUClusterResolver that provides the cluster information, and a tf.compat.v1.estimator.tpu.RunConfig object.
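For example, a sketch assuming a Colab-style TPU that is reachable with an empty tpu argument; iterations_per_loop=10 is an illustrative choice:

```python
cluster_resolver = tf1.distribute.cluster_resolver.TPUClusterResolver(tpu='')
print("All devices: ", tf1.config.list_logical_devices('TPU'))

tpu_config = tf1.estimator.tpu.TPUConfig(
    iterations_per_loop=10,
    per_host_input_for_training=(
        tf1.estimator.tpu.InputPipelineConfig.PER_HOST_V2))
run_config = tf1.estimator.tpu.RunConfig(
    cluster=cluster_resolver,
    save_checkpoints_steps=None,  # Skip checkpoint saving for simplicity.
    tpu_config=tpu_config)
```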
Along with the model function you have defined, you can now create a TPUEstimator. Here, you will simplify the flow by skipping checkpoint saving. Then, you will specify the batch size for both training and evaluation for the TPUEstimator.
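A sketch, with an illustrative batch size of 8 for both training and evaluation:

```python
estimator = tf1.estimator.tpu.TPUEstimator(
    model_fn=_model_fn, config=run_config, params={},
    train_batch_size=8, eval_batch_size=8,
    embedding_config_spec=embedding_config_spec)
```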
Call TPUEstimator.train to begin training the model:
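For example, training for a single step on the toy data:

```python
estimator.train(_input_fn, steps=1)
```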
Then, call TPUEstimator.evaluate to evaluate the model using the evaluation data:
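Again, a single step suffices for the toy evaluation data:

```python
estimator.evaluate(_eval_input_fn, steps=1)
```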
TensorFlow 2: Train embeddings on TPUs with TPUStrategy
In TensorFlow 2, to train on the TPU workers, use tf.distribute.TPUStrategy together with the Keras APIs for model definition and training/evaluation. (Refer to the Use TPUs guide for more examples of training with Keras Model.fit and a custom training loop with tf.function and tf.GradientTape.)
Since you need to perform some initialization work to connect to the remote cluster and initialize the TPU workers, start by creating a TPUClusterResolver to provide the cluster information and connect to the cluster. (Learn more in the TPU initialization section of the Use TPUs guide.)
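A sketch, again assuming a Colab-style TPU addressable with an empty tpu argument:

```python
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))
```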
Next, prepare your data. This is similar to how you created a dataset in the TensorFlow 1 example, except the dataset function is now passed a tf.distribute.InputContext object rather than a params dict. You can use this object to determine the local batch size (and which host this pipeline is for, so you can properly partition your data).
When using the tfrs.layers.embedding.TPUEmbedding API, it is important to include the drop_remainder=True option when batching the dataset with Dataset.batch, since TPUEmbedding requires a fixed batch size. Additionally, the same batch size must be used for evaluation and training if they are taking place on the same set of devices.
Finally, you should use tf.keras.utils.experimental.DatasetCreator along with the special input option experimental_fetch_to_device=False in tf.distribute.InputOptions (which holds strategy-specific configurations). This is demonstrated below:
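A sketch that reuses the toy data from the setup section; the global batch size of 8 is illustrative:

```python
global_batch_size = 8

def _input_dataset(context: tf.distribute.InputContext):
  dataset = tf.data.Dataset.from_tensor_slices((
      {"dense_feature": features,
       "sparse_feature": tf.SparseTensor(
           embedding_features_indices,
           embedding_features_values, [1, 2])},
      labels))
  dataset = dataset.shuffle(10).repeat()
  # Use the InputContext to derive the local (per-replica) batch size, and
  # drop the remainder so TPUEmbedding sees a fixed batch size.
  dataset = dataset.batch(
      context.get_per_replica_batch_size(global_batch_size),
      drop_remainder=True)
  return dataset.prefetch(2)

def _eval_dataset(context: tf.distribute.InputContext):
  dataset = tf.data.Dataset.from_tensor_slices((
      {"dense_feature": eval_features,
       "sparse_feature": tf.SparseTensor(
           eval_embedding_features_indices,
           eval_embedding_features_values, [1, 2])},
      eval_labels))
  dataset = dataset.repeat()
  dataset = dataset.batch(
      context.get_per_replica_batch_size(global_batch_size),
      drop_remainder=True)
  return dataset.prefetch(2)

# Keep the data on the host so the TPUEmbedding layer can enqueue it itself.
input_options = tf.distribute.InputOptions(
    experimental_fetch_to_device=False)

input_dataset = tf.keras.utils.experimental.DatasetCreator(
    _input_dataset, input_options=input_options)

eval_dataset = tf.keras.utils.experimental.DatasetCreator(
    _eval_dataset, input_options=input_options)
```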
Next, once the data is prepared, you will create a TPUStrategy, and define a model, metrics, and an optimizer under the scope of this strategy (Strategy.scope).
You should pick a number for steps_per_execution in Model.compile since it specifies the number of batches to run during each tf.function call, and is critical for performance. This argument is similar to iterations_per_loop used in TPUEstimator.
The features and table configuration that were specified in TensorFlow 1 via tf.tpu.experimental.embedding_column (and tf.tpu.experimental.shared_embedding_column) can be specified directly in TensorFlow 2 via a pair of configuration objects:
tf.tpu.experimental.embedding.FeatureConfig
tf.tpu.experimental.embedding.TableConfig
(Refer to the associated API documentation for more details.)
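The following sketch ties these pieces together under the strategy scope. The vocabulary size (10), embedding width (5), Adagrad optimizer, single-unit dense head, and steps_per_execution=10 mirror the illustrative choices used earlier and are not requirements; depending on your TensorFlow version you may need the tf.keras.optimizers.legacy variant of the optimizer:

```python
strategy = tf.distribute.TPUStrategy(cluster_resolver)

with strategy.scope():
  # Optimizer used for both the embedding table and the dense layers.
  # (On newer TensorFlow versions: tf.keras.optimizers.legacy.Adagrad.)
  optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)

  # Table and feature configuration for the TPU embedding.
  table_config = tf.tpu.experimental.embedding.TableConfig(
      vocabulary_size=10,
      dim=5)
  feature_config = tf.tpu.experimental.embedding.FeatureConfig(
      table=table_config)

  # Build the model with the Keras functional API. The TPUEmbedding layer
  # turns the sparse IDs into dense embedding activations.
  dense_input = tf.keras.Input(shape=(2,), dtype=tf.float32,
                               batch_size=global_batch_size)
  sparse_input = tf.keras.Input(shape=(), dtype=tf.int32,
                                batch_size=global_batch_size)
  embedded_input = tfrs.layers.embedding.TPUEmbedding(
      feature_config=feature_config, optimizer=optimizer)(sparse_input)
  concatenated = tf.keras.layers.Concatenate(axis=1)(
      [dense_input, embedded_input])
  logits = tf.keras.layers.Dense(1)(concatenated)
  model = tf.keras.Model(
      inputs={"dense_feature": dense_input, "sparse_feature": sparse_input},
      outputs=logits)
  model.compile(optimizer, "mse", steps_per_execution=10)
```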
With that, you are ready to train the model with the training dataset:
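For example, with an illustrative number of epochs and steps per epoch:

```python
model.fit(input_dataset, epochs=5, steps_per_epoch=10)
```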
Finally, evaluate the model using the evaluation dataset:
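Evaluation uses the same global batch size; a single step suffices for the toy data:

```python
model.evaluate(eval_dataset, steps=1, return_dict=True)
```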
Next steps
Learn more about setting up TPU-specific embeddings in the API docs:
tfrs.layers.embedding.TPUEmbedding: particularly about feature and table configuration, setting the optimizer, creating a model (using the Keras functional API or via subclassing tf.keras.Model), training/evaluation, and model serving with tf.saved_model
tf.tpu.experimental.embedding.TableConfig
tf.tpu.experimental.embedding.FeatureConfig
For more information about TPUStrategy in TensorFlow 2, consider the following resources:
Guide: Use TPUs (covering training with Keras Model.fit/a custom training loop with tf.distribute.TPUStrategy, as well as tips on improving the performance with tf.function)
To learn more about customizing your training, refer to the Keras guides on customizing what happens in Model.fit and on writing a training loop from scratch.
TPUs—Google's specialized ASICs for machine learning—are available through Google Colab, the TPU Research Cloud, and Cloud TPU.