Introduction to Keras for Researchers
Author: fchollet
Date created: 2020/04/01
Last modified: 2020/10/02
Description: Everything you need to know to use Keras & TensorFlow for deep learning research.
Setup
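The snippets throughout this guide assume TensorFlow 2.x and the following imports:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
```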
Introduction
Are you a machine learning researcher? Do you publish at NeurIPS and push the state-of-the-art in CV and NLP? This guide will serve as your first introduction to core Keras & TensorFlow API concepts.
In this guide, you will learn about:
- Tensors, variables, and gradients in TensorFlow
- Creating layers by subclassing the `Layer` class
- Writing low-level training loops
- Tracking losses created by layers via the `add_loss()` method
- Tracking metrics in a low-level training loop
- Speeding up execution with a compiled `tf.function`
- Executing layers in training or inference mode
- The Keras Functional API
You will also see the Keras API in action in two end-to-end research examples: a Variational Autoencoder, and a Hypernetwork.
Tensors
TensorFlow is an infrastructure layer for differentiable programming. At its heart, it's a framework for manipulating N-dimensional arrays (tensors), much like NumPy.
However, there are three key differences between NumPy and TensorFlow:
TensorFlow can leverage hardware accelerators such as GPUs and TPUs.
TensorFlow can automatically compute the gradient of arbitrary differentiable tensor expressions.
TensorFlow computation can be distributed to large numbers of devices on a single machine, and to large numbers of machines (potentially with multiple devices each).
Let's take a look at the object that is at the core of TensorFlow: the Tensor.
Here's a constant tensor:
You can get its value as a NumPy array by calling `.numpy()`:
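For example (a minimal sketch):

```python
import tensorflow as tf

# A constant tensor is immutable, much like a read-only NumPy array.
x = tf.constant([[5, 2], [1, 3]])
print(x)

# .numpy() returns the value as a NumPy ndarray.
print(x.numpy())
```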
Much like a NumPy array, it features the attributes `dtype` and `shape`:
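```python
import tensorflow as tf

x = tf.constant([[5, 2], [1, 3]])
print("dtype:", x.dtype)
print("shape:", x.shape)
```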
A common way to create constant tensors is via `tf.ones` and `tf.zeros` (just like `np.ones` and `np.zeros`):
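```python
import tensorflow as tf

print(tf.ones(shape=(2, 1)))
print(tf.zeros(shape=(2, 1)))
```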
You can also create random constant tensors:
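For instance, from a normal or a uniform distribution:

```python
import tensorflow as tf

# Samples from a normal distribution.
x = tf.random.normal(shape=(2, 2), mean=0.0, stddev=1.0)

# Integer samples from a uniform distribution.
y = tf.random.uniform(shape=(2, 2), minval=0, maxval=10, dtype="int32")
```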
Variables
Variables are special tensors used to store mutable state (such as the weights of a neural network). You create a `Variable` using some initial value:
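```python
import tensorflow as tf

initial_value = tf.random.normal(shape=(2, 2))
a = tf.Variable(initial_value)
print(a)
```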
You update the value of a `Variable` by using the methods `.assign(value)`, `.assign_add(increment)`, or `.assign_sub(decrement)`:
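```python
import tensorflow as tf

a = tf.Variable(tf.zeros(shape=(2, 2)))
new_value = tf.ones(shape=(2, 2))

a.assign(new_value)      # a is now all ones
a.assign_add(new_value)  # a += new_value
a.assign_sub(new_value)  # a -= new_value
print(a.numpy())
```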
Doing math in TensorFlow
If you've used NumPy, doing math in TensorFlow will look very familiar. The main difference is that your TensorFlow code can run on GPU and TPU.
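A few familiar-looking operations as a sketch:

```python
import tensorflow as tf

a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

c = a + b            # elementwise addition
d = tf.square(c)     # elementwise square
e = tf.exp(d)        # elementwise exponential
f = tf.matmul(a, b)  # matrix product
```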
Gradients
Here's another big difference with NumPy: you can automatically retrieve the gradient of any differentiable expression.
Just open a `GradientTape`, start "watching" a tensor via `tape.watch()`, and compose a differentiable expression using this tensor as input:
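```python
import tensorflow as tf

a = tf.random.normal(shape=(2, 2))
with tf.GradientTape() as tape:
    tape.watch(a)  # constant tensors are not watched by default
    c = tf.reduce_sum(tf.square(a))
# Gradient of c with respect to a: dc/da = 2 * a
dc_da = tape.gradient(c, a)
print(dc_da)
```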
By default, variables are watched automatically, so you don't need to manually `watch` them:
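```python
import tensorflow as tf

a = tf.Variable(tf.random.normal(shape=(2, 2)))
with tf.GradientTape() as tape:
    c = tf.reduce_sum(tf.square(a))  # no tape.watch(a) needed
dc_da = tape.gradient(c, a)
```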
Note that you can compute higher-order derivatives by nesting tapes:
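For example, the second derivative of `sum(a ** 2)` is constant:

```python
import tensorflow as tf

a = tf.Variable(tf.random.normal(shape=(2, 2)))
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as tape:
        c = tf.reduce_sum(tf.square(a))
    dc_da = tape.gradient(c, a)          # first derivative: 2 * a
d2c_da2 = outer_tape.gradient(dc_da, a)  # second derivative: 2
print(d2c_da2)
```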
Keras layers
While TensorFlow is an infrastructure layer for differentiable programming, dealing with tensors, variables, and gradients, Keras is a user interface for deep learning, dealing with layers, models, optimizers, loss functions, metrics, and more.
Keras serves as the high-level API for TensorFlow: Keras is what makes TensorFlow simple and productive.
The `Layer` class is the fundamental abstraction in Keras. A `Layer` encapsulates a state (weights) and some computation (defined in the `call` method).
A simple layer looks like this. The `self.add_weight()` method gives you a shortcut for creating weights:
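A sketch of such a layer (the name `Linear` and the default sizes are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

class Linear(keras.layers.Layer):
    """y = x.w + b"""

    def __init__(self, units=32, input_dim=32):
        super().__init__()
        # add_weight() creates a tf.Variable and registers it on the layer.
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
```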
You would use a `Layer` instance much like a Python function:
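For instance, with a built-in `Dense` layer:

```python
import tensorflow as tf
from tensorflow import keras

# Instantiate the layer once, then call it on inputs like a function.
linear_layer = keras.layers.Dense(units=4)
y = linear_layer(tf.ones((2, 2)))
print(y.shape)
```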
The weight variables (created in `__init__`) are automatically tracked under the `weights` property:
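A sketch (note that a built-in `Dense` layer creates its weights lazily on first call, but they are tracked the same way):

```python
import tensorflow as tf
from tensorflow import keras

layer = keras.layers.Dense(units=4)
layer(tf.ones((2, 2)))  # the call creates the kernel and bias
print([w.shape for w in layer.weights])  # kernel (2, 4) and bias (4,)
```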
You have many built-in layers available, from `Dense` to `Conv2D` to `LSTM` to fancier ones like `Conv3DTranspose` or `ConvLSTM2D`. Be smart about reusing built-in functionality.
Layer weight creation in `build(input_shape)`
It's often a good idea to defer weight creation to the `build()` method, so that you don't need to specify the input dim/shape at layer construction time:
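A sketch of the same `Linear`-style layer, deferring weight creation to `build()`:

```python
import tensorflow as tf
from tensorflow import keras

class Linear(keras.layers.Layer):
    """y = x.w + b, with weights created lazily in build()."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Called automatically on the first __call__, with the input's shape.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

# No input_dim needed at construction time:
linear_layer = Linear(units=4)
y = linear_layer(tf.ones((2, 2)))
```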
Layer gradients
You can automatically retrieve the gradients of the weights of a layer by calling it inside a `GradientTape`. Using these gradients, you can update the weights of the layer, either manually, or using an optimizer object. Of course, you can modify the gradients before using them, if you need to.
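A sketch of one such update step (a single `Dense` layer and random stand-in data, for brevity):

```python
import tensorflow as tf
from tensorflow import keras

layer = keras.layers.Dense(10)
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Random stand-in batch: 32 flattened 784-feature samples, 10 classes.
x = tf.random.normal((32, 784))
y = tf.random.uniform((32,), minval=0, maxval=10, dtype="int32")

with tf.GradientTape() as tape:
    logits = layer(x)
    loss = loss_fn(y, logits)

# Gradients of the loss with respect to the layer's weights.
gradients = tape.gradient(loss, layer.trainable_weights)
optimizer.apply_gradients(zip(gradients, layer.trainable_weights))
```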
Trainable and non-trainable weights
Weights created by layers can be either trainable or non-trainable. They're exposed in `trainable_weights` and `non_trainable_weights` respectively. Here's a layer with a non-trainable weight:
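A sketch (the `ComputeSum` layer keeps a running sum of its inputs in a non-trainable weight):

```python
import tensorflow as tf
from tensorflow import keras

class ComputeSum(keras.layers.Layer):
    """Returns the running sum of its inputs."""

    def __init__(self, input_dim):
        super().__init__()
        # Non-trainable: updated manually, ignored by gradient descent.
        self.total = self.add_weight(
            shape=(input_dim,), initializer="zeros", trainable=False
        )

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total

my_sum = ComputeSum(2)
x = tf.ones((2, 2))
print(my_sum(x).numpy())  # [2. 2.]
print(my_sum(x).numpy())  # [4. 4.]
```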
Layers that own layers
Layers can be recursively nested to create bigger computation blocks. Each layer will track the weights of its sublayers (both trainable and non-trainable).
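A sketch of such a nested layer (the name `MLP` and the layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

class MLP(keras.layers.Layer):
    """A stack of Dense layers; sublayer weights are tracked automatically."""

    def __init__(self):
        super().__init__()
        self.dense_1 = keras.layers.Dense(32, activation="relu")
        self.dense_2 = keras.layers.Dense(32, activation="relu")
        self.dense_3 = keras.layers.Dense(10)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.dense_2(x)
        return self.dense_3(x)

mlp = MLP()
y = mlp(tf.ones(shape=(3, 64)))  # the first call creates the weights
print(len(mlp.weights))          # 3 kernels + 3 biases
```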
Note that our manually-created MLP above is equivalent to the following built-in option:
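Assuming an MLP that stacks two 32-unit relu layers and a 10-unit output layer, the built-in equivalent is:

```python
import tensorflow as tf
from tensorflow import keras

mlp = keras.Sequential(
    [
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(10),
    ]
)
```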
Tracking losses created by layers
Layers can create losses during the forward pass via the `add_loss()` method. This is especially useful for regularization losses. The losses created by sublayers are recursively tracked by the parent layers.
Here's a layer that creates an activity regularization loss:
Any model incorporating this layer will track this regularization loss:
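A sketch of both pieces together (the `ActivityRegularization` layer and a parent `SparseMLP` that incorporates it; names and sizes are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

class ActivityRegularization(keras.layers.Layer):
    """Passes inputs through, adding a loss proportional to their magnitude."""

    def __init__(self, rate=1e-2):
        super().__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(tf.abs(inputs)))
        return inputs

class SparseMLP(keras.layers.Layer):
    """Dense -> ActivityRegularization -> Dense."""

    def __init__(self):
        super().__init__()
        self.dense_1 = keras.layers.Dense(32, activation="relu")
        self.regularization = ActivityRegularization(1e-2)
        self.dense_2 = keras.layers.Dense(10)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.regularization(x)
        return self.dense_2(x)

mlp = SparseMLP()
y = mlp(tf.ones((4, 64)))
print(mlp.losses)  # one scalar loss, from the last forward pass
```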
These losses are cleared by the top-level layer at the start of each forward pass -- they don't accumulate. `layer.losses` always contains only the losses created during the last forward pass. You would typically use these losses by summing them before computing your gradients when writing a training loop.
Keeping track of training metrics
Keras offers a broad range of built-in metrics, like `keras.metrics.AUC` or `keras.metrics.PrecisionAtRecall`. It's also easy to create your own metrics in a few lines of code.
To use a metric in a custom training loop, you would:

- Instantiate the metric object, e.g. `metric = keras.metrics.AUC()`
- Call its `metric.update_state(targets, predictions)` method for each batch of data
- Query its result via `metric.result()`
- Reset the metric's state at the end of an epoch or at the start of an evaluation via `metric.reset_state()`
Here's a simple example:
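A sketch of that cycle (random stand-in targets and predictions, with `BinaryAccuracy` as the metric):

```python
import tensorflow as tf
from tensorflow import keras

accuracy = keras.metrics.BinaryAccuracy()

# Stand-in targets/predictions for two "batches".
for _ in range(2):
    targets = tf.random.uniform((8, 1), minval=0, maxval=2, dtype="int32")
    predictions = tf.random.uniform((8, 1))
    accuracy.update_state(targets, predictions)

print("accuracy so far:", accuracy.result().numpy())
accuracy.reset_state()  # e.g. at the end of an epoch
```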
You can also define your own metrics by subclassing `keras.metrics.Metric`. You need to override the three functions called above:

- Override `update_state()` to update the statistic values.
- Override `result()` to return the metric value.
- Override `reset_state()` to reset the metric to its initial state.
Here is an example where we implement the F1-score metric (with support for sample weighting).
Let's test-drive it:
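A sketch of a binary F1-score metric along those lines (treat the thresholding and epsilon details as illustrative choices, not the notebook's exact code):

```python
import tensorflow as tf
from tensorflow import keras

class F1Score(keras.metrics.Metric):
    """Binary F1 score, with support for sample weighting."""

    def __init__(self, name="f1_score", threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold
        self.tp = self.add_weight(name="tp", initializer="zeros")
        self.fp = self.add_weight(name="fp", initializer="zeros")
        self.fn = self.add_weight(name="fn", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(tf.convert_to_tensor(y_true), "float32")
        y_pred = tf.cast(tf.convert_to_tensor(y_pred), "float32")
        y_pred = tf.cast(y_pred >= self.threshold, "float32")
        if sample_weight is None:
            sample_weight = tf.ones_like(y_true)
        sample_weight = tf.cast(sample_weight, "float32")
        self.tp.assign_add(tf.reduce_sum(y_true * y_pred * sample_weight))
        self.fp.assign_add(tf.reduce_sum((1.0 - y_true) * y_pred * sample_weight))
        self.fn.assign_add(tf.reduce_sum(y_true * (1.0 - y_pred) * sample_weight))

    def result(self):
        precision = self.tp / (self.tp + self.fp + 1e-7)
        recall = self.tp / (self.tp + self.fn + 1e-7)
        return 2.0 * precision * recall / (precision + recall + 1e-7)

    def reset_state(self):
        for v in (self.tp, self.fp, self.fn):
            v.assign(0.0)

# Test-drive it:
m = F1Score()
m.update_state([0, 1, 1, 1], [1, 1, 1, 0])
print("f1:", float(m.result()))  # 2 TP, 1 FP, 1 FN -> ~0.667
m.reset_state()
```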
Compiled functions
Running eagerly is great for debugging, but you will get better performance by compiling your computation into static graphs. Static graphs are a researcher's best friends. You can compile any function by wrapping it in a `tf.function` decorator.
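A minimal sketch:

```python
import tensorflow as tf
from tensorflow import keras

dense = keras.layers.Dense(10)

@tf.function  # compiles the computation into a static graph
def compute(x):
    return tf.nn.relu(dense(x))

x = tf.random.normal((4, 16))
y = compute(x)  # the first call traces the function; later calls reuse the graph
```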
Training mode & inference mode
Some layers, in particular the `BatchNormalization` layer and the `Dropout` layer, have different behaviors during training and inference. For such layers, it is standard practice to expose a `training` (boolean) argument in the `call` method.
By exposing this argument in `call`, you enable the built-in training and evaluation loops (e.g. `fit()`) to correctly use the layer in training and inference modes.
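A sketch of a dropout-style layer with such an argument:

```python
import tensorflow as tf
from tensorflow import keras

class Dropout(keras.layers.Layer):
    """Zeroes out a fraction `rate` of the activations, during training only."""

    def __init__(self, rate):
        super().__init__()
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs  # identity at inference time

layer = Dropout(0.5)
x = tf.ones((2, 4))
train_out = layer(x, training=True)   # some activations dropped (and rescaled)
infer_out = layer(x, training=False)  # unchanged
```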
The Functional API for model-building
To build deep learning models, you don't have to use object-oriented programming all the time. All layers we've seen so far can also be composed functionally, like this (we call it the "Functional API"):
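A sketch of a small MLP built functionally (sizes are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

# Declare the shape (and optionally dtype) of the inputs with a symbolic Input.
inputs = keras.Input(shape=(16,), dtype="float32")

# Call layers on these symbolic objects; each call returns a new symbolic output.
x = keras.layers.Dense(32, activation="relu")(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(10)(x)

# A Model is itself callable, just like a layer.
model = keras.Model(inputs, outputs)
y = model(tf.ones((2, 16)))
```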
The Functional API tends to be more concise than subclassing, and provides a few other advantages (generally the same advantages that functional, typed languages provide over untyped OO development). However, it can only be used to define DAGs of layers -- recursive networks should be defined as Layer subclasses instead.
Learn more about the Functional API here.
In your research workflows, you may often find yourself mix-and-matching OO models and Functional models.
Note that the `Model` class also features built-in training & evaluation loops: `fit()`, `predict()` and `evaluate()` (configured via the `compile()` method). These built-in functions give you access to the following built-in training infrastructure features:
- Callbacks. You can leverage built-in callbacks for early-stopping, model checkpointing, and monitoring training with TensorBoard. You can also implement custom callbacks if needed.
- Distributed training. You can easily scale up your training to multiple GPUs, TPUs, or even multiple machines with the `tf.distribute` API -- with no changes to your code.
- Step fusing. With the `steps_per_execution` argument in `Model.compile()`, you can process multiple batches in a single `tf.function` call, which greatly improves device utilization on TPUs.
We won't go into the details, but we provide a simple code example below. It leverages the built-in training infrastructure to implement the MNIST example above.
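A sketch of such an example (random stand-in data replaces the MNIST arrays here, so it runs self-contained):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Random stand-in data in place of MNIST: 256 flattened 784-feature samples.
x_train = np.random.random((256, 784)).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))

inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(32, activation="relu")(inputs)
outputs = keras.layers.Dense(10)(x)
model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
history = model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=0)
```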
You can always subclass the `Model` class (it works exactly like subclassing `Layer`) if you want to leverage built-in training loops for your OO models. Just override `Model.train_step()` to customize what happens in `fit()` while retaining support for the built-in infrastructure features outlined above -- callbacks, zero-code distribution support, and step fusing support. You may also override `test_step()` to customize what happens in `evaluate()`, and override `predict_step()` to customize what happens in `predict()`. For more information, please refer to this guide.
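A minimal `train_step()` override might look like this (this sketch uses `Model.compute_loss`, available in recent TF/Keras versions; the exact hooks vary somewhat across versions):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            # Compute the loss configured in compile().
            loss = self.compute_loss(y=y, y_pred=y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")
history = model.fit(
    np.random.random((32, 8)), np.random.random((32, 1)), verbose=0
)
```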
End-to-end experiment example 1: variational autoencoders.
Here are some of the things you've learned so far:
- A `Layer` encapsulates a state (created in `__init__` or `build`) and some computation (defined in `call`).
- Layers can be recursively nested to create new, bigger computation blocks.
- You can easily write highly hackable training loops by opening a `GradientTape`, calling your model inside the tape's scope, then retrieving gradients and applying them via an optimizer.
- You can speed up your training loops using the `@tf.function` decorator.
- Layers can create and track losses (typically regularization losses) via `self.add_loss()`.
Let's put all of these things together into an end-to-end example: we're going to implement a Variational AutoEncoder (VAE). We'll train it on MNIST digits.
Our VAE will be a subclass of `Layer`, built as a nested composition of layers that subclass `Layer`. It will feature a regularization loss (KL divergence).
Below is our model definition.
First, we have an `Encoder` class, which uses a `Sampling` layer to map an MNIST digit to a latent-space triplet `(z_mean, z_log_var, z)`.
Next, we have a `Decoder` class, which maps the probabilistic latent space coordinates back to an MNIST digit.
Finally, our `VariationalAutoEncoder` composes together an encoder and a decoder, and creates a KL divergence regularization loss via `add_loss()`.
Now, let's write a training loop. Our training step is decorated with a `@tf.function` to compile into a super fast graph function.
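Putting the pieces together, here is a sketch of the `Sampling`/`Encoder`/`Decoder`/`VariationalAutoEncoder` definitions and the compiled training step (layer sizes follow the text; treat this as an illustrative reconstruction, not the notebook's exact code):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the latent vector."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

class Encoder(layers.Layer):
    """Maps a digit to (z_mean, z_log_var, z)."""

    def __init__(self, latent_dim=32, intermediate_dim=64):
        super().__init__()
        self.dense_proj = layers.Dense(intermediate_dim, activation="relu")
        self.dense_mean = layers.Dense(latent_dim)
        self.dense_log_var = layers.Dense(latent_dim)
        self.sampling = Sampling()

    def call(self, inputs):
        x = self.dense_proj(inputs)
        z_mean = self.dense_mean(x)
        z_log_var = self.dense_log_var(x)
        z = self.sampling((z_mean, z_log_var))
        return z_mean, z_log_var, z

class Decoder(layers.Layer):
    """Maps latent coordinates back to a digit."""

    def __init__(self, original_dim, intermediate_dim=64):
        super().__init__()
        self.dense_proj = layers.Dense(intermediate_dim, activation="relu")
        self.dense_output = layers.Dense(original_dim, activation="sigmoid")

    def call(self, inputs):
        x = self.dense_proj(inputs)
        return self.dense_output(x)

class VariationalAutoEncoder(layers.Layer):
    """Encoder + Decoder, with a KL divergence regularization loss."""

    def __init__(self, original_dim, intermediate_dim=64, latent_dim=32):
        super().__init__()
        self.encoder = Encoder(latent_dim, intermediate_dim)
        self.decoder = Decoder(original_dim, intermediate_dim)

    def call(self, inputs):
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        kl_loss = -0.5 * tf.reduce_mean(
            z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
        )
        self.add_loss(kl_loss)
        return reconstructed

vae = VariationalAutoEncoder(original_dim=784)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.MeanSquaredError()

@tf.function
def training_step(x):
    with tf.GradientTape() as tape:
        reconstructed = vae(x)
        loss = loss_fn(x, reconstructed)  # reconstruction loss
        loss += sum(vae.losses)           # + KL regularization loss
    grads = tape.gradient(loss, vae.trainable_weights)
    optimizer.apply_gradients(zip(grads, vae.trainable_weights))
    return loss
```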
As you can see, building and training this type of model in Keras is quick and painless.
End-to-end experiment example 2: hypernetworks.
Let's take a look at another kind of research experiment: hypernetworks.
The idea is to use a small deep neural network (the hypernetwork) to generate the weights for a larger network (the main network).
Let's implement a really trivial hypernetwork: we'll use a small 2-layer network to generate the weights of a larger 3-layer network.
This is our training loop. For each batch of data:
- We use `hypernetwork` to generate an array of weight coefficients, `weights_pred`
- We reshape these coefficients into kernel & bias tensors for the `main_network`
- We run the forward pass of the `main_network` to compute the actual MNIST predictions
- We run backprop through the weights of the `hypernetwork` to minimize the final classification loss
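The steps above can be sketched as follows. Rather than mutating the main network's weight attributes, this self-contained version applies the generated weights functionally; `main_network_forward` and the constant seed input to the hypernetwork are illustrative choices:

```python
import tensorflow as tf
from tensorflow import keras

input_dim = 784
hidden_dim = 64
classes = 10

# Total number of scalar weights the main network needs.
num_weights = input_dim * hidden_dim + hidden_dim + hidden_dim * classes + classes

# The small hypernetwork outputs one coefficient per main-network weight.
hypernetwork = keras.Sequential(
    [
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(num_weights, activation="sigmoid"),
    ]
)

def main_network_forward(x, weights_pred):
    """Slices the generated vector into kernels/biases and runs the MLP."""
    i = 0
    w1 = tf.reshape(
        weights_pred[i : i + input_dim * hidden_dim], (input_dim, hidden_dim)
    )
    i += input_dim * hidden_dim
    b1 = weights_pred[i : i + hidden_dim]
    i += hidden_dim
    w2 = tf.reshape(
        weights_pred[i : i + hidden_dim * classes], (hidden_dim, classes)
    )
    i += hidden_dim * classes
    b2 = weights_pred[i : i + classes]
    x = tf.nn.relu(tf.matmul(x, w1) + b1)
    return tf.matmul(x, w2) + b2  # class logits

optimizer = keras.optimizers.Adam(learning_rate=1e-4)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(x, y):
    with tf.GradientTape() as tape:
        # Generate one set of main-network weights from a constant seed input.
        weights_pred = hypernetwork(tf.ones((1, 4)))[0]
        logits = main_network_forward(x, weights_pred)
        loss = loss_fn(y, logits)
    # Backprop flows through the generated weights into the hypernetwork.
    grads = tape.gradient(loss, hypernetwork.trainable_weights)
    optimizer.apply_gradients(zip(grads, hypernetwork.trainable_weights))
    return loss
```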
Implementing arbitrary research ideas with Keras is straightforward and highly productive. Imagine trying out 25 ideas per day (20 minutes per experiment on average)!
Keras has been designed to go from idea to results as fast as possible, because we believe this is the key to doing great research.
We hope you enjoyed this quick introduction. Let us know what you build with Keras!