Losses
The purpose of loss functions is to compute the quantity that a model should seek to minimize during training.
Available losses
Note that all losses are available both via a class handle and via a function handle. The class handles enable you to pass configuration arguments to the constructor (e.g. `loss_fn = CategoricalCrossentropy(from_logits=True)`), and they perform reduction by default when used in a standalone way (see details below).
{{toc}}
Base Loss API
{{autogenerated}}
Usage of losses with `compile()` & `fit()`
A loss function is one of the two arguments required for compiling a Keras model:
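A minimal sketch (the layer sizes, optimizer, and loss choice here are illustrative):

```python
import keras

# The loss is one of the two required arguments to compile()
# (the other being the optimizer).
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10),
])
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```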
All built-in loss functions may also be passed via their string identifier:
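For example, using a made-up one-layer model:

```python
import keras

model = keras.Sequential([keras.layers.Dense(1)])
# "mse" is the string identifier for mean squared error:
model.compile(optimizer="sgd", loss="mse")
```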
Loss functions are typically created by instantiating a loss class (e.g. `keras.losses.SparseCategoricalCrossentropy`). All losses are also provided as function handles (e.g. `keras.losses.sparse_categorical_crossentropy`).
Using classes enables you to pass configuration arguments at instantiation time, e.g.:
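For instance (the label and logit values below are made up for illustration):

```python
import numpy as np
import keras

# from_logits=True tells the loss to expect raw, unnormalized scores
# rather than probabilities.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

y_true = np.array([0, 1])                      # integer class labels
y_pred = np.array([[2.0, -1.0], [-1.0, 2.0]])  # logits
loss = loss_fn(y_true, y_pred)  # scalar, averaged over the batch by default
```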
Standalone usage of losses
A loss is a callable with arguments `loss_fn(y_true, y_pred, sample_weight=None)`:
- `y_true`: Ground truth values, of shape `(batch_size, d0, ... dN)`. For sparse loss functions, such as sparse categorical crossentropy, the shape should be `(batch_size, d0, ... dN-1)`.
- `y_pred`: The predicted values, of shape `(batch_size, d0, .. dN)`.
- `sample_weight`: Optional `sample_weight` acts as a reduction weighting coefficient for the per-sample losses. If a scalar is provided, then the loss is simply scaled by the given value. If `sample_weight` is a tensor of size `[batch_size]`, then the total loss for each sample of the batch is rescaled by the corresponding element in the `sample_weight` vector. If the shape of `sample_weight` is `(batch_size, d0, ... dN-1)` (or can be broadcast to this shape), then each loss element of `y_pred` is scaled by the corresponding value of `sample_weight`. (Note on `dN-1`: all loss functions reduce by 1 dimension, usually `axis=-1`.)
By default, loss functions return one scalar loss value for each input sample in the batch dimension, e.g.
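For instance, with small made-up tensors, the `mean_squared_error` function handle returns one value per sample:

```python
import numpy as np
import keras

y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2]])

# The function handle performs no reduction: one loss per sample.
per_sample = keras.losses.mean_squared_error(y_true, y_pred)
# per_sample has shape (2,): [0.01, 0.04]
```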
However, loss class instances feature a `reduction` constructor argument, which defaults to `"sum_over_batch_size"` (i.e. average). Allowable values are `"sum_over_batch_size"`, `"sum"`, and `"none"`:
- `"sum_over_batch_size"` means the loss instance will return the average of the per-sample losses in the batch.
- `"sum"` means the loss instance will return the sum of the per-sample losses in the batch.
- `"none"` means the loss instance will return the full array of per-sample losses.
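Using small made-up tensors, the three reduction modes compare as follows:

```python
import numpy as np
import keras

y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2]])

mse_none = keras.losses.MeanSquaredError(reduction="none")
mse_sum = keras.losses.MeanSquaredError(reduction="sum")
mse_avg = keras.losses.MeanSquaredError()  # default: "sum_over_batch_size"

per_sample = mse_none(y_true, y_pred)  # [0.01, 0.04]
total = mse_sum(y_true, y_pred)        # 0.05
average = mse_avg(y_true, y_pred)      # 0.025
```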
Note that this is an important difference between loss functions like `keras.losses.mean_squared_error` and default loss class instances like `keras.losses.MeanSquaredError`: the function version does not perform reduction, but by default the class instance does.
When using `fit()`, this difference is irrelevant since reduction is handled by the framework.
Here's how you would use a loss class instance as part of a simple training loop:
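A minimal sketch of such a loop, assuming the TensorFlow backend (the model shape and data are placeholders):

```python
import numpy as np
import tensorflow as tf
import keras

model = keras.Sequential([keras.layers.Dense(10)])
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = keras.optimizers.Adam()

# Placeholder data:
x = np.random.random((32, 8)).astype("float32")
y = np.random.randint(0, 10, size=(32,))

for step in range(3):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        # The loss class instance reduces per-sample losses to a scalar.
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
```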
Creating custom losses
Any callable with the signature `loss_fn(y_true, y_pred)` that returns an array of losses (one per sample in the input batch) can be passed to `compile()` as a loss. Note that sample weighting is automatically supported for any such loss.
Here's a simple example:
The `add_loss()` API
Loss functions applied to the output of a model aren't the only way to create losses.
When writing the `call` method of a custom layer or a subclassed model, you may want to compute scalar quantities that you want to minimize during training (e.g. regularization losses). You can use the `add_loss()` layer method to keep track of such loss terms.
Here's an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:
Loss values added via `add_loss` can be retrieved in the `.losses` list property of any `Layer` or `Model` (they are recursively retrieved from every underlying layer):
These losses are cleared by the top-level layer at the start of each forward pass -- they don't accumulate. So `layer.losses` always contains only the losses created during the last forward pass. When writing a training loop, you would typically use these losses by summing them before computing your gradients.
When using `model.fit()`, such loss terms are handled automatically.
When writing a custom training loop, you should retrieve these terms by hand from `model.losses`, like this:
See the `add_loss()` documentation for more details.