Path: blob/master/guides/ipynb/keras_tuner/getting_started.ipynb
Getting started with KerasTuner
Authors: Luca Invernizzi, James Long, Francois Chollet, Tom O'Malley, Haifeng Jin
Date created: 2019/05/31
Last modified: 2021/10/27
Description: The basics of using KerasTuner to tune model hyperparameters.
Introduction
KerasTuner is a general-purpose hyperparameter tuning library. It has strong integration with Keras workflows, but it isn't limited to them: you could use it to tune scikit-learn models, or anything else. In this tutorial, you will see how to tune model architecture, training process, and data preprocessing steps with KerasTuner. Let's start from a simple example.
Tune the model architecture
The first thing we need to do is to write a function that returns a compiled Keras model. It takes an argument hp for defining the hyperparameters while building the model.
Define the search space
In the following code example, we define a Keras model with two Dense layers. We want to tune the number of units in the first Dense layer. We just define an integer hyperparameter with hp.Int('units', min_value=32, max_value=512, step=32), whose range is from 32 to 512 inclusive. When sampling from it, the minimum step for walking through the interval is 32.
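A minimal sketch of such a build function (the imports below are reused by the later examples; the layer sizes, loss, and optimizer are illustrative):

```python
import keras
import keras_tuner
import numpy as np
from keras import layers


def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            # Tune the number of units: an integer from 32 to 512 with step 32.
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            activation="relu",
        )
    )
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```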
You can quickly test if the model builds successfully.
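For example, by building it once with an empty set of hyperparameters:

```python
# Passing a fresh HyperParameters container builds the model with default values.
build_model(keras_tuner.HyperParameters())
```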
There are many other types of hyperparameters as well. We can define multiple hyperparameters in the function. In the following code, we tune whether to use a Dropout layer with hp.Boolean(), tune which activation function to use with hp.Choice(), and tune the learning rate of the optimizer with hp.Float().
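A sketch combining these hyperparameter types (the ranges and choices are illustrative):

```python
def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            # Tune which activation function to use.
            activation=hp.Choice("activation", ["relu", "tanh"]),
        )
    )
    # Tune whether to use a Dropout layer.
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    # Tune the learning rate of the optimizer on a log scale.
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


build_model(keras_tuner.HyperParameters())
```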
As shown below, the hyperparameters are actual values. In fact, they are just functions returning actual values. For example, hp.Int() returns an int value. Therefore, you can put them into variables, for loops, or if conditions.
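A quick way to see this:

```python
hp = keras_tuner.HyperParameters()
units = hp.Int("units", min_value=32, max_value=512, step=32)
# `units` is a plain Python int, usable in variables, loops, and conditions.
print(type(units), units)
```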
You can also define the hyperparameters in advance and keep your Keras code in a separate function.
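A sketch of this pattern; the helper name call_existing_code and its arguments are ours, not part of any API:

```python
def call_existing_code(units, activation, dropout, lr):
    # Plain Keras code with no KerasTuner dependency.
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(layers.Dense(units=units, activation=activation))
    if dropout:
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def build_model(hp):
    # Define the hyperparameters up front, then pass plain values to the Keras code.
    units = hp.Int("units", min_value=32, max_value=512, step=32)
    activation = hp.Choice("activation", ["relu", "tanh"])
    dropout = hp.Boolean("dropout")
    lr = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    return call_existing_code(
        units=units, activation=activation, dropout=dropout, lr=lr
    )
```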
Each of the hyperparameters is uniquely identified by its name (the first argument). To tune the number of units in different Dense layers separately as different hyperparameters, we give them different names, such as f"units_{i}".

Notably, this is also an example of creating conditional hyperparameters. There are many hyperparameters specifying the number of units in the Dense layers. The number of such hyperparameters is decided by the number of layers, which is also a hyperparameter. Therefore, the total number of hyperparameters used may be different from trial to trial. Some hyperparameters are only used when a certain condition is satisfied. For example, units_3 is only used when num_layers is larger than 3. With KerasTuner, you can easily define such hyperparameters dynamically while creating the model.
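For instance, a sketch in which the number of units hyperparameters depends on num_layers (the ranges are illustrative):

```python
def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    # `num_layers` is itself a hyperparameter; it decides how many
    # `units_{i}` hyperparameters exist in a given trial.
    for i in range(hp.Int("num_layers", 1, 5)):
        model.add(
            layers.Dense(
                # Give each layer's units hyperparameter a unique name.
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation=hp.Choice("activation", ["relu", "tanh"]),
            )
        )
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.25))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```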
Start the search
After defining the search space, we need to select a tuner class to run the search. You may choose from RandomSearch, BayesianOptimization and Hyperband, which correspond to different tuning algorithms. Here we use RandomSearch as an example.
To initialize the tuner, we need to specify several arguments in the initializer (see the example sketch after this list).

- hypermodel. The model-building function, which is build_model in our case.
- objective. The name of the objective to optimize (whether to minimize or maximize is automatically inferred for built-in metrics). We will introduce how to use custom metrics later in this tutorial.
- max_trials. The total number of trials to run during the search.
- executions_per_trial. The number of models that should be built and fit for each trial. Different trials have different hyperparameter values. The executions within the same trial have the same hyperparameter values. The purpose of having multiple executions per trial is to reduce results variance and therefore be able to more accurately assess the performance of a model. If you want to get results faster, you could set executions_per_trial=1 (single round of training for each model configuration).
- overwrite. Control whether to overwrite the previous results in the same directory or resume the previous search instead. Here we set overwrite=True to start a new search and ignore any previous results.
- directory. A path to a directory for storing the search results.
- project_name. The name of the sub-directory in the directory.
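A sketch of the initialization; the directory and project names match the my_dir/helloworld path mentioned below, and the trial counts are illustrative:

```python
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective="val_accuracy",
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="my_dir",
    project_name="helloworld",
)
```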
You can print a summary of the search space:
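With the tuner defined above:

```python
tuner.search_space_summary()
```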
Before starting the search, let's prepare the MNIST dataset.
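A typical preparation sketch, holding out the last 10,000 training images for validation (the split size is arbitrary):

```python
(x, y), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x[:-10000]
x_val = x[-10000:]
y_train = y[:-10000]
y_val = y[-10000:]

# Scale pixels to [0, 1] and add a channel dimension.
x_train = np.expand_dims(x_train, -1).astype("float32") / 255.0
x_val = np.expand_dims(x_val, -1).astype("float32") / 255.0
x_test = np.expand_dims(x_test, -1).astype("float32") / 255.0

# One-hot encode the labels to match the categorical crossentropy loss.
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```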
Then, start the search for the best hyperparameter configuration. All the arguments passed to search are passed to model.fit() in each execution. Remember to pass validation_data to evaluate the model.
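A sketch of the call; the epoch count here is illustrative:

```python
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))
```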
During the search, the model-building function is called with different hyperparameter values in different trials. In each trial, the tuner generates a new set of hyperparameter values to build the model. The model is then fit and evaluated, and the metrics are recorded. The tuner progressively explores the space and finally finds a good set of hyperparameter values.
Query the results
When the search is over, you can retrieve the best model(s). The model is saved at its best performing epoch evaluated on the validation_data.
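For example:

```python
# Get the top 2 models.
models = tuner.get_best_models(num_models=2)
best_model = models[0]
best_model.summary()
```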
You can also print a summary of the search results.
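Likewise:

```python
tuner.results_summary()
```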
You will find detailed logs, checkpoints, etc., in the folder my_dir/helloworld, i.e. directory/project_name.
You can also visualize the tuning results using TensorBoard and the HParams plugin. For more information, please follow this link.
Retrain the model
If you want to train the model with the entire dataset, you may retrieve the best hyperparameters and retrain the model by yourself.
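A sketch of that workflow, reusing build_model and the MNIST arrays prepared above (the epoch count is illustrative):

```python
# Get the top hyperparameter sets.
best_hps = tuner.get_best_hyperparameters(5)
# Build the model with the best hyperparameters.
model = build_model(best_hps[0])
# Fit on the entire dataset (training + validation).
x_all = np.concatenate((x_train, x_val))
y_all = np.concatenate((y_train, y_val))
model.fit(x=x_all, y=y_all, epochs=1)
```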
Tune model training
To tune the model training process, we need to subclass the HyperModel class, which also makes it easy to share and reuse hypermodels.
We need to override HyperModel.build() and HyperModel.fit() to tune the model building and training process respectively. The HyperModel.build() method is the same as the model-building function, which creates a Keras model using the hyperparameters and returns it.
In HyperModel.fit(), you can access the model returned by HyperModel.build(), hp, and all the arguments passed to search(). You need to train the model and return the training history.
In the following code, we will tune the shuffle argument in model.fit().
It is generally not needed to tune the number of epochs because a built-in callback is passed to model.fit() to save the model at its best epoch evaluated by the validation_data.
Note: The **kwargs should always be passed to model.fit() because it contains the callbacks for model saving and the TensorBoard plugins.
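A sketch of such a hypermodel; the class name MyHyperModel is ours:

```python
class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            # Tune whether to shuffle the data in each epoch.
            shuffle=hp.Boolean("shuffle"),
            # Always forward **kwargs: it carries the model-saving and
            # TensorBoard callbacks added by the tuner.
            **kwargs,
        )
```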
Again, we can do a quick check to see if the code works correctly.
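For example, using random data with MNIST-like shapes:

```python
hp = keras_tuner.HyperParameters()
hypermodel = MyHyperModel()
model = hypermodel.build(hp)
hypermodel.fit(hp, model, np.random.rand(100, 28, 28), np.random.rand(100, 10))
```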
Tune data preprocessing
To tune data preprocessing, we just add an additional step in HyperModel.fit(), where we can access the dataset from the arguments. In the following code, we tune whether to normalize the data before training the model. This time we explicitly put x and y in the function signature because we need to use them.
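A sketch of this idea; a simple NumPy standardization stands in here for whatever preprocessing you need:

```python
class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, x, y, **kwargs):
        # Tune whether to normalize the data before training.
        if hp.Boolean("normalize"):
            x = (x - np.mean(x)) / (np.std(x) + 1e-7)
        return model.fit(x, y, **kwargs)


hp = keras_tuner.HyperParameters()
hypermodel = MyHyperModel()
model = hypermodel.build(hp)
hypermodel.fit(hp, model, np.random.rand(100, 28, 28), np.random.rand(100, 10))
```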
If a hyperparameter is used both in build() and fit(), you can define it in build() and use hp.get(hp_name) to retrieve it in fit(). We use the image size as an example. It is both used as the input shape in build(), and used by the data preprocessing step to crop the images in fit().
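A sketch of this pattern; the image_size range and cropping logic are illustrative, and the data is assumed to be image arrays of shape (num_samples, 28, 28):

```python
class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        # Define image_size here; it sets the input shape of the model.
        image_size = hp.Int("image_size", 10, 28)
        inputs = keras.Input(shape=(image_size, image_size))
        outputs = layers.Flatten()(inputs)
        outputs = layers.Dense(
            units=hp.Int("units", min_value=32, max_value=512, step=32),
            activation="relu",
        )(outputs)
        outputs = layers.Dense(10, activation="softmax")(outputs)
        model = keras.Model(inputs, outputs)
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, x, y, validation_data=None, **kwargs):
        # Retrieve the image_size value defined in build().
        image_size = hp.get("image_size")
        cropped_x = x[:, :image_size, :image_size]
        if validation_data:
            x_val, y_val = validation_data
            validation_data = (x_val[:, :image_size, :image_size], y_val)
        return model.fit(cropped_x, y, validation_data=validation_data, **kwargs)
```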
Retrain the model
Using HyperModel also allows you to retrain the best model by yourself.
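For instance, a sketch assuming a tuner has already been run with MyHyperModel, and x_all/y_all hold the full dataset:

```python
hypermodel = MyHyperModel()
best_hp = tuner.get_best_hyperparameters()[0]
model = hypermodel.build(best_hp)
hypermodel.fit(best_hp, model, x_all, y_all, epochs=1)
```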
Specify the tuning objective
In all the previous examples, we just used validation accuracy ("val_accuracy") as the tuning objective to select the best model. Actually, you can use any metric as the objective. The most commonly used metric is "val_loss", which is the validation loss.
Built-in metric as the objective
There are many other built-in metrics in Keras you can use as the objective. Here is a list of the built-in metrics.
To use a built-in metric as the objective, you need to follow these steps:
1. Compile the model with the built-in metric. For example, you want to use MeanAbsoluteError(). You need to compile the model with metrics=[MeanAbsoluteError()]. You may also use its name string instead: metrics=["mean_absolute_error"]. The name string of the metric is always the snake case of the class name.
2. Identify the objective name string. The name string of the objective is always in the format of f"val_{metric_name_string}". For example, the objective name string of mean absolute error evaluated on the validation data should be "val_mean_absolute_error".
3. Wrap it into keras_tuner.Objective. We usually need to wrap the objective into a keras_tuner.Objective object to specify the direction to optimize the objective. For example, if we want to minimize the mean absolute error, we can use keras_tuner.Objective("val_mean_absolute_error", "min"). The direction should be either "min" or "max".
4. Pass the wrapped objective to the tuner.
You can see the following bare-bones code example.
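A sketch of such a setup: a small regression model on random data, compiled with MeanAbsoluteError and tuned against "val_mean_absolute_error" (the model, data, and trial count are illustrative):

```python
def build_regressor(hp):
    model = keras.Sequential(
        [
            layers.Dense(units=hp.Int("units", 32, 128, 32), activation="relu"),
            layers.Dense(units=1),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="mean_squared_error",
        # The objective must be one of the compiled metrics.
        metrics=[keras.metrics.MeanAbsoluteError()],
    )
    return model


tuner = keras_tuner.RandomSearch(
    hypermodel=build_regressor,
    # The objective name string and the direction to optimize.
    objective=keras_tuner.Objective("val_mean_absolute_error", direction="min"),
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="built_in_metrics",
)

tuner.search(
    x=np.random.rand(100, 10),
    y=np.random.rand(100, 1),
    validation_data=(np.random.rand(20, 10), np.random.rand(20, 1)),
)
```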
Custom metric as the objective
You may implement your own metric and use it as the hyperparameter search objective. Here, we use mean squared error (MSE) as an example. First, we implement the MSE metric by subclassing keras.metrics.Metric. Remember to give a name to your metric using the name argument of super().__init__(), which will be used later. Note: MSE is actually a built-in metric, which can be imported with keras.metrics.MeanSquaredError. This is just an example to show how to use a custom metric as the hyperparameter search objective.
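A sketch of such a metric, assuming the TensorFlow backend for the tensor ops; the class name CustomMetric and the metric name "custom_metric" are ours:

```python
import tensorflow as tf


class CustomMetric(keras.metrics.Metric):
    def __init__(self, **kwargs):
        # The name "custom_metric" makes the validation objective
        # name string "val_custom_metric".
        super().__init__(name="custom_metric", **kwargs)
        self.sum = self.add_weight(name="sum", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Accumulate squared errors and element counts across batches.
        values = tf.square(tf.cast(y_true, "float32") - tf.cast(y_pred, "float32"))
        if sample_weight is not None:
            values *= tf.cast(sample_weight, "float32")
        self.sum.assign_add(tf.reduce_sum(values))
        self.count.assign_add(tf.cast(tf.size(values), "float32"))

    def result(self):
        # Mean squared error over everything seen so far.
        return self.sum / self.count

    def reset_state(self):
        self.sum.assign(0.0)
        self.count.assign(0.0)
```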
For more information about implementing custom metrics, please see this tutorial. If you would like a metric with a different function signature than update_state(y_true, y_pred, sample_weight), you can override the train_step() method of your model following this tutorial.
Run the search with the custom objective.
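A sketch of such a search, compiling the model with the custom metric above and pointing the objective at "val_custom_metric":

```python
def build_regressor(hp):
    model = keras.Sequential(
        [
            layers.Dense(units=hp.Int("units", 32, 128, 32), activation="relu"),
            layers.Dense(units=1),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="mean_squared_error",
        # Put the custom metric into the compiled metrics.
        metrics=[CustomMetric()],
    )
    return model


tuner = keras_tuner.RandomSearch(
    hypermodel=build_regressor,
    # The objective name uses the metric's name with the "val_" prefix.
    objective=keras_tuner.Objective("val_custom_metric", direction="min"),
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="custom_metrics",
)

tuner.search(
    x=np.random.rand(100, 10),
    y=np.random.rand(100, 1),
    validation_data=(np.random.rand(20, 10), np.random.rand(20, 1)),
)
```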
If your custom objective is hard to put into a custom metric, you can also evaluate the model by yourself in HyperModel.fit() and return the objective value. The objective value is minimized by default. In this case, you don't need to specify the objective when initializing the tuner. However, the metric value will then not be tracked in the Keras logs, only in the KerasTuner logs. Therefore, these values would not be displayed by any TensorBoard view that uses the Keras metrics.

If you have multiple metrics to track with KerasTuner, but only use one of them as the objective, you can return a dictionary, whose keys are the metric names and whose values are the metric values, for example, return {"metric_a": 1.0, "metric_b": 2.0}. Use one of the keys as the objective name, for example, keras_tuner.Objective("metric_a", "min").
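A sketch of this approach; the class name HyperRegressor is ours, and the evaluation is a plain mean absolute error computed with NumPy:

```python
class HyperRegressor(keras_tuner.HyperModel):
    def build(self, hp):
        model = keras.Sequential(
            [
                layers.Dense(units=hp.Int("units", 32, 128, 32), activation="relu"),
                layers.Dense(units=1),
            ]
        )
        model.compile(optimizer="adam", loss="mean_squared_error")
        return model

    def fit(self, hp, model, x, y, validation_data, **kwargs):
        model.fit(x, y, **kwargs)
        x_val, y_val = validation_data
        y_pred = model.predict(x_val)
        # Return a single float to be minimized,
        # or a dict such as {"metric_a": ..., "metric_b": ...}
        # and use one of its keys as the objective name.
        return np.mean(np.abs(y_pred - y_val))


tuner = keras_tuner.RandomSearch(
    hypermodel=HyperRegressor(),
    # No objective is needed: the return value of fit() is minimized.
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="custom_eval",
)
tuner.search(
    x=np.random.rand(100, 10),
    y=np.random.rand(100, 1),
    validation_data=(np.random.rand(20, 10), np.random.rand(20, 1)),
)
```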
Tune end-to-end workflows
In some cases, it is hard to align your code into build and fit functions. You can also keep your end-to-end workflow in one place by overriding Tuner.run_trial(), which gives you full control of a trial. You can see it as a black-box optimizer for anything.
Tune any function
For example, you can find a value of x that minimizes f(x)=x*x+1. In the following code, we just define x as a hyperparameter, and return f(x) as the objective value. The hypermodel and objective arguments for initializing the tuner can be omitted.
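A sketch of this, overriding run_trial() in a RandomSearch subclass (the class name MyTuner and the search range for x are ours):

```python
class MyTuner(keras_tuner.RandomSearch):
    def run_trial(self, trial, *args, **kwargs):
        # Get the hp object from the trial.
        hp = trial.hyperparameters
        # Define "x" as a hyperparameter.
        x = hp.Float("x", min_value=-1.0, max_value=1.0)
        # Return the objective value f(x) = x * x + 1 to minimize.
        return x * x + 1


tuner = MyTuner(
    # No hypermodel or objective specified.
    max_trials=20,
    overwrite=True,
    directory="my_dir",
    project_name="tune_anything",
)

# Nothing needs to be passed to search() unless run_trial() uses it.
tuner.search()
print(tuner.get_best_hyperparameters()[0].get("x"))
```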
Keep Keras code separate
You can keep all your Keras code unchanged and use KerasTuner to tune it. It is useful if you cannot modify the Keras code for some reason.
It also gives you more flexibility. You don't have to separate the model-building and training code. However, this workflow would not help you save the model or connect with the TensorBoard plugins.

To save the model, you can use trial.trial_id, which is a string that uniquely identifies a trial, to construct different paths to save the models from different trials.
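A sketch of this workflow; keras_code stands in for your unchanged Keras code, and the saving path format is just an example:

```python
import os


def keras_code(units, optimizer, saving_path):
    # Build, train, and save a model: ordinary Keras code.
    model = keras.Sequential(
        [
            layers.Dense(units=units, activation="relu"),
            layers.Dense(units=1),
        ]
    )
    model.compile(optimizer=optimizer, loss="mean_squared_error")

    x_train, y_train = np.random.rand(100, 10), np.random.rand(100, 1)
    x_val, y_val = np.random.rand(20, 10), np.random.rand(20, 1)
    model.fit(x_train, y_train)

    model.save(saving_path)

    # Return a single float (or a dict of metrics) as the objective value.
    y_pred = model.predict(x_val)
    return np.mean(np.abs(y_pred - y_val))


class MyTuner(keras_tuner.RandomSearch):
    def run_trial(self, trial, **kwargs):
        hp = trial.hyperparameters
        return keras_code(
            units=hp.Int("units", 32, 128, 32),
            optimizer=hp.Choice("optimizer", ["adam", "adadelta"]),
            # Use trial.trial_id to give each trial its own saving path.
            saving_path=os.path.join("/tmp", f"{trial.trial_id}.keras"),
        )
```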
KerasTuner includes pre-made tunable applications: HyperResNet and HyperXception
These are ready-to-use hypermodels for computer vision.
They come pre-compiled with loss="categorical_crossentropy" and metrics=["accuracy"].
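For instance, a sketch of tuning HyperResNet on image data (the input shape, class count, and trial count are illustrative):

```python
hypermodel = keras_tuner.applications.HyperResNet(input_shape=(28, 28, 1), classes=10)

tuner = keras_tuner.RandomSearch(
    hypermodel,
    objective="val_accuracy",
    max_trials=2,
    overwrite=True,
    directory="my_dir",
    project_name="built_in_hypermodel",
)
```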