Path: blob/master/guides/keras_tuner/failed_trials.py
"""1Title: Handling failed trials in KerasTuner2Authors: Haifeng Jin3Date created: 2023/02/284Last modified: 2023/02/285Description: The basics of fault tolerance configurations in KerasTuner.6Accelerator: GPU7"""89"""10## Introduction1112A KerasTuner program may take a long time to run since each model may take a13long time to train. We do not want the program to fail just because some trials14failed randomly.1516In this guide, we will show how to handle the failed trials in KerasTuner,17including:1819* How to tolerate the failed trials during the search20* How to mark a trial as failed during building and evaluating the model21* How to terminate the search by raising a `FatalError`22"""2324"""25## Setup26"""2728"""shell29pip install keras-tuner -q30"""3132import keras33from keras import layers34import keras_tuner35import numpy as np3637"""38## Tolerate failed trials3940We will use the `max_retries_per_trial` and `max_consecutive_failed_trials`41arguments when initializing the tuners.4243`max_retries_per_trial` controls the maximum number of retries to run if a trial44keeps failing. For example, if it is set to 3, the trial may run 4 times (145failed run + 3 failed retries) before it is finally marked as failed. The46default value of `max_retries_per_trial` is 0.4748`max_consecutive_failed_trials` controls how many consecutive failed trials49(failed trial here refers to a trial that failed all of its retries) occur50before terminating the search. For example, if it is set to 3 and Trial 2, Trial513, and Trial 4 all failed, the search would be terminated. However, if it is set52to 3 and only Trial 2, Trial 3, Trial 5, and Trial 6 fail, the search would not53be terminated since the failed trials are not consecutive. 
The default value of `max_consecutive_failed_trials` is 3.

The following code shows how these two arguments work in action.

* We define a search space with 2 hyperparameters for the number of units in
the 2 dense layers.
* When the model is too large (it has more than 1,200 parameters), we raise a
`ValueError`.
"""


def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        raise ValueError(f"Model too large! It contains {num_params} params.")
    return model


"""
We set up the tuner as follows.

* We set `max_retries_per_trial=3`.
* We set `max_consecutive_failed_trials=8`.
* We use `GridSearch` to enumerate all hyperparameter value combinations.
"""

tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()

"""
## Mark a trial as failed

When the model is too large, we do not need to retry it. No matter how many
times we try with the same hyperparameters, it is always too large.

Setting `max_retries_per_trial=0` would do it. However, then no failed trial
would ever be retried, no matter what error is raised, while we may still want
to retry on other unexpected errors.
Is there a way to better handle this situation?

We can raise a `FailedTrialError` to skip the retries. Whenever this error is
raised, the trial is not retried. The retries still run when other errors
occur. An example is shown below.
"""


def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the retries are skipped.
        raise keras_tuner.errors.FailedTrialError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()

"""
## Terminate the search programmatically

When there is a bug in the code, we should terminate the search immediately and
fix the bug. You can terminate the search programmatically when your defined
conditions are met.
Raising a `FatalError` (or one of its subclasses `FatalValueError`,
`FatalTypeError`, and `FatalRuntimeError`) terminates the search regardless of
the `max_consecutive_failed_trials` argument.

Following is an example that terminates the search when the model is too large.
"""


def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the search is terminated.
        raise keras_tuner.errors.FatalError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

try:
    # Use random data to train the model.
    tuner.search(
        x=np.random.rand(100, 20),
        y=np.random.rand(100, 1),
        validation_data=(
            np.random.rand(100, 20),
            np.random.rand(100, 1),
        ),
        epochs=10,
    )
except keras_tuner.errors.FatalError:
    print("The search is terminated.")

"""
## Takeaways

In this guide, you learned how to handle failed trials in KerasTuner:

* Use `max_retries_per_trial` to specify the number of retries for a failed
trial.
* Use `max_consecutive_failed_trials` to specify the maximum number of
consecutive failed trials to tolerate.
* Raise `FailedTrialError` to directly mark a trial as failed and skip the
retries.
* Raise `FatalError`, `FatalValueError`, `FatalTypeError`, or
`FatalRuntimeError` to terminate the search immediately.
"""
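"""
To recap how the pieces above fit together, the sketch below simulates the
retry and consecutive-failure accounting in a plain, framework-free loop. This
is illustrative pseudologic, not KerasTuner's implementation: the exception
classes and the `search_with_fault_tolerance` helper are hypothetical
stand-ins for `FailedTrialError`, `FatalError`, and the tuner's internal
search loop.
"""

```python
# Hypothetical stand-ins for KerasTuner's error classes.
class SkipRetriesError(Exception):
    """Like `FailedTrialError`: mark the trial failed without retrying."""


class TerminateSearchError(Exception):
    """Like `FatalError`: terminate the whole search immediately."""


def search_with_fault_tolerance(
    run_trial, trials, max_retries_per_trial=0, max_consecutive_failed_trials=3
):
    consecutive_failures = 0
    results = {}
    for trial in trials:
        for _ in range(1 + max_retries_per_trial):  # 1 run + the retries
            try:
                results[trial] = run_trial(trial)
                consecutive_failures = 0  # any success resets the streak
                break
            except TerminateSearchError:
                raise  # terminates the search, bypassing both counters
            except SkipRetriesError:
                break  # failed for good: skip the remaining retries
            except Exception:
                continue  # unexpected error: retry if attempts remain
        if trial not in results:  # every attempt failed
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failed_trials:
                raise RuntimeError("Too many consecutive failed trials.")
    return results


def run_trial(trial):
    if trial == "too_large":
        raise SkipRetriesError("skip retries for this trial")
    return "ok"


print(search_with_fault_tolerance(run_trial, ["a", "too_large", "b"]))
# prints {'a': 'ok', 'b': 'ok'}: "too_large" is marked failed without
# retries, but the failure streak never reaches 3.
```

With `max_retries_per_trial=3` and `max_consecutive_failed_trials=8`, as in
the tuners above, each trial would get up to 4 attempts and the search would
tolerate up to 7 consecutive failed trials.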