"""
Title: Handling failed trials in KerasTuner
Authors: Haifeng Jin
Date created: 2023/02/28
Last modified: 2023/02/28
Description: The basics of fault tolerance configurations in KerasTuner.
Accelerator: GPU
"""

"""
## Introduction

A KerasTuner program may take a long time to run since each model may take a
long time to train. We do not want the program to fail just because some
trials fail randomly.

In this guide, we will show how to handle failed trials in KerasTuner,
including:

* How to tolerate failed trials during the search
* How to mark a trial as failed while building and evaluating the model
* How to terminate the search by raising a `FatalError`
"""

"""
## Setup
"""

"""shell
pip install keras-tuner -q
"""

import keras
from keras import layers
import keras_tuner
import numpy as np

"""
## Tolerate failed trials

We will use the `max_retries_per_trial` and `max_consecutive_failed_trials`
arguments when initializing the tuners.

`max_retries_per_trial` controls the maximum number of retries to run if a
trial keeps failing. For example, if it is set to 3, the trial may run 4 times
(1 failed run + 3 failed retries) before it is finally marked as failed. The
default value of `max_retries_per_trial` is 0.

`max_consecutive_failed_trials` controls how many consecutive failed trials (a
failed trial here means a trial that failed all of its retries) may occur
before the search is terminated. For example, if it is set to 3 and Trial 2,
Trial 3, and Trial 4 all fail, the search is terminated. However, if it is set
to 3 and only Trial 2, Trial 3, Trial 5, and Trial 6 fail, the search is not
terminated since the failed trials are not consecutive. The default value of
`max_consecutive_failed_trials` is 3.

The following code shows how these two arguments work in action.

* We define a search space with 2 hyperparameters for the number of units in
  the 2 dense layers.
* When the model has more than 1200 parameters, we raise a `ValueError` to
  mark the model as too large.
"""


def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        raise ValueError(f"Model too large! It contains {num_params} params.")
    return model


"""
We set up the tuner as follows.

* We set `max_retries_per_trial=3`.
* We set `max_consecutive_failed_trials=8`.
* We use `GridSearch` to enumerate all hyperparameter value combinations.
"""

tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()
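
"""
The search finishes despite the failed trials because the failures are
tolerated and simply excluded from the ranking of the results. As a minimal
sketch, we can retrieve the best hyperparameters afterwards with the standard
`get_best_hyperparameters()` API (the exact values depend on the random data
above):
"""

# Failed trials are not considered when ranking the results.
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best units_1:", best_hps.get("units_1"))
print("Best units_2:", best_hps.get("units_2"))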

"""
## Mark a trial as failed

When the model is too large, we do not need to retry it. No matter how many
times we try with the same hyperparameters, it is always too large.

We could set `max_retries_per_trial=0` to avoid retrying it. However, that
would disable the retries for all errors, while we may still want to retry
when other, unexpected errors occur. Is there a better way to handle this
situation?

We can raise a `FailedTrialError` to skip the retries. Whenever this error is
raised, the trial is not retried. The retries still run when other errors
occur. An example is shown below.
"""


def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, it skips the retries.
        raise keras_tuner.errors.FailedTrialError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()
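
"""
Raising `FailedTrialError` is not limited to `build_model()`. The same
mechanism applies while evaluating the model. As a minimal sketch, we can
override `HyperModel.fit()` and, under a hypothetical condition, mark a trial
as failed when training diverges:
"""


class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        # Reuse the model-building logic defined above.
        return build_model(hp)

    def fit(self, hp, model, x, y, **kwargs):
        history = model.fit(x, y, **kwargs)
        # Hypothetical condition: skip the retries when the validation loss
        # diverged to a non-finite value.
        if not np.isfinite(history.history["val_loss"][-1]):
            raise keras_tuner.errors.FailedTrialError(
                "Training diverged. Skipping the retries."
            )
        return history


# Such a hypermodel can be passed to a tuner via `hypermodel=MyHyperModel()`
# in place of `build_model`.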

"""
## Terminate the search programmatically

When there is a bug in the code, we should terminate the search immediately
and fix the bug. You can terminate the search programmatically when your
defined conditions are met. Raising a `FatalError` (or its subclasses
`FatalValueError`, `FatalTypeError`, and `FatalRuntimeError`) terminates the
search regardless of the `max_consecutive_failed_trials` argument.

The following example terminates the search when the model is too large.
"""


def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the search is terminated.
        raise keras_tuner.errors.FatalError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

try:
    # Use random data to train the model.
    tuner.search(
        x=np.random.rand(100, 20),
        y=np.random.rand(100, 1),
        validation_data=(
            np.random.rand(100, 20),
            np.random.rand(100, 1),
        ),
        epochs=10,
    )
except keras_tuner.errors.FatalError:
    print("The search is terminated.")
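
"""
All of these errors live in `keras_tuner.errors`. As a minimal sketch,
assuming the `Fatal*` subclasses also inherit from the corresponding built-in
exceptions (e.g., `FatalValueError` from `ValueError`), calling code can catch
them with either the KerasTuner type or the built-in type:
"""

try:
    raise keras_tuner.errors.FatalValueError("A fatal error example.")
except ValueError as error:
    # Caught here as a plain ValueError as well.
    print("Caught as a built-in ValueError:", error)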

"""
## Takeaways

In this guide, you learned how to handle failed trials in KerasTuner:

* Use `max_retries_per_trial` to specify the number of retries for a failed
  trial.
* Use `max_consecutive_failed_trials` to specify the maximum number of
  consecutive failed trials to tolerate.
* Raise `FailedTrialError` to directly mark a trial as failed and skip the
  retries.
* Raise `FatalError`, `FatalValueError`, `FatalTypeError`, or
  `FatalRuntimeError` to terminate the search immediately.
"""