"""
Title: Working with RNNs
Authors: Scott Zhu, Francois Chollet
Date created: 2019/07/08
Last modified: 2023/07/10
Description: Complete guide to using & customizing RNN layers.
Accelerator: GPU
"""
"""
## Introduction
Recurrent neural networks (RNNs) are a class of neural networks that is powerful for
modeling sequence data such as time series or natural language.
Schematically, a RNN layer uses a `for` loop to iterate over the timesteps of a
sequence, while maintaining an internal state that encodes information about the
timesteps it has seen so far.
The Keras RNN API is designed with a focus on:
- **Ease of use**: the built-in `keras.layers.RNN`, `keras.layers.LSTM`,
`keras.layers.GRU` layers enable you to quickly build recurrent models without
having to make difficult configuration choices.
- **Ease of customization**: You can also define your own RNN cell layer (the inner
part of the `for` loop) with custom behavior, and use it with the generic
`keras.layers.RNN` layer (the `for` loop itself). This allows you to quickly
prototype different research ideas in a flexible way with minimal code.
"""
"""
## Setup
"""
import numpy as np
import tensorflow as tf
import keras
from keras import layers
"""
## Built-in RNN layers: a simple example
"""
"""
There are three built-in RNN layers in Keras:
1. `keras.layers.SimpleRNN`, a fully-connected RNN where the output from the previous
timestep is fed to the next timestep.
2. `keras.layers.GRU`, first proposed in
[Cho et al., 2014](https://arxiv.org/abs/1406.1078).
3. `keras.layers.LSTM`, first proposed in
[Hochreiter & Schmidhuber, 1997](https://www.bioinf.jku.at/publications/older/2604.pdf).
In early 2015, Keras had the first reusable open-source Python implementations of LSTM
and GRU.
Here is a simple example of a `Sequential` model that processes sequences of integers,
embeds each integer into a 64-dimensional vector, then processes the sequence of
vectors using an `LSTM` layer.
"""
model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))
model.add(layers.LSTM(128))
model.add(layers.Dense(10))
model.summary()
"""
Built-in RNNs support a number of useful features:
- Recurrent dropout, via the `dropout` and `recurrent_dropout` arguments
- Ability to process an input sequence in reverse, via the `go_backwards` argument
- Loop unrolling (which can lead to a large speedup when processing short sequences on
CPU), via the `unroll` argument
- ...and more.
For more information, see the
[RNN API documentation](https://keras.io/api/layers/recurrent_layers/).
"""
"""
## Outputs and states
By default, the output of a RNN layer contains a single vector per sample. This vector
is the RNN cell output corresponding to the last timestep, containing information
about the entire input sequence. The shape of this output is `(batch_size, units)`
where `units` corresponds to the `units` argument passed to the layer's constructor.
A RNN layer can also return the entire sequence of outputs for each sample (one vector
per timestep per sample), if you set `return_sequences=True`. The shape of this output
is `(batch_size, timesteps, units)`.
"""
model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))
model.add(layers.GRU(256, return_sequences=True))
model.add(layers.SimpleRNN(128))
model.add(layers.Dense(10))
model.summary()
"""
In addition, a RNN layer can return its final internal state(s). The returned states
can be used to resume the RNN execution later, or
[to initialize another RNN](https://arxiv.org/abs/1409.3215).
This setting is commonly used in the
encoder-decoder sequence-to-sequence model, where the encoder final state is used as
the initial state of the decoder.
To configure a RNN layer to return its internal state, set the `return_state` parameter
to `True` when creating the layer. Note that `LSTM` has 2 state tensors, but `GRU`
only has one.
To configure the initial state of the layer, just call the layer with additional
keyword argument `initial_state`.
Note that the shape of the state needs to match the unit size of the layer, like in the
example below.
"""
encoder_vocab = 1000
decoder_vocab = 2000
encoder_input = layers.Input(shape=(None,))
encoder_embedded = layers.Embedding(input_dim=encoder_vocab, output_dim=64)(
encoder_input
)
output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")(
encoder_embedded
)
encoder_state = [state_h, state_c]
decoder_input = layers.Input(shape=(None,))
decoder_embedded = layers.Embedding(input_dim=decoder_vocab, output_dim=64)(
decoder_input
)
decoder_output = layers.LSTM(64, name="decoder")(
decoder_embedded, initial_state=encoder_state
)
output = layers.Dense(10)(decoder_output)
model = keras.Model([encoder_input, decoder_input], output)
model.summary()
"""
## RNN layers and RNN cells
In addition to the built-in RNN layers, the RNN API also provides cell-level APIs.
Unlike RNN layers, which process whole batches of input sequences, the RNN cell only
processes a single timestep.
The cell is the inside of the `for` loop of a RNN layer. Wrapping a cell inside a
`keras.layers.RNN` layer gives you a layer capable of processing batches of
sequences, e.g. `RNN(LSTMCell(10))`.
Mathematically, `RNN(LSTMCell(10))` produces the same result as `LSTM(10)`. In fact,
the implementation of this layer in TF v1.x was just creating the corresponding RNN
cell and wrapping it in a RNN layer. However, using the built-in `GRU` and `LSTM`
layers enables the use of CuDNN and you may see better performance.
There are three built-in RNN cells, each of them corresponding to the matching RNN
layer.
- `keras.layers.SimpleRNNCell` corresponds to the `SimpleRNN` layer.
- `keras.layers.GRUCell` corresponds to the `GRU` layer.
- `keras.layers.LSTMCell` corresponds to the `LSTM` layer.
The cell abstraction, together with the generic `keras.layers.RNN` class, makes it
very easy to implement custom RNN architectures for your research.
"""
"""
## Cross-batch statefulness
When processing very long sequences (possibly infinite), you may want to use the
pattern of **cross-batch statefulness**.
Normally, the internal state of a RNN layer is reset every time it sees a new batch
(i.e. every sample seen by the layer is assumed to be independent of the past). The
layer will only maintain a state while processing a given sample.
If you have very long sequences though, it is useful to break them into shorter
sequences, and to feed these shorter sequences sequentially into a RNN layer without
resetting the layer's state. That way, the layer can retain information about the
entirety of the sequence, even though it's only seeing one sub-sequence at a time.
You can do this by setting `stateful=True` in the constructor.
If you have a sequence `s = [t0, t1, ... t1546, t1547]`, you would split it into e.g.
```
s1 = [t0, t1, ... t100]
s2 = [t101, ... t201]
...
s16 = [t1501, ... t1547]
```
Then you would process it via:
```python
lstm_layer = layers.LSTM(64, stateful=True)
for s in sub_sequences:
    output = lstm_layer(s)
```
When you want to clear the state, you can use `layer.reset_states()`.
> Note: In this setup, sample `i` in a given batch is assumed to be the continuation of
sample `i` in the previous batch. This means that all batches should contain the same
number of samples (batch size). E.g. if a batch contains `[sequence_A_from_t0_to_t100,
sequence_B_from_t0_to_t100]`, the next batch should contain
`[sequence_A_from_t101_to_t200, sequence_B_from_t101_to_t200]`.
Here is a complete example:
"""
paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)
lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)
output = lstm_layer(paragraph3)
lstm_layer.reset_states()
"""
### RNN State Reuse
<a id="rnn_state_reuse"></a>
"""
"""
The recorded states of the RNN layer are not included in `layer.weights`. If you
would like to reuse the state from a RNN layer, you can retrieve the states value via
`layer.states` and use it as the
initial state for a new layer via the Keras functional API like `new_layer(inputs,
initial_state=layer.states)`, or via model subclassing.
Please also note that a `Sequential` model cannot be used in this case, since it only
supports layers with a single input and output; the extra input of the initial state
makes it impossible to use here.
"""
paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)
lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)
existing_state = lstm_layer.states
new_lstm_layer = layers.LSTM(64)
new_output = new_lstm_layer(paragraph3, initial_state=existing_state)
"""
## Bidirectional RNNs
For sequences other than time series (e.g. text), it is often the case that a RNN model
can perform better if it not only processes the sequence from start to end, but also
backwards. For example, to predict the next word in a sentence, it is often useful to
have the context around the word, not just the words that come before it.
Keras provides an easy API for you to build such bidirectional RNNs: the
`keras.layers.Bidirectional` wrapper.
"""
model = keras.Sequential()
model.add(
layers.Bidirectional(layers.LSTM(64, return_sequences=True), input_shape=(5, 10))
)
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(10))
model.summary()
"""
Under the hood, `Bidirectional` will copy the RNN layer passed in, and flip the
`go_backwards` field of the newly copied layer, so that it will process the inputs in
reverse order.
The output of the `Bidirectional` RNN will be, by default, the concatenation of the forward layer
output and the backward layer output. If you need a different merging behavior, e.g.
summation, change the `merge_mode` parameter in the `Bidirectional` wrapper
constructor. For more details about `Bidirectional`, please check
[the API docs](https://keras.io/api/layers/recurrent_layers/bidirectional/).
"""
"""
## Performance optimization and CuDNN kernels
In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage CuDNN
kernels by default when a GPU is available. With this change, the prior
`keras.layers.CuDNNLSTM/CuDNNGRU` layers have been deprecated, and you can build your
model without worrying about the hardware it will run on.
Since the CuDNN kernel is built with certain assumptions, this means the layer **will
not be able to use the CuDNN kernel if you change the defaults of the built-in LSTM or
GRU layers**. E.g.:
- Changing the `activation` function from `tanh` to something else.
- Changing the `recurrent_activation` function from `sigmoid` to something else.
- Using `recurrent_dropout` > 0.
- Setting `unroll` to True, which forces LSTM/GRU to decompose the inner
`tf.while_loop` into an unrolled `for` loop.
- Setting `use_bias` to False.
- Using masking when the input data is not strictly right-padded (if the mask
corresponds to strictly right-padded data, CuDNN can still be used; this is the most
common case).
For the detailed list of constraints, please see the documentation for the
[LSTM](https://keras.io/api/layers/recurrent_layers/lstm/) and
[GRU](https://keras.io/api/layers/recurrent_layers/gru/) layers.
"""
"""
### Using CuDNN kernels when available
Let's build a simple LSTM model to demonstrate the performance difference.
We'll use as input sequences the sequence of rows of MNIST digits (treating each row of
pixels as a timestep), and we'll predict the digit's label.
"""
batch_size = 64
input_dim = 28
units = 64
output_size = 10
def build_model(allow_cudnn_kernel=True):
    # CuDNN is only available at the layer level, not at the cell level.
    # This means `LSTM(units)` can use the CuDNN kernel,
    # while RNN(LSTMCell(units)) runs on the generic kernel.
    if allow_cudnn_kernel:
        # The LSTM layer with default options uses CuDNN.
        lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    else:
        # Wrapping a LSTMCell in a RNN layer will not use CuDNN.
        lstm_layer = keras.layers.RNN(
            keras.layers.LSTMCell(units), input_shape=(None, input_dim)
        )
    model = keras.models.Sequential(
        [
            lstm_layer,
            keras.layers.BatchNormalization(),
            keras.layers.Dense(output_size),
        ]
    )
    return model
"""
Let's load the MNIST dataset:
"""
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]
"""
Let's create a model instance and train it.
We choose `sparse_categorical_crossentropy` as the loss function for the model. The
output of the model has shape `[batch_size, 10]`. The target for the model is a
vector of integers, each in the range 0 to 9.
"""
model = build_model(allow_cudnn_kernel=True)
model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer="sgd",
metrics=["accuracy"],
)
model.fit(
x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)
"""
Now, let's compare to a model that does not use the CuDNN kernel:
"""
noncudnn_model = build_model(allow_cudnn_kernel=False)
noncudnn_model.set_weights(model.get_weights())
noncudnn_model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer="sgd",
metrics=["accuracy"],
)
noncudnn_model.fit(
x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)
"""
When running on a machine with an NVIDIA GPU and CuDNN installed,
the model built with CuDNN is much faster to train compared to the
model that uses the regular TensorFlow kernel.
The same CuDNN-enabled model can also be used to run inference in a CPU-only
environment. The `tf.device` annotation below is just forcing the device placement.
The model will run on CPU by default if no GPU is available.
You simply don't have to worry about the hardware you're running on anymore. Isn't that
pretty cool?
"""
import matplotlib.pyplot as plt
with tf.device("CPU:0"):
    cpu_model = build_model(allow_cudnn_kernel=True)
    cpu_model.set_weights(model.get_weights())
    result = tf.argmax(cpu_model.predict_on_batch(tf.expand_dims(sample, 0)), axis=1)
    print(
        "Predicted result is: %s, target result is: %s" % (result.numpy(), sample_label)
    )
    plt.imshow(sample, cmap=plt.get_cmap("gray"))
"""
## RNNs with list/dict inputs, or nested inputs
Nested structures allow implementers to include more information within a single
timestep. For example, a video frame could have audio and video input at the same
time. The data shape in this case could be:
`[batch, timestep, {"video": [height, width, channel], "audio": [frequency]}]`
In another example, handwriting data could have both coordinates x and y for the
current position of the pen, as well as pressure information. So the data
representation could be:
`[batch, timestep, {"location": [x, y], "pressure": [force]}]`
The following code provides an example of how to build a custom RNN cell that accepts
such structured inputs.
"""
"""
### Define a custom cell that supports nested input/output
"""
"""
See [Making new Layers & Models via subclassing](/guides/making_new_layers_and_models_via_subclassing/)
for details on writing your own layers.
"""
@keras.saving.register_keras_serializable()
class NestedCell(keras.layers.Layer):
    def __init__(self, unit_1, unit_2, unit_3, **kwargs):
        self.unit_1 = unit_1
        self.unit_2 = unit_2
        self.unit_3 = unit_3
        self.state_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        self.output_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        super().__init__(**kwargs)

    def build(self, input_shapes):
        # expect input_shapes to contain 2 items, [(batch, i1), (batch, i2, i3)]
        i1 = input_shapes[0][1]
        i2 = input_shapes[1][1]
        i3 = input_shapes[1][2]
        self.kernel_1 = self.add_weight(
            shape=(i1, self.unit_1), initializer="uniform", name="kernel_1"
        )
        self.kernel_2_3 = self.add_weight(
            shape=(i2, i3, self.unit_2, self.unit_3),
            initializer="uniform",
            name="kernel_2_3",
        )

    def call(self, inputs, states):
        # inputs are structured as [(batch, i1), (batch, i2, i3)],
        # states as [(batch, unit_1), (batch, unit_2, unit_3)]
        input_1, input_2 = tf.nest.flatten(inputs)
        s1, s2 = states
        output_1 = tf.matmul(input_1, self.kernel_1)
        output_2_3 = tf.einsum("bij,ijkl->bkl", input_2, self.kernel_2_3)
        state_1 = s1 + output_1
        state_2_3 = s2 + output_2_3
        output = (output_1, output_2_3)
        new_states = (state_1, state_2_3)
        return output, new_states

    def get_config(self):
        return {"unit_1": self.unit_1, "unit_2": self.unit_2, "unit_3": self.unit_3}
"""
### Build a RNN model with nested input/output
Let's build a Keras model that uses a `keras.layers.RNN` layer and the custom cell
we just defined.
"""
unit_1 = 10
unit_2 = 20
unit_3 = 30
i1 = 32
i2 = 64
i3 = 32
batch_size = 64
num_batches = 10
timestep = 50
cell = NestedCell(unit_1, unit_2, unit_3)
rnn = keras.layers.RNN(cell)
input_1 = keras.Input((None, i1))
input_2 = keras.Input((None, i2, i3))
outputs = rnn((input_1, input_2))
model = keras.models.Model([input_1, input_2], outputs)
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
"""
### Train the model with randomly generated data
Since there isn't a good candidate dataset for this model, we use random Numpy data for
demonstration.
"""
input_1_data = np.random.random((batch_size * num_batches, timestep, i1))
input_2_data = np.random.random((batch_size * num_batches, timestep, i2, i3))
target_1_data = np.random.random((batch_size * num_batches, unit_1))
target_2_data = np.random.random((batch_size * num_batches, unit_2, unit_3))
input_data = [input_1_data, input_2_data]
target_data = [target_1_data, target_2_data]
model.fit(input_data, target_data, batch_size=batch_size)
"""
With the Keras `keras.layers.RNN` layer, you are only expected to define the math
logic for an individual step within the sequence, and the `keras.layers.RNN` layer
will handle the sequence iteration for you. It's an incredibly powerful way to quickly
prototype new kinds of RNNs (e.g. an LSTM variant).
For more details, please visit the [API docs](https://keras.io/api/layers/recurrent_layers/rnn/).
"""