Generative AI for Intelligent Data Handling / Day 5.2: LSTM in Sequence Generation
LSTM stands for Long Short-Term Memory.
It's a type of recurrent neural network (RNN) architecture designed to handle the issue of vanishing gradients, which can occur when training traditional RNNs on long sequences of data. LSTM networks are well-suited for sequence prediction problems, such as language modeling, speech recognition, and time series forecasting. They achieve this by using a memory cell that can maintain information over long periods of time, allowing them to capture dependencies and patterns in sequential data more effectively than standard RNNs.
LSTM Gates and Cell State
Forget Gate: Controls what information from the previous cell state should be discarded or kept based on the current input.
Input Gate: Determines which new information from the current input should be stored in the cell state.
Cell State: Represents the memory of the LSTM, preserving information over long sequences by selectively adding or removing information.
Output Gate: Filters the information from the current cell state to produce the output, based on the current input and the LSTM's internal state (the equations below make these updates precise).
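These four components interact through the standard LSTM update equations. Writing $x_t$ for the current input, $h_{t-1}$ for the previous hidden state, $c_t$ for the cell state, $\sigma$ for the sigmoid function, and $\odot$ for element-wise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state / output)}
\end{aligned}
$$

The forget and input gates decide how much of the old cell state to keep and how much new candidate information to write, while the output gate decides how much of the cell state is exposed as the hidden state $h_t$.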
To recap, LSTM (Long Short-Term Memory) is a type of RNN (Recurrent Neural Network) designed to address the difficulty traditional RNNs have in capturing long-term dependencies in sequential data.
LSTMs have a memory cell that can retain information over long periods.
They incorporate gating mechanisms (input, forget, and output gates) to control the flow of information into and out of the memory cell.
Input gate: Determines which new information to incorporate into the memory cell.
Forget gate: Decides which information to discard from the memory cell.
Output gate: Regulates the information output from the memory cell to the next time step.
LSTMs are trained using backpropagation through time (BPTT), allowing them to learn to capture long-range dependencies in the data.
LSTMs are widely used in tasks such as language modeling, speech recognition, and time series prediction due to their ability to effectively capture long-range dependencies.
A simple example of an LSTM network using Python and the Keras library
Example Data: We create a simple sequential dataset (data) where each row has two features. LSTM expects input in the form [samples, time steps, features], so we reshape data accordingly.
LSTM Model: We define a sequential Keras model (Sequential) and add an LSTM layer (LSTM) with 50 units (you can adjust this number as needed), using the ReLU activation function. The input_shape specifies that each sample has 1 time step and 2 features. A sketch of this code is shown below.
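The code cell itself is not reproduced here; the following is a minimal sketch consistent with the description above. The dataset values, the target vector, the Dense output layer, and the training settings are assumptions for illustration, not the notebook's exact code.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy sequential dataset: each row has two features
data = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], dtype=np.float32)
target = np.array([3, 4, 5, 6, 7], dtype=np.float32).reshape(-1, 1)  # hypothetical targets

# Reshape to [samples, time steps, features]: here 1 time step and 2 features
X = data.reshape((data.shape[0], 1, 2))

# LSTM layer with 50 units and ReLU activation, followed by a Dense output
model = Sequential([
    LSTM(50, activation='relu', input_shape=(1, 2)),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, target, epochs=200, verbose=0)

print(model.predict(X[:1]))  # prediction for the first sample
```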
Generating an arithmetic series using an LSTM
An arithmetic series is a sequence of numbers in which the difference between consecutive terms is constant. Mathematically, the n-th term of such a sequence can be written as a_n = a_1 + (n - 1)d, where a_1 is the first term and d is the common difference.
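As a minimal sketch of this idea (the particular series 1, 3, 5, ..., the window length of 3, and the layer sizes are assumptions, not the notebook's exact code), we slide a fixed-length window over the series and train the LSTM to predict the next term:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Arithmetic series with first term 1 and common difference 2: 1, 3, 5, ..., 49
series = np.arange(1, 51, 2, dtype=np.float32)

# Sliding windows of 3 terms as input, the following term as the target
window = 3
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:].reshape(-1, 1)

# LSTM expects [samples, time steps, features]
X = X.reshape((X.shape[0], window, 1))

model = Sequential([LSTM(32, activation='relu', input_shape=(window, 1)), Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=300, verbose=0)

# Predict the term that follows 45, 47, 49 (should be close to 51)
print(model.predict(np.array([45.0, 47.0, 49.0]).reshape(1, window, 1)))
```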
Example 1: Generate sequences where the output Y is 2× the input sequence X, using an LSTM.
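A minimal sketch of this sequence-to-sequence setup (the sequence length of 3 and the layer sizes are assumptions): setting return_sequences=True makes the LSTM emit one output per time step, so each input element can be mapped to twice its value.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, TimeDistributed, Dense

# Input sequences of length 3; the target is the same sequence doubled
X = np.array([[i, i + 1, i + 2] for i in range(1, 21)], dtype=np.float32)
y = 2 * X

# Shape both as [samples, time steps, features]
X = X.reshape((X.shape[0], 3, 1))
y = y.reshape((y.shape[0], 3, 1))

model = Sequential([
    LSTM(32, activation='relu', return_sequences=True, input_shape=(3, 1)),
    TimeDistributed(Dense(1))  # one output per time step
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=300, verbose=0)

print(model.predict(np.array([4.0, 5.0, 6.0]).reshape(1, 3, 1)))  # roughly [8, 10, 12]
```

The key choice here is return_sequences=True combined with TimeDistributed(Dense(1)), which turns the model into a sequence-to-sequence mapping rather than a many-to-one predictor.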
Example 2: sum of a series
e.g., X = [1, 2, 3], y = [6] (1 + 2 + 3)
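A minimal many-to-one sketch (the training ranges and layer sizes are assumptions): the LSTM reads the whole 3-step sequence and a single Dense unit regresses the sum.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Input: sequences of 3 consecutive integers; target: their sum, e.g. [1, 2, 3] -> 6
X = np.array([[i, i + 1, i + 2] for i in range(1, 31)], dtype=np.float32)
y = X.sum(axis=1).reshape(-1, 1)

X = X.reshape((X.shape[0], 3, 1))  # [samples, time steps, features]

model = Sequential([LSTM(32, activation='relu', input_shape=(3, 1)), Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=300, verbose=0)

print(model.predict(np.array([1.0, 2.0, 3.0]).reshape(1, 3, 1)))  # should be close to 6
```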
Text Generation using LSTM (Long Short-Term Memory)
sys Module Import: Added import sys at the beginning of the script to resolve the NameError related to sys.stdout.write().
Text Generation: The on_epoch_end function is used as a callback during training to generate text samples after each epoch. It uses sys.stdout.write() to print the generated text, showing how the model predicts sequences of characters based on the learned patterns.
temperature: a parameter used during text generation to control the diversity (randomness) of the predicted characters. Lower temperatures make the sampling more conservative and repetitive; higher temperatures make it more diverse but less coherent.
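A self-contained sketch of temperature-based sampling (the helper name sample and the example distribution are illustrative, not taken verbatim from the notebook):

```python
import numpy as np

def sample(preds, temperature=1.0):
    # Rescale the predicted distribution: lower temperature sharpens it (more
    # conservative choices), higher temperature flattens it (more random choices)
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return int(np.argmax(np.random.multinomial(1, preds, 1)))

# Quick illustration with a made-up distribution over 4 characters
probs = [0.6, 0.25, 0.1, 0.05]
print([sample(probs, temperature=0.2) for _ in range(10)])  # mostly index 0
print([sample(probs, temperature=1.5) for _ in range(10)])  # more varied indices
```

In the notebook's setup, a helper like this would be called inside on_epoch_end on the probability vector returned by model.predict, and the character chosen for the sampled index printed with sys.stdout.write().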
Running the training cell

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

# Train the model
model.fit(X, y, batch_size=128, epochs=30, callbacks=[print_callback])

raised the following error inside Model.fit:

ValueError: Unexpected result of `train_function` (Empty logs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.