GitHub Repository: suyashi29/python-su
Path: blob/master/Generative NLP Models using Python/5 LSTM (Long Short-Term Memory) network using TensorFlow and Keras.ipynb
Kernel: Python 3 (ipykernel)

LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) designed to address the problem of capturing long-term dependencies in sequential data.

LSTM Gates and Cell State

  1. Forget Gate: Controls what information from the previous cell state should be discarded or kept based on the current input.

  2. Input Gate: Determines which new information from the current input should be stored in the cell state.

  3. Cell State: Represents the memory of the LSTM, preserving information over long sequences by selectively adding or removing information.

  4. Output Gate: Filters the information from the current cell state to produce the output based on the current input and the LSTM's internal state.
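To make the four components above concrete, here is a minimal NumPy sketch of a single LSTM time step. It is illustrative only and is not connected to the Keras model built later in this notebook; the weight matrices W_f, W_i, W_c, W_o and the bias vectors are hypothetical placeholders.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step on the concatenated vector [h_prev, x_t] (simplified sketch)."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: how much of c_prev to keep
    i_t = sigmoid(W_i @ z + b_i)          # input gate: how much new information to write
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell contents
    c_t = f_t * c_prev + i_t * c_tilde    # updated cell state (the LSTM's memory)
    o_t = sigmoid(W_o @ z + b_o)          # output gate: how much of the cell state to expose
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t

# Toy usage with random (hypothetical) weights: input size 3, hidden size 4
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = lambda: rng.normal(size=(n_h, n_h + n_in))
b = lambda: np.zeros(n_h)
h, c = lstm_cell_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h),
                      W(), W(), W(), W(), b(), b(), b(), b())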


[Figure: image-3.png]

  • LSTMs have a memory cell that can retain information over long periods.

  • They incorporate gating mechanisms (input, forget, and output gates) to control the flow of information into and out of the memory cell.

  • Input gate: Determines which new information to incorporate into the memory cell.

  • Forget gate: Decides which information to discard from the memory cell.

  • Output gate: Regulates the information output from the memory cell to the next time step.

  • LSTMs are trained using backpropagation through time (BPTT), allowing them to learn to capture long-range dependencies in the data.

  • LSTMs are widely used in tasks such as language modeling, speech recognition, and time series prediction due to their ability to effectively capture long-range dependencies.

[Figure: image-2.png]

Text Generation Using a Recurrent Long Short-Term Memory Network

  • LSTMs are a type of neural network that are well-suited for tasks involving sequential data such as text generation.

  • They are particularly useful because they can remember long-term dependencies in the data, which is crucial for text, where context often spans multiple words or sentences.

Step 1: Importing Required Libraries and Loading the Data

import tensorflow as tf
import numpy as np
import pandas as pd
import random
import sys
df = pd.read_csv('Text.csv')
text = " ".join(df['text'].dropna().astype(str)).lower()
print(f'Total characters in text: {len(text)}')
Total characters in text: 35695884
  • pd.read_csv(): Reads the CSV file into a DataFrame.

  • df['text'].dropna(): Drops rows with missing text entries.

  • " ".join(): Concatenates all text rows into a single string for training.

  • .lower(): Converts text to lowercase for consistency.

df

Creating Vocabulary and Character Mappings

  • sorted(set(text)): Extracts unique characters and sorts them to form the vocabulary.

  • char2idx: Maps each character to a unique integer index.

  • idx2char: Maps integers back to characters and is used during text generation.

  • text_as_int: Converts the entire text into a sequence of integer indices.

vocab = sorted(set(text))
print(f'Vocabulary size: {len(vocab)}')
char2idx = {c: i for i, c in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
Vocabulary size: 104

Pre-processing the Data

We will create a dataset from the integer-encoded text and split each sequence into an input and a target. Then we will shuffle the dataset and divide it into batches.

  • seq_length: Defines the length of input sequences for the model.

  • tf.data.Dataset.from_tensor_slices(): Converts the integer sequence into a TensorFlow dataset.

  • batch(seq_length + 1): Creates chunks of 101 characters; each chunk is later split into a 100-character input and a 100-character target shifted by one position.

  • split_input_target(): Splits each chunk into an input sequence (all but the last character) and a target sequence (all but the first character), so the target at every position is the next character; see the toy example after the code below.

  • shuffle() and batch(): Randomizes data order and creates batches for training.

seq_length = 100
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    return chunk[:-1], chunk[1:]

dataset = sequences.map(split_input_target)

BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
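To make the input/target shift concrete, here is a small illustrative check (not in the original notebook) that reuses char2idx, idx2char, and split_input_target from the cells above on the toy string "hello":

# Hypothetical illustration: input is the chunk minus its last character,
# target is the chunk minus its first character (i.e. shifted by one).
toy = np.array([char2idx.get(c, 0) for c in "hello"])
toy_input, toy_target = split_input_target(toy)
print(''.join(idx2char[toy_input]))    # "hell"  -> model input
print(''.join(idx2char[toy_target]))   # "ello"  -> next-character targets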

Building the LSTM Model

We build an LSTM model with the following layers and then compile it. Adaptive optimizers such as RMSProp and Adam are common choices for this kind of model; RMSProp is described below for background, while the code itself compiles with Adam.

  • Embedding layer: Converts integer indices into dense vectors of length embedding_dim.

  • LSTM layer: Processes the sequence and captures temporal dependencies with rnn_units units; return_sequences=True makes it output a hidden state at every timestep.

  • Dense layer: Produces output logits for all characters in the vocabulary to predict the next character.

RMSProp (Root Mean Square Propagation) is an adaptive learning rate optimization algorithm designed to improve the performance and speed of training deep learning models.

It is a variant of gradient descent that adapts the learning rate for each parameter individually by tracking the magnitude of recent gradients for that parameter. This adaptive behaviour helps with the non-stationary objectives and sparse gradients commonly encountered in deep learning tasks.

Need for the RMSProp Optimizer

RMSProp was developed to address the limitations of earlier optimization methods: SGD (Stochastic Gradient Descent) uses a constant learning rate, which can be inefficient, while AdaGrad reduces the learning rate too aggressively.
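As a rough sketch of the idea (not taken from the notebook), a single RMSProp update for one parameter vector can be written in NumPy; the values of lr, rho, and eps below are illustrative defaults:

import numpy as np

def rmsprop_step(param, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSProp update: scale the step by a running average of squared gradients."""
    cache = rho * cache + (1.0 - rho) * grad**2        # exponential moving average of grad^2
    param = param - lr * grad / (np.sqrt(cache) + eps) # per-parameter adaptive step
    return param, cache

# Toy usage: minimize f(p) = p^2 for a single parameter
p, c = np.array([1.0]), np.zeros(1)
for _ in range(3):
    g = 2.0 * p        # gradient of p^2
    p, c = rmsprop_step(p, g, c)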

vocab_size = len(vocab)
embedding_dim = 64
rnn_units = 128

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_shape=(None,)),
    tf.keras.layers.LSTM(rnn_units, return_sequences=True,
                         recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
])

def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)
model.summary()
C:\Users\Suyashi144893\AppData\Local\anaconda3\Lib\site-packages\keras\src\layers\core\embedding.py:93: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(**kwargs)
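As an optional sanity check (not part of the original notebook), the output shape of the untrained model can be inspected on a single batch from the dataset built earlier:

# Optional shape check: expect (BATCH_SIZE, seq_length, vocab_size),
# i.e. one logit vector over the vocabulary for every character position.
for input_example_batch, target_example_batch in dataset.take(1):
    example_predictions = model(input_example_batch)
    print(example_predictions.shape)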

Training the LSTM model

  • Train the model for 20 epochs so it can be used for prediction.

  • model.fit(): Trains the model on the dataset for 20 epochs.

  • history: Stores training metrics (the per-epoch loss) for later analysis; a small plotting sketch follows the training output below.

EPOCHS = 20
history = model.fit(dataset, epochs=EPOCHS)
Epoch 1/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 410s 74ms/step - loss: 2.1128
Epoch 2/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 439s 79ms/step - loss: 1.4992
Epoch 3/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 447s 81ms/step - loss: 1.4056
Epoch 4/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 440s 79ms/step - loss: 1.3661
Epoch 5/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 455s 82ms/step - loss: 1.3433
Epoch 6/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 449s 81ms/step - loss: 1.3284
Epoch 7/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 404s 73ms/step - loss: 1.3177
Epoch 8/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 402s 73ms/step - loss: 1.3094
Epoch 9/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 434s 78ms/step - loss: 1.3028
Epoch 10/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 426s 77ms/step - loss: 1.2973
Epoch 11/20
5522/5522 ━━━━━━━━━━━━━━━━━━━━ 403s 73ms/step - loss: 1.2925
Epoch 12/20
5407/5522 ━━━━━━━━━━━━━━━━━━━ 9s 79ms/step - loss: 1.2887
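The loss values recorded in history can be plotted with a short sketch like the one below; this assumes matplotlib is available in the environment, since it is not imported in the original notebook:

import matplotlib.pyplot as plt

plt.plot(history.history['loss'])    # per-epoch training loss recorded by model.fit
plt.xlabel('Epoch')
plt.ylabel('Training loss')
plt.title('LSTM character model training loss')
plt.show()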

Generating new random text

  • start_string: Initial seed text to start generation.

  • temperature: Controls randomness; lower values make output more predictable, higher values more creative.

  • Context handling: Because the model is not stateful, there is no persistent layer state to reset; instead, the function re-feeds the most recent characters (up to seq_length) at every step.

  • tf.random.categorical(): Samples the next character probabilistically from the model’s predictions.

  • Returns: The seed text plus generated characters.

def generate_text(model, start_string, num_generate=100, temperature=1.0):
    # Encode the seed text; characters not in the vocabulary fall back to index 0
    generated_ids = [char2idx.get(s, 0) for s in start_string.lower()]
    text_generated = []
    for _ in range(num_generate):
        # The model is not stateful, so re-feed the recent context (at most seq_length characters)
        input_eval = tf.expand_dims(generated_ids[-seq_length:], 0)
        predictions = model(input_eval)
        # Drop the batch dimension, apply temperature scaling, then sample from the last position
        predictions = tf.squeeze(predictions, 0) / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        generated_ids.append(int(predicted_id))
        text_generated.append(idx2char[predicted_id])
    return start_string + ''.join(text_generated)

print(generate_text(model, start_string="The ", num_generate=200, temperature=0.8))
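As a usage sketch (not in the original notebook), generating with a low and a high temperature side by side makes the effect of the parameter visible:

# Lower temperature -> more conservative, repetitive text; higher -> more varied, riskier text
for t in (0.2, 1.2):
    print(f'--- temperature={t} ---')
    print(generate_text(model, start_string="The ", num_generate=120, temperature=t))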