GitHub Repository: suyashi29/python-su
Path: blob/master/Generative NLP Models using Python/Ex-2 LSTM for Poem like Text.ipynb
Kernel: Python 3 (ipykernel)

LSTM for Text and Sequence Generation

This notebook explains the mathematical foundation and code implementation of LSTM (Long Short-Term Memory) models for both text generation and sequence prediction tasks using Python and TensorFlow/Keras.

1. What is LSTM?

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) capable of learning long-term dependencies. It mitigates the vanishing gradient problem of traditional RNNs by using gates that control the flow of information.

LSTM Cell Structure

Mathematics:

  • Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

  • Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

  • Candidate memory: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

  • Output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

  • Final memory update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

  • Hidden state: $h_t = o_t * \tanh(C_t)$
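
To make these gate equations concrete, here is a minimal NumPy sketch of a single LSTM cell step; the weights, biases, and input below are random placeholders rather than trained parameters.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step following the gate equations above."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])      # forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])      # input gate
    C_tilde = np.tanh(W['C'] @ z + b['C'])  # candidate memory
    o_t = sigmoid(W['o'] @ z + b['o'])      # output gate
    C_t = f_t * C_prev + i_t * C_tilde      # final memory update
    h_t = o_t * np.tanh(C_t)                # hidden state
    return h_t, C_t

# Toy dimensions: 3 input features, 4 hidden units, random (untrained) weights
rng = np.random.default_rng(42)
n_in, n_hidden = 3, 4
W = {g: rng.standard_normal((n_hidden, n_hidden + n_in)) for g in 'fiCo'}
b = {g: np.zeros(n_hidden) for g in 'fiCo'}
h_t, C_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)
print("h_t:", h_t)
print("C_t:", C_t)

Keras' LSTM layer, used later in this notebook, carries out this same per-time-step computation, with the weights learned during training.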

"Life is good" life is good lefi ood is is good life Early Stopping: val

We now build a character-level LSTM model to generate poem-like text, using the following techniques during training:

  • Temperature Sampling: Controls randomness when generating each character. A lower temperature (e.g., 0.5) makes predictions more conservative, while a higher temperature (e.g., 1.5) introduces more randomness.

  • Early Stopping: A callback that stops training if the validation loss stops improving, preventing overfitting.

  • Seed Control: Setting random seeds for reproducibility, ensuring you get the same model initialization and training behavior each time.

Explanation of Key Concepts

  1. Temperature Sampling:

    • What it is: Adjusts the probability distribution used to pick the next character by applying a "temperature" parameter.

    • Example:

      • With temperature=0.5, the model tends to pick high-probability characters (more deterministic).

      • With temperature=1.5, choices become more random, which might lead to more creative or unexpected outputs (see the numeric sketch after this list).

  2. Early Stopping:

    • What it is: A strategy to halt model training when further improvement is unlikely, based on monitoring a metric (e.g., loss).

    • Example:

      • If the training loss does not decrease for 5 consecutive epochs (patience=5), training stops to avoid overfitting.

  3. Seed Control:

    • What it is: Setting fixed random seeds in Python, NumPy, and TensorFlow to ensure reproducibility.

    • Example:

      • By setting seed_value=42 for all relevant libraries, you ensure that the randomness (e.g., weight initialization, training shuffles, sampling) remains the same across different runs.
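
The effect of temperature sampling (item 1 above) is easy to see on a toy distribution. The sketch below reweights an assumed three-character distribution [0.6, 0.3, 0.1] at two temperatures:

import numpy as np

def apply_temperature(probs, temperature):
    """Reweight a probability distribution by a temperature value."""
    logits = np.log(np.asarray(probs, dtype='float64') + 1e-8) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

p = [0.6, 0.3, 0.1]               # toy next-character probabilities
print(apply_temperature(p, 0.5))  # sharper, ~[0.78, 0.20, 0.02] -> more deterministic
print(apply_temperature(p, 1.5))  # flatter, ~[0.52, 0.33, 0.16] -> more random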


# Demonstrate seed control: with a fixed seed, NumPy returns the same "random" numbers on every run
import numpy as np

np.random.seed(42)
numbers = np.random.rand(5)
numbers
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.callbacks import EarlyStopping
import random
import os

# Set seeds for reproducibility
seed_value = 42
os.environ['PYTHONHASHSEED'] = str(seed_value)
tf.random.set_seed(seed_value)
np.random.seed(seed_value)
random.seed(seed_value)

# A short excerpt from Robert Frost's "The Road Not Taken" as the training corpus
text = (
    "Two roads diverged in a yellow wood,\n"
    "And sorry I could not travel both\n"
    "And be one traveler, long I stood\n"
    "And looked down one as far as I could\n"
    "To where it bent in the undergrowth;"
)

# Create a sorted list of unique characters
chars = sorted(list(set(text)))
print("Unique characters:", chars)
Unique characters: ['\n', ' ', ',', ';', 'A', 'I', 'T', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k', 'l', 'n', 'o', 'r', 's', 't', 'u', 'v', 'w', 'y']
# Create mappings from characters to indices and vice versa
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for i, c in enumerate(chars)}
# Set sequence length and create input-output sequences
seq_length = 50
sequences = []
next_chars = []
for i in range(0, len(text) - seq_length):
    sequences.append(text[i: i + seq_length])
    next_chars.append(text[i + seq_length])
print("Number of sequences:", len(sequences))
Number of sequences: 129
# Vectorize the sequences (one-hot encoding)
X = np.zeros((len(sequences), seq_length, len(chars)), dtype=np.bool_)
y = np.zeros((len(sequences), len(chars)), dtype=np.bool_)
for i, seq in enumerate(sequences):
    for t, char in enumerate(seq):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1
# We define a simple LSTM model for character-level text prediction.
model = Sequential([
    LSTM(128, input_shape=(seq_length, len(chars))),
    Dense(len(chars), activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
# Early Stopping is a strategy that monitors a metric (e.g., validation loss) during training,
# and stops training if there is no further improvement. This can save time and prevent overfitting.
# In our case, we monitor the training loss (or you could split some data as a validation set)
# and stop if it does not improve for a few epochs.
early_stopping = EarlyStopping(monitor='loss', patience=5, verbose=1)
# 5. Training the Model
# We train the model on our prepared dataset. For real applications, use more epochs and a larger corpus.
history = model.fit(X, y, epochs=50, batch_size=16, callbacks=[early_stopping])

def sample(preds, temperature=1.0):
    """
    Sample an index from a probability array reweighted by temperature.
    """
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature  # add epsilon to avoid log(0)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_text(seed, length=200, temperature=1.0):
    generated = seed
    print("Seed:", seed)
    for i in range(length):
        # Prepare the input sequence (one-hot encoding)
        x_pred = np.zeros((1, seq_length, len(chars)))
        for t, char in enumerate(seed):
            x_pred[0, t, char2idx[char]] = 1.
        # Predict the next character probabilities
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = idx2char[next_index]
        # Append the next character
        generated += next_char
        seed = seed[1:] + next_char
    return generated

seed_text = text[:seq_length]
print("Generated poem with temperature=0.5:\n")
print(generate_text(seed_text, length=200, temperature=0.5))
print("\nGenerated poem with temperature=1.0:\n")
print(generate_text(seed_text, length=200, temperature=1.0))
print("\nGenerated poem with temperature=1.5:\n")
print(generate_text(seed_text, length=200, temperature=1.5))
Epoch 1/50 - loss: 3.2847
Epoch 2/50 - loss: 3.1725
Epoch 3/50 - loss: 2.8910
Epoch 4/50 - loss: 2.8720
Epoch 5/50 - loss: 2.8336
Epoch 6/50 - loss: 2.8184
Epoch 7/50 - loss: 2.8081
Epoch 8/50 - loss: 2.7935
Epoch 9/50 - loss: 2.7778
Epoch 10/50 - loss: 2.7612
Epoch 11/50 - loss: 2.7391
Epoch 12/50 - loss: 2.7089
Epoch 13/50 - loss: 2.6813
Epoch 14/50 - loss: 2.6633
Epoch 15/50 - loss: 2.6901
Epoch 16/50 - loss: 2.6465
Epoch 17/50 - loss: 2.6687
Epoch 18/50 - loss: 2.6057
Epoch 19/50 - loss: 2.5719
Epoch 20/50 - loss: 2.5109
Epoch 21/50 - loss: 2.4575
Epoch 22/50 - loss: 2.3987
Epoch 23/50 - loss: 2.4858
Epoch 24/50 - loss: 2.4149
Epoch 25/50 - loss: 2.3499
Epoch 26/50 - loss: 2.2135
Epoch 27/50 - loss: 2.1242
Epoch 28/50 - loss: 2.0749
Epoch 29/50 - loss: 1.9952
Epoch 30/50 - loss: 1.9352
Epoch 31/50 - loss: 2.0289
Epoch 32/50 - loss: 1.9452
Epoch 33/50 - loss: 2.0856
Epoch 34/50 - loss: 2.1436
Epoch 35/50 - loss: 2.0565
Epoch 35: early stopping

Generated poem with temperature=0.5:

Seed: Two roads diverged in a yellow wood, And sorry I c
Two roads diverged in a yellow wood, And sorry I c nn d asasaaI s Ao en ie e avvvevel tgr o;b noondno o oen avvevrl od wo ne en h n vvea r I s doodowkoooelw een n vravaaav Icote Tn oetiehee bnen nnvtvvrrvvrggg;;,odowAnoood

Generated poem with temperature=1.0:

Seed: Two roads diverged in a yellow wood, And sorry I c
Two roads diverged in a yellow wood, And sorry I ce obd salu rtu d nnt ii nte evavurn dd e orrggeravduwhooddteToneeen a vevv Ie a tr sgowdtT dAdoodoben awe v dv nnsIl rIs l nodkw ndedereee aivaaev esrrtrIIhAwodddd enb ose eo no v oref no tdo

Generated poem with temperature=1.5:

Seed: Two roads diverged in a yellow wood, And sorry I c
Two roads diverged in a yellow wood, And sorry I coluln loaavgrnIsAcd bTgnkrrhtvvelaavnoreo;I uf; gt,;A b;nrvThlkdke e r wfassgIoIlv sed toAwerdn ht,end b dataerlsrrIldodsdI,colrAddkndotoaoaen raavteo ausIg AsuudbkkTyAow n tb iIt tnoo awslIn toolh

Why Isn't the Output Meaningful?

The training corpus here is only five lines of a single poem, far too little for the model to learn from. Use a large corpus of poems (e.g., thousands of Shakespearean sonnets or modern poems) for better learning.

Lack of Semantic Understanding in LSTMs

  • LSTMs don't “understand” meaning—they only learn statistical patterns of sequences.

  • For actual semantic understanding or thematic coherence, you would need Transformer models (e.g., GPT, BERT) trained on large corpora.

Improvement            | What to Do
More Data              | Use a large text corpus (~1 MB or more) of poems.
Word-Level Modeling    | Use a Tokenizer + Embedding layer for word-level LSTM generation (see the sketch below).
Train Longer           | Use at least 100–200 epochs with good hardware.
Use GRU/BiLSTM         | Try stacking layers or using bidirectional LSTMs.
Use Pretrained Models  | Fine-tune GPT-2 or LLaMA models on your poem dataset.
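
As a rough sketch of the word-level modeling suggestion above, the snippet below uses Keras' Tokenizer and an Embedding layer on a tiny placeholder corpus; the corpus, embedding size, and epoch count are illustrative stand-ins, not tuned values.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Tiny placeholder corpus; replace with a large collection of poems
corpus = [
    "two roads diverged in a yellow wood",
    "and sorry i could not travel both",
]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding

# Build n-gram sequences: every prefix of a line predicts its next word
sequences = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        sequences.append(ids[:i + 1])
max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len)
X_words, y_words = sequences[:, :-1], sequences[:, -1]

word_model = Sequential([
    Embedding(vocab_size, 64),
    LSTM(128),
    Dense(vocab_size, activation='softmax')
])
word_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
word_model.fit(X_words, y_words, epochs=10, verbose=0)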

Fine-tuning a small GPT model on poems will result in meaningful poetic generation much faster than an LSTM.
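
For example, a minimal sketch of fine-tuning GPT-2 on a plain-text poem file with the Hugging Face Transformers library might look like the following (it assumes transformers is installed and a poems.txt file exists; TextDataset is deprecated in newer versions in favor of the datasets library, but it keeps the example short).

from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer, TrainingArguments,
                          TextDataset, DataCollatorForLanguageModeling)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Chunk the poem corpus into fixed-length blocks of token IDs
dataset = TextDataset(tokenizer=tokenizer, file_path="poems.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)  # causal LM, not masked LM

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-poems", num_train_epochs=3,
                           per_device_train_batch_size=2),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()

# Sample a continuation from a seed line
inputs = tokenizer("Two roads diverged in a yellow wood,", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))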