GitHub Repository: suyashi29/python-su
Path: blob/master/Generative NLP Models using Python/Ex-2 LSTM for Poem like Text.ipynb
Kernel: Python 3 (ipykernel)

LSTM for Text and Sequence Generation

This notebook explains the mathematical foundation and code implementation of LSTM (Long Short-Term Memory) models for both text generation and sequence prediction tasks using Python and TensorFlow/Keras.

1. What is LSTM?

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) capable of learning long-term dependencies. It mitigates the vanishing gradient problem of traditional RNNs by using gates that control the flow of information.

LSTM Cell Structure

Mathematics:

  • Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

  • Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

  • Candidate memory: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

  • Output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

  • Final memory update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

  • Hidden state: $h_t = o_t * \tanh(C_t)$
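
To make these gate equations concrete, here is a minimal NumPy sketch of a single LSTM cell step; the weights, biases, and input below are random placeholders rather than trained parameters.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step following the gate equations above."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])      # forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])      # input gate
    C_tilde = np.tanh(W['C'] @ z + b['C'])  # candidate memory
    o_t = sigmoid(W['o'] @ z + b['o'])      # output gate
    C_t = f_t * C_prev + i_t * C_tilde      # final memory update
    h_t = o_t * np.tanh(C_t)                # hidden state
    return h_t, C_t

# Toy dimensions: 3 input features, 4 hidden units, random (untrained) weights
rng = np.random.default_rng(42)
n_in, n_hidden = 3, 4
W = {g: rng.standard_normal((n_hidden, n_hidden + n_in)) for g in 'fiCo'}
b = {g: np.zeros(n_hidden) for g in 'fiCo'}
h_t, C_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)
print("h_t:", h_t)
print("C_t:", C_t)

Keras' LSTM layer, used later in this notebook, carries out this same per-time-step computation, with the weights learned during training.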

"Life is good" life is good lefi ood is is good life Early Stopping: val

We now build a character-level LSTM model to generate poem-like text, using the following techniques during training:

  • Temperature Sampling: Controls randomness when generating each character. A lower temperature (e.g., 0.5) makes predictions more conservative, while a higher temperature (e.g., 1.5) introduces more randomness.

  • Early Stopping: A callback that stops training if the validation loss stops improving, preventing overfitting.

  • Seed Control: Setting random seeds for reproducibility, ensuring you get the same model initialization and training behavior each time.

Explanation of Key Concepts

  1. Temperature Sampling:

    • What it is: Adjusts the probability distribution used to pick the next character by applying a "temperature" parameter.

    • Example:

      • With temperature=0.5, the model tends to pick high-probability characters (more deterministic).

      • With temperature=1.5, choices become more random, which might lead to more creative or unexpected outputs (see the numeric sketch after this list).

  2. Early Stopping:

    • What it is: A strategy to halt model training when further improvement is unlikely, based on monitoring a metric (e.g., loss).

    • Example:

      • If the training loss does not decrease for 5 consecutive epochs (patience=5), training stops to avoid overfitting.

  3. Seed Control:

    • What it is: Setting fixed random seeds in Python, NumPy, and TensorFlow to ensure reproducibility.

    • Example:

      • By setting seed_value=42 for all relevant libraries, you ensure that the randomness (e.g., weight initialization, training shuffles, sampling) remains the same across different runs.
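
The effect of temperature sampling (item 1 above) is easy to see on a toy distribution. The sketch below reweights an assumed three-character distribution [0.6, 0.3, 0.1] at two temperatures:

import numpy as np

def apply_temperature(probs, temperature):
    """Reweight a probability distribution by a temperature value."""
    logits = np.log(np.asarray(probs, dtype='float64') + 1e-8) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

p = [0.6, 0.3, 0.1]               # toy next-character probabilities
print(apply_temperature(p, 0.5))  # sharper, ~[0.78, 0.20, 0.02] -> more deterministic
print(apply_temperature(p, 1.5))  # flatter, ~[0.52, 0.33, 0.16] -> more random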


# Demonstrate seed control: with a fixed seed, NumPy returns the same "random" numbers on every run
import numpy as np

np.random.seed(42)
numbers = np.random.rand(5)
numbers
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.callbacks import EarlyStopping
import random
import os

# Set seeds for reproducibility
seed_value = 42
os.environ['PYTHONHASHSEED'] = str(seed_value)
tf.random.set_seed(seed_value)
np.random.seed(seed_value)
random.seed(seed_value)

# A short excerpt from Robert Frost's "The Road Not Taken" as the training corpus
text = (
    "Two roads diverged in a yellow wood,\n"
    "And sorry I could not travel both\n"
    "And be one traveler, long I stood\n"
    "And looked down one as far as I could\n"
    "To where it bent in the undergrowth;"
)

# Create a sorted list of unique characters
chars = sorted(list(set(text)))
print("Unique characters:", chars)
Unique characters: ['\n', ' ', ',', ';', 'A', 'I', 'T', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k', 'l', 'n', 'o', 'r', 's', 't', 'u', 'v', 'w', 'y']
# Create mappings from characters to indices and vice versa
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for i, c in enumerate(chars)}
# Set sequence length and create input-output sequences
seq_length = 50
sequences = []
next_chars = []
for i in range(0, len(text) - seq_length):
    sequences.append(text[i: i + seq_length])
    next_chars.append(text[i + seq_length])
print("Number of sequences:", len(sequences))
Number of sequences: 129
# Vectorize the sequences (one-hot encoding)
X = np.zeros((len(sequences), seq_length, len(chars)), dtype=np.bool_)
y = np.zeros((len(sequences), len(chars)), dtype=np.bool_)
for i, seq in enumerate(sequences):
    for t, char in enumerate(seq):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1
# We define a simple LSTM model for character-level text prediction.
model = Sequential([
    LSTM(128, input_shape=(seq_length, len(chars))),
    Dense(len(chars), activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
# Early Stopping is a strategy that monitors a metric (e.g., validation loss) during training,
# and stops training if there is no further improvement. This can save time and prevent overfitting.
# In our case, we monitor the training loss (or you could split some data as a validation set)
# and stop if it does not improve for a few epochs.
early_stopping = EarlyStopping(monitor='loss', patience=5, verbose=1)
# 5. Training the Model
# We train the model on our prepared dataset. For real applications, use more epochs and a larger corpus.
history = model.fit(X, y, epochs=50, batch_size=16, callbacks=[early_stopping])

def sample(preds, temperature=1.0):
    """
    Sample an index from a probability array reweighted by temperature.
    """
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature  # add epsilon to avoid log(0)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_text(seed, length=200, temperature=1.0):
    generated = seed
    print("Seed:", seed)
    for i in range(length):
        # Prepare the input sequence (one-hot encoding)
        x_pred = np.zeros((1, seq_length, len(chars)))
        for t, char in enumerate(seed):
            x_pred[0, t, char2idx[char]] = 1.
        # Predict the next character probabilities
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = idx2char[next_index]
        # Append the next character
        generated += next_char
        seed = seed[1:] + next_char
    return generated

seed_text = text[:seq_length]
print("Generated poem with temperature=0.5:\n")
print(generate_text(seed_text, length=200, temperature=0.5))
print("\nGenerated poem with temperature=1.0:\n")
print(generate_text(seed_text, length=200, temperature=1.0))
print("\nGenerated poem with temperature=1.5:\n")
print(generate_text(seed_text, length=200, temperature=1.5))
Epoch 1/50 - loss: 3.2847
Epoch 2/50 - loss: 3.1725
Epoch 3/50 - loss: 2.8910
Epoch 4/50 - loss: 2.8720
Epoch 5/50 - loss: 2.8336
Epoch 6/50 - loss: 2.8184
Epoch 7/50 - loss: 2.8081
Epoch 8/50 - loss: 2.7935
Epoch 9/50 - loss: 2.7778
Epoch 10/50 - loss: 2.7612
Epoch 11/50 - loss: 2.7391
Epoch 12/50 - loss: 2.7089
Epoch 13/50 - loss: 2.6813
Epoch 14/50 - loss: 2.6633
Epoch 15/50 - loss: 2.6901
Epoch 16/50 - loss: 2.6465
Epoch 17/50 - loss: 2.6687
Epoch 18/50 - loss: 2.6057
Epoch 19/50 - loss: 2.5719
Epoch 20/50 - loss: 2.5109
Epoch 21/50 - loss: 2.4575
Epoch 22/50 - loss: 2.3987
Epoch 23/50 - loss: 2.4858
Epoch 24/50 - loss: 2.4149
Epoch 25/50 - loss: 2.3499
Epoch 26/50 - loss: 2.2135
Epoch 27/50 - loss: 2.1242
Epoch 28/50 - loss: 2.0749
Epoch 29/50 - loss: 1.9952
Epoch 30/50 - loss: 1.9352
Epoch 31/50 - loss: 2.0289
Epoch 32/50 - loss: 1.9452
Epoch 33/50 - loss: 2.0856
Epoch 34/50 - loss: 2.1436
Epoch 35/50 - loss: 2.0565
Epoch 35: early stopping

Generated poem with temperature=0.5:

Seed: Two roads diverged in a yellow wood, And sorry I c
Two roads diverged in a yellow wood, And sorry I c nn d asasaaI s Ao en ie e avvvevel tgr o;b noondno o oen avvevrl od wo ne en h n vvea r I s doodowkoooelw een n vravaaav Icote Tn oetiehee bnen nnvtvvrrvvrggg;;,odowAnoood

Generated poem with temperature=1.0:

Seed: Two roads diverged in a yellow wood, And sorry I c
Two roads diverged in a yellow wood, And sorry I ce obd salu rtu d nnt ii nte evavurn dd e orrggeravduwhooddteToneeen a vevv Ie a tr sgowdtT dAdoodoben awe v dv nnsIl rIs l nodkw ndedereee aivaaev esrrtrIIhAwodddd enb ose eo no v oref no tdo

Generated poem with temperature=1.5:

Seed: Two roads diverged in a yellow wood, And sorry I c
Two roads diverged in a yellow wood, And sorry I coluln loaavgrnIsAcd bTgnkrrhtvvelaavnoreo;I uf; gt,;A b;nrvThlkdke e r wfassgIoIlv sed toAwerdn ht,end b dataerlsrrIldodsdI,colrAddkndotoaoaen raavteo ausIg AsuudbkkTyAow n tb iIt tnoo awslIn toolh

Why Isn't the Output Meaningful?

The training corpus here is only five lines of a single poem, far too little for the model to learn from. Use a large corpus of poems (e.g., thousands of Shakespearean sonnets or modern poems) for better learning.

Lack of Semantic Understanding in LSTMs

  • LSTMs don't “understand” meaning—they only learn statistical patterns of sequences.

  • For actual semantic understanding or thematic coherence, you would need Transformer models (e.g., GPT, BERT) trained on large corpora.

Improvement            | What to Do
More Data              | Use a large text corpus (~1 MB or more) of poems.
Word-Level Modeling    | Use a Tokenizer + Embedding layer for word-level LSTM generation (see the sketch below).
Train Longer           | Use at least 100–200 epochs with good hardware.
Use GRU/BiLSTM         | Try stacking layers or using bidirectional LSTMs.
Use Pretrained Models  | Fine-tune GPT-2 or LLaMA models on your poem dataset.
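
As a rough sketch of the word-level modeling suggestion above, the snippet below uses Keras' Tokenizer and an Embedding layer on a tiny placeholder corpus; the corpus, embedding size, and epoch count are illustrative stand-ins, not tuned values.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Tiny placeholder corpus; replace with a large collection of poems
corpus = [
    "two roads diverged in a yellow wood",
    "and sorry i could not travel both",
]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding

# Build n-gram sequences: every prefix of a line predicts its next word
sequences = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        sequences.append(ids[:i + 1])
max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len)
X_words, y_words = sequences[:, :-1], sequences[:, -1]

word_model = Sequential([
    Embedding(vocab_size, 64),
    LSTM(128),
    Dense(vocab_size, activation='softmax')
])
word_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
word_model.fit(X_words, y_words, epochs=10, verbose=0)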

Fine-tuning a small GPT model on poems will result in meaningful poetic generation much faster than an LSTM.
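
For example, a minimal sketch of fine-tuning GPT-2 on a plain-text poem file with the Hugging Face Transformers library might look like the following (it assumes transformers is installed and a poems.txt file exists; TextDataset is deprecated in newer versions in favor of the datasets library, but it keeps the example short).

from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer, TrainingArguments,
                          TextDataset, DataCollatorForLanguageModeling)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Chunk the poem corpus into fixed-length blocks of token IDs
dataset = TextDataset(tokenizer=tokenizer, file_path="poems.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)  # causal LM, not masked LM

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-poems", num_train_epochs=3,
                           per_device_train_batch_size=2),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()

# Sample a continuation from a seed line
inputs = tokenizer("Two roads diverged in a yellow wood,", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))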