LSTM for Text and Sequence Generation
This notebook explains the mathematical foundation and code implementation of LSTM (Long Short-Term Memory) models for both text generation and sequence prediction tasks using Python and TensorFlow/Keras.
1. What is LSTM?
LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) that is capable of learning long-term dependencies. It solves the vanishing gradient problem in traditional RNNs using gates that control the flow of information.
LSTM Cell Structure
Mathematics:
Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
Candidate memory: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
Output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
Final memory update: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
Hidden state: $h_t = o_t \odot \tanh(C_t)$
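As a concrete illustration of these equations, here is a minimal NumPy sketch of a single LSTM step. The dimensions, weight layout, and variable names are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the equations above.

    W maps the concatenated [h_{t-1}, x_t] to the four gate pre-activations,
    stacked as [forget, input, candidate, output].
    """
    hidden = h_prev.shape[0]
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    gates = W @ z + b                            # all four pre-activations at once
    f = sigmoid(gates[0:hidden])                 # forget gate  f_t
    i = sigmoid(gates[hidden:2 * hidden])        # input gate   i_t
    c_tilde = np.tanh(gates[2 * hidden:3 * hidden])  # candidate memory
    o = sigmoid(gates[3 * hidden:4 * hidden])    # output gate  o_t
    c_t = f * c_prev + i * c_tilde               # final memory update C_t
    h_t = o * np.tanh(c_t)                       # hidden state h_t
    return h_t, c_t

# Tiny illustrative dimensions (assumed, not from the notebook)
hidden, inputs = 4, 3
rng = np.random.default_rng(42)
W = rng.normal(size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```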
We now build an LSTM model to generate poem-like text. Three supporting techniques are used:
Temperature Sampling: Controls randomness when generating each character. A lower temperature (e.g., 0.5) makes predictions more conservative, while a higher temperature (e.g., 1.5) introduces more randomness.
Early Stopping: A callback that stops training when the monitored loss (here, the training loss) stops improving, preventing overfitting and wasted epochs.
Seed Control: Setting random seeds for reproducibility, ensuring you get the same model initialization and training behavior each time.
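For reference, a minimal sketch of how such a character-level model might be assembled in Keras. The vocabulary size, context length, and layer width are assumptions for illustration; the notebook's exact values may differ.

```python
from tensorflow import keras

# Assumed character vocabulary and context window length.
vocab_size = 40   # number of distinct characters in the corpus (assumed)
maxlen = 20       # characters of context fed to the LSTM (assumed)

# Character-level LSTM: one-hot character windows in,
# next-character probabilities out.
model = keras.Sequential([
    keras.layers.Input(shape=(maxlen, vocab_size)),
    keras.layers.LSTM(128),                                # assumed layer size
    keras.layers.Dense(vocab_size, activation="softmax"),  # next-char distribution
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```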
Explanation of Key Concepts
Temperature Sampling:
What it is: Adjusts the probability distribution used to pick the next character by applying a "temperature" parameter.
Example: With `temperature=0.5`, the model tends to pick high-probability characters (more deterministic). With `temperature=1.5`, choices become more random, which might lead to more creative or unexpected outputs.
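A sketch of a temperature-sampling helper of the kind typically used with character-level models; the function name and the tiny probability vector are illustrative, not taken from the notebook.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Pick the next character index from a softmax output `probs`.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random).
    """
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-9) / temperature  # rescale in log space
    scaled = np.exp(logits)
    scaled /= scaled.sum()                       # renormalize to a distribution
    return np.random.choice(len(scaled), p=scaled)

probs = [0.6, 0.3, 0.1]
print(sample_with_temperature(probs, temperature=0.5))  # usually index 0
print(sample_with_temperature(probs, temperature=1.5))  # indices 1 and 2 show up more often
```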
Early Stopping:
What it is: A strategy to halt model training when further improvement is unlikely, based on monitoring a metric (e.g., loss).
Example: If the training loss does not decrease for 5 consecutive epochs (`patience=5`), training stops to avoid overfitting.
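A minimal sketch of wiring up such a callback in Keras, assuming the training loss is the monitored metric (as in the run below); `restore_best_weights` is an optional extra, not necessarily what the notebook uses.

```python
from tensorflow import keras

# Stop training once the monitored loss has failed to improve for 5 epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="loss", patience=5,
                                           restore_best_weights=True)

# Hypothetical call: X, y, and model come from the preceding steps.
# model.fit(X, y, epochs=50, batch_size=8, callbacks=[early_stop])
```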
Seed Control:
What it is: Setting fixed random seeds in Python, NumPy, and TensorFlow to ensure reproducibility.
Example: By setting `seed_value=42` for all relevant libraries, you ensure that the randomness (e.g., weight initialization, training shuffles, sampling) remains the same across different runs.
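One common way to set these seeds, shown here as a sketch covering the three libraries mentioned above:

```python
import random
import numpy as np
import tensorflow as tf

seed_value = 42                  # the example seed mentioned above
random.seed(seed_value)          # Python's built-in RNG (e.g., shuffles)
np.random.seed(seed_value)       # NumPy (data preparation, sampling)
tf.random.set_seed(seed_value)   # TensorFlow/Keras (weight initialization, dropout)
```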
Epoch 1/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 2s 21ms/step - loss: 3.2847
Epoch 2/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 3.1725
Epoch 3/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 2.8910
Epoch 4/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.8720
Epoch 5/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.8336
Epoch 6/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.8184
Epoch 7/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.8081
Epoch 8/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.7935
Epoch 9/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.7778
Epoch 10/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 2.7612
Epoch 11/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.7391
Epoch 12/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.7089
Epoch 13/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.6813
Epoch 14/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.6633
Epoch 15/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.6901
Epoch 16/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.6465
Epoch 17/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 2.6687
Epoch 18/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.6057
Epoch 19/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.5719
Epoch 20/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.5109
Epoch 21/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.4575
Epoch 22/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.3987
Epoch 23/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.4858
Epoch 24/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 2.4149
Epoch 25/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 2.3499
Epoch 26/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.2135
Epoch 27/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 2.1242
Epoch 28/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - loss: 2.0749
Epoch 29/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.9952
Epoch 30/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 1.9352
Epoch 31/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 2.0289
Epoch 32/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 1.9452
Epoch 33/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 2.0856
Epoch 34/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 2.1436
Epoch 35/50
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 2.0565
Epoch 35: early stopping
Generated poem with temperature=0.5:
Seed: Two roads diverged in a yellow wood,
And sorry I c
Two roads diverged in a yellow wood,
And sorry I c nn d asasaaI s
Ao
en ie e avvvevel tgr o;b
noondno o oen avvevrl od
wo ne en h n vvea r I s
doodowkoooelw een n vravaaav Icote
Tn oetiehee bnen nnvtvvrrvvrggg;;,odowAnoood
Generated poem with temperature=1.0:
Seed: Two roads diverged in a yellow wood,
And sorry I c
Two roads diverged in a yellow wood,
And sorry I ce obd salu rtu d nnt ii nte evavurn dd e orrggeravduwhooddteToneeen a vevv Ie a tr
sgowdtT dAdoodoben awe v dv nnsIl rIs l nodkw ndedereee aivaaev esrrtrIIhAwodddd enb ose eo no v oref no tdo
Generated poem with temperature=1.5:
Seed: Two roads diverged in a yellow wood,
And sorry I c
Two roads diverged in a yellow wood,
And sorry I coluln loaavgrnIsAcd
bTgnkrrhtvvelaavnoreo;I uf;
gt,;A
b;nrvThlkdke e r wfassgIoIlv
sed
toAwerdn ht,end b dataerlsrrIldodsdI,colrAddkndotoaoaen raavteo ausIg AsuudbkkTyAow n
tb iIt tnoo awslIn toolh
Why is the output not meaningful?
The training corpus here is very small, so the model can only pick up shallow character statistics. Use a large corpus of poems (e.g., thousands of Shakespearean sonnets, modern poems, etc.) for better learning.
Lack of meaning understanding in LSTMs
LSTMs don't “understand” meaning; they only learn statistical patterns of character sequences.
For actual semantic understanding or thematic coherence, you would need Transformer-based models (GPT, BERT) trained on large corpora.
| Improvement | What to Do |
|---|---|
| More Data | Use a large text corpus (~1 MB or more) of poems. |
| Word-Level Modeling | Use a Tokenizer + Embedding layer for word-level LSTM generation (see the sketch after this table). |
| Train Longer | Train for at least 100–200 epochs on capable hardware. |
| Use GRU/BiLSTM | Try stacking layers or using bidirectional LSTMs. |
| Use Pretrained Models | Fine-tune GPT-2 or LLaMA models on your poem dataset. |
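To illustrate the word-level modeling row above, here is a hedged sketch using the legacy `Tokenizer` from `tensorflow.keras.preprocessing` (available in TF 2.x with Keras 2; newer Keras versions favor `TextVectorization` instead). The two-line corpus, layer sizes, and variable names are placeholders.

```python
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer       # legacy utility
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical corpus of poem lines
corpus = ["Two roads diverged in a yellow wood",
          "And sorry I could not travel both"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding index 0

# Build (prefix -> next word) training pairs from each line
sequences = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        sequences.append(ids[:i + 1])
sequences = pad_sequences(sequences)        # pad prefixes to equal length
X, y = sequences[:, :-1], sequences[:, -1]

# Word-level model: an Embedding layer replaces one-hot characters
model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Embedding(vocab_size, 64),
    keras.layers.LSTM(128),
    keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```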