Path: blob/master/Generative NLP Models using Python/4 N-gram models and Neural Network models .ipynb
The N-Gram model is a probabilistic language model used to predict the next item in a sequence, usually a word or character, based on the previous N-1 items.
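In general, an N-gram model approximates the probability of a whole sequence with a Markov assumption: each word depends only on the N−1 words before it.

$$P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-N+1}, \dots, w_{i-1})$$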
Types
📘 1. Unigram Model (N = 1)
Only considers the probability of each word independently.
Corpus: "I love ice cream"
Unigram probabilities (based on frequency):
P(I) = 1/4
P(love) = 1/4
P(ice) = 1/4
P(cream) = 1/4
Usage: To generate text, choose each word based on its unigram probability, ignoring previous words.
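A minimal sketch of these unigram estimates in plain Python (the variable names are just illustrative):

```python
from collections import Counter

corpus = "I love ice cream"
tokens = corpus.lower().split()   # ['i', 'love', 'ice', 'cream']

# Unigram probability = word count / total number of tokens
total = len(tokens)
unigram_probs = {word: count / total for word, count in Counter(tokens).items()}

print(unigram_probs)   # every word has probability 0.25
```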
📘 2. Bigram Model (N = 2)
Considers the probability of a word based on the previous word.
Corpus: "I love ice cream"
Bigrams and probabilities:
P(love | I) = 1
P(ice | love) = 1
P(cream | ice) = 1
To calculate a sentence probability:
P(I love ice cream) = P(I) × P(love | I) × P(ice | love) × P(cream | ice)
Let’s assume P(I) = 0.25 (from unigram), then: P(sentence) = 0.25 × 1 × 1 × 1 = 0.25
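The same chain-rule product, written out in Python with the probabilities from this toy corpus hard-coded for illustration:

```python
# Values taken from the example above: P(I) from the unigram model,
# the rest from the bigram probabilities.
p_first = 0.25
bigram_probs = {("i", "love"): 1.0, ("love", "ice"): 1.0, ("ice", "cream"): 1.0}

words = "i love ice cream".split()
prob = p_first
for prev, curr in zip(words, words[1:]):
    prob *= bigram_probs[(prev, curr)]

print(prob)   # 0.25
```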
📘 3. Trigram Model (N = 3)
Considers two previous words to predict the next.
Corpus: "I love ice cream"
Trigrams and probabilities:
P(ice | I love) = 1
P(cream | love ice) = 1
To calculate a sentence probability (approximate):
Assume we also have P(I) and P(love | I):
P(sentence) = P(I) × P(love | I) × P(ice | I love) × P(cream | love ice)
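Extracting the trigrams from a token list is a one-liner with zip; a quick sketch:

```python
tokens = "i love ice cream".lower().split()

# Consecutive triples: the third word is predicted from the previous two.
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
print(trigrams)   # [('i', 'love', 'ice'), ('love', 'ice', 'cream')]
```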
Summary:
| Model | Memory | Example Prediction |
|---|---|---|
| Unigram | 0 previous words | Predict the next word from overall frequency |
| Bigram | 1 previous word | Predict "cream" from "ice" |
| Trigram | 2 previous words | Predict "cream" from "love ice" |
Real Use Case Example:
Suppose you're using a bigram model and want to predict the next word after "machine". From a corpus, you might get:
P(learning | machine) = 0.7
P(gun | machine) = 0.2
P(shop | machine) = 0.1
So, the model would most likely predict "learning".
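Choosing the prediction is just an argmax over that table; the numbers below are the hypothetical ones from the example:

```python
# Hypothetical bigram probabilities for words following "machine"
next_word_probs = {"learning": 0.7, "gun": 0.2, "shop": 0.1}

prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)   # learning
```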
Bigram Model step-by-step using the sentence:
"Ashi have ice cream jar. Ice cream is her fav."
Step 1: Preprocess the Sentence
Let’s split the text into lowercase words and add a sentence boundary marker <s> at the beginning of each sentence.
Tokenized list:
<s> ashi have ice cream jar . <s> ice cream is her fav .
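One way to produce that token list (a sketch; splitting on ". " is only reliable for a tidy example like this one):

```python
text = "Ashi have ice cream jar. Ice cream is her fav."

# Lowercase, split into sentences, prepend the <s> boundary marker, keep the final '.'
tokens = []
for sentence in text.lower().split(". "):
    tokens += ["<s>"] + sentence.rstrip(".").split() + ["."]

print(tokens)
# ['<s>', 'ashi', 'have', 'ice', 'cream', 'jar', '.', '<s>', 'ice', 'cream', 'is', 'her', 'fav', '.']
```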
Step 2: Extract Bigrams
From the token list, we form pairs of consecutive words:
(<s>, ashi), (ashi, have), (have, ice), (ice, cream), (cream, jar), (jar, .), (., <s>), (<s>, ice), (ice, cream), (cream, is), (is, her), (her, fav), (fav, .)
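Continuing the sketch from Step 1, the pairs fall out of zip:

```python
# Consecutive pairs from the token list built in Step 1
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams[:3])   # [('<s>', 'ashi'), ('ashi', 'have'), ('have', 'ice')]
```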
Step 3: Count Bigrams and Unigrams
Bigram Counts:
| Bigram | Count |
|---|---|
| ('<s>', 'ashi') | 1 |
| ('ashi', 'have') | 1 |
| ('have', 'ice') | 1 |
| ('ice', 'cream') | 2 |
| ('cream', 'jar') | 1 |
| ('jar', '.') | 1 |
| ('.', '<s>') | 1 |
| ('<s>', 'ice') | 1 |
| ('cream', 'is') | 1 |
| ('is', 'her') | 1 |
| ('her', 'fav') | 1 |
| ('fav', '.') | 1 |
Unigram Counts (first word in each bigram):
| Word | Count |
|---|---|
| <s> | 2 |
| ashi | 1 |
| have | 1 |
| ice | 2 |
| cream | 2 |
| jar | 1 |
| . | 1 |
| is | 1 |
| her | 1 |
| fav | 1 |
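Both count tables can be produced with collections.Counter over the token list from Step 1 (same sketch continued):

```python
from collections import Counter

bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens[:-1])   # first word of each bigram

print(bigram_counts[("ice", "cream")])  # 2
print(unigram_counts["<s>"])            # 2
```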
Step 4: Calculate Bigram Probabilities
Using Maximum Likelihood Estimation (MLE):
P(w_n | w_(n-1)) = Count(w_(n-1), w_n) / Count(w_(n-1))
Examples:
P(ashi | <s>) = Count(<s>, ashi) / Count(<s>) = 1/2 = 0.5
P(cream | ice) = Count(ice, cream) / Count(ice) = 2/2 = 1
P(jar | cream) = Count(cream, jar) / Count(cream) = 1/2 = 0.5
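The MLE probabilities follow directly from those counts (continuing the sketch):

```python
# P(w2 | w1) = Count(w1, w2) / Count(w1)
bigram_probs = {
    (w1, w2): count / unigram_counts[w1]
    for (w1, w2), count in bigram_counts.items()
}

print(bigram_probs[("ice", "cream")])   # 1.0
print(bigram_probs[("cream", "jar")])   # 0.5
```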
Step 5: Bigram Sentence Probability
Let’s compute the probability of the first sentence: "Ashi have ice cream jar ." with tokens: <s> ashi have ice cream jar .
P(sentence) = P(ashi | <s>) × P(have | ashi) × P(ice | have) × P(cream | ice) × P(jar | cream) × P(. | jar)
= 0.5 × 1 × 1 × 1 × 0.5 × 1 = 0.25
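Chaining those probabilities over the first sentence reproduces the 0.25 above (same sketch):

```python
sentence_tokens = ["<s>", "ashi", "have", "ice", "cream", "jar", "."]

prob = 1.0
for w1, w2 in zip(sentence_tokens, sentence_tokens[1:]):
    prob *= bigram_probs[(w1, w2)]

print(prob)   # 0.25
```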
Summary:
The bigram model helps assign probabilities to word sequences by looking at pairs of words. In this example:
It recognizes that "ice cream" is a strong pattern: "ice" is followed by "cream" both times it appears, so P(cream | ice) = 1.
It counts bigrams like "cream jar" and "ashi have" only once each, and "cream jar" also gets a lower conditional probability, P(jar | cream) = 0.5, because "cream" is followed by both "jar" and "is".
Simple Neural Network Language Model with TensorFlow
"life is love" [1,2,3] [length=4] [0,1,2,3] [0,0,1,2,3]