CoCalc -- 2 NLTK Functions.ipynb

GitHub Repository: suyashi29/python-su
Path: blob/master/Natural Language Processing using Python/ 2 NLTK Functions.ipynb
Kernel: Python 3 (ipykernel)

Frequency Distribution

Function: FreqDist
Counts the number of times each word appears in a text.

In [ ]:

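import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

# If the tokenizer data is missing, run nltk.download("punkt") once
# (newer NLTK releases may ask for "punkt_tab" instead).

text = "Suyashi is from India. She works in India. Delhi is the capital of India."
# Lowercase first so that "India" and "india" count as the same token
words = word_tokenize(text.lower())
freq_dist = FreqDist(words)
# most_common(5) returns the five most frequent (token, count) pairs
print("Frequency Distribution:", freq_dist.most_common(5))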
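FreqDist is a subclass of collections.Counter, so the counting step can be sketched with the standard library alone. The snippet below is a minimal illustration rather than the notebook's own method: it swaps word_tokenize for a simple regular expression (an assumption made here so that punctuation is dropped and no NLTK data download is needed).

```python
import re
from collections import Counter

text = "Suyashi is from India. She works in India. Delhi is Capital of India"

# Assumption: a plain regex tokenizer stands in for word_tokenize here.
# It keeps runs of letters only, so "." is never counted as a word.
tokens = re.findall(r"[a-z]+", text.lower())

# Counter performs the same tallying that FreqDist inherits.
counts = Counter(tokens)

print(counts["india"])        # 3
print(counts.most_common(2))  # [('india', 3), ('is', 2)]
```

Looking up a single word with counts["india"] works the same way on a FreqDist object.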

Function: lower(), re.sub()

Converts text to lowercase and removes special characters.

In [25]:

import re

text = "Displays &the %context of a @word in a !text"
normalized_text = re.sub(r"[^a-zA-Z0-9\s]", "", text.lower())
print("Normalized Text:", normalized_text)

Out[25]:

Normalized Text: displays the context of a word in a text
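When a special character stands alone between two spaces, removing it leaves a double space behind. A possible refinement, sketched below, wraps the same re.sub() call in a helper and collapses repeated whitespace afterwards. The name normalize_text is hypothetical and not part of the notebook.

```python
import re

def normalize_text(text):
    """Lowercase, drop special characters, and collapse repeated whitespace."""
    lowered = text.lower()
    # Same idea as the cell above: keep letters, digits, and whitespace only.
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Extra step (not in the notebook): squeeze runs of whitespace to one space.
    return re.sub(r"\s+", " ", cleaned).strip()

print(normalize_text("Displays &the %context of a @word in a !text"))
# displays the context of a word in a text
print(normalize_text("Hello ,   World !"))
# hello world
```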

Concordance

Function: Text.concordance
Displays the context of a word in a text.

In [11]:

from nltk.text import Text
from nltk.tokenize import word_tokenize

words = word_tokenize("Ashi is an excellent data analyst. Ashi loves working on NLP projects.")
text_obj = Text(words)
text_obj.concordance("Ashi")

Out[11]:

Displaying 2 of 2 matches:
 Ashi is an excellent data analyst . Ashi 
Ashi is an excellent data analyst . Ashi loves working on NLP projects .
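The idea behind a concordance (often called keyword in context) can be sketched in plain Python: find each occurrence of the word and collect a few tokens on either side. This is a simplified illustration, not NLTK's implementation. The function name keyword_in_context, the window parameter, and the whitespace split used in place of word_tokenize are all assumptions made for this example.

```python
def keyword_in_context(tokens, keyword, window=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    matches = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively so "ashi" also finds "Ashi".
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            matches.append((left, token, right))
    return matches

# Pre-tokenized sample; splitting on spaces stands in for word_tokenize.
tokens = "Ashi is an excellent data analyst . Ashi loves working on NLP projects .".split()

matches = keyword_in_context(tokens, "ashi")
for left, word, right in matches:
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20} [{word}] {right}")
```

Each result keeps the left and right context separate, which makes it easy to align the keyword in a column as a concordance display does.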

Collocations

Function: BigramCollocationFinder
Finds word pairs that frequently occur together.

In [13]:

from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.tokenize import word_tokenize

text = "GenAI powers creative solutions and simplifies complex tasks. GenAI is transforming industries."
tokens = word_tokenize(text)
bigram_finder = BigramCollocationFinder.from_words(tokens)
bigrams = bigram_finder.nbest(BigramAssocMeasures.likelihood_ratio, 3)
print("Top 3 Collocations:", bigrams)

Out[13]:

Top 3 Collocations: [('and', 'simplifies'), ('complex', 'tasks'), ('creative', 'solutions')]
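In the sample sentence above every bigram occurs exactly once, so the ranking depends entirely on the association measure. To see the counting step on its own, the sketch below tallies adjacent word pairs with collections.Counter on a different, made-up sentence that contains repeats. It ranks by raw frequency, a simpler stand-in chosen for illustration, not the likelihood ratio used in the cell above.

```python
from collections import Counter

# Made-up sample with repeated pairs; split() stands in for word_tokenize.
tokens = "machine learning is fun and machine learning is useful".split()

# Pair each token with its successor to form adjacent bigrams.
bigrams = list(zip(tokens, tokens[1:]))
bigram_counts = Counter(bigrams)

# Rank pairs by how often they occur (raw frequency only).
for pair, count in bigram_counts.most_common(2):
    print(pair, count)
```

Raw counts favor pairs of very common words on larger texts, which is one reason scored measures such as those in BigramAssocMeasures are used instead.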

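Word Cloud

Function: WordCloud.generate_from_frequencies
Draws a word cloud in which more frequent words appear larger.

In [22]:

%pip install wordcloud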

Out[22]:

Collecting wordcloud
  Obtaining dependency information for wordcloud from https://files.pythonhosted.org/packages/00/09/abb305dce85911b8fba382926cfc57f2f257729e25937fdcc63f3a1a67f9/wordcloud-1.9.4-cp311-cp311-win_amd64.whl.metadata
  Downloading wordcloud-1.9.4-cp311-cp311-win_amd64.whl.metadata (3.5 kB)
Requirement already satisfied: numpy>=1.6.1 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from wordcloud) (1.24.3)
Requirement already satisfied: pillow in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from wordcloud) (9.4.0)
Requirement already satisfied: matplotlib in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from wordcloud) (3.7.2)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (1.0.5)
Requirement already satisfied: cycler>=0.10 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (4.25.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (1.4.4)
Requirement already satisfied: packaging>=20.0 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (23.1)
Requirement already satisfied: pyparsing<3.1,>=2.3.1 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from matplotlib->wordcloud) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\suyashi144893\appdata\local\anaconda3\lib\site-packages (from python-dateutil>=2.7->matplotlib->wordcloud) (1.16.0)
Downloading wordcloud-1.9.4-cp311-cp311-win_amd64.whl (299 kB)
Installing collected packages: wordcloud
Successfully installed wordcloud-1.9.4
Note: you may need to restart the kernel to use updated packages.

In [24]:

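from wordcloud import WordCloud
import matplotlib.pyplot as plt
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize  # imported here so the cell runs on its own

text = "Dancing is the best exercise for mind and body. I love dancing. Do you often do dancing exercise?"
# Keep alphabetic tokens only so punctuation does not show up in the cloud
words = [w for w in word_tokenize(text.lower()) if w.isalpha()]
freq_dist = FreqDist(words)

# generate_from_frequencies takes a word -> count mapping such as a FreqDist
wordcloud = WordCloud(width=800, height=400).generate_from_frequencies(freq_dist)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()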

Out[24]:

[Figure: word cloud generated from freq_dist]
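Common function words such as "is" and "do" tend to dominate a word cloud. One way to reduce that, sketched below with the standard library only, is to filter them out before counting. The STOPWORDS set is a small hand-picked list invented for this example, the sentence is a sample similar to the one in the cell above, and the regex tokenizer stands in for word_tokenize.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword list for this example only.
STOPWORDS = {"is", "for", "and", "i", "do", "you", "often"}

text = "Dancing is best exercise for mind and body. I love dancing. Do you often do dancing exercise?"

# Letters-only tokens, lowercased, so punctuation never reaches the counts.
tokens = re.findall(r"[a-z]+", text.lower())

# Count only the tokens that are not in the stopword list.
frequencies = Counter(t for t in tokens if t not in STOPWORDS)

print(frequencies.most_common(2))  # [('dancing', 3), ('exercise', 2)]
```

The resulting mapping of words to counts has the same shape as the FreqDist passed to generate_from_frequencies, so it could be used in its place.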
