suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/Natural Language Processing using Python/ 2 NLTK Functions.ipynb
Kernel: Python 3 (ipykernel)

Counts the number of times each word appears in a text.

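import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

# word_tokenize needs the NLTK tokenizer data, e.g. nltk.download("punkt")
# (newer NLTK releases may ask for "punkt_tab" instead)
text = "Suyashi is from India. She works in India. Delhi is Capital of India"

# Lowercase first so that "India" and "india" are counted together
words = word_tokenize(text.lower())
freq_dist = FreqDist(words)
print("Frequency Distribution:", freq_dist.most_common(5))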
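Because word_tokenize keeps punctuation as separate tokens, marks such as "." are counted alongside words. The sketch below shows one way to count only word-like tokens. It is an illustration, not the notebook's method: it swaps in a simple regular expression as the tokenizer and uses collections.Counter (the class that FreqDist builds on), so it runs without any NLTK data.

```python
import re
from collections import Counter

text = "Suyashi is from India. She works in India. Delhi is Capital of India"

# Stand-in tokenizer (an assumption of this sketch): runs of letters or digits
tokens = re.findall(r"[a-z0-9]+", text.lower())

# Counter.most_common returns (token, count) pairs, highest count first
counts = Counter(tokens)
top_two = counts.most_common(2)
print("Top 2 words:", top_two)
```

Filtering before counting keeps punctuation out of the top results, which usually matters more as the text grows.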

Function: lower(), re.sub()

Converts text to lowercase and removes special characters.

import re

text = "Displays &the %context of a @word in a !text"

# Lowercase, then drop every character that is not a letter, digit, or whitespace
normalized_text = re.sub(r"[^a-zA-Z0-9\s]", "", text.lower())
print("Normalized Text:", normalized_text)
Normalized Text: displays the context of a word in a text
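When a special character stands on its own between two spaces, removing it leaves a double space behind. A minimal sketch of a reusable helper that also collapses whitespace follows; the name `normalize` and the sample sentence are hypothetical, chosen only for illustration.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, drop non-alphanumeric characters, and collapse whitespace."""
    # Same idea as the cell above: keep only letters, digits, and whitespace
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    # Squeeze runs of whitespace left behind by removed characters
    return re.sub(r"\s+", " ", cleaned).strip()

result = normalize("Hello,   World! NLP & Python 3 -- rocks.")
print(result)
```

Wrapping the steps in a function makes it easy to apply the same cleaning to every document in a corpus.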

Concordance

Function: Text.concordance

Displays the context of a word in a text.

from nltk.text import Text
from nltk.tokenize import word_tokenize

words = word_tokenize("Ashi is an excellent data analyst. Ashi loves working on NLP projects.")

# Wrap the token list in a Text object to get access to concordance()
text_obj = Text(words)
text_obj.concordance("Ashi")
Displaying 2 of 2 matches:
Ashi is an excellent data analyst . Ashi
Ashi is an excellent data analyst . Ashi loves working on NLP projects .
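To make the idea of a concordance concrete, here is a small keyword-in-context sketch in plain Python. It is not how NLTK implements concordance; the function name `kwic`, the `window` parameter, and the case-insensitive matching are assumptions of this example.

```python
def kwic(tokens, target, window=3):
    """Return (left, match, right) context strings for each occurrence of target."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            # Up to `window` tokens on each side of the match
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

tokens = "Ashi is an excellent data analyst . Ashi loves working on NLP projects .".split()
hits = kwic(tokens, "ashi")
for left, match, right in hits:
    # Right-align the left context so the matches line up in a column
    print(f"{left:>20} [{match}] {right}")
```

Lining the matches up in one column is what makes a concordance easy to scan for usage patterns.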

Collocations

  • Function: BigramCollocationFinder

  • Finds word pairs that frequently occur together.

from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.tokenize import word_tokenize

text = "GenAI powers creative solutions and simplifies complex tasks. GenAI is transforming industries."
tokens = word_tokenize(text)

# Score every adjacent word pair and keep the 3 highest by likelihood ratio
bigram_finder = BigramCollocationFinder.from_words(tokens)
bigrams = bigram_finder.nbest(BigramAssocMeasures.likelihood_ratio, 3)
print("Top 3 Collocations:", bigrams)
Top 3 Collocations: [('and', 'simplifies'), ('complex', 'tasks'), ('creative', 'solutions')]
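As a rough intuition for what a collocation finder starts from, the sketch below counts adjacent word pairs with the standard library. It ranks by raw frequency, which is a simpler stand-in for the likelihood ratio used above and can order pairs differently; the sample sentence is made up for illustration.

```python
from collections import Counter

text = "machine learning is fun and machine learning is useful"
tokens = text.split()

# Pair each token with its successor to form the bigrams
bigram_counts = Counter(zip(tokens, tokens[1:]))
top_pairs = bigram_counts.most_common(2)
print("Most frequent bigrams:", top_pairs)
```

Association measures such as the likelihood ratio go one step further by comparing how often a pair occurs with how often its words occur separately.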
pip install wordcloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize  # needed if this cell runs on its own

text = "Dancing is best exercise for mind and body. I love Dancing. Do you often do Dancing exercise?"
words = word_tokenize(text.lower())
freq_dist = FreqDist(words)

# Build the cloud from token counts; punctuation tokens are counted as well
wordcloud = WordCloud(width=800, height=400).generate_from_frequencies(freq_dist)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Image in a Jupyter notebook
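Word clouds tend to be more readable when punctuation and very common words are removed before counting. The sketch below prepares such a frequency table with the standard library only. The stopword set is a tiny hand-picked list assumed for this example, not an NLTK resource, and the regular expression is a stand-in for word_tokenize.

```python
import re
from collections import Counter

text = "Dancing is the best exercise for mind and body. I love dancing."

# Hypothetical, hand-picked stopword list for illustration only
stopwords = {"is", "the", "for", "and", "i"}

# Keep alphabetic tokens, then drop the stopwords before counting
tokens = re.findall(r"[a-z]+", text.lower())
frequencies = Counter(t for t in tokens if t not in stopwords)
print(frequencies.most_common(3))
```

Since a Counter behaves like a dictionary of word counts, the resulting `frequencies` object should be usable in place of `freq_dist` when calling generate_from_frequencies.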