Kernel: Python 3 (ipykernel)
Exploratory Analysis of Textual Data
Data Preprocessing
pip install textstat
pip install TextBlob
In [1]:
In [2]:
In [3]:
Out[3]:
C:\Users\Suyashi144893\AppData\Local\Temp\1\ipykernel_3856\3023027549.py:3: DtypeWarning: Columns (1,10) have mixed types. Specify dtype option on import or set low_memory=False.
text=pd.read_csv('AWSReview.csv')
(34660, 21)
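The DtypeWarning above names its own two remedies: declare a `dtype` for the mixed columns, or pass `low_memory=False` so pandas reads the whole file before inferring types. Below is a minimal sketch of both. A small in-memory CSV stands in for AWSReview.csv, and its contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for AWSReview.csv: the "name" column mixes text and a number
csv_text = "id,name\n1,Echo\n2,42\n3,Fire TV\n"

# Option 1: infer each column's type from the whole file instead of per chunk
df_full = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the column's type explicitly so no inference is needed
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"name": str})

print(df_typed.shape)
```

An explicit `dtype` is usually the better choice once you know which columns are affected. It also documents the intended type for later readers.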
In [4]:
Out[4]:
In [5]:
Out[5]:
id 0
name 6760
asins 2
brand 0
categories 0
keys 0
manufacturer 0
reviews.date 39
reviews.dateAdded 10621
reviews.dateSeen 0
reviews.didPurchase 34659
reviews.doRecommend 594
reviews.id 34659
reviews.numHelpful 529
reviews.rating 33
reviews.sourceURLs 0
reviews.text 1
reviews.title 6
reviews.userCity 34660
reviews.userProvince 34660
reviews.username 7
dtype: int64
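The counts above look like the output of `isnull().sum()`. Compare them with the row count from `shape` (34660). This shows, for example, that `reviews.userCity` and `reviews.userProvince` are missing in every row. Percentages make this easier to read. Here is a sketch of the same check on a hypothetical three-row sample that reuses two of the dataset's column names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the column names come from the dataset
df = pd.DataFrame({
    "reviews.text": ["Great tablet", None, "Works fine"],
    "reviews.userCity": [np.nan, np.nan, np.nan],
})

missing_count = df.isnull().sum()       # missing values per column
missing_pct = df.isnull().mean() * 100  # share of missing values per column, in percent

# Columns that are entirely empty carry no information and can be dropped
empty_cols = missing_pct[missing_pct == 100].index.tolist()
reduced = df.drop(columns=empty_cols)
```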
In [6]:
Out[6]:
In [9]:
Out[9]:
C:\Users\Suyashi144893\AppData\Local\Temp\1\ipykernel_3856\2047947588.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
textdata.dropna(inplace=True)
C:\Users\Suyashi144893\AppData\Local\Temp\1\ipykernel_3856\2047947588.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
textdata.dropna(inplace=True)
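The SettingWithCopyWarning above suggests that `textdata` was taken as a slice of `text` and then modified in place with `dropna(inplace=True)`. The usual fix is to take an explicit `.copy()` of the subset first. The sketch below assumes `textdata` is a column subset of `text`. The column list and data are hypothetical, because the original cell's code is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the full review table loaded earlier as `text`
text = pd.DataFrame({
    "name": ["Echo", None, "Fire TV"],
    "reviews.text": ["Great speaker", "Works", None],
    "reviews.rating": [5, 4, np.nan],
})

# .copy() makes textdata an independent DataFrame rather than a view of `text`,
# so modifying it in place does not raise SettingWithCopyWarning
textdata = text[["name", "reviews.text", "reviews.rating"]].copy()
textdata.dropna(inplace=True)  # keeps only rows with no missing values
```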
In [8]:
Out[8]:
(27409, 4)
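Ways to handle missing values:
1. dropna(): drop rows that contain missing values.
2. fillna(0): replace missing values with a constant such as 0.
3. fillna() with a statistic: the median, mode, or mean of the column.
Mean vs. median: use the mean when the data has no outliers, and the median when it does. For example, in 10, 500, 800, 700, 900, 1000, 1200, 1100, 450 the value 10 is an outlier, so the median is the safer fill value.
Mode: the most frequent value, useful for categorical data. In "A", "A", "c", "d" the mode is "A" (it occurs 2 times).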
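Below is a small pandas sketch of the statistics quoted above and of the three fill options, run on a hypothetical frame. `fillna` takes the fill value itself, so compute the statistic first. Calling `fillna("median")` would insert the literal string "median".

```python
import numpy as np
import pandas as pd

# The example values from the notes above
values = pd.Series([10, 500, 800, 700, 900, 1000, 1200, 1100, 450])
print(values.mean())    # 740.0, pulled down by the outlier 10
print(values.median())  # 800.0

labels = pd.Series(["A", "A", "c", "d"])
print(labels.mode()[0])  # A

# Hypothetical frame with gaps
df = pd.DataFrame({
    "rating": [5.0, np.nan, 3.0, 4.0],
    "brand": ["Amazon", None, "Amazon", "Other"],
})

dropped = df.dropna()                                         # 1. drop incomplete rows
zero_filled = df["rating"].fillna(0)                          # 2. constant fill
median_filled = df["rating"].fillna(df["rating"].median())    # 3a. numeric: median
mode_filled = df["brand"].fillna(df["brand"].mode()[0])       # 3b. categorical: mode
```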
In [10]:
In [11]:
Out[11]:
9
In [14]:
Out[14]:
[2, 6, 8, 12]
In [15]:
Out[15]:
Number of products matching the criteria is 10
In [20]:
In [19]:
Out[19]:
10
In [21]:
Out[21]:
0 All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi,...
1 All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi,...
2 All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi,...
3 All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi,...
4 All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi,...
...
26715 Amazon Fire Tv
26716 Amazon Fire Tv
26717 Amazon Fire Tv
26718 Amazon Fire Tv
26719 Amazon Fire Tv
Name: name, Length: 26720, dtype: object
In [22]:
Out[22]:
0 This product so far has not disappointed. My c...
1 great for beginner or experienced person. Boug...
2 Inexpensive tablet for him to use and learn on...
3 I've had my Fire HD 8 two weeks now and I love...
4 I bought this for my grand daughter when she c...
...
26715 It has many uses. You can listen to music, che...
26716 Cost is not outrageous. Easy setup, fun to use...
26717 I knew about this from its crowd funding start...
26718 This is a neat product but did not fit my need...
26719 Responses well and there are lots of skills to...
Name: reviews.text, Length: 26720, dtype: object
In [23]:
Out[23]:
Review:
(0, 'Not easy for elderly users cease of ads that pop up.')
Review:
(1, 'Excellent product. Easy to use, large screen makes watching movies and reading easier.')
Review:
(2, 'Wanted my father to have his first tablet and this is a very good value. He can watch movies and play a few games. Easy enough for him to use.')
Review:
(3, 'Simply does everything I need. Thank youAnd silk works wonders')
Review:
(4, 'Got it as a present and love the size of the screen')
In [24]:
In [25]:
Out[25]:
0 this product so far has not disappointed. my c...
1 great for beginner or experienced person. boug...
2 inexpensive tablet for him to use and learn on...
3 i've had my fire hd 8 two weeks now and i love...
4 i bought this for my grand daughter when she c...
...
26715 it has many uses. you can listen to music, che...
26716 cost is not outrageous. easy setup, fun to use...
26717 i knew about this from its crowd funding start...
26718 this is a neat product but did not fit my need...
26719 responses well and there are lots of skills to...
Name: reviews.text, Length: 26720, dtype: object
In [26]:
In [27]:
In [28]:
Out[28]:
0 this product so far has not disappointed my ch...
1 great for beginner or experienced person bough...
2 inexpensive tablet for him to use and learn on...
3 ive had my fire hd two weeks now and i love i...
4 i bought this for my grand daughter when she c...
...
26715 it has many uses you can listen to music check...
26716 cost is not outrageous easy setup fun to use a...
26717 i knew about this from its crowd funding start...
26718 this is a neat product but did not fit my need...
26719 responses well and there are lots of skills to...
Name: reviews.text, Length: 26720, dtype: object
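The two outputs above show the reviews lowercased (Out[25]) and then stripped of punctuation and digits (Out[28]). For example, "i've had my fire hd 8" becomes "ive had my fire hd two". The cells' code is not shown. Here is a minimal sketch of one way to get the same result with pandas string methods and regular expressions, on two sample sentences adapted from the reviews.

```python
import pandas as pd

reviews = pd.Series([
    "I've had my Fire HD 8 two weeks now and I love it.",
    "Cost is not outrageous. Easy setup, fun to use!",
])

lowered = reviews.str.lower()

# Drop every character that is not a lowercase letter or whitespace
# (this removes punctuation, apostrophes and digits)
cleaned = lowered.str.replace(r"[^a-z\s]", "", regex=True)

# Collapse the double spaces left where a token such as "8" was removed
cleaned = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()
```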
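TextBlob sentiment polarity ranges from -1 (very negative) through 0 (neutral) to 1 (very positive); subjectivity ranges from 0 (objective) to 1 (subjective).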
In [29]:
In [31]:
Out[31]:
0 0.325000
1 0.800000
2 0.600000
3 0.374583
4 0.368056
...
26715 0.500000
26716 0.411111
26717 0.512500
26718 0.250000
26719 0.000000
Name: emotion, Length: 26720, dtype: float64
In [32]:
Out[32]:
Sentiment(polarity=0.8, subjectivity=1.0)
In [33]:
Out[33]:
Sentiment(polarity=-0.7000000000000001, subjectivity=0.9666666666666667)
In [34]:
Out[34]:
Sentiment(polarity=0.0, subjectivity=0.0)
In [ ]:
In [35]:
Out[35]:
In the bar graph, the first few products receive good feedback from reviewers, whereas the last few products have lower user ratings. This shows how user reviews can be used to gauge the popularity of each product.
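The bar graph itself did not survive the export. Suppose it plotted the average `reviews.rating` per product `name` (both columns appear in the null-count listing earlier). Then a sketch like the following reproduces that kind of chart. The three-product sample is hypothetical, and the off-screen `Agg` backend is used only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample using the dataset's column names
textdata = pd.DataFrame({
    "name": ["Fire HD 8", "Fire HD 8", "Echo", "Amazon Fire Tv", "Amazon Fire Tv"],
    "reviews.rating": [5, 4, 5, 3, 2],
})

# Mean rating per product, best-rated first
avg_rating = (
    textdata.groupby("name")["reviews.rating"]
    .mean()
    .sort_values(ascending=False)
)

ax = avg_rating.plot(kind="bar", title="Average rating per product")
ax.set_ylabel("Mean rating")
plt.tight_layout()
```

Sorting before plotting is what puts the well-rated products first and the lower-rated ones last, as described above.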
The Python package textstat calculates statistics from text, such as readability scores and estimated reading time. We can use it to check whether reading time differs between reviews upvoted as helpful and reviews that were not.
In [36]:
In [37]:
Out[37]:
Reading Time of upvoted reviews is 3.6968225190839696
Reading Time of not upvoted reviews is 1.8005997496301354
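Reviews upvoted as helpful take roughly twice as long to read on average as reviews that were not upvoted. This suggests that longer, more detailed reviews are the ones readers found helpful when deciding about a product.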
Steps to create a word cloud of the reviews:
1. Tokenize the text.
2. Remove stopwords.
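Tokenizing and removing stopwords produces the word frequencies that a word cloud is drawn from. Below is a minimal standard-library sketch. The small stopword set is hypothetical; NLTK's stopwords corpus and `wordcloud.STOPWORDS` offer fuller lists. The two reviews are made up.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword set
STOPWORDS = {"the", "a", "and", "is", "it", "to", "for", "my", "i", "this", "of"}

reviews = [
    "Great tablet for the price and easy to use",
    "Easy setup, great sound, and the screen is great",
]

def tokenize(text):
    # Lowercase, then keep runs of letters as tokens (punctuation is dropped)
    return re.findall(r"[a-z]+", text.lower())

# Step 1: tokenize; step 2: drop stopwords
tokens = [tok for review in reviews for tok in tokenize(review) if tok not in STOPWORDS]
freqs = Counter(tokens)
print(freqs.most_common(3))
```

If the `wordcloud` package is installed, `WordCloud().generate_from_frequencies(freqs)` renders these counts as an image.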
In [ ]: