Path: blob/master/ML Clustering Analysis/Day 2 EDA on Women purchasing pattern.ipynb
Understanding Data
Most ratings fall between 3 and 5, which suggests that product quality is good
Most of the items are liked by buyers
Most of the women recommended the products
A good number of buyers liked the product
There may be a discount for recommending a product
Most of the good rated products are recommended
Women aged 30 to 50 are the biggest buyers
This is probably because people of this age tend to have more money than teens or senior citizens.
Most of the products have received good or average ratings
Lowest for 84, 90, 91 & 94
Highest for 90 but lowest for 84 and 94
The recommendation count is higher across all age bands
We can see here that buying is directly proportional to recommendations
Young women are buying more, and adult women give better ratings than youngsters.
Word Cloud
Checking Null Values
Dropping All Null Values
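The two steps above (checking for nulls, then dropping them) can be sketched with pandas as follows. This is a minimal illustration, not the notebook's original cell: the toy data frame and its column names (`Age`, `Title`, `Review Text`, `Rating`) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the notebook's dataset.
df = pd.DataFrame({
    "Age": [34, 45, 29, 51],
    "Title": ["Love it", None, "Too small", "Great dress"],
    "Review Text": ["Fits well", "Nice fabric", None, "Very comfortable"],
    "Rating": [5, 4, 2, 5],
})

# Checking null values: count the missing entries in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Dropping all null values: remove every row with at least one missing entry.
clean_df = df.dropna().reset_index(drop=True)
print(clean_df.shape)
```

Dropping rows is the simplest option, but it discards every review that lacks a title, so it is worth comparing the row count before and after.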
Word Cloud For Review Text
Word Cloud For Title
Dresses, Knits and Blouses are the most sold items
Histograms
Boxplots
Distribution Plots
Individual Boxplots
Correlation Heatmap
Recommendation and rating show a good positive correlation, i.e., they tend to increase together.
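As a rough sketch of how such a correlation can be checked numerically, the snippet below computes the Pearson coefficient between a rating column and a recommendation flag. The values and the column names `Rating` and `Recommended` are made up for illustration and are not taken from the notebook's data.

```python
import pandas as pd

# Hypothetical toy data: a 1-5 rating and a 0/1 recommendation flag.
df = pd.DataFrame({
    "Rating": [5, 4, 2, 1, 5, 3],
    "Recommended": [1, 1, 0, 0, 1, 1],
})

# Pearson correlation matrix; this is what a correlation heatmap visualises.
corr_matrix = df.corr()
r = corr_matrix.loc["Rating", "Recommended"]
print(corr_matrix)
```

A coefficient close to 1 means the two columns tend to rise together; it does not by itself show strict proportionality.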
Clustering
I am creating a new data frame with all relevant columns
Convert categorical data to numerical using Label Encoder
Label Encoding Categorical Columns
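A minimal sketch of label encoding with scikit-learn's `LabelEncoder` is shown below. The column name `Class Name` and the toy values are assumptions for the example; the notebook's actual columns may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column; the name and values are illustrative only.
df = pd.DataFrame({"Class Name": ["Dresses", "Knits", "Blouses", "Dresses"]})

# LabelEncoder maps each distinct category to an integer,
# assigned in sorted order of the category labels.
encoder = LabelEncoder()
df["Class Name Encoded"] = encoder.fit_transform(df["Class Name"])

print(list(encoder.classes_))
print(df)
```

Note that the integer codes imply an ordering that the categories do not really have, which matters for distance-based methods such as K-Means; one-hot encoding is a common alternative.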
Scaling Data
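The notebook does not say which scaler it used, so the sketch below assumes `StandardScaler`, which rescales each feature to zero mean and unit variance. The feature values (an age column and a rating column) are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: column 0 is age, column 1 is rating.
X = np.array([[25, 3], [35, 4], [45, 5], [55, 4]], dtype=float)

# Standardise each column to mean 0 and standard deviation 1 so that
# large-valued features (age) do not dominate the distance calculation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled)
```

Scaling matters for K-Means because the algorithm relies on Euclidean distance, which is sensitive to the units of each feature.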
K-Means Algorithm
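As an illustration only, the snippet below fits scikit-learn's `KMeans` on a tiny synthetic dataset with two obvious groups. The data, `n_clusters=2` and `random_state=42` are assumptions chosen for the example, not the settings used in the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, already-scaled points forming two well-separated groups.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# n_init=10 runs the algorithm from 10 random starts and keeps the best result;
# random_state makes the run reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)
print(kmeans.cluster_centers_)
```

On real data the number of clusters is usually not known in advance, so it is common to fit several values of `n_clusters` and compare them with a quality metric.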
Silhouette Coefficient
Silhouette analysis is one way to evaluate the quality of a clustering, and it can be applied to clustering algorithms other than K-Means as well. The silhouette coefficient ranges between −1 and 1, where a higher value indicates more coherent, better-separated clusters.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of.
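To make the formula concrete, here is a small pure-Python sketch that computes (b - a) / max(a, b) for every sample using Euclidean distance. The function name `silhouette_samples_sketch` and the toy points are hypothetical; in practice scikit-learn's `silhouette_score` is the usual choice.

```python
import math

def silhouette_samples_sketch(points, labels):
    """Return (b - a) / max(a, b) for each sample, using Euclidean distance."""
    # Group sample indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a sample alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Toy example: two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_samples_sketch(points, labels)
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

The overall silhouette score of a clustering is the mean of these per-sample values, so well-separated clusters like the toy ones above give a mean close to 1.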