Path: blob/master/ML Clustering Analysis/Lab 3 Women purchasing pattern using K-Means.ipynb
Understanding Data
Most ratings fall between 3 and 5, which suggests that product quality is good
Most of the women recommended the products
A good rating generally means the product will be recommended
Women aged 35 to 45 are the biggest buyers
This is possibly because people of this age tend to have more money than teens or senior citizens.
Average Rating by each age group
Lowest for 85, 90, 91 & 94
Highest for 90 but lowest for 89 and 18
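The per-age averages above could be computed along the following lines. This is a minimal sketch on made-up rows; the column names "Age" and "Rating" are assumptions about the dataset, not taken from the notebook's code.

```python
import pandas as pd

# Made-up rows for illustration; "Age" and "Rating" are assumed column names.
df = pd.DataFrame({
    "Age":    [18, 18, 35, 35, 35, 89, 90],
    "Rating": [2, 3, 5, 4, 5, 1, 5],
})

# Mean rating and number of reviews for each age.
rating_by_age = df.groupby("Age")["Rating"].agg(["mean", "count"])

# Sort so the lowest-rated ages come first.
print(rating_by_age.sort_values("mean"))
```

Keeping the review count next to the mean helps when reading the extremes: an age with only one or two reviews can land at the very top or bottom purely by chance.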
We can see here that buying is directly proportional to recommendations
Most of the buyers give good ratings
Word Cloud
Checking Null Values
Dropping All Null Values
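A minimal sketch of the null check and drop, using a tiny made-up frame. The column names ("Title", "Review Text", "Rating") are assumptions, and dropping every row with any null is only one possible policy.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; column names are assumed.
df = pd.DataFrame({
    "Title": ["Love it", None, "Too small"],
    "Review Text": ["Fits well", "Nice fabric", np.nan],
    "Rating": [5, 4, 2],
})

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Drop every row that has at least one missing value.
clean_df = df.dropna().reset_index(drop=True)
print(clean_df.shape)
```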
Word Cloud For Review Text
Word Cloud For Title
Dresses, Knits and Blouses are the most sold items
Histograms
Boxplots
Distribution Plots
Individual Boxplots
Correlation Heatmap
Recommendation and rating show a good positive correlation, i.e., they rise and fall together.
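The matrix behind such a heatmap can be sketched as follows. The rows are made up and the column names ("Rating", "Recommended IND", "Age") are assumptions; the resulting table is what a plotting library would then colour.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed.
df = pd.DataFrame({
    "Rating": [5, 4, 1, 2, 5, 3],
    "Recommended IND": [1, 1, 0, 0, 1, 1],
    "Age": [34, 50, 41, 29, 60, 45],
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()
print(corr.round(2))
```

A coefficient near 1 between rating and recommendation says the two move together; it does not by itself say that one is a fixed multiple of the other.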
Clustering
Label Encoding Categorical Columns
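One way to do this step is scikit-learn's LabelEncoder, sketched below on made-up rows. The column names ("Department Name", "Class Name") and the choice of encoder are assumptions; the notebook's own code is not shown here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration; column names are assumed.
df = pd.DataFrame({
    "Department Name": ["Dresses", "Tops", "Bottoms", "Tops"],
    "Class Name": ["Dresses", "Knits", "Pants", "Blouses"],
})

# Fit one encoder per column and keep it so codes can be mapped back later.
encoders = {}
for col in ["Department Name", "Class Name"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

print(df)
# Recover the original class names from their integer codes.
print(encoders["Class Name"].inverse_transform([0, 1, 2, 3]))
```

Note that the integer codes follow alphabetical order, so K-Means will treat that arbitrary order as a distance between categories; this is worth keeping in mind when reading the clusters.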
Scaling Data
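A minimal sketch of scaling, assuming standardisation with scikit-learn's StandardScaler (the notebook may have used a different scaler). The rows are made up, and the frame names new_df and scaled_df follow the names used later in this notebook.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration; column names are assumed.
new_df = pd.DataFrame({
    "Age": [25, 35, 45, 55],
    "Rating": [5, 4, 2, 5],
    "Positive Feedback Count": [0, 2, 10, 1],
})

# Standardise each column to zero mean and unit variance.
scaler = StandardScaler()
scaled_df = pd.DataFrame(
    scaler.fit_transform(new_df),
    columns=new_df.columns,
    index=new_df.index,
)
print(scaled_df.round(2))
```

Scaling matters for K-Means because it relies on Euclidean distance: without it, a wide-ranging column such as age would dominate a 1 to 5 rating.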
K-Means Algorithm
Elbow Method for Best K value
I am a little unsure whether the elbow is at K = 3 or K = 4
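The elbow method can be sketched as below. The data is synthetic (three made-up blobs), so the elbow here is deliberately clear at K = 3; on the real review data the curve is flatter, which is why the choice between 3 and 4 is hard to call by eye.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs (purely illustrative).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 6], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-Means for a range of K and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

The "elbow" is the K after which the inertia stops dropping sharply. Plotting inertias against K makes this easier to see than the printed numbers.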
Silhouette Coefficient
Silhouette analysis is one of the metrics used to evaluate the quality of a clustering, and it can be applied to other clustering algorithms as well. The silhouette coefficient ranges between −1 and 1, where a higher silhouette coefficient refers to a model with more coherent clusters.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of.
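To make the formula concrete, the sketch below computes (b - a) / max(a, b) by hand for a few made-up points and compares the result with scikit-learn's silhouette_samples. The points, labels and the helper name silhouette_of are all hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Made-up points and cluster labels for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1, 2])


def silhouette_of(i):
    """(b - a) / max(a, b) for sample i, using Euclidean distance."""
    dists = np.linalg.norm(X - X[i], axis=1)
    same = labels == labels[i]
    same[i] = False  # a sample is not compared with itself
    if not same.any():
        return 0.0  # scikit-learn scores a single-sample cluster as 0
    a = dists[same].mean()  # mean intra-cluster distance
    # Mean distance to each other cluster; b is the smallest of these.
    b = min(dists[labels == c].mean() for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)


manual = np.array([silhouette_of(i) for i in range(len(X))])
library = silhouette_samples(X, labels)
print(np.round(manual, 3))
print(np.allclose(manual, library))
```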
Silhouette Score Plot
For K = 5 the silhouette score is higher than for K = 3 or 4, so the optimum number of clusters should be 5
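The comparison across K values can be sketched as follows. The data is synthetic (four made-up blobs), so the best K in this example is not the K = 5 found on the review data; the point is only the procedure of scoring each K and taking the highest.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated blobs (purely illustrative).
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

# Mean silhouette score for each candidate K (needs at least 2 clusters).
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("best K:", best_k)
```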
KMeans for K = 5
Adding Labels to new_df and scaled_df
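A minimal sketch of fitting K-Means with K = 5 and attaching the labels to both frames. The rows are randomly generated, the column names and the "Cluster" label column are assumptions, and StandardScaler stands in for whatever scaling the notebook applied.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Randomly generated rows for illustration; column names are assumed.
rng = np.random.default_rng(0)
new_df = pd.DataFrame({
    "Age": rng.integers(18, 90, size=200),
    "Rating": rng.integers(1, 6, size=200),
    "Recommended IND": rng.integers(0, 2, size=200),
})
scaled_df = pd.DataFrame(
    StandardScaler().fit_transform(new_df), columns=new_df.columns
)

# Fit on the scaled features only, before any label column is added.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled_df)

# Attach the same labels to the unscaled and the scaled frame.
new_df["Cluster"] = labels
scaled_df["Cluster"] = labels
print(new_df["Cluster"].value_counts().sort_index())
```

Keeping the labels on the unscaled frame makes the later profiling readable, since means are then in original units such as years and rating points.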
Cluster Profiling
Cluster means
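Cluster means of this kind can be produced with a groupby on the label column, as in this sketch. The rows are made up, and the column names (including "Cluster" and the added "Size" column) are assumptions.

```python
import pandas as pd

# Made-up labelled rows for illustration; column names are assumed.
new_df = pd.DataFrame({
    "Age": [40, 44, 50, 52, 54, 38],
    "Rating": [5, 4, 2, 1, 3, 5],
    "Recommended IND": [1, 1, 0, 0, 1, 1],
    "Positive Feedback Count": [1, 3, 8, 10, 6, 0],
    "Cluster": [0, 0, 1, 1, 1, 2],
})

# Mean of every feature within each cluster.
profile = new_df.groupby("Cluster").mean()

# Cluster sizes, so that small clusters are not over-interpreted.
profile["Size"] = new_df["Cluster"].value_counts().sort_index()
print(profile.round(2))
```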
Clusters 0, 1 & 2 give the best ratings and also recommend the products
Cluster 4 consists of women giving the lowest ratings and fewest recommendations, but its feedback count is low
Cluster 3 consists of women with satisfactory ratings and recommendations, but its feedback count is high
Women in Cluster 2 tend to buy Dresses, Pants, Skirts, Jeans and Shorts
Women in Cluster 1 are more interested in Blouses, Knits, Sweaters, Fine Gauge and Jackets.
Women in Cluster 4 are more attracted to Pants, Lounge, Sweaters, Skirts, Swim, Legwear and Layering.
Cluster 3 is smaller in number and buys mostly Dresses, Pants, Blouses and Knits.
Women in Cluster 0 show an average pattern, like Cluster 3.
Women in Cluster 3 give ratings of 1, 2 & 3 out of 5, which is the lowest of all the cluster groups
The rest of the clusters give a good average rating.
Cluster 0
Age between 35 & 50 is majority
Giving highest Ratings, Good Recommendations
Buying more Sweaters, Fine Gauge, Intimates, Jackets, Jeans etc.
Cluster 1
Age 35 - 55
Good Rating + Good Recommendations
Buying mostly Blouses, Casual bottoms, Chemises, Dresses, Fine gauge, Intimates, Jackets & Jeans.
Cluster 2
Age 35 - 50 majority
Good Rating + Good Recommendations
Buying almost the same products as cluster 0 but more satisfied.
Cluster 3
Majority aged 47 - 57
Lowest ratings + bad Recommendations
Similar buying patterns to clusters 0, 1 and 2 but more skewed to the left side
Cluster 4
Age between 35 & 50 is majority
Good Rating + Good Recommendations
Buying Lounge, Outerwear, Pants, Shorts, Skirts, Sleep, Sweaters, Swim & Trend clothes
Conclusions
Women between 35 and 55 years of age are the biggest buyers and also give good reviews and ratings.
We should target this age group to increase sales.
Women aged 35 to 55 probably have more money and more purchasing power than younger and older women.