
K-Means Clustering with Seeds Data
Authors: Joseph Nelson (DC), Haley Boyan (DC), Sam Stack (DC)
1. Import the data
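A minimal loading sketch; the filename seeds.csv is an assumption, so point it at the actual seeds data file in the repo:

```python
import pandas as pd

# Assumed filename/path -- replace with the actual seeds data file.
seeds = pd.read_csv('seeds.csv')
seeds.head()
```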
2. Do some EDA of relationships between features.
Remember, clustering is an unsupervised learning method, so known classes will not be available. In this situation we can see that perimeter vs. groove_length is a good visualization for viewing the true classes, and we can use it later to compare the clustering results against the actual values.
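One way to look at that relationship, assuming the dataframe from step 1 and columns named perimeter, groove_length, and species (the exact column names may differ in the actual file):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Scatter of perimeter vs. groove_length, colored by the known species
# (column names are assumptions based on the description above).
sns.scatterplot(data=seeds, x='perimeter', y='groove_length', hue='species')
plt.show()

# A full pairplot is also useful for spotting separable feature pairs.
sns.pairplot(seeds, hue='species')
```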
3. Prepare the data for clustering
Remove the species column. We will see if the clusters from K-Means end up matching the actual species.
Put the features on the same scale.
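A sketch of the preparation step, assuming the label column is named species; StandardScaler is one reasonable scaling choice here:

```python
from sklearn.preprocessing import StandardScaler

# Drop the known labels so clustering is genuinely unsupervised.
X = seeds.drop(columns='species')
y_true = seeds['species']

# Put all features on the same scale.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```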
4. Clustering with K-Means
Cluster the data into our target groups.
We know that there are 3 actual classes. However, in a real situation where we used clustering we would have no idea. Let's initially try the default K for KMeans, which is 8.
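A minimal fit with scikit-learn's default number of clusters, using the scaled features from the previous step:

```python
from sklearn.cluster import KMeans

# Default n_clusters is 8; random_state is set only for reproducibility.
km8 = KMeans(n_clusters=8, random_state=42)
km8.fit(X_scaled)
```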
5. Get the labels and centroids for our first clustering model.
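The fitted model exposes both directly:

```python
# Cluster assignments for each row and the centroid coordinates
# (in the scaled feature space, since the model was fit on X_scaled).
labels_8 = km8.labels_
centroids_8 = km8.cluster_centers_
```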
6. Compute the silhouette score and visually examine the results of the 8 clusters.
(pairplot with hue)
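One way to do both, assuming the objects defined in the sketches above:

```python
from sklearn.metrics import silhouette_score
import seaborn as sns

# Silhouette score is computed on the same data the model was fit on.
print(silhouette_score(X_scaled, labels_8))

# Attach the cluster labels to the (unscaled) features for plotting.
plot_df = X.copy()
plot_df['cluster'] = labels_8
sns.pairplot(plot_df, hue='cluster')
```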
7. Repeat steps #4 and #6 with two selected or random K values and compare the results to the k=8 model.
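A compact way to compare a few K values, assuming X_scaled from step 3 (the specific K values here are arbitrary choices):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Fit one model per K and print its silhouette score for comparison.
for k in (3, 5, 8):
    km = KMeans(n_clusters=k, random_state=42)
    labels = km.fit_predict(X_scaled)
    print(k, silhouette_score(X_scaled, labels))
```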
8. Build a function to find the optimal number of clusters using silhouette score as the criterion.
The function should accept a range of K values and a dataframe as arguments.
It should return the optimal K value, the associated silhouette score, and the scaling method used.
Your function should also consider scaled versions of the data:
normalize, StandardScaler, MinMaxScaler
Once you have found the optimal K and version of the data, visualize the clusters.
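A sketch of such a function, assuming the label column is named species and using the three scalings listed above; the name find_best_k and the returned tuple layout are illustrative, not required:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import normalize, StandardScaler, MinMaxScaler

def find_best_k(df, k_range, label_col='species'):
    """Try every K and scaling; return (best_k, best_score, best_scaling, best_labels)."""
    X = df.drop(columns=label_col)
    scalings = {
        'normalize': normalize(X),
        'standard': StandardScaler().fit_transform(X),
        'minmax': MinMaxScaler().fit_transform(X),
    }
    best = (None, -1.0, None, None)
    for name, X_s in scalings.items():
        for k in k_range:
            labels = KMeans(n_clusters=k, random_state=42).fit_predict(X_s)
            score = silhouette_score(X_s, labels)
            if score > best[1]:
                best = (k, score, name, labels)
    return best

# Silhouette requires at least 2 clusters, so start the range at 2.
best_k, best_score, best_scaling, best_labels = find_best_k(seeds, range(2, 11))
```

The returned labels can then be attached to the features and visualized with a pairplot, as in step 6.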