Clustering with Sklearn
In this notebook we'll practice clustering algorithms with Scikit-Learn.
Data sets
We'll use the following datasets:
Some sample data
Old Faithful eruption data: eruption times and wait times between eruptions
There are many clustering data sets you can use for practice!
K-Means with sklearn
Let's try it with k=4 this time.
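Here is a minimal sketch of fitting k-means with k=4. The notebook's own sample data is not shown here, so this assumes a stand-in generated with `make_blobs`; the blob parameters and the `random_state` values are illustrative choices, not the original ones.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic stand-in for the sample data (four 2-D blobs)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# Fit k-means with k=4 and get one cluster label per point
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# One centroid per cluster, each with two coordinates
print(kmeans.cluster_centers_.shape)
# Number of points assigned to each cluster
print(np.bincount(labels))
```

To look at the result, scatter-plot the two columns of `X` colored by `labels` and overlay `kmeans.cluster_centers_`.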
Let's try the circular data.
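A sketch of this step, assuming the circular data looks like two concentric rings from `make_circles` (the `factor` and `noise` values are illustrative). Because we generated the rings ourselves, we can score the k-means labels against the true ring membership with the adjusted Rand index, where 1 means perfect agreement and values near 0 mean roughly chance-level agreement.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Assumption: two concentric rings as a stand-in for the circular data
X, y_true = make_circles(n_samples=400, factor=0.4, noise=0.04, random_state=0)

# k-means partitions the plane around two centroids, so it cannot
# separate an inner ring from the ring that surrounds it
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

ari = adjusted_rand_score(y_true, labels)
print("Adjusted Rand index for k-means:", ari)
```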
Ouch! Not so great on this dataset. Now let's try some real data.
DBSCAN
Much better than k-means on this dataset! Let's try to cook up something that DBSCAN doesn't work as well on.
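To see why DBSCAN can beat k-means, here is a sketch on ring-shaped data. It assumes two concentric rings from `make_circles`, and the `eps` and `min_samples` values are hand-picked for this synthetic data rather than taken from the notebook. DBSCAN grows clusters through chains of nearby points, so each ring can become its own cluster.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Assumption: two concentric rings as a stand-in for the circular data
X, y_true = make_circles(n_samples=400, factor=0.4, noise=0.04, random_state=0)

# eps is smaller than the gap between the rings but larger than the
# spacing between neighboring points on the same ring
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # a label of -1 marks a point treated as noise

n_clusters = len(set(labels.tolist()) - {-1})
ari = adjusted_rand_score(y_true, labels)
print("Clusters found:", n_clusters)
print("Adjusted Rand index for DBSCAN:", ari)
```

DBSCAN is sensitive to `eps`: too small and the rings break into fragments or noise, too large and the two rings merge into one cluster.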
Exercise: DBSCAN
For the Iris dataset, fit and plot DBSCAN models to:
sepal_length and petal_length
sepal_width and petal_width
Bonus: Compare your classifications to the known species. How well do the labels match up?
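One possible approach to the first feature pair and the bonus, sketched under a few assumptions: the data comes from scikit-learn's `load_iris` (whose columns are ordered sepal length, sepal width, petal length, petal width), the features are standardized first, and `eps` and `min_samples` are untuned starting guesses. A cross-tabulation shows how the DBSCAN clusters line up with the known species.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Columns 0 and 2 are sepal length and petal length
X = iris.data[:, [0, 2]]
X_scaled = StandardScaler().fit_transform(X)

# Assumption: eps and min_samples are illustrative, not tuned values
db = DBSCAN(eps=0.5, min_samples=5).fit(X_scaled)
labels = db.labels_  # a label of -1 marks a point treated as noise

# Rows are the known species, columns are the DBSCAN cluster labels
species = pd.Series(iris.target_names[iris.target], name="species")
table = pd.crosstab(species, pd.Series(labels, name="cluster"))
print(table)

ari = adjusted_rand_score(iris.target, labels)
print("Adjusted Rand index:", ari)
```

For the second pair, swap the column indices to `[1, 3]` (sepal width and petal width) and adjust `eps` until the clusters look reasonable.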
Hierarchical Clustering
A higher score is better, so k-means was the better clustering algorithm on this data set.
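The score behind this comparison is not named here, so the sketch below assumes the silhouette score, a common choice where values closer to 1 indicate tighter, better-separated clusters. It also assumes the Iris measurements as the data set, three clusters, and Ward linkage for the hierarchical model. It shows how the two algorithms can be scored side by side and does not claim to reproduce the original numbers.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

# Assumption: Iris stands in for the data set being compared
X = load_iris().data

# Fit both algorithms with the same number of clusters
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Assumption: silhouette score as the comparison metric (higher is better)
km_score = silhouette_score(X, km_labels)
hc_score = silhouette_score(X, hc_labels)
print("k-means silhouette:     ", km_score)
print("hierarchical silhouette:", hc_score)
```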