Machine Learning with PyTorch and Scikit-Learn -- Code Examples
Package version checks
Add the folder to the path so that we can import from the check_packages.py script:
Check recommended package versions:
Python Machine Learning - Code Examples
Chapter 10 - Working with Unlabeled Data – Clustering Analysis
Note that the optional watermark extension is a small IPython notebook plugin that I developed to make the code reproducible. You can just skip the following line(s).
The use of watermark is optional. You can install this Jupyter extension via

    conda install watermark -c conda-forge

or

    pip install watermark
For more information, please see: https://github.com/rasbt/watermark.
Overview
Grouping objects by similarity using k-means
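This minimal NumPy sketch of Lloyd's algorithm makes the two alternating steps of k-means concrete: assign each sample to its nearest centroid, then move each centroid to the mean of its samples. The helper name kmeans_lloyd, the random-sample initialization, the tolerance, and the synthetic three-group data are illustrative assumptions. This is not the book's code.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, tol=1e-4, seed=0):
    """Hypothetical helper: minimal k-means via Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct samples at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    sse_history = []
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance of every sample to every centroid
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        sse_history.append(sq_dist[np.arange(len(X)), labels].sum())
        # Update step: move each centroid to the mean of its assigned samples
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster became empty
                new_centroids[j] = members.mean(axis=0)
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:  # stop once centroids barely move
            break
    return labels, centroids, sse_history

# Illustrative data: three Gaussian groups around hand-picked centers
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
labels, centroids, sse = kmeans_lloyd(X, k=3)
print(centroids)
```

Neither step can increase the within-cluster SSE, so the recorded SSE values never go up. The final solution can still be a local optimum that depends on the initial centroids.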
K-means clustering using scikit-learn
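Here is a short scikit-learn sketch. It assumes a synthetic dataset from make_blobs (150 two-dimensional points around three centers). The parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2D data with three groups (illustrative parameters)
X, y_true = make_blobs(n_samples=150, n_features=2, centers=3,
                       cluster_std=0.5, shuffle=True, random_state=0)

km = KMeans(n_clusters=3,     # the number of clusters must be chosen up front
            init='random',    # classic k-means: random initial centroids
            n_init=10,        # 10 runs with different seeds; keep the lowest SSE
            max_iter=300,
            tol=1e-04,
            random_state=0)
y_km = km.fit_predict(X)

print(km.cluster_centers_)  # (3, 2) array of centroids
print(km.inertia_)          # within-cluster SSE of the best run
```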
A smarter way of placing the initial cluster centroids using k-means++
...
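The k-means++ seeding idea chooses the first centroid uniformly at random. Each further centroid is then sampled with probability proportional to its squared distance from the nearest centroid chosen so far. kmeans_plus_plus_init is a hypothetical helper written only for illustration. In scikit-learn, KMeans already uses init='k-means++' by default.

```python
import numpy as np

def kmeans_plus_plus_init(X, k, seed=0):
    """Hypothetical helper: pick k initial centroids with k-means++ seeding."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # 1) First centroid: a uniformly random sample
    idx = [rng.integers(n)]
    for _ in range(1, k):
        # 2) Squared distance of each sample to its nearest already-chosen centroid
        diff = X[:, None, :] - X[idx][None, :, :]
        d2 = (diff ** 2).sum(axis=2).min(axis=1)
        # 3) Sample the next centroid with probability proportional to d2
        #    (already-chosen samples have d2 == 0, so they are not picked again)
        idx.append(rng.choice(n, p=d2 / d2.sum()))
    return X[idx], np.array(idx)

# Illustrative data: three Gaussian groups around hand-picked centers
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
init_centroids, init_idx = kmeans_plus_plus_init(X, k=3)
print(init_centroids)
```

These seeds would then replace the random initial centroids of regular k-means. Spreading the seeds apart tends to give better and more consistent results than purely random initialization.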
Hard versus soft clustering
...
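Fuzzy C-means (FCM) is a common soft-clustering counterpart of k-means. Instead of a single label, each sample receives a membership weight for every cluster. The sketch below assumes the standard FCM update rules with fuzziness coefficient m = 2. The helper names and the tiny five-point dataset are illustrative.

```python
import numpy as np

def fcm_memberships(X, centroids, m=2.0, eps=1e-12):
    """Hypothetical helper: soft memberships w[i, j] as in fuzzy C-means (m > 1)."""
    # Euclidean distances, shape (n_samples, n_clusters); eps avoids division by zero
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    # ratio[i, j, c] = (||x_i - mu_j|| / ||x_i - mu_c||) ** (2 / (m - 1))
    ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
    # w[i, j] = 1 / sum_c ratio[i, j, c]
    return 1.0 / ratio.sum(axis=2)

def fcm_centroids(X, W, m=2.0):
    """Hypothetical helper: centroid = mean of all samples weighted by w ** m."""
    Wm = W ** m
    return (Wm.T @ X) / Wm.sum(axis=0)[:, None]

X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8], [2.5, 2.5]])
centroids = X[[0, 2]].copy()  # start from two of the samples
for _ in range(20):
    W = fcm_memberships(X, centroids)
    centroids = fcm_centroids(X, W)
print(W.round(2))  # one row per sample; each row sums to 1
```

The point between the two groups ends up with substantial membership in both clusters. A hard k-means assignment would force it into exactly one.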
Using the elbow method to find the optimal number of clusters
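As a sketch of the elbow method, we can record KMeans.inertia_ (the within-cluster SSE, or distortion) for a range of k values and look for the point where the curve flattens. The synthetic make_blobs data and the k range of 1 to 10 are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, n_features=2, centers=3,
                  cluster_std=0.5, shuffle=True, random_state=0)

distortions = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init='k-means++', n_init=10,
                max_iter=300, random_state=0)
    km.fit(X)
    distortions.append(km.inertia_)  # within-cluster SSE for this k

# The "elbow" is where adding clusters stops reducing the SSE sharply;
# plotting distortions against k makes it easy to spot.
for k, d in zip(range(1, 11), distortions):
    print(f'k={k:2d}  SSE={d:8.2f}')
```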
Quantifying the quality of clustering via silhouette plots
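The silhouette coefficient of a sample is s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance to the other members of its own cluster (cohesion), and b(i) is the mean distance to the members of the nearest other cluster (separation). This sketch computes it with scikit-learn's silhouette_samples and then recomputes one value by hand. The dataset parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=150, n_features=2, centers=3,
                  cluster_std=0.5, shuffle=True, random_state=0)
y_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Per-sample silhouette coefficients (these are what a silhouette plot shows)
sil = silhouette_samples(X, y_km, metric='euclidean')
print(f'mean silhouette: {sil.mean():.3f}')

# Recompute s(0) by hand to make the definition concrete
d0 = np.linalg.norm(X - X[0], axis=1)
same = (y_km == y_km[0])
a0 = d0[same].sum() / (same.sum() - 1)   # cohesion: mean distance to the other members
b0 = min(d0[y_km == c].mean()            # separation: closest other cluster
         for c in np.unique(y_km) if c != y_km[0])
s0 = (b0 - a0) / max(a0, b0)
print(f's(0) by hand: {s0:.3f}, from sklearn: {sil[0]:.3f}')
```

A silhouette plot sorts these values within each cluster and draws them as horizontal bars. Values close to 1 indicate compact, well-separated clusters.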
Comparison to a "bad" clustering:
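The following sketch quantifies a "bad" clustering by fitting k = 2 to data that contains three well-separated groups and comparing the average silhouette against k = 3. The hand-picked centers and noise level are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated groups at illustrative centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

scores = {}
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# With k=2, two true groups are merged into one cluster. The cohesion a(i) of
# their members worsens, so the average silhouette drops relative to k=3.
print(scores)
```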
Organizing clusters as a hierarchical tree
Grouping clusters in bottom-up fashion
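To illustrate the bottom-up idea, here is a deliberately naive complete-linkage implementation. Every sample starts as its own cluster. In each step, we merge the two clusters whose most dissimilar members are closest. The helper name complete_linkage_naive and the random 5 x 3 data are illustrative. SciPy's linkage is used only to cross-check the merge distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

def complete_linkage_naive(X):
    """Hypothetical helper: complete-linkage agglomeration, O(n^3) for clarity."""
    D = squareform(pdist(X, metric='euclidean'))
    clusters = [[i] for i in range(len(X))]  # every sample starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Complete linkage: cluster distance = distance of the two MOST dissimilar members
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = D[np.ix_(clusters[i], clusters[j])].max()
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        # Merge the closest pair; since j > i, deleting index j does not shift index i
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return np.array(merges)

rng = np.random.default_rng(123)
X = rng.random((5, 3)) * 10
heights = complete_linkage_naive(X)
Z = linkage(X, method='complete', metric='euclidean')
print(heights)
print(Z[:, 2])  # SciPy's merge distances, in merge order
```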
Performing hierarchical clustering on a distance matrix
We can either pass a condensed distance matrix (upper triangular) from the pdist function, or we can pass the "original" data array and define the metric='euclidean' argument in linkage. However, we should not pass the squareform distance matrix. linkage would treat each of its rows as an observation vector and compute distances between those rows, which yields different distance values, although the overall clustering could be the same.
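This sketch checks the three options on illustrative random data. The two correct inputs give the same linkage matrix, while the squareform input produces different merge distances.

```python
import warnings
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(123)
X = rng.random((5, 3)) * 10

# Correct option 1: condensed (upper-triangular) distance vector from pdist
row_clusters_1 = linkage(pdist(X, metric='euclidean'), method='complete')
# Correct option 2: the raw observations plus metric='euclidean'
row_clusters_2 = linkage(X, method='complete', metric='euclidean')

# Incorrect: a square distance matrix is treated as 5 observations with 5 features,
# so linkage computes distances between rows of the distance matrix
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # SciPy may warn that this looks like an uncondensed distance matrix
    row_clusters_bad = linkage(squareform(pdist(X)), method='complete')

print(np.allclose(row_clusters_1, row_clusters_2))                # same linkage matrix
print(np.allclose(row_clusters_1[:, 2], row_clusters_bad[:, 2]))  # different merge distances
```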
Attaching dendrograms to a heat map
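The key step when attaching a dendrogram to a heat map is reordering the data rows to match the dendrogram's leaf order. SciPy's dendrogram(..., no_plot=True) returns that order in its 'leaves' entry without drawing anything. The sketch assumes the dendrogram is drawn with orientation='left' (leaves listed bottom to top) and the heat map is drawn top-down (for example with matplotlib's matshow). That is why the order is reversed. The DataFrame contents and ID_ labels are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(123)
df = pd.DataFrame(rng.random((5, 3)) * 10,
                  columns=['X', 'Y', 'Z'],
                  index=[f'ID_{i}' for i in range(5)])

row_clusters = linkage(df.values, method='complete', metric='euclidean')
# no_plot=True computes the dendrogram layout without drawing it;
# 'leaves' lists the original row indices in dendrogram order
row_dendr = dendrogram(row_clusters, no_plot=True)

# Reverse so the top heat-map row lines up with the top leaf of a left-oriented dendrogram
df_rowclust = df.iloc[row_dendr['leaves'][::-1]]
print(df_rowclust)
```

df_rowclust can then be passed to a heat-map function, placing similar rows next to each other.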
Applying agglomerative clustering via scikit-learn
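This sketch uses scikit-learn's AgglomerativeClustering with complete linkage and stops once three clusters remain. As a cross-check, it cuts a SciPy complete-linkage tree into three flat clusters with fcluster. The random data and the choice of three clusters are illustrative. The label IDs differ between the two libraries, so the adjusted Rand index is used to compare the partitions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(123)
X = rng.random((20, 3)) * 10

# scikit-learn: complete linkage, stop merging when 3 clusters remain
ac = AgglomerativeClustering(n_clusters=3, linkage='complete')
labels_sk = ac.fit_predict(X)

# SciPy: build the full tree, then cut it into (at most) 3 flat clusters
labels_scipy = fcluster(linkage(X, method='complete'), t=3, criterion='maxclust')

# SciPy labels start at 1, but the partitions themselves should agree
print(labels_sk)
print(adjusted_rand_score(labels_sk, labels_scipy))
```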
Locating regions of high density via DBSCAN
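DBSCAN labels a point as a core point if at least min_samples points, counting the point itself, lie within radius eps of it. A border point is not a core point but lies within eps of one. Every remaining point is noise. The sketch reproduces these rules with NearestNeighbors and compares them to scikit-learn's DBSCAN. The half-moon data and the values eps=0.2 and min_samples=5 are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
eps, min_samples = 0.2, 5

db = DBSCAN(eps=eps, min_samples=min_samples, metric='euclidean')
y_db = db.fit_predict(X)  # label -1 marks noise points

# Core points: at least min_samples neighbors (including the point itself) within eps
neighborhoods = NearestNeighbors(radius=eps).fit(X).radius_neighbors(X, return_distance=False)
n_neighbors = np.array([len(nb) for nb in neighborhoods])
core_mask = n_neighbors >= min_samples

# Border points: not core, but inside the eps-neighborhood of a core point; the rest is noise
is_border = np.array([not core_mask[i] and core_mask[nb].any()
                      for i, nb in enumerate(neighborhoods)])
is_noise = ~core_mask & ~is_border
print('core:', core_mask.sum(), 'border:', is_border.sum(), 'noise:', is_noise.sum())
```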
K-means and hierarchical clustering:
Density-based clustering:
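This comparison fits k-means, complete-linkage agglomerative clustering, and DBSCAN to the same half-moon data. Each result is scored against the known moon labels with the adjusted Rand index (ARI). The ground-truth labels exist only because the data is synthetic. The hyperparameters are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

models = {
    'k-means': KMeans(n_clusters=2, n_init=10, random_state=0),
    'agglomerative (complete)': AgglomerativeClustering(n_clusters=2, linkage='complete'),
    'DBSCAN': DBSCAN(eps=0.2, min_samples=5),
}
# ARI = 1 means a perfect match with the true moons; values near 0 are chance level
scores = {name: adjusted_rand_score(y, m.fit_predict(X)) for name, m in models.items()}
for name, s in scores.items():
    print(f'{name:26s} ARI={s:.2f}')
```

k-means and complete linkage both favor compact, roughly convex clusters, so they tend to cut across the crescents. DBSCAN can follow the dense, curved regions instead.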
Summary
...