Path: blob/master/ML/Notebook/KNN with case study.ipynb
3087 views
KNN algorithm
One of the simplest of all the supervised machine learning algorithms. It simply calculates the distance of a new data point to all other training data points.
K can be any integer. K=3 mean ( find the 3 nearest points)
The KNN algorithm starts by calculating the distance of point(Euclidean or Manhattan ) X from all the points.
Finally it assigns the data point to the class to which the majority of the K data points belong.
Note
The model for KNN is the entire training dataset. When a prediction is required for a unseen data instance, the kNN algorithm will search through the training dataset for the k-most similar instances. The prediction attribute of the most similar instances is summarized and returned as the prediction for the unseen instance.
As we can see the values has been encoded into 4 different numeric labels.
Identify the predictor variables and encode any string variables to equivalent integer codes
Insights
Only 6 samples were misclassified.Since this is a very simplistic data set with distinctly separable classes. But there you have it. That’s how to implement K-Nearest Neighbors with scikit-learn.
How to decide the value of n-neighbors
Choosing a large value of K will lead to greater amount of execution time & underfitting. Selecting the small value of K will lead to overfitting. There is no such guaranteed way to find the best value of K.