Classification and KNN with NHL data
Authors: Joseph Nelson (DC)
Below you will practice KNN classification on a dataset of NHL statistics.
You will predict the Rank of a team from predictor variables of your choice.
1. Load the NHL data
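The lab does not say where the NHL file lives, so the sketch below reads a small in-memory CSV instead. The helper name load_nhl, the SAMPLE_CSV text, and its column names (Team, PTS, Rank, GF, GA, SF, SA, PIM) are all hypothetical stand-ins; point pd.read_csv at your own copy of the data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the NHL CSV: the real file's location and
# exact columns are not given in the lab, so these names are assumptions.
SAMPLE_CSV = """Team,PTS,Rank,GF,GA,SF,SA,PIM
Team A,109,1,248,196,2605,2350,650
Team B,95,2,225,215,2480,2455,720
Team C,78,3,190,240,2300,2580,810
Team D,102,1,240,205,2550,2400,690
"""


def load_nhl(source) -> pd.DataFrame:
    """Hypothetical helper: read NHL statistics from a path, URL, or file-like object."""
    return pd.read_csv(source)


nhl = load_nhl(io.StringIO(SAMPLE_CSV))
print(nhl.shape)  # (4, 8)
print(nhl.head())
```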
2. Perform any required data cleaning, then do some exploratory data analysis (EDA).
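What counts as "required" cleaning depends on the actual file, so this is only a sketch of common first steps: tidy column names, drop duplicate rows, drop incomplete rows, then look at missing-value counts, dtypes, and summary statistics. The helper clean_nhl and the tiny raw table (with a stray space in " Team", a duplicate row, and a missing GF value) are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_nhl(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy column names, drop duplicate and incomplete rows."""
    out = df.copy()
    out.columns = out.columns.str.strip()  # " Team" -> "Team"
    out = out.drop_duplicates().dropna()
    return out.reset_index(drop=True)


# Hypothetical messy input; in the lab, pass the loaded NHL DataFrame instead.
raw = pd.DataFrame({
    " Team": ["A", "B", "B", "C"],
    "Rank": [1, 2, 2, 3],
    "GF": [248, 225, 225, np.nan],
})

nhl = clean_nhl(raw)
print(nhl.isnull().sum())  # per-column count of missing values
print(nhl.dtypes)          # check that numeric columns were parsed as numbers
print(nhl.describe())      # quick numeric summary for EDA
```

Dropping rows is the bluntest option; on the real data you may prefer to inspect why values are missing first.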
3. Set up the Rank variable as your target. How many classes are there?
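Counting classes comes down to counting distinct values in the target column. The nine-row Rank column below is made up for illustration; with the real data, use the cleaned NHL DataFrame.

```python
import pandas as pd

# Hypothetical mini-table; in the lab, use the cleaned NHL DataFrame instead.
nhl = pd.DataFrame({"Rank": [1, 2, 3, 1, 2, 1, 3, 2, 1]})

y = nhl["Rank"]
n_classes = y.nunique()
print(n_classes)         # 3
print(y.value_counts())  # how many teams fall in each class
```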
4. What is the baseline accuracy?
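A common baseline for classification is the accuracy of always predicting the most frequent class. The sketch computes it two ways on hypothetical labels: directly from the class proportions, and with scikit-learn's DummyClassifier using strategy="most_frequent", which ignores the features entirely.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier

y = pd.Series([1, 2, 3, 1, 2, 1, 3, 2, 1], name="Rank")  # hypothetical labels

# Baseline: always predict the most common class.
baseline = y.value_counts(normalize=True).max()
print(round(baseline, 3))  # 0.444

# The same number via scikit-learn's majority-class dummy model.
dummy = DummyClassifier(strategy="most_frequent")
X_dummy = np.zeros((len(y), 1))  # features are ignored by this strategy
dummy.fit(X_dummy, y)
print(dummy.score(X_dummy, y))
```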
5. Choose 4 features to be your predictor variables and set up your design matrix.
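The design matrix is just the chosen predictor columns, one row per team. The four feature names GF, GA, SF, and SA and the three-row table are hypothetical choices; the check that Rank is not among the predictors guards against accidentally feeding the target into the model.

```python
import pandas as pd

# Hypothetical mini-table; column names are assumptions, not the lab's data.
nhl = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Rank": [1, 2, 3],
    "GF": [248, 225, 190],
    "GA": [196, 215, 240],
    "SF": [2605, 2480, 2300],
    "SA": [2350, 2455, 2580],
})

# Four hypothetical predictor columns; swap in your own choices.
features = ["GF", "GA", "SF", "SA"]
assert "Rank" not in features, "the target must not be used as a predictor"

X = nhl[features]  # design matrix: one row per team, one column per feature
y = nhl["Rank"]
print(X.shape)  # (3, 4)
```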
6. Fit a KNeighborsClassifier with 1 neighbor using the target and predictors.
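Fitting a one-neighbor model in scikit-learn takes a single constructor argument, n_neighbors=1. Random numbers stand in for the four NHL features and the Rank labels here, so the sketch runs without the dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))      # hypothetical stand-in for the 4 features
y = rng.integers(1, 4, size=30)   # hypothetical Rank labels in {1, 2, 3}

knn1 = KNeighborsClassifier(n_neighbors=1)
knn1.fit(X, y)
print(knn1.predict(X[:5]))
```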
7. Evaluate the accuracy of your model. Is it better than baseline? Is it a legitimate measure of how well the model performs?
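One way to probe the legitimacy question is to score a one-neighbor model on the same rows it was fit on, using labels that are pure noise. Because each training point is its own nearest neighbor (at distance zero), the accuracy comes out perfect even though the features carry no information about the labels. The data here is synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.integers(1, 4, size=30)  # labels are pure noise here

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
train_acc = knn1.score(X, y)  # accuracy on the data it was fit on
baseline = pd.Series(y).value_counts(normalize=True).max()
print(train_acc, round(baseline, 3))
```

A perfect training score on noise is a sign that evaluating on the training rows says nothing about unseen teams.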
8. Create a 50-50 train-test split of your target and predictors. Refit the KNN on the training set and assess its accuracy on the test set.
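scikit-learn's train_test_split with test_size=0.5 produces the 50-50 split; the model is then fit on the training half and scored only on the held-out half. The synthetic labels (driven by the first two features) and the random_state value are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 1  # hypothetical labels driven by two features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
test_acc = knn1.score(X_test, y_test)  # accuracy on held-out rows only
print(len(X_train), len(X_test), round(test_acc, 3))
```

Fixing random_state makes the split reproducible. On a dataset as small as a single NHL season, passing stratify=y can also help keep the class proportions similar in both halves.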
9. Evaluate the test accuracy of a KNN where K equals the number of rows in the training data.
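When K equals the number of training rows, every training row votes for every test row, so with the default uniform weights the model predicts the same class everywhere: the most common class in the training set. Its test accuracy is then that class's share of the test set, which ties this step back to the baseline. The synthetic data below is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = rng.choice([1, 2, 3], size=40, p=[0.5, 0.3, 0.2])  # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

k_all = len(X_train)  # every training row is a neighbor of every test row
knn_all = KNeighborsClassifier(n_neighbors=k_all).fit(X_train, y_train)
preds = knn_all.predict(X_test)

# Most common training class (if two classes tie, both are valid "majorities").
values, counts = np.unique(y_train, return_counts=True)
majority = values[np.argmax(counts)]

print(np.unique(preds), majority)
print(knn_all.score(X_test, y_test), np.mean(y_test == majority))
```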
10. Fit the KNN at values of K from 1 to the number of rows in the training data. Store the test accuracy in a list. Plot the test accuracy vs. the number of neighbors.
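The sweep is a loop over K that refits on the training half and appends each test score to a list. The synthetic data and split settings are hypothetical; best_k is simply the K with the highest test accuracy on this one split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 1  # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

ks = list(range(1, len(X_train) + 1))  # K = 1 .. number of training rows
test_accs = []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    test_accs.append(knn.score(X_test, y_test))

best_k = ks[int(np.argmax(test_accs))]
print(best_k, max(test_accs))
```

To draw the requested plot, pass ks and test_accs to matplotlib's plt.plot; matplotlib is left out here to keep the sketch dependency-light. Because the curve comes from a single split, it can be noisy, which motivates the cross-validation in the next step.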
11. Fit a KNN across different values of K and plot the mean 5-fold cross-validated accuracy.
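cross_val_score with cv=5 fits and scores the model on five folds; for a classifier, an integer cv uses stratified folds and the default score is accuracy. Each training fold holds only about four fifths of the rows, so K must stay below that size; the cap of 25 on 40 synthetic rows is an assumption chosen to leave room.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 1  # hypothetical labels

# With 5 folds each training fold holds roughly 4/5 of the rows,
# so K is capped well below that.
ks = list(range(1, 26))
cv_means = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in ks
]
print(ks[int(np.argmax(cv_means))], round(max(cv_means), 3))
```

As in the previous step, plt.plot(ks, cv_means) from matplotlib gives the requested curve.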
12. Standardize the predictor matrix and cross-validate across the same values of K. Plot the mean cross-validated accuracy of the standardized model against that of the unstandardized model. Which is better? Why?
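KNeighborsClassifier uses Euclidean distance by default (metric="minkowski" with p=2), so a feature measured in large units can dominate the distance and drown out informative features on small scales. The sketch exaggerates this with a hypothetical setup: two informative unit-scale features plus one pure-noise feature on a scale of thousands, loosely mimicking raw shot counts next to rates. Putting StandardScaler inside a pipeline means the scaler is refit on each training fold, so no test-fold information leaks into the scaling.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100
signal = rng.normal(size=(n, 2))               # informative, unit scale
noise = rng.normal(scale=1000.0, size=(n, 1))  # uninformative, huge scale
X = np.hstack([signal, noise])
y = (signal.sum(axis=1) > 0).astype(int) + 1   # hypothetical Rank-like labels

ks = list(range(1, 26))
raw_means, std_means = [], []
for k in ks:
    raw = KNeighborsClassifier(n_neighbors=k)
    # Scaling inside the pipeline means each fold's scaler is fit on
    # that fold's training rows only.
    scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    raw_means.append(cross_val_score(raw, X, y, cv=5).mean())
    std_means.append(cross_val_score(scaled, X, y, cv=5).mean())

print(round(np.mean(raw_means), 3), round(np.mean(std_means), 3))
```

Plotting raw_means and std_means against ks on the same axes gives the requested comparison. How large the gap is on the real NHL features depends on how different their scales are.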