GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_07/code/NHL_classification_with_knn-lab.ipynb

Classification and KNN with NHL data

Authors: Joseph Nelson (DC)


Below you will practice KNN classification on a dataset of NHL statistics.

You will be predicting the Rank of a team from predictor variables of your choice.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# web location:
local_csv = '../assets/data/NHL_Data_GA.csv'

1. Load the NHL data

# A:
NHL = pd.read_csv(local_csv)

2. Perform any required data cleaning. Do some EDA.

# A:
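
As a rough sketch of what cleaning and EDA might look like, the example below works on a tiny made-up table rather than the NHL file. The column names (`stat_a`, `stat_b`) and the choice to drop rows with missing values are assumptions for illustration only; the real data may need different handling.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the NHL table; every column name here is hypothetical.
toy = pd.DataFrame({
    "Team": ["A", "B", "C", "D"],
    "Rank": [1, 2, 3, 1],
    "stat_a": [2.5, np.nan, 3.1, 2.8],      # contains one missing value
    "stat_b": ["10", "12", "9", "11"],      # numbers stored as strings
})

# Inspect column types and count missing values per column.
print(toy.dtypes)
missing = toy.isnull().sum()
print(missing)

# Convert a numeric column that was read in as text.
toy["stat_b"] = pd.to_numeric(toy["stat_b"])

# Drop rows with missing values (one of several reasonable choices).
clean = toy.dropna()

# Summary statistics for the numeric columns.
print(clean.describe())
```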

3. Set up the Rank variable as your target. How many classes are there?

# A:
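
One way to count the classes is with `nunique` and `value_counts`. The sketch below uses a short made-up series of rank labels in place of `NHL['Rank']`, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical rank labels standing in for the Rank column.
rank = pd.Series([1, 2, 3, 1, 2, 3, 3, 2, 3], name="Rank")

# Number of distinct classes in the target.
n_classes = rank.nunique()

# How many rows fall into each class, ordered by label.
class_counts = rank.value_counts().sort_index()

print(n_classes)
print(class_counts)
```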

4. What is the baseline accuracy?

# A:
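
Baseline accuracy is usually taken to be the accuracy of always predicting the most common class, which equals that class's share of the rows. A minimal sketch on made-up labels (not the NHL data):

```python
import pandas as pd

# Hypothetical rank labels standing in for the Rank column.
rank = pd.Series([1, 2, 3, 1, 2, 3, 3, 2, 3], name="Rank")

# Proportion of rows in each class; the largest proportion is the baseline.
class_shares = rank.value_counts(normalize=True)
baseline = class_shares.max()

print(class_shares)
print(baseline)
```

Here class 3 appears in 4 of 9 rows, so always guessing class 3 would be right 4/9 of the time.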

5. Choose 4 features to be your predictor variables and set up your design matrix.

# A:
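
Setting up the design matrix amounts to selecting the chosen columns into `X` and the target into `y`. The sketch below assumes a toy DataFrame with hypothetical feature names `f1` to `f6`; with the real data you would substitute four actual column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy table with six hypothetical numeric columns and a Rank target.
toy = pd.DataFrame(rng.normal(size=(10, 6)),
                   columns=[f"f{i}" for i in range(1, 7)])
toy["Rank"] = rng.integers(1, 4, size=10)   # labels 1, 2 or 3

# Pick four predictors (an arbitrary choice for this illustration).
feature_cols = ["f1", "f2", "f3", "f4"]

X = toy[feature_cols]   # design matrix: one row per team, one column per feature
y = toy["Rank"]         # target vector

print(X.shape, y.shape)
```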

6. Fit a KNeighborsClassifier with 1 neighbor using the target and predictors.

# A:

7. Evaluate the accuracy of your model.

  • Is it better than baseline?

  • Is it legitimate?

# A:
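
Steps 6 and 7 can be sketched together on synthetic data. The labels below are generated independently of the features, so there is nothing real to learn, yet a 1-nearest-neighbor model scored on its own training rows still reports perfect accuracy, because each row's nearest neighbor is itself. That is the reason the score is not a legitimate estimate. The data and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 4 continuous predictors, labels unrelated to them.
X = rng.normal(size=(60, 4))
y = rng.integers(1, 4, size=60)

# Fit KNN with a single neighbor on all of the data.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# Score on the same rows the model was fit on.
train_acc = knn.score(X, y)
print(train_acc)
```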

8. Create a 50-50 train-test-split of your target and predictors. Refit the KNN and assess the accuracy.

# A:
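
A 50-50 split can be made with `train_test_split(test_size=0.5)`. The sketch below uses synthetic data whose features shift with the class label; the `random_state` value is an arbitrary assumption so the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 classes, 4 predictors shifted by the class label.
y = np.repeat([1, 2, 3], 20)
X = rng.normal(size=(60, 4)) + y[:, None]

# Hold out half of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Refit on the training half only, then score on both halves.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)

print(train_acc, test_acc)
```

The test score is the one to compare against baseline, since those rows were never seen during fitting.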

9. Evaluate the test accuracy of a KNN where K == number of rows in the training data.

# A:
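
When K equals the number of training rows and the default uniform weights are used, every training row votes on every prediction, so the model returns the same class (the training majority) for all test rows. The sketch below checks this on synthetic, deliberately unbalanced data; the data and seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in with unbalanced classes.
y = np.array([1] * 30 + [2] * 20 + [3] * 10)
X = rng.normal(size=(60, 4)) + y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Use every training row as a neighbor.
knn = KNeighborsClassifier(n_neighbors=len(X_train)).fit(X_train, y_train)
pred = knn.predict(X_test)
test_acc = knn.score(X_test, y_test)

# All predictions collapse to a single class.
print(np.unique(pred))
print(test_acc)
```

The resulting test accuracy is just the share of test rows that belong to that one predicted class.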

10. Fit the KNN at values of K from 1 to the number of rows in the training data.

  • Store the test accuracy in a list.

  • Plot the test accuracy vs. the number of neighbors.

# A:
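
The loop described above might look like the following sketch, again on synthetic data with an arbitrary seed. It fits one model per value of K, stores each test accuracy in a list, and plots the list against K.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 classes, 4 predictors shifted by the class label.
y = np.repeat([1, 2, 3], 20)
X = rng.normal(size=(60, 4)) + y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# K runs from 1 up to the number of training rows.
k_values = list(range(1, len(X_train) + 1))
test_accs = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    test_accs.append(knn.score(X_test, y_test))

# Test accuracy as a function of the number of neighbors.
fig, ax = plt.subplots()
ax.plot(k_values, test_accs)
ax.set_xlabel("number of neighbors (K)")
ax.set_ylabel("test accuracy")
```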

11. Fit KNN across different values of K and plot the mean cross-validated accuracy with 5 folds.

# A:
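
A sketch of the cross-validated version using `cross_val_score` with `cv=5`, on synthetic data. One detail to watch: K cannot exceed the number of rows in a training fold, so the largest K tried here is capped at the size of the smallest training fold. The data and seed are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 balanced classes, 4 predictors shifted by the label.
y = np.repeat([1, 2, 3], 30)
X = rng.normal(size=(90, 4)) + y[:, None]

n_folds = 5
# Smallest training fold = all rows minus the largest held-out fold.
max_k = len(X) - int(np.ceil(len(X) / n_folds))

k_values = list(range(1, max_k + 1))
cv_means = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=n_folds)   # accuracy per fold
    cv_means.append(scores.mean())

# Mean cross-validated accuracy as a function of K.
fig, ax = plt.subplots()
ax.plot(k_values, cv_means)
ax.set_xlabel("number of neighbors (K)")
ax.set_ylabel("mean 5-fold CV accuracy")
```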

12. Standardize the predictor matrix and cross-validate across the different K.

  • Plot the standardized mean cross-validated accuracy against the unstandardized. Which is better?

  • Why?

# A:
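
KNN relies on distances, so a feature measured on a much larger scale can dominate the neighbor search regardless of how informative it is. The sketch below exaggerates this on purpose with synthetic data: one informative feature on a small scale and one pure-noise feature on a large scale. It also swaps in a `Pipeline` so the scaler is refit inside each fold, which is an assumption about approach; the lab wording could equally mean scaling the whole matrix once before cross-validating. Results on the NHL data may look quite different.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two classes; x1 separates them, x2 is large-scale noise.
y = np.repeat([0, 1], 100)
x1 = rng.normal(loc=6 * y, scale=1.0)
x2 = rng.normal(loc=0.0, scale=1000.0, size=200)
X = np.column_stack([x1, x2])

k_values = list(range(1, 51))
raw_means, scaled_means = [], []
for k in k_values:
    raw = KNeighborsClassifier(n_neighbors=k)
    scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    raw_means.append(cross_val_score(raw, X, y, cv=5).mean())
    scaled_means.append(cross_val_score(scaled, X, y, cv=5).mean())

# Compare the two curves on one plot.
fig, ax = plt.subplots()
ax.plot(k_values, raw_means, label="unstandardized")
ax.plot(k_values, scaled_means, label="standardized")
ax.set_xlabel("number of neighbors (K)")
ax.set_ylabel("mean 5-fold CV accuracy")
ax.legend()
```

In this constructed case the standardized curve sits above the unstandardized one, because scaling stops the noise feature from swamping the informative one in the distance calculation.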