GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_07/code/NHL_classification_with_knn-lab.ipynb

Kernel: Python [default]

Classification and KNN with NHL data

Authors: Joseph Nelson (DC)

Below you will practice KNN classification on a dataset of NHL statistics.

You will be predicting the Rank of a team from predictor variables of your choice.

In [2]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [8]:

# local path to the NHL data file:
local_csv = '../assets/data/NHL_Data_GA.csv'

1. Load the NHL data

In [9]:

# A:
NHL = pd.read_csv(local_csv)

2. Perform any required data cleaning. Do some EDA.

In [4]:

# A:
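A minimal EDA sketch, assuming `NHL` is the DataFrame loaded in step 1. The helper name `quick_eda` is hypothetical. Dropping every row with a missing value is only one possible cleaning rule, so check the printed output before relying on it.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Print the basic structure of df and return it without rows containing NaNs."""
    print("Shape (rows, columns):", df.shape)
    print(df.dtypes)             # numeric columns read as 'object' may need conversion
    print(df.isnull().sum())     # count of missing values per column
    print(df.describe())         # summary statistics for the numeric columns
    # Assumed cleaning rule: drop any row that has a missing value.
    return df.dropna()
```

Usage: `NHL = quick_eda(NHL)`.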

3. Set up the `Rank` variable as your target. How many classes are there?

In [5]:

# A:
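A sketch that pulls out the target and counts its classes, assuming the column is literally named `Rank` as in the lab text. `get_target` is a hypothetical helper.

```python
import pandas as pd


def get_target(df: pd.DataFrame, target_col: str = "Rank") -> pd.Series:
    """Return the target column and report how many classes it contains."""
    y = df[target_col]
    print("Number of classes:", y.nunique())
    print(y.value_counts().sort_index())  # number of rows in each class
    return y
```

Usage: `y = get_target(NHL)`.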

4. What is the baseline accuracy?

In [6]:

# A:
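For classification, the baseline accuracy is what you would score by always predicting the most frequent class, which equals that class's share of the rows. A sketch, with `baseline_accuracy` as a hypothetical helper name:

```python
import pandas as pd


def baseline_accuracy(y: pd.Series) -> float:
    """Accuracy of a model that always predicts the most frequent class in y."""
    return float(y.value_counts(normalize=True).max())
```

Usage: `baseline_accuracy(y)`.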

5. Choose 4 features to be your predictor variables and set up your design matrix.

In [7]:

# A:
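One way to build the design matrix, assuming the CSV contains the four columns listed in `FEATURES`. Those names are a hypothetical choice, not taken from the lab; print `NHL.columns` and substitute four predictors that actually exist. `design_matrix` is also a hypothetical helper.

```python
import pandas as pd

# Hypothetical feature choice: these column names are assumptions about the
# NHL CSV. Replace them with four columns that appear in NHL.columns.
FEATURES = ["CF%", "GF", "Sh%", "PDO"]


def design_matrix(df: pd.DataFrame, features=FEATURES) -> pd.DataFrame:
    """Select the predictor columns, failing loudly if any are missing."""
    missing = [col for col in features if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in the data: {missing}")
    return df[list(features)]
```

Usage: `X = design_matrix(NHL)`.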

6. Fit a `KNeighborsClassifier` with 1 neighbor using the target and predictors.

In [8]:

# A:
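A sketch using scikit-learn's `KNeighborsClassifier`, assuming `X` and `y` come from the previous steps. `fit_1nn` is a hypothetical helper name.

```python
from sklearn.neighbors import KNeighborsClassifier


def fit_1nn(X, y) -> KNeighborsClassifier:
    """Fit a K-nearest-neighbours classifier that uses a single neighbour."""
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X, y)
    return knn
```

Usage: `knn = fit_1nn(X, y)`.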

7. Evaluate the accuracy of your model.

Is it better than the baseline?
Is it a legitimate estimate of how the model will perform on data it has not seen?

In [9]:

# A:
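Scoring a 1-nearest-neighbour model on the rows it was fitted on is optimistic. Each training row's nearest neighbour is itself (distance 0), so training accuracy is usually 1.0 unless identical feature rows carry different labels. The sketch below, with the hypothetical helper `training_accuracy_1nn`, makes that visible; compare its result with the baseline from step 4.

```python
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def training_accuracy_1nn(X, y) -> float:
    """Fit a 1-NN model and score it on the same rows it was trained on."""
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    # Each training row is its own nearest neighbour, so this is an
    # in-sample (optimistic) accuracy, not an estimate for unseen data.
    return float(accuracy_score(y, knn.predict(X)))
```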

8. Create a 50-50 train-test split of your target and predictors. Refit the KNN and assess its accuracy on the test set.

In [10]:

# A:
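A sketch with `train_test_split(test_size=0.5)`. The default `random_state=42` is an arbitrary assumption that only makes the split reproducible, and `split_and_score` is a hypothetical helper.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def split_and_score(X, y, n_neighbors=1, random_state=42) -> float:
    """Fit KNN on half of the rows and return accuracy on the held-out half."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=random_state
    )
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_train, y_train)
    return float(knn.score(X_test, y_test))  # mean accuracy on the test half
```

Usage: `split_and_score(X, y)`.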

9. Evaluate the test accuracy of a KNN where K == number of rows in the training data.

In [11]:

# A:
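When K equals the number of training rows, every query uses the whole training set as its neighbourhood. With the default uniform weights the model therefore predicts the training set's most common class for every test row, so its test accuracy behaves like a baseline. A sketch using the same assumed split settings; `score_k_equals_n` is hypothetical.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def score_k_equals_n(X, y, random_state=42) -> float:
    """Test accuracy of a KNN whose K equals the number of training rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=random_state
    )
    knn = KNeighborsClassifier(n_neighbors=len(X_train)).fit(X_train, y_train)
    # With uniform weights every prediction is the training majority class.
    return float(knn.score(X_test, y_test))
```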

10. Fit a KNN for each value of K from 1 to the number of rows in the training data.

Store the test accuracy in a list.
Plot the test accuracy vs. the number of neighbors.

In [12]:

# A:
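A sketch of the K sweep on a single 50-50 split, again assuming `random_state=42`. `accuracy_by_k` is a hypothetical helper; matplotlib is imported only when plotting is requested.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def accuracy_by_k(X, y, random_state=42, plot=True):
    """Test accuracy for every K from 1 to the number of training rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=random_state
    )
    ks = list(range(1, len(X_train) + 1))
    test_acc = []  # one accuracy per value of K
    for k in ks:
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        test_acc.append(knn.score(X_test, y_test))
    if plot:
        import matplotlib.pyplot as plt  # lazy import: plot=False needs no matplotlib
        plt.plot(ks, test_acc)
        plt.xlabel("Number of neighbors (K)")
        plt.ylabel("Test accuracy")
        plt.show()
    return ks, test_acc
```

Usage: `ks, test_acc = accuracy_by_k(X, y)`.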

11. Fit KNN across different values of K and plot the mean 5-fold cross-validated accuracy against K.

In [13]:

# A:
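A sketch with `cross_val_score(..., cv=5)`. For a classifier, an integer `cv` makes scikit-learn use stratified folds. K cannot exceed the number of rows in a training fold (roughly four fifths of the data with 5 folds), so choose `k_values` accordingly. `cv_accuracy_by_k` is a hypothetical helper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def cv_accuracy_by_k(X, y, k_values, cv=5, plot=True):
    """Mean cross-validated accuracy of KNN for each K in k_values."""
    k_values = list(k_values)
    mean_acc = []
    for k in k_values:
        scores = cross_val_score(
            KNeighborsClassifier(n_neighbors=k), X, y, cv=cv, scoring="accuracy"
        )
        mean_acc.append(scores.mean())  # average over the cv folds
    if plot:
        import matplotlib.pyplot as plt  # lazy import: plot=False needs no matplotlib
        plt.plot(k_values, mean_acc)
        plt.xlabel("Number of neighbors (K)")
        plt.ylabel(f"Mean {cv}-fold CV accuracy")
        plt.show()
    return mean_acc
```

Usage: `cv_accuracy_by_k(X, y, range(1, 21))`, provided every training fold has at least 20 rows.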

12. Standardize the predictor matrix and cross-validate across the same values of K.

Plot the mean cross-validated accuracy with standardized predictors against the accuracy with unstandardized predictors. Which is better?
Why?

In [14]:

# A:
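`KNeighborsClassifier` uses Minkowski distance with p=2 (Euclidean) by default, so a predictor with a large numeric range can dominate the neighbour search. Standardizing rescales each column to mean 0 and standard deviation 1. The sketch puts `StandardScaler` inside a pipeline so that each fold's scaler is fitted only on that fold's training rows; this is a slightly stricter variant than standardizing the whole predictor matrix once before cross-validating. `compare_scaled_unscaled` is a hypothetical helper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_scaled_unscaled(X, y, k_values, cv=5, plot=True):
    """Mean CV accuracy per K, without and with standardized predictors."""
    k_values = list(k_values)
    raw_acc, scaled_acc = [], []
    for k in k_values:
        raw_model = KNeighborsClassifier(n_neighbors=k)
        # The scaler is refitted on the training rows of every fold.
        scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        raw_acc.append(cross_val_score(raw_model, X, y, cv=cv).mean())
        scaled_acc.append(cross_val_score(scaled_model, X, y, cv=cv).mean())
    if plot:
        import matplotlib.pyplot as plt  # lazy import: plot=False needs no matplotlib
        plt.plot(k_values, raw_acc, label="unstandardized")
        plt.plot(k_values, scaled_acc, label="standardized")
        plt.xlabel("Number of neighbors (K)")
        plt.ylabel(f"Mean {cv}-fold CV accuracy")
        plt.legend()
        plt.show()
    return raw_acc, scaled_acc
```

Usage: `raw_acc, scaled_acc = compare_scaled_unscaled(X, y, range(1, 21))`, with the same fold-size caveat as step 11.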
