Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Path: blob/main/09. Machine Learning with Python/Final Project/Machine Learning with Python - The Best Classifier.ipynb
Classification with Python
In this notebook we practice the classification algorithms covered in this course.
We load a dataset with the pandas library, apply several classification algorithms to it, and use accuracy evaluation metrics to find the best one for this specific dataset.
Let's first load the required libraries:
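A minimal sketch of the imports such a notebook would plausibly need, assuming pandas, NumPy, and scikit-learn are installed. The exact import list of the original notebook is not shown here, so this selection is an assumption; the plots later on would additionally need a plotting library.

```python
# Assumed import set for the steps in this notebook (not the original cell).
import numpy as np                 # numerical arrays
import pandas as pd                # CSV loading and DataFrame handling
from sklearn import preprocessing  # feature scaling utilities

# Quick check that the libraries are importable and report their versions.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```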
About dataset
This dataset is about past loans. The Loan_train.csv data set includes details of 346 customers whose loans are already paid off or defaulted. It includes the following fields:
Field | Description |
---|---|
Loan_status | Whether a loan is paid off or in collection |
Principal | Basic principal loan amount at the |
Terms | Origination terms, which can be a weekly (7 days), biweekly, or monthly payoff schedule |
Effective_date | When the loan was originated and took effect |
Due_date | Since it’s a one-time payoff schedule, each loan has a single due date |
Age | Age of applicant |
Education | Education of applicant |
Gender | The gender of applicant |
Load Data From CSV File
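As a sketch of this step, the snippet below reads CSV text with `pandas.read_csv`. The real Loan_train.csv is not reproduced here, so the rows in `SAMPLE_CSV` are invented purely for illustration, and the column names are assumed to match the field table above (the actual file may spell them differently).

```python
import io

import pandas as pd

# Hypothetical stand-in for Loan_train.csv: these rows are made up for illustration.
SAMPLE_CSV = """Loan_status,Principal,Terms,Effective_date,Due_date,Age,Education,Gender
PAIDOFF,1000,30,9/8/2016,10/7/2016,45,High School or Below,male
PAIDOFF,1000,30,9/8/2016,10/7/2016,33,Bachelor,female
COLLECTION,800,15,9/10/2016,9/24/2016,27,college,male
"""

# With the real file this would be: df = pd.read_csv("Loan_train.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (rows, columns)
print(df.head())  # first few records
```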
Convert to date time object
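One way to do this conversion is `pandas.to_datetime`. The sketch below assumes the dates are stored as month/day/year strings and that the columns are named as in the field table; both are assumptions about the file, and the sample values are invented.

```python
import pandas as pd

# Illustrative values only; the date format of the real file is an assumption.
df = pd.DataFrame({
    "Effective_date": ["9/8/2016", "9/10/2016"],
    "Due_date": ["10/7/2016", "9/24/2016"],
})

# Parse both date columns from strings into datetime64 values.
for col in ("Effective_date", "Due_date"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

print(df.dtypes)
```

Once the columns are datetimes, accessors such as `.dt.dayofweek` become available for feature extraction.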
Data visualization and pre-processing
Let’s see how many of each class are in our data set:
260 people have paid off the loan on time, while 86 have gone into collection.
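Counts like these can be obtained with `value_counts`. The sketch below rebuilds a label column with the same 260/86 split rather than loading the file; the label strings `PAIDOFF` and `COLLECTION` are assumed names for the two classes.

```python
import pandas as pd

# Rebuild a label column with the class sizes quoted in the text.
# The label strings are assumptions about how the classes are encoded.
status = pd.Series(["PAIDOFF"] * 260 + ["COLLECTION"] * 86, name="Loan_status")

counts = status.value_counts()                         # absolute counts per class
share = status.value_counts(normalize=True).round(3)   # class proportions

print(counts)
print(share)
```

The proportions show the classes are imbalanced: about 75% of loans are paid off, so a model that always predicts "paid off" would already reach roughly 75% accuracy on this data. That is a useful baseline when comparing classifiers.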
Let's plot some columns to understand the data better:
Pre-processing: Feature selection/extraction
Let's look at the day of the week people get the loan
We see that people who get the loan at the end of the week are less likely to pay it off, so let's use feature binarization with a threshold at day 4, separating days less than 4 from the end of the week
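A possible sketch of this binarization, assuming pandas' `dayofweek` convention (Monday is 0, Sunday is 6) and a hypothetical `weekend` column that is 1 for days 4 to 6 and 0 otherwise. The dates are illustrative.

```python
import pandas as pd

# Illustrative dates: Thursday, Friday, Sunday, Monday.
df = pd.DataFrame({
    "Effective_date": pd.to_datetime(
        ["2016-09-08", "2016-09-09", "2016-09-11", "2016-09-12"]
    )
})

# Day of week as an integer, Monday=0 ... Sunday=6.
df["dayofweek"] = df["Effective_date"].dt.dayofweek

# Binarize: days below 4 become 0, Friday through Sunday become 1.
# The column name "weekend" is an assumption made for this sketch.
df["weekend"] = (df["dayofweek"] > 3).astype(int)

print(df)
```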
Convert Categorical features to numerical values
Let's look at gender:
86% of females pay off their loans, while only 73% of males do
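Per-group rates of this kind can be computed with a grouped, normalized value count. The toy rows below are invented and do not reproduce the 86% and 73% figures; column names and label strings are assumed.

```python
import pandas as pd

# Toy data for illustration only (not the real loan records).
df = pd.DataFrame({
    "Gender": ["female"] * 4 + ["male"] * 4,
    "Loan_status": ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION",
                    "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION"],
})

# Share of each loan status within each gender group.
rates = df.groupby("Gender")["Loan_status"].value_counts(normalize=True)

print(rates)
```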
Let's convert male to 0 and female to 1:
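A short sketch of this mapping using `Series.map`, assuming the column holds the lowercase strings `male` and `female`:

```python
import pandas as pd

# Illustrative values; the exact strings in the real file are an assumption.
df = pd.DataFrame({"Gender": ["male", "female", "male"]})

# Encode the two categories as integers: male -> 0, female -> 1.
df["Gender"] = df["Gender"].map({"male": 0, "female": 1})

print(df["Gender"].tolist())
```

Note that `map` turns any value missing from the dictionary into NaN, so it is worth checking for unexpected spellings before encoding.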
One Hot Encoding
How about education?
Features before One Hot Encoding
Use the one-hot encoding technique to convert categorical variables to binary variables and append them to the feature DataFrame
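One way to sketch this step is `pandas.get_dummies` followed by `pandas.concat`. The rows, the education category strings, and the variable name `feature` are all assumptions made for illustration.

```python
import pandas as pd

# Toy rows; the category strings are invented for this example.
df = pd.DataFrame({
    "Principal": [1000, 800, 1000],
    "Age": [45, 27, 33],
    "Education": ["college", "High School or Below", "Bachelor"],
})

# Start from the numeric features.
feature = df[["Principal", "Age"]]

# One binary column per education category.
dummies = pd.get_dummies(df["Education"], dtype=int)

# Append the binary columns to the feature DataFrame.
feature = pd.concat([feature, dummies], axis=1)

print(feature)
```

Each row has exactly one of the new columns set to 1, which is what makes the encoding "one hot".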
Feature Selection
Let's define feature sets, X:
What are our labels?
Normalize Data
Data standardization gives the data zero mean and unit variance (technically, this should be done after the train/test split)
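To illustrate the remark about ordering, here is a minimal sketch that splits first and then fits scikit-learn's `StandardScaler` on the training portion only. The data is random and the split parameters are arbitrary choices, not values from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and labels, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = rng.integers(0, 2, size=100)

# Split first, so the test rows do not influence the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit mean and variance on the training set only, then apply to both sets.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
```

The training set ends up with zero mean and unit variance per column; the test set is only approximately standardized, because it is scaled with the training statistics.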
Classification
Now it is your turn: use the training set to build an accurate model, then use the test set to report the accuracy of the model. You should use the following algorithms:
K Nearest Neighbor(KNN)
Decision Tree
Support Vector Machine
Logistic Regression
Notice:
You can go above and change the pre-processing, feature selection, feature-extraction, and so on, to make a better model.
You should use the scikit-learn, SciPy, or NumPy libraries to develop the classification algorithms.
You should include the code of the algorithm in the following cells.
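As a rough starting point, the four algorithms listed above can be compared with scikit-learn along these lines. The data is synthetic (generated with `make_classification`), and every hyperparameter shown is an arbitrary placeholder rather than a recommended or tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan features and labels.
X, y = make_classification(n_samples=346, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Placeholder hyperparameters; each should be tuned on the real data.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", max_depth=4,
                                            random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(C=0.01, solver="liblinear"),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in accuracy.items():
    print(f"{name}: {score:.3f}")
```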
K Nearest Neighbor(KNN)
Notice: You should find the best k to build the model with the best accuracy. Warning: You should not use loan_test.csv to find the best k; however, you can split Loan_train.csv into train and test sets to find the best k.
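A minimal sketch of such a search, assuming a single hold-out split of the training data and a candidate range of k from 1 to 14. The data here is synthetic, so the selected k says nothing about the loan dataset; the range and split size are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training file; the test file is never touched.
X, y = make_classification(n_samples=346, n_features=8, n_informative=4,
                           random_state=0)

# Hold out part of the training data as a validation set for choosing k.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y
)

# KNN is distance based, so scale features using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_val_std = scaler.transform(X_val)

# Evaluate each candidate k on the validation set.
scores = {}
for k in range(1, 15):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train_std, y_train)
    scores[k] = accuracy_score(y_val, model.predict(X_val_std))

best_k = max(scores, key=scores.get)
print("best k:", best_k, "validation accuracy:", round(scores[best_k], 3))
```

With only a few hundred rows, a single split can be noisy; cross-validation (for example `sklearn.model_selection.cross_val_score`) is a common alternative for choosing k.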