Customer churn prediction estimates which customers are likely to leave a business and helps explain why. In this tutorial we look at customer churn in the banking business. We will build a deep learning model to predict churn and use precision, recall, and f1-score to measure the model's performance.
LOAD THE DATA
DROP UNNECESSARY COLUMNS
FEATURE ENGINEERING
ONE HOT ENCODING CATEGORICAL VALUES
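A minimal sketch of this step on a toy frame. The column names `Geography`, `Gender`, and `Age` are assumptions for illustration, not taken from the original notebook. It uses `pandas.get_dummies` for the multi-valued column and a plain 0/1 mapping for the binary one.

```python
import pandas as pd

# Toy frame; the column names are assumptions for illustration.
df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 41, 39, 50],
})

# Binary category: a single 0/1 column is enough.
df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1})

# Multi-valued category: one indicator column per distinct value.
df_encoded = pd.get_dummies(df, columns=["Geography"], dtype=int)
print(df_encoded.columns.tolist())
```

Each distinct value of the encoded column becomes its own indicator column, so the network never sees an artificial ordering between categories.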
SCALING THE DATASET
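One way to scale the numeric columns is scikit-learn's `MinMaxScaler`, sketched below. The column names and values are made up for illustration, and the choice of min-max scaling (rather than standardization) is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy numeric columns; names and values are assumptions for illustration.
df = pd.DataFrame({
    "CreditScore": [600.0, 700.0, 850.0, 350.0],
    "Balance": [0.0, 50000.0, 125000.0, 250000.0],
})

cols_to_scale = ["CreditScore", "Balance"]

# Rescale each column independently to the [0, 1] range.
scaler = MinMaxScaler()
df[cols_to_scale] = scaler.fit_transform(df[cols_to_scale])
print(df)
```

Fitting the scaler on the training rows only and reusing it for the test rows is a common variant that avoids leaking test statistics into training.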
SPLITTING THE DATASET INTO TRAINING AND TEST SET
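A sketch of the split using scikit-learn's `train_test_split` on synthetic data. The 80/20 ratio, the `random_state` value, and the use of `stratify` are assumptions; stratifying keeps the class ratio the same in both parts, which matters when one class is rare.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 20% belong to the minority class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 200 + [0] * 800)

# stratify=y preserves the 80/20 class ratio in both train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=15, stratify=y
)
print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```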
IMPORTING TENSORFLOW LIBRARIES
BUILD THE MODEL (ANN)
As we see, the precision, recall, and f1-score of class 1 are very low because the dataset is imbalanced.
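To make the metrics concrete, here is a small sketch with hand-made labels (not the notebook's actual predictions). It shows how accuracy can look acceptable while the minority class scores poorly. It uses scikit-learn's `classification_report` and the per-class score functions.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

# Hand-made labels for illustration: 16 majority (0) and 4 minority (1) samples.
y_true = [0] * 16 + [1] * 4
# The classifier gets 15 of the 16 zeros right but finds only 1 of the 4 ones.
y_pred = [0] * 15 + [1] + [1] + [0] * 3

accuracy = accuracy_score(y_true, y_pred)
precision_1 = precision_score(y_true, y_pred)  # TP / (TP + FP) for class 1
recall_1 = recall_score(y_true, y_pred)        # TP / (TP + FN) for class 1
f1_1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(accuracy, precision_1, recall_1, f1_1)
print(classification_report(y_true, y_pred))
```

Here accuracy is 0.80, yet class 1 has precision 0.50, recall 0.25, and an f1-score of about 0.33, which is why the per-class report is more informative than accuracy on imbalanced data.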
Mitigating Skewness of Data
Method 1: Undersampling
Reference: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
As we see, there is considerable improvement in the f1-score, recall, and precision for class 1. The f1-score improved from 0.58 to 0.77.
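Random undersampling can be sketched with pandas alone: keep every minority row and draw an equally sized random sample from the majority class. The label column name `Exited` and the toy data are assumptions for illustration.

```python
import pandas as pd

# Toy data: 80 majority rows (0) and 20 minority rows (1).
# The label column name "Exited" is an assumption.
df = pd.DataFrame({
    "feature": range(100),
    "Exited": [1] * 20 + [0] * 80,
})

df_class_0 = df[df["Exited"] == 0]
df_class_1 = df[df["Exited"] == 1]

# Shrink the majority class to the size of the minority class.
df_class_0_under = df_class_0.sample(n=len(df_class_1), random_state=42)
df_under = pd.concat([df_class_0_under, df_class_1], axis=0)

print(df_under["Exited"].value_counts())
```

The balanced frame is smaller than the original, so some majority-class information is discarded in exchange for equal class counts.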
Method 2: Oversampling
The f1-score for minority class 1 improved from 0.58 to 0.79.
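Random oversampling is the mirror image of undersampling: minority rows are drawn with replacement until they match the majority count. A minimal pandas sketch, again assuming a label column named `Exited` and toy data:

```python
import pandas as pd

# Toy data: 80 majority rows (0) and 20 minority rows (1).
# The label column name "Exited" is an assumption.
df = pd.DataFrame({
    "feature": range(100),
    "Exited": [1] * 20 + [0] * 80,
})

df_class_0 = df[df["Exited"] == 0]
df_class_1 = df[df["Exited"] == 1]

# Duplicate minority rows at random (replace=True) up to the majority size.
df_class_1_over = df_class_1.sample(
    n=len(df_class_0), replace=True, random_state=42
)
df_over = pd.concat([df_class_0, df_class_1_over], axis=0)

print(df_over["Exited"].value_counts())
```

Because the added rows are exact copies, oversampling only the training portion keeps duplicates of the same customer from appearing in both the training and test sets.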
Method 3: SMOTE
To install the imbalanced-learn library, run pip install imbalanced-learn.
SMOTE oversampling increases the f1-score of minority class 1 from 0.58 to 0.81.
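To show what SMOTE does under the hood, the sketch below re-implements its core idea in NumPy: each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours. This is a simplified illustration, not the imbalanced-learn implementation; the function name `smote_sketch` and its parameters are hypothetical. In practice the library's `SMOTE` class with its `fit_resample` method would be used instead.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Hypothetical helper for illustration; requires k < len(X_min).
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per row

    base = rng.integers(0, len(X_min), size=n_new)  # random minority rows
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))  # random position along each segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Synthetic two-feature data: 80 majority rows and 20 minority rows.
rng = np.random.default_rng(1)
X_majority = rng.normal(0.0, 1.0, size=(80, 2))
X_minority = rng.normal(3.0, 1.0, size=(20, 2))

# Add 60 synthetic minority rows so both classes have 80 rows.
X_synthetic = smote_sketch(X_minority, n_new=60)
X_minority_balanced = np.vstack([X_minority, X_synthetic])
print(X_majority.shape, X_minority_balanced.shape)
```

Unlike random oversampling, the new rows are not copies of existing ones, which gives the model more varied minority examples to learn from.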
Method 4: Ensemble with undersampling
The f1-score for minority class 1 is 0.57, slightly below the baseline of 0.58.
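One common way to combine an ensemble with undersampling is sketched below: split the majority class into disjoint batches, pair each batch with all minority rows, train one model per batch, and take a majority vote. The number of batches (3), the synthetic data, and the helper name `predict_majority_vote` are assumptions. `LogisticRegression` stands in for the ANN to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: 300 majority rows (class 0) and 100 minority rows (class 1).
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(300, 2))
X_min = rng.normal(2.5, 1.0, size=(100, 2))

# Split the shuffled majority indices into 3 disjoint batches of 100 rows.
n_models = 3
maj_batches = np.array_split(rng.permutation(len(X_maj)), n_models)

# Train one model per batch; every model sees all minority rows.
models = []
for idx in maj_batches:
    X_batch = np.vstack([X_maj[idx], X_min])
    y_batch = np.concatenate([
        np.zeros(len(idx), dtype=int),
        np.ones(len(X_min), dtype=int),
    ])
    models.append(LogisticRegression().fit(X_batch, y_batch))

def predict_majority_vote(models, X):
    """Predict class 1 when more than half of the models vote for it."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return (votes.sum(axis=0) > len(models) / 2).astype(int)

X_new = np.array([[0.0, 0.0], [3.0, 3.0]])
print(predict_majority_vote(models, X_new))
```

Each model trains on balanced data, and together the batches cover every majority row, so no majority sample is thrown away as in plain undersampling.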