Week 1 Homework
Question 1:
A good application of classifiers is "employee retention".
Companies that want to retain their employees might use a classifier that identifies employees who are likely to leave the company within the next 6 months. Good predictors for training this classifier might include:
number of promotions within last 18 months
difference between the employee's salary and the industry average salary for their position
overall years of experience
years of tenure at the company
marital status
Question 2.1:
The classifier's linear equation has the form:
a0 + a1A1 + a2A2 + a3A3 + a4A8 + a5A9 + a6A10 + a7A11 + a8A12 + a9A14 + a10A15 = 0
where {A1...A15} are the feature inputs from credit_card_data-headers.txt.
See [19] and [20] for a1...a10 and a0, respectively. This classifier had an accuracy of 86.39% on the overall dataset [21].
Feature | Coefficient |
---|---|
A1 | -0.000466036176627327 |
A2 | -0.0140534983606244 |
A3 | -0.0081688661743442 |
A8 | 0.0101292226736795 |
A9 | 0.501609468692229 |
A10 | -0.00140343386065389 |
A11 | 0.00129121684002342 |
A12 | -0.000266898857269382 |
A14 | -0.206754961642446 |
A15 | 558.33559056503 |
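The decision rule implied by the equation above can be sketched as follows. This is a minimal Python illustration, not the original analysis code. The names `COEFFICIENTS`, `decision_value`, and `classify` are assumptions made for this example. The intercept a0 is not listed in this write-up, so it is passed in as an argument rather than guessed. The sketch also assumes the inputs are on the same scale the model was fit on.

```python
# Coefficients a1..a10, copied from the values listed above and keyed by feature name.
COEFFICIENTS = {
    "A1": -0.000466036176627327,
    "A2": -0.0140534983606244,
    "A3": -0.0081688661743442,
    "A8": 0.0101292226736795,
    "A9": 0.501609468692229,
    "A10": -0.00140343386065389,
    "A11": 0.00129121684002342,
    "A12": -0.000266898857269382,
    "A14": -0.206754961642446,
    "A15": 558.33559056503,
}


def decision_value(features, a0):
    """Return a0 + sum(a_i * A_i) for one sample given as a dict of feature values."""
    return a0 + sum(coef * features[name] for name, coef in COEFFICIENTS.items())


def classify(features, a0):
    """Return 1 if the sample lies on the non-negative side of the hyperplane, else 0."""
    return 1 if decision_value(features, a0) >= 0 else 0
```

Points with a decision value of exactly zero lie on the separating hyperplane; assigning them to class 1 is an arbitrary choice in this sketch.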
Question 2.2:
With a K-value of 12, the kknn classifier correctly classified 85.3211% of the samples in the dataset. This is the leave-one-out accuracy: each sample is taken out of the training data in turn, and the resulting model is then evaluated on the left-out sample.
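The leave-one-out procedure described above can be sketched in a few lines. This is a simplified Python illustration, not the kknn package: it uses plain Euclidean distance and an unweighted majority vote, whereas kknn applies kernel weighting by default. The function names and the `(features, label)` tuple layout are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Fraction of samples classified correctly when each one is held out in turn."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every sample except the i-th
        if knn_predict(rest, x, k) == y:
            correct += 1
    return correct / len(data)
```

Holding the query sample out matters: if it stayed in the training data, it would always be its own nearest neighbor and inflate the accuracy.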
Question 3a:
The original credit card data was randomly split into the following three datasets:
Training dataset of 60% of the data
Evaluation dataset of 20% of the data
Test dataset of 20% of the data
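A 60/20/20 random split like the one above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the original code: the function name `split_60_20_20` and the fixed `seed` are hypothetical, and any rows left over from integer rounding are assigned to the test set.

```python
import random


def split_60_20_20(rows, seed=42):
    """Shuffle a copy of `rows` and return (train, evaluation, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_train = len(shuffled) * 60 // 100
    n_eval = len(shuffled) * 20 // 100

    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # the remainder, roughly 20%
    return train, evaluation, test
```

Shuffling once and slicing guarantees the three sets are disjoint and together cover every row.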
The best kknn model, trained on the training dataset and then evaluated on the evaluation dataset, used default parameters and a K-value of 29, with a classifier accuracy of 90.4%.
V1 | V2 | Accuracy |
---|---|---|
29 | 29 | 0.904 |
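Choosing the K-value on the evaluation dataset can be sketched as below. This is a simplified Python illustration with an unweighted Euclidean k-nearest-neighbor vote standing in for kknn. The function names and the candidate list are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, eval_set, k):
    """Fraction of `eval_set` samples that a model built on `train` classifies correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)


def select_k(train, evaluation, candidate_ks):
    """Score every candidate k on the evaluation set; return the best k and all scores."""
    scores = {k: accuracy(train, evaluation, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # on ties, the first candidate wins
    return best_k, scores
```

Because the winning K-value is the maximum over several candidates, its evaluation accuracy tends to be optimistic. Scoring the selected model once on the untouched test dataset, for instance with `accuracy(train, test, best_k)`, gives a less biased estimate.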
Question 3b:
The accuracy of the kknn model with a K-value of 29, assessed on the test dataset, is 85.9375%.