GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_10-sub-Jacob_Koehler/02-BankMarketing.ipynb
Kernel: Python 3

Bank Marketing Lab

In this lab, our goal is to predict whether a customer purchases a bank product marketed over the phone. The outcome is recorded in the y column of the data (see the data dictionary). In examining this binary classification problem, we want to explore how different models perform, specifically in terms of:

  • Accuracy

  • Precision

  • Recall
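As a quick reminder of how the three metrics differ, here is a minimal sketch that computes them from confusion-matrix counts. The function name `summarize` and the example counts are hypothetical and chosen only for illustration; they do not come from the bank data.

```python
def summarize(tp, fp, fn, tn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision: of the customers we predicted would buy, how many did?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who did buy, how many did we catch?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 100 actual buyers among 1000 customers.
accuracy, precision, recall = summarize(tp=30, fp=20, fn=70, tn=880)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With these made-up counts the accuracy looks high while recall is low, which is the kind of gap to watch for when the classes are imbalanced.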

import pandas as pd
bank = pd.read_csv('data/bank_marketing.csv')
bank.head()
bank.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4119 entries, 0 to 4118
Data columns (total 22 columns):
Unnamed: 0        4119 non-null int64
age               4119 non-null int64
job               4119 non-null object
marital           4119 non-null object
education         4119 non-null object
default           4119 non-null object
housing           4119 non-null object
loan              4119 non-null object
contact           4119 non-null object
month             4119 non-null object
day_of_week       4119 non-null object
duration          4119 non-null int64
campaign          4119 non-null int64
pdays             4119 non-null int64
previous          4119 non-null int64
poutcome          4119 non-null object
emp.var.rate      4119 non-null float64
cons.price.idx    4119 non-null float64
cons.conf.idx     4119 non-null float64
euribor3m         4119 non-null float64
nr.employed       4119 non-null float64
y                 4119 non-null int64
dtypes: float64(5), int64(7), object(10)
memory usage: 708.0+ KB
bank.y.value_counts()
0    3668
1     451
Name: y, dtype: int64
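The class counts show a strong imbalance, so it helps to know what accuracy a model gets by always predicting the majority class. The sketch below uses only the two counts printed above; the variable names are illustrative.

```python
# Class counts copied from bank.y.value_counts() above.
counts = {0: 3668, 1: 451}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a model that always predicts the majority class.
baseline_accuracy = counts[majority_class] / total
# Share of customers who actually purchased the product.
positive_rate = counts[1] / total

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"positive rate: {positive_rate:.3f}")
```

Always predicting 0 is right about 89% of the time yet finds no buyers at all, so accuracy alone is a weak yardstick for this problem.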

Exploration

Are there other characteristics that need to be handled within the data? How can we encode something like the missing values that are labeled 'unknown'?
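One way to approach the 'unknown' question is sketched below with two options: keep 'unknown' as its own category, or treat it as missing before one-hot encoding. The small `toy` frame is hypothetical and only borrows column names from the data; which option works better on the real data is something to test rather than assume.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the bank data.
toy = pd.DataFrame({
    "job": ["admin.", "unknown", "technician", "admin."],
    "housing": ["yes", "no", "unknown", "yes"],
    "age": [30, 41, 52, 38],
})
cat_cols = ["job", "housing"]

# How common is 'unknown' in each categorical column?
unknown_share = (toy[cat_cols] == "unknown").mean()
print(unknown_share)

# Option A: keep 'unknown' as its own level, so it gets a dummy column.
encoded_a = pd.get_dummies(toy, columns=cat_cols)

# Option B: mask 'unknown' as missing; get_dummies then leaves those
# rows with all zeros for that feature (dummy_na defaults to False).
as_missing = toy.copy()
as_missing[cat_cols] = toy[cat_cols].mask(toy[cat_cols] == "unknown")
encoded_b = pd.get_dummies(as_missing, columns=cat_cols)

print(encoded_a.columns.tolist())
print(encoded_b.columns.tolist())
```

Option A lets a model learn whether "not recorded" is itself informative; Option B is the starting point if you would rather impute or drop those rows.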

Model Building

Examine the results of three classification models, including a LogisticRegression and a DummyClassifier. Produce and interpret the classification report for each of these models. Which would you care about more in this scenario: a model with higher precision or one with higher recall?
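A minimal sketch of the comparison loop follows. It runs on synthetic data from `make_classification`, weighted to roughly match the 89/11 class split, so that it is self-contained; the real lab would use features built from `bank`. Only the two models named above are included, and a third of your choice can be added to the `models` dictionary. The settings `strategy="most_frequent"` and `max_iter=1000` are assumptions, not requirements of the lab.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank features and target (not the real data).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.89, 0.11], random_state=0
)
# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

models = {
    "DummyClassifier": DummyClassifier(strategy="most_frequent"),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # zero_division=0 avoids warnings when a class is never predicted.
    reports[name] = classification_report(
        y_test, preds, zero_division=0, output_dict=True
    )
    print(name)
    print(classification_report(y_test, preds, zero_division=0))
```

The dummy model's report gives the floor any real model has to beat: reasonable accuracy, but zero recall on class 1. Comparing the class 1 rows across reports is usually more informative here than comparing overall accuracy.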