YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/april_18/projects/unit-projects/project-1/assets/project1-example.ipynb
Kernel: Python 2

Project 1 example

Read and evaluate the following problem statement:

Using Planet Express customer data from January 3001-3005, determine how likely previous customers are to request a repeat delivery using demographic information (profession, company size, location) and previous delivery data (days since last delivery, number of total deliveries).

1. What is the outcome?

Answer: return customer indicator (yes/no)

2. What are the predictors/covariates?

Answer: profession, company size, location, days since last delivery, number of total deliveries
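Once the outcome and predictors are named, each customer record can be split into a feature row (X) and a 0/1 label (y). The sketch below shows that split under two assumptions. First, the field names (`profession`, `company_size`, `location`, `days_since_last_delivery`, `num_deliveries`, `repeat_customer`) are hypothetical. Second, the two records are invented for illustration and are not Planet Express data.

```python
# Hypothetical field names for the predictors listed in the problem statement;
# the real Planet Express schema may differ.
PREDICTORS = [
    "profession",
    "company_size",
    "location",
    "days_since_last_delivery",
    "num_deliveries",
]
OUTCOME = "repeat_customer"
# Encode the yes/no return-customer indicator as 1/0.
OUTCOME_CODES = {"yes": 1, "no": 0}


def split_features_outcome(records):
    """Split records into a feature matrix X and a 0/1 outcome vector y."""
    X = []
    y = []
    for rec in records:
        # Keep only the predictor columns, in a fixed order.
        X.append([rec[name] for name in PREDICTORS])
        # Raises KeyError on any label other than "yes"/"no".
        y.append(OUTCOME_CODES[rec[OUTCOME]])
    return X, y


# Invented example records, for illustration only.
customers = [
    {"profession": "Engineer", "company_size": 2, "location": "Mars",
     "days_since_last_delivery": 14, "num_deliveries": 5,
     "repeat_customer": "yes"},
    {"profession": "Scientist", "company_size": 1, "location": "Earth",
     "days_since_last_delivery": 210, "num_deliveries": 1,
     "repeat_customer": "no"},
]

X, y = split_features_outcome(customers)
print(X[0])  # ['Engineer', 2, 'Mars', 14, 5]
print(y)     # [1, 0]
```

The explicit `OUTCOME_CODES` lookup makes an unexpected label fail loudly. A plain `== "yes"` test would silently map it to 0.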

3. What timeframe is this data relevant for?

Answer: Jan 3001-3005

4. What is the hypothesis?

Answer: Demographic information and previous delivery data will allow us to predict whether a customer will be a repeat customer.
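To test this hypothesis, we need something to compare against. A common choice is a majority-class baseline, which ignores the predictors and always guesses the most frequent outcome. A model supports the hypothesis only if it does better than that baseline. The sketch below uses invented labels and predictions; none of these numbers come from the project data.

```python
from collections import Counter


def majority_baseline_accuracy(y):
    """Accuracy of always predicting the most common outcome class."""
    if not y:
        raise ValueError("y must be non-empty")
    most_common_count = Counter(y).most_common(1)[0][1]
    return most_common_count / float(len(y))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / float(len(y_true))


# Invented labels (1 = repeat customer, 0 = not) and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

baseline = majority_baseline_accuracy(y_true)
model = accuracy(y_true, y_pred)
print(baseline)          # 0.625
print(model)             # 0.875
print(model > baseline)  # True
```

In practice, make this comparison on held-out data rather than on the records used to fit the model.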

Let's begin by exploring the dataset

1. Create a data dictionary

Answer:

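| Variable | Description | Type of Variable |
|---|---|---|
| Profession | Title of the account owner | categorical |
| Company Size | 1 = small, 2 = medium, 3 = large | categorical (ordinal) |
| Location | Planet of the company | categorical |
| Days Since Last Delivery | Number of days since the customer's last delivery (integer) | continuous |
| Number of Deliveries | Total number of deliveries for the customer (integer) | continuous |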
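A data dictionary is also useful as a check you can run against incoming records. The sketch below encodes the variables from the table and reports problems in a single record. It makes three assumptions:

- Record keys match the Variable column.
- Company Size is stored as the integer codes 1, 2, 3.
- The two count columns must be non-negative integers.

The example records are invented.

```python
# Integer codes for Company Size, as given in the data dictionary.
COMPANY_SIZE_LABELS = {1: "small", 2: "medium", 3: "large"}

# Assumed storage kind per column (not the "Type of Variable" column).
DATA_DICTIONARY = {
    "Profession": "categorical",
    "Company Size": "categorical",
    "Location": "categorical",
    "Days Since Last Delivery": "integer",
    "Number of Deliveries": "integer",
}


def validate_record(record):
    """Return a list of problems found in one record (empty if it is valid)."""
    problems = []
    for column, kind in DATA_DICTIONARY.items():
        if column not in record:
            problems.append("missing column: %s" % column)
            continue
        value = record[column]
        if kind == "integer":
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value < 0:
                problems.append("%s must be a non-negative integer" % column)
        elif column == "Company Size":
            if value not in COMPANY_SIZE_LABELS:
                problems.append("Company Size must be 1, 2, or 3")
        elif not isinstance(value, str) or not value:
            problems.append("%s must be a non-empty string" % column)
    return problems


good = {"Profession": "Engineer", "Company Size": 2, "Location": "Mars",
        "Days Since Last Delivery": 14, "Number of Deliveries": 5}
bad = {"Profession": "Engineer", "Company Size": 4, "Location": "Mars",
       "Days Since Last Delivery": -3}

print(validate_record(good))      # []
print(len(validate_record(bad)))  # 3
```

The function returns every problem it finds instead of stopping at the first one. That makes it easier to summarize data-quality issues across a whole file.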