Project #3: Modeling Assignment
DS | Unit Project 3
PROMPT
In this project, you will perform a logistic regression on the admissions data we've been working with in projects 1 and 2. For more instructions, follow the questions included in the starter code.
Goal: A completed IPython notebook that includes basic modeling using logistic regression.
DELIVERABLES
Requirements:
Create dummy variables
Calculate odds ratios (OR) by hand
Complete a logistic regression using statsmodels and interpret your findings
Calculate predicted probabilities
Bonus:
Plot the predicted probabilities
Brainstorm ways to improve your analysis
Submission:
TBD by Instructor
TIMELINE
| Deadline | Deliverable | Description |
|---|---|---|
| Lesson 9 | Project 3 | Basic Modeling Assignment |
EVALUATION
Your project will be assessed using the following standards:
Refine the Data
Rubric: Click here for the complete rubric.
Requirements for these standards will be assessed using the scale below:
While your total score is a helpful gauge of whether you've met overall project goals, specific scores are more important since they'll show you where to focus your efforts in the future!
RESOURCES
Dataset
We'll be using the same dataset as UCLA's Logistic Regression in R tutorial to explore logistic regression in Python, as explained in yhat's blog. This is an excellent resource for using logistic regression and summary statistics to explore a relevant dataset. Our goal will be to identify the various factors that may influence admission into graduate school. The dataset contains four variables: admit, gre, gpa, and rank.
'admit' is a binary variable. It indicates whether a candidate was admitted (admit = 1) or not (admit = 0)
'gre' is the applicant's GRE score
'gpa' is the applicant's grade point average
'rank' is the rank of an applicant's undergraduate alma mater, with 1 being the highest and 4 the lowest
Dataset: Admissions.csv
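Because 'rank' is a category rather than a quantity, the "Create dummy variables" requirement usually applies to it. The following is a minimal sketch of one way to do this with pandas, not the starter code's own solution. The four rows are invented for illustration and only mimic the column layout described above; in the project you would load Admissions.csv instead.

```python
import pandas as pd

# Hypothetical rows shaped like the admissions data (values are invented).
df = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gre": [400, 640, 780, 520],
    "gpa": [3.10, 3.70, 3.90, 2.90],
    "rank": [3, 1, 2, 4],
})

# One indicator column per rank value: rank_1 ... rank_4.
rank_dummies = pd.get_dummies(df["rank"], prefix="rank", dtype=int)

# Drop rank_1 so it serves as the baseline category; keeping all four
# indicators alongside an intercept would make the columns collinear.
model_df = df[["admit", "gre", "gpa"]].join(rank_dummies.drop(columns="rank_1"))

print(model_df)
```

Dropping one level is a modeling choice: each remaining rank coefficient is then read relative to the omitted baseline (here, rank 1).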
Starter code
Review the questions in the IPython notebook provided.
Suggestions for Getting Started
Review logistic regression, odds ratios and probabilities from prior lessons.
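As a quick refresher on how odds, odds ratios, and probabilities relate, here is a small worked sketch in plain Python. The counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from Admissions.csv, and the helper names are illustrative.

```python
# Hypothetical 2x2 counts, invented purely to show the arithmetic.
admitted_top, rejected_top = 30, 20        # applicants from rank-1 schools
admitted_other, rejected_other = 20, 80    # applicants from all other schools


def odds(successes, failures):
    """Odds = number of successes divided by number of failures."""
    return successes / failures


def odds_to_probability(o):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return o / (1 + o)


odds_top = odds(admitted_top, rejected_top)          # 30 / 20
odds_other = odds(admitted_other, rejected_other)    # 20 / 80

# The odds ratio compares the odds of admission between the two groups.
odds_ratio = odds_top / odds_other

prob_top = odds_to_probability(odds_top)
prob_other = odds_to_probability(odds_other)

print(f"odds (rank 1): {odds_top:.2f}, probability: {prob_top:.2f}")
print(f"odds (other):  {odds_other:.2f}, probability: {prob_other:.2f}")
print(f"odds ratio:    {odds_ratio:.2f}")
```

With these made-up counts, the rank-1 group has odds of 1.5 (probability 0.6), the other group has odds of 0.25 (probability 0.2), and the odds ratio is 6: the odds of admission are six times higher in the first group, even though the probability is only three times higher.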
Read the docs for statsmodels. There is often a tutorial you can follow, but not always, and learning to read documentation is crucial to your success as a data scientist!