
Project #3: Modeling Assignment

DS | Unit Project 3

PROMPT

In this project, you will perform a logistic regression on the admissions data we've been working with in projects 1 and 2. For more instructions, follow the questions included in the starter code.

Goal: A completed IPython notebook that includes basic modeling using logistic regression.
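It can help to see what a logistic regression fit computes before calling a library. The following is a minimal pure-Python sketch that fits the coefficients by Newton-Raphson, which is also the default optimizer (`method="newton"`) for statsmodels' `Logit.fit()`. The tiny dataset and the helper names `sigmoid`, `solve`, and `fit_logistic` are hypothetical and for illustration only; in the notebook you would fit the admissions data with statsmodels itself.

```python
import math


def sigmoid(z):
    """Logistic function: maps log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Hypothetical helper: fit logistic regression coefficients by Newton-Raphson.

    Each row of X must already include a leading 1.0 for the intercept.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * xj for b, xj in zip(beta, row))) for row in X]
        # Gradient of the log-likelihood: X^T (y - p)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        # Negative Hessian (information matrix): X^T W X with W = diag(p * (1 - p))
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical toy data (not the admissions data): one predictor plus an intercept.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
X = [[1.0, float(x)] for x in xs]
beta = fit_logistic(X, ys)
print(beta)  # [intercept, slope] on the log-odds scale
```

At the maximum-likelihood solution the gradient `X^T (y - p)` is zero, which is a quick way to check any fit, including one produced by statsmodels.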


DELIVERABLES

  • Requirements:

    • Create dummy variables

    • Calculate odds ratios (OR) by hand

    • Complete a logistic regression using statsmodels and interpret your findings

    • Calculate predicted probabilities

  • Bonus:

    • Plot the predicted probabilities

    • Brainstorm ways to improve your analysis

  • Submission:

    • TBD by Instructor
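For the "calculate OR by hand" requirement, the odds ratio from a 2x2 table is the odds of admission in one group divided by the odds in another. The sketch below uses made-up counts (not taken from Admissions.csv), and the helper names `odds` and `odds_ratio` are hypothetical.

```python
def odds(successes, failures):
    """Odds of an event: count of successes divided by count of failures."""
    return successes / failures


def odds_ratio(admit_a, reject_a, admit_b, reject_b):
    """Odds ratio of admission for group A relative to group B."""
    return odds(admit_a, reject_a) / odds(admit_b, reject_b)


# Hypothetical counts for illustration only (not from the admissions dataset).
admitted_rank1, rejected_rank1 = 30, 20
admitted_other, rejected_other = 90, 210

or_rank1 = odds_ratio(admitted_rank1, rejected_rank1, admitted_other, rejected_other)
print(round(or_rank1, 2))  # → 3.5
```

With these made-up counts, the odds of admission for rank-1 applicants would be 3.5 times the odds for everyone else.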


TIMELINE

| Deadline | Deliverable | Description |
| -------- | ----------- | ----------- |
| Lesson 9 | Project 3 | Basic Modeling Assignment |

EVALUATION

Your project will be assessed using the following standards:

  1. Refine the Data

Rubric: Click here for the complete rubric.

Requirements for these standards will be assessed using the scale below:

| Score | Expectations |
| ----- | ------------ |
| **0** | _Incomplete._ |
| **1** | _Does not meet expectations._ |
| **2** | _Meets expectations, good job!_ |
| **3** | _Exceeds expectations, you wonderful creature, you!_ |

While your total score is a helpful gauge of whether you've met overall project goals, specific scores are more important since they'll show you where to focus your efforts in the future!


RESOURCES

Dataset

We'll be using the same dataset as UCLA's Logistic Regression in R tutorial to explore logistic regression in Python, as explained in yhat's blog. This is an excellent resource for using logistic regression and summary statistics to explore a relevant dataset. Our goal will be to identify the various factors that may influence admission into graduate school. The dataset contains four variables: admit, gre, gpa, and rank.

  • 'admit' is a binary variable. It indicates whether a candidate was admitted (admit = 1) or not (admit = 0).

  • 'gre' is the applicant's GRE score

  • 'gpa' stands for Grade Point Average

  • 'rank' is the rank of an applicant's undergraduate alma mater, with 1 being the highest and 4 the lowest
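Because 'rank' is a category coded 1 to 4 rather than a true measurement, it is usually turned into dummy (indicator) variables before modeling, with one level dropped as the reference so the dummies are not perfectly collinear with the intercept. In pandas, one common way to do this is `pd.get_dummies(df["rank"], prefix="rank", drop_first=True)`. The stdlib sketch below shows the same idea; the helper `make_dummies`, the choice of rank 1 as the reference level, and the sample ranks are all hypothetical.

```python
def make_dummies(values, prefix, reference):
    """Hypothetical helper: one-hot encode a categorical column, dropping the reference level.

    Returns a dict mapping new column names to lists of 0/1 indicators.
    """
    levels = sorted(set(values))
    dummies = {}
    for level in levels:
        if level == reference:
            continue  # the reference level is implied when all dummies are 0
        dummies[f"{prefix}_{level}"] = [1 if v == level else 0 for v in values]
    return dummies


# Hypothetical rank values for five applicants.
ranks = [3, 1, 4, 2, 1]
rank_dummies = make_dummies(ranks, prefix="rank", reference=1)
for name, column in rank_dummies.items():
    print(name, column)
# rank_2 [0, 0, 0, 1, 0]
# rank_3 [1, 0, 0, 0, 0]
# rank_4 [0, 0, 1, 0, 0]
```

Each dummy's coefficient is then interpreted relative to rank-1 schools.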

Dataset: Admissions.csv

Starter code

Review the questions in the IPython notebook provided.

Suggestions for Getting Started

  • Review logistic regression, odds ratios and probabilities from prior lessons.

  • Read the docs for statsmodels. Most of the time there is a tutorial you can follow, but not always, and learning to read documentation is crucial to your success as a data scientist!
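As a refresher on how odds ratios and probabilities fall out of a fitted model: logistic regression coefficients are on the log-odds scale, so exponentiating a coefficient (for a statsmodels result, the values in its `params`) gives the odds ratio for a one-unit increase in that predictor, and passing the linear predictor through the logistic function gives a predicted probability. The coefficients and the applicant below are invented for illustration and are not results from Admissions.csv; `predicted_probability` is a hypothetical helper.

```python
import math

# Hypothetical coefficients on the log-odds scale (NOT fitted to Admissions.csv).
coefs = {
    "intercept": -3.0,
    "gre": 0.002,
    "gpa": 0.8,
    "rank_2": -0.7,
    "rank_3": -1.3,
    "rank_4": -1.5,
}

# Exponentiating a coefficient gives the odds ratio for a one-unit increase
# in that predictor, holding the other predictors fixed.
odds_ratios = {name: math.exp(b) for name, b in coefs.items() if name != "intercept"}


def predicted_probability(applicant):
    """Convert the linear predictor (log-odds) into a probability of admission."""
    log_odds = coefs["intercept"] + sum(coefs[name] * value for name, value in applicant.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical applicant from a rank-2 school (rank_2 = 1, other dummies 0).
applicant = {"gre": 600, "gpa": 3.5, "rank_2": 1, "rank_3": 0, "rank_4": 0}
print(odds_ratios)
print(round(predicted_probability(applicant), 3))  # → 0.574
```

Here the linear predictor is -3.0 + 1.2 + 2.8 - 0.7 = 0.3, and the logistic function maps that to roughly 0.574.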