Path: blob/master/april_18/projects/unit-projects/project-3/starter-code/project3-starter.ipynb
1905 views
Project 3
In this project, you will perform a logistic regression on the admissions data we've been working with in projects 1 and 2.
Part 1. Frequency Tables
1. Let's create a frequency table of our variables
Part 2. Return of dummy variables
2.1 Create class or dummy variables for prestige
2.2 When modeling our class variables, how many do we need?
Answer:
Part 3. Hand calculating odds ratios
Develop your intuition about expected outcomes by hand calculating odds ratios.
3.1 Use the cross tab above to calculate the odds of being admitted to grad school if you attended a #1 ranked college
3.2 Now calculate the odds of admission if you did not attend a #1 ranked college
3.3 Calculate the odds ratio
3.4 Write this finding in a sentenance:
Answer:
3.5 Print the cross tab for prestige_4
3.6 Calculate the OR
3.7 Write this finding in a sentence
Answer:
Part 4. Analysis
We're going to add a constant term for our Logistic Regression. The statsmodels function we're going to be using requires that intercepts/constants are specified explicitly.
4.1 Set the covariates to a variable called train_cols
4.2 Fit the model
4.3 Print the summary results
4.4 Calculate the odds ratios of the coeffiencents and their 95% CI intervals
hint 1: np.exp(X)
hint 2: conf['OR'] = params
4.5 Interpret the OR of Prestige_2
Answer:
4.6 Interpret the OR of GPA
Answer:
Part 5: Predicted probablities
As a way of evaluating our classifier, we're going to recreate the dataset with every logical combination of input values. This will allow us to see how the predicted probability of admission increases/decreases across different variables. First we're going to generate the combinations using a helper function called cartesian (above).
We're going to use np.linspace to create a range of values for "gre" and "gpa". This creates a range of linearly spaced values from a specified min and maximum value--in our case just the min/max observed values.
5.1 Recreate the dummy variables
5.2 Make predictions on the enumerated dataset
5.3 Interpret findings for the last 4 observations
Answer:
Bonus
Plot the probability of being admitted into graduate school, stratified by GPA and GRE score.