Path: blob/master/projects/project_3/starter-code/Project 3 - Yair Strano.ipynb
Project 3
In this project, you will perform a logistic regression on admissions data
Part 1. Frequency Tables
1. Let's create a frequency table of our variables. Look at the documentation for pd.crosstab
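A minimal sketch of a frequency table with pd.crosstab. The toy DataFrame and the `admit` column name are assumptions for illustration; the real notebook would load the admissions data instead.

```python
import pandas as pd

# Hypothetical toy data standing in for the admissions data set.
df = pd.DataFrame({
    "admit":    [1, 0, 0, 1, 0, 1, 0, 0],
    "prestige": [1, 1, 2, 2, 3, 3, 4, 4],
})

# Rows: outcome (admit). Columns: the categorical predictor (prestige).
freq = pd.crosstab(df["admit"], df["prestige"],
                   rownames=["admit"], colnames=["prestige"])
print(freq)
```

Each cell counts the rows with that combination of admit and prestige, which is what the hand calculations in Part 3 read off.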
Part 2. Return of dummy variables
2.1 Create class or dummy variables for prestige
2.2 When modeling our class variables, how many do we need?
Answer:
3 dummies are needed.
When every row of a categorical variable takes exactly one value, you should drop one of the dummy columns to avoid redundancy in your exogenous variables (e.g. for a coin flip you need either a heads or a tails column, not both). Prestige has four levels, so three dummies are enough. However, if a row can take multiple values or none, keep all the columns.
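As a rough illustration of why one dummy is redundant, the sketch below builds dummies with pd.get_dummies and drops the first level as the baseline. The toy `prestige` values are made up for the example.

```python
import pandas as pd

# Hypothetical toy column; the real data has one prestige value per applicant.
df = pd.DataFrame({"prestige": [1, 2, 3, 4, 2, 1]})

# One indicator column per level: prestige_1 ... prestige_4.
dummies = pd.get_dummies(df["prestige"], prefix="prestige")

# Every row has exactly one level set, so the four columns always sum to 1
# and any one of them can be recovered from the other three.
row_sums = dummies.sum(axis=1)

# Drop prestige_1 so it becomes the baseline category.
X_prestige = dummies.drop(columns="prestige_1")
print(X_prestige.columns.tolist())
```

Dropping `prestige_1` here is a choice, not a requirement: whichever column is dropped becomes the reference level for the other coefficients.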
Part 3. Hand calculating odds ratios
Develop your intuition about expected outcomes by hand calculating odds ratios.
3.1 Use the cross tab above to calculate the odds of being admitted to grad school if you attended a #1 ranked college
odds: 33:28 (about 1.18)
3.2 Now calculate the odds of admission if you did not attend a #1 ranked college
3.3 Calculate the odds ratio
odds without a #1 ranked college: 93:243 (about 0.38)
odds ratio: (33/28) / (93/243), about 3.08
3.4 Write this finding in a sentence:
Answer:
We see that prestige plays a big role in admission to grad school. If you did not attend a prestige 1 school, your odds of getting admitted (93:243) are much lower. Applicants from other schools have about a 28% chance of admission, versus a 54% chance for applicants who attended a prestige 1 school.
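The arithmetic behind these numbers can be sketched as follows, using only the counts quoted above (33:28 for prestige 1, 93:243 for everyone else). The variable names are illustrative.

```python
# Counts read from the cross tab: (admitted, not admitted).
admitted_p1, rejected_p1 = 33, 28
admitted_other, rejected_other = 93, 243

# Odds = admitted / not admitted.
odds_p1 = admitted_p1 / rejected_p1
odds_other = admitted_other / rejected_other

# Odds ratio = odds for prestige 1 divided by odds for everyone else.
odds_ratio = odds_p1 / odds_other

# Probability = admitted / (admitted + not admitted).
prob_p1 = admitted_p1 / (admitted_p1 + rejected_p1)
prob_other = admitted_other / (admitted_other + rejected_other)

print(f"odds (prestige 1): {odds_p1:.2f}")
print(f"odds (other):      {odds_other:.2f}")
print(f"odds ratio:        {odds_ratio:.2f}")
print(f"P(admit | prestige 1) = {prob_p1:.2f}, P(admit | other) = {prob_other:.2f}")
```

Note that the odds ratio (about 3.08) is larger than the ratio of the two probabilities (about 1.95); the two measures are not interchangeable.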
3.5 Print the cross tab for prestige_4
3.6 Calculate the Odds Ratio
odds: 12:55 (about 0.22)
3.7 Write this finding in a sentence
Answer:
We see that if you attended a prestige 4 school, your odds of getting admitted (12:55) are even worse. Prestige 4 applicants have about an 18% chance of admission, versus a 54% chance for applicants who attended a prestige 1 school.
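One way to turn the 12:55 odds into an odds ratio is to compare prestige 4 against all other applicants. The sketch below assumes the Part 3 counts (33:28 and 93:243) together cover every applicant, and derives the "not prestige 4" group by subtraction.

```python
# Totals implied by the Part 3 counts (assumption: they cover all applicants).
admitted_total = 33 + 93
rejected_total = 28 + 243

# Prestige 4 counts quoted above.
admitted_p4, rejected_p4 = 12, 55

# Everyone who did not attend a prestige 4 school.
admitted_rest = admitted_total - admitted_p4
rejected_rest = rejected_total - rejected_p4

odds_p4 = admitted_p4 / rejected_p4
odds_rest = admitted_rest / rejected_rest
odds_ratio_p4 = odds_p4 / odds_rest

prob_p4 = admitted_p4 / (admitted_p4 + rejected_p4)

print(f"odds (prestige 4): {odds_p4:.2f}")
print(f"odds (rest):       {odds_rest:.2f}")
print(f"odds ratio:        {odds_ratio_p4:.2f}")
print(f"P(admit | prestige 4) = {prob_p4:.2f}")
```

An odds ratio below 1 means the odds of admission for prestige 4 applicants are lower than for the comparison group.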
Part 4. Analysis
4.1 Create the X and Y variables
4.2 Fit the model -
Load sklearn's logistic regression
Create the regression object
Fit the model
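A minimal sketch of those three steps, run on synthetic data rather than the admissions file. The single `gpa` feature, the sample size, and the data-generating slope are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: one feature (GPA) and a binary outcome.
rng = np.random.default_rng(0)
n = 200
gpa = rng.uniform(2.0, 4.0, size=n)
true_prob = 1.0 / (1.0 + np.exp(-2.0 * (gpa - 3.0)))
admit = rng.binomial(1, true_prob)

X = gpa.reshape(-1, 1)   # sklearn expects a 2-D feature matrix
y = admit

# Create the regression object and fit it.
# Note: sklearn applies L2 regularization by default (C=1.0).
model = LogisticRegression()
model.fit(X, y)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("training accuracy:", model.score(X, y))
```

Because of the default regularization, the coefficients may differ somewhat from an unpenalized fit of the same data.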
4.3 Print the coefficients
If you predict 0 for every observation, you would be right 68% of the time.
If you predict 1 for every observation, you would be right 32% of the time.
That is not a very good model.
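These baseline figures can be checked from the cross-tab counts in Part 3. The sketch assumes those counts (33:28 and 93:243) cover the whole data set.

```python
# Totals implied by the Part 3 cross tab (assumption: all rows are included).
admitted = 33 + 93
rejected = 28 + 243
total = admitted + rejected

# Accuracy of a constant prediction is the share of rows in that class.
acc_all_zero = rejected / total
acc_all_one = admitted / total

# Any fitted model should be judged against the better of the two.
baseline = max(acc_all_zero, acc_all_one)

print(f"always predict 0: {acc_all_zero:.2f}")
print(f"always predict 1: {acc_all_one:.2f}")
```

A model whose accuracy is close to the baseline is adding little beyond the class frequencies.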
4.4 Calculate the odds ratios of the coefficients
hint 1: np.exp(X)
odds = probability / (1 - probability) i.e. one specific outcome/the rest of the other outcomes
probability = odds / (1 + odds) i.e. one specific outcome/all outcomes
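Logistic regression models the log-odds as a linear function of the predictors, which keeps the predicted probability between 0 and 1.
Applying np.exp to a coefficient converts it from the log-odds scale back to an odds ratio.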
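The relationships above can be sketched with a few helper functions. The function names are illustrative and not part of the notebook.

```python
import math

def prob_to_odds(p):
    """odds = probability / (1 - probability)"""
    return p / (1 - p)

def odds_to_prob(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)

def logit(p):
    """Log-odds: the scale on which logistic regression is linear."""
    return math.log(prob_to_odds(p))

# Prestige 1 example from Part 3: 33 admitted out of 61.
p = 33 / 61
odds = prob_to_odds(p)          # equals 33 / 28
back = odds_to_prob(odds)       # recovers p

# exp() undoes the log, turning log-odds back into odds.
recovered_odds = math.exp(logit(p))

print(f"p = {p:.3f}, odds = {odds:.3f}, back to p = {back:.3f}")
```

The same exp step applied to a fitted coefficient gives the multiplicative change in odds per one-unit change in that predictor.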
4.5 Interpret the OR of Prestige_2
Answer:
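The odds ratio for prestige_2 is measured against prestige 1, because prestige 1 is the base (dropped) dummy variable.
With an odds ratio of about 0.54, the odds of admission for people who went to a prestige 2 school are about 0.54 times the odds for prestige 1 students, i.e. roughly 46% lower, holding GRE and GPA fixed.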
4.6 Interpret the OR of GPA
Answer:
For a one-unit increase in GPA, the odds of admission are multiplied by 1.26149128 (about 26% higher), holding the other variables constant.
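To see what multiplying the odds means for a probability, here is a small worked example. The 30% starting probability is a hypothetical value chosen for illustration; only the odds ratio comes from the notebook.

```python
# Odds ratio for GPA reported above.
or_gpa = 1.26149128

# Hypothetical applicant with a 30% chance of admission (assumed value).
start_prob = 0.30
start_odds = start_prob / (1 - start_prob)

# A one-unit GPA increase multiplies the odds, not the probability.
new_odds = start_odds * or_gpa
new_prob = new_odds / (1 + new_odds)

print(f"odds: {start_odds:.3f} -> {new_odds:.3f}")
print(f"probability: {start_prob:.3f} -> {new_prob:.3f}")
print(f"probability ratio: {new_prob / start_prob:.3f}")
```

The probability rises by a factor smaller than 1.26, which is why an odds ratio should be read as a change in odds rather than as "times as likely".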
Bonus
Plot the probability of being admitted into graduate school, stratified by GPA and GRE score.