## Lesson 06 Demo: Solution Code
# Checks for Linear Regression
Linear regression works best when:
- The data (strictly, the model's residuals) are roughly normally distributed (helpful, but not required)
- The X's are independent of each other (low multicollinearity)
- The X's significantly explain y (have low p-values)
### Check 1: Distribution
Last time we plotted our data like this.
Seaborn plotting library: https://stanford.edu/~mwaskom/software/seaborn/index.html
Today we use `lmplot`: https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.lmplot.html
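A minimal sketch of the `lmplot` step, assuming the data has columns named `TV`, `Radio`, `Newspaper` and `Sales`. Because the lesson's dataset is not included here, the sketch generates a synthetic stand-in with NumPy; its coefficients are made up for illustration and are not results from the lesson.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the advertising data (assumed column names, made-up coefficients)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

# One lmplot per predictor: a scatter of Sales against it plus a fitted line and 95% CI band
grids = {}
for col in ["TV", "Radio", "Newspaper"]:
    grids[col] = sns.lmplot(x=col, y="Sales", data=df)
```

`lmplot` draws a simple regression line on each scatter plot, so points that curve away from the line or fan out as x grows are worth a closer look before trusting a linear model.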
### Check 2: Low Multicollinearity
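One common way to check for multicollinearity is to look at the pairwise correlations between the predictors. The sketch below uses synthetic, independently generated columns (assumed names `TV`, `Radio`, `Newspaper`), and the 0.8 cutoff is a hypothetical rule of thumb, not a value from the lesson.

```python
import numpy as np
import pandas as pd

# Synthetic predictors, generated independently of one another (assumed column names)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})

predictors = ["TV", "Radio", "Newspaper"]
corr = df[predictors].corr()  # Pearson correlation matrix
print(corr.round(2))

# Flag predictor pairs whose absolute correlation exceeds a rule-of-thumb cutoff
threshold = 0.8  # hypothetical cutoff; pick one appropriate to your data
high = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(predictors)
    for b in predictors[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print("Highly correlated pairs:", high)  # expected to be empty for independent columns
```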
Student question: Do these variables have collinearity?
Answer:
### Check 3: X’s significantly explain y (have low p-values)
Let's take another look at the crude model.
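The crude model's code cell is not shown here. Assuming it is a single-predictor regression of `Sales` on `TV`, a sketch with the statsmodels formula API looks like this (synthetic data with made-up coefficients, assumed column names).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the advertising data (made-up coefficients)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

# Crude model (assumed): Sales explained by TV alone
crude = smf.ols("Sales ~ TV", data=df).fit()
print(crude.params)      # intercept and TV slope
print(crude.pvalues)     # p-value for each coefficient
print(crude.conf_int())  # 95% confidence intervals by default
```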
#### Student Model
Now fit a full model with TV, Radio, and Newspaper as predictors.
The formula syntax is documented here: http://statsmodels.sourceforge.net/devel/example_formulas.html
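A sketch of the full model using the statsmodels formula syntax linked above, under the same assumptions as before (synthetic data, made-up coefficients, assumed column names). The `summary()` table reports each coefficient with its p-value and 95% confidence interval.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the advertising data (Newspaper has no true effect here)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

# Full model: all three media buys as predictors
full = smf.ols("Sales ~ TV + Radio + Newspaper", data=df).fit()
print(full.summary())
```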
#### 1. Which of the media buys were significantly associated with sales?
Answer: TV (95% CI 0.043 to 0.049) and Radio (95% CI 0.172 to 0.206). Newspaper's confidence interval crosses 0, so it is not statistically significant here.
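The answer above reads significance off the 95% confidence intervals: a coefficient is significant at the 5% level when its interval excludes 0. A sketch of that check on synthetic data (made-up coefficients, assumed column names; Newspaper's true effect is set to 0, so its interval will usually, though not always, contain 0):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the advertising data
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

full = smf.ols("Sales ~ TV + Radio + Newspaper", data=df).fit()

# 95% confidence intervals for the slopes (intercept dropped)
ci = full.conf_int(alpha=0.05)
ci.columns = ["lower", "upper"]
ci = ci.drop(index="Intercept")

# An interval that lies entirely above or below 0 excludes 0
excludes_zero = (ci["lower"] > 0) | (ci["upper"] < 0)
significant = list(ci.index[excludes_zero])
print(ci.round(3))
print("Significant at the 5% level:", significant)
```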
#### 2. Controlling for all the other media buys, which media type had the largest association with sales?
Answer: Radio. Its coefficient (95% CI 0.172 to 0.206) is several times larger than TV's (0.043 to 0.049).
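Question 2 can also be answered by comparing the full model's fitted coefficients directly. This comparison is only meaningful when the predictors share the same units (here, the synthetic budgets are assumed to be on a common scale); otherwise, standardize the predictors first. Data and column names are synthetic assumptions, as before.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the advertising data (made-up coefficients)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

full = smf.ols("Sales ~ TV + Radio + Newspaper", data=df).fit()

# Each slope is the change in Sales per unit of spend, holding the other media fixed
coefs = full.params.drop("Intercept")
largest = coefs.abs().idxmax()
print(coefs.round(3))
print("Largest association:", largest)
```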
#### 3. Given that one of the variables above was not significant, do we drop it from our model? Why or why not?
Answer: We don't drop it simply because it is not significant. Instead, we can compare this model with other models to determine whether keeping the variable improves it, using metrics we will learn in the next class.
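As a sketch of such a comparison, the code below fits the full model and a reduced model without `Newspaper`, then reports adjusted R², AIC, and a nested-model F-test. These are common choices but are assumptions here; the metrics introduced in the next class may differ. The data are synthetic with made-up coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the advertising data (Newspaper has no true effect here)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, n)

full = smf.ols("Sales ~ TV + Radio + Newspaper", data=df).fit()
reduced = smf.ols("Sales ~ TV + Radio", data=df).fit()

# Higher adjusted R^2 and lower AIC both favor a model
for name, fit in [("full", full), ("reduced", reduced)]:
    print(f"{name}: adj R^2 = {fit.rsquared_adj:.4f}, AIC = {fit.aic:.1f}")

# F-test: does adding Newspaper significantly improve on the reduced model?
f_value, p_value, df_diff = full.compare_f_test(reduced)
print(f"F = {f_value:.3f}, p = {p_value:.3f}, extra parameters = {df_diff:.0f}")
```

A large F-test p-value would suggest that `Newspaper` adds little beyond TV and Radio in this synthetic example.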