Path: blob/master/april_18/lessons/lesson-06-alt/code/starter-code/demo-lesson-06-starter-code.ipynb
Kernel: Python 2
## Lesson 06 Demo
In [9]:
In [2]:
Out[2]:
# Checks for Linear Regression
Linear regression works best when:
- The data is normally distributed (though it doesn't have to be)
- The X's are independent of each other (low multicollinearity)
- The X's significantly explain y (have low p-values)
Check 1. Distribution
Last time, we plotted our data like this:
In [3]:
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x10a293a90>
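The plot from last time can be reproduced with pandas' built-in plotting. A minimal sketch, using synthetic data in place of the lesson's advertising CSV (the column names `TV` and `Sales` are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch

import numpy as np
import pandas as pd

# Synthetic stand-in for the lesson's advertising data; the real
# notebook loads a CSV (column names assumed).
rng = np.random.RandomState(0)
df = pd.DataFrame({"TV": rng.uniform(0, 300, 200)})
df["Sales"] = 7 + 0.05 * df["TV"] + rng.normal(0, 2, 200)

# A plain pandas scatter plot; it returns a matplotlib AxesSubplot,
# which is the repr shown in the cell output above.
ax = df.plot(kind="scatter", x="TV", y="Sales")
```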
Seaborn plotting library
https://stanford.edu/~mwaskom/software/seaborn/index.html
Today we use lmplot https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.lmplot.html
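A minimal `lmplot` sketch, again with synthetic data standing in for the lesson's CSV (column names are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the advertising data (column names assumed).
rng = np.random.RandomState(0)
tv = rng.uniform(0, 300, 200)
df = pd.DataFrame({"TV": tv, "Sales": 7 + 0.05 * tv + rng.normal(0, 2, 200)})

# lmplot scatter-plots TV against Sales and overlays the fitted
# regression line with its confidence band; it returns a FacetGrid.
grid = sns.lmplot(x="TV", y="Sales", data=df)
```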
In [7]:
Out[7]:
<seaborn.axisgrid.FacetGrid at 0x10aa7a190>
In [6]:
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x10aac25d0>
Check 2. Low Multicollinearity
In [18]:
Out[18]:
TV Radio Newspaper
TV 1.000000 0.054809 0.056648
Radio 0.054809 1.000000 0.354104
Newspaper 0.056648 0.354104 1.000000
Axes(0.125,0.125;0.62x0.775)
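The correlation matrix above comes from pandas' `corr()`. A sketch with synthetic predictors (column names taken from the table above; the data itself is made up):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the three media-buy columns (names taken
# from the correlation table above).
rng = np.random.RandomState(1)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})

# Pairwise Pearson correlations among the X's; off-diagonal values
# near 0 indicate low multicollinearity.
corr = df.corr()
print(corr)
```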
Student question:
Do these variables have colinearity?
Answer:
Check 3. X's significantly explain y (have low p-values)
Let's take another look at the crude model.
In [11]:
Out[11]:
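The crude (single-predictor) model can be fit with statsmodels' formula API. A sketch assuming the usual `Sales ~ TV` setup, with synthetic data in place of the lesson's CSV:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the advertising data (columns assumed).
rng = np.random.RandomState(2)
tv = rng.uniform(0, 300, 200)
df = pd.DataFrame({"TV": tv, "Sales": 7 + 0.05 * tv + rng.normal(0, 2, 200)})

# Crude model: Sales explained by TV alone; the summary reports the
# coefficient, its p-value, and R-squared.
crude = smf.ols("Sales ~ TV", data=df).fit()
print(crude.summary())
```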
Student Model
Now fit a full model with TV, Radio, and Newspaper.
The formula syntax can be found here: http://statsmodels.sourceforge.net/devel/example_formulas.html
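A sketch of the full model using the patsy formula syntax from the link above. The data is synthetic and the coefficients are made up for illustration; only the column names are taken from the lesson:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic advertising data: Sales depends on TV and Radio but not
# Newspaper (coefficients invented for this sketch).
rng = np.random.RandomState(3)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["Radio"] + rng.normal(0, 1, n)

# Full model: all three media buys as predictors; compare each
# predictor's p-value in the summary to answer the questions below.
full = smf.ols("Sales ~ TV + Radio + Newspaper", data=df).fit()
print(full.summary())
```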
In [1]:
1. Which of the media buys were significantly associated with sales?
Answer:
2. Controlling for all the other media buys, which media type had the largest association with sales?
Answer:
#### 3. Given that one of the variables above was not significant, do we drop it from our model? Why or why not?
Answer: