YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/april_18/lessons/lesson-06-alt/code/starter-code/demo-lesson-06-starter-code.ipynb
Kernel: Python 2

## Lesson 06 Demo

```python
%matplotlib inline
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")

# this is the standard import if you're using "formula notation" (similar to R)
import statsmodels.formula.api as smf
```
```python
# read data into a DataFrame
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)
data.head()
```

# Checks for Linear Regression

Linear regression works best when:

  1. The residuals (the errors around the fitted line) are roughly normally distributed (helpful, but not strictly required)

  2. X’s are independent of each other (low multicollinearity)

  3. X’s significantly explain y (have low p-values)
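The full model that these checks refer to can be written as follows for the Advertising data. The normality condition in item 1 strictly concerns the error term ε, not the raw columns. Check 3 corresponds to testing H0: β_j = 0 for each predictor, and the p-value in the model summary is the result of that test.

```latex
\text{Sales}_i = \beta_0 + \beta_1\,\text{TV}_i + \beta_2\,\text{Radio}_i + \beta_3\,\text{Newspaper}_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```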

Check 1. Distribution

Last time, we plotted our data like this:

```python
# visualize the relationship between the features and the response using scatterplots
# (figsize is set once on the shared figure rather than on the first subplot)
fig, axs = plt.subplots(1, 3, sharey=True, figsize=(16, 8))
data.plot(kind='scatter', x='TV', y='Sales', ax=axs[0])
data.plot(kind='scatter', x='Radio', y='Sales', ax=axs[1])
data.plot(kind='scatter', x='Newspaper', y='Sales', ax=axs[2])
```
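```python
# scatter plot plus fitted regression line; keyword arguments are required in recent seaborn
sns.lmplot(x='TV', y='Sales', data=data)
```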
```python
sns.lmplot(x='Radio', y='Sales', data=data)
sns.lmplot(x='Newspaper', y='Sales', data=data)
```
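The scatter plots show each relationship with Sales, but they do not summarize the shape of each variable's own distribution. Below is a minimal sketch of one numeric shape check, pandas' `DataFrame.skew()`. It runs on two hypothetical synthetic columns (not the Advertising data), so the behavior is easy to see. The |skew| > 1 cutoff is a common rule of thumb, not a fixed standard.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: NOT the Advertising dataset, just two
# synthetic columns with known shapes.
rng = np.random.RandomState(0)
demo = pd.DataFrame({
    'symmetric': rng.normal(loc=50, scale=10, size=1000),
    'right_skewed': rng.exponential(scale=10, size=1000),
})

# Sample skewness per column: near 0 for a symmetric distribution,
# clearly positive for a long right tail.
skew = demo.skew()
print(skew)

# Rule-of-thumb flag (an assumption, not a hard cutoff): |skew| > 1
# suggests a log or square-root transform may be worth trying.
flagged = skew[skew.abs() > 1].index.tolist()
print(flagged)
```

In the notebook itself, `data.skew()` returns the same statistic for TV, Radio, Newspaper and Sales, and `data.hist()` draws the matching histograms.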

Check 2. Low Multicollinearity

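```python
# diverging colormap: one hue for negative correlations, another for positive
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# pairwise Pearson correlations between the three predictors
correlations = data[['TV', 'Radio', 'Newspaper']].corr()
print(correlations)

# anchor the color scale at the full correlation range [-1, 1]
sns.heatmap(correlations, cmap=cmap, vmin=-1, vmax=1)
```

|           | TV       | Radio    | Newspaper |
|-----------|----------|----------|-----------|
| TV        | 1.000000 | 0.054809 | 0.056648  |
| Radio     | 0.054809 | 1.000000 | 0.354104  |
| Newspaper | 0.056648 | 0.354104 | 1.000000  |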
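The variance inflation factor (VIF) gives a numeric complement to the heatmap. It measures how much each coefficient's variance is inflated by that predictor's correlation with the others. For standardized predictors, VIF_j equals the j-th diagonal entry of the inverse correlation matrix, which is the same as 1 / (1 − R_j²). The minimal sketch below assumes only NumPy and pandas and rebuilds the matrix from the correlation values printed above. A VIF of 1 means no inflation. The cutoffs of 5 or 10 that are often quoted are rules of thumb.

```python
import numpy as np
import pandas as pd

# Pairwise correlations copied from the printed correlation output above.
cols = ['TV', 'Radio', 'Newspaper']
corr = pd.DataFrame([[1.000000, 0.054809, 0.056648],
                     [0.054809, 1.000000, 0.354104],
                     [0.056648, 0.354104, 1.000000]],
                    index=cols, columns=cols)

# VIF_j = j-th diagonal entry of the inverse correlation matrix,
# equivalently 1 / (1 - R_j^2) from regressing predictor j on the others.
vif = pd.Series(np.diag(np.linalg.inv(corr.values)), index=cols)
print(vif.round(3))
```

With the notebook's own variable, the equivalent line is `np.diag(np.linalg.inv(correlations.values))`. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for computing VIFs from a design matrix.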

Student question:

  1. Do these variables have collinearity?

Answer:

Check 3. X’s significantly explain y (have low p-values)

Let's take another look at the crude model, which uses TV as the only predictor:

```python
lm = smf.ols(formula='Sales ~ TV', data=data).fit()
# print the full summary
lm.summary()
```
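With a single predictor, the least-squares fit has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) − slope · mean(x). These are the numbers that `lm.summary()` reports in its `coef` column (they are also available as `lm.params`). The minimal NumPy sketch below uses hypothetical synthetic data, with a true intercept of 3.0 and slope of 0.5 chosen arbitrarily and unrelated to the Advertising data. It cross-checks the result against `np.polyfit`.

```python
import numpy as np

# Hypothetical synthetic data: true intercept 3.0, true slope 0.5.
rng = np.random.RandomState(42)
x = rng.uniform(0, 100, size=200)
y = 3.0 + 0.5 * x + rng.normal(scale=5.0, size=200)

# Closed-form OLS estimates for one predictor.
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# R-squared: share of the variance in y explained by the fitted line.
fitted = intercept + slope * x
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# np.polyfit with degree 1 solves the same least-squares problem.
check_slope, check_intercept = np.polyfit(x, y, 1)

print('intercept={:.3f} slope={:.3f} R^2={:.3f}'.format(intercept, slope, r_squared))
```

The same R² appears as `R-squared` at the top of the statsmodels summary (and as `lm.rsquared`).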

Student Model

Now fit a full model with TV, Radio, and Newspaper as predictors.

The formula syntax is documented here: http://statsmodels.sourceforge.net/devel/example_formulas.html
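In a model with several predictors, each coefficient is estimated while the other predictors are held fixed. That is what "controlling for" means in the questions below. The minimal NumPy sketch that follows shows where the `coef`, `std err`, `t` and `P>|t|` columns of an OLS summary come from. It uses hypothetical synthetic data with three made-up predictors `x1`, `x2`, `x3`, and only `x1` and `x2` have a real effect. One assumption to flag: the p-values here use a normal approximation. statsmodels uses the t distribution with n − k degrees of freedom instead, so its values differ slightly for small samples.

```python
import math
import numpy as np

# Hypothetical synthetic data: three predictors, where x3 has NO real effect.
rng = np.random.RandomState(1)
n = 200
X_raw = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * X_raw[:, 0] + 0.8 * X_raw[:, 1] + rng.normal(scale=1.0, size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), X_raw])
names = ['Intercept', 'x1', 'x2', 'x3']

# OLS coefficients via least squares (numerically safer than inverting X'X).
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Residual variance with n - k degrees of freedom (k = 4 parameters here).
resid = y - X.dot(beta)
dof = n - X.shape[1]
sigma2 = resid.dot(resid) / dof

# Standard errors: square roots of the diagonal of sigma^2 (X'X)^-1.
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T.dot(X))))
t_stats = beta / se

# Two-sided p-values, P(|Z| > |t|), using a normal approximation.
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

for name, b, s, t, p in zip(names, beta, se, t_stats, p_values):
    print('{:<10s} coef={:8.4f} se={:.4f} t={:8.2f} p={:.4g}'.format(name, b, s, t, p))
```

Because `x3` has no true effect, its coefficient should be close to 0 with a large standard error relative to its size. This is the kind of predictor that question 3 below asks about.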

```python
# fit model

# print summary
```

1. Which of the media buys were significantly associated with sales?

Answer:

2. Controlling for all the other media buys, which media type had the largest association with sales?

Answer:

3. Given that one of the variables above was not significant, do we drop it from our model? Why or why not?

Answer: