YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_05/code/Linear Regression with Simulated Data - Solution code (done).ipynb
Kernel: Python 3

Linear Regression with Simulated Data

# Basic imports
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf

100 Random Samples from a Standard Normal Distribution (np.random.randn)

X = np.random.randn(100)
X
array([-0.86263111, 2.12205253, -0.19119369, 0.8914565 , -0.06863318, -1.85375508, -0.68943993, 1.20195946, -0.70193827, 1.36462236, -2.9584875 , 0.11704817, -1.9234075 , 1.49977632, -0.30407317, -0.39180849, 0.70300464, 0.86246528, -0.42644705, 0.17636177, -1.7870432 , -0.00659526, -1.22431819, 0.43125762, 0.61653442, 0.54664429, -0.35557064, -0.65142241, 0.89569784, -1.00710185, 0.81701861, -0.12796308, 0.62462281, 0.15555161, -1.33867444, 0.40105962, -0.85043252, 1.05851956, -0.65293783, -0.17681899, 0.27571891, 0.62521257, 0.360203 , -2.48734398, 1.43893884, -0.7958003 , -0.23526785, 1.05522751, 0.20748649, 0.41079423, -1.49319904, 1.54552729, -0.29076506, 0.68255727, -1.78297863, 0.95463271, -0.19981706, 1.22516288, -1.07704283, 0.82060848, -1.29910111, -0.06732896, -1.06124267, -0.23629949, 2.09987858, -1.26369429, 0.03237696, -0.98521614, 0.40883493, -0.91302624, 0.67147181, -0.5961868 , 1.17592021, 0.52573888, -1.29840116, 1.20059359, -1.28642861, 1.04867606, 0.37315101, 0.91649244, -1.79039163, -0.48341874, -1.31889581, 0.09449798, 0.15883817, 0.82490546, -0.72970416, 0.69012513, -0.59388906, -0.01855135, -1.05276867, 0.3757267 , 0.17818673, 0.72823495, 1.83242879, -1.88055631, 1.03954104, 0.89407288, 0.22923608, -0.15309013])
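np.random.randn draws a fresh sample from the standard normal distribution (mean 0, standard deviation 1) every time the cell runs, so the array above will not be reproduced exactly on a rerun. A minimal sketch, assuming you want reproducible draws: it uses NumPy's Generator API with an arbitrary seed of 42 (not part of the original notebook) and checks that the sample moments sit near 0 and 1.

```python
import numpy as np

# Seeded Generator so every run draws the same 100 values.
# The seed 42 is an arbitrary choice, not part of the original notebook.
rng = np.random.default_rng(42)
X = rng.standard_normal(100)

# For a standard normal sample the mean should be near 0 and the standard
# deviation near 1; with n = 100 the sample mean has standard error 1/sqrt(100) = 0.1.
print(X.shape)  # (100,)
print(round(X.mean(), 3), round(X.std(ddof=1), 3))
```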

Define the True Betas and Generate Y with Random Noise

beta_0 = 2
beta_1 = 3
# Y is a linear function of X plus standard normal noise
Y = beta_0 + beta_1*X + np.random.randn(100)
Y
array([-0.09811801, 8.02864696, 3.24355683, 1.75900294, 2.54227678, -3.47200938, 0.49449328, 6.67594284, 0.0339692 , 5.85778502, -7.52520879, 1.38200135, -3.64759611, 7.63016788, 0.68603627, 1.36268951, 4.69130299, 4.75749854, -0.15151476, 3.75743203, -5.01325723, 1.12629783, -1.186844 , 3.93319805, 4.34085286, 3.05672638, 1.21419111, 0.27384927, 5.2467575 , -2.09259539, 3.41953313, 3.16552732, 4.74774104, 1.45897499, -2.61878716, 2.39699893, -0.85819949, 3.67022824, 0.14642679, 0.58831642, 1.51824758, 3.1234011 , 3.40939318, -4.07842134, 4.85469464, -0.74881077, -0.30103399, 4.88824007, 4.95694325, 4.87147428, -1.92303363, 6.36813992, 1.02471249, 4.50099746, -3.57528305, 4.40479345, 0.5772572 , 4.95837754, -0.48444755, 4.79863027, -1.97436127, 0.41789391, -2.64062232, 1.06697885, 8.96919591, -1.24571585, 4.39552265, -0.77589849, 4.08719976, 0.40992796, 5.40443798, 1.7045144 , 4.59977162, 3.25447816, -2.61355147, 8.08419741, -2.10114368, 5.75478627, 4.57216778, 4.6388713 , -2.99941236, 0.94046493, -1.81890935, 1.59313132, 1.21986292, 5.44534611, 1.05335039, 3.00261338, 0.78057142, 1.9315195 , -0.23510765, 2.74030914, 3.85861009, 4.85470511, 8.03524818, -1.29306121, 6.31443079, 4.73268197, 2.21379844, 0.79977881])
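The cell above simulates the linear model Y = β0 + β1·X + ε with β0 = 2, β1 = 3 and ε drawn from a standard normal distribution. The sketch below wraps that data-generating process in a function; the name simulate_linear_data and its noise_sd and seed parameters are hypothetical, introduced only for illustration. Because the true betas are known, subtracting 2 + 3X from Y recovers the simulated noise.

```python
import numpy as np

def simulate_linear_data(n, beta_0, beta_1, noise_sd=1.0, seed=None):
    """Hypothetical helper: draw X ~ N(0, 1) and Y = beta_0 + beta_1*X + eps,
    with eps ~ N(0, noise_sd**2)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n)
    eps = noise_sd * rng.standard_normal(n)
    Y = beta_0 + beta_1 * X + eps
    return X, Y, eps

X, Y, eps = simulate_linear_data(100, beta_0=2, beta_1=3, seed=0)

# Removing the known deterministic part leaves only the simulated noise.
recovered = Y - (2 + 3 * X)
print(np.allclose(recovered, eps))  # True
```

Setting noise_sd to a larger value makes the points scatter more widely around the true line, which is a simple way to see how noise affects the fitted estimates later on.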

Put X and Y into a DataFrame for Easy Processing

df = pd.DataFrame({'X': X, 'Y': Y})
df.head()
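Since X has variance 1 and the noise has variance σ² = 1, Var(Y) = β1²·1 + σ² = 10, so the population correlation between X and Y is β1/√(β1² + σ²) = 3/√10 ≈ 0.949. A quick check on a DataFrame built the same way, assuming a fixed seed (1, chosen arbitrarily) so the run is repeatable; Series.corr computes the Pearson correlation by default.

```python
import numpy as np
import pandas as pd

# Rebuild the simulated data with a fixed seed (seed 1 is an arbitrary assumption).
rng = np.random.default_rng(1)
X = rng.standard_normal(100)
Y = 2 + 3 * X + rng.standard_normal(100)
df = pd.DataFrame({"X": X, "Y": Y})

# Theoretical correlation: beta_1 / sqrt(beta_1**2 + sigma**2) with sigma = 1.
theoretical_r = 3 / np.sqrt(3**2 + 1**2)
sample_r = df["X"].corr(df["Y"])

print(df.shape)                 # (100, 2)
print(round(theoretical_r, 3))  # 0.949
print(round(sample_r, 3))
print(df.describe().loc[["mean", "std"]])
```

The sample correlation should land close to 0.949, and the describe() output should show Y with a mean near 2 and a standard deviation near √10 ≈ 3.16.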

Fit the Model and Look at the Results

model = smf.ols(formula='Y~X', data=df)
results = model.fit()
results.summary()
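smf.ols estimates the betas by ordinary least squares. With a single predictor the estimates have a closed form, β̂1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β̂0 = ȳ − β̂1·x̄. The sketch below computes them with NumPy only, cross-checks them against np.linalg.lstsq on the design matrix [1, X], and reproduces two quantities that results.summary() reports: R² and the slope's standard error. The seed is an assumption; run on the notebook's own df, the same formulas should agree with results.params up to floating-point error.

```python
import numpy as np

# Simulated data as in the notebook, but seeded (seed 2 is an assumption).
rng = np.random.default_rng(2)
n = 100
X = rng.standard_normal(n)
Y = 2 + 3 * X + rng.standard_normal(n)

# Closed-form OLS estimates for one predictor.
x_bar, y_bar = X.mean(), Y.mean()
beta_1_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
beta_0_hat = y_bar - beta_1_hat * x_bar

# Same fit via least squares on the design matrix [1, X]
# (the formula 'Y~X' makes smf.ols add the intercept column automatically).
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)

# R-squared and the slope's standard error, as in results.summary().
residuals = Y - (beta_0_hat + beta_1_hat * X)
ssr = np.sum(residuals ** 2)
sst = np.sum((Y - y_bar) ** 2)
r_squared = 1 - ssr / sst
se_beta_1 = np.sqrt(ssr / (n - 2) / np.sum((X - x_bar) ** 2))

print(beta_0_hat, beta_1_hat)
print(coef)
print(r_squared, se_beta_1)
```

With n = 100 and unit noise, the slope's standard error should come out around 0.1, so estimates within a few tenths of the true values 2 and 3 are expected.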

Plot with Seaborn

sns.lmplot(x="X", y="Y", data=df)
(Figure: seaborn lmplot of Y against X with the fitted regression line)
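sns.lmplot fits an ordinary least squares line and, by default, shades a 95% confidence band estimated by bootstrap. To compare the fit with the line that actually generated the data, a plain matplotlib sketch can overlay both. The seed is an assumption, and the Agg backend is used only so the example runs without a display; in the notebook, drop the matplotlib.use call and rely on %matplotlib inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)  # seed is an arbitrary assumption
X = rng.standard_normal(100)
Y = 2 + 3 * X + rng.standard_normal(100)

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(X, Y, deg=1)

grid = np.linspace(X.min(), X.max(), 50)
fig, ax = plt.subplots()
ax.scatter(X, Y, alpha=0.6, label="simulated data")
ax.plot(grid, intercept + slope * grid, label="fitted OLS line")
ax.plot(grid, 2 + 3 * grid, linestyle="--", label="true line: 2 + 3x")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend()
```

The two lines should nearly coincide; the small gap between them is the estimation error from having only 100 noisy points.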