GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_05/code/Linear Regression with Simulated Data - (done).ipynb
Kernel: Python 3

Linear Regression with Simulated Data

## Basic imports
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf

Generate a Sample of Size 100 from a Standard Normal Distribution (np.random.randn)

X = np.random.randn(100)  # the trailing "n" in randn stands for normal
X
array([ 0.06493375, 0.7984497 , 0.51634406, -0.17006532, 0.67659552, -0.12039165, 1.05648162, -0.04658388, 0.12489761, 1.10119439, -1.79839814, 0.10490968, 1.57167251, -0.53698783, -1.16232652, 0.84850172, 1.30612834, 0.59375966, -0.52700669, 0.11570846, -0.47324893, -0.48848682, 1.43785741, 0.18378832, 0.66465429, 0.2448326 , -1.44672557, 0.34443231, -0.3575056 , 0.37252765, -0.3071095 , -0.09579139, 1.02151853, 0.48536062, -2.32451609, -0.15650236, -1.05341782, -0.69982228, -1.10457518, -0.91441854, 0.49910147, -0.29526662, -1.52812187, -0.08339499, 0.19289274, 0.39515657, 1.85071214, 1.3542823 , -0.05058901, -0.92227413, 1.49603742, -0.32747489, -1.19727926, -0.23266561, -1.15589619, -0.77248483, -0.49974948, 0.13588953, -0.89503119, 1.72353011, 0.68029549, 0.84034493, 0.10852891, 0.22685788, 0.38786926, 0.71066973, -0.12920315, 0.04409544, -0.60161435, -0.91400097, -0.19600274, 0.34592439, -0.15931562, 0.28707037, 0.64155796, -0.34302668, -0.54639181, -2.85667605, 2.24233233, -0.48903869, 0.13769365, -0.60268568, 0.40513136, 0.59576328, 0.54502947, -0.20575235, 0.61441866, 0.77973282, -0.09726794, -0.27554759, -0.05035097, -0.58245879, 2.34022181, -1.72706398, 0.92324506, 1.29250059, 0.05048737, 0.0044817 , -0.51776128, -1.02294188])
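
Because np.random.randn draws fresh values on every run, the array above will differ each time the cell is executed. A minimal sketch of a reproducible variant follows. It assumes NumPy's newer Generator API (np.random.default_rng) and an arbitrary seed of 42, neither of which appears in the original notebook.

```python
import numpy as np

# Assumption: the seed 42 is arbitrary; any fixed integer gives repeatable draws.
rng = np.random.default_rng(42)

# standard_normal(100) plays the same role as np.random.randn(100):
# 100 independent draws from a normal distribution with mean 0 and std 1.
X = rng.standard_normal(100)

# With only 100 draws, the sample mean and standard deviation land
# close to, but not exactly on, 0 and 1.
print(X.shape)
print(X.mean(), X.std(ddof=1))
```

Re-running this cell reproduces the same X, which makes the later regression output comparable between runs.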

Define our Betas and Generate Y

beta_0 = 2
beta_1 = 3
Y = beta_0 + beta_1*X + np.random.randn(100)  # linear signal plus standard normal noise
Y
array([ 2.20400099, 3.94138477, 3.10202787, 3.2208578 , 4.11947607, 2.94827028, 6.39627038, 1.62492766, -0.29631979, 4.24975102, -2.04639204, 2.43215487, 8.20655562, 1.23559254, -2.09947287, 4.54849702, 7.25724375, 3.0863531 , 0.65963968, 1.83513297, 0.44139728, 1.00096155, 7.45235381, 1.95085381, 4.84431035, 2.0972651 , -1.61514519, 1.84685114, 2.30036601, 4.04030607, 1.5794858 , 3.83703922, 4.51567908, 2.86940751, -6.73865171, 2.74605363, -0.21813694, 2.03561071, -2.89354033, -0.79323186, 3.95558333, 0.68928368, -3.83102513, 1.55405958, 1.12117538, 3.1964302 , 7.94143169, 4.17339876, 2.73823142, -1.19459812, 5.88300785, 0.01978937, -0.18083988, -0.78939549, -0.19297022, 0.21102042, 0.44735634, 2.53348305, -1.49476516, 7.63960977, 4.77217487, 4.23736443, 3.48669396, 1.61068156, 2.60711488, 2.86067989, 1.42062426, 2.12219028, -0.36938175, -0.35855762, 1.94387742, 3.90748891, 1.05681304, 3.38315959, 3.39337211, 2.30845007, 1.9072533 , -7.09003809, 7.78038171, 2.44615193, 3.76694585, 0.66574515, 2.87776588, 5.03432225, 3.07695276, 1.70177004, 3.4697147 , 3.74098472, 2.01271124, 3.33540658, 2.31107535, 0.29698396, 8.00897885, -2.83864825, 3.87985048, 5.07064712, 2.68973698, 3.42341515, 2.1819507 , -1.20161521])
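
Written out, the cell above simulates the simple linear model below. The symbol $\varepsilon_i$ for the noise term is notation introduced here for illustration; the notebook itself only uses beta_0, beta_1, X and Y.

$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1), \qquad i = 1, \dots, 100,
$$

with $\beta_0 = 2$ and $\beta_1 = 3$. Since $X_i$ is itself standard normal and independent of $\varepsilon_i$, this implies

$$
\mathbb{E}[Y_i] = \beta_0 = 2, \qquad \operatorname{Var}(Y_i) = \beta_1^2 \cdot 1 + 1 = 10 .
$$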

Put X and Y into a DataFrame for Easy Processing

df = pd.DataFrame({'X': X, 'Y': Y})
df.head()

Generate Model and Look at the Results

model = smf.ols(formula='Y~X', data=df)  # predict Y given X
results = model.fit()
results.summary()
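
Since the data were simulated with known coefficients, the fit can be checked against them directly instead of reading the numbers off the summary table. The sketch below is one way to do that. It regenerates the data with a fixed seed (0, an assumption added here for repeatability) and pulls the estimates out of the fitted results object.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: a fixed seed (0) so the printed numbers are repeatable.
rng = np.random.default_rng(0)
X = rng.standard_normal(100)
Y = 2 + 3 * X + rng.standard_normal(100)  # beta_0 = 2, beta_1 = 3
df = pd.DataFrame({"X": X, "Y": Y})

results = smf.ols(formula="Y ~ X", data=df).fit()

# params is a pandas Series indexed by term name; the formula interface
# names the constant term "Intercept".
intercept_hat = results.params["Intercept"]
slope_hat = results.params["X"]

# 95% confidence intervals, one row per term (column 0 = lower, 1 = upper).
ci = results.conf_int(alpha=0.05)

print(f"intercept: {intercept_hat:.3f}  (true value 2)")
print(f"slope:     {slope_hat:.3f}  (true value 3)")
print(ci)
print(f"R-squared: {results.rsquared:.3f}")
```

With 100 points and unit-variance noise, the estimates should fall near 2 and 3, but they will not match them exactly.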

Plot with Seaborn

sns.lmplot(x="X", y="Y", data=df)  # seaborn way to plot a regression
<seaborn.axisgrid.FacetGrid at 0x10b787f98>
[Figure: scatter plot of Y against X with the fitted regression line]
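
By default, lmplot overlays a least-squares line on the scatter plot. As a rough cross-check that does not need seaborn, the same kind of line can be computed with np.polyfit and the residuals inspected numerically. This sketch swaps in np.polyfit and a fixed seed (0); both are assumptions added here and are not part of the original notebook.

```python
import numpy as np

# Assumption: arbitrary fixed seed so the result is repeatable.
rng = np.random.default_rng(0)
X = rng.standard_normal(100)
Y = 2 + 3 * X + rng.standard_normal(100)

# A degree-1 polyfit returns coefficients highest power first: [slope, intercept].
slope_hat, intercept_hat = np.polyfit(X, Y, deg=1)

# Residuals: vertical distances from each point to the fitted line.
residuals = Y - (intercept_hat + slope_hat * X)

print(slope_hat, intercept_hat)
print(residuals.mean())       # essentially zero when an intercept is fitted
print(residuals.std(ddof=2))  # estimate of the noise std, which is 1 here
```

Residuals that average to zero and have a spread close to the simulated noise level suggest that the straight line captures the structure built into the data.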