Path: blob/master/lessons/lesson_15/PCA - (done).ipynb
Kernel: Python 3
Load in some Economic Data
Note that this data has been scaled and normalized so that everything has a mean of 0 and a standard deviation of 1 (z-score).
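As a rough illustration of what "scaled and normalized" means here, the sketch below z-scores a small synthetic table. The column names and values are made up for the example and are not the notebook's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the economic data (hypothetical columns and values).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "unemployment": rng.normal(6.0, 1.5, size=120),
    "inflation": rng.normal(2.5, 0.8, size=120),
})

# z-score: subtract each column's mean, then divide by its standard deviation.
# ddof=0 uses the population standard deviation.
zscored = (raw - raw.mean()) / raw.std(ddof=0)

print(zscored.mean().round(6))       # each column is centered at 0
print(zscored.std(ddof=0).round(6))  # each column has unit spread
```

After this step every column is on the same scale, which matters for PCA because the method otherwise favors whichever variable has the largest raw variance.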
In [1]:
Out[1]:
In [2]:
Out[2]:
(349, 8)
Let's look at it
In [3]:
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0xb316d68>
This is a mess, so let's smooth it out with a 12-month rolling average
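A minimal sketch of a 12-month rolling average with pandas, assuming monthly rows indexed by date. The frame below is synthetic and its column names are placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (hypothetical columns); 48 months starting Jan 1990.
rng = np.random.default_rng(1)
dates = pd.date_range("1990-01-01", periods=48, freq="MS")
monthly = pd.DataFrame(rng.normal(size=(48, 3)), index=dates, columns=["a", "b", "c"])

# Each value becomes the mean of the current month and the 11 before it.
smoothed = monthly.rolling(window=12).mean()

# The first 11 rows have no full 12-month window, so they are NaN; drop them.
smoothed = smoothed.dropna()

print(monthly.shape, smoothed.shape)
```

Dropping the incomplete windows shortens the series by 11 rows, which is worth remembering when lining the smoothed data up against other series.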
In [4]:
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c67f390>
This is better, but you can still see many highly correlated variables, plus two particularly volatile ones that seem to be negatively correlated with each other.
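One way to check that impression numerically is a correlation matrix. The sketch below builds three synthetic series (two moving together, one moving opposite) purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic series: x and y track a shared driver, z moves against it.
rng = np.random.default_rng(2)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "x": base + 0.1 * rng.normal(size=200),
    "y": base + 0.1 * rng.normal(size=200),
    "z": -base + 0.1 * rng.normal(size=200),
})

# Pairwise Pearson correlations; values near +1 or -1 signal redundancy.
corr = demo.corr()
print(corr.round(2))
```

Strong correlations in either direction mean the columns carry overlapping information, which is the situation PCA is designed to compress.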
In [5]:
Out[5]:
In [6]:
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x11171dda0>
PCA can help!
The code below shows that two principal components capture more than 97% of the variance (about 78% from the first and 19% from the second)!
In [7]:
Out[7]:
array([0.77973177, 0.19289192])
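For reference, an output of this shape is what scikit-learn's `explained_variance_ratio_` attribute returns. The sketch below is an assumption about the general approach, run on synthetic data with eight columns driven by two hidden factors, so its numbers will differ from those above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 8 observed series built from 2 hidden factors plus small noise.
rng = np.random.default_rng(3)
factors = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 8))
X = factors @ mixing + 0.05 * rng.normal(size=(300, 8))

# Standardize each column so no single series dominates the decomposition.
X = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # one row per observation, one column per component

print(pca.explained_variance_ratio_)        # share of variance per component
print(pca.explained_variance_ratio_.sum())  # total share captured by both
```

Summing the ratios gives the fraction of total variance retained, which is how a figure like "more than 97%" is obtained from the two values.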
Let's look at these components
In [8]:
Out[8]:
On their own, they are uninterpretable
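One common way to get some handle on a component is to label its loadings with the original column names. This is a sketch on synthetic data with hypothetical column names, not the notebook's own cell.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic data with hypothetical column names; "jobs" is made to track "gdp".
rng = np.random.default_rng(4)
cols = ["gdp", "jobs", "inflation", "rates"]
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=cols)
X["jobs"] = X["gdp"] + 0.1 * X["jobs"]

pca = PCA(n_components=2).fit(X)

# Each row of components_ holds the weights one component puts on each column.
loadings = pd.DataFrame(pca.components_, columns=cols, index=["PC1", "PC2"])
print(loadings.round(2))
```

Columns with large absolute loadings on a component are the ones that component mostly summarizes. The sign of a whole component is arbitrary, so only relative signs within a row are meaningful.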
In [9]:
Let's see how they vary over time
In [10]:
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a200ca1d0>
Now I can use this single feature in a regression without all of that noise clogging my outputs
In [11]:
Out[11]:
In [12]:
Out[12]:
In [13]:
Out[13]:
In [19]:
In [23]:
Out[23]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [24]:
Out[24]:
array([-0.13271296])
In [27]:
Out[27]:
0.7790902187113105
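The outputs above (a fitted `LinearRegression`, a single coefficient, and a score) are consistent with regressing a target on one principal component. A minimal sketch of that pattern follows, assuming synthetic data in which six noisy series share one driver. The variable names are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Synthetic data: six noisy copies of one hidden driver, and a target tied to it.
rng = np.random.default_rng(5)
driver = rng.normal(size=300)
X = np.column_stack([driver + 0.2 * rng.normal(size=300) for _ in range(6)])
y = -0.5 * driver + 0.1 * rng.normal(size=300)

# Collapse the six correlated columns into a single feature.
pc1 = PCA(n_components=1).fit_transform(X)  # shape (300, 1)

# Regress the target on that one feature.
model = LinearRegression().fit(pc1, y)
r2 = model.score(pc1, y)  # coefficient of determination (R^2)

print(model.coef_, r2)
```

Because a principal component's sign is arbitrary, the sign of the fitted coefficient only has meaning alongside the component's loadings.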
In [33]:
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a2103fdd8>
In [ ]:
See the CSV example with the graph showing that inflation is negatively correlated with PC 1; this is what was driving it all along. TODO: replicate that graph in Python with matplotlib.
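A possible starting point for that graph, sketched with matplotlib on synthetic series. Both `pc1` and `inflation` below are generated for the example and stand in for the real columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins: a wandering PC1 and an "inflation" series that mirrors it.
rng = np.random.default_rng(6)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
pc1 = pd.Series(np.cumsum(rng.normal(size=120)), index=dates, name="PC1")
inflation = (-pc1 + rng.normal(scale=0.5, size=120)).rename("inflation")

# Plot both series on the same axes so the mirror-image pattern is visible.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(pc1.index, pc1, label="PC1")
ax.plot(inflation.index, inflation, label="inflation (synthetic)")
ax.set_xlabel("date")
ax.set_title("PC1 vs. inflation")
ax.legend()

# A negative value here confirms what the plot suggests.
print(pc1.corr(inflation))
```

With the real data, the two synthetic series would be replaced by the first principal component scores and the inflation column on a shared date index.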
PCA for Plotting
In [34]:
Out[34]:
Bernie Sanders     2241
Joseph Biden       1854
Rick Santorum      1613
Mike Pence         1238
Lindsey Graham     1158
Hillary Clinton     830
Rand Paul           455
Barack Obama        411
Jim Webb            381
Ted Cruz            365
Marco Rubio         359
John Kasich         316
Lincoln Chafee      154
Joe Biden             1
Name: speaker_name, dtype: int64
In [35]:
Out[35]:
In [13]:
Out[13]:
(3467, 3)
In [14]:
Out[14]:
In [15]:
Out[15]:
<3467x29674 sparse matrix of type '<class 'numpy.float64'>'
with 523660 stored elements in Compressed Sparse Row format>
In [16]:
In [17]:
Out[17]:
(3467, 2)
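The shapes above suggest a large sparse document-term matrix reduced to two columns for plotting. The sketch below shows one way to do that, assuming TF-IDF features and `TruncatedSVD`, which accepts sparse input. Whether the notebook used these exact classes is not visible here, and the tiny corpus is invented.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny invented corpus; speaker labels "A" and "B" are placeholders.
docs = pd.DataFrame({
    "speaker_name": ["A", "A", "B", "B"],
    "text": [
        "taxes jobs economy growth",
        "economy jobs wages growth",
        "health care insurance coverage",
        "insurance coverage hospitals care",
    ],
})

# Turn each document into a sparse row of TF-IDF weights.
tfidf = TfidfVectorizer().fit_transform(docs["text"])

# Reduce the sparse matrix to two dimensions without densifying it.
svd = TruncatedSVD(n_components=2, random_state=0)
coords = svd.fit_transform(tfidf)

# Attach the 2-D coordinates to the labels so they can be plotted by speaker.
plot_df = docs.assign(x=coords[:, 0], y=coords[:, 1])
print(plot_df[["speaker_name", "x", "y"]])
```

From here, a scatter plot colored by `speaker_name` (for example with seaborn's `lmplot` using `hue="speaker_name"` and `fit_reg=False`) shows whether speakers separate in the reduced space.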
In [18]:
Out[18]:
In [19]:
In [20]:
Out[20]:
In [21]:
Out[21]:
<seaborn.axisgrid.FacetGrid at 0x1a21766748>