Path: blob/master/1 - Natural Language Processing with Classification and Vector Spaces/Week 3/C1W3_L3_Another explanation about PCA.ipynb
Another explanation about PCA
photo credit: Raunak Joshi
In this lab, we are going to view another explanation of Principal Component Analysis (PCA). PCA is a statistical technique, invented in 1901 by Karl Pearson, that uses orthogonal transformations to map a set of variables into a set of linearly uncorrelated variables called Principal Components.
PCA is based on the Singular Value Decomposition (SVD) of the covariance matrix of the original dataset. The eigenvectors of this decomposition are used as a rotation matrix. They are arranged in the rotation matrix in decreasing order of their explained variance. The explained variance of each component is given by the corresponding eigenvalue of the decomposition.
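As a minimal sketch of the idea in this paragraph, the following NumPy code builds the rotation matrix and the explained variance directly from the covariance matrix. The toy data and the names `X`, `rotation`, and `explained_variance` are illustrative assumptions, not part of the lab.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 samples of 2 correlated features
a = rng.normal(size=500)
b = 0.5 * a + rng.normal(scale=0.3, size=500)
X = np.column_stack([a, b])

Xc = X - X.mean(axis=0)                 # center each feature
cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix

# eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]       # re-sort in decreasing order
explained_variance = eigvals[order]
rotation = eigvecs[:, order].T          # each row is one principal component

# Rotating the centered data gives linearly uncorrelated variables
projected = Xc @ rotation.T

# The singular values of the centered data give the same variances
s = np.linalg.svd(Xc, compute_uv=False)
print(explained_variance)
print(s ** 2 / (len(Xc) - 1))
```

Both printed vectors should agree, since the squared singular values of the centered data, divided by the number of samples minus one, equal the eigenvalues of the covariance matrix.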
PCA is a potent technique with applications ranging from simple space transformation and dimensionality reduction to mixture separation from spectral information.
Follow this lab to view another explanation for PCA. In this case, we are going to use the concept of rotation matrices applied to correlated random data, just as illustrated in the next picture.
Source: https://en.wikipedia.org/wiki/Principal_component_analysis
As usual, we must import the libraries that we will use in this lab.
To start, let us consider a pair of random variables x, y. Consider the base case when y = n * x. The x and y variables will be perfectly correlated to each other since y is just a scaling of x.
Now, what is the direction in which the variables point?
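One way to answer this question numerically is sketched below. It assumes n = 1 and x drawn from U(1, 2), and it uses NumPy's covariance and eigen-decomposition routines rather than any specific PCA library. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1                                   # scaling factor in y = n * x (assumed)
x = rng.uniform(1, 2, 1000)             # 1000 samples from U(1, 2)
y = n * x                               # y is perfectly correlated with x

# PCA works on centered data, so remove the means
x = x - x.mean()
y = y - y.mean()
data = np.column_stack([x, y])

# The direction of largest variance is the eigenvector of the
# covariance matrix with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
direction = eigvecs[:, np.argmax(eigvals)]
if direction[0] < 0:                    # eigenvectors are defined up to sign
    direction = -direction

angle_deg = np.degrees(np.arctan2(direction[1], direction[0]))
print(angle_deg)                        # close to 45 degrees for n = 1
```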
Understanding the transformation model pcaTr
As mentioned before, a PCA model is composed of a rotation matrix and its corresponding explained variance. In the next module, we will explain the details of the rotation matrices.
pcaTr.components_ has the rotation matrix
pcaTr.explained_variance_ has the explained variance of each principal component
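The following sketch shows how these two attributes could be inspected with scikit-learn. It assumes that pcaTr is the fitted model returned by `PCA.fit` on the centered data with y = x. The data generation is an assumption that mirrors the description above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Assumed setup: x ~ U(1, 2) and y = x, both centered
x = rng.uniform(1, 2, 1000)
y = x.copy()
data = np.column_stack([x - x.mean(), y - y.mean()])

pca = PCA(n_components=2)               # keep both output variables
pcaTr = pca.fit(data)                   # learn the rotation and the variances

print(pcaTr.components_)                # rotation matrix, one component per row
print(pcaTr.explained_variance_)        # variance explained by each component

rotatedData = pcaTr.transform(data)     # data expressed in the new axes
```

The signs of the rows in `components_` may differ between runs or library versions, because each principal direction is only defined up to sign.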
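The rotation matrix is equal to (up to the sign of each row):
$$R = \begin{bmatrix} \cos(45^\circ) & \sin(45^\circ) \\ -\sin(45^\circ) & \cos(45^\circ) \end{bmatrix}$$
And 45° is the same angle formed by the variables y = 1 * x.
Thus, PCA has identified the angle in which the original variables point.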
And the explained variance is around [0.166, 0]. Remember that the variance of a uniform random variable x ~ U(1, 2), as our x and y, is equal to:
$$Var(x) = \frac{(2 - 1)^2}{12} \approx 0.0833$$
Then the explained variance given by the PCA can be interpreted as
$$[Var(x) + Var(y), \ 0] \approx [0.0833 + 0.0833, \ 0] \approx [0.166, \ 0]$$
Which means that all the variance of our new system is explained by our first principal component.
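The arithmetic can be checked numerically. The sketch below assumes x ~ U(1, 2) and y = x, and obtains the explained variance from the eigenvalues of the covariance matrix. The larger sample size is an illustrative choice that makes the estimate tighter.

```python
import numpy as np

rng = np.random.default_rng(3)

a, b = 1.0, 2.0
theoretical_var = (b - a) ** 2 / 12     # variance of U(a, b), about 0.0833

x = rng.uniform(a, b, 100_000)
y = x.copy()                            # y = 1 * x
sample_var = x.var(ddof=1)

# Explained variance = eigenvalues of the covariance matrix, largest first
data = np.column_stack([x, y])
eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]

print(theoretical_var, sample_var)      # both close to 1/12
print(eigvals)                          # close to [Var(x) + Var(y), 0]
```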
Correlated Normal Random Variables.
Now, we will use a controlled dataset composed of 2 random variables with different variances and with a specific covariance between them. The only way I know to get such a dataset is to first create two independent Normal random variables with the desired variances and then combine them using a rotation matrix. In this way, the resulting variables will be linear combinations of the original random variables and thus be dependent and correlated.
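A minimal sketch of this construction follows. The standard deviations (1 and 0.333) and the 45 degree rotation angle are assumed values chosen for illustration, and can be changed freely.

```python
import numpy as np

rng = np.random.default_rng(4)

std1, std2 = 1.0, 0.333                 # assumed standard deviations
x = rng.normal(0, std1, 1000)           # two independent Normal variables
y = rng.normal(0, std2, 1000)
x -= x.mean()                           # center both variables
y -= y.mean()

angle = np.pi / 4                       # assumed rotation angle (45 degrees)
rotationMatrix = np.array([[np.cos(angle), np.sin(angle)],
                           [-np.sin(angle), np.cos(angle)]])

xy = np.column_stack([x, y])            # original, uncorrelated data
data = xy @ rotationMatrix              # rotated, correlated data

corr_before = np.corrcoef(xy, rowvar=False)[0, 1]
corr_after = np.corrcoef(data, rowvar=False)[0, 1]
print(corr_before)                      # near 0
print(corr_after)                       # clearly different from 0
```

Each column of `data` is a linear combination of x and y, which is what introduces the correlation.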
Let us plot the original system and the system transformed by the PCA in the same chart, alongside the 2 principal component vectors in red and blue.
The explanation of this chart is as follows:
The rotation matrix used to create our correlated variables took the original uncorrelated variables x and y and transformed them into the blue points.
The PCA transformation finds out the rotation matrix used to create our correlated variables (blue points). Using the PCA model to transform our data puts the variables back as our original uncorrelated variables.
The explained variance of the PCA is approximately equal to the variances of our original random variables x and y, that is, the squares of the standard deviations used to generate them.
You can use the previous code to try other standard deviations and correlations and convince yourself of this fact.
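To experiment with this, a self-contained sketch like the one below may help. The standard deviations, the angle, and the sample size are assumptions to vary. PCA is computed here with a NumPy eigen-decomposition of the covariance matrix rather than with a PCA library.

```python
import numpy as np

rng = np.random.default_rng(5)

std1, std2 = 1.0, 0.333                 # assumed; try other values
angle = np.pi / 4                       # assumed; try other angles
n_samples = 5000

# Independent Normal variables, then rotated to make them correlated
xy = np.column_stack([rng.normal(0, std1, n_samples),
                      rng.normal(0, std2, n_samples)])
rotationMatrix = np.array([[np.cos(angle), np.sin(angle)],
                           [-np.sin(angle), np.cos(angle)]])
data = xy @ rotationMatrix
data = data - data.mean(axis=0)

# PCA via the eigen-decomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigvals)[::-1]
explained_variance = eigvals[order]
components = eigvecs[:, order].T        # rows are principal components

recovered = data @ components.T         # data rotated back by the PCA

print(explained_variance)               # close to [std1**2, std2**2]
print(np.abs(components))               # close to |rotationMatrix|
print(np.corrcoef(recovered, rowvar=False)[0, 1])   # close to 0
```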
PCA as a strategy for dimensionality reduction
The principal components contained in the rotation matrix are sorted in decreasing order of their explained variance. This usually means that the first components retain most of the power of the data to explain the patterns that generalize it. Nevertheless, for some applications, such as novelty detection, we are interested in the patterns that explain much less variance.
In the next figure, we can see the original data and its projections onto the first and the second principal components. In other words, each projection is data comprised of a single variable.
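A projection onto a single principal component can be sketched as follows. The data generation (standard deviations 1 and 0.333, rotation by 45 degrees) is an assumption, and the names `scores` and `projection` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed correlated data: independent Normals rotated by 45 degrees
xy = np.column_stack([rng.normal(0, 1.0, 1000), rng.normal(0, 0.333, 1000)])
angle = np.pi / 4
rotationMatrix = np.array([[np.cos(angle), np.sin(angle)],
                           [-np.sin(angle), np.cos(angle)]])
data = xy @ rotationMatrix
data = data - data.mean(axis=0)

# Principal components from the covariance matrix, largest variance first
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigvals)[::-1]
explained_variance = eigvals[order]
components = eigvecs[:, order].T

pc1 = components[0]
scores = data @ pc1                     # one number per sample: a single variable
projection = np.outer(scores, pc1)      # the same points drawn back in 2-D

retained = explained_variance[0] / explained_variance.sum()
print(scores.shape)                     # (1000,)
print(retained)                         # fraction of variance kept by PC1
```

Replacing `components[0]` with `components[1]` gives the projection onto the second component, which keeps the small remaining fraction of the variance.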
PCA as a strategy to plot complex data
The next chart shows a sample diagram displaying a dataset of pictures of cats and dogs. Raw pictures are composed of hundreds or even thousands of features. However, PCA allows us to reduce that many features to only two. In that reduced space of uncorrelated variables, we can easily separate cats and dogs.
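The sketch below illustrates the idea on synthetic data, not on real pictures. Two hypothetical classes with 50 features each are reduced to two principal components using the SVD of the centered data. The class centers, sizes, and names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(7)

n_per_class, n_features = 100, 50       # assumed sizes

# Two synthetic classes whose means differ in every feature
class_a = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
class_b = rng.normal(1.5, 1.0, size=(n_per_class, n_features))
X = np.vstack([class_a, class_b])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# PCA through the SVD of the centered data: rows of Vt are the components
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X2d = Xc @ Vt[:2].T                     # keep only the first two components

print(X2d.shape)                        # (200, 2)
print(X2d[labels == 0, 0].mean(), X2d[labels == 1, 0].mean())
```

In this synthetic case the two classes end up on opposite sides of the first component, so a scatter plot of the two columns of `X2d` colored by `labels` would show two separate groups. Real image data is usually not this cleanly separable.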

You will learn how to generate a chart like this with word vectors in this week's programming assignment.