
GitHub Repository: weijie-chen/Linear-Algebra-With-Python
Path: blob/master/notebooks/Chapter 18 - The Singular Value Decomposition.ipynb
Kernel: Python 3 (ipykernel)
import numpy as np
import scipy as sp
import sympy as sy
import matplotlib.pyplot as plt
plt.style.use('ggplot')

np.set_printoptions(precision=3)
np.set_printoptions(suppress=True)

The Singular Values

We have discussed spectral decomposition, which can unconditionally decompose any symmetric matrix into three special matrices. However, only square matrices have eigenvalues and eigenvectors, and we want to extend a similar concept to any $m \times n$ matrix.

If $A$ is an $m \times n$ matrix, then $A^TA$ and $AA^T$ are both symmetric and orthogonally diagonalizable.

A = np.array([[4, -3], [-3, 2], [2, -1], [1, 0]]); A
array([[ 4, -3],
       [-3,  2],
       [ 2, -1],
       [ 1,  0]])

Compute eigenvalues and eigenvectors of $A^TA$.

ATA = A.T@A; ATA
array([[ 30, -20],
       [-20,  14]])
D1, P1 = np.linalg.eig(ATA)

Check that $P_1$ is an orthogonal matrix, i.e. its columns are orthonormal.

P1@P1.T
array([[1., 0.],
       [0., 1.]])
D1
array([43.541, 0.459])

The square roots of the eigenvalues of $A^TA$ are called the singular values of $A$, denoted by $\sigma_1, \dots, \sigma_n$ in decreasing order.
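As a quick numerical check (a sketch reusing the A and D1 computed above), the square roots of the eigenvalues, sorted in decreasing order, should match the singular values that NumPy's built-in SVD reports:

# Square roots of the eigenvalues of A^T A, largest first
print(np.sqrt(np.sort(D1)[::-1]))
# Singular values from NumPy's built-in SVD
print(np.linalg.svd(A, compute_uv=False))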

We can also show that the singular values of $A$ are the lengths of the vectors $A\mathbf{v}_1, \dots, A\mathbf{v}_n$, where $\mathbf{v}_i$ is a unit eigenvector of $A^TA$.

The length of $A\mathbf{v}_i$ is $\|A\mathbf{v}_i\|$:

$$\|A\mathbf{v}_i\| = \sqrt{(A\mathbf{v}_i)^TA\mathbf{v}_i} = \sqrt{\mathbf{v}_i^TA^TA\mathbf{v}_i} = \sqrt{\mathbf{v}_i^T(\lambda_i\mathbf{v}_i)} = \sqrt{\lambda_i} = \sigma_i$$

where $\mathbf{v}_i^T\mathbf{v}_i = 1$ and the $\lambda_i$'s are eigenvalues of $A^TA$.
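We can verify this numerically with the A, D1, and P1 from above; each column of P1 is a unit eigenvector $\mathbf{v}_i$, so $\|A\mathbf{v}_i\|$ should equal $\sqrt{\lambda_i}$:

# Compare ||A v_i|| with sqrt(lambda_i) for each eigenvector
for i in range(P1.shape[1]):
    print(np.linalg.norm(A @ P1[:, i]), np.sqrt(D1[i]))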

Singular Value Decomposition

Singular value decomposition (SVD) is probably the most important decomposition technique in the history of linear algebra; it combines all the theory we have discussed so far and culminates at this point.

Let $A$ be an $m\times n$ matrix. Even though $A$ need not be square, $AA^T$ and $A^TA$ are symmetric matrices, so both are orthogonally diagonalizable.

$$AA^T = U\Sigma\Sigma^T U^T = (U\Sigma V^T)(V\Sigma^T U^T)\\ A^TA = V\Sigma^T\Sigma V^T = (V\Sigma^T U^T)(U\Sigma V^T)$$

where $\Sigma\Sigma^T$ is a diagonal matrix holding all the eigenvalues of $AA^T$, and $\Sigma^T\Sigma$ is a diagonal matrix holding all the eigenvalues of $A^TA$.

Because both $AA^T$ and $A^TA$ are symmetric, $U$ and $V$ are orthogonal: $UU^T = U^TU = I_{m\times m}$ and $VV^T = V^TV = I_{n\times n}$.
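A minimal sketch checking the eigenvalue claim with the $4\times 2$ matrix A defined earlier: the nonzero eigenvalues of $AA^T$ and $A^TA$ coincide (np.linalg.eigvalsh is used here because both matrices are symmetric):

# Eigenvalues of AA^T (4x4): two zeros plus the shared nonzero eigenvalues
print(np.linalg.eigvalsh(A @ A.T))
# Eigenvalues of A^T A (2x2): the shared nonzero eigenvalues only
print(np.linalg.eigvalsh(A.T @ A))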

We have implicitly shown the singular value decomposition above, one of the most important concepts in linear algebra.

$$\Large \text{SVD}:\quad A_{m\times n} = U_{m\times m}\Sigma_{m \times n} V^T_{n \times n}$$

The SVD theorem guarantees that any matrix $A$, no matter its rank or shape, can be unconditionally decomposed into three special matrices.
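As a sketch of this guarantee, we can decompose the rectangular $A$ from earlier with NumPy, assemble the $m\times n$ matrix $\Sigma$ from the returned singular values, and confirm that $U\Sigma V^T$ reproduces $A$:

U, s, VT = np.linalg.svd(A)            # full_matrices=True by default
Sigma = np.zeros(A.shape)              # m x n, zero off the main diagonal
Sigma[:len(s), :len(s)] = np.diag(s)   # singular values on the main diagonal
print(np.allclose(U @ Sigma @ VT, A))  # True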

So the next question: what is $\Sigma$?

It is an $m\times n$ matrix with all the singular values on its main diagonal. Rewrite

$$A^TA = V\Sigma^T \Sigma V^T = V\Sigma^2 V^T$$

Post-multiply both sides by $V$:

$$A^TAV = V\Sigma^2$$

This is the matrix version of $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$, but here the matrix of interest is $A^TA$ rather than $A$. Similarly, it can be written with singular values:

$$A^TA\mathbf{v}_i = \sigma_i^2\mathbf{v}_i$$

Because $U$ and $V$ are not unique, we standardize the solution by arranging $\sigma_1 \geq \sigma_2 \geq \sigma_3 \geq \dots \geq \sigma_r$.

Why do we arrange only $r$ singular values? Because $r$ is the rank of $A$, which is also the rank of $A^TA$. Explicitly, $\Sigma$ looks like

$$\Sigma = \left[\begin{array}{cccccc} \sqrt{\lambda_1} & & & & & \\ & \sqrt{\lambda_2} & & & & \\ & & \ddots & & & \\ & & & \sqrt{\lambda_r} & & \\ & & & & \ddots & \\ & & & & & 0 \end{array}\right] = \left[\begin{array}{cccccc} \sigma_1 & & & & & \\ & \sigma_2 & & & & \\ & & \ddots & & & \\ & & & \sigma_r & & \\ & & & & \ddots & \\ & & & & & 0 \end{array}\right]$$

We can do the same for $AA^T$ and get

$$AA^TU = U \Sigma^2$$

or

$$AA^T\mathbf{u}_i = \sigma_i^2\mathbf{u}_i$$

We have shown why $A_{m\times n} = U_{m\times m}\Sigma_{m \times n} V^T_{n \times n}$ holds.

To perform an SVD on $A$, we just need two equations; this is also a manual procedure for decomposing any matrix.

$$A^TA = V\Sigma^T \Sigma V^T\\ AV = U\Sigma$$
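Here is a minimal manual implementation of that procedure for the $4\times 2$ matrix A from earlier (a sketch that assumes $A$ has full column rank, so no $\sigma_i$ is zero):

# Step 1: orthogonally diagonalize A^T A
lam, V = np.linalg.eig(A.T @ A)
idx = np.argsort(lam)[::-1]            # sort eigenvalues in decreasing order
lam, V = lam[idx], V[:, idx]
sigma = np.sqrt(lam)                   # singular values
# Step 2: recover U column by column from u_i = A v_i / sigma_i
U_manual = (A @ V) / sigma
print(np.allclose((U_manual * sigma) @ V.T, A))  # True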

Here's an example. Let's say we have a data set $A$:

A = np.random.rand(10, 2)

Give it an SVD decomposition:

U, S, VT = np.linalg.svd(A, full_matrices=False)

Let's say we want to reduce $A_{10\times 2}$ to a rank-$1$ approximation by keeping only the first singular value.

# Keep only the first dimension
A_reduced = np.dot(U[:, :1], np.dot(np.diag(S[:1]), VT[:1, :]))

# Plot original data
plt.scatter(A[:, 0], A[:, 1])
plt.title("Original $A$")
plt.show()

# Plot reduced data
plt.scatter(A_reduced[:, 0], A_reduced[:, 1])
plt.title("$A$ with Reduced Dimensionality")
plt.show()
[Two scatter plots: "Original $A$" and "$A$ with Reduced Dimensionality"]
import numpy as np
import matplotlib.pyplot as plt

# Generate some random data
np.random.seed(0)
data = np.random.randn(200, 2)

# Perform SVD on the data
U, s, VT = np.linalg.svd(data)

# Perform PCA on the data
mean = np.mean(data, axis=0)
data_pca = data - mean
cov = np.cov(data_pca.T)
eigenvalues, eigenvectors = np.linalg.eig(cov)

# Plot the data
plt.scatter(data[:, 0], data[:, 1])
plt.title("Original Data")
plt.show()

# Plot the SVD components
plt.scatter(U[:, 0], U[:, 1])
plt.title("SVD Components")
plt.show()

# Plot the PCA components
plt.scatter(eigenvectors[:, 0], eigenvectors[:, 1])
plt.title("PCA Components")
plt.show()
[Three scatter plots: "Original Data", "SVD Components", and "PCA Components"]

Reformulate SVD

Rewrite the SVD

$$AV = U\Sigma$$

The vector version is

$$A\mathbf{v}_i = \sigma_i \mathbf{u}_i$$

There are two implications from the equation above: (a) $A$ can be decomposed into

$$A = \sum_{i=1}^{r}\sigma_i\mathbf{u}_i \mathbf{v}_i^T$$

(b) We can compute $\mathbf{u}_i$ by using

$$\mathbf{u}_i = \frac{A\mathbf{v}_i}{\sigma_i}$$
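Both implications are easy to check numerically; a quick sketch reusing the random $10\times 2$ data set $A$ from earlier (the reduced SVD is recomputed here so the names are unambiguous):

# Recompute the reduced SVD of the 10x2 data set A from earlier
U, S, VT = np.linalg.svd(A, full_matrices=False)
# (a) summing the rank-one layers sigma_i u_i v_i^T rebuilds A
A_rebuilt = sum(S[i] * np.outer(U[:, i], VT[i, :]) for i in range(len(S)))
print(np.allclose(A_rebuilt, A))               # True
# (b) u_1 equals A v_1 / sigma_1
print(np.allclose(A @ VT[0] / S[0], U[:, 0]))  # True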