import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import pandas as pd
import sympy as sy
sy.init_printing()
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

LU Factorization

LU factorization is valued for its computational efficiency: it mainly facilitates solving systems of linear equations, although for a single solve its flop count (number of floating-point operations) offers no savings over plain Gaussian elimination.

Nonetheless, LU factorization is particularly handy if you are computing multiple solutions of $Ax = b$.

One example: if you have a set of right-hand sides $\{b_i,\ i \in \mathbb{Z}\}$ to substitute into the system, such that $$Ax=b_1,\quad Ax=b_2,\quad Ax=b_3,\quad Ax=b_4, \quad \ldots$$ we only decompose $A$ once, whereas the Gauss-Jordan algorithm has to redo the row operations for every augmented matrix $[A\,|\,b_i]$.
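As a minimal sketch of this reuse pattern, SciPy's lu_factor/lu_solve pair factorizes $A$ once and then solves cheaply for each new right-hand side (using the np/sp imports above; the $b_i$ here are made up for illustration):

A = np.array([[9., 3., 6.], [3., 4., 6.], [0., 8., 8.]])
lu, piv = sp.linalg.lu_factor(A)              # factorize A once
for b in (np.array([1., 2., 3.]), np.array([4., 5., 6.])):
    x = sp.linalg.lu_solve((lu, piv), b)      # cheap triangular solves per b
    print(x, np.allclose(A @ x, b))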

LU Algorithm

We aim to decompose a matrix $A$ into $L$ and $U$, which represent a lower and an upper triangular matrix respectively, where $L$ must have $1$'s on its principal diagonal:

$$A = LU$$

For instance,

$$A=\underbrace{\left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{array}\right]}_{L}\underbrace{\left[\begin{array}{ccccc} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]}_{U}$$

The feasibility of the decomposition hinges on whether $A$ can be converted into an upper triangular matrix by a series of row operations, i.e.

$$E_p \cdots E_2E_1 A = U$$

Then

$$A = (E_p \cdots E_2E_1)^{-1}U = LU$$

where

$$L = (E_p \cdots E_2E_1)^{-1}$$

We will hand-calculate an example here:

$$A = \begin{bmatrix} 9 & 3 & 6\\ 3 & 4 & 6\\ 0 & 8 & 8 \end{bmatrix}$$

$$\underbrace{\begin{bmatrix} 1 & 0 & 0\\ -\frac{1}{3} & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}}_{E_1} \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 3 & 4 & 6\\ 0 & 8 & 8 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 3 & 4\\ 0 & 8 & 8 \end{bmatrix}}_{E_1A}$$

$$\underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\frac{8}{3} & 1 \end{bmatrix}}_{E_2} \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 3 & 4\\ 0 & 8 & 8 \end{bmatrix}}_{E_1A} = \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 3 & 4\\ 0 & 0 & -\frac{8}{3} \end{bmatrix}}_{E_2E_1A=U}$$

$$L^{-1} = E_2E_1 = \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\frac{8}{3} & 1 \end{bmatrix}}_{E_2} \underbrace{\begin{bmatrix} 1 & 0 & 0\\ -\frac{1}{3} & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}}_{E_1} = \begin{bmatrix} 1 & 0 & 0\\ -\frac{1}{3} & 1 & 0\\ \frac{8}{9} & -\frac{8}{3} & 1 \end{bmatrix}$$

Or we can compute $E_1^{-1}$ and $E_2^{-1}$ directly, because $$L = E_1^{-1}E_2^{-1}$$

Construct augmented matrices for $E_1$ and $E_2$: $$[E_1| I]=\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ -\frac{1}{3} & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & \frac{1}{3} & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] =[I|E_1^{-1}]$$

$$[E_2| I]= \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0\\ 0 & -\frac{8}{3} & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & \frac{8}{3} & 1 \end{array}\right] =[I|E_2^{-1}]$$

And we get the inverses of $E_1$ and $E_2$: $$E_1^{-1} = \left[\begin{array}{ccc} 1 & 0 & 0 \\ \frac{1}{3} & 1 & 0\\ 0 & 0 & 1 \end{array}\right],\qquad E_2^{-1} = \left[\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & \frac{8}{3} & 1 \end{array}\right]$$

$$L = E_1^{-1}E_2^{-1}= \left[\begin{array}{ccc} 1 & 0 & 0 \\ \frac{1}{3} & 1 & 0\\ 0 & 0 & 1 \end{array}\right] \left[\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & \frac{8}{3} & 1 \end{array}\right] = \left[\begin{array}{ccc} 1 & 0 & 0 \\ \frac{1}{3} & 1 & 0\\ 0 & \frac{8}{3} & 1 \end{array}\right]$$

The final result of the LU decomposition: $$A = LU = \underbrace{\left[\begin{array}{ccc} 1 & 0 & 0 \\ \frac{1}{3} & 1 & 0\\ 0 & \frac{8}{3} & 1 \end{array}\right]}_{L} \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 3 & 4\\ 0 & 0 & -\frac{8}{3} \end{bmatrix}}_{E_2E_1A=U}$$
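As a quick numerical sanity check (not part of the derivation), the hand-computed factors multiply back to $A$:

L = np.array([[1, 0, 0], [1/3, 1, 0], [0, 8/3, 1]])
U = np.array([[9, 3, 6], [0, 3, 4], [0, 0, -8/3]])
L @ U  # reproduces A = [[9, 3, 6], [3, 4, 6], [0, 8, 8]]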

We can take a look at SciPy's results.

A = np.array([[9, 3, 6], [3, 4, 6], [0, 8, 8]]); A
array([[9, 3, 6], [3, 4, 6], [0, 8, 8]])
P, L, U = sp.linalg.lu(A)
P
L
U
array([[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
array([[1. , 0. , 0. ], [0. , 1. , 0. ], [0.33333333, 0.375 , 1. ]])
array([[9., 3., 6.], [0., 8., 8.], [0., 0., 1.]])
P@L@U # this is A
array([[9., 3., 6.], [3., 4., 6.], [0., 8., 8.]])

It is easy to notice that the SciPy lu function gives us more than $L$ and $U$: it also returns $P$, which is a permutation matrix. Theoretically, we don't need row exchanges to convert $A$ into $U$, but in some situations we must swap rows beforehand, otherwise the decomposition will fail to materialize.

Thus SciPy uses the $PLU$ decomposition to make the procedure always possible: $$A = PLU$$
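As an aside, sp.linalg.lu can also fold the permutation into the left factor: passing permute_l=True returns the product $PL$ and $U$ as two outputs instead of three.

PL, U2 = sp.linalg.lu(A, permute_l=True)  # PL is P @ L
np.allclose(PL @ U2, A)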

Also $P = P^{-1}$. Why? It's easier to analyze with an augmented matrix: a row-switching elementary matrix is its own inverse.

$$[P| I]=\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 1 & 0 \end{array}\right]=[I| P^{-1}]=[P^{-1}|I]$$
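We can confirm this numerically with the $P$ returned by SciPy above:

np.allclose(P @ P, np.eye(3))  # True: the row swap undoes itself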

With this knowledge in hand, what we are decomposing is actually

$$PA = LU$$
$$\begin{align} E_2E_1E_0A &= \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\frac{3}{8} & 1 \end{bmatrix}}_{E_2} \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -\frac{1}{3} & 0 & 1 \end{bmatrix}}_{E_1} \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}}_{E_0} \begin{bmatrix} 9 & 3 & 6\\ 3 & 4 & 6\\ 0 & 8 & 8 \end{bmatrix}\\ &= \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\frac{3}{8} & 1 \end{bmatrix}}_{E_2} \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -\frac{1}{3} & 0 & 1 \end{bmatrix}}_{E_1} \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 8 & 8\\ 3 & 4 & 6 \end{bmatrix}}_{E_0A}\\ &= \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\frac{3}{8} & 1 \end{bmatrix}}_{E_2} \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 8 & 8\\ 0 & 3 & 4 \end{bmatrix}}_{E_1E_0A}\\ &= \underbrace{\begin{bmatrix} 9 & 3 & 6\\ 0 & 8 & 8\\ 0 & 0 & 1 \end{bmatrix}}_{E_2E_1E_0A} = U \end{align}$$

Note that this $U$ matches the one SciPy returned above.

Rearranging, we can see $P$ and $L$: $$A = \underbrace{E_0^{-1}}_{P} \underbrace{(E_1^{-1}E_2^{-1})}_{L}U$$
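We can replay these elementary row operations in NumPy and confirm they reproduce SciPy's factors (a sketch using the matrices derived above):

E0 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])      # swap rows 2 and 3
E1 = np.array([[1, 0, 0], [0, 1, 0], [-1/3, 0, 1]])   # row3 -= (1/3) * row1
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, -3/8, 1]])   # row3 -= (3/8) * row2
U_manual = E2 @ E1 @ E0 @ A                           # matches SciPy's U
L_manual = np.linalg.inv(E1) @ np.linalg.inv(E2)      # matches SciPy's L
np.allclose(E0 @ L_manual @ U_manual, A)              # A = P L U, with P = E0^{-1} = E0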

Solving a Linear System Using LU Factorization

Solve the linear system: $$\begin{align} 3x_1-7x_2 -2x_3+2x_4&=-9\\ -3x_1+5x_2 +x_3 &=5\\ 6x_1-4x_2 -5x_4&=7\\ -9x_1+5x_2 -5x_3+12x_4&=11 \end{align}$$ In matrix form: $$\underbrace{\left[\begin{array}{rrrr} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{array}\right]}_{A} \left[\begin{array}{r} x_1 \\ x_2 \\ x_3 \\ x_4 \end{array}\right] = \underbrace{\left[\begin{array}{r} -9 \\ 5 \\ 7 \\ 11 \end{array}\right]}_{b}$$ Perform $LU$ decomposition:

$$\underbrace{\left[\begin{array}{rrrr} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{array}\right]}_{A} =\underbrace{\left[\begin{array}{rrrr} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & 3 & 1 \end{array}\right]}_{L}\underbrace{\left[\begin{array}{rrrr} 3 & -7 & -2 & 2 \\ 0 & -2 & -1 & 2 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{array}\right]}_{U}$$
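Again, a quick sanity check that the hand-derived factors multiply back to $A$:

L = np.array([[1, 0, 0, 0], [-1, 1, 0, 0], [2, -5, 1, 0], [-3, 8, 3, 1]])
U = np.array([[3, -7, -2, 2], [0, -2, -1, 2], [0, 0, -1, 1], [0, 0, 0, -1]])
L @ U  # reproduces A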

Replacing $A$ by $LU$, we get $L(Ux) = b$; now solve this pair of equations:

$$Ly = b\\ Ux = y$$

Construct the augmented matrix $[L\,|\,b]$:

$$\underbrace{\left[\begin{array}{rrrr} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & 3 & 1 \end{array}\right]}_{L} \underbrace{\left[\begin{array}{r} y_1 \\ y_2 \\ y_3 \\ y_4 \end{array}\right]}_{y} = \underbrace{\left[\begin{array}{r} -9 \\ 5 \\ 7 \\ 11 \end{array}\right]}_{b}$$

$$\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & -9 \\ -1 & 1 & 0 & 0 & 5\\ 2 & -5 & 1 & 0 & 7\\ -3 & 8 & 3 & 1 & 11 \end{array}\right]\sim \left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & -9 \\ 0 & 1 & 0 & 0 & -4\\ 0 & 0 & 1 & 0 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]$$

$$\left[\begin{array}{r} y_1 \\ y_2 \\ y_3 \\ y_4 \end{array}\right]= \left[\begin{array}{r} -9 \\ -4 \\ 5 \\ 1 \end{array}\right]$$

Next we solve $Ux = y$; to show this in NumPy:

U = np.array([[3, -7, -2, 2],
              [0, -2, -1, 2],
              [0, 0, -1, 1],
              [0, 0, 0, -1]])
y = np.array([-9, -4, 5, 1])
np.linalg.solve(U, y)
array([ 3., 4., -6., -1.])
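Note that np.linalg.solve does not exploit the triangular structure. SciPy's solve_triangular does, and it expresses the two-step scheme directly (a sketch reusing the $L$ and $U$ above and defining $b$):

b = np.array([-9, 5, 7, 11])
L = np.array([[1, 0, 0, 0], [-1, 1, 0, 0], [2, -5, 1, 0], [-3, 8, 3, 1]])
y = sp.linalg.solve_triangular(L, b, lower=True)   # forward substitution: Ly = b
x = sp.linalg.solve_triangular(U, y, lower=False)  # back substitution: Ux = y
x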

If the process is correct, this is the solution of the system; we can verify it by invoking np.linalg.solve() on the original $A$ and $b$.

A = np.array([[3, -7, -2, 2],
              [-3, 5, 1, 0],
              [6, -4, 0, -5],
              [-9, 5, -5, 12]])
b = np.array([-9, 5, 7, 11])
np.linalg.solve(A, b)
array([ 3., 4., -6., -1.])

The results are the same: $LU$ decomposition works!

Cholesky Factorization

Cholesky decomposition is analogous to taking the square root of a matrix. A common example is the decomposition of a covariance matrix $\Sigma$, where the lower triangular factor $L$ is what imposes the correlation structure, as we show below. $$\Sigma = L L^T$$

Suppose we have a vector of uncorrelated random variables $\mathbf{z}$. Each element of $\mathbf{z}$ is a standard normal random variable with zero mean and unit variance; in mathematical terms, $\mathbf{z} \sim \mathcal{N}(0, I)$, where $I$ is the identity matrix. $$\mathbf{z}=\left(\begin{array}{c} z_1 \\ z_2 \\ \vdots \\ z_n \end{array}\right)$$

We can transform the uncorrelated vector $\mathbf{z}$ into a correlated vector $\mathbf{x}$: $$x = Lz$$

To understand this, we examine the covariance matrix of the transformed vector $x$: $$\text{Cov}(x) = \text{Cov}(Lz) = L\,\text{Cov}(z)\,L^T = L I L^T = L L^T = \Sigma$$ This shows that $\text{Cov}(x)$ has the desired covariance structure.

Let's see a numerical example.

# use antithetic variates to reduce variance: pair each draw z with its mirror -z
n = 1000
z = np.random.normal(size=(n, 3))
z2 = -z
z = np.concatenate((z, z2), axis=0)

# target covariance matrix and its Cholesky factor
cov_matrix = np.array([[1, 0.8, 0.7],
                       [0.8, 1, 0.8],
                       [0.7, 0.8, 1]])
L = np.linalg.cholesky(cov_matrix)

# impose the correlation structure, then check the sample covariance
z = z @ L.T
np.cov(z.T)
array([[0.99424759, 0.80479653, 0.71424917], [0.80479653, 1.00962669, 0.81794008], [0.71424917, 0.81794008, 1.02650048]])
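The sample covariance above is close to the target; the Cholesky factor itself reconstructs it exactly:

np.allclose(L @ L.T, cov_matrix)  # True: L @ L.T equals Sigma by construction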