GitHub Repository: weijie-chen/Linear-Algebra-With-Python
Path: blob/master/notebooks/Chapter 15 - Innear Product and Orthogonality.ipynb
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import sympy as sy
import plotly.graph_objects as go

sy.init_printing()

The Dot Product

Consider two vectors

$$\mathbf{u}=\left[\begin{array}{l} u_{1} \\ u_{2} \\ \vdots \\ u_{n} \end{array}\right] \quad \text { and } \quad \mathbf{v}=\left[\begin{array}{l} v_{1} \\ v_{2} \\ \vdots \\ v_{n} \end{array}\right]$$

The dot product of $\mathbf{u}$ and $\mathbf{v}$, i.e. $\mathbf{u}\cdot\mathbf{v}$, is defined as

$$\left[\begin{array}{llll} u_{1} & u_{2} & \cdots & u_{n} \end{array}\right]\left[\begin{array}{c} v_{1} \\ v_{2} \\ \vdots \\ v_{n} \end{array}\right]=u_{1} v_{1}+u_{2} v_{2}+\cdots+u_{n} v_{n}$$

Let's generate two random vectors and compare operations in NumPy.

u = np.round(100*np.random.randn(10))
v = np.round(100*np.random.randn(10))
u*v # this is element-wise multiplication
array([ 1050., -215., 11900., -1276., 13838., 5535., 23409., -1456., -4828., 35032.])
u@v # matrix multiplication, which is a dot product in this case

82989.0

np.inner(u,v) # inner product here is the same as matrix multiplication

82989.0

The Norm of a Vector

The norm is the length of a vector, defined by

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_{1}^{2} + v_{2}^{2} + \cdots + v_{n}^{2}}, \quad \text{and} \quad \|\mathbf{v}\|^{2} = \mathbf{v} \cdot \mathbf{v}$$

The NumPy built-in function np.linalg.norm() computes norms. By default, it returns the Euclidean norm, i.e. the length of a vector measured from the origin.

a = [2, 6]
np.linalg.norm(a)

6.32455532033676

Verify the results.

np.sqrt(2**2 + 6**2)

6.32455532033676

This function can also compute the lengths of a group of vectors at once, for instance $(2, 6)^T$, $(8, 2)^T$, and $(9, 1)^T$.

A = np.array([[2, 8, 9],
              [6, 2, 1]])
np.linalg.norm(A, axis=0)
array([6.32455532, 8.24621125, 9.05538514])

Distance in $\mathbb{R}^n$

For $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^{n}$, the distance between $\mathbf{u}$ and $\mathbf{v}$, written as $\operatorname{dist}(\mathbf{u}, \mathbf{v})$, is the length of the vector $\mathbf{u}-\mathbf{v}$. That is,

$$\operatorname{dist}(\mathbf{u}, \mathbf{v})=\|\mathbf{u}-\mathbf{v}\|$$

Suppose we have two vectors $\mathbf{u} = (2, 9)$ and $\mathbf{v} = (-3, 4)$; compute the distance and visualize the result.

u = np.array([2, 9])
v = np.array([-3, 4])
np.linalg.norm(u - v)

7.07106781186548

fig, ax = plt.subplots(figsize=(12, 12)) # Define the vectors vects = np.array([[2, 9], [-3, 4], [3, -4], [5, 5]]) col = ['red', 'blue', 'blue', 'green'] cordt = ['$(2, 9)$', '$(-3, 4)$', '$(3, -4)$', '$(5, 5)$'] # Plot the vectors for i in range(4): ax.arrow(0, 0, vects[i, 0], vects[i, 1], color=col[i], width=0.08, length_includes_head=True, head_width=0.3, # default: 3*width head_length=0.6, overhang=0.4) ax.text(x=vects[i][0], y=vects[i][1], s=cordt[i], size=15) ax.grid() # Define the points for the arrows and lines points = np.array([[2, 9], [5, 5], [3, -4], [-3, 4]]) # Plot the green vector from (-3, 4) to (2, 9) start_point = points[3] end_point = points[0] direction = end_point - start_point ax.arrow(start_point[0], start_point[1], direction[0], direction[1], color='green', width=0.08, length_includes_head=True, head_width=0.3, # default: 3*width head_length=0.6, overhang=0.4) # Plot the blue dashed lines line1 = np.array([points[0], points[1]]) ax.plot(line1[:, 0], line1[:, 1], c='b', lw=3.5, alpha=0.5, ls='--') line2 = np.array([points[2], points[1]]) ax.plot(line2[:, 0], line2[:, 1], c='b', lw=3.5, alpha=0.5, ls='--') # Add the text for the distance using raw string ax.text(-2, 6, r'$\|\|\mathbf{u}-\mathbf{v}\|\|$', size=16) ###################### Axis, Spines, Ticks ########################## ax.axis([-10, 10.1, -10.1, 10.1]) ax.spines['left'].set_position('center') ax.spines['right'].set_color('none') ax.spines['bottom'].set_position('center') ax.spines['top'].set_color('none') ax.xaxis.set_ticks_position('bottom') ax.yaxis.set_ticks_position('left') ax.minorticks_on() ax.tick_params(axis='both', direction='inout', length=12, width=2, which='major') ax.tick_params(axis='both', direction='inout', length=10, width=1, which='minor') plt.show()
Image in a Jupyter notebook

From the graph, we can see that $\|\mathbf{u}-\mathbf{v}\|$ is $\sqrt{5^2 + 5^2}$.

np.sqrt(5**2 + 5**2)

7.07106781186548

This is the same result as np.linalg.norm(u - v).

Orthogonal Vectors

Take two vectors $\mathbf{u}$ and $\mathbf{v}$, and square the distances $\operatorname{dist}(\mathbf{u},-\mathbf{v}) = \|\mathbf{u}+\mathbf{v}\|$ and $\operatorname{dist}(\mathbf{u},\mathbf{v}) = \|\mathbf{u}-\mathbf{v}\|$:

$$\begin{aligned} \big[\operatorname{dist}(\mathbf{u},-\mathbf{v})\big]^{2} &=\|\mathbf{u}-(-\mathbf{v})\|^{2}=\|\mathbf{u}+\mathbf{v}\|^{2} \\ &=(\mathbf{u}+\mathbf{v}) \cdot(\mathbf{u}+\mathbf{v}) \\ &=\mathbf{u} \cdot(\mathbf{u}+\mathbf{v})+\mathbf{v} \cdot(\mathbf{u}+\mathbf{v}) \\ &=\mathbf{u} \cdot \mathbf{u}+\mathbf{u} \cdot \mathbf{v}+\mathbf{v} \cdot \mathbf{u}+\mathbf{v} \cdot \mathbf{v} \\ &=\|\mathbf{u}\|^{2}+\|\mathbf{v}\|^{2}+2 \mathbf{u} \cdot \mathbf{v} \end{aligned}$$

$$\begin{aligned} \big[\operatorname{dist}(\mathbf{u}, \mathbf{v})\big]^{2} &=\|\mathbf{u}\|^{2}+\|-\mathbf{v}\|^{2}+2 \mathbf{u} \cdot(-\mathbf{v}) \\ &=\|\mathbf{u}\|^{2}+\|\mathbf{v}\|^{2}-2 \mathbf{u} \cdot \mathbf{v} \end{aligned}$$

Suppose $\mathbf{u} = (1, 4)$ and $\mathbf{v} = (-2, 2)$; visualize the vectors and distances.

fig, ax = plt.subplots(figsize=(12, 12)) vects = np.array([[1, 4], [-2, 2], [2, -2]]) colr = ['red', 'blue', 'blue'] cordt = ['$(1, 4)$', '$(-2, 2)$', '$(2, -2)$'] vec_name = [r'$\mathbf{u}$', r'$\mathbf{v}$', r'$\mathbf{-v}$'] for i in range(3): ax.arrow(0, 0, vects[i][0], vects[i][1], color=colr[i], width=0.08, length_includes_head=True, head_width=0.3, # default: 3*width head_length=0.6, overhang=0.4) ax.text(x=vects[i][0], y=vects[i][1], s=cordt[i], size=15) ax.text(x=vects[i][0]/2, y=vects[i][1]/2, s=vec_name[i], size=22) ax.text(x=-1, y=3, s=r'$\|\|\mathbf{u}-\mathbf{v}\|\|$', size=15) ax.text(x=1.5, y=1, s=r'$\|\|\mathbf{u}+\mathbf{v}\|\|$', size=15) ############################### Dashed Line ####################### line1 = np.array([vects[0], vects[1]]) ax.plot(line1[:, 0], line1[:, 1], c='b', lw=3.5, alpha=0.5, ls='--') line2 = np.array([vects[0], vects[2]]) ax.plot(line2[:, 0], line2[:, 1], c='b', lw=3.5, alpha=0.5, ls='--') ###################### Axis, Spines, Ticks ########################## ax.spines['left'].set_position('center') ax.spines['right'].set_color('none') ax.spines['bottom'].set_position('center') ax.spines['top'].set_color('none') ax.xaxis.set_ticks_position('bottom') ax.yaxis.set_ticks_position('left') ax.minorticks_on() ax.tick_params(axis='both', direction='inout', length=12, width=2, which='major') ax.tick_params(axis='both', direction='inout', length=10, width=1, which='minor') ax.axis([-5, 5, -5, 5]) ax.grid() plt.show()
Image in a Jupyter notebook

Note that if $\big[\operatorname{dist}(\mathbf{u},-\mathbf{v})\big]^{2} = \big[\operatorname{dist}(\mathbf{u}, \mathbf{v})\big]^{2}$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal. Equating the two expansions above cancels everything except the cross terms, so it must be that

$$\mathbf{u} \cdot \mathbf{v} = 0$$

This is one of the most important conclusions in linear algebra.

Suppose there is another vector $\mathbf{w} = (2.5, 2.5)$; let's add it to the graph.

fig, ax = plt.subplots(figsize=(12, 12)) vects = np.array([[1, 4], [-2, 2], [2, -2], [2.5, 2.5]]) colr = ['red', 'blue', 'blue', 'green'] cordt = ['$(1, 4)$', '$(-2, 2)$', '$(2, -2)$', '$(2.5, 2.5)$'] vec_name = [r'$\mathbf{u}$', r'$\mathbf{v}$', r'$\mathbf{-v}$', r'$\mathbf{w}$'] for i in range(4): ax.arrow(0, 0, vects[i][0], vects[i][1], color=colr[i], width=0.08, length_includes_head=True, head_width=0.3, # default: 3*width head_length=0.6, overhang=0.4) ax.text(x=vects[i][0], y=vects[i][1], s=cordt[i], size=15) ax.text(x=vects[i][0]/2, y=vects[i][1]/2, s=vec_name[i], size=22) ax.text(x=-1, y=3, s=r'$\|\|\mathbf{u}-\mathbf{v}\|\|$', size=15) ax.text(x=1.5, y=1, s=r'$\|\|\mathbf{u}+\mathbf{v}\|\|$', size=15) ############################### Dashed Line ####################### line1 = np.array([vects[0], vects[1]]) ax.plot(line1[:, 0], line1[:, 1], c='b', lw=3.5, alpha=0.5, ls='--') line2 = np.array([vects[0], vects[2]]) ax.plot(line2[:, 0], line2[:, 1], c='b', lw=3.5, alpha=0.5, ls='--') line3 = np.array([vects[1], vects[3]]) ax.plot(line3[:, 0], line3[:, 1], c='k', lw=3.5, alpha=0.5, ls='--') line4 = np.array([vects[2], vects[3]]) ax.plot(line4[:, 0], line4[:, 1], c='k', lw=3.5, alpha=0.5, ls='--') ###################### Axis, Spines, Ticks ########################## ax.spines['left'].set_position('center') ax.spines['right'].set_color('none') ax.spines['bottom'].set_position('center') ax.spines['top'].set_color('none') ax.xaxis.set_ticks_position('bottom') ax.yaxis.set_ticks_position('left') ax.minorticks_on() ax.tick_params(axis='both', direction='inout', length=12, width=2, which='major') ax.tick_params(axis='both', direction='inout', length=10, width=1, which='minor') ax.axis([-5, 5, -5, 5]) ax.grid() plt.show()
Image in a Jupyter notebook

To use the SciPy built-in function, construct two $2\times 2$ matrices, each holding the coordinates of a pair of points (one point per row).

a = np.array([[1, 4], [-2, 2]])
b = np.array([[1, 4], [2, -2]])
distance = sp.spatial.distance.pdist(a, 'euclidean'); distance
array([3.60555128])
distance = sp.spatial.distance.pdist(b, 'euclidean');distance
array([6.08276253])

Verify with NumPy's np.linalg.norm().

def dist(u, v):
    a = np.linalg.norm(u - v)
    return a
u = np.array([1, 4])
v = np.array([-2, 2])
dist(u, v)

3.60555127546399

u = np.array([1, 4])
v = np.array([2, -2])
dist(u, v)

6.08276253029822

Now let's test whether the vector $(2.5, 2.5)^T$ is perpendicular to $(2, -2)^T$ and $(-2, 2)^T$.

a = np.array([[2.5, 2.5], [-2, 2]])
b = np.array([[2.5, 2.5], [2, -2]])
distance1 = sp.spatial.distance.pdist(a, 'euclidean')
distance2 = sp.spatial.distance.pdist(b, 'euclidean')
print(distance1, distance2)
[4.52769257] [4.52769257]

The two distances are equal, which means $\mathbf{w}\perp \mathbf{v}$ and $\mathbf{w}\perp -\mathbf{v}$.
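
Equivalently, orthogonality can be checked directly with the dot product, which must be zero:

w = np.array([2.5, 2.5])
v = np.array([-2, 2])
print(w @ v, w @ (-v))  # both are 0, so w is orthogonal to v and -v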

Orthogonal Complements

In general, the set of all vectors $\mathbf{z}$ that are orthogonal to a subspace $W$ is called the orthogonal complement of $W$, denoted $W^\perp$.

The most common examples are

$$(\operatorname{Row} A)^{\perp}=\operatorname{Nul} A \quad \text { and } \quad(\operatorname{Col} A)^{\perp}=\operatorname{Nul} A^{T}$$

The nullspace of $A$ is perpendicular to the row space of $A$; the nullspace of $A^T$ is perpendicular to the column space of $A$.
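
As a quick numerical sketch (the matrix below is just an arbitrary example), we can compute a basis of the null space with scipy.linalg.null_space and check that every row of $A$ is orthogonal to it:

from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
N = null_space(A)            # columns of N form a basis of Nul A
print(np.round(A @ N, 12))   # essentially zero: each row of A is orthogonal to Nul A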

Angles in $\mathbb{R}^n$

Here is the formula for calculating angles between vectors; to derive it we need the law of cosines:

$$\|\mathbf{u}-\mathbf{v}\|^{2}=\|\mathbf{u}\|^{2}+\|\mathbf{v}\|^{2}-2\|\mathbf{u}\|\|\mathbf{v}\| \cos \vartheta$$

Rearranging, we get

$$\begin{aligned} \|\mathbf{u}\|\|\mathbf{v}\| \cos \vartheta &=\frac{1}{2}\left[\|\mathbf{u}\|^{2}+\|\mathbf{v}\|^{2}-\|\mathbf{u}-\mathbf{v}\|^{2}\right] \\ &=\frac{1}{2}\left[u_{1}^{2}+u_{2}^{2}+v_{1}^{2}+v_{2}^{2}-\left(u_{1}-v_{1}\right)^{2}-\left(u_{2}-v_{2}\right)^{2}\right] \\ &=u_{1} v_{1}+u_{2} v_{2} \\ &=\mathbf{u} \cdot \mathbf{v} \end{aligned}$$

In statistics, $\cos{\vartheta}$ is called the correlation coefficient (when the vectors are mean-centered).

$$\cos{\vartheta}=\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}$$
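
As a quick sketch, compute the angle between the two vectors $\mathbf{u} = (2, 9)$ and $\mathbf{v} = (-3, 4)$ from the distance example above:

u = np.array([2, 9])
v = np.array([-3, 4])
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)           # angle in radians
print(cos_theta, np.degrees(theta))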

Geometric Interpretation of Dot Product

You may have noticed that the terms dot product and inner product are sometimes used interchangeably. However, they have distinct meanings.

Functions and polynomials can also have inner products, but we typically use the term dot product for the standard inner product on $\mathbb{R}^n$.

The dot product has an interesting geometric interpretation. Consider two vectors, $\mathbf{a}$ and $\mathbf{u}$, pointing in different directions, with an angle $\vartheta$ between them. Suppose $\mathbf{u}$ is a unit vector. We want to determine how much $\mathbf{a}$ is aligned with the direction of $\mathbf{u}$.

This value can be calculated by projecting $\mathbf{a}$ onto $\mathbf{u}$:

$$\|\mathbf{a}\|\cos{\vartheta} = \|\mathbf{a}\| \|\mathbf{u}\|\cos{\vartheta} = \mathbf{a} \cdot \mathbf{u}$$

Any vector $\mathbf{b}$ can be normalized to a unit vector $\mathbf{u}$. By performing the calculation above, we can determine how much $\mathbf{b}$ is aligned with the direction of $\mathbf{u}$.
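
For instance, a small sketch with assumed example vectors: normalize $\mathbf{b}$ and measure how much $\mathbf{a}$ points along its direction.

a = np.array([4, 5])
b = np.array([2, 1])
u = b / np.linalg.norm(b)   # normalize b to a unit vector
print(a @ u)                # scalar projection of a onto the direction of b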

Orthogonal Sets

If every pair of distinct vectors in a set $S = \left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{p}\right\}$ in $\mathbb{R}^{n}$ is orthogonal, i.e. $\mathbf{u}_{i} \cdot \mathbf{u}_{j}=0$ whenever $i \neq j$, then $S$ is called an orthogonal set.

Naturally, an orthogonal set $S$ of nonzero vectors is linearly independent, so it is also an orthogonal basis for the space spanned by $\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{p}\right\}$. One advantage of an orthogonal basis is that coordinates with respect to it can be computed quickly.

For instance, for any $\mathbf{y}$ in $W = \operatorname{Span}\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{p}\right\}$,

$$\mathbf{y}=c_{1} \mathbf{u}_{1}+\cdots+c_{p} \mathbf{u}_{p}$$

Because it is an orthogonal set,

$$\mathbf{y} \cdot \mathbf{u}_{1}=\left(c_{1} \mathbf{u}_{1}+c_{2} \mathbf{u}_{2}+\cdots+c_{p} \mathbf{u}_{p}\right) \cdot \mathbf{u}_{1}=c_{1}\left(\mathbf{u}_{1} \cdot \mathbf{u}_{1}\right)$$

Thus

$$c_{j}=\frac{\mathbf{y} \cdot \mathbf{u}_{j}}{\mathbf{u}_{j} \cdot \mathbf{u}_{j}} \quad(j=1, \ldots, p)$$
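
A short sketch with an assumed orthogonal set $\mathbf{u}_1 = (3, 1)^T$ and $\mathbf{u}_2 = (-1, 3)^T$: compute the coordinates of $\mathbf{y} = (6, 3)^T$ and reconstruct it.

u1 = np.array([3, 1])
u2 = np.array([-1, 3])
y = np.array([6, 3])
c1 = (y @ u1) / (u1 @ u1)   # coordinate along u1
c2 = (y @ u2) / (u2 @ u2)   # coordinate along u2
print(c1, c2)
print(c1 * u1 + c2 * u2)    # reconstructs y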

Orthogonal Projection

For any $\mathbf{y}$ in $\mathbb{R}^n$, we want to decompose it as

$$\mathbf{y} = \alpha \mathbf{u}+\mathbf{z} = \hat{\mathbf{y}}+\mathbf{z}$$

where $\mathbf{u}$ is perpendicular to $\mathbf{z}$ and $\alpha$ is a scalar. With $L$ the subspace spanned by $\mathbf{u}$, the projection of $\mathbf{y}$ onto $L$ is denoted

$$\hat{\mathbf{y}}=\operatorname{proj}_{L} \mathbf{y}=\frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \mathbf{u}$$

Because $\mathbf{u}\perp \mathbf{z}$, we have $\mathbf{u}\cdot \mathbf{z} = 0$; replacing $\mathbf{z}$ by $\mathbf{y}- \alpha \mathbf{u}$:

$$(\mathbf{y}- \alpha \mathbf{u})\cdot \mathbf{u}= 0\\ \alpha = \frac{\mathbf{y}\cdot \mathbf{u}}{\mathbf{u}\cdot \mathbf{u}}$$

Now we have the formula for the projection onto $L$, the subspace spanned by $\mathbf{u}$.

A Visual Example in $\mathbb{R}^2$

Suppose we have $\mathbf{y} = (4, 5)^T$ and $\mathbf{u} = (2, 1)^T$. Plot $\hat{\mathbf{y}}$ onto the subspace $L$ spanned by $\mathbf{u}$.

fig, ax = plt.subplots(figsize=(12, 12))
vects = np.array([[4, 5], [2, 1]])
colr = ['red', 'blue']
cordt = ['$(4, 5)$', '$(2, 1)$']
vec_name = [r'$\mathbf{y}$', r'$\mathbf{u}$']

for i in range(2):
    ax.arrow(0, 0, vects[i][0], vects[i][1], color=colr[i], width=0.03,
             length_includes_head=True,
             head_width=0.1,  # default: 3*width
             head_length=0.2,
             overhang=0.4)
    ax.text(x=vects[i][0], y=vects[i][1], s=cordt[i], size=15, color=colr[i], verticalalignment='bottom')
    ax.text(x=vects[i][0]/2, y=vects[i][1]/2, s=vec_name[i], size=22, color=colr[i], verticalalignment='bottom')

################################### Subspace L ############################
x = np.linspace(0, 8.1, 100)
y = 1/2 * x
ax.plot(x, y, lw=3, color='red', alpha=0.5)
ax.text(x=6.5, y=3, s=r'$L = \operatorname{Span}(\mathbf{u})$', size=19, color='red')

ax.axis([0, 8, 0, 8])
ax.grid()
plt.show()
Image in a Jupyter notebook

Let's use the formula to compute $\alpha$ and $\hat{\mathbf{y}}$.

y = np.array([4, 5])
u = np.array([2, 1])
alpha = (y @ u) / (u @ u); alpha

2.6

yhat = alpha*u;yhat
array([5.2, 2.6])

With the results above, we can plot the orthogonal projection.

fig, ax = plt.subplots(figsize=(12, 12)) vects = np.array([[4, 5], [2, 1], [5.2, 2.6]]) colr = ['red', 'blue', 'green'] cordt = ['$(4, 5)$', '$(2, 1)$', r'$(5.2, 2.6)$'] vec_name = [r'$\mathbf{y}$', r'$\mathbf{u}$', r'$\hat{\mathbf{y}} = \alpha\mathbf{u}$'] for i in range(3): ax.arrow(0, 0, vects[i][0], vects[i][1], color=colr[i], width=0.03, length_includes_head=True, head_width=0.1, # default: 3*width head_length=0.2, overhang=0.4, zorder=-i) ax.text(x=vects[i][0], y=vects[i][1], s=cordt[i], size=19) ax.text(x=vects[i][0]/2, y=vects[i][1]/2, s=vec_name[i], size=22) ##################################### Components of y orthogonal to u ########################## point1 = [4, 5] point2 = [5.2, 2.6] line1 = np.array([point1, point2]) ax.plot(line1[:, 0], line1[:, 1], c='k', lw=3.5, alpha=0.5, ls='--') ax.text(4.7, 3.8, r'$\mathbf{z}$', size=22) ################################### Subspace L ############################ x = np.linspace(0, 8.1, 100) y = 0.5 * x ax.plot(x, y, lw=3, color='red', alpha=0.5, zorder=-3) ax.text(x=6.5, y=3, s=r'$L = \operatorname{Span}(\mathbf{u})$', size=19) ax.axis([0, 8, 0, 8]) ax.grid() #################################### Formula ################################ ax.text(x=1, y=7, s=r'$(\mathbf{y} - \alpha \mathbf{u}) \cdot \mathbf{u} = 0$', size=22, color='b') ax.text(x=1, y=6.5, s=r'$\alpha = \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}$', size=22, color='b') ax.text(x=1, y=6, s=r'$\operatorname{proj}_{L} \mathbf{y} = \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \mathbf{u}$', size=22, color='b') plt.show()
Image in a Jupyter notebook

The Orthogonal Decomposition Theorem

To generalize orthogonal projection to the higher-dimensional $\mathbb{R}^n$, we summarize the idea in the orthogonal decomposition theorem.

Let $W$ be a subspace of $\mathbb{R}^{n}$. Then each $\mathbf{y}$ in $\mathbb{R}^{n}$ can be written uniquely in the form

$$\mathbf{y}=\hat{\mathbf{y}}+\mathbf{z}$$

where $\hat{\mathbf{y}}$ is in $W$ and $\mathbf{z}$ is in $W^{\perp}$. In fact, if $\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{p}\right\}$ is any orthogonal basis of $W$, then

$$\hat{\mathbf{y}}=\frac{\mathbf{y} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}+\cdots+\frac{\mathbf{y} \cdot \mathbf{u}_{p}}{\mathbf{u}_{p} \cdot \mathbf{u}_{p}} \mathbf{u}_{p}$$

and $\mathbf{z}=\mathbf{y}-\hat{\mathbf{y}}$.

In $\mathbb{R}^{2}$, we projected $\mathbf{y}$ onto the subspace $L$ spanned by $\mathbf{u}$; here we generalize the formula to $\mathbb{R}^{n}$, where $\mathbf{y}$ is projected onto $W$, which is spanned by $\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{p}\right\}$.

A Visual Example in $\mathbb{R}^{3}$

Given a subspace $W=\operatorname{Span}\left\{\mathbf{u}_{1}, \mathbf{u}_{2}\right\}$ and a vector $\mathbf{y}$ not in $W$, decompose $\mathbf{y}$ into $\hat{\mathbf{y}} + \mathbf{z}$ and plot them,

where

$$\mathbf{u}_{1}=\left[\begin{array}{r} 2 \\ 5 \\ -1 \end{array}\right], \quad \mathbf{u}_{2}=\left[\begin{array}{r} -2 \\ 1 \\ 1 \end{array}\right], \quad \text { and } \quad \mathbf{y}=\left[\begin{array}{l} 1 \\ 2 \\ 3 \end{array}\right]$$

The projection onto $W$ in $\mathbb{R}^3$ is

$$\hat{\mathbf{y}}=\frac{\mathbf{y} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}+\frac{\mathbf{y} \cdot \mathbf{u}_{2}}{\mathbf{u}_{2} \cdot \mathbf{u}_{2}} \mathbf{u}_{2}=\hat{\mathbf{y}}_{1}+\hat{\mathbf{y}}_{2}$$
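
Before plotting, here is a compact numerical sketch of the decomposition (the same computation is embedded in the plotting code below):

u1 = np.array([2, 5, -1])
u2 = np.array([-2, 1, 1])
y = np.array([1, 2, 3])
yhat = (y @ u1) / (u1 @ u1) * u1 + (y @ u2) / (u2 @ u2) * u2
z = y - yhat
print(yhat, z)
print(z @ u1, z @ u2)   # both are 0, so z lies in the orthogonal complement of W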

The plotting code is quite verbose, but exceedingly intuitive.

######################## Subspace W ############################## s = np.linspace(-0.5, 0.5, 10) t = np.linspace(-0.5, 0.5, 10) S, T = np.meshgrid(s, t) X1 = 2 * S - 2 * T X2 = 5 * S + T X3 = -S + T fig = plt.figure(figsize=(7, 7)) ax = fig.add_subplot(projection='3d') ax.plot_wireframe(X1, X2, X3, linewidth=1.5, alpha=0.3) ########################### Vector y ############################### y = np.array([1, 2, 3]) u1, u2 = np.array([2, 5, -1]), np.array([-2, 1, 1]) vec = np.array([[0, 0, 0, y[0], y[1], y[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='red', alpha=0.6, arrow_length_ratio=0.08, pivot='tail', linestyles='solid', linewidths=3) ax.text(y[0], y[1], y[2], r'$\mathbf{y}$', size=15) ########################### Vector u1 and u2 ############################### vec = np.array([[0, 0, 0, u1[0], u1[1], u1[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='red', alpha=0.6, arrow_length_ratio=0.08, pivot='tail', linestyles='solid', linewidths=3) vec = np.array([[0, 0, 0, u2[0], u2[1], u2[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='red', alpha=0.6, arrow_length_ratio=0.08, pivot='tail', linestyles='solid', linewidths=3) ax.text(u1[0], u1[1], u1[2], r'$\mathbf{u}_1$', size=15) ax.text(u2[0], u2[1], u2[2], r'$\mathbf{u}_2$', size=15) ########################### yhat ############################### alpha1 = (y @ u1) / (u1 @ u1) alpha2 = (y @ u2) / (u2 @ u2) yhat1 = alpha1 * u1 yhat2 = alpha2 * u2 yhat = yhat1 + yhat2 vec = np.array([[0, 0, 0, yhat1[0], yhat1[1], yhat1[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='blue', alpha=0.6, arrow_length_ratio=0.08, pivot='tail', linestyles='solid', linewidths=3, zorder=3) vec = np.array([[0, 0, 0, yhat2[0], yhat2[1], yhat2[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='blue', alpha=0.6, arrow_length_ratio=0.08, pivot='tail', linestyles='solid', linewidths=3, zorder=3) vec = np.array([[0, 0, 0, yhat[0], yhat[1], yhat[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='pink', alpha=1, arrow_length_ratio=0.12, pivot='tail', linestyles='solid', linewidths=3, zorder=3) ax.text(yhat1[0], yhat1[1], yhat1[2], r'$\hat{\mathbf{y}}_1$', size=15) ax.text(yhat2[0], yhat2[1], yhat2[2], r'$\hat{\mathbf{y}}_2$', size=15) ax.text(x=yhat[0], y=yhat[1], z=yhat[2], s=r'$\hat{\mathbf{y}}=\frac{\mathbf{y} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}+\frac{\mathbf{y} \cdot \mathbf{u}_{2}}{\mathbf{u}_{2} \cdot \mathbf{u}_{2}} \mathbf{u}_{2}=\hat{\mathbf{y}}_{1}+\hat{\mathbf{y}}_{2}$', size=15) ########################### z ############################### z = y - yhat vec = np.array([[yhat[0], yhat[1], yhat[2], z[0], z[1], z[2]]]) X, Y, Z, U, V, W = zip(*vec) ax.quiver(X, Y, Z, U, V, W, length=1, normalize=False, color='green', alpha=0.6, arrow_length_ratio=0.08, pivot='tail', linestyles='solid', linewidths=3) ############################ Dashed Line #################### line1 = np.array([y, yhat1]) ax.plot(line1[:, 0], line1[:, 1], line1[:, 2], c='b', lw=3.5, alpha=0.5, ls='--') line2 = np.array([y, yhat2]) ax.plot(line2[:, 0], line2[:, 1], line2[:, 2], c='b', lw=2.5, alpha=0.5, ls='--') line3 = np.array([yhat, yhat2]) ax.plot(line3[:, 0], line3[:, 1], line3[:, 2], c='b', lw=2.5, alpha=0.5, ls='--') line4 = np.array([yhat, yhat1]) ax.plot(line4[:, 0], 
line4[:, 1], line4[:, 2], c='b', lw=2.5, alpha=0.5, ls='--') ############################# View Angle ax.view_init(elev=-6, azim=12) plt.show()
Image in a Jupyter notebook

Orthonormal Sets

An orthonormal set is obtained by normalizing the vectors of an orthogonal set; it is also called an orthonormal basis of the subspace it spans.

Matrices whose columns form an orthonormal set are important for matrix computation.

Let $U=\left[\begin{array}{lll} \mathbf{u}_{1} & \mathbf{u}_{2} & \mathbf{u}_{3} \end{array}\right]$, where the $\mathbf{u}_{i}$'s come from an orthonormal set. The product $U^TU$ is

$$U^{T} U=\left[\begin{array}{l} \mathbf{u}_{1}^{T} \\ \mathbf{u}_{2}^{T} \\ \mathbf{u}_{3}^{T} \end{array}\right]\left[\begin{array}{lll} \mathbf{u}_{1} & \mathbf{u}_{2} & \mathbf{u}_{3} \end{array}\right]=\left[\begin{array}{lll} \mathbf{u}_{1}^{T} \mathbf{u}_{1} & \mathbf{u}_{1}^{T} \mathbf{u}_{2} & \mathbf{u}_{1}^{T} \mathbf{u}_{3} \\ \mathbf{u}_{2}^{T} \mathbf{u}_{1} & \mathbf{u}_{2}^{T} \mathbf{u}_{2} & \mathbf{u}_{2}^{T} \mathbf{u}_{3} \\ \mathbf{u}_{3}^{T} \mathbf{u}_{1} & \mathbf{u}_{3}^{T} \mathbf{u}_{2} & \mathbf{u}_{3}^{T} \mathbf{u}_{3} \end{array}\right]$$

Each entry $\mathbf{u}_{i}^{T} \mathbf{u}_{j}$, $i, j \in \{1, 2, 3\}$, is an inner product.

Because the set is orthonormal, we have

$$\mathbf{u}_{1}^{T} \mathbf{u}_{2}=\mathbf{u}_{2}^{T} \mathbf{u}_{1}=0, \quad \mathbf{u}_{1}^{T} \mathbf{u}_{3}=\mathbf{u}_{3}^{T} \mathbf{u}_{1}=0, \quad \mathbf{u}_{2}^{T} \mathbf{u}_{3}=\mathbf{u}_{3}^{T} \mathbf{u}_{2}=0\\ \mathbf{u}_{1}^{T} \mathbf{u}_{1}=1, \quad \mathbf{u}_{2}^{T} \mathbf{u}_{2}=1, \quad \mathbf{u}_{3}^{T} \mathbf{u}_{3}=1$$

which means

$$U^TU = I$$
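
As a quick sketch, we can build a matrix with orthonormal columns from a random matrix using np.linalg.qr and verify that $U^TU = I$:

M = np.random.randn(5, 3)
U, _ = np.linalg.qr(M)        # columns of U are orthonormal
print(np.round(U.T @ U, 10))  # the 3x3 identity matrix, up to rounding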

Recall that we have a general projection formula

$$\hat{\mathbf{y}}=\frac{\mathbf{y} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}+\cdots+\frac{\mathbf{y} \cdot \mathbf{u}_{p}}{\mathbf{u}_{p} \cdot \mathbf{u}_{p}} \mathbf{u}_{p} =\frac{\mathbf{y} \cdot \mathbf{u}_{1}}{{\mathbf{u}_{1}^T} \mathbf{u}_{1}} \mathbf{u}_{1}+\cdots+\frac{\mathbf{y} \cdot \mathbf{u}_{p}}{{\mathbf{u}_{p}^T} \mathbf{u}_{p}} \mathbf{u}_{p}$$

If $\{\mathbf{u}_1,\ldots,\mathbf{u}_p\}$ is an orthonormal set, then

$$\begin{aligned} \hat{\mathbf{y}}&=\left(\mathbf{y} \cdot \mathbf{u}_{1}\right) \mathbf{u}_{1}+\left(\mathbf{y} \cdot \mathbf{u}_{2}\right) \mathbf{u}_{2}+\cdots+\left(\mathbf{y} \cdot \mathbf{u}_{p}\right) \mathbf{u}_{p}\\ & = \left(\mathbf{u}_{1}^T\mathbf{y}\right) \mathbf{u}_{1}+\left(\mathbf{u}_{2}^T\mathbf{y}\right) \mathbf{u}_{2}+\cdots+\left(\mathbf{u}_{p}^T\mathbf{y}\right) \mathbf{u}_{p} \end{aligned}$$
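
In matrix form this is $\hat{\mathbf{y}} = UU^T\mathbf{y}$. A small self-contained sketch:

M = np.random.randn(5, 3)
U, _ = np.linalg.qr(M)          # orthonormal columns spanning a subspace W
y = np.random.randn(5)
yhat = U @ U.T @ y              # projection of y onto W
z = y - yhat
print(np.round(U.T @ z, 10))    # essentially zero: the residual is orthogonal to W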

Cross Product

This is the formula for the cross product:

$$\mathbf{a} \times \mathbf{b}=\|\mathbf{a}\|\|\mathbf{b}\| \sin (\theta)\, \mathbf{n}$$

where $\mathbf{n}$ is a unit vector perpendicular to both $\mathbf{a}$ and $\mathbf{b}$. The output of the cross product is a vector $\mathbf{c}$ perpendicular to both $\mathbf{a}$ and $\mathbf{b}$, with length $\|\mathbf{a}\|\|\mathbf{b}\| \sin (\theta)$.

# Define the two vectors a = np.array([5, 2, 3]) b = np.array([4, 8, 10]) # Calculate the cross product c = np.cross(a, b) # Create the 3D plot fig = go.Figure(data=[ go.Scatter3d(x=[0, a[0]], y=[0, a[1]], z=[0, a[2]], mode='lines', name='Vector a', line=dict(color='red', width=5)), go.Scatter3d(x=[0, b[0]], y=[0, b[1]], z=[0, b[2]], mode='lines', name='Vector b', line=dict(color='green', width=5)), go.Scatter3d(x=[0, c[0]], y=[0, c[1]], z=[0, c[2]], mode='lines', name='Vector c (cross product)', line=dict(color='blue', width=5)) ]) # Set the axis labels fig.update_layout(scene=dict(xaxis_title='X', yaxis_title='Y', zaxis_title='Z')) # Show the plot fig.show()
Interactive Plotly figure in a Jupyter notebook
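
As a quick check, we can verify numerically that $\mathbf{c}$ is perpendicular to both inputs and that its length equals $\|\mathbf{a}\|\|\mathbf{b}\|\sin\theta$:

a = np.array([5, 2, 3])
b = np.array([4, 8, 10])
c = np.cross(a, b)
print(c @ a, c @ b)   # both are 0: c is perpendicular to a and b
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
sin_theta = np.sqrt(1 - cos_theta**2)
print(np.linalg.norm(c),
      np.linalg.norm(a) * np.linalg.norm(b) * sin_theta)   # same value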