Singular Value Decomposition

SVD Theorem

For each matrix $A \in \mathbb{R}^{n \times m}$, there exist a padded diagonal matrix $\Sigma \in \mathbb{R}^{n \times m}$ and orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ such that $A$ can be expressed as

$$A = U \Sigma V^T,$$

where

$$\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_{\min\{n,m\}}).$$

One can often prune the columns of $U$ or $V$ that correspond to zero rows or columns of $\Sigma$. For example, if $n > m$, then we can write
$$A = U \Sigma V^T = U_{:,1:m} \Sigma_{1:m} V^T,$$
where $U_{:,1:m}$ is the first $m$ columns of $U$, and $\Sigma_{1:m}$ is the first $m$ rows (the only possibly non-zero rows) of $\Sigma$.
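
A minimal sketch of the full and reduced factorizations, assuming NumPy (`numpy.linalg.svd`); the matrix shape is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                      # n > m, so the last n - m columns of U can be pruned
A = rng.standard_normal((n, m))

# Full SVD: U is n x n, Vt is m x m, s holds the min{n, m} singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((n, m))
Sigma[:m, :m] = np.diag(s)       # padded diagonal matrix
assert np.allclose(A, U @ Sigma @ Vt)

# Reduced SVD: keep only the first m columns of U and the first m rows of Sigma.
U_r, s_r, Vt_r = np.linalg.svd(A, full_matrices=False)   # U_r is n x m
assert np.allclose(A, U_r @ np.diag(s_r) @ Vt_r)
```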

SVD and Eigendecomposition

The SVD of $A$ is intimately related to the eigendecompositions of the products of $A$ with its transpose:

$$A A^T = U \Sigma V^T V \Sigma^T U^T = U \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2) U^T,$$
$$A^T A = V \Sigma^T U^T U \Sigma V^T = V \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2) V^T,$$

where $\sigma_r = 0$ if $r > \min\{n,m\}$.

One can also note that in PCA, we perform an eigendecomposition of the covariance matrix $X X^T$. Therefore, we can equivalently apply SVD to the data matrix $X$ to identify the principal eigenvectors.
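
A short numerical check of this relationship, assuming NumPy; the centered data matrix `X` is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 100))          # rows are features, columns are samples
X = X - X.mean(axis=1, keepdims=True)      # center so that X @ X.T is the (unnormalized) covariance

U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(X @ X.T) # eigendecomposition of the covariance matrix

# Squared singular values of X match the eigenvalues of X X^T;
# the columns of U are the principal eigenvectors (up to sign and ordering).
assert np.allclose(np.sort(s ** 2)[::-1], np.sort(eigvals)[::-1])
```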

SVD and Frobenius Norm

The squared Frobenius norm of a matrix $A \in \mathbb{R}^{n \times m}$ is the sum of its squared singular values:

$$\|A\|_F^2 = \sum_{i=1}^{\min\{n,m\}} \sigma_i^2.$$
Proof

Using the cyclic property of the trace,
$$\|A\|_F^2 = \operatorname{tr}(A^T A) = \operatorname{tr}\big(V \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2) V^T\big) = \operatorname{tr}\big(\operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2) \underbrace{V^T V}_{I}\big) = \operatorname{tr}\big(\operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2)\big) = \sum_{i=1}^{\min\{n,m\}} \sigma_i^2.$$
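
A quick numerical check of this identity (a NumPy sketch; the matrix is an arbitrary example):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)     # singular values only

# ||A||_F^2 equals the sum of the squared singular values.
assert np.isclose(np.linalg.norm(A, "fro") ** 2, np.sum(s ** 2))
```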

SVD and Spectral Norm

The spectral norm of a matrix $A$ equals the largest singular value $\sigma_1$:

$$\|A\|_2 := \sup\{\|Ax\| : \|x\| = 1\} = \sigma_1.$$
Proof

$$\sup\{\|Ax\| : \|x\| = 1\} = \sup\{\sqrt{x^T A^T A x} : \|x\| = 1\} = \sup\{\sqrt{x^T V \Sigma^T \Sigma V^T x} : \|x\| = 1\} = \sup\{\|\Sigma V^T x\| : \|x\| = 1\} = \sup\{\|\Sigma z\| : \|z\| = 1\} = \|\Sigma\|_2 = \sigma_1,$$
where the substitution $z = V^T x$ ranges over all unit vectors because $V$ is orthogonal.
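
An analogous check for the spectral norm, assuming NumPy (whose matrix norm with `ord=2` is the largest singular value):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)

# ||A||_2 equals sigma_1, the largest singular value.
assert np.isclose(np.linalg.norm(A, 2), s[0])
```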

SVD for Low-Rank Approximation

Eckart-Young Theorem

Let $A \in \mathbb{R}^{n \times m}$ have SVD $A = U \Sigma V^T$. Then for all $1 \le k \le \min\{n,m\}$:
$$A_k := \operatorname*{argmin}_{B} \{\|A - B\|_F : \operatorname{rank}(B) \le k\} = U \underbrace{\operatorname{diag}(\sigma_1, \dots, \sigma_k)}_{\Sigma_k} V^T,$$
where $\Sigma_k$ is rectangular and padded with zeros according to the dimensions of $U$ and $V$.

The theorem also gives the following corollary:

Corollary

The squared error of the rank-$k$ approximation can be expressed as
$$\|A - A_k\|_F^2 = \sum_{i=k+1}^{\operatorname{rank}(A)} \sigma_i^2.$$

In addition, $A_k$ also minimizes the approximation error measured in the spectral norm:

$$A_k = \operatorname*{argmin}_{B} \{\|A - B\|_2 : \operatorname{rank}(B) \le k\}.$$

The truncated SVD can be seen as an additive superposition of $k$ rank-1 matrices:

$$A \approx A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T,$$
where $u_i$ and $v_i$ denote the $i$-th columns of $U$ and $V$.
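
A sketch tying these pieces together, assuming NumPy: build the truncated SVD $A_k$, verify the error formula from the corollary, and form the same approximation as a sum of $k$ rank-1 terms (the matrix and $k$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

# Best rank-k approximation: keep the k largest singular values.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Corollary: squared Frobenius error is the sum of the discarded squared singular values.
assert np.isclose(np.linalg.norm(A - A_k, "fro") ** 2, np.sum(s[k:] ** 2))

# Equivalent view: additive superposition of k rank-1 matrices sigma_i * u_i v_i^T.
A_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
assert np.allclose(A_k, A_k_sum)
```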

SVD for Matrix Completion

For fully observed matrices, the SVD is computable with $O(\min\{nm^2, mn^2\})$ complexity, which makes it an important example of a tractable non-convex problem. However, in the case of incomplete observations, SVD is in general not directly applicable to compute low-rank approximations. Low-rank matrix approximation with a weighted Frobenius norm is defined as follows:

$$\hat{A}_k = \operatorname*{argmin}_{B} \sum_{i,j} w_{ij} (a_{ij} - b_{ij})^2, \quad \text{s.t. } \operatorname{rank}(B) = k,$$

where the weights $w_{ij} = \omega_{ij} \in \{0,1\}$ indicate observed entries in the collaborative filtering setting, or they can be arbitrary positive values. In both cases, the problem has been shown to be NP-hard, even for $k = 1$.
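
Since the weighted problem has no closed-form SVD solution, practice resorts to heuristics or relaxations. Below is a minimal sketch of one common heuristic, not a method prescribed by this text: alternately impute the missing entries and project onto rank-$k$ matrices with a truncated SVD. The function name `svd_impute`, the toy data, the mask, and the iteration count are all made-up choices for illustration.

```python
import numpy as np

def svd_impute(A, mask, k, n_iters=50):
    """Heuristic rank-k completion: alternate between filling missing entries
    and projecting onto rank-k matrices via truncated SVD."""
    B = np.where(mask, A, 0.0)                        # initialize missing entries with zeros
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        B_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k projection
        B = np.where(mask, A, B_k)                    # keep observed entries, impute the rest
    return B_k

# Toy example: a rank-2 matrix with roughly 30% of its entries hidden.
rng = np.random.default_rng(5)
A = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = rng.random(A.shape) > 0.3                      # True where w_ij = 1 (observed)
A_hat = svd_impute(A, mask, k=2)
print(np.linalg.norm((A - A_hat)[~mask]))             # error on the unobserved entries
```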