Rank 1 Matrix Approximation

Outer Product Models

The outer product model is the simplest matrix model that couples the entries within each row and each column. Given a matrix $A \in \mathbb{R}^{n \times m}$, we would like to approximate it with the outer product of two vectors $u \in \mathbb{R}^n$ and $v \in \mathbb{R}^m$, such that the squared error is minimized:

$$\ell(u, v) = \frac{1}{2} \left\| \Pi_\Omega\!\left(A - uv^T\right) \right\|_F^2,$$

where

$$\Pi_\Omega(R) = \Omega \circ R,$$

with $\Omega$ the binary indicator matrix of observed entries and $\circ$ the element-wise (Hadamard) product. We note that $uv^T$ is a rank-1 matrix.
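For concreteness, here is a minimal NumPy sketch of this masked objective; the sizes, the random mask, and the name `masked_loss` are illustrative choices rather than anything fixed by the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 4
A = rng.standard_normal((n, m))
Omega = (rng.random((n, m)) < 0.7).astype(float)  # binary mask of observed entries
u = rng.standard_normal(n)
v = rng.standard_normal(m)

def masked_loss(A, Omega, u, v):
    """l(u, v) = 1/2 * ||Pi_Omega(A - u v^T)||_F^2."""
    R = Omega * (A - np.outer(u, v))  # Pi_Omega zeroes out unobserved entries
    return 0.5 * np.sum(R ** 2)

print(masked_loss(A, Omega, u, v))
```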

Gradients

If $A$ is fully observed, the objective becomes

$$\ell(u, v) = \frac{1}{2} \left\| A - uv^T \right\|_F^2.$$

Taking partial derivatives with respect to $u$ and $v$, and letting $R = uv^T - A$, we have

$$\nabla_u \ell = Rv, \qquad \nabla_v \ell = R^T u.$$
Derivation

Using the chain rule,

$$\nabla_u \ell = \frac{\partial R}{\partial u} \cdot \frac{\partial}{\partial R} \frac{1}{2}\|R\|_F^2 = \sum_{ij} \frac{\partial r_{ij}}{\partial u} \, \frac{\partial}{\partial r_{ij}} \frac{1}{2}\|R\|_F^2.$$

The derivative of the Frobenius norm is

$$\frac{\partial}{\partial r_{ij}} \frac{1}{2}\|R\|_F^2 = r_{ij}.$$

Also,

$$\frac{\partial r_{ij}}{\partial u} = \frac{\partial \left(u_i v_j - a_{ij}\right)}{\partial u} = \mathrm{vec}_i^n[v_j],$$

where

$$\mathrm{vec}_i^n[v_j] := [\,0, \ldots, \underbrace{v_j}_{i\text{-th dimension}}, \ldots, 0\,]^T \in \mathbb{R}^n.$$

Therefore,

$$\nabla_u \ell = \sum_{ij} r_{ij} \, \mathrm{vec}_i^n[v_j] = \sum_i \mathrm{vec}_i^n\Big[\sum_j r_{ij} v_j\Big] = Rv.$$

Similarly, we have

$$\nabla_v \ell = R^T u.$$
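These gradient formulas can be sanity-checked with finite differences; the following is a sketch with arbitrary sizes, seed, and step size.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 4
A = rng.standard_normal((n, m))
u = rng.standard_normal(n)
v = rng.standard_normal(m)

R = np.outer(u, v) - A                 # R = u v^T - A
grad_u, grad_v = R @ v, R.T @ u        # analytic gradients derived above

def loss(u, v):
    return 0.5 * np.linalg.norm(np.outer(u, v) - A, "fro") ** 2

# Forward differences along each coordinate direction.
eps = 1e-6
num_grad_u = np.array([(loss(u + eps * np.eye(n)[i], v) - loss(u, v)) / eps
                       for i in range(n)])
num_grad_v = np.array([(loss(u, v + eps * np.eye(m)[j]) - loss(u, v)) / eps
                       for j in range(m)])
print(np.allclose(grad_u, num_grad_u, atol=1e-4),
      np.allclose(grad_v, num_grad_v, atol=1e-4))  # True True
```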

Convexity

Given the gradients, the Hessian matrix follows directly:

$$H(\ell(u,v)) = \begin{bmatrix} v^Tv \, I_{n \times n} & 2uv^T - A \\ \left(2uv^T - A\right)^T & u^Tu \, I_{m \times m} \end{bmatrix}.$$
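As a quick check of this block structure, one can compare the assembled Hessian against a finite-difference Hessian of the loss over the stacked variable $x = (u, v)$; the stacking, sizes, and tolerances below are incidental choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, m))
u = rng.standard_normal(n)
v = rng.standard_normal(m)

# Analytic Hessian, assembled block by block from the formula above.
C = 2 * np.outer(u, v) - A
H = np.block([[(v @ v) * np.eye(n), C],
              [C.T, (u @ u) * np.eye(m)]])

# Finite-difference Hessian: differentiate the gradient numerically.
def grad(x):
    u, v = x[:n], x[n:]
    R = np.outer(u, v) - A
    return np.concatenate([R @ v, R.T @ u])

x = np.concatenate([u, v])
eps = 1e-6
H_num = np.stack([(grad(x + eps * np.eye(n + m)[i]) - grad(x)) / eps
                  for i in range(n + m)])
print(np.allclose(H, H_num, atol=1e-4))  # True
```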

The Hessian matrix at the origin is

$$H(\ell(0,0)) = \begin{bmatrix} 0_{n \times n} & -A \\ -A^T & 0_{m \times m} \end{bmatrix},$$

which is not positive semi-definite unless $A$ is the zero matrix: the eigenvalues of this block matrix are $\pm\sigma_i(A)$, the signed singular values of $A$ (padded with zeros when $n \neq m$). Therefore the problem is non-convex for all dimensions $m, n \geq 1$.
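A short numerical sketch of this eigenvalue picture (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
A = rng.standard_normal((n, m))

# Hessian at the origin: zero diagonal blocks, -A off-diagonal.
H0 = np.block([[np.zeros((n, n)), -A],
               [-A.T, np.zeros((m, m))]])
eigs = np.linalg.eigvalsh(H0)
sigmas = np.linalg.svd(A, compute_uv=False)
print(eigs)    # contains -sigma_i(A) < 0, so H0 is indefinite
print(sigmas)
```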

Solving with Lagrange Multiplier

Using the property that

$$\|R\|_F^2 = \operatorname{tr}\!\left(R^T R\right),$$

we can rewrite the objective using properties of the trace (invariance under transposition and cyclic permutation):

$$\begin{aligned}
\ell(u,v) &= \frac{1}{2}\|R\|_F^2 = \frac{1}{2}\operatorname{tr}\!\left(R^T R\right) \\
&= \frac{1}{2}\operatorname{tr}\!\left(vu^Tuv^T - A^Tuv^T - vu^TA + A^TA\right) \\
&= \frac{1}{2}\operatorname{tr}\!\left(vu^Tuv^T\right) - \operatorname{tr}\!\left(vu^TA\right) + \text{const} \\
&= \frac{1}{2}\operatorname{tr}\!\left(u^Tu \, v^Tv\right) - \operatorname{tr}\!\left(u^TAv\right) + \text{const} \\
&= \frac{1}{2}\|u\|^2\|v\|^2 - u^TAv + \text{const}.
\end{aligned}$$
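A one-line numeric check of the final identity, with the constant made explicit as $\frac{1}{2}\|A\|_F^2$ (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4))
u, v = rng.standard_normal(5), rng.standard_normal(4)

lhs = 0.5 * np.linalg.norm(A - np.outer(u, v), "fro") ** 2
rhs = (0.5 * (u @ u) * (v @ v) - u @ A @ v
       + 0.5 * np.linalg.norm(A, "fro") ** 2)  # const = 1/2 ||A||_F^2
print(np.isclose(lhs, rhs))  # True
```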

We can solve for unit-norm $u$ and $v$, and then rescale:

$$\operatorname*{arg\,max}_{u, v} \; u^TAv, \quad \text{s.t. } \|u\| = \|v\| = 1.$$

Using Lagrange multipliers,

$$\mathcal{L} = u^TAv - \lambda u^Tu - \mu v^Tv.$$

Taking partial derivatives,

$$\frac{\partial \mathcal{L}}{\partial u} = Av - 2\lambda u = 0 \quad \Longrightarrow \quad u = \frac{Av}{\|Av\|}.$$

Similarly,

$$v = \frac{A^Tu}{\|A^Tu\|}.$$

Therefore, substituting each condition into the other,

$$u \propto AA^Tu, \qquad v \propto A^TAv,$$

which implies $u$ should be proportional to the principal eigenvector of $AA^T$, and $v$ to the principal eigenvector of $A^TA$; equivalently, $u$ and $v$ are the leading left and right singular vectors of $A$, which the SVD computes directly.
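The following sketch runs both routes: the alternating updates $u = Av/\|Av\|$, $v = A^Tu/\|A^Tu\|$ from above (which amount to power iteration on $AA^T$), and the direct SVD; the iteration count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

# Alternating updates u = Av/||Av||, v = A^T u/||A^T u||.
v = rng.standard_normal(4)
v /= np.linalg.norm(v)
for _ in range(200):
    u = A @ v
    u /= np.linalg.norm(u)
    v = A.T @ u
    v /= np.linalg.norm(v)

# Direct route: leading singular vectors from the SVD.
U, S, Vt = np.linalg.svd(A)
u_svd, v_svd = U[:, 0], Vt[0]

# Both give the same best rank-1 approximation sigma_1 * u v^T
# (u, v match the singular vectors up to a common sign flip,
#  and u^T A v converges to sigma_1, the optimal rescaling).
err = np.linalg.norm(S[0] * np.outer(u_svd, v_svd)
                     - (u @ A @ v) * np.outer(u, v))
print(err)  # ~0
```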