#Note #Machine-Learning/Matrix-Approximation

Collaborative Filtering

Motivation

Given a sparse matrix $A$ with only partially observed entries, reconstructing it is an ill-posed problem unless further structure is assumed. However, if the rows and columns of the matrix are meaningful, we expect at least that entries within the same row or the same column are not independent of each other. This is the minimal dependency assumption.

minimal dependency assumption

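A minimal assumption in matrix reconstruction is that entries within the same row or the same column are not independent. Conversely, an entry is conditionally independent of all entries that share neither its row nor its column, given the other entries in its row and column:
$a_{i j} \perp\!\!\!\perp \{a_{k l} : k \neq i \land l \neq j\} \mid \{a_{i l} : l \neq j\} \cup \{a_{k j} : k \neq i\} .$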

Recommender Systems and Collaborative Filtering

The goal of recommender systems is to recommend or pre-select relevant items in a personalized manner, based on a person's history or profile. This leads to a sparse matrix completion problem: the rows correspond to people (users), the columns correspond to items, and the entries are ratings on some ordinal scale (e.g. 1-5 stars or 1-10 numerical). In recommender systems, the data are usually highly sparse (e.g. only 1% of the entries are observed).

In collaborative filtering, we exploit the similarity between people's ratings to learn from the collective data.

Formalization

We have a rating matrix $A \in \mathbb{R}^{n \times m}$, where $n$ is the number of users and $m$ is the number of items.
We have an observation matrix $Ω \in \{0, 1\}^{n \times m}$, where $ω_{i j} = 1$ indicates that $a_{i j}$ is observed and $ω_{i j} = 0$ that it is missing.
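As a minimal sketch of this formalization, the snippet below builds $A$ and $Ω$ from a list of (user, item, rating) triples. The helper name `build_rating_matrices` and the toy triples are hypothetical, and plain nested lists are used only to keep the example dependency-free; storing 0 for unobserved entries of $A$ is an assumption made here for illustration, since those values are never read where $ω_{i j} = 0$.

```python
def build_rating_matrices(triples, n, m):
    """Return (A, Omega) as n x m nested lists from (user, item, rating) triples."""
    A = [[0.0] * m for _ in range(n)]    # placeholder 0.0 for unobserved entries
    Omega = [[0] * m for _ in range(n)]  # 1 where a rating is observed, else 0
    for i, j, r in triples:
        A[i][j] = float(r)
        Omega[i][j] = 1
    return A, Omega


# Toy data (illustrative only): 3 users, 3 items, 5 observed ratings.
triples = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (2, 2, 2)]
A, Omega = build_rating_matrices(triples, n=3, m=3)

# Fraction of observed entries; in real recommender data this is often tiny.
density = sum(map(sum, Omega)) / (3 * 3)
```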

Preprocessing

Centering

Centering makes rows or columns more comparable and subtracts out rating bias (e.g. users who rate systematically high or low):

$\begin{aligned} μ_{i}^{\text{row}} & = \frac{\sum_{j=1}^{m} ω_{i j} a_{i j}}{\max \{1, \sum_{j=1}^{m} ω_{i j}\}}, & μ_{j}^{\text{col}} & = \frac{\sum_{i=1}^{n} ω_{i j} a_{i j}}{\max \{1, \sum_{i=1}^{n} ω_{i j}\}}, \\ a_{i j} & \leftarrow a_{i j} - μ_{i}^{\text{row}}, & \text{or} \quad a_{i j} & \leftarrow a_{i j} - μ_{j}^{\text{col}} . \end{aligned}$
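The row-wise variant of the centering formula can be sketched as follows. The function names `row_means` and `center_rows` are hypothetical, and leaving unobserved entries at 0 after centering is an assumption of this sketch (the formula itself only matters where $ω_{i j} = 1$). The `max(1, count)` guard mirrors the denominator above and avoids division by zero for rows with no observations.

```python
def row_means(A, Omega):
    """Mean of the observed entries in each row (0.0 for rows with none)."""
    means = []
    for a_row, w_row in zip(A, Omega):
        total = sum(w * a for w, a in zip(w_row, a_row))
        count = sum(w_row)
        means.append(total / max(1, count))
    return means


def center_rows(A, Omega):
    """Subtract the row mean from observed entries; unobserved entries stay 0.0."""
    mu = row_means(A, Omega)
    centered = [
        [(a - mu_i) if w else 0.0 for a, w in zip(a_row, w_row)]
        for a_row, w_row, mu_i in zip(A, Omega, mu)
    ]
    return centered, mu


# Toy data (illustrative only); the last row has no observed ratings.
A = [[5.0, 0.0, 3.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
Omega = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
centered, mu = center_rows(A, Omega)
```

Column centering works the same way on the transposed matrices, e.g. via `list(zip(*A))`.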

Variance Normalization

It may also make sense to normalize the variance to 1, either per row or per column.

standardized score

Let $X$ be a score with $\mathbb{E}[X] = μ$ and $\operatorname{Var}[X] = σ^{2} > 0$. Its standardized score (or $z$-score) is given by
$Z = \frac{X - μ}{σ} .$
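A rough sketch of per-row standardization over the observed entries is given below. Several choices here are assumptions rather than part of the definition: the function name `standardize_rows` is hypothetical, the variance is the population variance (dividing by the number of observed entries), rows with zero variance are only centered (σ is replaced by 1), and unobserved entries are left at 0.

```python
import math


def standardize_rows(A, Omega):
    """Z-score the observed entries of each row; unobserved entries stay 0.0."""
    Z = []
    for a_row, w_row in zip(A, Omega):
        observed = [a for a, w in zip(a_row, w_row) if w]
        if not observed:
            # No ratings in this row: nothing to standardize.
            Z.append([0.0] * len(a_row))
            continue
        mu = sum(observed) / len(observed)
        var = sum((a - mu) ** 2 for a in observed) / len(observed)
        sigma = math.sqrt(var)
        if sigma == 0.0:
            sigma = 1.0  # assumption: constant rows are centered only
        Z.append([(a - mu) / sigma if w else 0.0 for a, w in zip(a_row, w_row)])
    return Z


# Toy data (illustrative only).
A = [[5.0, 0.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
Omega = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 0]]
Z = standardize_rows(A, Omega)
```

After this step, the observed entries of every row with at least two distinct ratings have mean 0 and variance 1.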