Collaborative Filtering

Motivation

Given a sparse matrix $A$ with only partially observed entries, reconstructing it is an ill-posed problem unless further assumptions are made. However, if the rows and columns of the matrix are meaningful, we expect at least that entries within the same row or the same column are not independent. This is the minimal dependency we can assume.

minimal dependency assumption

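A minimal assumption in matrix reconstruction is that entries within the same row or the same column are not independent, so that $a_{ij}$ is related to the rest of the matrix only through its own row and column:
$$a_{ij} \perp\!\!\!\perp \{a_{kl} : k \neq i \wedge l \neq j\} \mid \{a_{il} : l \neq j\} \cup \{a_{kj} : k \neq i\}.$$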

Recommender Systems and Collaborative Filtering

The goal of recommender systems is to recommend or pre-select relevant items in a personalized manner, based on a person's history or profile. This leads to a sparse matrix completion problem: the rows correspond to people (users), the columns correspond to items, and the entries are ratings on some ordinal scale (e.g. 1-5 stars or 1-10 numerical). In recommender systems, the data are usually highly sparse (e.g. only 1% of the entries are observed).

In collaborative filtering, we exploit the similarity between people's ratings to learn from the collective data.
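As a minimal sketch of the rating matrix described above, the following builds a tiny user-by-item matrix from rating triples. The toy data, the variable names, and the 0/1 observation mask `omega` are assumptions made for illustration only; unobserved entries are stored as 0 and must always be read together with the mask.

```python
# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
observations = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (2, 3, 2)]
n_users, n_items = 3, 4

# A[i][j] holds the rating of user i for item j (rows = users, columns = items).
# omega[i][j] is 1 if that rating was observed and 0 otherwise (assumed convention).
A = [[0.0] * n_items for _ in range(n_users)]
omega = [[0] * n_items for _ in range(n_users)]
for i, j, rating in observations:
    A[i][j] = float(rating)
    omega[i][j] = 1

# Fraction of observed entries; real recommender data is far sparser than this toy case.
observed_fraction = sum(map(sum, omega)) / (n_users * n_items)
print(f"observed fraction: {observed_fraction:.2f}")
```

Keeping the mask separate from the values avoids confusing a missing rating with a rating of zero.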

Formalization

Preprocessing

Centering

Centering makes rows or columns more comparable and subtracts out rating bias:

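$$\mu_i^{\text{row}} = \frac{\sum_{j}^{m} \omega_{ij} a_{ij}}{\max\{1, \sum_{j}^{m} \omega_{ij}\}}, \qquad \mu_j^{\text{col}} = \frac{\sum_{i}^{n} \omega_{ij} a_{ij}}{\max\{1, \sum_{i}^{n} \omega_{ij}\}},$$
$$a_{ij} \leftarrow a_{ij} - \mu_i^{\text{row}}, \quad \text{or} \quad a_{ij} \leftarrow a_{ij} - \mu_j^{\text{col}}.$$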
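The row-centering formula can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: `omega` is taken to be a 0/1 mask marking observed entries, unobserved entries are left at 0 after centering, and the function names are hypothetical.

```python
def row_means(A, omega):
    """Mean of the observed entries in each row (0.0 for a row with none)."""
    means = []
    for a_row, w_row in zip(A, omega):
        count = sum(w_row)
        total = sum(w * a for w, a in zip(w_row, a_row))
        # max(1, count) guards against division by zero for empty rows.
        means.append(total / max(1, count))
    return means


def center_rows(A, omega):
    """Subtract the row mean from observed entries; keep unobserved ones at 0."""
    mu = row_means(A, omega)
    return [
        [(a - mu_i) if w else 0.0 for a, w in zip(a_row, w_row)]
        for a_row, w_row, mu_i in zip(A, omega, mu)
    ]


# Toy example (assumed data): the second row has no observed entries.
A = [[5.0, 0.0, 3.0],
     [0.0, 0.0, 0.0],
     [1.0, 2.0, 0.0]]
omega = [[1, 0, 1],
         [0, 0, 0],
         [1, 1, 0]]

mu_row = row_means(A, omega)
centered = center_rows(A, omega)
```

Column centering works the same way on the transposed inputs, for example `center_rows(list(zip(*A)), list(zip(*omega)))`, followed by transposing the result back.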

Variance Normalization

It may also make sense to normalize the variance to 1, either per row or per column.

standardized score

Let $X$ be a score with $\mathbb{E}[X] = \mu$ and $\operatorname{Var}[X] = \sigma^2$. Then its standardized score (or z-score) is given by
$$Z = \frac{X - \mu}{\sigma}.$$
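Applied per row to the observed ratings, the z-score might be computed as in the sketch below. Several choices here are assumptions rather than part of the definition: `omega` is a 0/1 observation mask, the variance is the population variance over the observed entries of the row, rows with zero variance are only centered (their $\sigma$ is replaced by 1), and unobserved entries stay at 0.

```python
import math


def standardize_rows(A, omega):
    """Replace each observed entry by its row-wise z-score."""
    result = []
    for a_row, w_row in zip(A, omega):
        observed = [a for a, w in zip(a_row, w_row) if w]
        count = max(1, len(observed))  # guard against empty rows
        mu = sum(observed) / count
        var = sum((a - mu) ** 2 for a in observed) / count
        # Assumption: fall back to sigma = 1 when all observed ratings are equal.
        sigma = math.sqrt(var) or 1.0
        result.append(
            [(a - mu) / sigma if w else 0.0 for a, w in zip(a_row, w_row)]
        )
    return result


# Toy example (assumed data): the second row has constant ratings.
A = [[1.0, 0.0, 3.0, 5.0],
     [4.0, 4.0, 0.0, 0.0]]
omega = [[1, 0, 1, 1],
         [1, 1, 0, 0]]

Z = standardize_rows(A, omega)
```

After this step, the observed entries of every non-constant row have mean 0 and variance 1, which puts users with very different rating habits on a common scale.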