Topic Models

Problem Settings

Probabilistic Latent Semantic Analysis (pLSA)

The log-likelihood of the occurrence matrix with respect to the unknown parameters is

$$\log p(N;\theta) = \sum_{i,j} N_{ij} \log p(w_j \mid d_i) = \sum_{i,j} N_{ij} \log \sum_{z=1}^{k} p(w_j \mid z)\, p(z \mid d_i) = \sum_{i=1}^{n} \sum_{t=1}^{s_i} \log \sum_{z=1}^{k} p(x_{it} \mid z)\, p(z \mid d_i),$$

where $N_{ij}$ is the number of occurrences of word $w_j$ in document $d_i$, $x_{it}$ is the $t$-th token of document $d_i$, and $s_i = \sum_j N_{ij}$ is the document length.

EM for Topic Models

We can apply the standard EM algorithm to solve for $p(w_j \mid z)$ and $p(z \mid d_i)$. Let $q_{itz}$ be the per-word posterior probability estimated with the old parameters:

$$q_{itz} := p(z \mid x_{it}, d_i) = \frac{p(x_{it} \mid z)\, p(z \mid d_i)}{\sum_{\zeta=1}^{k} p(x_{it} \mid \zeta)\, p(\zeta \mid d_i)} = \frac{\sum_{j=1}^{m} \mathbb{I}[x_{it} = w_j]\, p(w_j \mid z)\, p(z \mid d_i)}{\sum_{\zeta=1}^{k} \sum_{j=1}^{m} \mathbb{I}[x_{it} = w_j]\, p(w_j \mid \zeta)\, p(\zeta \mid d_i)}.$$

Applying Jensen's inequality yields the evidence lower bound (ELBO):

$$\begin{aligned}
\log p(N;\theta) &\ge \sum_{i=1}^{n} \sum_{t=1}^{s_i} \sum_{z=1}^{k} q_{itz} \bigl[\log p(x_{it} \mid z) + \log p(z \mid d_i) - \log q_{itz}\bigr] \\
&= \sum_{i=1}^{n} \sum_{t=1}^{s_i} \sum_{z=1}^{k} q_{itz} \Bigl[\log \sum_{j=1}^{m} \mathbb{I}[x_{it} = w_j]\, p(w_j \mid z) + \log p(z \mid d_i) - \log q_{itz}\Bigr] \\
&= \sum_{i=1}^{n} \sum_{t=1}^{s_i} \sum_{z=1}^{k} q_{itz} \Bigl[\sum_{j=1}^{m} \mathbb{I}[x_{it} = w_j] \log p(w_j \mid z) + \log p(z \mid d_i) - \log q_{itz}\Bigr],
\end{aligned}$$

where the last step uses that exactly one indicator $\mathbb{I}[x_{it} = w_j]$ is nonzero for each token.

By maximizing the ELBO subject to the constraints

$$\sum_{z=1}^{k} p(z \mid d_i) = 1, \qquad \sum_{j=1}^{m} p(w_j \mid z) = 1,$$

we get

$$p(z \mid d_i) = \frac{1}{s_i} \sum_{t=1}^{s_i} q_{itz}, \qquad p(w_j \mid z) = \frac{\sum_{i,t} q_{itz}\, \mathbb{I}[x_{it} = w_j]}{\sum_{i,t} q_{itz}}.$$
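As a concrete illustration, here is a minimal NumPy sketch of these updates working directly on the count matrix $N$ (assumed to be an $n \times m$ document-word count matrix; the function name `plsa_em`, the random initialization, and the fixed iteration count are illustrative choices, not from the text above). Since all tokens of the same word in the same document share the posterior $q_{itz}$, the per-token sums can be aggregated by counts.

```python
import numpy as np

def plsa_em(N, k, n_iters=100, seed=0, eps=1e-12):
    """EM for pLSA on an (n docs x m words) count matrix N.

    Returns U (m x k) with U[j, z] = p(w_j | z) and
            V (k x n) with V[z, i] = p(z | d_i).
    """
    rng = np.random.default_rng(seed)
    n, m = N.shape
    # Random initialization with columns normalized to probability distributions.
    U = rng.random((m, k))
    U /= U.sum(axis=0, keepdims=True)
    V = rng.random((k, n))
    V /= V.sum(axis=0, keepdims=True)
    s = N.sum(axis=1)                                   # document lengths s_i

    for _ in range(n_iters):
        # E-step: q[i, j, z] = p(z | w_j, d_i), shared by all tokens x_it = w_j.
        joint = U[None, :, :] * V.T[:, None, :]         # (n, m, k): u_{jz} v_{zi}
        q = joint / (joint.sum(axis=2, keepdims=True) + eps)

        # M-step: count-weighted posteriors N_ij * q_ijz.
        Nq = N[:, :, None] * q                          # (n, m, k)
        V = (Nq.sum(axis=1) / (s[:, None] + eps)).T     # p(z | d_i)
        U = Nq.sum(axis=0)                              # unnormalized p(w_j | z)
        U /= U.sum(axis=0, keepdims=True) + eps
    return U, V
```

Each iteration never decreases the log-likelihood, which is the usual way to sanity-check an EM implementation.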

Latent Dirichlet Allocation (LDA)

The parameters can be written as

$$u_{jz} := p(w_j \mid z), \qquad v_{zi} := p(z \mid d_i).$$

We can further define

$$U = [u_{jz}] \in [0,1]^{m \times k}, \qquad V = [v_{zi}] \in [0,1]^{k \times n}.$$

For a new, unseen document, we do not know its topic proportions $v$ exactly. We can therefore define a distribution over $v$, parameterized by $\alpha$:

$$p(v; \alpha).$$

We would like the prior $p(v)$ and the posterior $p(v \mid w_j)$ to belong to the same distribution family. By conjugacy (the Dirichlet is the conjugate prior of the multinomial), we define the prior as a Dirichlet distribution:

$$p(v; \alpha) \propto \prod_{z=1}^{k} v_z^{\alpha_z - 1},$$

where each $\alpha_z > 0$ is typically set to the same value $\alpha$.
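To get a feel for this prior, one can sample topic proportions $v$ for a few values of the shared $\alpha$ (a small illustrative script; the choices of $k$ and $\alpha$ are arbitrary). Small $\alpha$ concentrates mass on a few topics, while large $\alpha$ yields nearly uniform proportions.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
for alpha in (0.1, 1.0, 10.0):
    # Symmetric Dirichlet: every alpha_z set to the same value alpha.
    v = rng.dirichlet(alpha * np.ones(k))
    print(alpha, np.round(v, 3), "sum =", v.sum())  # each sampled v sums to 1
```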

Latent Dirichlet allocation augments topic models with a Dirichlet prior. For a document of fixed length $s$, its likelihood under the word-topic matrix $U$ (and prior parameter $\alpha$) is

$$p(x_1, \dots, x_s; U) = \int_{v} \prod_{t=1}^{s} p(x_t; U, v)\, p(v; \alpha)\, \mathrm{d}v,$$

where

$$p(x_t; U, v) = \sum_{j=1}^{m} \mathbb{I}[x_t = w_j] \sum_{z=1}^{k} u_{jz} v_z.$$
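The integral over $v$ has no closed form, so as a rough sketch one can estimate the document likelihood by plain Monte Carlo: draw $v \sim \mathrm{Dir}(\alpha)$ and average the product over tokens. (The function name `lda_doc_likelihood` and the naive product, which underflows for long documents and should then be replaced by log-space accumulation, are assumptions for illustration, not the estimator typically used for LDA.)

```python
import numpy as np

def lda_doc_likelihood(x, U, alpha, n_samples=10_000, seed=0):
    """Monte Carlo estimate of p(x_1, ..., x_s; U) for a single document.

    x     : length-s array of word indices, x[t] in {0, ..., m-1}
    U     : (m, k) matrix with U[j, z] = p(w_j | z)
    alpha : (k,) Dirichlet parameters
    """
    rng = np.random.default_rng(seed)
    vs = rng.dirichlet(alpha, size=n_samples)   # (n_samples, k) draws of v
    per_token = vs @ U[x].T                     # [n, t] = sum_z v_z * u_{x_t, z}
    return per_token.prod(axis=1).mean()        # E_v[ prod_t p(x_t; U, v) ]
```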

Probabilistic Matrix Decomposition

The log-likelihood in #Probabilistic Latent Semantic Analysis (pLSA) can be written as

$$\log p(N; U, V) = \sum_{i,j} N_{ij} \log \hat{N}_{ij}, \qquad \hat{N} = (UV)^\top,$$

so that $\hat{N}_{ij} = \sum_{z=1}^{k} u_{jz} v_{zi} = p(w_j \mid d_i)$.

Therefore, we can view the topic model as a non-negative matrix decomposition with a principled log-likelihood objective, subject to the constraints

$$\hat{N}_{ij} \ge 0, \qquad \sum_{j=1}^{m} u_{jz} = 1, \qquad \sum_{z=1}^{k} v_{zi} = 1.$$
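Under this view, the objective optimized by the EM procedure above is just the count-weighted log of the factorized matrix. A small sketch evaluating it (reusing the factors returned by the `plsa_em` sketch earlier; `plsa_loglik` is a name introduced here for illustration):

```python
import numpy as np

def plsa_loglik(N, U, V, eps=1e-12):
    """Log-likelihood of the count matrix under the factorized model.

    N_hat[i, j] = sum_z U[j, z] V[z, i] = p(w_j | d_i), i.e. N_hat = (U V)^T.
    """
    N_hat = (U @ V).T
    return float((N * np.log(N_hat + eps)).sum())

# The constraints hold by construction: columns of U and V are distributions,
# so np.allclose(U.sum(axis=0), 1), np.allclose(V.sum(axis=0), 1), and every
# row of N_hat sums to 1 (a valid distribution over words for each document).
```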