Markov Chains

Discrete Stochastic Processes

A discrete stochastic process is a discrete system in which transitions occur randomly according to some probability distribution. The process is Markov (memoryless) if the next state depends only on the current state, not on the rest of the history:

$$P[X_{t+1}=j \mid X_t=s_t, \dots, X_0=s_0] = P[X_{t+1}=j \mid X_t=s_t].$$

Finite Markov Chains

A finite Markov chain is a memoryless, homogeneous stochastic process with a finite set of states $S=\{1,\dots,n\}$. It is characterized by a transition matrix $P=[p_{ij}]_{i,j\in S}$, where $p_{ij} = P[X_{t+1}=j \mid X_t=i]$ is independent of $t$ (homogeneity), and an initial distribution $q_0$, where $(q_0)_i = P[X_0=i]$.
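As a minimal sketch of these definitions, the Python snippet below samples a trajectory from a small hypothetical two-state chain; the matrix `P` and initial distribution `q0` are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state chain: P[i, j] = P[X_{t+1} = j | X_t = i].
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
q0 = np.array([1.0, 0.0])  # start in state 0 with probability 1

def sample_trajectory(P, q0, T):
    """Sample X_0, ..., X_T; each transition depends only on the current state."""
    x = rng.choice(len(q0), p=q0)
    traj = [x]
    for _ in range(T):
        x = rng.choice(P.shape[1], p=P[x])  # memoryless, homogeneous transition
        traj.append(x)
    return traj

print(sample_trajectory(P, q0, 10))
```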

State Probabilities

The probability distribution of states at time step $t$ can be represented as a vector $q_t$. By marginalizing out the state at the previous time step:

$$(q_{t+1})_j = P[X_{t+1}=j] = \sum_i P[X_{t+1}=j \mid X_t=i]\, P[X_t=i] = \sum_i p_{ij}\,(q_t)_i = P_{:,j}^T\, q_t.$$

Therefore,

$$q_{t+1} = P^T q_t = (P^T)^2 q_{t-1} = \dots = (P^T)^{t+1} q_0,$$

which indicates that the distribution at every time step is well-defined as long as the initial distribution $q_0$ is specified.
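Continuing with the same hypothetical two-state chain, the recursion can be checked numerically: iterating $q_{t+1} = P^T q_t$ agrees with applying the matrix power of $P^T$ to $q_0$.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # hypothetical chain from above
q = np.array([1.0, 0.0])    # q_0

# Iterate q_{t+1} = P^T q_t.
for t in range(5):
    q = P.T @ q
    print(f"q_{t + 1} =", q)

# Equivalently, q_5 = (P^T)^5 q_0.
print(np.linalg.matrix_power(P.T, 5) @ np.array([1.0, 0.0]))
```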

We denote the probability $P[X_t=j \mid X_0=i]$ by $p_{ij}^{(t)}$ for simplicity.

Stationary Distribution

A stationary distribution is a probability vector $\pi$ such that

$$\pi = P^T \pi.$$

In other words, $\pi$ is an eigenvector of $P^T$ with eigenvalue $1$.
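One way to compute $\pi$ numerically is a general eigendecomposition of $P^T$, picking the eigenvector for the eigenvalue closest to $1$; this is a sketch on the hypothetical chain from above (solving the linear system with the normalization constraint works just as well).

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# pi is an eigenvector of P^T with eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # index of the eigenvalue closest to 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # fix scale (and sign) so entries sum to 1

print(pi)                                 # [0.8 0.2] for this chain
np.testing.assert_allclose(P.T @ pi, pi)  # check pi = P^T pi
```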

Irreducible Markov Chains

A Markov chain is irreducible if for all $i, j \in S$ there exists $n$ such that $p_{ij}^{(n)} > 0$, i.e., state $j$ is accessible from state $i$. An irreducible Markov chain has a strongly connected underlying transition graph.

An irreducible Markov chain has a unique stationary distribution $\pi$, given by

$$\pi_j = h_{jj}^{-1},$$

where $h_{ij}$, the hitting time, is the average number of steps needed to go from $i$ to $j$:

$$h_{ij} := E[\min\{h \geq 1 \mid X_h = j\} \mid X_0 = i].$$
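The identity $\pi_j = h_{jj}^{-1}$ can be checked by simulation. Below is a rough Monte Carlo sketch estimating the mean return time $h_{00}$ for the hypothetical two-state chain; the number of runs is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # irreducible: each state can reach the other

def mean_return_time(P, j, n_runs=20_000):
    """Monte Carlo estimate of h_jj, the mean number of steps to return to j."""
    total = 0
    for _ in range(n_runs):
        x, steps = j, 0
        while True:
            x = rng.choice(P.shape[1], p=P[x])
            steps += 1
            if x == j:
                break
        total += steps
    return total / n_runs

h00 = mean_return_time(P, 0)
print(1 / h00)  # approximately pi_0 = 0.8
```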

In the following state diagram, State 0 and State 2 are not accessible from each other. Therefore, the Markov chain is not irreducible, and any $[a, 0, 1-a]^T$ with $a \in [0, 1]$ is a stationary distribution.

```mermaid
stateDiagram-v2
    direction LR
    state "0" as zero
    state "1" as one
    state "2" as two

    zero --> zero : 1

    one --> zero : q
    one --> two : p

    two --> two : 1
```

Aperiodic Markov Chains

The periodicity of a state $j \in S$ is $\gcd\{n > 0 \mid p_{jj}^{(n)} > 0\}$.

A Markov chain is aperiodic if all states have periodicity $1$, i.e., if for all $i \in S$,

$$\gcd\{\text{length of the walks from } i \text{ to } i\} = 1.$$
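In practice the period can be estimated by truncating the gcd to return times up to some maximum walk length; the sketch below assumes the cutoff `max_n` is large enough, and uses an illustrative deterministic 2-cycle.

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, j, max_n=50):
    """gcd of all n <= max_n with p_jj^(n) > 0 (a practical truncation)."""
    ns = []
    Pn = np.eye(P.shape[0])
    for n in range(1, max_n + 1):
        Pn = Pn @ P                # Pn = P^n
        if Pn[j, j] > 1e-12:       # p_jj^(n) > 0
            ns.append(n)
    return reduce(gcd, ns) if ns else 0

# Two states alternating deterministically: returns take an even number of steps.
P_cycle = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
print(period(P_cycle, 0))  # 2, so the chain is not aperiodic
```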
**Theorem.** For every irreducible, aperiodic (ergodic) Markov chain, independently of $q_0$,

$$\lim_{t\to\infty} q_t = \pi.$$
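The theorem can be observed numerically (a demonstration, not a proof): starting the hypothetical chain, which is irreducible and aperiodic since all entries of $P$ are positive, from two different initial distributions, both iterates approach the same $\pi$.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # all entries positive: irreducible and aperiodic

for q0 in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    q = q0
    for _ in range(100):
        q = P.T @ q       # q_t = (P^T)^t q_0
    print(q)              # both print approximately [0.8 0.2] = pi
```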

Time-reversible Markov Chains

Consider an ergodic Markov chain $\{X_0, X_1, \dots, X_t, \dots\}$. Given a (large) time step $n$, we trace the states going back in time. It turns out that $\{Y_k = X_{n-k} \mid k = 0, 1, \dots\}$ is a Markov chain with transition probabilities

$$p_{ij}^R = P[Y_{k+1}=j \mid Y_k=i] = P[X_{n-k-1}=j \mid X_{n-k}=i] = \frac{\pi_j\, p_{ji}}{\pi_i}.$$
**Proof.** Using Bayes' theorem and $P[X_m = i] = \pi_i$ (valid for large $n$ by the convergence theorem),

$$p_{ij}^R = P[X_{n-k-1}=j \mid X_{n-k}=i] = \frac{P[X_{n-k}=i \mid X_{n-k-1}=j]\, P[X_{n-k-1}=j]}{P[X_{n-k}=i]} = \frac{\pi_j\, p_{ji}}{\pi_i}.$$
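As a sketch, $P^R$ can be formed elementwise from $\pi$ and $P$; the three-state chain below is an illustrative choice that circulates $0 \to 1 \to 2 \to 0$, so the reverse chain differs from the forward one ($P^R = P^T \neq P$).

```python
import numpy as np

# Doubly stochastic 3-state chain, so its stationary distribution is uniform.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
pi = np.array([1/3, 1/3, 1/3])

# Reverse transitions: P_R[i, j] = pi_j * p_ji / pi_i.
P_R = pi[None, :] * P.T / pi[:, None]

print(P_R)                                  # here P_R = P^T: the cycle runs backwards
print(P_R.sum(axis=1))                      # each row sums to 1: a valid transition matrix
np.testing.assert_allclose(P_R.T @ pi, pi)  # pi is stationary for the reverse chain too
```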

Continuous-State Markov Chains

Rather than a transition matrix, we have a transition probability density $p(x' \mid x)$, where $x, x' \in \mathbb{R}^d$.

A stationary distribution $\pi(x)$ satisfies

$$\pi(x') = \int_{\mathbb{R}^d} p(x' \mid x)\, \pi(x)\, dx.$$

The reverse transition density is defined analogously:

$$p^R(x \mid x') = \frac{\pi(x)\, p(x' \mid x)}{\pi(x')}.$$
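A concrete example is the AR(1) process $x' = a x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, s^2)$, whose stationary density is the Gaussian with variance $s^2 / (1 - a^2)$. The sketch below checks the stationarity integral on a grid; the parameters and grid are arbitrary choices.

```python
import numpy as np

def gauss(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

a, s2 = 0.7, 1.0              # hypothetical AR(1): x' = a*x + N(0, s2) noise
pi_var = s2 / (1 - a ** 2)    # known stationary variance of AR(1)

xs = np.linspace(-10, 10, 2001)
dx = xs[1] - xs[0]
pi = gauss(xs, 0.0, pi_var)

# trans[i, j] = p(xs[i] | xs[j]); check pi(x') = integral of p(x' | x) pi(x) dx.
trans = gauss(xs[:, None], a * xs[None, :], s2)
pi_next = trans @ pi * dx
print(np.max(np.abs(pi_next - pi)))  # near 0: pi is stationary
```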