Diffusion Models
Original paper: Denoising Diffusion Probabilistic Models
Survey: Understanding Diffusion Models: A Unified Perspective
Intuition
A diffusion model is trained through a diffusion process that progressively adds noise to the original data. The model then learns how to reconstruct the original data from this noisy input.
```mermaid
graph LR
    A[Original Data] -->|Forward Diffusion| B[Noisy Data]
    B -->|Reverse Denoising| A
```
Once the model has learned to reconstruct samples of the data distribution from (typically Gaussian) noise, it can generate novel data by starting from pure noise and denoising it.
Forward Process
At each time step $t$, Gaussian noise is added to the sample from the previous step:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\, \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\, \beta_t \mathbf{I}\right),$$

where $\{\beta_t \in (0, 1)\}_{t=1}^{T}$ is a fixed variance schedule.
Reparameterization
We can generate a sample $\mathbf{x}_t$ from $\mathbf{x}_{t-1}$ with the reparameterization trick:

$$\mathbf{x}_t = \sqrt{1-\beta_t}\,\mathbf{x}_{t-1} + \sqrt{\beta_t}\,\boldsymbol{\epsilon}_{t-1}, \qquad \boldsymbol{\epsilon}_{t-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).$$

Furthermore, we can generate $\mathbf{x}_t$ directly from $\mathbf{x}_0$. Define $\alpha_t := 1 - \beta_t$ and $\bar\alpha_t := \prod_{s=1}^{t} \alpha_s$. Recursively expanding the step above gives

$$\mathbf{x}_t = \sqrt{\alpha_t}\,\mathbf{x}_{t-1} + \sqrt{1-\alpha_t}\,\boldsymbol{\epsilon}_{t-1} = \sqrt{\alpha_t \alpha_{t-1}}\,\mathbf{x}_{t-2} + \sqrt{\alpha_t(1-\alpha_{t-1})}\,\boldsymbol{\epsilon}_{t-2} + \sqrt{1-\alpha_t}\,\boldsymbol{\epsilon}_{t-1} = \cdots = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sum_{s=1}^{t} \sqrt{\tfrac{\bar\alpha_t}{\bar\alpha_s}(1-\alpha_s)}\;\boldsymbol{\epsilon}_{s-1},$$

where the $\boldsymbol{\epsilon}_s \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ are i.i.d.

The second term is a weighted sum of i.i.d. zero-mean Gaussians, and is thus also a zero-mean Gaussian, with variance $(1-\bar\alpha_t)\mathbf{I}$ (the squared coefficients sum to $1-\bar\alpha_t$).

Therefore, the forward diffusion process from time step $0$ directly to any time step $t$ is

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\, \sqrt{\bar\alpha_t}\,\mathbf{x}_0,\, (1-\bar\alpha_t)\mathbf{I}\right),$$

where a sample can be drawn in one shot as $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
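As a concrete illustration of the closed-form forward sample, here is a minimal NumPy sketch; the linear $\beta$ schedule and $T = 1000$ are illustrative choices, not part of the derivation:

```python
import numpy as np

# Illustrative linear variance schedule (assumed values, not prescribed by the derivation).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{\alpha}_t = \prod_{s<=t} \alpha_s

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot (t is 1-indexed)."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

rng = np.random.default_rng(0)
x0 = np.ones((8, 8))              # a toy "image"
x500 = q_sample(x0, t=500, rng=rng)
```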
Asymptotics
Since $\alpha_t \in (0, 1)$, $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s \to 0$ as $t \to \infty$.

Therefore, regardless of the original data $\mathbf{x}_0$, for a sufficiently large $T$ the fully noised distribution satisfies $q(\mathbf{x}_T \mid \mathbf{x}_0) \approx \mathcal{N}(\mathbf{0}, \mathbf{I})$, i.e., the chain converges to a standard Gaussian prior.
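Continuing the sketch above, $\bar\alpha_T$ is already vanishingly small for that (assumed) schedule, so the fully noised sample is effectively standard Gaussian:

```python
# For the schedule in the previous snippet, \bar{\alpha}_T is about 4e-5, so
# q(x_T | x_0) = N(sqrt(\bar{\alpha}_T) x_0, (1 - \bar{\alpha}_T) I) is ~ N(0, I)
# no matter what x_0 was.
print(alpha_bars[-1])
```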
Reverse Process
The posterior distribution, $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$, is what we need in order to reverse the diffusion, but it is intractable:

$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \frac{q(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, q(\mathbf{x}_{t-1})}{q(\mathbf{x}_t)},$$

where the marginals $q(\mathbf{x}_{t-1})$ and $q(\mathbf{x}_t)$ require integrating over the unknown data distribution.

Therefore, we learn a network parameterized by $\theta$ to approximate the reverse transitions:

$$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\, \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\, \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\right).$$

We approximate the log-likelihood using the ELBO:

$$-\log p_\theta(\mathbf{x}_0) \le \mathbb{E}_q\!\left[\underbrace{D_{\mathrm{KL}}\!\big(q(\mathbf{x}_T \mid \mathbf{x}_0) \,\|\, p(\mathbf{x}_T)\big)}_{L_T} + \sum_{t=2}^{T} \underbrace{D_{\mathrm{KL}}\!\big(q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)\big)}_{L_{t-1}} \;\underbrace{-\, \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1)}_{L_0}\right].$$
The Term $L_T$
Since $q(\mathbf{x}_T \mid \mathbf{x}_0) \approx \mathcal{N}(\mathbf{0}, \mathbf{I}) = p(\mathbf{x}_T)$ and neither distribution has trainable parameters, this prior matching term is approximately zero and can be ignored during training.
The Term $L_{t-1}$
The posterior conditioned on $\mathbf{x}_0$ is tractable:

$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\, \tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0),\, \tilde\beta_t \mathbf{I}\right),$$

where $\tilde{\boldsymbol{\mu}}_t$ and $\tilde\beta_t$ are derived as follows.

Using Bayes' theorem (and the Markov property $q(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0) = q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$),

$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \frac{q(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)}{q(\mathbf{x}_t \mid \mathbf{x}_0)} \;\propto\; \exp\!\left(-\frac{1}{2}\left[\frac{\left(\mathbf{x}_t - \sqrt{\alpha_t}\,\mathbf{x}_{t-1}\right)^2}{\beta_t} + \frac{\left(\mathbf{x}_{t-1} - \sqrt{\bar\alpha_{t-1}}\,\mathbf{x}_0\right)^2}{1-\bar\alpha_{t-1}}\right]\right),$$

where terms not involving $\mathbf{x}_{t-1}$ are absorbed into the normalization.

The coefficient of $\mathbf{x}_{t-1}^2$ in the exponent is $-\frac{1}{2}\left(\frac{\alpha_t}{\beta_t} + \frac{1}{1-\bar\alpha_{t-1}}\right) = -\frac{1}{2} \cdot \frac{1-\bar\alpha_t}{\beta_t\,(1-\bar\alpha_{t-1})}$.

Therefore, the posterior variance is

$$\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t.$$

The coefficient of $\mathbf{x}_{t-1}$, after completing the square, gives the posterior mean

$$\tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0) = \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,\mathbf{x}_t + \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,\mathbf{x}_0.$$

Plugging in $\mathbf{x}_0 = \frac{1}{\sqrt{\bar\alpha_t}}\left(\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}\right)$ from the forward-process reparameterization yields

$$\tilde{\boldsymbol{\mu}}_t = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol{\epsilon}\right),$$

so the network only needs to predict the noise, and we set $\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)$.

For simplicity, the reverse variance is not learned: $\boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t) = \sigma_t^2 \mathbf{I}$ with $\sigma_t^2 = \tilde\beta_t$ or $\sigma_t^2 = \beta_t$, both of which work similarly well in practice.
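As a small sketch (reusing `betas`, `alphas`, and `alpha_bars` from the forward-process snippet above), the posterior mean and variance follow directly from the formulas:

```python
import numpy as np

def q_posterior(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0); t is 1-indexed with t >= 2."""
    ab_t, ab_prev = alpha_bars[t - 1], alpha_bars[t - 2]
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    mean = (np.sqrt(alpha_t) * (1.0 - ab_prev) / (1.0 - ab_t)) * xt \
         + (np.sqrt(ab_prev) * beta_t / (1.0 - ab_t)) * x0
    var = (1.0 - ab_prev) / (1.0 - ab_t) * beta_t   # \tilde{\beta}_t
    return mean, var
```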
The Term $L_0$
To obtain a discrete log-likelihood, this term is set to an independent discrete decoder derived from the Gaussian $\mathcal{N}\!\left(\mathbf{x}_0;\, \boldsymbol{\mu}_\theta(\mathbf{x}_1, 1),\, \sigma_1^2 \mathbf{I}\right)$:

$$p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) = \prod_{i=1}^{D} \int_{\delta_-(x_0^i)}^{\delta_+(x_0^i)} \mathcal{N}\!\left(x;\, \mu_\theta^i(\mathbf{x}_1, 1),\, \sigma_1^2\right) dx,$$

where $D$ is the data dimensionality, data is scaled to $[-1, 1]$, and

$$\delta_+(x) = \begin{cases} \infty & \text{if } x = 1 \\ x + \frac{1}{255} & \text{if } x < 1 \end{cases} \qquad \delta_-(x) = \begin{cases} -\infty & \text{if } x = -1 \\ x - \frac{1}{255} & \text{if } x > -1. \end{cases}$$
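A hedged sketch of this discrete decoder for data scaled to $[-1, 1]$ with 256 levels; `mu` and `sigma` stand in for $\boldsymbol{\mu}_\theta(\mathbf{x}_1, 1)$ and $\sigma_1$:

```python
import numpy as np
from scipy.stats import norm

def discrete_log_likelihood(x0, mu, sigma):
    """log p(x_0 | x_1): Gaussian mass over each pixel's quantization bin."""
    upper = np.where(x0 >= 1.0 - 1e-6, np.inf, x0 + 1.0 / 255.0)    # delta_+
    lower = np.where(x0 <= -1.0 + 1e-6, -np.inf, x0 - 1.0 / 255.0)  # delta_-
    prob = norm.cdf(upper, loc=mu, scale=sigma) - norm.cdf(lower, loc=mu, scale=sigma)
    return np.log(np.maximum(prob, 1e-12)).sum()
```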
Training
The formulation of $L_{t-1}$, a KL divergence between two Gaussians with the same fixed variance, reduces to a squared error between their means:

$$L_{t-1} = \mathbb{E}_q\!\left[\frac{1}{2\sigma_t^2}\left\|\tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0) - \boldsymbol{\mu}_\theta(\mathbf{x}_t, t)\right\|^2\right] + C,$$

where $C$ does not depend on $\theta$.

Combining with the reparameterization step $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$ and dropping the time-dependent weighting, we have the (simplified) training objective:

$$L_{\text{simple}}(\theta) = \mathbb{E}_{t,\, \mathbf{x}_0,\, \boldsymbol{\epsilon}}\!\left[\left\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon},\, t\right)\right\|^2\right].$$
We thus have the training algorithm:
\begin{algorithm}
\caption{Training}
\begin{algorithmic}
\Repeat
\State $\mathbf{x}_0 \sim q(\mathbf{x}_0)$
\State $t \sim \text{Uniform}(\{1, \ldots, T\})$
\State $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
\State Take gradient descent step on $\nabla_\theta \| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta (\sqrt{\bar\alpha_t} \mathbf{x}_0 + \sqrt{1-\bar\alpha_t} \boldsymbol{\epsilon}, t) \|^2$
\Until{converged}
\end{algorithmic}
\end{algorithm}
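A hedged PyTorch-style sketch of one step of this training loop; `model`, `optimizer`, and the schedule tensor `alpha_bars` (a 1-D tensor of $\bar\alpha_1, \ldots, \bar\alpha_T$) are placeholders for whatever network, optimizer, and schedule you use:

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x0, alpha_bars):
    """One gradient step on the simplified objective L_simple."""
    B, T = x0.shape[0], alpha_bars.shape[0]
    t = torch.randint(1, T + 1, (B,), device=x0.device)       # t ~ Uniform({1, ..., T})
    eps = torch.randn_like(x0)                                 # eps ~ N(0, I)
    ab = alpha_bars[t - 1].view(B, *([1] * (x0.dim() - 1)))   # broadcast over data dims
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps              # closed-form forward sample
    loss = F.mse_loss(model(xt, t), eps)                       # || eps - eps_theta(x_t, t) ||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```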
Sampling
The sampling algorithm also uses the reparameterization trick to sample $\mathbf{x}_{t-1}$ from $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$:

$$\mathbf{x}_{t-1} = \boldsymbol{\mu}_\theta(\mathbf{x}_t, t) + \sigma_t \mathbf{z} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sigma_t \mathbf{z},$$

where $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ for $t > 1$ and $\mathbf{z} = \mathbf{0}$ at the final step.
\begin{algorithm}
\caption{Sampling}
\begin{algorithmic}
\State $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
\For{$t = T, \cdots, 1$}
\State $\mathbf{z} \sim \begin{cases}
\mathcal{N}(\mathbf{0}, \mathbf{I}) & \text{if } t > 1 \\
\mathbf{0} & \text{otherwise}
\end{cases}$
\State $\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}} \boldsymbol\epsilon_\theta(\mathbf{x}_t, t)\right) + \sigma_t \mathbf{z}$
\EndFor
\State \Return $\mathbf{x}_0$
\end{algorithmic}
\end{algorithm}
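A matching PyTorch-style sketch of the sampling loop, again with `model` and the schedule tensor `betas` as placeholders and $\sigma_t^2 = \beta_t$:

```python
import torch

@torch.no_grad()
def sample(model, shape, betas):
    """Ancestral sampling: start from pure noise and denoise for T steps."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                       # x_T ~ N(0, I)
    for t in range(betas.shape[0], 0, -1):
        z = torch.randn(shape) if t > 1 else torch.zeros(shape)  # no noise at the last step
        a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
        eps = model(x, torch.full((shape[0],), t, dtype=torch.long))
        x = (x - (1.0 - a_t) / (1.0 - ab_t).sqrt() * eps) / a_t.sqrt() + betas[t - 1].sqrt() * z
    return x
```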
Network Architecture
The network for estimating $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ must map an input to an output of the same shape; DDPM uses a U-Net backbone that takes the noisy sample $\mathbf{x}_t$ together with an embedding of the time step $t$.
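As one common (illustrative) choice for feeding $t$ to the network, the integer time step is mapped to a sinusoidal embedding that the U-Net blocks then consume:

```python
import math
import torch

def timestep_embedding(t, dim):
    """Sinusoidal embedding of integer time steps t (shape [B]) into R^dim (dim even)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # shape [B, dim]
```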
Extensions and Applications
Conditional Generation
- Additional conditioning information $c$ (e.g., a class label or a text prompt) is applied to control generation.
- The network now samples from the conditional distribution $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t, c)$.
- The condition $c$ is taken as input in a similar fashion to the time step $t$ (see the sketch after this list).
- ControlNet
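One hedged sketch of this, assuming a class-label condition: embed $c$ and add it to the time embedding, so the denoiser receives the condition through the same pathway as $t$ (the names here are hypothetical):

```python
import torch.nn as nn

class ConditionEmbedding(nn.Module):
    """Hypothetical: map a class label c to a vector and add it to the time embedding."""
    def __init__(self, num_classes, dim):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, dim)

    def forward(self, t_emb, c):
        # t_emb: [B, dim] time embedding; c: [B] integer class labels.
        return t_emb + self.label_emb(c)
```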
Guidance Methods
- Gradient guidance is added to shift the predicted mean (see the sketch after this list).
- For more information, see Guidance Methods.
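As an illustrative sketch of gradient (classifier) guidance, assuming an external classifier of $p(y \mid \mathbf{x}_t)$: the predicted mean is shifted by the scaled gradient of the class log-probability (all names here are hypothetical):

```python
import torch

def guided_mean(mu, sigma2, x_t, y, classifier, scale=1.0):
    """Shift the predicted mean by scale * sigma_t^2 * grad_x log p(y | x_t)."""
    x = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x), dim=-1)
    selected = log_probs[torch.arange(y.shape[0]), y].sum()
    grad = torch.autograd.grad(selected, x)[0]
    return mu + scale * sigma2 * grad
```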
Latent Diffusion Models
- Train a VAE to map the input data to latent space.
- Diffusion and denoising are performed within the latent space (see the sketch after this list).
- Lower computational cost.
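A hedged end-to-end sketch of the latent-space loop described above; `vae.encode`/`vae.decode` and `denoiser` are placeholder interfaces, and `sample`/`training_step` refer to the sketches earlier in this note:

```python
import torch

def latent_training_step(vae, denoiser, optimizer, x0, alpha_bars):
    """Diffusion training on latents: encode first, then apply the usual objective."""
    with torch.no_grad():
        z0 = vae.encode(x0)                     # frozen VAE maps data to latent space
    return training_step(denoiser, optimizer, z0, alpha_bars)

@torch.no_grad()
def generate_latent_diffusion(vae, denoiser, latent_shape, betas):
    """Run diffusion sampling in latent space, then decode back to data space."""
    z = sample(denoiser, latent_shape, betas)   # denoise a latent from pure noise
    return vae.decode(z)                        # map the clean latent back to data space
```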