#Note #Computer-Graphics-and-Vision/Rendering/Neural-Rendering

Mesh-based Differentiable Rendering

Mesh Primitives

A mesh $X$ is defined by vertex locations $V \in \mathbb{R}^{N_{v} \times 3}$, face indices $F \in \mathbb{N}^{N_{f} \times 3}$, and per-vertex RGB textures $T \in \mathbb{R}^{N_{v} \times 3}$.

Given a mesh primitive $X = \{V, F, T\}$ and camera parameters $C = \{K, R, t\}$ (intrinsics $K$, rotation $R$, translation $t$), we want a rendering function $r : (X, C) \mapsto I$ that produces an image $I$ and is fully differentiable.

Rasterization

Consider the following scenario: a mesh consisting of a single triangle with a constant black texture:

$$
\begin{aligned}
X & = \{V, F, T\} \\
V & = \begin{bmatrix} v_{1} \\ v_{2} \\ v_{3} \end{bmatrix} = \begin{bmatrix} x_{1} & y_{1} & z_{1} \\ x_{2} & y_{2} & z_{2} \\ x_{3} & y_{3} & z_{3} \end{bmatrix} \\
F & = \begin{bmatrix} F_{1} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 2 \end{bmatrix} \\
T & = \begin{bmatrix} \mathbf{0}^{T} \\ \mathbf{0}^{T} \\ \mathbf{0}^{T} \end{bmatrix} .
\end{aligned}
$$

Given the camera parameters $C$, the projection $Π_{C}$ maps each vertex and each face to image coordinates:

$$
\begin{aligned} {\tilde{v}}_{*} & = Π_{C} (v_{*}) \\ f_{*} & = Π_{C} (F_{*}) . \end{aligned}
$$
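The note does not spell out $Π_{C}$. A minimal sketch, assuming a standard pinhole camera where $K$ is a $3 \times 3$ intrinsic matrix and $(R, t)$ map world coordinates to camera coordinates, could look like this. The function name `project_vertices` and all numeric values are illustrative assumptions, not part of the original note.

```python
import numpy as np

def project_vertices(V, K, R, t):
    """Project world-space vertices V (N, 3) to pixel coordinates (N, 2).

    Assumes a pinhole model: v_cam = R v + t, then perspective division
    after applying the intrinsics K. Also returns the camera-space depth.
    """
    V_cam = V @ R.T + t              # world frame -> camera frame
    uvw = V_cam @ K.T                # apply intrinsics (homogeneous pixels)
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective division
    return uv, V_cam[:, 2]

# Hypothetical camera: focal length 100 px, principal point (32, 32).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

# One triangle, two units in front of the camera.
V = np.array([[0.0, 0.0, 2.0],
              [0.1, 0.0, 2.0],
              [0.0, 0.1, 2.0]])
F = np.array([[0, 1, 2]])

uv, depth = project_vertices(V, K, R, t)
f = uv[F]   # projected faces, shape (N_f, 3, 2)
```

Indexing the projected vertices with `F` gives one $3 \times 2$ array of image-space corners per face, which plays the role of $f_{*}$ here.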

Let $f_{1}$ be the projection of the triangle in image UV coordinates. Then, for each pixel at UV coordinates $(u, v)$, the rendered value $I_{u v}$ is

$$
I_{u v} = \begin{cases} 0, & \text{if } (u, v) \in f_{1}, \\ 1, & \text{otherwise} . \end{cases}
$$

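As a function of the vertex positions, this image is piecewise constant: it is discontinuous where a pixel crosses the triangle boundary and has zero gradient everywhere else. Gradient-based optimization therefore receives no signal, and the geometry is never updated.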
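The zero-gradient behaviour can be checked numerically. The sketch below hard-rasterizes one triangle with an edge-function inside test, then estimates the derivative of the total image intensity with respect to one vertex coordinate by finite differences. Sampling at pixel centres, the 16 x 16 resolution, and the triangle coordinates are assumptions chosen for illustration.

```python
import numpy as np

def edge(a, b, p):
    """2D cross product of (b - a) and (p - a); its sign tells on which
    side of the edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[..., 1] - a[1]) - (b[1] - a[1]) * (p[..., 0] - a[0])

def hard_rasterize(tri, height, width):
    """Return an image that is 0 inside the triangle and 1 outside.

    tri has shape (3, 2) and holds the projected corners in pixel units.
    """
    us, vs = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    p = np.stack([us, vs], axis=-1)          # pixel centres, (H, W, 2)
    e0 = edge(tri[0], tri[1], p)
    e1 = edge(tri[1], tri[2], p)
    e2 = edge(tri[2], tri[0], p)
    # Accept either winding order.
    inside = ((e0 >= 0) & (e1 >= 0) & (e2 >= 0)) | ((e0 <= 0) & (e1 <= 0) & (e2 <= 0))
    return np.where(inside, 0.0, 1.0)

tri = np.array([[2.0, 2.0], [12.3, 3.1], [4.2, 13.4]])
image = hard_rasterize(tri, 16, 16)

# Finite-difference derivative of the summed intensity w.r.t. the
# x-coordinate of the first vertex.
step = 1e-4
tri_shifted = tri.copy()
tri_shifted[0, 0] += step
fd_grad = (hard_rasterize(tri_shifted, 16, 16).sum() - image.sum()) / step
print(fd_grad)
```

A small perturbation does not move any pixel centre across an edge, so the image is unchanged and the estimated derivative carries no information about how to move the vertex.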

Soft Rasterization

Soft rasterization (SoftRas) is a differentiable rendering method that aggregates the contributions of multiple faces at each pixel. Given a set of faces $\{f_{i}\}$, for each pixel $I_{u v}$ we compute the influence of triangle $f_{i}$ on that pixel:

$$
D (f_{i}, I_{u v}) = \operatorname{sigmoid} \left( δ (f_{i}, I_{u v}) \cdot \frac{d^{2} (f_{i}, I_{u v})}{σ} \right),
$$

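where $δ$ is a sign function indicating whether the pixel lies inside or outside the triangle,

$$
δ (f_{i}, I_{u v}) = \begin{cases} + 1, & \text{if } (u, v) \in f_{i}, \\ - 1, & \text{otherwise}, \end{cases}
$$

$d (f_{i}, I_{u v})$ is the distance from the pixel to the face (in SoftRas, the distance to the closest edge of the projected triangle), and $σ > 0$ controls the sharpness of the face boundary.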

The graph of $D (f_{i}, I_{u v})$ as a function of $d (f_{i}, I_{u v})$, plotted for increasing $σ \in (0, 1]$, shows that a larger $σ$ results in a smoother function; as $σ \to 0$, $D$ approaches the hard inside/outside step.
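The effect of $σ$ can be checked by evaluating $D$ directly. The sketch below is a minimal illustration that takes the distance $d$ and the inside/outside flag as given inputs; the helper name `soft_influence` and the sample values are assumptions. The sigmoid is written through `tanh`, which is mathematically identical and avoids overflow for large arguments.

```python
import numpy as np

def soft_influence(d, inside, sigma):
    """D = sigmoid(delta * d^2 / sigma), with delta = +1 inside, -1 outside."""
    delta = np.where(inside, 1.0, -1.0)
    x = delta * np.square(d) / sigma
    # sigmoid(x) == 0.5 * (1 + tanh(x / 2)); this form does not overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

# Distances of pixels lying outside the triangle, in pixel units.
d = np.array([0.0, 0.5, 1.0, 2.0])

# Influence of the triangle on those pixels for several sharpness values.
curves = {sigma: soft_influence(d, inside=False, sigma=sigma)
          for sigma in (0.01, 0.1, 1.0)}
for sigma, values in curves.items():
    print(sigma, values)
```

On the boundary ($d = 0$) the influence is always $0.5$. Away from it, a small $σ$ drives the outside values towards $0$ quickly, while a larger $σ$ lets the triangle influence pixels further away.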

Next, we normalize $D (f_{i}, I_{u v})$ into a weight $w (f_{i}, I_{u v})$ that also accounts for the depth of each face and for the influence of the background, such that

$$
\sum_{k = 1}^{N_{f}} w (f_{k}, I_{u v}) + w_{b} = 1.
$$

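Let $z_{i}^{u v}$ be the depth value of face $f_{i}$ at pixel $(u, v)$. For nearer faces to receive larger weights, $z_{i}^{u v}$ must increase as the face gets closer to the camera (SoftRas uses the normalized inverse depth). We then have

$$
w (f_{i}, I_{u v}) = \frac{D (f_{i}, I_{u v}) \cdot \exp (z_{i}^{u v} / γ)}{\sum_{k = 1}^{N_{f}} D (f_{k}, I_{u v}) \cdot \exp (z_{k}^{u v} / γ) + \exp (ϵ / γ)},
$$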

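where $γ$ is the aggregation sharpness and $ϵ$ is a small constant that sets the background influence. The background weight is the remaining term of the same normalization:

$$
w_{b} = \frac{\exp (ϵ / γ)}{\sum_{k = 1}^{N_{f}} D (f_{k}, I_{u v}) \cdot \exp (z_{k}^{u v} / γ) + \exp (ϵ / γ)} .
$$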
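For a single pixel, the normalization can be sketched as follows. The function name `aggregation_weights`, the default values of `gamma` and `eps`, and the sample inputs are illustrative assumptions; `z` is assumed to grow for faces closer to the camera. Subtracting the largest exponent before calling `exp` is a standard stability trick that leaves the ratios unchanged.

```python
import numpy as np

def aggregation_weights(D, z, gamma=0.1, eps=1e-3):
    """Return per-face weights w (N_f,) and the background weight w_b.

    D: (N_f,) influences of each face on the pixel.
    z: (N_f,) depth values, larger meaning closer to the camera (assumed).
    """
    m = max(z.max(), eps)                        # shared shift for stability
    face_terms = D * np.exp((z - m) / gamma)     # D_k * exp(z_k / gamma), rescaled
    bg_term = np.exp((eps - m) / gamma)          # exp(eps / gamma), rescaled
    denom = face_terms.sum() + bg_term
    return face_terms / denom, bg_term / denom

# Two faces covering the pixel equally well; the first one is closer.
D = np.array([0.9, 0.9])
z = np.array([0.8, 0.3])
w, w_b = aggregation_weights(D, z)
print(w, w_b, w.sum() + w_b)
```

With equal influence, the closer face dominates, and a smaller `gamma` makes that preference sharper.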

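The final color value is

$$
I_{u v} = \sum_{i = 1}^{N_{f}} w (f_{i}, I_{u v}) \cdot C_{i}^{u v} + w_{b} \cdot C_{b}^{u v},
$$

where $C_{i}^{u v}$ is the color of face $f_{i}$ at pixel $(u, v)$ and $C_{b}^{u v}$ is the background color.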
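Putting the pieces together for the single black triangle from the Rasterization section gives a rough end-to-end sketch. It assumes a white background, pixel-centre sampling, $d$ measured to the closest triangle edge, a constant depth value `z` for the face, and arbitrary values for `sigma`, `gamma`, and `eps`; none of these choices come from the note itself.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from points p (..., 2) to the segment a-b."""
    ab = b - a
    s = np.clip(((p - a) @ ab) / (ab @ ab), 0.0, 1.0)   # clamp to the segment
    closest = a + s[..., None] * ab
    return np.linalg.norm(p - closest, axis=-1)

def cross2(a, b, p):
    """2D cross product of (b - a) and (p - a)."""
    return (b[0] - a[0]) * (p[..., 1] - a[1]) - (b[1] - a[1]) * (p[..., 0] - a[0])

def soft_render_black_triangle(tri, size, sigma=1.0, gamma=0.1, eps=1e-3, z=0.5):
    """Soft-rasterize one black triangle over a white background."""
    centers = np.arange(size) + 0.5
    us, vs = np.meshgrid(centers, centers)
    p = np.stack([us, vs], axis=-1)                      # (H, W, 2)
    edges = [(tri[i], tri[(i + 1) % 3]) for i in range(3)]

    # Distance to the closest edge and inside/outside sign.
    d = np.min([point_segment_distance(p, a, b) for a, b in edges], axis=0)
    signs = np.stack([cross2(a, b, p) for a, b in edges])
    inside = np.all(signs >= 0, axis=0) | np.all(signs <= 0, axis=0)
    delta = np.where(inside, 1.0, -1.0)

    # Influence D, written with tanh to avoid overflow in the sigmoid.
    D = 0.5 * (1.0 + np.tanh(0.5 * delta * d**2 / sigma))

    # Normalized weights for the single face and the background.
    face_term = D * np.exp(z / gamma)
    bg_term = np.exp(eps / gamma)
    w = face_term / (face_term + bg_term)
    w_b = bg_term / (face_term + bg_term)

    face_color, background_color = 0.0, 1.0
    return w * face_color + w_b * background_color

tri = np.array([[2.0, 2.0], [12.3, 3.1], [4.2, 13.4]])
image = soft_render_black_triangle(tri, 16)

# Finite-difference derivative of the summed intensity w.r.t. the
# x-coordinate of the first vertex.
step = 1e-4
tri_shifted = tri.copy()
tri_shifted[0, 0] += step
fd_grad = (soft_render_black_triangle(tri_shifted, 16).sum() - image.sum()) / step
print(fd_grad)
```

Unlike the hard indicator image, every pixel value here varies smoothly with the vertex positions, so the finite-difference estimate is nonzero and an optimizer has a direction in which to move the geometry.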