Mesh-based Differentiable Rendering

Mesh Primitives

A mesh $X$ is defined by vertex locations $V \in \mathbb{R}^{N_v \times 3}$, face indices $F \in \mathbb{N}^{N_f \times 3}$, and textures (per-vertex in RGB) $T \in \mathbb{R}^{N_v \times 3}$.

Given a mesh primitive $X = \{V, F, T\}$ and camera parameters $C = \{K, R, t\}$, we want to find a rendering function $r: X \times C \to I$ that is fully differentiable.

Rasterization

Consider the following scenario. We have a mesh that consists of a single triangle with a uniformly black texture:

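$$
X = \{V, F, T\}, \quad
V = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
  = \begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix}, \quad
F = \begin{bmatrix} F_1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 2 \end{bmatrix}, \quad
T = \begin{bmatrix} \mathbf{0}^T \\ \mathbf{0}^T \\ \mathbf{0}^T \end{bmatrix}.
$$

Given the camera parameters $C$, we have

$$
\tilde{v} = \Pi_C(v), \quad f = \Pi_C(F).
$$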
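The text does not spell out the form of $\Pi_C$. As a minimal sketch, assuming a standard pinhole camera (a vertex is moved to the camera frame as $Rv + t$ and then projected with the intrinsic matrix $K$), the projection of the single-triangle mesh could look like the following. The function names and all numeric camera values are illustrative assumptions, not taken from the source.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]


def project(v, K, R, t):
    """Project a world-space vertex to (u, v, depth) with an assumed pinhole camera."""
    cam = [a + b for a, b in zip(mat_vec(R, v), t)]  # world -> camera frame
    x, y, z = mat_vec(K, cam)                        # camera -> homogeneous pixel coords
    return (x / z, y / z, cam[2])                    # perspective divide; keep the depth


# Hypothetical camera: focal length 100 px, principal point (50, 50),
# identity rotation, translated 2 units along the optical axis.
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

V = [[-0.5, -0.5, 0.0], [0.5, -0.5, 0.0], [0.0, 0.5, 0.0]]  # one triangle
F = [[0, 1, 2]]                                             # indices into V

projected = [project(v, K, R, t) for v in V]
f1 = [projected[i][:2] for i in F[0]]  # the 2D triangle in UV coordinates
```

Every step here is a smooth function of the vertex positions (away from zero depth), so the projection itself is not where differentiability is lost.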

Let the projection of the triangle in the image UV coordinates be $f_1$. Then for each pixel $(u, v)$ in UV coordinates, the corresponding value $I_{uv}$ is

$$
I_{uv} =
\begin{cases}
0, & \text{if } (u, v) \in f_1, \\
1, & \text{otherwise}.
\end{cases}
$$

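This function is discontinuous. It is piecewise constant, so its gradient with respect to the vertex positions is zero everywhere except on the triangle boundary, where it is undefined. Gradient-based optimization therefore receives no signal, and there will be no geometry updates.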
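The vanishing gradient can be checked numerically. The sketch below rasterizes one black triangle on a white background with an edge-function inside test (one common choice, assumed here) and takes a central finite difference of a pixel value with respect to one vertex coordinate. The triangle, the pixel, and the step size are hypothetical values.

```python
def edge(a, b, p):
    """Signed area term: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """True if p is inside (or on the boundary of) the 2D triangle tri."""
    e = [edge(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(x >= 0 for x in e) or all(x <= 0 for x in e)


def hard_pixel(tri, p):
    """Hard rasterization: 0 (black) inside the triangle, 1 (white) outside."""
    return 0.0 if inside(tri, p) else 1.0


tri = [(25.0, 25.0), (75.0, 25.0), (50.0, 75.0)]
pixel = (50.0, 40.0)  # well inside the triangle

# Central finite difference w.r.t. the x-coordinate of the first vertex.
h = 1e-3
plus = [(tri[0][0] + h, tri[0][1]), tri[1], tri[2]]
minus = [(tri[0][0] - h, tri[0][1]), tri[1], tri[2]]
grad = (hard_pixel(plus, pixel) - hard_pixel(minus, pixel)) / (2 * h)
```

Moving the vertex by a small amount does not change which side of the boundary the pixel is on, so `grad` is zero for this pixel and for any other pixel that is not exactly on an edge.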

Soft Rasterization

Soft rasterization (SoftRas) is a differentiable alternative that aggregates the contributions of multiple faces. Given a set of faces $\{f_i\}$, for each pixel $I_{uv}$ we calculate the influence of a triangle on the pixel:

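$$
D(f_i, I_{uv}) = \mathrm{sigmoid}\left( \delta(f_i, I_{uv}) \cdot \frac{d^2(f_i, I_{uv})}{\sigma} \right),
$$

where $\delta$ is a sign function indicating if the pixel is inside or outside the triangle

$$
\delta(f_i, I_{uv}) =
\begin{cases}
+1, & \text{if } (u, v) \in f_i, \\
-1, & \text{otherwise},
\end{cases}
$$

$d(f_i, I_{uv})$ is the distance from face to pixel, and $\sigma$ is face sharpness.

The following shows the graph of $D(f_i, I_{uv})$ as a function of $d(f_i, I_{uv})$, with increasing $\sigma \in [0, 1]$. We can see that a larger $\sigma$ results in a smoother function.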
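The influence function can be sketched in a few lines. This example assumes that $d$ is the Euclidean distance from the pixel to the nearest triangle edge, which is one reasonable reading of "distance from face to pixel" and not something the text states explicitly. It uses normalized UV coordinates and hypothetical values for the triangle, the pixels, and $\sigma$.

```python
import math


def edge(a, b, p):
    """Signed area term: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """True if p is inside (or on the boundary of) the 2D triangle tri."""
    e = [edge(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(x >= 0 for x in e) or all(x <= 0 for x in e)


def seg_dist(a, b, p):
    """Euclidean distance from point p to the segment a-b."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    s = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / (abx * abx + aby * aby)
    s = max(0.0, min(1.0, s))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + s * abx), p[1] - (a[1] + s * aby))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def influence(tri, p, sigma):
    """D(f, I_uv) = sigmoid(delta * d^2 / sigma) for one face and one pixel."""
    delta = 1.0 if inside(tri, p) else -1.0
    d = min(seg_dist(tri[i], tri[(i + 1) % 3], p) for i in range(3))
    return sigmoid(delta * d * d / sigma)


tri = [(0.25, 0.25), (0.75, 0.25), (0.5, 0.75)]
p_in, p_out, p_edge = (0.5, 0.4), (0.5, 0.1), (0.5, 0.25)

sharp = influence(tri, p_out, sigma=0.01)  # small sigma: close to a hard step
smooth = influence(tri, p_out, sigma=1.0)  # large sigma: close to 0.5
```

Unlike the hard rasterizer, a pixel outside the triangle still receives a non-zero influence that varies smoothly with the vertex positions, which is what lets gradients reach the geometry.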

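Next, we normalize $D(f_i, I_{uv})$ into weights $w(f_k, I_{uv})$ that account for the depth of each face and the influence of the background, such that

$$
\sum_{k=1}^{N_f} w(f_k, I_{uv}) + w_b = 1.
$$

Given the depth $z_i^{uv}$ of each face at the pixel, we have

$$
w(f_i, I_{uv}) = \frac{D(f_i, I_{uv}) \exp(z_i^{uv} / \gamma)}{\sum_{k=1}^{N_f} D(f_k, I_{uv}) \exp(z_k^{uv} / \gamma) + \exp(\epsilon / \gamma)},
$$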

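where $\gamma$ is the aggregation sharpness and $\epsilon$ is the background influence. The background weight $w_b$ is the remaining share, that is, $\exp(\epsilon / \gamma)$ divided by the same denominator, which is what makes the weights sum to one.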
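A small sketch of this normalization for one pixel follows. Because the formula gives larger weight to larger $z$, the example assumes $z$ grows toward the camera (for instance a normalized inverse depth); that convention, the function name, and the numeric values are assumptions. Subtracting a common maximum from the exponents is a standard stability trick and does not change the result.

```python
import math


def aggregate_weights(D, z, gamma, eps):
    """Return (per-face weights, background weight) for a single pixel."""
    m = max(max(z), eps) / gamma  # shift exponents so exp() cannot overflow
    terms = [d * math.exp(zi / gamma - m) for d, zi in zip(D, z)]
    bg = math.exp(eps / gamma - m)
    denom = sum(terms) + bg
    return [t / denom for t in terms], bg / denom


D = [0.9, 0.6]  # hypothetical influences of two faces on the pixel
z = [0.8, 0.3]  # hypothetical depths; larger means closer under the assumed convention

w, w_b = aggregate_weights(D, z, gamma=0.1, eps=0.001)
w_soft, w_b_soft = aggregate_weights(D, z, gamma=1.0, eps=0.001)
```

With the smaller $\gamma$ almost all of the weight goes to the closer face, which approaches the behavior of a hard depth test, while the larger $\gamma$ spreads the weight across both faces and the background.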

The final color value is

$$
I_{uv} = \sum_{i=1}^{N_f} w(f_i, I_{uv}) \, C_i^{uv} + w_b \, C_b^{uv}.
$$
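The blend itself is a weighted sum. The sketch below reads $C_i^{uv}$ as the RGB color of face $i$ at the pixel and $C_b^{uv}$ as the background color, which is an interpretation of the symbols rather than a definition given in the text. The weights and colors are hypothetical values.

```python
def blend(weights, colors, w_b, background):
    """Weighted sum of per-face RGB colors plus the weighted background color."""
    return tuple(
        sum(w * c[ch] for w, c in zip(weights, colors)) + w_b * background[ch]
        for ch in range(3)
    )


weights = [0.7, 0.2]                          # per-face weights at this pixel
w_b = 0.1                                     # background weight; all weights sum to 1
colors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # a black face and a red face
background = (1.0, 1.0, 1.0)                  # white background

pixel = blend(weights, colors, w_b, background)
```

Since the weights are smooth functions of the vertex positions, the resulting pixel color is as well.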