Parametric Human Models

Pose Representation

Kinematic Chains

Human poses can be expressed via a kinematic chain, i.e., a series of local rigid-body motions. The rotation of each joint can be represented by a vector ω, whose norm is the angle of rotation and whose direction is the axis of rotation. Representing the relative coordinates of each joint with respect to its parent joint by j, the transformation matrix from the local joint coordinates to the parent joint coordinates is

$$
G(\omega, j) = \begin{bmatrix} e^{\hat{\omega}} & j \\ \mathbf{0}^T & 1 \end{bmatrix},
$$

where $\hat{\omega}$ is the skew-symmetric (cross-product) matrix of $\omega$, and $e^{\hat{\omega}}$ is its matrix exponential, which can be evaluated with Rodrigues' formula.

![[Pasted image 20240311165000.png|300]]
If we want to transform $p_b$ from the coordinates of joint 2 to world coordinates, we first write $p_b$ in homogeneous coordinates $\bar{p}_b = \begin{bmatrix} p_b^T & 1 \end{bmatrix}^T$, then

$$
\bar{p}_s = G(\omega_1, j_1)\, G(\omega_2, j_2)\, \bar{p}_b.
$$
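As a concrete illustration, here is a minimal NumPy sketch of the local transform $G(\omega, j)$ (with $e^{\hat{\omega}}$ evaluated via Rodrigues' formula) and of chaining two such transforms; the joint values below are made up for the example.

```python
import numpy as np

def axis_angle_to_matrix(omega):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    angle = np.linalg.norm(omega)
    if angle < 1e-8:
        return np.eye(3)
    axis = omega / angle
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def local_transform(omega, j):
    """G(omega, j): 4x4 rigid transform from a joint's frame to its parent's frame."""
    G = np.eye(4)
    G[:3, :3] = axis_angle_to_matrix(omega)
    G[:3, 3] = j
    return G

# Transform a point from joint 2's frame to world coordinates by chaining
# the two local transforms, as in the equation above (illustrative values).
omega1, j1 = np.array([0.0, 0.0, np.pi / 2]), np.array([0.0, 1.0, 0.0])
omega2, j2 = np.array([0.0, 0.0, np.pi / 4]), np.array([0.0, 0.5, 0.0])
p_b = np.array([0.1, 0.2, 0.0, 1.0])          # homogeneous coordinates
p_s = local_transform(omega1, j1) @ local_transform(omega2, j2) @ p_b
print(p_s[:3])
```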

Pose Parameters

If there are K joints for a body model, we can represent the joint locations by a concatenated vector:

$$
J = \begin{bmatrix} j_1^T & \cdots & j_K^T \end{bmatrix}^T \in \mathbb{R}^{3K}.
$$

The pose parameters, which are rotations of each joint, can be represented as a vector of concatenated axis-angles:

$$
\theta = \begin{bmatrix} \omega_1^T & \cdots & \omega_K^T \end{bmatrix}^T \in \mathbb{R}^{3K}.
$$
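Given the concatenated pose and joint vectors, the global transform of joint $k$ is the product of the local transforms along the chain from the root. Below is a small sketch; it assumes the kinematic tree is stored as a `parents` array of parent indices (with $-1$ for the root), which is not part of the notation above.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def global_transforms(theta, J, parents):
    """Compose local transforms G(omega_k, j_k) along the kinematic tree.

    theta:   (K, 3) axis-angle rotation of each joint
    J:       (K, 3) location of each joint relative to its parent
    parents: (K,)   parent index of each joint, -1 for the root
    """
    K = len(parents)
    G = [np.eye(4) for _ in range(K)]
    for k in range(K):
        G_local = np.eye(4)
        G_local[:3, :3] = Rotation.from_rotvec(theta[k]).as_matrix()
        G_local[:3, 3] = J[k]
        G[k] = G_local if parents[k] < 0 else G[parents[k]] @ G_local
    return np.stack(G)  # (K, 4, 4) world transform of each joint

# Tiny 3-joint chain: root -> elbow -> wrist (illustrative values).
parents = np.array([-1, 0, 1])
theta = np.zeros((3, 3)); theta[1] = [0.0, 0.0, np.pi / 2]   # bend the middle joint
J = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.25, 0.0, 0.0]])
print(global_transforms(theta, J, parents)[2, :3, 3])         # wrist position
```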

Problems of Kinematic Chains

Points on the surface move rigidly with their corresponding bones. If the transformations of two adjacent bones differ, the surface points near the joint between them become disconnected.

![[Pasted image 20240813174346.png|400]]

Shape Representation

Linear Blend Skinning

In linear blend skinning (LBS), points are transformed by a blended linear combination of joint transformation matrices. Each vertex $t_i$ is assigned a skinning weight $w_{k,i}$ with respect to each joint $k$. In homogeneous coordinates, the transformed position is

$$
\bar{t}'_i = \sum_{k=1}^{K} w_{k,i}\, G_k(\theta, J)\, \bar{t}_i.
$$
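A minimal LBS sketch, assuming the per-joint world transforms $G_k(\theta, J)$ have already been computed (e.g., with the forward-kinematics sketch above) and that the skinning weights are stored as an $(N, K)$ matrix:

```python
import numpy as np

def linear_blend_skinning(vertices, weights, G):
    """Blend joint transforms per vertex and apply them.

    vertices: (N, 3) rest-pose vertex positions
    weights:  (N, K) skinning weights, each row summing to 1
    G:        (K, 4, 4) world transform of each joint
    returns:  (N, 3) posed vertex positions
    """
    N = vertices.shape[0]
    v_h = np.concatenate([vertices, np.ones((N, 1))], axis=1)   # homogeneous coords
    G_blend = np.einsum('nk,kij->nij', weights, G)               # per-vertex blended 4x4
    v_posed = np.einsum('nij,nj->ni', G_blend, v_h)              # apply blended transform
    return v_posed[:, :3]
```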

Skinning Functions

Given $K$ joints and $N$ vertices, the standard skinning function produces the transformed vertices via

$$
T' = W(T, J, \mathcal{W}, \theta),
$$

where $T$ are the rest-pose vertices and $\mathcal{W} \in \mathbb{R}^{N \times K}$ is the matrix of skinning weights.

Problems of LBS

There are still artifacts when using LBS, especially when the joint angles are large or when a bone undergoes a twisting motion. This results in a loss of volume near the joint.

Blend Shapes

Blend shapes are offsets added to the vertex coordinates. Pose blend shapes, which aim to correct LBS artifacts, are a function of the pose parameters. Shape blend shapes, which give each character its individual shape identity, are fixed across all poses.

The pose blend shape under the pose parameters $\theta$ is calculated by multiplying the vectorized joint rotation matrices $f(\theta) \in \mathbb{R}^{9K}$ by a pose blend-shape matrix $P \in \mathbb{R}^{9K \times 3N}$:

$$
B_P(\theta) = f(\theta)\, P.
$$

The shape blend shapes are obtained by performing PCA on rest pose vertices of different characters. They are chosen to be the eigenvectors corresponding to the largest eigenvalues. Suppose we have R shape blend shapes. The total shape blend shape can be calculated by

$$
B_S(\beta) = \sum_{j=1}^{R} \beta_j S_j.
$$
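A sketch of both blend-shape terms under the notation above; here $f(\theta)$ is taken to be the concatenation of the $3 \times 3$ joint rotation matrices into a $9K$-vector, which is an assumption about how the vectorization is done (SMPL itself uses the rotations relative to the rest pose).

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_blendshape(theta, P):
    """B_P(theta) = f(theta) P, with f(theta) the 9K-vector of flattened rotations.

    theta: (K, 3) axis-angle pose parameters
    P:     (9K, 3N) pose blend-shape matrix
    returns: (N, 3) per-vertex offsets
    """
    f = Rotation.from_rotvec(theta).as_matrix().reshape(-1)   # (9K,)
    return (f @ P).reshape(-1, 3)

def shape_blendshape(beta, S):
    """B_S(beta) = sum_j beta_j S_j.

    beta: (R,) shape coefficients
    S:    (R, N, 3) shape blend shapes (principal components)
    returns: (N, 3) per-vertex offsets
    """
    return np.einsum('r,rnc->nc', beta, S)
```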

SMPL

Joint Regressor

The joint locations $J$ can also be seen as a linear function of the rest vertices $T$, parameterized by a joint regressor matrix $\mathcal{J} \in \mathbb{R}^{3K \times 3N}$:

$$
J = J(T; \mathcal{J}) = \mathcal{J}\, T.
$$
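With the regressor stored as a dense $3K \times 3N$ matrix as above, regressing the joints is a single matrix-vector product (in real models the regressor is sparse, but a dense sketch suffices):

```python
import numpy as np

def regress_joints(J_regressor, vertices):
    """J = joint regressor applied to the flattened rest vertices.

    J_regressor: (3K, 3N) joint regressor matrix
    vertices:    (N, 3) rest-pose vertices
    returns:     (K, 3) joint locations
    """
    return (J_regressor @ vertices.reshape(-1)).reshape(-1, 3)
```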

SMPL

Putting these together, we have the SMPL model:

$$
M(\theta, \beta;\ T, S, P, \mathcal{W}, \mathcal{J}),
$$

where the output vertices are obtained by adding the shape and pose blend shapes to the template and then applying the skinning function, with the joints regressed from the shaped template:

$$
M(\theta, \beta) = W\!\big(T + B_S(\beta) + B_P(\theta),\ \mathcal{J}\,(T + B_S(\beta)),\ \mathcal{W},\ \theta\big).
$$
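Putting the pieces together, a sketch of the full forward pass, reusing the helper functions from the earlier sketches (`shape_blendshape`, `pose_blendshape`, `regress_joints`, `global_transforms`, `linear_blend_skinning`); these names come from the sketches above, not from an existing SMPL implementation.

```python
import numpy as np

def smpl_forward(theta, beta, T, S, P, W, J_regressor, parents):
    """M(theta, beta): shape blend shape + pose blend shape + LBS.

    theta: (K, 3) pose, beta: (R,) shape.
    T: (N, 3) template, S: (R, N, 3), P: (9K, 3N), W: (N, K),
    J_regressor: (3K, 3N), parents: (K,) kinematic tree (root first).
    """
    # 1. Add identity-dependent and pose-dependent offsets to the template.
    T_shaped = T + shape_blendshape(beta, S)
    T_posed = T_shaped + pose_blendshape(theta, P)

    # 2. Regress joint locations from the shaped template and convert them to
    #    offsets relative to each joint's parent (assumes joints ordered root-first).
    J_abs = regress_joints(J_regressor, T_shaped)
    J_rel = J_abs.copy()
    J_rel[1:] -= J_abs[parents[1:]]

    # 3. Forward kinematics. Following common SMPL-style implementations, the
    #    rest-pose transform of each joint is removed so that a zero pose leaves
    #    the template unchanged (an assumption; the notes above write the
    #    skinning with G_k(theta, J) directly).
    G = global_transforms(theta, J_rel, parents)
    G_rest = global_transforms(np.zeros_like(theta), J_rel, parents)
    G = np.einsum('kij,kjl->kil', G, np.linalg.inv(G_rest))

    # 4. Linear blend skinning with the blended per-vertex transforms.
    return linear_blend_skinning(T_posed, W, G)
```

With the rest-pose correction above, $M(\mathbf{0}, \beta)$ returns the shaped template unchanged, which is a convenient sanity check.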

Learning

Model Objective

In the multi-shape database, there are about 2,000 single-pose registrations per gender. If we use these registrations to train the model, we want to find the model parameters that minimize a single objective measuring the distance between the model and the registrations:

$$
\Phi^* = \arg\min_{\Phi} \sum_j \min_{\theta_j, \beta_j} \big\| M(\theta_j, \beta_j; \Phi) - V_j \big\|^2,
$$

where $\Phi = \{T, S, P, \mathcal{W}, \mathcal{J}\}$ are the parameters to be learned, and $\{V_j\}$ is the set of registrations.

Registration

In practice, we do not have registrations, i.e., meshes with a fixed number of vertices. We only have scanned point clouds $\{S_j\}$. So we need to fit registrations to the raw scans, and also fit the model to the registrations:

```mermaid
graph LR
    A[Data] --> B[Registration]
    B --> C[Model]
    C --> B
```

In the registration step, we aim to find a registration $V_j$ which minimizes the distance between the scanned points $S_j$ and the registration surface $\mathcal{V}(V)$:

$$
V_j^* = \arg\min_{V} \sum_{s_i \in S_j} \operatorname{dist}\big(s_i, \mathcal{V}(V)\big) + E_{\text{prior}}(V),
$$

where $E_{\text{prior}}(V)$ is the prior loss for the registration. However, optimizing the vertices directly may lead to unstable results.

On the other hand, we can optimize the input parameters to the trained model, such that the model also fits the scanned points:

$$
\theta_j^*, \beta_j^* = \arg\min_{\theta, \beta} \sum_{s_i \in S_j} \operatorname{dist}\big(s_i, M(\theta, \beta)\big) + E_{\text{prior}}(\theta, \beta).
$$

However, we do not learn anything new from the scan; we just find the shape and pose parameters that fit it well.

Ultimately, we can combine these two objectives to get the final objective

$$
\theta_j^*, \beta_j^*, V_j^* = \arg\min_{\theta, \beta, V} \sum_{s_i \in S_j} \operatorname{dist}\big(s_i, \mathcal{V}(V)\big) + \operatorname{dist}\big(\mathcal{V}(V), M(\theta, \beta)\big) + E_{\text{prior}}(\theta, \beta),
$$

which can also be formulated as follows:

$$
V_j^* = \arg\min_{V} \min_{\theta, \beta} E_{\text{reg}}(S_j, V, \theta, \beta),
$$

where

$$
E_{\text{reg}}(S_j, V, \theta, \beta) =
\underbrace{E_S(S_j, V)}_{\text{scan-to-mesh distance}}
+ \lambda_C \underbrace{E_C(V, \theta, \beta)}_{\text{coupling}}
+ \lambda_\theta \underbrace{E_\theta(\theta)}_{\text{pose prior}}
+ \lambda_\beta \underbrace{E_\beta(\beta)}_{\text{shape prior}}.
$$

Scan-to-mesh Distance

The scan-to-mesh distance is calculated using the following:

$$
E_S(S_j, V) = \sum_{s \in S_j} \rho\Big( \min_{v \in \mathcal{V}(V)} \| s - v \| \Big),
$$

where

$$
\rho(x) = \frac{x^2}{\sigma^2 + x^2}
$$

is a robust error function bounded above by 1 (the Geman–McClure estimator).
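A sketch of the scan-to-mesh term, approximating the point-to-surface distance by the distance to the nearest registration vertex (a nearest-vertex approximation, not a true point-to-triangle distance); the value of $\sigma$ is illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def geman_mcclure(x, sigma):
    """rho(x) = x^2 / (sigma^2 + x^2): robust error, bounded above by 1."""
    return x**2 / (sigma**2 + x**2)

def scan_to_mesh(scan_points, reg_vertices, sigma=0.05):
    """E_S(S_j, V): sum of robust distances from scan points to the registration.

    scan_points:  (M, 3) raw scan
    reg_vertices: (N, 3) registration vertices (nearest-vertex approximation
                  of the surface V(V))
    """
    dists, _ = cKDTree(reg_vertices).query(scan_points)   # nearest-vertex distances
    return np.sum(geman_mcclure(dists, sigma))
```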

Coupling

The coupling loss is defined as follows:

$$
E_C(V, \theta, \beta) =
\begin{cases}
\big\| V - M(\theta, \beta) \big\|_F^2, & \text{if coupling on vertices}, \\[4pt]
\big\| A V - A\, M(\theta, \beta) \big\|_F^2, & \text{if coupling on edges}.
\end{cases}
$$
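A sketch of both coupling variants; the edge operator $A$ is built here from an explicit edge list (pairs of vertex indices), which is an assumption about how the mesh connectivity is stored.

```python
import numpy as np

def coupling_vertices(V, M_verts):
    """||V - M(theta, beta)||_F^2 on vertex positions."""
    return np.sum((V - M_verts) ** 2)

def coupling_edges(V, M_verts, edges):
    """||A V - A M(theta, beta)||_F^2, where A maps vertices to edge vectors.

    edges: (E, 2) integer array of vertex index pairs
    """
    AV = V[edges[:, 0]] - V[edges[:, 1]]
    AM = M_verts[edges[:, 0]] - M_verts[edges[:, 1]]
    return np.sum((AV - AM) ** 2)
```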

Priors

The priors are based on the Mahalanobis distance if we assume a Gaussian distribution for both $\theta$ and $\beta$:

$$
E_\theta(\theta) = (\theta - \mu_\theta)^T \Sigma_\theta^{-1} (\theta - \mu_\theta),
\qquad
E_\beta(\beta) = (\beta - \mu_\beta)^T \Sigma_\beta^{-1} (\beta - \mu_\beta).
$$
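Both priors are plain Mahalanobis distances; a minimal sketch:

```python
import numpy as np

def mahalanobis_prior(x, mu, Sigma):
    """(x - mu)^T Sigma^{-1} (x - mu), used as a Gaussian prior on pose or shape."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)   # solve instead of an explicit inverse
```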

Coregistration

From #Model Objective, we can train a model given registrations, and from #Registration, we can obtain registrations from a trained model. To solve both problems at once, the key idea is to minimize the registration objective across the whole dataset of scans:

$$
E(\{S_j\}, \{V_j\}, \{\theta_j\}, \Phi) =
\sum_j \Big( E_S(S_j, V_j) + \lambda_C E_C(V_j, \theta_j, \Phi) + \lambda_\theta E_\theta(\theta_j) \Big)
+ \lambda_\Phi \underbrace{E_\Phi(\Phi)}_{\text{model regularization}}.
$$

In training, the optimization of $\Phi$ and $\{V_j\}$ is done alternately, i.e., optimizing one while keeping the other fixed.
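A structural sketch of the alternating scheme; `fit_registration` and `fit_model_parameters` are hypothetical helpers standing in for the two sub-problems (registering each scan with $\Phi$ fixed, then updating $\Phi$ with the registrations and poses fixed), not functions from any existing library.

```python
def coregistration(scans, Phi, n_outer=10):
    """Alternate between registering scans (Phi fixed) and updating Phi."""
    registrations, poses = None, None
    for _ in range(n_outer):
        # Step 1: with the model parameters fixed, solve for each scan's
        # registration V_j and pose theta_j by minimizing E_reg.
        registrations, poses = zip(*[
            fit_registration(S_j, Phi) for S_j in scans        # hypothetical helper
        ])
        # Step 2: with {V_j} and {theta_j} fixed, update the model parameters
        # Phi using the coupling and model-regularization terms.
        Phi = fit_model_parameters(registrations, poses, Phi)  # hypothetical helper
    return Phi, registrations
```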