Evaluation Metrics and Loss Functions in Neural Rendering

Given a predicted image $I^{\mathrm{pred}}$ and the ground-truth image $I^{\mathrm{gt}}$, we want to measure the difference between them.

Pixelwise Losses

Mean Squared Error (MSE)

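$$\mathcal{L}_{\mathrm{MSE}}(I^{\mathrm{pred}}, I^{\mathrm{gt}}) = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \left\lVert I_{ij}^{\mathrm{pred}} - I_{ij}^{\mathrm{gt}} \right\rVert_2^2.$$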
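As a minimal sketch of the MSE formula, the NumPy function below takes the squared L2 norm of each pixel's difference and averages over the pixel grid. The function name `mse_loss` and the (H, W, C) array layout are assumptions made for illustration. Note that many libraries also average over the channel axis, which differs from this definition by a constant factor of C.

```python
import numpy as np

def mse_loss(pred, gt):
    """Mean over pixels of the squared L2 norm of the per-pixel difference.

    Assumes images of shape (H, W, C); grayscale (H, W) inputs are
    treated as having a single channel.
    """
    pred = np.atleast_3d(pred).astype(np.float64)
    gt = np.atleast_3d(gt).astype(np.float64)
    diff = pred - gt
    # Squared L2 norm per pixel (sum over channels), then mean over H x W.
    return float(np.mean(np.sum(diff ** 2, axis=-1)))
```

For example, comparing an all-zero RGB image with an all-one RGB image gives a squared norm of 3 at every pixel, so the loss is 3.0.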

Mean Absolute Error (MAE)

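$$\mathcal{L}_{\mathrm{MAE}}(I^{\mathrm{pred}}, I^{\mathrm{gt}}) = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \left| I_{ij}^{\mathrm{pred}} - I_{ij}^{\mathrm{gt}} \right|.$$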
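The MAE can be sketched the same way. For multi-channel pixels the absolute value is read here as the L1 norm over channels, which is an assumption about the notation, as is the function name `mae_loss`.

```python
import numpy as np

def mae_loss(pred, gt):
    """Mean over pixels of the per-pixel absolute difference.

    Assumes images of shape (H, W, C); for C > 1 the absolute value is
    interpreted as the L1 norm over channels (an assumption).
    """
    pred = np.atleast_3d(pred).astype(np.float64)
    gt = np.atleast_3d(gt).astype(np.float64)
    abs_diff = np.abs(pred - gt)
    # Sum over channels per pixel, then mean over the H x W grid.
    return float(np.mean(np.sum(abs_diff, axis=-1)))
```

Because the error enters linearly rather than quadratically, a single badly wrong pixel influences this loss less than it influences the MSE.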

Peak Signal-to-Noise Ratio (PSNR)

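$$\mathcal{L}_{\mathrm{PSNR}}(I^{\mathrm{pred}}, I^{\mathrm{gt}}) = 10 \log_{10}\!\left( \frac{I_{\max}^{2}}{\mathcal{L}_{\mathrm{MSE}}(I^{\mathrm{pred}}, I^{\mathrm{gt}})} \right),$$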

where $I_{\max}$ is the maximum possible pixel value (e.g. 255).
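A short sketch of PSNR in decibels, using the conventional form in which the peak value is squared so that the ratio is dimensionless. The function name `psnr` and the choice to average the squared error over all array elements (which coincides with the MSE above for single-channel images) are assumptions.

```python
import numpy as np

def psnr(pred, gt, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer images.

    max_val is the largest possible pixel value (255 for 8-bit images,
    1.0 for images normalized to [0, 1]).
    """
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    mse = np.mean(diff ** 2)  # averaged over every element, channels included
    if mse == 0:
        return float("inf")   # identical images: the ratio is unbounded
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Since a higher PSNR indicates a better match, it is usually reported as a metric; using it as a training loss would require negating it.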

Local Image Statistics

Structural Similarity Metric (SSIM)

SSIM is computed patch-wise. Given two patches $I_x$ and $I_y$, it is defined as follows:

$$\mathcal{L}_{\mathrm{SSIM}}(I_x, I_y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$

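with $\mu_x$ and $\mu_y$ the mean intensities of the two patches, $\sigma_x^2$ and $\sigma_y^2$ their variances, $\sigma_{xy}$ the covariance between them, and $c_1$, $c_2$ small constants that keep the division stable when the denominators are close to zero.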
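The SSIM of a single pair of patches can be sketched directly from the patch means, variances, and covariance. The function name `ssim_patch` is hypothetical, and the constants follow the commonly used choice $c_1 = (0.01\, I_{\max})^2$, $c_2 = (0.03\, I_{\max})^2$, which is an assumption rather than something stated in the text above. Population statistics (dividing by N) are used; some implementations use sample statistics or Gaussian-weighted windows instead.

```python
import numpy as np

def ssim_patch(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """SSIM between two equally sized single-channel patches."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    # Stabilizing constants (commonly used defaults, assumed here).
    c1 = (k1 * max_val) ** 2
    c2 = (k2 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()            # population variances
    cov_xy = np.mean((x - mu_x) * (y - mu_y))  # population covariance
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

The value is 1 for identical patches and decreases as they diverge. A whole-image score is typically the average of this quantity over sliding windows, and a loss is often formed as one minus that average.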

Multiscale SSIM (MS-SSIM)

MS-SSIM computes and accumulates the statistics at multiple image scales.

CNN-Based Perceptual Loss

Given a pretrained network (e.g. VGG), we compute the L1 loss between the outputs that each layer produces for the predicted image and for the ground-truth image.

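$$\mathcal{L}_{\mathrm{VGG}}(I^{\mathrm{pred}}, I^{\mathrm{gt}}) = \sum_{i=0}^{5} \frac{1}{2^{5-i}} \left| f_{\mathrm{VGG}}^{(i)}(I^{\mathrm{pred}}) - f_{\mathrm{VGG}}^{(i)}(I^{\mathrm{gt}}) \right|,$$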

where $f_{\mathrm{VGG}}^{(i)}(\cdot)$ is the output of the $i$-th layer. Note that $f_{\mathrm{VGG}}^{(0)}(I) = I$.
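The structure of this loss can be sketched without committing to a particular network. In the function below, `layers` is a hypothetical list of callables standing in for the successive feature stages of a pretrained model, and level 0 is the image itself. The weights are read as $1/2^{5-i}$ for five stages, so deeper levels count more. Taking the mean absolute difference per level, rather than the sum, is an assumption made so that feature maps of different sizes contribute on a comparable scale.

```python
import numpy as np

def perceptual_loss(pred, gt, layers):
    """Weighted L1 distance between layer outputs of two images.

    layers: list of callables; layers[i - 1] maps the output of level
    i - 1 to the output of level i (hypothetical stand-ins for the
    stages of a pretrained network). Level 0 is the raw image.
    """
    n = len(layers)  # 5 in the formula above
    feat_pred = np.asarray(pred, dtype=np.float64)
    feat_gt = np.asarray(gt, dtype=np.float64)
    total = 0.0
    for i in range(n + 1):
        if i > 0:
            feat_pred = layers[i - 1](feat_pred)
            feat_gt = layers[i - 1](feat_gt)
        weight = 1.0 / 2 ** (n - i)  # 1/32 for the image, 1 for the last level
        total += weight * np.mean(np.abs(feat_pred - feat_gt))
    return float(total)

# Toy usage with placeholder layers; a real setup would pass the
# feature stages of a pretrained network instead.
toy_layers = [np.tanh] * 5
example = perceptual_loss(np.zeros((8, 8, 3)), np.ones((8, 8, 3)), toy_layers)
```

In practice the layers would come from a frozen, pretrained model, and both images would be preprocessed the way that model expects.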