Evaluation Metrics and Loss Functions in Neural Rendering

Given a predicted image $I^{pred}$ and the ground truth $I^{gt}$, we would like to measure the difference between them.

Pixelwise Losses

L_{MSE} (I^{pred}, I^{gt}) = \frac{1}{W \cdot H} \sum_{i = 1}^{W} \sum_{j = 1}^{H} ‖ I_{i j}^{pred} - I_{i j}^{gt} ‖_{2}^{2} .

L_{MAE} (I^{pred}, I^{gt}) = \frac{1}{W \cdot H} \sum_{i = 1}^{W} \sum_{j = 1}^{H} | I_{i j}^{pred} - I_{i j}^{gt} | .
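As a minimal sketch, the two pixelwise losses above could be written with NumPy as follows. It assumes images are stored as `(H, W, C)` arrays, and it reads $| \cdot |$ in the MAE as the L1 norm over channels; both are assumptions, and the function names are illustrative. Many libraries average over channels as well, which rescales both losses by a constant factor.

```python
import numpy as np

def mse_loss(pred, gt):
    """Average over pixels of the squared L2 norm of the per-pixel difference."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    # Sum squared differences over channels (last axis), then average over H x W.
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def mae_loss(pred, gt):
    """Average over pixels of the per-pixel absolute difference.

    Assumption: for multi-channel pixels, |.| is taken as the L1 norm over channels.
    """
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.sum(np.abs(diff), axis=-1)))

# Usage: two small RGB images that differ by 1 in every channel.
pred = np.zeros((2, 2, 3))
gt = np.ones((2, 2, 3))
print(mse_loss(pred, gt), mae_loss(pred, gt))
```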

L_{PSNR} (I^{pred}, I^{gt}) = 10 \cdot \log_{10} (\frac{I_{max}^{2}}{L_{MSE} (I^{pred}, I^{gt})}),

where $I_{max}$ is the maximum possible pixel value (e.g. 255 for 8-bit images). Unlike MSE and MAE, a higher PSNR indicates a closer match.
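A small sketch of PSNR in NumPy, using the conventional form in which the squared peak value is divided by the MSE. The function name and the `max_value` parameter are illustrative. The sketch averages the squared error over every array element, which matches $L_{MSE}$ for single-channel images; for multi-channel images this is an assumed convention and differs from a per-pixel squared norm by a factor equal to the number of channels.

```python
import numpy as np

def psnr(pred, gt, max_value=255.0):
    """Peak signal-to-noise ratio in decibels (higher means closer images)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Assumption: the squared error is averaged over all elements.
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Usage: a uniform error of 10% of the peak value.
gt = np.zeros((4, 4))
pred = gt + 25.5
print(psnr(pred, gt))
```

Because the value diverges for identical images, PSNR is usually reported as an evaluation metric rather than minimized directly.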

SSIM is computed patch-wise. Given two patches $I_{x}$ and $I_{y}$, it is defined as follows:

L_{SSIM} (I_{x}, I_{y}) = \frac{(2 μ_{x} μ_{y} + c_{1}) (2 σ_{x y} + c_{2})}{(μ_{x}^{2} + μ_{y}^{2} + c_{1}) (σ_{x}^{2} + σ_{y}^{2} + c_{2})},

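where $μ_{x}$ and $μ_{y}$ are the patch means, $σ_{x}^{2}$ and $σ_{y}^{2}$ are the patch variances, $σ_{x y}$ is the covariance between the two patches, and $c_{1}$, $c_{2}$ are small constants that stabilize the division when the denominator is close to zero.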

MS-SSIM computes and accumulates the statistics at multiple image scales.
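The single-scale SSIM formula can be sketched for one pair of patches as below. Several choices here are assumptions not stated in the text: the constants are set as $c_{1} = (k_{1} I_{max})^{2}$ and $c_{2} = (k_{2} I_{max})^{2}$ with the commonly used defaults $k_{1} = 0.01$ and $k_{2} = 0.03$, the statistics use uniform weights over the patch (many implementations use a Gaussian window instead), and the function name is illustrative.

```python
import numpy as np

def ssim_patch(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """SSIM between two equally sized patches, using uniform weights."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    # Assumed stabilizing constants, scaled by the dynamic range.
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()              # sigma_x^2, sigma_y^2
    cov_xy = np.mean((x - mu_x) * (y - mu_y))    # sigma_xy
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# Usage: a patch compared with itself scores 1.
rng = np.random.default_rng(0)
patch = rng.uniform(0, 255, size=(8, 8))
print(ssim_patch(patch, patch))
```

Since SSIM is a similarity that equals 1 for identical patches, it is commonly turned into a training loss as $1 - L_{SSIM}$, averaged over all patches.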

Given a pretrained model (e.g. VGG), we calculate the L1 loss between the outputs of each layer for the two images.

L_{VGG} (I^{pred}, I^{gt}) = \sum_{i = 0}^{5} \frac{1}{2^{5 - i}} | f_{VGG}^{(i)} (I^{pred}) - f_{VGG}^{(i)} (I^{gt}) |,

where $f_{VGG}^{(i)} (\cdot)$ is the output of the $i$-th layer. Note that $f_{VGG}^{(0)} (I) = I$, so the $i = 0$ term is a pixelwise L1 loss on the images themselves.
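The weighting scheme in $L_{VGG}$ can be illustrated with a short sketch. To stay self-contained it does not load a real VGG network: `layers` is a hypothetical list of callables standing in for the five feature stages, each mapping the previous stage's output to the next. The sketch also assumes that $| \cdot |$ is the mean absolute difference over each feature map; a plain sum is an equally valid reading of the formula.

```python
import numpy as np

def perceptual_loss(pred, gt, layers):
    """Weighted L1 distance between features of pred and gt.

    layers: list of callables (stand-ins for pretrained feature stages).
    Stage 0 is the identity, so the first term compares the raw images.
    """
    feat_pred = np.asarray(pred, dtype=np.float64)
    feat_gt = np.asarray(gt, dtype=np.float64)
    num_layers = len(layers)  # 5 in the formula above
    total = 0.0
    for i in range(num_layers + 1):
        if i > 0:
            # Advance both images through the next feature stage.
            feat_pred = layers[i - 1](feat_pred)
            feat_gt = layers[i - 1](feat_gt)
        weight = 1.0 / 2 ** (num_layers - i)  # deeper layers weigh more
        # Assumption: L1 distance is averaged over the feature map.
        total += weight * np.mean(np.abs(feat_pred - feat_gt))
    return float(total)

# Usage with toy stand-in stages (each one simply doubles its input).
toy_layers = [lambda f: 2.0 * f] * 5
print(perceptual_loss(np.ones((2, 2)), np.zeros((2, 2)), toy_layers))
```

With these weights the raw-image term is scaled by $1/32$ and the deepest layer by $1$, so differences in high-level features dominate the loss.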