Background
Similarity measurement (PSNR and SSIM)
PSNR
Peak signal-to-noise ratio (PSNR) measures how similar two images are, based on their pixel-wise differences.
The definition starts from the mean squared error. $$ MSE = \frac{1}{c\cdot h\cdot w}\sum(I_1-I_2)^2 $$ where $I_1$ and $I_2$ are the two images, $h$ and $w$ are their height and width, $c$ is the number of channels, and the sum runs over every pixel of every channel.
Then PSNR is expressed as $$ PSNR = 10\cdot \log_{10} \left(\frac{MAX_I^2}{MSE}\right) $$ where $MAX_I$ is the maximum valid value for a pixel (255 for 8-bit images).
When the MSE is 0, that is, when the two images are identical, the formula divides by zero, so this case has to be handled separately.
The logarithmic scale is used because pixel values have a very wide dynamic range; the result is expressed in decibels (dB).
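The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `psnr`, the default `max_val=255.0` (8-bit images), and the choice to return infinity when the MSE is 0 are all assumptions made for this example.

```python
import numpy as np

def psnr(img1: np.ndarray, img2: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between two images of identical shape, e.g. (h, w, c)."""
    if img1.shape != img2.shape:
        raise ValueError("images must have the same shape")
    # Cast to float first so that uint8 differences do not wrap around.
    diff = img1.astype(np.float64) - img2.astype(np.float64)
    mse = np.mean(diff ** 2)  # mean over all c * h * w values
    if mse == 0:
        # Identical images: the ratio is undefined. Returning infinity is
        # one possible convention (an assumption of this sketch).
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For example, two 8-bit images that differ by exactly 1 at every pixel have an MSE of 1, so the PSNR is $10\cdot\log_{10}(255^2)\approx 48.13$ dB.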
SSIM
Humans are not very sensitive to the absolute brightness or color of pixels, but are very sensitive to the position of edges and textures. The Structural Similarity Index (SSIM) mimics human perception by focusing primarily on edge and texture similarity.
To compute SSIM, the images are split into patches and compared patch by patch. Given a patch $x$ from image $I_1$ and the corresponding patch $y$ from image $I_2$, we compute the following statistics:
- $\mu_x$, $\mu_y$: the means of $x$ and $y$
- $\sigma_x^2$, $\sigma_y^2$: the variances of $x$ and $y$
- $\sigma_{x,y}$: the covariance of $x$ and $y$
Then the similarity of luminance between patches can be expressed as $$ l(x, y)=\frac{2\mu_x\mu_y}{\mu_x^2+\mu_y^2} $$ If the luminance of $x$ and $y$ differs greatly, $l(x,y)$ will be close to 0; if they have similar luminance, $l(x,y)$ will be close to 1. $l(x, y)$ is scale-invariant: multiplying both patches by the same factor leaves the score unchanged.
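As a quick numeric check, take illustrative (assumed) means $\mu_x = 100$ and $\mu_y = 50$: $$ l(x,y) = \frac{2\cdot 100\cdot 50}{100^2+50^2} = \frac{10000}{12500} = 0.8 $$ Doubling both means gives $$ l(x,y) = \frac{2\cdot 200\cdot 100}{200^2+100^2} = \frac{40000}{50000} = 0.8 $$ so the score depends only on the ratio of the two means, not on their absolute level.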
The similarity of contrast between patches can be expressed as $$ c(x,y) = \frac{2\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2} $$ If one patch is much flatter than the other, the score is close to 0; if both have the same contrast level, the score is close to 1. The contrast score compares the amount of texture in the two patches. This formula is also scale-invariant.
The similarity of structure between patches can be expressed as $$ s(x, y) = \frac{\sigma_{x,y}}{\sigma_x\sigma_y} $$ The score is high when the two patches contain edges with the same position and orientation, and low when the edges are located differently.
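The overall SSIM score is the product of these three scores, with small constants added to the numerators and denominators to prevent division by zero. The SSIM returned for an image is a similarity index averaged over all of its channels. The value is at most 1, where 1 corresponds to a perfect match; it usually lies between 0 and 1, although the structure score is a correlation coefficient and can therefore make the result negative.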
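A minimal NumPy sketch of the score for a single pair of patches is given below. The notes do not specify the stabilizing constants, so this example assumes the commonly used choice $C_1=(K_1 L)^2$, $C_2=(K_2 L)^2$, $C_3=C_2/2$ with $K_1=0.01$, $K_2=0.03$ and $L$ the maximum pixel value; the function name `ssim_patch` and its parameters are likewise hypothetical.

```python
import numpy as np

def ssim_patch(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """SSIM between two single-channel patches of the same shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilizing constants (assumed values, see the lead-in above).
    c1 = (k1 * max_val) ** 2
    c2 = (k2 * max_val) ** 2
    c3 = c2 / 2.0
    # Patch statistics: means, variances, covariance.
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    sigma_x, sigma_y = np.sqrt(var_x), np.sqrt(var_y)
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    # Luminance, contrast, and structure terms.
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sigma_x * sigma_y + c2) / (var_x + var_y + c2)
    struct = (cov_xy + c3) / (sigma_x * sigma_y + c3)
    return float(lum * con * struct)
```

With the constants in place, two completely flat patches no longer cause a division by zero. A whole-image score would then average `ssim_patch` over all patches and channels; how the patches are chosen (for example, a sliding window) is left open here.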
Papers
PSENet: Progressive Self-Enhancement Network for Unsupervised Extreme-Light Image Enhancement
Most prior work focused on under-exposed images, while PSENet also handles over-exposed ones.
In-camera image signal processors usually use highly nonlinear operations to generate the final 8-bit standard RGB image.