Session: Multi-Disciplinary General ePoster Viewing

Limitations of the Structural Similarity Index in Medical Image Synthesis Evaluation

D Gourdeau1,2,3*, S Duchesne1,2, L Archambault2,3, (1) Universite Laval (2) CERVO Brain research center (3) CHUQ Hotel-Dieu de Quebec


PO-GePV-M-15 (Sunday, 7/25/2021)   [Eastern Time (GMT-4)]

Purpose: The structural similarity index (SSIM) is a popular image quality metric in the medical image synthesis community because it correlates better with human visual quality assessment. It has been widely adopted, but its limitations are often overlooked. SSIM is designed to work on a positive intensity scale, which is not always the case in medical imaging: intensity scales such as Hounsfield units contain negative numbers, and image normalization techniques can also introduce negative values.
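For reference, the commonly used single-window form of SSIM is reproduced below. The abstract does not state the formula, so the notation here ($\mu$ for window means, $\sigma^2$ for variances, $\sigma_{xy}$ for covariance, $L$ for the dynamic range, and the constants $K_1$, $K_2$) follows the standard definition and is an assumption about what the authors evaluated.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 L)^2, \quad C_2 = (K_2 L)^2
```

The factor $(2\mu_x \mu_y + C_1)/(\mu_x^2 + \mu_y^2 + C_1)$ depends on the absolute level of the local means. On a strictly positive scale it lies between 0 and 1, but when the means are close to zero or have opposite signs it can collapse toward zero or become negative, even though the variance and covariance terms are unaffected by a constant offset.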

Methods: To quantify the impact of misusing the SSIM metric, we trained two uncorrelated image synthesis models to synthesize T2 MRI from T1 MRI. The synthesized images are strictly positive, so their true SSIM can be evaluated. This true SSIM score was compared with an SSIM score computed on the same synthetic images after subtracting the mean tissue value from the ground truth to mimic Z-normalization. Next, the suitability of SSIM as a loss function was evaluated by training two synthesis models, one to synthesize strictly positive T2 and one to synthesize Z-normalized T2. Model quality was evaluated using mean absolute error (MAE) and peak signal-to-noise ratio (PSNR).
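The comparison described above can be illustrated with a minimal sketch on toy data. This is not the authors' pipeline: `global_ssim` is a hypothetical helper that computes SSIM over a single whole-image window with NumPy only, the images are random arrays rather than MRI, and the mean is assumed to be subtracted from both images.

```python
import numpy as np


def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (no sliding window).

    Hypothetical helper for illustration; library implementations
    average SSIM over many local windows instead.
    """
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # Mean-dependent factor: sensitive to a constant offset.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    # Variance/covariance factor: invariant to a constant offset.
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


# Toy "ground truth" and "synthetic" images on a strictly positive scale.
rng = np.random.default_rng(0)
truth = rng.uniform(0.2, 0.8, size=(64, 64))
synthetic = 0.9 * truth + rng.normal(0.0, 0.02, size=truth.shape)

# Assumption: the same mean value is subtracted from both images,
# which mimics the centering step of Z-normalization.
shift = truth.mean()

ssim_positive = global_ssim(truth, synthetic, data_range=1.0)
ssim_shifted = global_ssim(truth - shift, synthetic - shift, data_range=1.0)

print(f"SSIM on positive scale: {ssim_positive:.3f}")
print(f"SSIM after mean shift:  {ssim_shifted:.3f}")
```

In this toy setting the pixel-wise error between the two images is identical before and after the shift, yet the score drops because the mean-dependent factor is evaluated near zero. The size of the drop here is not comparable to the values reported in the abstract, which come from windowed SSIM on real images.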

Results: SSIM decreased from 0.895 to 0.774 for model 1 and from 0.891 to 0.784 for model 2 when the tissue mean was subtracted from the synthetic images. SSIM is therefore underestimated on images with negative values, and the resulting error reversed the relative ranking of the two models. Such an error can be clinically significant given the small SSIM differences between state-of-the-art models. Second, training a synthesis model with SSIM as the loss function on images with negative values drastically reduced image quality, with the mean relative error increasing from 8.65% to 22.78%.

Conclusion: We show that it is inappropriate to use SSIM on images containing negative values. We also propose reporting guidelines to make comparisons between articles easier.

Funding Support, Disclosures, and Conflict of Interest: Natural Sciences and Engineering Research Council of Canada fellowship to the first author (grant number: 534769)



    Image Analysis, Image Processing


    IM/TH- Image Analysis (Single Modality or Multi-Modality): Machine learning
