
Session: Breast Imaging

Leveraging Predictive Uncertainty of a Convolutional Neural Network to Flag Unacceptable Segmentations

Z Klanecek1*, H Bosmans2,6, A Studen1,5, M Vrhovec4, D Huff3, B Schott3, Y Kuan2, L Cockmartin6, N Marshall2,6, T Wagner2, K Hertl4, M Krajc4, K Jarm4, R Jeraj1,3,5, (1) Faculty of Mathematics and Physics, Ljubljana, SI, (2) KU Leuven, Leuven, BE, (3) University of Wisconsin - Madison, Madison, WI, (4) Institute Of Oncology Ljubljana, Ljubljana, SI, (5) Jozef Stefan Institute, Ljubljana, SI, (6) UZ Leuven, Leuven, BE


MO-E115-IePD-F8-3 (Monday, 7/11/2022) 1:15 PM - 1:45 PM [Eastern Time (GMT-4)]

Exhibit Hall | Forum 8

Purpose: To leverage the predictive uncertainty (PU) of a convolutional neural network (CNN) to flag unacceptable segmentations, and to demonstrate this approach on pectoral muscle segmentation in mammograms using Monte Carlo (MC) dropout enabled at inference time.

Methods: A UNet segmentation model with MC dropout layers placed after each convolutional layer was used. Dropout was kept active at inference time to enable a Bayesian approximation. Thirty MC samples were obtained for each test image; the mean of the samples served as the final prediction, and the PU was quantified as the sum of the pixel-wise standard deviations above an optimized threshold, normalized by the length of the pectoral muscle. The model was trained and validated on 200 mammograms with ground-truth pectoral muscle delineations. The ability of PU to flag unacceptable segmentations was tested on an independent set of 100 mammograms. For each mammogram, PU was calculated, and segmentation quality was rated by a radiologist on a 3-point scale (1 point = unacceptable, 2 points = 95% of the pectoral muscle correctly segmented, 3 points = visually perfect).
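The uncertainty measure described in the Methods can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stochastic_predict` is a hypothetical stand-in for one forward pass of the UNet with dropout left active, each prediction is assumed to be a flattened list of per-pixel foreground probabilities, the per-pixel standard deviation is assumed to be the population SD across the MC samples, and the pectoral muscle length is supplied by the caller because the abstract does not say how it is measured.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

# Assumption: a prediction is a flattened list of per-pixel foreground probabilities.
ProbMap = List[float]


def mc_dropout_predict(
    stochastic_predict: Callable[[], ProbMap], n_samples: int = 30
) -> List[ProbMap]:
    """Collect n_samples predictions from repeated forward passes.

    `stochastic_predict` is a hypothetical callable standing in for one
    forward pass of the UNet with dropout still active, so each call
    can return a different probability map.
    """
    return [stochastic_predict() for _ in range(n_samples)]


def mean_and_pu(
    samples: Sequence[ProbMap], std_threshold: float, muscle_length: float
) -> Tuple[ProbMap, float]:
    """Return the mean prediction and the predictive uncertainty (PU).

    PU is the sum of the pixel-wise standard deviations that exceed
    `std_threshold`, divided by `muscle_length`. How the muscle length
    is measured is not specified here and is left to the caller.
    """
    mean_map: ProbMap = []
    sd_sum = 0.0
    for pixel_values in zip(*samples):  # one pixel's values across all MC samples
        mean_map.append(statistics.fmean(pixel_values))
        sd = statistics.pstdev(pixel_values)  # assumption: population SD
        if sd > std_threshold:
            sd_sum += sd
    return mean_map, sd_sum / muscle_length
```

With the 0.02 threshold reported in the Results, a call would look like `mean_and_pu(samples, std_threshold=0.02, muscle_length=L)`, where `L` is the muscle length in whatever units the caller chooses. Only pixels whose SD exceeds the threshold contribute, so small, diffuse fluctuations across the image do not accumulate into the score.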

Results: A high Dice similarity coefficient (DSC) of 0.95 ± 0.07 (mean ± SD) was achieved across the ground-truth dataset (5-fold cross-validation). The strongest negative Pearson correlation between DSC and PU, r = -0.76 (p < 0.001), was obtained with a standard deviation threshold of 0.02. Using this threshold, PU in the independent mammogram dataset was 6.3 ± 2.4, 1.7 ± 1.1, and 1.2 ± 1.9 (mean ± SD) for mammograms with radiologist scores of 1, 2, and 3 points, respectively. A Student's t-test revealed a statistically significant difference (p < 0.001) between unacceptable segmentations (1 point) and the others (2 or 3 points).
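The threshold selection and statistical comparison reported in the Results can be sketched as follows, again in plain Python and under assumptions: masks are flattened 0/1 lists, PU values have already been computed for each candidate threshold (the dictionary `pu_by_threshold` is a hypothetical input format), and the t statistic uses the pooled-variance (Student's) form. Only the statistic is computed here; in practice a p-value would usually come from a library routine such as `scipy.stats.ttest_ind`.

```python
import math
from typing import Dict, List, Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary (0/1) masks."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total  # both empty: perfect match


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def pick_threshold(dsc: Sequence[float], pu_by_threshold: Dict[float, List[float]]) -> float:
    """Return the SD threshold whose PU values correlate most negatively with DSC."""
    return min(pu_by_threshold, key=lambda t: pearson(dsc, pu_by_threshold[t]))


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    pooled_var = (sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
```

For example, `pick_threshold([0.9, 0.8, 0.7], {0.01: [1.0, 1.0, 2.0], 0.02: [1.0, 2.0, 3.0]})` returns `0.02`, because that PU list rises exactly as DSC falls (r close to -1), whereas the 0.01 list gives r of about -0.87.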

Conclusion: Retaining MC dropout at inference time enables quantification of PU, which can be used to flag unacceptable segmentations predicted by a CNN. The predictive power of PU was demonstrated for the quality of pectoral muscle segmentation in mammograms; a similar method could be applied to other segmentation problems.


Mammography, Segmentation, Monte Carlo


IM/TH- image Segmentation: General (Most aspects)
