
Session: Multi-Disciplinary: Segmentation II

Evaluation of Deep Learning-Based Automatic Segmentation of the Pancreas

B Rigaud1*, E Kirimli2, S Yedururi3, G Cazoulat4, B Anderson5, M McCulloch6, M Zaid7, D Elganainy8, E Koay9, K Brock10, (1) The University of Texas MD Anderson Cancer Center, Houston, TX, (2) MD Anderson, (3) The University of Texas MD Anderson Cancer Center, (4) The University of Texas MD Anderson Cancer Center, Houston, TX, (5) University of Texas MD Anderson Cancer Center, Houston, TX, (6) Houston, TX, (7) The University of Texas MD Anderson Cancer Center, (8) The University of Texas MD Anderson Cancer Center, (9) MD Anderson, Houston, TX, (10) UT MD Anderson Cancer Center, Houston, TX


TU-IePD-TRACK 4-3 (Tuesday, 7/27/2021) 3:00 PM - 3:30 PM [Eastern Time (GMT-4)]

Purpose: To investigate the performance of multiple deep learning (DL) models for automatic segmentation of the pancreas on contrast-enhanced CT scans.

Methods: A total of 595 CTs of the pancreas with various characteristics were gathered from our institution and publicly available datasets. The population was split into 425, 60, and 110 CTs for the training, validation, and withheld test datasets, respectively. The training and validation datasets comprised pancreases with cysts, pancreatic tumors, or extra-pancreatic tumors from our institution (n=204) and the MICCAI Medical Segmentation Decathlon (MSD) (n=281). The withheld test dataset comprised healthy pancreases from the NIH-82 (n=80) and SYNAPSE MICCAI (n=30) challenges. Four DL models were investigated: attention Unet (attUnet), 2D DeepLabV3+, 3D patch-based BasicUnet from MONAI, and 2-step Unet from RaySearch Laboratories (RSLab). The segmentation resulting from the majority vote (MV) of all DL models was also evaluated. Model performance relative to manual segmentation was reported using the Dice similarity coefficient (DSC), distance to agreement (DTA), and the maximum and 95th-percentile Hausdorff distances (100thHD and 95thHD). Intra- and inter-observer variability was reported for 20 pancreas segmentations.
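The DSC and the majority vote described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `dice` and `majority_vote` are hypothetical, the handling of two empty masks is an assumed convention, and the tie rule (with four models, a 2-2 split is treated as background) is an assumption, since the abstract does not say how ties were resolved.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Assumed convention: two empty masks are in perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom

def majority_vote(masks):
    """Voxel-wise strict majority vote over a list of same-shaped binary masks."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    # Assumption: a voxel is foreground only if more than half of the models
    # agree, so a 2-of-4 tie is labeled background.
    return votes * 2 > len(masks)

# Toy 1D "masks" standing in for 3D segmentation volumes.
manual = np.array([1, 1, 0, 0])
predictions = [
    np.array([1, 1, 0, 0]),
    np.array([1, 0, 0, 0]),
    np.array([1, 1, 1, 0]),
    np.array([0, 1, 0, 0]),
]
consensus = majority_vote(predictions)
print(dice(manual, consensus))
```

In practice the masks would be 3D arrays resampled to a common grid, and each model's output would be binarized before voting.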

Results: The median (min-max) DSC between manual and automatic pancreas segmentations on the test dataset were 0.78 (0.16-0.88), 0.80 (0.44-0.89), 0.80 (0.00-0.88), 0.82 (0.34-0.88), and 0.82 (0.59-0.89) for the attUnet, DeepLabV3+, RSLab, MONAI, and MV models, respectively. Majority vote reduced the proportion of segmentations with a DSC <0.7 to 5.0% and 7.3% for the validation and test datasets, respectively. With the MONAI model as reference, disagreement between the MONAI and DeepLabV3+ (DLV3) models detected 72% of the failing cases (DSC <0.7). The median (min-max) intra- and inter-observer delineation DSC were 0.83 (0.62-0.92) and 0.87 (0.78-0.91), respectively.
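One way to read the disagreement-based failure detection is sketched below: two models' outputs are compared with the DSC, a case is flagged when their agreement is low, and the detection rate is the fraction of truly failing cases (DSC against the manual contour below 0.7) that were flagged. This is a hedged illustration only. The abstract does not state how disagreement was measured or thresholded, so the use of inter-model DSC, the `agreement_threshold` value, and the function names are all assumptions.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention for two empty masks
    return 2.0 * np.logical_and(a, b).sum() / denom

def flag_by_disagreement(pred_ref, pred_other, agreement_threshold=0.7):
    """Flag a case when two models agree poorly (hypothetical threshold)."""
    return dice(pred_ref, pred_other) < agreement_threshold

def detection_rate(flags, true_dsc, failure_threshold=0.7):
    """Fraction of failing cases (true DSC below threshold) that were flagged."""
    flags = np.asarray(flags, dtype=bool)
    failing = np.asarray(true_dsc, dtype=float) < failure_threshold
    if failing.sum() == 0:
        return float("nan")  # no failing cases to detect
    return float(np.logical_and(flags, failing).sum() / failing.sum())

# Toy example: four cases, two of which truly fail (DSC < 0.7 vs. manual).
flags = [True, False, True, False]
true_dsc = [0.5, 0.6, 0.8, 0.9]
print(detection_rate(flags, true_dsc))
```

Because the flag needs no manual contour, a check of this kind could run at inference time, with the detection rate estimated beforehand on a labeled validation set.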

Conclusion: The majority vote of multiple DL models improved segmentation performance to a level close to observer variability. Agreement between DL models could be used to automatically detect cases where DL segmentation fails.

Funding Support, Disclosures, and Conflict of Interest: This study was supported in part by the Helen Black Image Guided Fund and Image Guided Cancer Therapy Research Program at The University of Texas MD Anderson Cancer Center. Kristy Brock received funding from RaySearch Laboratories AB and has a licensing agreement with RaySearch Laboratories AB.


