An Independent Evaluation of Six Commercially Available Deep Learning-Based Auto Segmentation Platforms Using Large Multi-Institutional Datasets

L Yuan¹*, Q Chen², Y Rong³, H Al-Hallaq⁴, S Benedict⁵, B Cai⁶, Q Wu⁷, K Latifi⁸, Y Xiao⁹, X Yang¹⁰, X Qi¹¹, (1) Virginia Commonwealth University Medical Center, Richmond, Virginia, (2) City of Hope Medical Center, Duarte, CA, (3) Mayo Clinic Arizona, Phoenix, AZ, (4) The University of Chicago, Chicago, IL, (5) UC Davis Cancer Center, Davis, CA, (6) University of Texas Southwestern Medical Center, Clayton, MO, (7) Duke University Medical Center, Chapel Hill, NC, (8) H. Lee Moffitt Cancer Center, Tampa, FL, (9) University of Pennsylvania, Philadelphia, PA, (10) Emory University, Atlanta, GA, (11) UCLA School of Medicine, Los Angeles, CA

L Yuan

Presentations

TU-D1030-IePD-F2-5 (Tuesday, 7/12/2022) 10:30 AM - 11:00 AM [Eastern Time (GMT-4)]

Exhibit Hall | Forum 2

Purpose: Many commercial AI/deep learning-based auto-segmentation software tools have become available in recent years. These tools were generally developed and evaluated using vendor-specific data. This study evaluated six AI segmentation platforms (Mirada/RadFormation/Manteia/CarinaAI/LimbusAI/MIM) using a pool of independent clinical CT datasets from three institutions, to quantitatively test their capabilities and limitations for the most common organ-at-risk (OAR) contours in the extracranial region.

Methods: All auto-segmentation platforms studied use U-Net architectures. Some platforms also employ a pre-segmentation step to identify a proper region of interest or a post-processing step to reduce contouring artifacts. Our evaluation datasets consist of CT images of 120 clinical patients from three anatomical sites: thorax (N=40), abdomen (N=40) and pelvis (N=40). The datasets contain images acquired using typical imaging protocols representing a wide spectrum of clinical scenarios. The auto-segmented contours for 25 organs were compared with the manual clinical contours to calculate a comprehensive set of contouring accuracy metrics.

Results: The organ-averaged mean and range (minimum-maximum) of the contouring metrics over all software and all organs are: Dice similarity coefficient (DSC): 0.84 (0.34-0.96); 95th percentile Hausdorff distance (in mm): 7.3 (2.0-26); mean surface distance (in mm): 2.4 (0.8-9.5). Among the 25 organs, 10 have DSC >0.9, including lung, liver, kidney, femoral head, bladder and heart, while 8 have DSC of 0.7-0.89, including spinal cord, rectum and stomach. The remaining organs, including gallbladder, bronchus, duodenum, seminal vesicle, penile bulb and brachial plexus, have DSC <0.7.
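For readers unfamiliar with the three metrics reported above, the following is a minimal pure-Python sketch of how DSC, the 95th percentile Hausdorff distance and the mean surface distance are commonly computed. It is an illustration only, not the evaluation code used in this study: the function names, the representation of a contour as a set of voxel indices or a list of surface points in mm, and the choice to take the larger of the two directed 95th percentiles are all assumptions.

```python
import math

def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty contours agree
    return 2.0 * len(a & b) / (len(a) + len(b))

def percentile(values, q):
    """q-th percentile with linear interpolation between sorted samples."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def directed_distances(src, dst):
    """Distance from each surface point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def surface_metrics(surf_a, surf_b):
    """Return (HD95, MSD) for two non-empty lists of surface points in mm."""
    d_ab = directed_distances(surf_a, surf_b)
    d_ba = directed_distances(surf_b, surf_a)
    # Assumed definition: symmetric HD95 as the max of the directed values.
    hd95 = max(percentile(d_ab, 95), percentile(d_ba, 95))
    # Mean surface distance pooled over both directions.
    msd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    return hd95, msd

# Toy example (hypothetical data): a 4x4 "manual" square and an
# "auto" square shifted by one voxel share 12 of 16 voxels each.
manual = {(x, y) for x in range(4) for y in range(4)}
auto = {(x, y) for x in range(1, 5) for y in range(4)}
print(dice(manual, auto))  # 0.75
```

The brute-force nearest-point search is quadratic in the number of surface points, so a practical implementation on clinical CT would normally use a distance transform or a spatial index and account for voxel spacing.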

Conclusion: AI segmentation tools can generate contours with reasonable accuracy for most organs in the extracranial region when tested on independent multi-institutional CT image datasets. Contouring accuracy of the auto-segmentation software varies widely among the OARs, indicating that quality assurance (QA) of these tools is necessary before clinical implementation.

Keywords

Segmentation

Taxonomy

IM/TH- image Segmentation: CT

