Exhibit Hall | Forum 2
Purpose: Many commercial AI/deep learning-based auto-segmentation software tools have become available in recent years. These tools were generally developed and evaluated using vendor-specific data. This study evaluated six AI segmentation software tools (Mirada/RadFormation/Manteia/CarinaAI/LimbusAI/MIM) using a pool of independent clinical CT datasets from three institutions, to quantitatively test their capabilities and limitations for the most common organ-at-risk (OAR) contours in the extracranial region.
Methods: All auto-segmentation platforms studied use U-Net architectures. Some tools also employ a pre-segmentation step to identify an appropriate region of interest, or a post-processing step to reduce contouring artifacts. Our evaluation dataset consists of CT images of 120 clinical patients from three anatomical sites: thorax (N=40), abdomen (N=40), and pelvis (N=40). The images were acquired using typical imaging protocols and represent a wide spectrum of clinical scenarios. The auto-segmented contours for 25 organs were compared with the manual clinical contours to calculate a comprehensive set of contouring accuracy metrics.
Results: The organ-averaged mean and range (minimum-maximum) of the contouring metrics over all software tools and all organs are: Dice Similarity Coefficient (DSC): 0.84 (0.34-0.96); 95th percentile Hausdorff Distance (in mm): 7.3 (2.0-26); Mean Surface Distance (in mm): 2.4 (0.8-9.5). Among the 25 organs, 10 have DSC >0.9, including lung, liver, kidney, femoral head, bladder, and heart, while 8 have DSC of 0.7-0.89, including spinal cord, rectum, and stomach. The remaining organs, including gallbladder, bronchus, duodenum, seminal vesicle, penile bulb, and brachial plexus, have DSC <0.7.
Conclusion: AI segmentation tools can generate contours with reasonable accuracy for most organs in the extracranial region when tested on independent multi-institutional CT image datasets. The contouring accuracy of the auto-segmentation software varies widely among the OARs, indicating that quality assurance (QA) of these tools is necessary before clinical implementation.
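For readers unfamiliar with the three metrics reported in the Results (DSC, 95th percentile Hausdorff Distance, Mean Surface Distance), the following is a minimal illustrative sketch in pure Python, not the evaluation code used in this study. It assumes each contour has been rasterized to a set of (x, y, z) voxel indices. The function names, the 6-connected surface definition, the choice of taking the larger of the two directed 95th percentiles, and the pooled symmetric mean are all assumptions made here for illustration; published definitions of these surface metrics differ in such details.

```python
import math

# Hypothetical helpers for illustration; masks are sets of (x, y, z) voxel indices.

def dice(a, b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def surface(voxels):
    """Voxels with at least one 6-connected neighbour outside the mask."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        v for v in voxels
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in voxels for dx, dy, dz in offsets)
    }

def directed_distances(src, dst, spacing):
    """Distance in mm from every point in src to its nearest point in dst."""
    scale = lambda p: tuple(c * s for c, s in zip(p, spacing))
    dst_mm = [scale(q) for q in dst]
    return [min(math.dist(scale(p), q) for q in dst_mm) for p in src]

def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def surface_metrics(a, b, spacing=(1.0, 1.0, 1.0)):
    """Return (HD95, MSD) in mm between the surfaces of masks a and b."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("both masks must be non-empty")
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    # Assumption: HD95 is the larger of the two directed 95th percentiles.
    hd95 = max(percentile(d_ab, 95), percentile(d_ba, 95))
    # Assumption: MSD is the mean over both directions pooled together.
    msd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    return hd95, msd

if __name__ == "__main__":
    # Toy example: a 4x4x4 "manual" cube and an "auto" cube shifted by one voxel in x.
    manual = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
    auto = {(x + 1, y, z) for (x, y, z) in manual}
    print("DSC:", dice(manual, auto))
    print("HD95, MSD (mm):", surface_metrics(manual, auto))
```

The brute-force nearest-point search is quadratic in the number of surface voxels, so this sketch is only practical for small toy masks; a distance-transform or spatial-index approach would be needed at clinical image sizes.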