Session: Multi-Disciplinary General ePoster Viewing

Cross-Comparison of Multi-Platform AI Auto-Segmentation Tools Using Independent Multi-Institutional Datasets for Head and Neck Cancer

Y Rong1*, Q Chen2, L Yuan3, X Qi4, K Latifi5, X Yang6, B Cai7, H Al-Hallaq8, Q Wu9, Y Xiao10, S Benedict11, (1) Mayo Clinic Arizona, Phoenix, AZ, (2) City of Hope Medical Center, Duarte, CA, (3) Virginia Commonwealth University Medical Center, Richmond, Virginia, (4) UCLA School of Medicine, Los Angeles, CA, (5) H. Lee Moffitt Cancer Center, Tampa, FL, (6) Emory University School of Medicine, Atlanta, GA, (7) University of Texas Southwestern Medical Center, Clayton, MO, (8) The University of Chicago, Chicago, IL, (9) Duke University Medical Center, Chapel Hill, NC, (10) University of Pennsylvania, Philadelphia, PA, (11) UC Davis Cancer Center, Davis, CA

Presentations

PO-GePV-M-294 (Sunday, 7/10/2022)   [Eastern Time (GMT-4)]

ePoster Forums

Purpose: Contours produced by commercial deep learning-based auto-segmentation models vary widely in quality and accuracy, and these models have not been systematically evaluated against the same independent cohort of datasets. This study aims to evaluate the variation and accuracy of contours generated by six major AI-based auto-segmentation tools for organs at risk (OARs) in head and neck (HN) cancer patients.

Methods: A total of 40 anonymized HN patient datasets from three institutions were used to test six AI-based segmentation tools: MIM, Manteia, Mirada, LimbusAI, RadFormation, and CarinaAI. Quantitative metrics, including Dice, precision, percentile Hausdorff distance, mean surface distance, and surface Dice, were assessed for AI vs. manual contours and for pairwise cross-comparisons between AI tools. Eighteen OARs of various volumes and shapes were studied, including brain, brainstem, mandible, optic nerves, parotids, eyes, lenses, submandibular glands, oral cavity, spinal cord, and cochleae.
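As an illustration of the primary overlap metric used above, the following is a minimal sketch of the Dice similarity coefficient computed on binary segmentation masks. The arrays here are toy examples, not the study's data, and the function name is hypothetical:

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks:
    2 * |A ∩ B| / (|A| + |B|). Returns 1.0 when both masks are empty."""
    a = a.astype(bool)
    b = b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * intersection / denom if denom else 1.0

# Toy 2D example: two 6x6 squares overlapping in a 4x4 region.
m1 = np.zeros((10, 10), dtype=bool); m1[2:8, 2:8] = True
m2 = np.zeros((10, 10), dtype=bool); m2[4:10, 4:10] = True
print(round(dice(m1, m2), 3))  # 2*16 / (36+36) ≈ 0.444
```

In practice such metrics are computed per OAR on 3D contour masks; surface-based metrics (mean surface distance, surface Dice) additionally require extracting and comparing the mask boundaries.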

Results: Across all studied OARs, mean Dice for the evaluated platforms compared to manual contours ranged from 0.47 to 0.98, with the cochleae lowest and the brain highest. Cross-comparison between each pair of AI tools showed varying levels of consistency across OARs: highest for brain, brainstem, eyes, and mandible, and lowest for optic chiasm, oral cavity, optic nerves, and cochleae. All evaluated systems except MIM showed pairwise Dice values equivalent to or higher than their individual Dice against manual contours for brainstem (0.85-0.87 vs. 0.81), mandible (0.83-0.90 vs. 0.84), eyes (0.87-0.92 vs. 0.88), and brain (0.97-0.99 vs. 0.98). Higher standard deviations in Dice over the 40 patients were observed for submandibular glands and lenses.

Conclusion: Pairwise cross-comparison of the evaluated systems demonstrated high variation across OARs in the HN region. Some AI platforms agreed more closely with one another than with the corresponding manual contours, especially for soft-tissue organs that are difficult to delineate on CT images.

Funding Support, Disclosures, and Conflict of Interest: Quan Chen is the developer for the CarinaAI auto-segmentation tool.

Keywords

Segmentation, Quality Control, CT

Taxonomy

IM/TH- Image Segmentation Techniques: Segmentation Method - other
