
Session: Machine Intelligence Efficacy and Quality II

Twin AI Algorithms for Quality Control of Auto-Segmentation in Radiation Therapy

C V Guthier1,2*, R Zeleznik1,2, D S Bitterman1,2, R S Punglia1, J S Bredfeldt1, H J W L Aerts1,2, R H Mak1,2, (1) Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA (2) Artificial Intelligence in Medicine (AIM) Program at Harvard-MGB


MO-B-BRC-6 (Monday, 7/11/2022) 8:30 AM - 9:30 AM [Eastern Time (GMT-4)]

Ballroom C

Purpose: Deep learning (DL) has shown great potential for various contouring tasks, but safe implementation of artificial intelligence (AI)-generated contours remains a challenge. We introduce a novel implementation framework with dual, independent AI algorithms to identify low-quality contours for quality assurance (QA).

Methods: Two fully independent DL models were trained to segment the heart. Model_1 (U-Net architecture) was trained on CTs with segmentations by cardiologists (n=858). Model_2 (U-Net with ResNet encoders) was trained on radiation oncology planning CTs and segmentations from lung cancer patients (n=700). Both models were used to segment the heart of 2867 breast cancer patients. The outputs of the two models were then geometrically compared to ground truth and to each other to define action levels for dissimilarity (low-quality segmentations). The DL models and action levels were implemented in a QA tool that prospectively screens all planning CTs by deploying both models and either 1) compares the two models' outputs in AI-only contouring workflows or 2) checks final human-edited contours in an AI+human collaboration workflow (AI-Human-AI sandwich).
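The geometric comparison described above can be illustrated with the Dice similarity coefficient, the metric reported in the Results. The following is a minimal sketch, not the authors' QA tool: it assumes each heart segmentation is available as a flattened binary mask on the same voxel grid, and the function names `dice` and `needs_review` are hypothetical. The default action level of 0.85 is the inter-model value reported in the Results.

```python
from typing import Sequence


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A n B| / (|A| + |B|) for two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same voxel grid")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


def needs_review(mask_1: Sequence[int], mask_2: Sequence[int],
                 action_level: float = 0.85) -> bool:
    """Flag a case for manual secondary review when the two contours disagree too much."""
    return dice(mask_1, mask_2) < action_level


# Toy example with made-up six-voxel masks (illustrative only, not study data).
model_1_mask = [0, 1, 1, 1, 0, 0]
model_2_mask = [0, 0, 1, 1, 1, 0]
print(round(dice(model_1_mask, model_2_mask), 3))  # 0.667
print(needs_review(model_1_mask, model_2_mask))    # True
```

In the AI+human workflow, the same check would presumably be applied between the final human-edited contour and the second model's output rather than between the two raw model outputs.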

Results: Comparing the DL models against ground truth showed a median Dice of 0.90 (IQR=0.05) for Model_1 and 0.91 (IQR=0.04) for Model_2. 61 cases (2.7%) showed a Dice of less than 0.75 (our threshold for dissimilarity). There was high agreement between the two DL models (median Dice: 0.94, IQR=0.02). Using the ROC curve, we identified an appropriate action level to trigger manual secondary review at a Dice of 0.85 between the two AI models (accuracy: 0.974; sensitivity: 0.820; specificity: 0.979). In a subset of prospectively collected patients with clinician-edited Model_1 contours deployed in the clinic (n=20), QA with Model_2 detected no low-quality contours.
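To make the reported accuracy, sensitivity, and specificity concrete, the sketch below shows how such figures could be computed for one candidate action level. It rests on assumptions not stated in the abstract: a case counts as truly low quality when its Dice against ground truth falls below 0.75, and it is flagged when the inter-model Dice falls below the action level. The function name `screening_metrics` and the sample values are hypothetical and are not study data.

```python
from typing import Dict, Sequence


def screening_metrics(inter_model_dice: Sequence[float],
                      dice_vs_truth: Sequence[float],
                      action_level: float = 0.85,
                      low_quality_cutoff: float = 0.75) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity of flagging cases by inter-model Dice."""
    tp = fp = tn = fn = 0
    for d_models, d_truth in zip(inter_model_dice, dice_vs_truth):
        flagged = d_models < action_level          # QA tool raises a flag
        low_quality = d_truth < low_quality_cutoff  # assumed reference label
        if flagged and low_quality:
            tp += 1
        elif flagged and not low_quality:
            fp += 1
        elif not flagged and low_quality:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up values for five cases (illustrative only).
inter_model = [0.95, 0.80, 0.90, 0.70, 0.83]
vs_truth = [0.92, 0.70, 0.72, 0.60, 0.88]
metrics = screening_metrics(inter_model, vs_truth)
print(metrics)
```

Repeating this calculation over a range of action levels traces out the ROC curve from which a single operating point, such as the reported 0.85, can be chosen.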

Conclusion: We successfully implemented a dual AI approach for auto-segmentation quality control into the clinic.


Quality Assurance


IM/TH- Image Analysis Skills (broad expertise across imaging modalities): Machine Learning
