
Session: Machine Intelligence Efficacy and Quality II

Clinical Applicability-Oriented Contour Quality Classification for Auto-Segmentation

Y Zhang*, J Ding, A Amjad, C Sarosiek, N Dang, W Hall, B Erickson, X Li, Medical College of Wisconsin, Milwaukee, WI


MO-B-BRC-1 (Monday, 7/11/2022) 8:30 AM - 9:30 AM [Eastern Time (GMT-4)]

Ballroom C

Purpose: Auto-segmentation methods, including deep learning auto-segmentation (DLAS), are increasingly adopted in radiotherapy; however, they cannot always generate clinically acceptable contours. Evaluating the accuracy of an auto-segmentation with commonly used metrics (e.g., Dice similarity coefficient (DSC), mean distance to agreement (MDA), and Hausdorff distance (HD)) does not always reflect its clinical usefulness, such as the manual editing time it requires. This work aims to develop a novel contour quality classification (CQC) model that evaluates auto-segmented contours based on their clinical applicability.
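For reference, the three conventional metrics named above can be computed for a single slice roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes the contour is available both as a binary mask (for DSC) and as an array of boundary points in consistent physical units (for MDA and HD), and it uses a symmetric-mean definition of MDA, which is one of several conventions in use. The function names are illustrative.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks on one slice."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def _nearest_distances(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """For each point in p (N x 2), the Euclidean distance to the closest point in q (M x 2)."""
    diff = p[:, None, :] - q[None, :, :]  # N x M x 2 pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def mda_and_hd(p: np.ndarray, q: np.ndarray) -> tuple:
    """Symmetric mean distance to agreement and Hausdorff distance between two point sets."""
    d_pq = _nearest_distances(p, q)
    d_qp = _nearest_distances(q, p)
    mda = (d_pq.mean() + d_qp.mean()) / 2.0  # assumed symmetric-mean convention
    hd = max(d_pq.max(), d_qp.max())         # worst-case boundary disagreement
    return float(mda), float(hd)


# Toy example: a 2x2 mask against a 2x3 mask, and two short point sets.
a = np.zeros((4, 4)); a[0:2, 0:2] = 1
b = np.zeros((4, 4)); b[0:2, 0:3] = 1
print(dice(a, b))  # 0.8
print(mda_and_hd(np.array([[0.0, 0.0], [0.0, 1.0]]),
                 np.array([[0.0, 0.0], [0.0, 2.0]])))  # (0.5, 1.0)
```

The brute-force pairwise distance matrix is adequate for one slice's boundary but grows quadratically with the number of points; a KD-tree would be the usual choice for large contours.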

Methods: The CQC models were designed to classify the contour on each slice as acceptable, minor edit, or major edit based on the expected editing time, and were trained as organ-specific supervised ensemble tree classification models. The data used for model training and five-fold cross validation comprised seven quantitative metrics calculated for each contour slice against the ground-truth contours, together with the corresponding manual labels (checked by two experienced physicians), from a total of 2564 DLAS contour slices of five abdominal organs on 20 MRIs. Model performance was evaluated using AUC, accuracy, and the clinical risk rate (CRR), defined as the percentage of slices mislabeled as acceptable. Four observers independently reviewed the model predictions on 9 MRI and CT test sets.
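The two label-based evaluation metrics can be illustrated with a short sketch. It assumes the class labels are the strings `"acceptable"`, `"minor edit"`, and `"major edit"`, and that the CRR denominator is the total number of evaluated slices; the abstract does not state the denominator, so that choice and the helper names are hypothetical.

```python
from typing import Sequence

ACCEPTABLE = "acceptable"  # assumed label string for the acceptable class


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of slices whose predicted label matches the reference label."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def clinical_risk_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of slices predicted acceptable whose reference label is an edit class.

    Assumption: the denominator is the total number of evaluated slices.
    """
    risky = sum(p == ACCEPTABLE and t != ACCEPTABLE for t, p in zip(y_true, y_pred))
    return 100.0 * risky / len(y_true)


y_true = ["acceptable", "minor edit", "major edit", "acceptable"]
y_pred = ["acceptable", "acceptable", "major edit", "minor edit"]
print(accuracy(y_true, y_pred))            # 50.0
print(clinical_risk_rate(y_true, y_pred))  # 25.0
```

Note the asymmetry the CRR captures: the fourth slice (an acceptable contour flagged for editing) lowers accuracy but does not count toward CRR, since it only costs review time rather than letting a flawed contour through.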

Results: In cross validation, the average AUC across the five organs was 0.97 (0.95-0.99), with a CRR of 1.8% (1.3%-2.3%). The average accuracy was 88.6%±2.4%, significantly higher than the 68.8%±3.6% obtained using only three metrics (DSC, MDA, and HD). On the MRI and CT test sets, the average agreement between the model predictions and the majority-vote labels of all observers was 95.6% and 95.1%, respectively, with CRRs of 0.2% and 0.6%.
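Agreement with the observers' majority vote can be sketched as follows. With four observers a 2-2 split is possible, and the abstract does not say how such ties were resolved; this sketch assumes, hypothetically, that a tie goes to the label implying more editing. The severity ordering and function names are illustrative only.

```python
from collections import Counter
from typing import Sequence

# Assumed ordering from least to most editing; used only to break ties.
SEVERITY = {"acceptable": 0, "minor edit": 1, "major edit": 2}


def majority_vote(labels: Sequence[str]) -> str:
    """Most frequent observer label; ties go to the more severe label (hypothetical rule)."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [lab for lab, c in counts.items() if c == top]
    return max(tied, key=SEVERITY.__getitem__)


def agreement_with_majority(model_preds: Sequence[str],
                            observer_labels: Sequence[Sequence[str]]) -> float:
    """Percentage of slices where the model label matches the observers' majority vote."""
    refs = [majority_vote(votes) for votes in observer_labels]
    hits = sum(p == r for p, r in zip(model_preds, refs))
    return 100.0 * hits / len(refs)


observers = [
    ["acceptable", "acceptable", "acceptable", "minor edit"],
    ["minor edit", "minor edit", "acceptable", "acceptable"],  # 2-2 tie
    ["major edit"] * 4,
]
preds = ["acceptable", "acceptable", "major edit"]
print(majority_vote(observers[1]))               # minor edit
print(agreement_with_majority(preds, observers))
```

Breaking ties toward more editing is a conservative choice that mirrors the CRR's concern with contours wrongly passed as acceptable; a different rule would change the reference labels on tied slices.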

Conclusion: The proposed CQC model can classify the quality of a contour slice on both MRI and CT according to its clinical applicability with high accuracy. It can be used to evaluate and compare the performance of any auto-segmentation algorithm, including DLAS.


Quality Assurance, Segmentation, Commissioning


IM/TH- image Segmentation: General (Most aspects)
