
Session: Machine Intelligence Efficacy and Quality II

Quality Assurance of Head & Neck OARs Segmentation with Machine Learning and Deep Learning

J Duan1*, J Castle2, X Feng2, Q Chen1, (1) University of Kentucky, Lexington, KY, (2) Carina Medical LLC, Lexington, KY


MO-B-BRC-5 (Monday, 7/11/2022) 8:30 AM - 9:30 AM [Eastern Time (GMT-4)]

Ballroom C

Purpose: Deep learning-based auto-segmentation (DLAS) can be used as a baseline for flagging manual contouring errors. Previous studies used only the Dice coefficient as the metric for outlier detection. This study investigates combining multiple contour comparison metrics with machine learning to improve the accuracy of catching contouring errors.
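As a point of reference for the single-metric baseline described above, the Dice coefficient between a manual contour and a DLAS contour can be sketched as follows. This is a minimal illustration, not the study's code: masks are represented as sets of voxel indices, and the function name `dice_coefficient`, the toy masks, and the 0.8 cut-off are all assumptions made for the example.

```python
def dice_coefficient(manual, auto):
    """Dice = 2 * |A and B| / (|A| + |B|) for two sets of voxel indices."""
    manual, auto = set(manual), set(auto)
    if not manual and not auto:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(manual & auto) / (len(manual) + len(auto))


# Toy 2D example: two 4x4 squares offset by one voxel along x.
manual_mask = {(x, y) for x in range(4) for y in range(4)}
auto_mask = {(x, y) for x in range(1, 5) for y in range(4)}

score = dice_coefficient(manual_mask, auto_mask)

# Hypothetical cut-off for flagging a contour; not a value from the abstract.
FLAG_THRESHOLD = 0.8
flagged = score < FLAG_THRESHOLD
```

A single threshold of this kind is the baseline that the combined-metric approach is compared against.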

Methods: A total of 339 H&N cases from 4 different institutions were retrieved from The Cancer Imaging Archive (TCIA) for this study. The existing BrainStem contours of each case were reviewed and flagged if they deviated from clinical acceptance criteria. Commercial DLAS software was used to create the DLAS delineations. The agreement between DLAS and manual contouring was evaluated with 27 metrics, including Dice, Hausdorff distances, surface Dice, etc. Twenty percent of the data was randomly selected as the test data. Four machine learning models, support vector machine (SVM), random forest classifier (RFC), K-nearest neighbors classifier (KNN), and multi-layer perceptron (MLP), were trained and tuned using 5-fold cross-validation on the remaining 80% of the data. Once the hyperparameters were determined, the models were retrained using all training data. The predictions from all models were ensembled to produce a final prediction on the test dataset.
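The workflow described in the Methods can be sketched with scikit-learn as below. This is an illustrative outline under stated assumptions, not the authors' implementation: the feature matrix is synthetic (339 rows by 27 columns standing in for the comparison metrics), the hyperparameter grids are arbitrary, and averaging predicted probabilities is only one plausible way to ensemble, since the abstract does not specify the method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 339 cases x 27 metrics, label 1 = flagged contour.
X, y = make_classification(n_samples=339, n_features=27, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)

# Hold out 20% of the cases as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Four model families with small, arbitrary (assumed) hyperparameter grids.
candidates = {
    "svm": (make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
            {"svc__C": [0.1, 1, 10]}),
    "rfc": (RandomForestClassifier(random_state=0),
            {"n_estimators": [100, 200]}),
    "knn": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [3, 5, 7]}),
    "mlp": (make_pipeline(StandardScaler(),
                          MLPClassifier(max_iter=1000, random_state=0)),
            {"mlpclassifier__alpha": [1e-4, 1e-2]}),
}

fitted = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validation on the training portion; with the default
    # refit=True the best setting is retrained on all training data.
    search = GridSearchCV(model, grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    fitted[name] = search.best_estimator_

# Assumed ensembling rule: average the positive-class probabilities.
ensemble_proba = np.mean(
    [m.predict_proba(X_test)[:, 1] for m in fitted.values()], axis=0)
ensemble_pred = (ensemble_proba >= 0.5).astype(int)
test_auc = roc_auc_score(y_test, ensemble_proba)
```

Scaling is wrapped inside each pipeline so that it is refit within every cross-validation fold, which avoids leaking information from the validation folds into training.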

Results: Using single metrics, the areas under the curve (AUC) achieved were 0.87, 0.86, 0.84, and 0.78 for Dice, HD95, mean surface distance, and SurfaceDice2mm, respectively. A combination of 10 metrics provided the highest AUC, 0.92 ± 0.07, at the cross-validation stage. On the test dataset, this combination achieved an AUC of 0.94, a weighted recall of 0.91, a weighted precision of 0.91, and a weighted F1-score of 0.91.
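For readers unfamiliar with the reported quantities, the sketch below shows how AUC and support-weighted precision, recall, and F1 are commonly computed. The labels and scores are toy values invented for the example, the helper names `roc_auc` and `weighted_scores` are hypothetical, and the "weighted" definition assumed here is the usual one, in which per-class scores are averaged with weights proportional to class support.

```python
def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1 across all classes."""
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        support = sum(t == cls for t in y_true)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy data (not from the study): 1 = contour flagged as erroneous.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.2, 0.7, 0.8, 0.4, 0.9]
y_pred = [int(s >= 0.5) for s in scores]

auc = roc_auc(y_true, scores)
w_precision, w_recall, w_f1 = weighted_scores(y_true, y_pred)
```

Under this definition, weighted recall equals overall accuracy, which is why it can be informative to report AUC alongside it when the classes are imbalanced.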

Conclusion: Combining multiple contour comparison metrics improved the automatic identification of contouring errors compared with using any single metric.

Funding Support, Disclosures, and Conflict of Interest: NIH: R44CA254844. Xue Feng and James Castle are employees of Carina Medical LLC. Quan Chen is a shareholder of Carina Medical LLC.




IM/TH- Formal Quality Management Tools: General (most aspects)
