Purpose: Organ-at-risk (OAR) contours for radiotherapy require time-consuming quality review before treatment planning. We are developing automated tools to reduce contour evaluation time. Verifying contour quality involves many factors, including checking that contour-to-contour relationships are anatomically consistent, that is, that OARs overlap or remain separated as anatomy dictates. However, very little work has been done to quantify contour-to-contour relationships in contours segmented manually by humans or automatically by machine learning models, both of which are used clinically.
Methods: Thirty-five CT scans from head and neck radiotherapy patients were contoured by physicians specializing in head and neck radiotherapy to form a gold standard dataset. These contours were compared to manually segmented clinical contours used for treatment planning and to contours created by 5 FDA-approved, commercially available machine learning algorithms. Contours were matched in MATLAB to an n-by-n matrix listing the contour-to-contour relationships for evaluation. Contour-to-contour relationships were quantified by calculating the minimum distance between contours (gap) and their fractional overlap volume. Data from all contouring methods were combined, and the positive predictive value (PPV) of contour-to-contour relationships for identifying clinically relevant outliers was evaluated from 8 contour-to-contour metrics. Contour-to-contour outliers were selected by identifying values outside of the mean plus 3.5 times the interquartile range.
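The two quantities and the outlier rule described above can be illustrated with a minimal Python sketch. This is not the MATLAB implementation used in the study. It assumes contours are available as boolean voxel masks on a shared grid, that the gap is measured between voxel centres, that fractional overlap is normalised by the smaller of the two structures, and that "outside" means above the mean plus 3.5 times the interquartile range. The function names `min_gap_mm`, `fractional_overlap`, and `flag_outliers` are hypothetical.

```python
import numpy as np


def min_gap_mm(mask_a, mask_b, spacing_mm):
    """Minimum centre-to-centre distance (mm) between two boolean voxel masks.

    Returns 0.0 when the masks share at least one voxel. Brute force, so
    only suitable for small structures (assumption: illustration only).
    """
    spacing = np.asarray(spacing_mm, dtype=float)
    pts_a = np.argwhere(mask_a) * spacing  # voxel indices -> mm coordinates
    pts_b = np.argwhere(mask_b) * spacing
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both masks must contain at least one voxel")
    diffs = pts_a[:, None, :] - pts_b[None, :, :]  # all pairwise offsets
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())


def fractional_overlap(mask_a, mask_b):
    """Shared volume divided by the volume of the smaller structure.

    The normalisation is an assumption; the abstract does not define it.
    Voxel volume cancels, so voxel counts are sufficient.
    """
    shared = np.logical_and(mask_a, mask_b).sum()
    smaller = min(mask_a.sum(), mask_b.sum())
    if smaller == 0:
        raise ValueError("both masks must contain at least one voxel")
    return float(shared / smaller)


def flag_outliers(values, k=3.5):
    """Flag values above mean + k * IQR (one-sided reading, an assumption)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    threshold = values.mean() + k * (q3 - q1)
    return values > threshold
```

For full-resolution CT masks, a distance transform or a KD-tree would likely be a more practical way to compute the gap than the pairwise comparison used here.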
Results: Segmentation methodologies were compared using 251 contour-to-contour relationships. Larger gaps between the optic nerve and optic chiasm were observed in contours from auto-segmentation tools than in manually segmented clinical contours. Mean gaps between the brachial plexus and cord differed between segmentation methods (range: 0 to 18.3 mm). Twenty-two outliers were identified, 20 of which were found upon review to be clinically unacceptable by institutional guidelines (PPV=0.91).
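The reported PPV follows directly from the outlier counts. As a brief arithmetic check in Python (the function name `positive_predictive_value` is hypothetical):

```python
def positive_predictive_value(confirmed, flagged):
    """Fraction of flagged outliers confirmed as clinically unacceptable."""
    if flagged <= 0:
        raise ValueError("at least one flagged outlier is required")
    return confirmed / flagged


# 20 of the 22 flagged outliers were confirmed on review.
ppv = positive_predictive_value(20, 22)
print(f"PPV = {ppv:.2f}")  # PPV = 0.91
```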
Conclusion: We developed a method to identify incorrect contour-to-contour relationships using two metrics (gap and fractional overlap volume). Acceptable metric ranges will be determined based on highly curated gold standard contours.