Purpose: This study evaluated five commercially available autosegmentation models for delineating organs at risk (OARs) in genitourinary (GU) malignancies and compared their contours against reference standard contours.
Methods: A GU radiation oncology expert reviewed the clinical OARs on CT scans of 34 representative GU patients and confirmed that they met departmental OAR guidelines. These reviewed OARs served as the reference standard for benchmarking. Five commercially available autosegmentation models were applied to the same 34 CT datasets. Volumetric and overlap Dice similarity coefficients (DSCs) were used to compare the structures from each commercial model with the reference OARs. Overlap DSC is a novel metric defined in this study as the volumetric DSC evaluated on the subset of CT slices where contours from both structure sets are present. It is more meaningful when the OARs being compared do not follow the same standard for superior-inferior extent.
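The two metrics described in the Methods can be illustrated with a minimal sketch. It assumes each structure is available as a 3D boolean mask with CT slices along the first axis, and it reads "slices where contours from both structure sets are present" as slices on which both masks contain at least one voxel. The function names `volumetric_dsc` and `overlap_dsc` and the toy masks are hypothetical and are not taken from the study's software.

```python
import numpy as np


def volumetric_dsc(a, b):
    """Dice similarity coefficient of two boolean masks: 2|A and B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return float("nan")  # undefined when both masks are empty
    return 2.0 * np.logical_and(a, b).sum() / total


def overlap_dsc(a, b, slice_axis=0):
    """Volumetric DSC restricted to slices on which BOTH masks are non-empty.

    Assumption: a slice counts as shared if each mask has at least one voxel on it.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    in_plane = tuple(i for i in range(a.ndim) if i != slice_axis)
    shared = a.any(axis=in_plane) & b.any(axis=in_plane)
    if not shared.any():
        return float("nan")  # no slice carries both contours
    idx = np.flatnonzero(shared)
    return volumetric_dsc(a.take(idx, axis=slice_axis), b.take(idx, axis=slice_axis))


# Toy example: identical in-plane contours, shifted by one slice.
ref = np.zeros((6, 4, 4), dtype=bool)
auto = np.zeros((6, 4, 4), dtype=bool)
ref[1:5, 1:3, 1:3] = True   # reference contour on slices 1 to 4
auto[2:6, 1:3, 1:3] = True  # model contour on slices 2 to 5

print(volumetric_dsc(ref, auto))  # 0.75
print(overlap_dsc(ref, auto))     # 1.0
```

In this toy case the two structures agree exactly on every slice they share, so the overlap DSC is 1.0, while the volumetric DSC drops to 0.75 only because the superior and inferior borders differ by one slice.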
Results: A total of 1836 structures were generated by the five commercial models. Most of the commercial models delineate the bowel as a single bowel bag contour and do not distinguish small bowel from large bowel; hence, the bowel contour was excluded from the analysis. For the five commercial autosegmentation models, the overall median volumetric DSCs were 0.937, 0.903, 0.773, 0.948, and 0.940. Of the 1836 structures, 12 (0.65%) had no overlap with the reference standard.
Conclusion: Three of the five commercially available models performed at or above the clinical standard (volumetric DSC ≥ 0.92). The bowel contour was the least likely to be delineated by the commercial models. Although autosegmentation models offer significant advantages over manual segmentation of OARs, none performed at the level of the expert-reviewed contours. Some issues still need to be resolved to improve the clinical use of these autosegmentation models.