Purpose: Deep neural networks have revolutionized auto-segmentation and show great promise for automating treatment planning. However, few data exist on their clinical implementation. We evaluated the performance and clinical implementation of a novel deep learning-based auto-contouring workflow for 0.35T magnetic resonance imaging (MRI)-guided pelvic radiotherapy.
Methods: An auto-contouring model based on a UNet-derived architecture was developed for the femoral heads, bladder, and rectum. Training data were taken from 75 patients treated on a 0.35T MRI-guided radiotherapy machine. Post-processing steps were developed to improve the usability of the resulting contours. The model was tested on 20 retrospective cases outside the training set and was subsequently implemented clinically by interfacing it with the contouring software used in our department. Usability was evaluated on the first 10 consecutive clinical cases by computing the 2D slice-by-slice Dice similarity coefficient (2D-DSC) and the fraction of slices that planners used unmodified. Final contours were retrospectively reviewed by an experienced treatment planner, and the clinical significance of each deviation was graded as having a negligible, low, moderate, or high probability of leading to actionable dosimetric variations.
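The two usability metrics can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes each contour is stored as a list of per-slice sets of (row, col) pixel indices. Per slice, it computes DSC = 2|A∩B| / (|A| + |B|) between the auto-generated contour A and the final contour B, reported in percent. The function names, the data representation, and the choice to skip slices where the organ is absent from both contours are hypothetical, since the abstract does not specify them.

```python
from statistics import mean, stdev

# Hypothetical representation: one set of (row, col) pixels per slice.


def slice_dsc(auto_slices, final_slices):
    """Per-slice Dice similarity coefficient (percent) between two contours.

    Slices where neither contour contains the organ are skipped
    (an assumption; the handling of empty slices is not stated).
    """
    if len(auto_slices) != len(final_slices):
        raise ValueError("contours must cover the same number of slices")
    scores = []
    for a, f in zip(auto_slices, final_slices):
        a, f = set(a), set(f)
        denom = len(a) + len(f)
        if denom == 0:
            continue  # organ absent on this slice in both contours
        scores.append(100.0 * 2 * len(a & f) / denom)
    return scores


def unmodified_percent(auto_slices, final_slices):
    """Percent of organ-bearing slices where the final contour equals the auto contour."""
    considered = [
        (set(a), set(f)) for a, f in zip(auto_slices, final_slices) if a or f
    ]
    if not considered:
        return float("nan")
    return 100.0 * sum(a == f for a, f in considered) / len(considered)


def summarize(scores):
    """Mean and sample standard deviation, as in a 'mean ± SD' summary."""
    return mean(scores), (stdev(scores) if len(scores) > 1 else 0.0)
```

Because the metric is computed per slice rather than over the whole volume, a single heavily edited slice lowers the mean and inflates the spread. This may help explain the large standard deviations reported for the bladder and rectum.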
Results: For the retrospective test data, mean 2D-DSC values (%) were 92.8±4.9, 91.8±8.2, 91.6±13.3, and 87.2±11.8 for the right femoral head, left femoral head, bladder, and rectum, respectively. Post-implementation, mean 2D-DSC values were 100.0±0.4, 99.3±4.3, 94.9±10.5, and 94.5±14.4, respectively, and 99.1%, 96.3%, 53.7%, and 64.7% of slices, respectively, were used unmodified by the planner. In the retrospective review of contours used for planning, a total of 6 deviating slices in 2 patients were graded as having low potential clinical significance; no deviations were graded as having moderate or high clinical significance.
Conclusion: We have presented an analysis of the clinical implementation of a novel auto-contouring workflow. Research-oriented performance metrics correlated imperfectly with clinical utility. Substantial workflow savings were obtained, and little automation bias was observed.