Purpose: Deep learning-based auto-contouring (DLC) may be used to reduce the time required to generate organ-at-risk (OAR) contours. In this study, we investigate the accuracy of DLC contours relative to inter-observer variability.
Methods: Ten prostate patients were contoured by four experts (EC), senior dosimetrists with an average of 15 years of experience. The penile bulb, femurs, bladder, and rectum were contoured by the DLC software and by each expert. The Dice Similarity Coefficient (DSC) and Mean Distance to Agreement (MDA) were used to compare the DLC contours with the EC contours. For each patient, the worst DSC value among the deep learning vs. expert (DLC-EC) combinations and among the inter-expert (EC-EC) combinations was used for the statistical analysis, which ensured that lower-performing outliers were included.
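The comparison described above can be illustrated with a minimal sketch. It assumes each contour is available as a set of voxel indices (for DSC) or as a list of surface points (for MDA); the function names and data layout are hypothetical and are not taken from the software used in this study. Distances are in voxel units here and would need scaling by the voxel spacing to give millimetres.

```python
from itertools import combinations
from math import dist


def dice(a, b):
    """Dice Similarity Coefficient of two voxel sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_distance_to_agreement(surf_a, surf_b):
    """Symmetric mean of nearest-neighbour distances between two surfaces.

    Brute force over all point pairs; adequate for a small illustration only.
    """
    d_ab = [min(dist(p, q) for q in surf_b) for p in surf_a]
    d_ba = [min(dist(q, p) for p in surf_a) for q in surf_b]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def worst_dsc(dlc, experts):
    """Worst (minimum) DSC for one patient and one organ.

    Returns a pair: the minimum over all DLC-EC pairings and the minimum
    over all EC-EC pairings.
    """
    dlc_ec = min(dice(dlc, e) for e in experts)
    ec_ec = min(dice(e1, e2) for e1, e2 in combinations(experts, 2))
    return dlc_ec, ec_ec


def box(x0, x1, height=4):
    """Toy rectangular 'contour' as a set of (x, y) voxel indices."""
    return {(x, y) for x in range(x0, x1) for y in range(height)}


if __name__ == "__main__":
    experts = [box(0, 10), box(1, 11), box(0, 9), box(2, 12)]  # four toy expert contours
    dlc = box(1, 10)                                           # toy DLC contour
    print(worst_dsc(dlc, experts))
```

With four experts, each patient and organ yields four DLC-EC pairings and six EC-EC pairings; taking the minimum of each group gives the per-patient worst-case values that enter the statistical analysis.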
Results: No statistically significant differences (at the p < 0.05 level) were measured between DLC-EC and EC-EC for the worst DSC metric for each case. The single-worst DSC values for DLC-EC vs. EC-EC were: penile bulb (0.30 vs 0.31), femurs (0.87 vs 0.87), bladder (0.74 vs 0.74), and rectum (0.76 vs 0.77). The average-worst DSC values were: penile bulb (0.49 vs 0.49), femurs (0.92 vs 0.92), bladder (0.91 vs 0.90), and rectum (0.81 vs 0.82). The mean DSC values for DLC-EC were: penile bulb = 0.62, femurs = 0.94, bladder = 0.94, and rectum = 0.83. The mean DSC values for EC-EC were: penile bulb = 0.60, femurs = 0.93, bladder = 0.92, and rectum = 0.86. The mean MDA values for DLC-EC were: penile bulb = 2.43 mm, femurs = 0.85 mm, bladder = 0.92 mm, and rectum = 2.31 mm. By comparison, the mean MDA values for EC-EC were: penile bulb = 2.39 mm, femurs = 1.15 mm, bladder = 1.02 mm, and rectum = 2.26 mm.
Conclusion: DLC-based auto-contouring provides contours whose agreement with expert contours is comparable to the inter-observer variability among experts.