Ballroom C
Purpose: To investigate how a deep-learning dose prediction model for head and neck (H&N) cancer, developed at a single institution, performs at a different institution, and to inform future directions for institution-specific adaptation.
Methods: A published deep-learning dose prediction model for H&N cancer, trained on patients from institution A, was acquired from its developers. The model uses a densely connected U-net architecture and was trained on 120 patients treated with VMAT to predict dose from PTVs and 44 OAR contours. At institution B, 23 H&N patients, also treated with VMAT, were used to test model performance. The structure sets were carefully examined to ensure naming consistency with the original dataset, and the model was then applied without human intervention. The predictions were compared qualitatively and quantitatively with the clinical plans and with the statistics reported in the publication.
Results: At most 24 OARs were contoured in any test patient, compared with the 44 OAR contours the model was trained on. Compared to the clinical plans, the model predicted comparable global Dmax and PTV coverage. The predicted Dmax for the spinal cord and esophagus was 12.34%±10.16% and 13.49%±13.75% lower than in the clinical plans, respectively. In some cases, unnecessary dose spillage to the brain was predicted, causing the predicted Dmax of the optical structures and cochleae to exceed the clinical values by more than 100%. Compared to the publication, the prediction errors of Dmean for the right parotid and larynx were substantially larger (24% vs. 12.5% and 27% vs. 7%, respectively), whereas performance for the other OARs was similar.
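The Dmax comparison above can be illustrated with a minimal sketch. It assumes the percentage difference is taken relative to the clinical Dmax for each patient and summarized as mean ± sample standard deviation; the abstract does not state the normalization, and the function names and dose values below are hypothetical.

```python
from statistics import mean, stdev


def relative_dmax_difference(predicted, clinical):
    """Per-patient percent difference of predicted vs. clinical Dmax.

    Assumption: the difference is normalized to the clinical Dmax.
    Negative values mean the prediction is lower than the clinical plan.
    """
    if len(predicted) != len(clinical):
        raise ValueError("predicted and clinical must have the same length")
    return [100.0 * (p - c) / c for p, c in zip(predicted, clinical)]


def summarize(diffs):
    """Return (mean, sample standard deviation) of the percent differences."""
    return mean(diffs), stdev(diffs)


if __name__ == "__main__":
    # Hypothetical spinal cord Dmax values in Gy, for illustration only.
    predicted_dmax = [30.1, 28.4, 35.0, 26.7]
    clinical_dmax = [34.0, 33.2, 36.5, 31.9]

    diffs = relative_dmax_difference(predicted_dmax, clinical_dmax)
    avg, sd = summarize(diffs)
    print(f"Dmax difference: {avg:.2f}% ± {sd:.2f}%")
```

If the differences were instead normalized to the prescription dose, only the denominator in `relative_dmax_difference` would change.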
Conclusion: Applying A’s model to institution B’s cases produced dose predictions with comparable PTV coverage, lower spinal cord dose, and higher dose to optical structures and cochleae compared to the clinical plans. The model performed less reliably when transferred to institution B, resulting in higher predicted dose to the brain stem and optical structures. These findings warrant further investigation of the model for sub-disease types and of transfer learning for institution-specific adaptation.