Purpose: To investigate how fluence-map-prediction errors of a deep-learning (DL)-based AI agent affect plan dosimetry in head-and-neck (H&N) IMRT planning.
Methods: A previously reported AI agent can automatically predict fluence maps and generate plans comparable to clinical plans for prostate IMRT. However, when trained for a more complex site, H&N, the AI agent tended to generate plans with larger variations in quality. Most DL models are “black boxes” whose inner mechanisms are neither intuitive nor interpretable, and the relationship between fluence-map-prediction error and AI plan quality has not been thoroughly investigated. We designed five analytical protocols to gain insight into how the performance of the DL-based AI agent affects final plan quality. The DL model was trained with 216 H&N cases and tested with 15 additional cases. The fluence-map-prediction error (prediction minus ground truth) was analyzed for its dosimetric effects using five error-decomposition methods: three spatial decompositions (ground-truth fluence-intensity thresholds, predicted fluence-intensity thresholds, and ground-truth fluence-gradient thresholds) and two frequency-domain decompositions (Fourier-space frequency bands and Fourier-space truncated low-frequency disks). The decomposed error components were analyzed for their impact on the dosimetric metrics of the resulting plans. The PTV dosimetric metrics included heterogeneity index, conformity index, whole-body maximum dose D_2cc, and PTV V_105%, V_110%, and V_115%. The OAR dosimetric metrics included D_0.1cc of the brainstem and cord+5mm; D_mean of the left and right parotids, oral cavity, larynx, and pharynx; and D_2cc of the mandible.
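As an illustration only, two of the decompositions named above (a ground-truth intensity threshold and a truncated low-frequency disk) might be sketched as follows. This is not the authors' implementation: the function names, the use of normalized frequencies, the centered circular disk, and the random arrays standing in for fluence maps are all assumptions made for the example.

```python
import numpy as np


def split_by_intensity(error, reference, threshold):
    """Hypothetical spatial decomposition: the error where the reference
    fluence is at or above `threshold`, plus the remaining error."""
    mask = reference >= threshold
    high = np.where(mask, error, 0.0)
    return high, error - high


def low_frequency_disk(error, area_fraction):
    """Hypothetical frequency decomposition: the part of the error whose
    Fourier components lie in a centered disk covering `area_fraction`
    of Fourier space, plus the remaining error."""
    rows, cols = error.shape
    spectrum = np.fft.fftshift(np.fft.fft2(error))
    # Normalized frequencies span [-0.5, 0.5) on each axis, so the full
    # Fourier plane has unit area (assumed convention for this sketch).
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    # A disk of area f in that plane has radius sqrt(f / pi).
    radius = np.sqrt(area_fraction / np.pi)
    mask = (fx**2 + fy**2) <= radius**2
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    return low, error - low


# Synthetic stand-ins for a ground-truth and a predicted fluence map.
rng = np.random.default_rng(0)
truth = rng.random((64, 64))
predicted = truth + 0.05 * rng.standard_normal((64, 64))
error = predicted - truth  # prediction minus ground truth

high, remainder = split_by_intensity(error, truth, threshold=0.5)
low, rest = low_frequency_disk(error, area_fraction=0.2)
```

In both cases the two returned components sum back to the original error, so each component could in principle be added to the ground-truth fluence map separately to examine its dosimetric effect.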
Results: The majority of PTV metrics were significantly correlated with various error components. Among the decompositions, the Fourier-space low-frequency disks most effectively isolated the error components that affect plan quality: the error within a low-frequency disk covering ~20% of the Fourier-space area captures most of the dosimetric differences between prediction and ground truth.
Conclusion: The fluence-map-prediction error in the low-frequency region of Fourier space is critical to the quality of AI plans, especially for PTV-related metrics. This insight will help improve network architecture and loss function design.
Funding Support, Disclosures, and Conflict of Interest: This work was supported by an NIH grant (#R01CA201212) and a Varian master research agreement.