How Fluence-Prediction Error Impacts Final Plan Quality: Insights Into a Deep-Learning-Based (DL-Based) Head-And-Neck (H&N) IMRT Planning AI Agent

X Li1*, QJ Wu1, Q Wu1, C Wang1, Y Sheng1, W Wang1, H Stephens1, F Yin1, Y Ge2, (1) Duke University Medical Center, Durham, NC, (2) University of North Carolina at Charlotte, Charlotte, NC

Presentations

TH-D-TRACK 6-5 (Thursday, 7/29/2021) 2:00 PM - 3:00 PM [Eastern Time (GMT-4)]

Purpose: To investigate how the fluence-map-prediction errors of a DL-based AI agent affect the dosimetric quality of H&N IMRT plans.

Methods: A previously reported AI agent automatically predicts fluence maps and generates prostate IMRT plans comparable to clinical plans. However, when trained for a more complex site, H&N, the agent generated plans with larger variations in quality. Most DL models are “black boxes” whose inner mechanisms are neither intuitive nor interpretable, and the relationship between fluence-map-prediction error and AI plan quality has not been thoroughly investigated. We designed five analytical protocols to examine how the performance of the DL-based AI agent's model affects final plan quality. The DL model was trained on 216 H&N cases and tested on 15 additional cases. The fluence-map-prediction error (prediction minus ground truth) was analyzed for its dosimetric effects using five error-decomposition methods: three spatial decompositions (by ground-truth fluence-intensity thresholds, predicted fluence-intensity thresholds, and ground-truth fluence-gradient thresholds) and two frequency-domain decompositions (Fourier-space frequency bands and Fourier-space truncated low-frequency disks). The impact of each decomposed error component on the dosimetric metrics of the resulting plans was then analyzed. PTV dosimetric metrics included heterogeneity index, conformity index, whole-body maximum dose D_2cc, and PTV V_105%, V_110%, and V_115%. OAR dosimetric metrics included D_0.1cc of the brainstem and cord+5mm; D_mean of the left and right parotids, oral cavity, larynx, and pharynx; and D_2cc of the mandible.
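The five error decompositions described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: fluence maps are taken to be 2D arrays, frequency is measured as the normalized radial distance from the DC term of a 2D FFT, and the function names, thresholds, band edges, and disk cutoff are hypothetical placeholders because the abstract does not report them.

```python
import numpy as np


def threshold_components(error, reference, thresholds):
    """Spatial decomposition: split `error` by intervals of `reference`.

    `reference` may be the ground-truth intensity, the predicted intensity,
    or the ground-truth gradient magnitude. The outer edges are -inf/+inf,
    so the components always sum back to `error`.
    """
    edges = np.concatenate(([-np.inf], np.sort(thresholds), [np.inf]))
    return [np.where((reference >= lo) & (reference < hi), error, 0.0)
            for lo, hi in zip(edges[:-1], edges[1:])]


def gradient_magnitude(fluence):
    """Finite-difference gradient magnitude of a 2D fluence map."""
    gy, gx = np.gradient(fluence)
    return np.hypot(gx, gy)


def radial_frequency(shape):
    """Normalized radial frequency of each 2D FFT coefficient (DC = 0)."""
    fy = np.fft.fftfreq(shape[0])
    fx = np.fft.fftfreq(shape[1])
    gy, gx = np.meshgrid(fy, fx, indexing="ij")
    return np.hypot(gy, gx)


def frequency_band_components(error, band_edges):
    """Frequency decomposition: one component per annulus [lo, hi)."""
    spectrum = np.fft.fft2(error)
    radius = radial_frequency(error.shape)
    edges = np.concatenate(([0.0], np.sort(band_edges), [np.inf]))
    return [np.real(np.fft.ifft2(np.where((radius >= lo) & (radius < hi), spectrum, 0)))
            for lo, hi in zip(edges[:-1], edges[1:])]


def low_frequency_disk(error, cutoff):
    """Truncated low-frequency disk: keep coefficients with radius <= cutoff."""
    mask = radial_frequency(error.shape) <= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(error) * mask))


# Demo on synthetic stand-in maps (random data, not study data).
rng = np.random.default_rng(0)
truth = rng.random((64, 64))
pred = truth + 0.05 * rng.standard_normal(truth.shape)
error = pred - truth  # prediction minus ground truth, as in the abstract
levels = [0.25, 0.5, 0.75]  # hypothetical intensity thresholds
by_gt = threshold_components(error, truth, levels)
by_pred = threshold_components(error, pred, levels)
by_grad = threshold_components(error, gradient_magnitude(truth), [0.05, 0.1])
bands = frequency_band_components(error, [0.05, 0.1, 0.2])
low = low_frequency_disk(error, 0.1)
```

The threshold and band decompositions partition the error, so their components sum back to the full error map and each can be examined for its own dosimetric effect; the low-frequency disks are instead nested truncations that approach the full error as the cutoff grows.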

Results: The majority of PTV metrics were significantly correlated with various error components. Among the decompositions, the Fourier-space low-frequency disks best extracted the error components that reveal plan-quality impacts: the error within a low-frequency disk covering ~20% of the Fourier-space area captures most of the dosimetric differences between prediction and ground truth.
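The ~20% Fourier-space area and the correlation analysis can be illustrated with the sketch below. It assumes that "area" means the fraction of 2D FFT coefficients inside a disk of normalized radial frequency, that each test case is summarized by the RMS of the error retained in that disk, and that correlation is tested with Spearman's rank correlation and a permutation p-value. The abstract does not state which statistic or test the authors used, and all function names here are hypothetical.

```python
import numpy as np


def radial_frequency(shape):
    """Normalized radial frequency of each 2D FFT coefficient (DC = 0)."""
    fy = np.fft.fftfreq(shape[0])
    fx = np.fft.fftfreq(shape[1])
    gy, gx = np.meshgrid(fy, fx, indexing="ij")
    return np.hypot(gy, gx)


def cutoff_for_area_fraction(shape, fraction):
    """Smallest disk radius holding at least `fraction` of all coefficients."""
    radius = np.sort(radial_frequency(shape).ravel())
    k = max(int(np.ceil(fraction * radius.size)) - 1, 0)
    return radius[k]


def low_frequency_rms(error, fraction=0.2):
    """RMS of the error retained inside the disk covering `fraction` of Fourier space."""
    cutoff = cutoff_for_area_fraction(error.shape, fraction)
    mask = radial_frequency(error.shape) <= cutoff
    retained = np.real(np.fft.ifft2(np.fft.fft2(error) * mask))
    return np.sqrt(np.mean(retained ** 2))


def ranks(values):
    """0-based ranks; ties are broken by position (no averaging)."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")
    out = np.empty(values.size)
    out[order] = np.arange(values.size)
    return out


def spearman_permutation_test(x, y, n_perm=5000, seed=0):
    """Spearman's rho between x and y with a two-sided permutation p-value."""
    rx, ry = ranks(x), ranks(y)
    rho = np.corrcoef(rx, ry)[0, 1]
    rng = np.random.default_rng(seed)
    null = np.array([np.corrcoef(rx, rng.permutation(ry))[0, 1]
                     for _ in range(n_perm)])
    # Small tolerance so permutations tying |rho| are counted despite rounding.
    p_value = (np.sum(np.abs(null) >= abs(rho) - 1e-12) + 1) / (n_perm + 1)
    return rho, p_value
```

In use, one would compute `low_frequency_rms` for each of the test cases and pair it with the per-case change in a metric such as PTV V_105% (AI plan minus ground-truth plan) before calling `spearman_permutation_test`. A permutation p-value is used here only to keep the sketch dependency-free; a library routine such as SciPy's `spearmanr` is a common alternative.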

Conclusion: Fluence-map-prediction error in the low-frequency region of Fourier space is critical to AI plan quality, especially for PTV-related metrics. This insight can help improve network architecture and loss-function design.

Funding Support, Disclosures, and Conflict of Interest: This work was supported by NIH grant (#R01CA201212) and Varian master research agreement.

    Keywords

    Statistical Analysis, Treatment Planning, Modeling

    Taxonomy

    TH- External Beam- Photons: Treatment planning using machine learning/Knowledge Based Planning/automation
