Purpose: Automating metastasis-directed therapy (MDT) planning may improve plan consistency between users and centers, but variability in the characteristics of metastatic lesions hinders existing supervised learning approaches because of their training data requirements. The purpose of this study is to investigate the feasibility of reinforcement learning (RL)-based machine parameter optimization for automatic MDT planning for oligometastatic prostate cancer (OMPC). This approach eliminates dependence on prior plans for training and directly optimizes linear accelerator parameters.
Methods: We implemented a three-dimensional (3D) deep-Q RL volumetric modulated arc therapy optimization algorithm for MDT as follows. Dose was computed on a 3D grid using matRad. The dose grid and the target and organ contours were re-sampled to extract axial slices aligned with individual leaf pairs in the multi-leaf collimator (MLC) model. These slices were input to a 2D deep-Q network, which predicted slice-specific leaf position and dose rate updates. Slice-specific dose rate updates were recombined for each control point using a voting strategy. The network was trained on four OMPC patients to maximize target coverage and minimize bowel dose, and was then applied to one independent test case. The resulting dose distributions were compared with those of simple conformal arcs to quantify improvement over un-optimized plans.
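The voting step is described only briefly, so the following is a minimal sketch of one way per-slice dose rate updates could be merged into a single update per control point. The three-action set, the plain majority rule with ties resolved to "hold", the step size, and the dose rate limits are all assumptions made for illustration; none of them is stated in the abstract.

```python
from collections import Counter

# Hypothetical discrete dose-rate actions; the abstract does not list the action set.
DECREASE, HOLD, INCREASE = -1, 0, 1


def vote_dose_rate_update(slice_votes):
    """Combine per-slice dose-rate actions into one action for a control point.

    Assumed rule: plain majority vote; a tie for first place falls back to HOLD.
    """
    if not slice_votes:
        return HOLD
    ranked = Counter(slice_votes).most_common()
    top_action, top_count = ranked[0]
    # No clear winner: leave the dose rate unchanged.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return HOLD
    return top_action


def apply_update(dose_rate, action, step=50.0, lo=0.0, hi=600.0):
    """Apply the voted action using a hypothetical step and limits (MU/min)."""
    return min(hi, max(lo, dose_rate + action * step))


# One vote per leaf-pair slice at a single control point (illustrative values).
votes = [INCREASE, INCREASE, HOLD, DECREASE, INCREASE]
action = vote_dose_rate_update(votes)
new_rate = apply_update(300.0, action)
```

A vote is needed because dose rate is a single value per control point, whereas leaf positions can be updated independently for each leaf-pair slice.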
Results: Training ran for 4,800 iterations over 3 days. RL MDT maintained a mean±SD target coverage of 95.1±7.9% in the training cohort and 100.0% in the test patient. Compared with the conformal arcs, RL MDT reduced maximum and mean bowel dose by 20.0±16.3% and 38.0±18.6%, respectively, in the training cohort, and by 1.5% and 37.6%, respectively, in the test patient, demonstrating a preliminary trend toward improved dosimetry.
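The bowel dose reductions above are relative to the conformal arc plan. As a sketch of how such a summary might be computed, the snippet below takes the per-patient percent reduction and reports the mean and SD. The dose values are placeholders chosen for easy arithmetic, not data from the study, and the use of the sample SD is an assumption.

```python
from statistics import mean, stdev


def percent_reduction(reference, optimized):
    """Relative reduction of a dose metric versus the reference plan, in percent."""
    if reference <= 0:
        raise ValueError("reference dose must be positive")
    return 100.0 * (reference - optimized) / reference


# Placeholder mean bowel doses in Gy as (conformal arc, RL plan); NOT study data.
pairs = [(10.0, 6.0), (8.0, 6.0), (5.0, 2.5), (4.0, 3.0)]

reductions = [percent_reduction(ref, opt) for ref, opt in pairs]

# Cohort summary as (mean, sample SD); the SD convention is an assumption.
summary = (mean(reductions), stdev(reductions))
```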
Conclusion: These preliminary results suggest that RL MDT optimization is feasible in a 3D beam model despite variable targets and limited training time. Ongoing work involves increasing training cohort size and training time to enable high-quality auto-planning for MDT.
Funding Support, Disclosures, and Conflict of Interest: This work was supported by an AAPM Seed Research Funding Grant.