Room 206
Purpose: Reinforcement learning (RL) is a machine learning approach that trains an agent by rewarding desired actions and penalizing undesired ones, and it has demonstrated super-human performance in multiple problem domains. We previously investigated deep Q-network (DQN) RL for VMAT machine parameter optimization (MPO); DQN inherently updates parameters in discretized steps, which limits performance. Here we propose a new RL approach that provides continuous parameter updates using deep deterministic policy gradients (DDPG) and compare it to DQN.
Methods: The DDPG approach was applied in a simplified 2D VMAT model, in which two opposing multi-leaf collimator leaves and dose rate are controlled to optimize dose for a single 2D axial slice. DDPG involves training two convolutional neural networks using the current plan (dose distribution, contours, and machine parameters) as input. A critic network is trained to predict a reward value based on dose objectives. An actor network is trained to predict continuous leaf position and dose rate updates through gradient ascent on the critic network to maximize reward. DDPG and DQN were trained, validated, and tested using cohorts of 15, 5, and 20 localized prostate cancer patients, respectively. Plans were re-scaled to 80 Gy prescription. Dose metrics derived from DDPG plans were compared to DQN and clinical plans using paired t-tests.
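The actor update described above (gradient ascent on the critic to maximize reward) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the scalar state, the linear actor, and the fixed quadratic critic are hypothetical stand-ins, not the convolutional networks or the learned critic used in this work.

```python
# Toy deterministic policy gradient step (illustrative only).
# State s: a scalar stand-in for the current plan.
# Action a: a continuous update (e.g. a leaf shift or dose rate change).
# The critic and actor below are hypothetical, not the abstract's networks.

def critic(s, a):
    """Hypothetical critic: predicted reward peaks when a == 0.5 * s."""
    return -(a - 0.5 * s) ** 2

def critic_grad_a(s, a):
    """Analytic dQ/da for the toy critic above."""
    return -2.0 * (a - 0.5 * s)

def actor(w, s):
    """Linear deterministic policy: a = w * s."""
    return w * s

def train_actor(states, w=0.0, lr=0.05, epochs=200):
    """Adjust the actor weight by gradient ascent on the critic output."""
    for _ in range(epochs):
        grad_w = 0.0
        for s in states:
            a = actor(w, s)
            # Chain rule: dQ/dw = dQ/da * da/dw, with da/dw = s.
            grad_w += critic_grad_a(s, a) * s
        w += lr * grad_w / len(states)  # ascent step: move toward higher reward
    return w

states = [-1.0, -0.5, 0.5, 1.0, 2.0]
w_trained = train_actor(states)
```

In this toy setting the trained weight settles near 0.5, the value at which the critic's predicted reward is maximal for every state. Because the action is a real number rather than a choice among fixed step sizes, the update can land on any value, which is the property that distinguishes DDPG from DQN here.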
Results: In the test cohort, DDPG, DQN, and clinical plans provided PTV V95 of 97.6±3.2%, 93.2±10.7% (P<0.001), and 99.6±1.9% (P<0.001); rectum mean doses of 41.6±11.6 Gy, 47.0±10.2 Gy (P<0.001), and 37.4±12.8 Gy (P<0.001); and femoral head max doses of 40.2±10.9 Gy, 46.4±6.4 Gy (P<0.001), and 34.3±9.3 Gy (P<0.001), respectively. P values refer to paired comparisons with DDPG.
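As a reminder of what the coverage metric measures, the following sketch computes PTV V95 from a list of voxel doses. It assumes the usual definition (the percentage of PTV volume receiving at least 95% of the prescription dose) and equal voxel volumes; the function name and the dose values are hypothetical and are not data from this study.

```python
def v95(ptv_doses_gy, prescription_gy=80.0):
    """Percentage of PTV voxels receiving at least 95% of the prescription.

    Assumes every voxel has the same volume, so a voxel count
    is proportional to volume.
    """
    threshold = 0.95 * prescription_gy
    covered = sum(1 for d in ptv_doses_gy if d >= threshold)
    return 100.0 * covered / len(ptv_doses_gy)

# Hypothetical PTV voxel doses in Gy for an 80 Gy prescription.
doses = [80.5, 79.0, 76.5, 75.9, 81.2, 78.3, 77.0, 74.0]
coverage = v95(doses)
```

With an 80 Gy prescription the threshold is 76 Gy, so six of the eight example voxels count as covered.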
Conclusion: DDPG provided significant improvements in final plan quality compared to DQN and approached clinical plan quality. DDPG overcomes a major limitation of DQN by providing continuous actions, representing a step towards advanced RL-based radiotherapy planning.
Funding Support, Disclosures, and Conflict of Interest: This work was supported by an AAPM Research Seed Funding Grant.
Inverse Planning, Prostate Therapy, Optimization
TH- External Beam- Photons: IMRT/VMAT dose optimization algorithms