Purpose: We demonstrate that reinforcement learning (RL) can complement traditional treatment planning algorithms and improve their performance. Here, we focus on reducing the run-time of fluence-based IMRT optimizers with an RL agent. We successfully trained an agent to suggest conjugate gradient (CG) mixing ratios on the fly during IMRT optimization. We demonstrate that (i) agents can be trained with reasonable computational effort, and (ii) agent-learned mixing ratios can lead to faster convergence than a simpler heuristic, such as the Fletcher-Reeves method.
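In nonlinear CG, the mixing ratio is the scalar beta_k in the search-direction update d_k = -g_k + beta_k * d_(k-1), where g_k is the gradient; the Fletcher-Reeves heuristic sets beta_k = ||g_k||^2 / ||g_(k-1)||^2. The sketch below is a minimal illustration under stated assumptions: a toy quadratic cost stands in for the IMRT objective, an exact line search is used, and the function names are hypothetical. The `beta_fn` argument marks the point where a learned policy could replace the Fletcher-Reeves formula.

```python
import numpy as np


def fletcher_reeves_beta(g_new: np.ndarray, g_old: np.ndarray) -> float:
    """Fletcher-Reeves mixing ratio: ||g_k||^2 / ||g_(k-1)||^2."""
    return float(g_new @ g_new) / float(g_old @ g_old)


def cg_direction(g_new: np.ndarray, d_old: np.ndarray, beta: float) -> np.ndarray:
    """CG search direction d_k = -g_k + beta * d_(k-1)."""
    return -g_new + beta * d_old


def minimize_quadratic_cg(A, b, x0, beta_fn, tol=1e-8, max_iter=100):
    """Minimize the toy cost f(x) = 0.5 x^T A x - b^T x with CG.

    beta_fn(g_new, g_old, d_old) returns the mixing ratio; it could be the
    Fletcher-Reeves formula or a (hypothetical) learned policy.
    Returns (x, number of iterations).
    """
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b
    d = -g  # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        # Exact line search, valid for a quadratic cost only.
        alpha = -float(g @ d) / float(d @ A @ d)
        x = x + alpha * d
        g_new = A @ x - b
        beta = beta_fn(g_new, g, d)
        d = cg_direction(g_new, d, beta)
        g = g_new
    return x, max_iter
```

With an exact line search, g_k is orthogonal to d_(k-1), so d_k stays a descent direction for any beta_k. This is one reason the mixing ratio can be handed to an agent without breaking descent; an inexact line search, as a practical optimizer may use, weakens this guarantee.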
Methods: The RL environment wraps an IMRT optimization engine, which we initialize with the patient’s data, the beam arrangement, and the optimization objectives. The RL agent (1) observes the fluence gradients from the environment, (2) suggests a CG mixing ratio to the environment, and (3) receives a reward (or penalty) based on the decrease (or increase) in the optimization cost function. We train the agent over many episodes; training finishes when the cost value falls below a predefined level. We test our RL environment on head and neck patients with 7-field IMRT.
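The observe/act/reward loop described above can be sketched as a Gym-style reset/step interface. This is a minimal, hypothetical sketch, not the authors' implementation: the class `CGMixingEnv`, the observation features (gradient norms and their inner product), the exact line search, and the quadratic toy cost standing in for the IMRT fluence objective are all assumptions. Here we also assume that reaching the predefined cost level ends an episode.

```python
import numpy as np


class CGMixingEnv:
    """Hypothetical RL environment wrapping a conjugate-gradient optimizer.

    A toy quadratic cost f(x) = 0.5 x^T A x - b^T x (A symmetric positive
    definite) stands in for the IMRT fluence-optimization cost. The agent's
    action is the CG mixing ratio beta.
    """

    def __init__(self, A, b, cost_target, max_steps=200):
        self.A = np.asarray(A, dtype=float)
        self.b = np.asarray(b, dtype=float)
        self.cost_target = cost_target  # assumed episode-ending cost level
        self.max_steps = max_steps

    def _cost(self, x):
        return 0.5 * float(x @ self.A @ x) - float(self.b @ x)

    def _grad(self, x):
        return self.A @ x - self.b

    def _line_search(self, d):
        # Exact line search along d; valid only for the quadratic toy cost.
        curvature = float(d @ self.A @ d)
        alpha = -float(self.g @ d) / curvature if curvature > 0 else 0.0
        self.x = self.x + alpha * d
        self.g_prev, self.g = self.g, self._grad(self.x)
        self.d = d

    def _observe(self):
        # Assumed observation: current/previous gradient norms and their dot product.
        return np.array([np.linalg.norm(self.g),
                         np.linalg.norm(self.g_prev),
                         float(self.g @ self.g_prev)])

    def reset(self):
        self.x = np.zeros_like(self.b)
        self.g = self._grad(self.x)
        self.steps = 0
        # The first iteration is steepest descent: no previous direction to mix.
        self._line_search(-self.g)
        return self._observe()

    def step(self, beta):
        cost_before = self._cost(self.x)
        d = -self.g + float(beta) * self.d  # CG direction with the agent's ratio
        self._line_search(d)
        cost_after = self._cost(self.x)
        self.steps += 1
        reward = cost_before - cost_after  # positive on decrease, negative on increase
        done = cost_after <= self.cost_target or self.steps >= self.max_steps
        return self._observe(), reward, done


def fletcher_reeves_policy(obs):
    """Heuristic baseline: beta = ||g_k||^2 / ||g_(k-1)||^2 from the observation."""
    g_norm, g_prev_norm, _ = obs
    return g_norm**2 / g_prev_norm**2 if g_prev_norm > 0 else 0.0


def run_episode(env, policy):
    """Roll out one episode; returns (CG steps taken, total reward)."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return env.steps, total
```

The Fletcher-Reeves policy, computed from the same observation, plays the role of the heuristic baseline. An RL agent would replace `fletcher_reeves_policy` with its learned (for example, continuous-action) policy, and the episode's step count corresponds to the iteration count used to compare the two.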
Results: We discuss various setups, parameters, and requirements for the RL agent. We highlight that (1) the RL agent outperformed a heuristic approach (the Fletcher-Reeves method) in accelerating IMRT optimization, and (2) the continuous-action RL agent performed better than the discrete-action agent. We judge these improvements by the total number of IMRT optimization iterations: the RL agent reduced this number by nearly 20%.
Conclusion: This work illustrates the potential of RL agents to improve the performance of traditional treatment planning algorithms. We see particular promise in using RL agents and their action policies to tune, improve, and manage the heuristic meta-parameters that influence algorithmic performance. Next steps for this work are further agent development, generalization to arbitrary IMRT field arrangements, and extensions to other treatment planning algorithms.
Funding Support, Disclosures, and Conflict of Interest: The authors are employed by Varian Medical Systems Finland.
Optimization, Cost Function, Treatment Planning
TH- External Beam- Photons: Treatment planning using machine learning/Knowledge Based Planning/automation