ePoster Forums
Purpose: Aperture shape optimization is one of the most time-consuming parts of VMAT planning. Current state-of-the-art algorithms rely on gradient information to reduce the search space. Here we propose to incorporate knowledge from past plans into this key part of the optimization. Specifically, we train a deep learning model, based on the self-attention mechanism of the transformer architecture, to predict aperture shapes directly from the patient contour volume.
Methods: We use a network structure similar to the compact convolutional transformer. The input to the transformer is the 3D volume of the patient structure set, where each voxel value encodes the OAR or PTV type. We calculated 180 beam's-eye-view (BEV) 3D volumes through ray casting, which are tokenized by convolutions into 128-dimensional vectors and then summed with positional embeddings. These tokens are fed to a transformer encoder consisting of a multi-head attention layer and a feed-forward layer. During the training phase, the 180 aperture shapes are processed by a decoder with a structure similar to the encoder. This output, together with the encoder output, is sent through another decoder that generates the aperture shapes and weights. At inference, a fully closed aperture and the BEV volumes are fed into the transformer to predict apertures. We collected 11 lung SBRT plans as our training data. The transformer is trained using root-mean-square propagation (RMSprop) with a binary cross-entropy loss. The DVHs are calculated from the predicted apertures and weights.
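The abstract does not include implementation details, so the following is only a minimal PyTorch-style sketch of the described encoder-decoder pipeline under stated assumptions: the tokenizer layout, layer counts, the representation of each aperture as a flattened binary BEV mask (GRID_H x GRID_W), and names such as BEVTokenizer and AperturePredictor are illustrative, and the per-beam weights and the second decoder stage of the original design are simplified away.

```python
# Minimal sketch, assuming PyTorch. Shapes, layer sizes, and the aperture
# encoding (flattened binary BEV mask) are assumptions for illustration only.
import torch
import torch.nn as nn

N_BEAMS = 180          # BEV volumes / control points per plan (from the abstract)
EMBED_DIM = 128        # token dimension (from the abstract)
GRID_H, GRID_W = 40, 40            # assumed BEV grid for the aperture mask
APERTURE_DIM = GRID_H * GRID_W     # each aperture flattened to a binary vector


class BEVTokenizer(nn.Module):
    """Tokenize each BEV volume into one 128-d vector with convolutions."""
    def __init__(self, in_ch=1, embed_dim=EMBED_DIM):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, bev):                        # bev: (B, N_BEAMS, 1, D, H, W)
        b, n = bev.shape[:2]
        x = self.conv(bev.flatten(0, 1)).flatten(1)    # (B*N, 64)
        return self.proj(x).view(b, n, -1)             # (B, N_BEAMS, EMBED_DIM)


class AperturePredictor(nn.Module):
    """Encoder over BEV tokens, decoder over aperture tokens, per-beam mask logits."""
    def __init__(self):
        super().__init__()
        self.tokenizer = BEVTokenizer()
        self.pos_embed = nn.Parameter(torch.zeros(1, N_BEAMS, EMBED_DIM))
        self.aperture_embed = nn.Linear(APERTURE_DIM, EMBED_DIM)
        self.transformer = nn.Transformer(
            d_model=EMBED_DIM, nhead=8,
            num_encoder_layers=2, num_decoder_layers=2,   # assumed depths
            dim_feedforward=256, batch_first=True,
        )
        self.head = nn.Linear(EMBED_DIM, APERTURE_DIM)

    def forward(self, bev, apertures):
        src = self.tokenizer(bev) + self.pos_embed            # BEV tokens + positions
        tgt = self.aperture_embed(apertures) + self.pos_embed  # aperture tokens
        return self.head(self.transformer(src, tgt))          # (B, N_BEAMS, APERTURE_DIM)


# Training step: RMSprop optimizer and binary cross-entropy loss, as in the abstract.
model = AperturePredictor()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

bev = torch.randn(1, N_BEAMS, 1, 32, 32, 32)                       # placeholder BEV volumes
apertures = torch.randint(0, 2, (1, N_BEAMS, APERTURE_DIM)).float()  # placeholder ground truth

logits = model(bev, apertures)       # training: ground-truth apertures drive the decoder
loss = criterion(logits, apertures)
loss.backward()
optimizer.step()

# Inference: a fully closed aperture (all zeros) is fed as the decoder input.
with torch.no_grad():
    closed = torch.zeros(1, N_BEAMS, APERTURE_DIM)
    predicted = torch.sigmoid(model(bev, closed))
```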
Results: The transformer learns efficiently from the training data and predicts apertures close to those of the past plans. The calculated DVHs match well for the OARs.
Conclusion: The transformer-predicted apertures can serve as a warm start for any VMAT optimization algorithm to speed up the clinical workflow. Further study with larger training sets and model hyperparameter optimization is warranted to improve generalization and speed.