Purpose: To carry out a large international validation of how dose prediction quality translates to plan quality in a radiotherapy knowledge-based planning (KBP) process.
Methods: We collected dose predictions for head-and-neck cancer radiotherapy from 21 research groups worldwide that participated in the OpenKBP Grand Challenge. Each research group used the same training dataset (n=200) and validation dataset (n=40) to develop its method. These methods predicted dose on a testing dataset (n=100), and the resulting 2100 unique dose predictions were input to a previously published plan optimization method to generate 2100 treatment plans. The predictions and plans were compared to the ground truth dose via: (1) error, the mean absolute voxel-by-voxel difference in dose; and (2) quality, the mean and maximum deviation across 23 dose-volume histogram (DVH) criteria.
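The two evaluation metrics described above can be illustrated with a minimal sketch. It is not the challenge's evaluation code: the function names, the criterion names (`brainstem_Dmax`, `ptv70_D99`), the toy values, and the sign convention (positive deviation means worse than the ground truth) are all assumptions made for illustration.

```python
from statistics import mean


def dose_error(dose, reference):
    """Mean absolute voxel-by-voxel difference (Gy) between two dose arrays.

    Both inputs are flat sequences of per-voxel dose values of equal length.
    """
    if len(dose) != len(reference):
        raise ValueError("dose arrays must have the same number of voxels")
    return mean(abs(d - r) for d, r in zip(dose, reference))


def dvh_quality(criteria, reference, higher_is_better=()):
    """Mean and maximum deviation (Gy) across DVH criteria.

    Assumed convention: a positive deviation means the evaluated dose is
    worse than the reference. Criteria listed in `higher_is_better`
    (e.g. target coverage) are worse when lower; all others are worse
    when higher.
    """
    deviations = []
    for name, value in criteria.items():
        diff = value - reference[name]
        deviations.append(-diff if name in higher_is_better else diff)
    return mean(deviations), max(deviations)


# Toy data: four voxels and two hypothetical DVH criteria.
predicted = [70.0, 35.0, 10.0, 0.0]
ground_truth = [68.0, 36.0, 13.0, 0.0]
error = dose_error(predicted, ground_truth)

quality = dvh_quality(
    {"brainstem_Dmax": 40.0, "ptv70_D99": 66.0},
    {"brainstem_Dmax": 38.0, "ptv70_D99": 67.0},
    higher_is_better={"ptv70_D99"},
)
```

In this toy case the error is 1.5 Gy, and the mean and maximum deviations are 1.5 Gy and 2.0 Gy worse than the reference, respectively.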
Results: The range in median prediction error among the top 20 methods was 2.3 Gy to 12.0 Gy, which was 6.8 Gy wider than the range in median plan error of 2.1 Gy to 5.0 Gy. One method also achieved significantly lower prediction error (P<0.05; one-sided Wilcoxon test) than all the other methods; however, it generated plans with error that was not significantly lower than that of 28.6% of the other methods. Additionally, predicted dose was consistently of lower quality than plan dose. Half (n=1050) of all predictions and plans had an average deviation that was 0.1 Gy worse and 0.8 Gy better than the ground truth dose, respectively. Similarly, half of all predictions had a maximum deviation that was 3.7 Gy worse than the ground truth dose, which was 1.0 Gy worse than the maximum deviation of half of all plans.
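For reference, the reported difference in range widths and the sample count follow directly from the figures above; the symbols $w_{\text{pred}}$ and $w_{\text{plan}}$ are introduced here only for illustration.

```latex
\begin{align*}
w_{\text{pred}} &= 12.0\,\text{Gy} - 2.3\,\text{Gy} = 9.7\,\text{Gy} \\
w_{\text{plan}} &= 5.0\,\text{Gy} - 2.1\,\text{Gy} = 2.9\,\text{Gy} \\
w_{\text{pred}} - w_{\text{plan}} &= 9.7\,\text{Gy} - 2.9\,\text{Gy} = 6.8\,\text{Gy} \\
21 \text{ methods} \times 100 \text{ patients} &= 2100, \qquad 2100 / 2 = 1050
\end{align*}
```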
Conclusion: Many dose prediction methods can achieve low error; however, optimization often improves upon the predictions and eliminates significant differences between prediction methods. Thus, it is critical that we improve the optimization stage in KBP to make better use of the existing high-quality dose prediction methods.