Session: Multi-Disciplinary General ePoster Viewing

The Dataset Heterogeneity Matters: A Machine Learning Study of Dataset Conformation Effects On Model Performance for Dose Deliverability Prediction

P Quintero1,2*, D Benoit1, Y Cheng1, C Moore2, A Beavis2, (1) University of Hull, UK, (2) Queens Centre for Oncology & Haematology, Cottingham, UK


PO-GePV-M-52 (Sunday, 7/10/2022)   [Eastern Time (GMT-4)]

ePoster Forums

Purpose: Within reported machine learning (ML) applications in radiotherapy, unbalanced datasets have often been associated with unfavorable model performance, specifically for gamma passing rate (GPR) predictions. However, consideration has only been given to GPR values (labels), neglecting the heterogeneity of the predictors (features). This work evaluates how dataset composition influences the prediction performance of ML models, using datasets in which treatment factors (number of arcs and treatment unit) were systematically controlled.

Methods: The area under the ROC curve (ROC-AUC) was calculated for random forest (RF), extreme-gradient boosting (XG-Boost), and neural network (NN) models, implemented to perform binary classification (pass/fail) based on GPR prediction. Predictor features (N=309) based on plan parameters, complexity metrics, and radiomics were extracted and calculated for 945 prostate treatment plans. Thirteen datasets were created: one reference dataset (randomly assembled), emulating those previously reported; six datasets controlling the proportion of plans delivered with one or two arcs, each with a different ratio; and, similarly, six datasets controlling for treatment unit (Halcyon or TrueBeam).
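The classification-and-scoring pipeline described above might be sketched as follows. All settings here are illustrative assumptions (synthetic data stands in for the 945 plans with 309 features, and scikit-learn's GradientBoostingClassifier stands in for XG-Boost):

```python
# Sketch: pass/fail GPR classification scored by ROC-AUC for three model types.
# Data and hyperparameters are hypothetical, not the study's actual settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic surrogate for the plan-feature matrix (945 plans x 309 features),
# with an imbalanced pass/fail label as is typical for GPR data.
X, y = make_classification(n_samples=945, n_features=309, n_informative=30,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "XGB": GradientBoostingClassifier(random_state=0),  # stand-in for XG-Boost
    "NN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                        random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    aucs[name] = roc_auc_score(y_te, scores)
print(aucs)
```

In practice each of the thirteen datasets would be passed through this loop separately, so that the ROC-AUC distribution per dataset group can be compared.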

Results: The ROC-AUC values for models based on the reference dataset were 0.78 ± 0.15, 0.65 ± 0.13, and 0.87 ± 0.03 for RF, XG-Boost, and NN, respectively. The ROC-AUC values for models trained on more homogeneous datasets (i.e., all plans with one arc, or all from the same treatment unit) were 0.84 ± 0.13 and 0.82 ± 0.05 for RF, 0.85 ± 0.07 and 0.87 ± 0.09 for XG-Boost, and 0.90 ± 0.05 and 0.96 ± 0.04 for NN. Variation in the ten most important features across the dataset groups demonstrated the effect of heterogeneity on model prediction.
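The feature-importance comparison mentioned above could be sketched like this; the data and model settings are again hypothetical stand-ins:

```python
# Sketch: extract the ten most important features from a trained random forest,
# as one might do per dataset group to compare importance rankings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for one dataset group (e.g., all one-arc plans).
X, y = make_classification(n_samples=500, n_features=309, n_informative=20,
                           random_state=1)
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Impurity-based importances, sorted descending; keep the top ten indices.
top10 = np.argsort(rf.feature_importances_)[::-1][:10]
print(top10)
```

Repeating this per dataset group and comparing the resulting top-ten lists would reveal how the important features shift with dataset heterogeneity.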

Conclusion: ML models trained with more homogeneous datasets generalize better, increasing classification accuracy and prediction reliability. This study analyzed, and sought to reduce, implicit random effects in ML modeling.


DICOM-RT, ROC Analysis, Quality Assurance


TH- Dataset Analysis/Biomathematics: Machine learning techniques
