Exhibit Hall | Forum 5
Purpose: Predicting head and neck (HN) lymphedema is challenging because radiotherapy (RT) dosimetric data are high-dimensional and multicollinear. To avoid over-fitting the prediction model, we propose using ensemble feature selection to first reduce the clinicopathologic and dosimetric dataset, and then applying machine learning (ML) and a competing risk model to determine the predictors and cumulative risks of lymphedema.
Methods: Clinicopathologic and dose-volume data for 30 retrospectively contoured lymph node levels were extracted from the treatment plans of 76 HN patients. Ensemble feature selection, together with four ML models, was used to reduce the initial dataset by selecting the top features associated with the incidence of external and internal lymphedema. These features were used in (i) optimizing four prediction models, from which highly collinear features were further removed, and (ii) optimizing a competing risk model, from which cumulative risk was calculated.
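The two reduction steps described above (ensemble ranking of features, then removal of highly collinear ones) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the abstract: importance scores from several selectors are combined by averaging their ranks, collinearity is measured with the absolute Pearson correlation, and the 0.9 cutoff is arbitrary. The function names `ensemble_rank`, `drop_collinear`, and `select_features` are hypothetical, and this is not necessarily the procedure the authors used.

```python
import numpy as np


def ensemble_rank(score_matrix):
    """Average per-selector ranks (assumed aggregation rule).

    score_matrix has shape (n_selectors, n_features); a higher score
    means a more important feature. Rank 0 is the most important.
    """
    score_matrix = np.asarray(score_matrix, dtype=float)
    order = np.argsort(-score_matrix, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(score_matrix.shape[0])[:, None]
    ranks[rows, order] = np.arange(score_matrix.shape[1])[None, :]
    return ranks.mean(axis=0)


def drop_collinear(X, ranked_idx, threshold=0.9):
    """Greedy filter: walk features from best to worst rank and keep a
    feature only if its |Pearson r| with every kept feature is below
    the (assumed) threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in ranked_idx:
        if all(corr[j, k] < threshold for k in kept):
            kept.append(int(j))
    return kept


def select_features(X, score_matrix, top_k=10, threshold=0.9):
    """Rank features by mean rank, remove collinear ones, keep top_k."""
    mean_rank = ensemble_rank(score_matrix)
    ranked_idx = np.argsort(mean_rank, kind="stable")
    return drop_collinear(X, ranked_idx, threshold)[:top_k]


# Toy data: column 1 is an exact multiple of column 0 (perfectly collinear).
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X_demo = np.column_stack([a, 2.0 * a, b])
scores_demo = np.array([[0.9, 0.8, 0.1],
                        [0.7, 0.6, 0.2]])  # two hypothetical selectors
selected = select_features(X_demo, scores_demo)
```

In the toy data the second column duplicates the first up to scale, so the greedy filter discards it and retains the higher-ranked copy together with the uncorrelated third column.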
Results: For external lymphedema, the random forest (RF) model had the best performance in accuracy (77.5±7%), F1-score (84.9±4.6%), and AUC (84.4±5.5%). For internal lymphedema, the other models (logistic regression, SVM, extreme gradient boosting) outperformed RF, with average accuracy, F1-score, and AUC of 69±6.1%, 60.4±14.8%, and 74.8±7.7%, respectively. From 604 features, 10 features predictive of lymphedema were selected by the four prediction models. While the majority of the top 10 predictors for external and internal lymphedema were dosimetric features, the non-dosimetric predictors included grouped T-N-stage status, bulky nodes, and number of lymph nodes removed. At 180 days, the risk of external lymphedema increased from 47% (non-bulky nodes, <25 lymph nodes removed) to 98% (bulky nodes, >50 lymph nodes removed) (p<0.003). For internal lymphedema, the risk increased from 82% (non-bulky nodes, adjuvant RT) to 99% (bulky nodes, definitive RT) (p=0.04).
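For readers unfamiliar with how a cumulative risk at a fixed time point (such as 180 days) is obtained when competing events are present, the following is a minimal sketch of the nonparametric Aalen-Johansen cumulative incidence estimator. The abstract does not state which competing risk model was fitted, so this estimator is an assumption chosen for illustration only; the function name `cumulative_incidence`, the event coding (0 = censored, 1 = lymphedema, 2 = competing event), and the toy follow-up data are all hypothetical.

```python
import numpy as np


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen cumulative incidence for one cause.

    times  : follow-up time for each patient (e.g., days)
    events : 0 = censored, positive integers = event causes
    Returns the distinct event times and the cumulative incidence of
    `cause` evaluated at each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events > 0])
    surv = 1.0  # probability of being free of any event just before t
    cif = 0.0
    out = []
    for t in event_times:
        at_risk = np.sum(times >= t)
        d_cause = np.sum((times == t) & (events == cause))
        d_all = np.sum((times == t) & (events > 0))
        cif += surv * d_cause / at_risk   # increment for the cause of interest
        surv *= 1.0 - d_all / at_risk     # update all-cause event-free survival
        out.append(cif)
    return event_times, np.array(out)


# Hypothetical follow-up data in days (not from the study).
demo_times = [30, 60, 90, 120, 180]
demo_events = [1, 2, 0, 1, 0]
demo_t, demo_cif = cumulative_incidence(demo_times, demo_events, cause=1)
```

Unlike one minus a Kaplan-Meier estimate that treats competing events as censoring, this estimator weights each increment by the all-cause event-free probability, so patients who experience the competing event no longer contribute to the risk of the event of interest.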
Conclusion: By using ensemble feature selection and ML, the high dimensionality and multicollinearity of RT dosimetric data were addressed and predictors of HN lymphedema were identified. These predictors could help guide the treatment of HN cancers.
TH- Dataset Analysis/Biomathematics: Machine learning techniques