
Session: Early-Career Investigator Symposium

Explainable Machine Learning for Predicting Overall Survival of Patients with Locally Advanced Non-Small Cell Lung Cancer Treated with Photon and Proton Radiotherapy

L Duan (1)*, S Lee (1), R Caruana (2), T Kegelman (1), S Feigenberg (1), Y Xiao (1); (1) Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA; (2) Microsoft Research, Redmond, WA


MO-FG-BRB-6 (Monday, 7/11/2022) 1:45 PM - 3:45 PM [Eastern Time (GMT-4)]

Ballroom B

Purpose: To study the impact of clinical and dosimetric parameters on 2-year overall survival (OS) of patients with locally advanced non-small cell lung cancer (LA-NSCLC) treated with photon and proton radiation therapy by using explainable machine learning.

Methods: A dataset of 185 LA-NSCLC patients treated with photon therapy (n=87) or proton therapy (n=98) from 2008 to 2015 was retrospectively collected. A total of 22 clinical and dosimetric features were extracted from the dataset. Feature selection was performed separately for the photon and proton datasets using Cox proportional hazards (CPH) regression analysis, assessed by a 5-fold cross-validated Harrell’s concordance index. An explainable boosting machine (EBM) was then trained separately on each dataset. Performance of the EBM for predicting 2-year OS was evaluated using the area under the receiver operating characteristic curve (AUC), estimated by 20×5-fold cross-validation (CV). The EBM’s mean absolute score for each feature was used to quantify overall feature importance for 2-year OS, and EBM risk scores as a function of each feature were plotted to interpret the non-linear relationship between that feature and 2-year OS.
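For readers unfamiliar with the metric used in the feature-selection step, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the authors' implementation: the function name `harrell_c_index` and its arguments are hypothetical, tied follow-up times are simply skipped, and a higher risk score is assumed to mean shorter expected survival.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher is assumed to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times. In a cross-validated setting, this function would be applied to the held-out fold of each split and the fold values averaged.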

Results: Selected features for the photon dataset were World Health Organization performance status (WHO-PS), smoking history, V5 heart, age, V30 heart, V40 heart, mean heart dose, primary internal gross tumor volume (IGTV), and nodal IGTV; those for the proton dataset were WHO-PS, age, smoking history, T stage, and primary IGTV. Heart dosimetric parameters were more strongly associated with survival in the photon dataset than in the proton dataset. EBM models for the photon and proton datasets achieved CV AUCs of 0.813 and 0.712, respectively.
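The CV AUCs above come from 20×5-fold cross-validation. The sketch below shows one way such an estimate could be computed. It rests on several assumptions not stated in the abstract: folds are stratified by outcome, the per-fold AUCs are averaged (rather than pooling predictions), and the model is passed in as a hypothetical `fit_predict(X_train, y_train, X_test)` callable that returns risk scores for the test rows. All function names here are illustrative.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Shuffle indices within each class and deal them round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def repeated_cv_auc(X, y, fit_predict, repeats=20, k=5, seed=0):
    """Mean held-out AUC over `repeats` independent k-fold splits.

    Assumes each class has at least k members so every fold contains both.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            scores = fit_predict([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx])
            aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return mean(aucs)
```

Repeating the 5-fold split 20 times with different shuffles reduces the dependence of the estimate on any single partition, which matters for cohorts of this size (n=87 and n=98).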

Conclusion: EBM shows potential for predicting survival of patients with LA-NSCLC and for explaining differences in associated risk factors between photon and proton radiotherapy.

Funding Support, Disclosures, and Conflict of Interest: This project was supported by grants U24CA180803 (IROC) and U10CA180868 (NRG) from the National Cancer Institute.


Lung, Protons, Radiation Effects


TH- Dataset Analysis/Biomathematics: Machine learning techniques
