
Session: Quality Improvement and Outcomes

Do Machine Learning-Based Models Perform Better Than Clinical Models in Predicting Biochemical Outcome for Prostate Cancer Patients?

L Sun1,2*, H Quon1,2, W Smith1,2, and C Kirkby1,3 (1) University of Calgary, Calgary, AB, CA (2) Tom Baker Cancer Centre, Calgary, AB, CA (3) Jack Ady Cancer Centre, Lethbridge, AB, CA


SU-H430-IePD-F7-3 (Sunday, 7/10/2022) 4:30 PM - 5:00 PM [Eastern Time (GMT-4)]

Exhibit Hall | Forum 7

Purpose: To determine whether machine learning (ML)-based models that incorporate additional treatment planning and treatment delivery features outperform clinical models that include only patient demographic and tumor features in predicting biochemical failure-free survival (BFFS) for prostate cancer patients.

Methods: Our retrospective dataset consists of 2724 prostate cancer patients diagnosed between 2005 and 2016 and treated with curative external beam radiation therapy (EBRT) at four institutions. Four models were trained using 2188 patients from two institutions: a Cox regression model with elastic net regularization (Cox-EN), a random survival forest (RSF) model, a risk stratification-based model, and a Cancer of the Prostate Risk Assessment (CAPRA) score-based model. The two ML-based models (Cox-EN and RSF) considered treatment planning and treatment delivery features in addition to patient demographic and tumor features. The four models were validated using 536 patients from all four institutions. Harrell’s concordance index (c-index) was compared across the four models. For the two ML-based models, calibration was assessed using calibration plots. For the two clinical models, the predicted and observed BFFS curves for each group were compared.
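For readers unfamiliar with the metric, the following is a minimal sketch of how Harrell's c-index can be computed from follow-up times, event indicators, and model risk scores. It is illustrative only and is not the authors' implementation; the function name `harrell_c_index` and its arguments are hypothetical, and pairs with tied follow-up times are skipped as a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times  -- follow-up time for each patient
    events -- 1 if biochemical failure was observed, 0 if censored
    risks  -- model risk score (higher means earlier failure is expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let a be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk failed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported c-indices should be read.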

Results: The c-indices for the Cox-EN, RSF, risk stratification-based, and CAPRA score-based models were 0.67, 0.71, 0.58, and 0.64, respectively, for the training dataset, and 0.65, 0.65, 0.51, and 0.62 for the validation dataset. For the ML-based models, the calibration plots showed reasonable agreement between the predicted and observed BFFS at 5 years, with average differences for the training and validation datasets of 0.02 and 0.06 for the Cox-EN model and 0.03 and 0.05 for the RSF model. For the clinical models, the agreement between predicted and observed BFFS curves for each group was reasonable for the training dataset but poor for the validation dataset.
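One way such an average calibration difference could be computed is sketched below. The abstract does not state how patients were grouped or whether the difference was taken as an absolute value, so both choices here are assumptions: patients are split into equal-sized groups by predicted BFFS, the observed BFFS in each group is a Kaplan-Meier estimate, and the mean absolute gap is reported. The function names are hypothetical.

```python
def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of the survival probability at time t."""
    surv = 1.0
    # Distinct times at which a failure was observed, up to and including t.
    event_times = sorted({ti for ti, ei in zip(times, events) if ei and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        failed = sum(1 for ti, ei in zip(times, events) if ei and ti == u)
        surv *= 1.0 - failed / at_risk
    return surv


def mean_calibration_gap(times, events, predicted, t, n_groups):
    """Mean absolute gap between predicted and observed survival at time t.

    predicted -- model-predicted survival probability at t for each patient
    Patients are sorted by prediction and split into n_groups groups
    (assumed grouping scheme, not taken from the abstract).
    """
    n = len(times)
    order = sorted(range(n), key=lambda k: predicted[k])
    gaps = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # skip empty groups when n < n_groups
        mean_pred = sum(predicted[k] for k in idx) / len(idx)
        observed = km_survival_at([times[k] for k in idx],
                                  [events[k] for k in idx], t)
        gaps.append(abs(mean_pred - observed))
    return sum(gaps) / len(gaps)
```

Under these assumptions, a gap of 0.02 would mean that predicted and observed 5-year BFFS differ by two percentage points on average across groups.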

Conclusion: The two ML-based models outperformed the two clinical models. All four models performed worse when validated on data not included in training.

Funding Support, Disclosures, and Conflict of Interest: Lingyue Sun is supported by a 2019 Alberta Innovates Graduate Studentship in Health Innovation.


Prostate Therapy, Tumor Control, Radiation Therapy


TH- Response Assessment: Modeling: Machine Learning
