Purpose: Outcome prediction from a small number of cases is challenging because the statistical population is estimated with bias. Mixup is an effective technique that approximates the statistical population with virtual data constructed by interpolating training data, but it may be more effective when a limited number of training data are extrapolated instead. The purpose of this study was to develop machine learning (ML) models for predicting prostate cancer recurrence using extrapolation data.
Methods: ML models trained with and without extrapolation data were compared. A total of 100 patients with prostate cancer who were treated with radiotherapy were included in this study. Effective features were selected by elastic net from 48 candidate features, comprising 35 clinical features and 13 dose features. The dataset was divided into training and test datasets. Virtual data were constructed by linear interpolation or linear extrapolation of two data points randomly selected from the training dataset. Two kinds of regression models, multiple linear regression (MLR) and an artificial neural network (ANN), were used as ML models. The models trained with and without extrapolation data were evaluated by the area under the receiver operating characteristic curve (AUC) for the prediction of prostate cancer recurrence in the test dataset.
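The construction of virtual data described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the mixing-coefficient ranges, and the choice to combine the outcome labels with the same coefficient as the features are all assumptions, since the abstract does not specify them.

```python
import random


def make_virtual_sample(x_a, y_a, x_b, y_b, lam):
    """Return a linear combination of two training cases.

    0 <= lam <= 1 gives interpolation (the point lies between the cases);
    lam < 0 or lam > 1 gives extrapolation (the point lies beyond one case).
    Mixing the labels with the same lam is an assumption of this sketch.
    """
    x_new = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_new = lam * y_a + (1.0 - lam) * y_b
    return x_new, y_new


def augment(features, labels, n_virtual, lam_range, seed=0):
    """Build n_virtual virtual cases from randomly chosen pairs.

    lam_range is a (low, high) tuple; (0.0, 1.0) interpolates, while a
    range such as (1.0, 1.5) extrapolates. Both ranges are hypothetical.
    """
    rng = random.Random(seed)
    virtual_x, virtual_y = [], []
    for _ in range(n_virtual):
        # Pick two distinct training cases at random.
        i, j = rng.sample(range(len(features)), 2)
        lam = rng.uniform(*lam_range)
        x_new, y_new = make_virtual_sample(
            features[i], labels[i], features[j], labels[j], lam
        )
        virtual_x.append(x_new)
        virtual_y.append(y_new)
    return virtual_x, virtual_y
```

With regression targets, an extrapolated label can fall outside the range of the original labels (for example below 0 for a 0/1 recurrence label); how such values should be treated is a design choice that this sketch leaves open.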
Results: The AUCs of both ML models were significantly improved by using extrapolation data (MLR: p<0.0001, ANN: p<0.0001). The AUCs of MLR with and without extrapolation data were 0.730 and 0.671, respectively. The AUCs of ANN with and without extrapolation data were 0.807 and 0.764, respectively.
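For readers less familiar with the metric, the AUC equals the probability that a randomly chosen case with recurrence receives a higher predicted score than a randomly chosen case without recurrence, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy scores in the usage note are illustrative only and are not data from this study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted values (higher means recurrence is more likely).
    labels: 1 for recurrence, 0 for no recurrence.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns 0.75, because three of the four positive-negative pairs are ranked correctly.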
Conclusion: Our results showed that the performance of ML models for the prediction of prostate cancer recurrence was improved by using extrapolation data.
Funding Support, Disclosures, and Conflict of Interest: This work was supported by JSPS KAKENHI Grant Number 18K15604.