Purpose: To evaluate the cross-site generalizability of the optimal model selected from combinations of feature selection (FS) and machine learning (ML) methods for prostate cancer diagnosis using clinical and magnetic resonance (MR) radiomic features.
Methods: We conducted the experiment using the publicly available PRORAD (PROstate Radiology Diagnosis) ML challenge dataset, which contained 432 samples: 294 from one site (Site I) and 138 from another (Site II). A total of 19,107 clinical and MR radiomic features were provided, and the binary diagnosis label was based on biopsy results. Data from Site I were used for training and data from Site II for external validation. Seven FS and seven ML methods were implemented (details can be found in the supporting document), and models were trained for all 49 combinations. Each model was internally validated with five-fold cross-validation using the averaged area under the receiver operating characteristic curve (AUC). The final models were trained on all training data and externally validated on Site II data, with performance again evaluated by AUC.
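The FS x ML grid search with internal and external validation described above can be sketched as follows. This is a minimal, hedged illustration using scikit-learn and simulated data: the sample sizes match the abstract, but the features, labels, and the specific FS/ML methods shown (two of each rather than seven) are assumptions, not the paper's actual implementation.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Simulated stand-ins for Site I (training) and Site II (external) data;
# only the sample counts (294 and 138) follow the abstract.
X_tr, y_tr = make_classification(n_samples=294, n_features=100,
                                 n_informative=10, random_state=0)
X_ext, y_ext = make_classification(n_samples=138, n_features=100,
                                   n_informative=10, random_state=1)

# Illustrative method families (the paper used seven FS and seven ML methods).
fs_methods = {
    "anova": SelectKBest(f_classif, k=20),
    "rfe": RFE(LogisticRegression(max_iter=1000), n_features_to_select=20, step=10),
}
ml_methods = {
    "svm": SVC(kernel="linear"),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Internal validation: mean five-fold cross-validation AUC per FS x ML combination.
internal_auc = {}
for fs_name, fs in fs_methods.items():
    for ml_name, clf in ml_methods.items():
        pipe = Pipeline([("scale", StandardScaler()),
                         ("fs", clone(fs)), ("clf", clone(clf))])
        internal_auc[(fs_name, ml_name)] = cross_val_score(
            pipe, X_tr, y_tr, cv=5, scoring="roc_auc").mean()

# Refit the internally best combination on all training data, score on Site II.
best_fs, best_ml = max(internal_auc, key=internal_auc.get)
best_pipe = Pipeline([("scale", StandardScaler()),
                      ("fs", clone(fs_methods[best_fs])),
                      ("clf", clone(ml_methods[best_ml]))]).fit(X_tr, y_tr)
scores = (best_pipe.decision_function(X_ext)
          if hasattr(best_pipe, "decision_function")
          else best_pipe.predict_proba(X_ext)[:, 1])
external_auc = roc_auc_score(y_ext, scores)
```

Because the two simulated sites are drawn independently, the external AUC here will typically be near chance, which loosely mirrors the generalization gap the study reports; no numeric output is implied by this sketch.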
Results: The average internal validation AUC across all 49 models reached 0.914 but dropped to 0.673 on external validation. The best-performing model by internal validation, recursive feature elimination with a support vector machine, achieved an AUC of 0.995; however, its performance fell substantially on external validation to an AUC of 0.642. The best-performing model by external validation was a different combination, LASSO feature selection with a linear regression model, with an AUC of 0.787.
Conclusion: The best-performing model established at one institution (training site) did not provide the optimal performance at a different institution (adoption site), possibly due to heterogeneous data distributions. Extra procedures such as transfer learning or feature harmonization are recommended to increase cross-site generalizability.
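As a concrete illustration of the feature harmonization idea mentioned above, a minimal per-site z-score normalization is sketched below. This is a simplified stand-in for dedicated harmonization methods such as ComBat, not the procedure the paper recommends in detail; the function name and simulated data are hypothetical.

```python
import numpy as np

def zscore_per_site(X):
    """Standardize each feature using this site's own mean and standard
    deviation, putting both sites on a comparable scale (a simplified
    stand-in for dedicated harmonization methods such as ComBat)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    return (X - mu) / sd

rng = np.random.default_rng(0)
site1 = rng.normal(loc=5.0, scale=2.0, size=(294, 10))  # simulated Site I features
site2 = rng.normal(loc=1.0, scale=0.5, size=(138, 10))  # simulated Site II features
h1, h2 = zscore_per_site(site1), zscore_per_site(site2)
# After harmonization, each feature has zero mean and unit variance at both sites,
# removing the site-specific shift and scale differences simulated above.
```

Per-site standardization removes first- and second-moment differences between sites; methods such as ComBat additionally model batch effects while preserving biological covariates.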
Keywords: Cross Validation, MR, ROC Analysis