Purpose: To use machine learning to detect anomalous prescriptions by modeling the radiation oncologist’s prescription decision-making process. A data-driven approach allows this process to be modeled without complete knowledge of the particular protocols used or their details.
Methods: Fourteen years of clinical data (01/01/2007 – 01/01/2021), covering all patients treated in the radiation oncology department of our institution, were queried from MOSAIQ. All features related to patients’ treatment information were extracted, including treatment intent, treatment technique, treatment site, tumor stage, tumor markers, and biomarkers. Prescription data included the number of fractions and the total dose. Patients were grouped by diagnostic code for site-specific model development. For the Prostate group (6482 patients), the data were preprocessed and missing values were imputed; models were then trained and tested using a Random Forest (RF) as the supervised-learning base model. The maximum-depth hyperparameter was tuned to limit overfitting. Feature importance was analyzed by fitting RF regression models to the data. In addition, statistical analysis and visualization tools such as joyplots and scatter plots were used to explore the data and to help identify important features.
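The modeling steps described in Methods (imputation, an RF regressor with a tuned maximum depth, and feature importance ranking) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the study's actual pipeline: the data are synthetic, the three feature names and their encoding are hypothetical, and the most-frequent imputation strategy, the depth grid, and the 5-fold cross-validation are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: the real MOSAIQ features are not available here,
# so these integer-encoded columns are purely hypothetical.
rng = np.random.default_rng(0)
n = 400
technique = rng.integers(0, 3, n).astype(float)  # encoded treatment technique
intent = rng.integers(0, 2, n).astype(float)     # encoded treatment intent
stage = rng.integers(1, 5, n).astype(float)      # encoded tumor stage

# Toy target standing in for "number of fractions".
fractions = 5 + 15 * technique + 3 * intent + rng.normal(0, 1, n)

X = np.column_stack([technique, intent, stage])
X[rng.random(X.shape) < 0.05] = np.nan  # simulate about 5% missing entries

X_train, X_test, y_train, y_test = train_test_split(
    X, fractions, test_size=0.25, random_state=0
)

# Imputation followed by an RF regressor (imputation strategy is an assumption).
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
])

# Tune only the maximum tree depth, as in the abstract; the grid is an assumption.
search = GridSearchCV(
    pipe,
    {"rf__max_depth": [2, 4, 8, None]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

best_depth = search.best_params_["rf__max_depth"]
importances = search.best_estimator_.named_steps["rf"].feature_importances_
feature_names = ["technique", "intent", "stage"]
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

print("best max_depth:", best_depth)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup the target is driven mostly by the technique column, so that feature should rank first; with real clinical data the ranking depends entirely on the cohort and the feature encoding.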
Results: Initial analysis shows that treatment technique and treatment intent are among the most important features for the Prostate group. The average percentage prediction error was 70.87% for the RF model, compared with 96.36% for a constant mean prediction.
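To make the comparison against a constant mean prediction concrete, the sketch below computes an average percentage error for a model and for a baseline that always predicts the mean. The abstract does not define its error metric, so the mean absolute percentage error used here is an assumption, and all numbers are made up for illustration.

```python
def mean_percentage_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, in percent (assumed definition)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up example values (e.g. numbers of fractions), not study data.
actual = [5.0, 20.0, 40.0, 40.0]
model_pred = [5.0, 25.0, 40.0, 36.0]

# Constant baseline: always predict the mean of the observed values.
mean_value = sum(actual) / len(actual)
baseline_pred = [mean_value] * len(actual)

model_error = mean_percentage_error(actual, model_pred)
baseline_error = mean_percentage_error(actual, baseline_pred)

print(f"model error:    {model_error:.2f}%")
print(f"baseline error: {baseline_error:.2f}%")
```

A percentage error of this kind weights errors on small prescriptions heavily, which is one reason a constant baseline can score poorly when the target spans a wide range of values.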
Conclusion: Preliminary analysis shows that treatment techniques are the most important features for predicting both the number of fractions and the total dose. Tumor stages and morphology codes are the least important features. Improvements in feature selection, the imputation method, and the optimization of multiple hyperparameters could improve the model’s predictions.
Funding Support, Disclosures, and Conflict of Interest: Funding support: NSF 2035750, SBIR grant. Disclosures and Conflict of Interest: Oncospace Inc.
Modeling, Numerical Analysis, Statistical Analysis
IM/TH- Mathematical/Statistical Foundational Skills: Machine Learning