Purpose: To assess the performance of a machine learning algorithm in distinguishing benign from potentially malignant cystic renal masses (CRMs) on computed tomography (CT) images using radiomics analysis; to compare the classification performance of 1st-order features alone with that obtained when higher-order features are included; and to investigate the effect of inter-reader variability in delineating the region of interest (ROI) on the same classification task.
Methods: Two radiologists independently delineated 230 regions of interest (ROIs). By combining random fluctuations with dilation and erosion operations applied to the original ROIs, we generated four additional sets of synthetic ROIs intended to represent inter-reader variability realistically. We evaluated the degree of variability among the ROIs both visually and quantitatively, using the Dice coefficient. We applied 10-fold stratified cross-validation (CV) to train and test a random forest model for the classification of CRMs. Each fold comprised four steps: selecting robust features based on the intraclass correlation coefficient (ICC) calculated over the 2 original and 4 synthetic sets of ROIs; removing highly correlated features; tuning the model training parameters by grid search; and fitting and testing the model.
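The overlap measure and the morphological perturbations named above can be illustrated with a minimal sketch. It is not the authors' implementation: it represents an ROI as a set of pixel coordinates, assumes a 4-connected (cross-shaped) structuring element that the source does not specify, and omits the random-fluctuation step. The helper names `dice`, `dilate`, `erode`, and `square` are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]

# Assumption: 4-connected (cross-shaped) structuring element, centre included.
NEIGHBOURS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice coefficient 2|A n B| / (|A| + |B|) between two binary masks."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def dilate(mask: Set[Pixel]) -> Set[Pixel]:
    """Grow the mask by one pixel along each axis direction."""
    return {(r + dr, c + dc) for (r, c) in mask for (dr, dc) in NEIGHBOURS}


def erode(mask: Set[Pixel]) -> Set[Pixel]:
    """Keep only pixels whose whole neighbourhood lies inside the mask."""
    return {
        (r, c)
        for (r, c) in mask
        if all((r + dr, c + dc) in mask for (dr, dc) in NEIGHBOURS)
    }


def square(top: int, left: int, size: int) -> Set[Pixel]:
    """Hypothetical toy ROI: a filled square of the given side length."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


roi = square(0, 0, 10)   # toy "original" delineation, 100 pixels
grown = dilate(roi)      # synthetic over-segmentation
shrunk = erode(roi)      # synthetic under-segmentation

print(f"Dice(original, dilated) = {dice(roi, grown):.3f}")
print(f"Dice(original, eroded)  = {dice(roi, shrunk):.3f}")
```

On real CT data the masks would be 2D or 3D arrays and the perturbation strength would be tuned so that the resulting Dice values resemble those observed between the two readers.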
Results: The mean area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value calculated over the six sets of ROIs were 0.87, 0.82, 0.90, 0.85, and 0.93, respectively, using the ~20 robust (ICC > 0.85) and uncorrelated (Pearson correlation < 0.85) features, of which 5 were 1st-order features. Similar results were obtained using 1st-order features only.
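The correlation threshold quoted above can be illustrated with a small sketch of a greedy filter, given here only as one plausible reading of "removal of the highly correlated features". The greedy keep-first strategy, the use of the absolute value of the Pearson coefficient, the function names, and the toy feature values are all assumptions, not details taken from the study.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (constant features would have
    to be removed beforehand to avoid a division by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def drop_correlated(features: Dict[str, List[float]], threshold: float = 0.85) -> List[str]:
    """Greedily keep a feature only if |r| < threshold against every kept feature."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values for three features over five lesions.
features = {
    "mean": [1.0, 2.0, 3.0, 4.0, 5.0],
    "median": [1.1, 2.0, 2.9, 4.2, 5.0],    # nearly collinear with "mean"
    "entropy": [2.0, -1.0, 0.5, -1.5, 1.0],  # weakly related to "mean"
}

selected = drop_correlated(features)
print(selected)
```

With a greedy filter of this kind, the result depends on the order in which features are visited, so a fixed ordering (for example, 1st-order features first) would be needed for reproducibility.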
Conclusion: The use of 1st-order features alone was sufficient for the classification of cystic renal masses, and the inclusion of higher-order features did not necessarily improve performance. We also determined the degree of variability that the delineation process introduces into the prediction outcome.