Purpose: Assessment of the case-based repeatability of artificial intelligence algorithms can provide insight into potential clinical use over a range of cases. Previous studies have demonstrated that variation in training cases impacts repeatability. This study investigated the impact of classifier choice on case-based classification repeatability using human-engineered radiomic features extracted from full-field digital mammography (FFDM) images of breast lesions, for three different classifiers: linear discriminant analysis, support vector machine, and random forest.
Methods: A retrospective, HIPAA-compliant, IRB-approved collection of 268 unique lesions (138 malignant, 130 benign) was used. There were 516 FFDM images; most lesions (144) were imaged in two or three views. Lesions were automatically segmented, and 29 radiomic features were extracted from each image. Classifier training and testing were conducted using the 0.632 bootstrap with 1000 iterations, with cancer prevalence maintained in each bootstrap sample; the same bootstrap samples were used for all three classifiers. Classifier outputs from different views of the same lesion were averaged to obtain the case-based output (CBO). Performance in the task of distinguishing malignant from benign lesions was evaluated by the 0.632+ bootstrap-corrected area under the receiver operating characteristic curve (AUC). The bootstrap samples were used to calculate confidence intervals for each classifier's AUC and for the difference in AUC in pairwise comparisons between classifiers; a difference in AUC was considered statistically significant if the 98.33% confidence interval (CI, adjusted from 95% for the three pairwise comparisons) failed to include zero. Case-based repeatability of each classifier was characterized by repeatability profiles, i.e., histograms of the median width of the 95% CI of the CBO as a function of the median CBO across all cases from the bootstrap samples.
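The bootstrap-and-averaging procedure described above can be sketched in code. Everything in this sketch is an illustrative assumption rather than the study's setup: the data are synthetic (not the 268-lesion FFDM set), a minimal two-class linear discriminant stands in for the three classifiers, and the bootstrap count is reduced for speed. The 0.632+ estimator additionally applies a no-information-rate correction, which is omitted here; only the plain 0.632 combination is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic stand-in data (assumed; NOT the study's 268-lesion set) ---
n_cases, n_feat = 60, 5
case_labels = np.array([0, 1] * (n_cases // 2))        # 0 = benign, 1 = malignant
n_views = rng.integers(1, 4, size=n_cases)             # each lesion imaged in 1-3 views
case_of_image = np.repeat(np.arange(n_cases), n_views) # maps each image to its lesion
# image-level "radiomic" features with a class-mean shift for malignant lesions
X = rng.normal(size=(case_of_image.size, n_feat)) + 0.8 * case_labels[case_of_image][:, None]

def fit_lda(X, y):
    # minimal two-class linear discriminant: w = Sw^-1 (mu1 - mu0),
    # with Sw the pooled within-class covariance (ridge term for stability)
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T) + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, mu1 - mu0)

def auc(scores, y):
    # Wilcoxon/Mann-Whitney estimate of the area under the ROC curve
    pos, neg = scores[y == 1], scores[y == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * eq

B = 200                                    # study used 1000 iterations
cbo = np.full((B, n_cases), np.nan)        # out-of-bag case-based outputs
auc_train, auc_test = [], []
idx_cases = np.arange(n_cases)
for b in range(B):
    # case-level bootstrap, stratified by class to keep cancer prevalence fixed
    samp = np.concatenate([rng.choice(idx_cases[case_labels == c],
                                      size=(case_labels == c).sum(), replace=True)
                           for c in (0, 1)])
    oob = np.setdiff1d(idx_cases, samp)    # out-of-bag cases serve as the test set
    train_img = np.isin(case_of_image, samp)
    w = fit_lda(X[train_img], case_labels[case_of_image[train_img]])
    scores = X @ w
    # average view-level outputs per lesion to get the case-based output (CBO)
    case_score = np.array([scores[case_of_image == c].mean() for c in idx_cases])
    cbo[b, oob] = case_score[oob]
    auc_train.append(auc(case_score[samp], case_labels[samp]))
    if oob.size:
        auc_test.append(auc(case_score[oob], case_labels[oob]))

# 0.632 bootstrap estimate (the 0.632+ variant would shrink toward chance
# using the no-information rate; omitted here for brevity)
auc_632 = 0.368 * np.mean(auc_train) + 0.632 * np.mean(auc_test)

# repeatability-profile ingredients: per-case median CBO and 95% CI width
med = np.nanmedian(cbo, axis=0)
ci_width = np.nanpercentile(cbo, 97.5, axis=0) - np.nanpercentile(cbo, 2.5, axis=0)
```

A repeatability profile would then histogram `ci_width` against `med` across cases; narrower CI widths at a given median output indicate more repeatable case-based classification.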
Results: Differences in classifier performance were not statistically significant, but classifier output was most repeatable when the random forest classifier was used.
Conclusion: These findings suggest that even when classifiers appear to perform similarly in terms of AUC, classifier choice can impact repeatability and should therefore be considered when developing computer-aided diagnosis systems.
Funding Support, Disclosures, and Conflict of Interest: KD receives royalties from Hologic, Inc. MLG is a stockholder in R2 Technology/Hologic and QView, receives royalties from Hologic, GE Medical Systems, MEDIAN Technologies, Riverain Medical, Mitsubishi, and Toshiba, and was a cofounder of Quantitative Insights (now a consultant to Qlarity Imaging).