ePoster Forums
Purpose: Feature selection and feature reduction are often used to minimize overfitting and improve transfer learning models, especially with small training datasets. In this study, we explore how feature selection and reduction impact transfer learning classification performance in the task of identifying the pathology status of cytopathologically indeterminate thyroid nodules from ultrasound (US) images.
Methods: The dataset of 151 thyroid cases, collected under an IRB protocol, included 69 indeterminate thyroid nodule cases (222 grayscale ultrasounds) with a surgical pathology of malignant (IM) and 82 indeterminate nodule cases (254 grayscale ultrasounds) with a surgical pathology of benign (IB). US images were cropped to a nodule region of interest as contoured by a UCM surgical endocrinology team and resized to 224x224. Image features were extracted from a VGG19 model pretrained on ImageNet (yielding 1472 total initial deep learning features). Three methods of feature selection were investigated: low variance thresholding (LVT), high correlation filtering using Pearson’s coefficients (HCF), and the Mann-Whitney U test (MWU-T). After feature selection, feature reduction was performed using principal component analysis (PCA). With each of the reduced feature sets, classification was conducted with a support vector machine (SVM) employing 5-fold cross-validation by nodule over 10 iterations. Classification performance was assessed by receiver operating characteristic (ROC) analysis, using the area under the curve (AUC) with 95% confidence intervals (CI) as the figure of merit.
Results: ROC analysis from the SVM classifications yielded AUC values [CI] of 0.70 [0.62,0.78], 0.72 [0.68,0.80], 0.73 [0.66,0.81], and 0.80 [0.73,0.87] with all features, LVT (threshold=1E-6, 50 components), HCF (Pearson’s >0.8, 50 components), and MWU-T (p<.025, 50 components), respectively.
Conclusion: Feature selection shows promise for improving classification performance while reducing the number of features passed to the classifier in the deep transfer learning models explored. These strategies merit further study in other small-dataset medical classification tasks as a way to avoid overfitting and optimize model performance.
Funding Support, Disclosures, and Conflict of Interest: Funded in part by T32 GM07281. MLG receives royalties from Hologic, GE Medical Systems, MEDIAN Technologies, Riverian Medical, Mitsubishi, Toyota, and is a cofounder of Qlarity Imaging. HL receives royalties from Hologic.
Feature Selection, Feature Extraction, Computer Vision
IM/TH- Image Analysis (Single Modality or Multi-Modality): Computer/machine vision