Purpose: To compare the effect of class imbalance handling techniques on the performance of deep learning models.
Methods: Two image datasets were examined: diabetic retinopathy images (EyePACS) and face images (CelebA). A binary classification problem was established for each dataset: presence of diabetic retinopathy for EyePACS and presence of glasses for CelebA. Several class imbalance ratios (inherent, 5, 10) were examined, and several standard architectures (ResNet 50, DenseNet 121, and EfficientNet B0) were trained using transfer learning with ImageNet weights, with training restricted to the final classification layer. All models were trained for 10 epochs using stochastic gradient descent with a learning rate of 0.001 and momentum of 0.9. Each model was trained three times, with dataset shuffling and random weight initialization of the final classification layer. A model was fit without any imbalance handling technique to establish baseline performance for each combination of dataset, imbalance ratio, and architecture. The imbalance handling techniques examined were oversampling, undersampling, focal loss, two-phase learning, and dynamic sampling. Performance was quantified using recall, precision, accuracy, and area under the precision-recall curve (AUPRC).
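As an illustration of two of the techniques listed above, the following is a minimal NumPy sketch of random oversampling and binary focal loss. It is not the implementation used in this work. The function names, the choice to oversample to a 1:1 class balance, and the default values gamma=2.0 and alpha=0.25 are assumptions made for the example; the abstract does not state them.

```python
import numpy as np

def oversample_indices(labels, rng):
    """Random oversampling (assumed target: 1:1 balance).

    Returns indices into the dataset in which minority-class examples are
    resampled with replacement until both classes are equally represented.
    """
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    return np.concatenate([majority, minority, extra])

def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs are predicted probabilities of the positive class.
    gamma and alpha defaults are illustrative assumptions.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels)
    # p_t is the probability assigned to the true class of each example.
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With gamma=0 and alpha=0.5 the focal loss reduces to half the ordinary binary cross-entropy; larger gamma values down-weight examples that are already classified with high confidence.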
Results: The best-performing methods as measured by AUPRC were oversampling and focal loss. However, AUPRC improvements over baseline models were minimal, and the techniques often produced AUPRC values similar to baseline. Imbalance handling techniques generally increased recall and decreased precision and accuracy compared to baseline models. This effect on recall, precision, and accuracy was more pronounced at higher levels of class imbalance. These results held across both datasets and all three architectures.
Conclusion: The results suggest that deep learning imbalance handling techniques do not significantly improve AUPRC performance but instead change a model’s decision boundary. The techniques should be considered when a higher false positive rate is preferred over a higher false negative rate.
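The distinction between changing the decision boundary and improving AUPRC can be illustrated with a toy example. The sketch below uses hypothetical scores (not data from this study) with an imbalance ratio of 5, and applies a square-root transform as a stand-in for any method that raises predicted scores without reordering them. At a fixed threshold of 0.5, recall rises and precision falls, while the ranking-based AUPRC (computed here as step-wise average precision) is unchanged.

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order] == 1
    tp = np.cumsum(hits)
    precision = tp / np.arange(1, len(scores) + 1)
    return float(np.sum(precision[hits]) / hits.sum())

# Hypothetical toy data: 2 positives, 10 negatives (imbalance ratio 5).
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
scores = np.array([0.80, 0.35, 0.60, 0.40, 0.30, 0.22,
                   0.20, 0.15, 0.12, 0.10, 0.08, 0.05])

# Monotone transform: raises every score but preserves their ranking.
shifted = np.sqrt(scores)

base_pr = precision_recall(scores, labels, 0.5)      # (0.5, 0.5)
shifted_pr = precision_recall(shifted, labels, 0.5)  # (0.4, 1.0)
base_ap = average_precision(scores, labels)          # 0.75
shifted_ap = average_precision(shifted, labels)      # 0.75
```

In this toy case the same trade-off could be obtained from the original scores simply by lowering the decision threshold, which is the sense in which the operating point moves while ranking quality stays the same.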
Image Analysis, Image Processing
IM/TH- Image Analysis (Single Modality or Multi-Modality): Computer/machine vision