Purpose: Gathering a sufficiently large labelled dataset for automated disease detection algorithms is a difficult and time-consuming task. Here, we investigate how the lesion detection performance of convolutional neural networks (CNNs) is affected when dataset size is increased by combining data from multiple disease types.
Methods: Lesions were manually contoured on baseline and follow-up FDG PET/CT images of patients with diffuse large B-cell lymphoma (Npatients=133, Nscans=415), head/neck cancer (Npatients=594, Nscans=898), and non-small cell lung cancer (Npatients=225, Nscans=339). A held-out test dataset of 40 patients per disease included 127 lymphoma scans (1,452 lesions), 55 head/neck scans (190 lesions), and 65 lung scans (416 lesions). Two CNN architectures were implemented (3D U-net and 3D retina U-net), each taking PET/CT images as input. Four CNNs were trained for each architecture: one per disease type and one with the training images from all three diseases combined. Differences in lesion detection sensitivity and number of false positives per patient (FPs/patient) between disease-mixed and disease-specific training were assessed using Wilcoxon signed-rank tests.
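The paired comparison described in the Methods can be illustrated with a minimal sketch. It assumes the tests are paired per patient, that detection counts are already available for each patient, and it computes only the signed-rank sums rather than a p-value. The `PatientCounts` class, the helper functions, and the example numbers are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class PatientCounts:
    """Hypothetical per-patient detection counts for one model."""
    n_lesions: int      # manually contoured lesions
    n_detected: int     # contoured lesions the model found
    n_false_pos: int    # model detections matching no contour


def sensitivity(c: PatientCounts) -> float:
    """Fraction of contoured lesions detected for one patient."""
    return c.n_detected / c.n_lesions if c.n_lesions else float("nan")


def signed_rank_sums(x, y):
    """Wilcoxon signed-rank sums (W+, W-) for paired samples x and y.

    Zero differences are dropped and tied absolute differences share
    their average rank. No p-value is computed here.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Illustrative numbers only: the same five patients scored by two models.
specific = [PatientCounts(10, 8, 2), PatientCounts(4, 4, 1),
            PatientCounts(6, 3, 5), PatientCounts(5, 5, 0),
            PatientCounts(3, 2, 1)]
mixed = [PatientCounts(10, 6, 4), PatientCounts(4, 4, 2),
         PatientCounts(6, 3, 4), PatientCounts(5, 4, 3),
         PatientCounts(3, 2, 1)]

fp_mixed = [c.n_false_pos for c in mixed]
fp_specific = [c.n_false_pos for c in specific]
w_plus, w_minus = signed_rank_sums(fp_mixed, fp_specific)
print(w_plus, w_minus)  # 8.5 1.5
```

The same function could be applied to the per-patient sensitivities. In practice a library routine such as `scipy.stats.wilcoxon` would be the natural choice, since it also returns the p-value.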
Results: For lymphoma, disease-mixed training of U-net and retina U-net resulted in an overall 20% and 13% lower sensitivity (p<0.001, p=0.04), respectively, with 1.3 and 2.2 more FPs/patient (both p<0.001) compared to the lymphoma-only model. For head/neck cancer, disease-mixed training of U-net and retina U-net resulted in an overall 13% and 5% higher sensitivity (p=0.02, p=0.04), respectively, with no change in FPs/patient for U-net and 0.4 more FPs/patient for retina U-net (p=0.04) compared to the head/neck-only model. For lung cancer, no significant impact was found for U-net, while for retina U-net disease-mixed training resulted in an overall 7% lower sensitivity (p=0.002) but 2.1 fewer FPs/patient (p<0.001) compared to the lung-only model.
Conclusion: For some disease types, lesion detection algorithms may be improved with the inclusion of images from other disease types, while others may require disease-specific models for optimal performance.
Funding Support, Disclosures, and Conflict of Interest: All authors are employed by AIQ Solutions.