Purpose: To identify the best-performing configuration of two state-of-the-art, self-configuring deep learning-based image segmentation methods, no-new-UNet (nnUNet) and not-another transFormer (nnFormer), for automatic cardiac substructure segmentation in thoracic planning CT images.
Methods: Fifty-two lung cancer patients underwent CT simulation. Ground-truth contours were delineated for ten cardiac substructures: superior vena cava (SVC), left ventricle and atrium (LV, LA), right ventricle and atrium (RV, RA), pulmonary artery (PA), left anterior descending artery (LAD), ascending and descending aorta (AA, DA), and aortic arch. nnUNet and nnFormer were each trained with five-fold cross-validation in four configurations: 2d, 3d_fullres (operating on full-resolution images), 3d_lowres (operating on downsampled images), and 3d_cascade_fullres (refining the 3d_lowres output with 3d_fullres). Predictions from each pair of configurations were ensembled by averaging their softmax probabilities, yielding six ensembles: ensemble_2d_3d_cascade_fullres, ensemble_2d_3d_fullres, ensemble_2d_3d_lowres, ensemble_3d_fullres_3d_cascade_fullres, ensemble_3d_fullres_3d_lowres, and ensemble_3d_lowres_3d_cascade_fullres. For each structure, the cross-validated auto-segmentations were evaluated against the ground truth using the Dice score (DS) and Jaccard coefficient (JC).
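The ensembling and evaluation steps described above can be illustrated with a minimal NumPy sketch. It is not the nnUNet or nnFormer implementation: the function names `ensemble_softmax` and `dice_and_jaccard`, the array layout (class axis first), and the toy values are assumptions made purely for illustration.

```python
import numpy as np


def ensemble_softmax(prob_a, prob_b):
    """Average two softmax maps of shape (classes, ...) and return a label map.

    Hypothetical helper: assumes both inputs share the same shape and that
    axis 0 indexes the classes.
    """
    mean_prob = (prob_a + prob_b) / 2.0
    return np.argmax(mean_prob, axis=0)


def dice_and_jaccard(pred, truth, label):
    """Return (Dice score, Jaccard coefficient) for one label.

    Returns (nan, nan) when the label is absent from both masks.
    """
    p = pred == label
    t = truth == label
    inter = np.logical_and(p, t).sum()
    total = p.sum() + t.sum()
    if total == 0:
        return float("nan"), float("nan")
    union = total - inter
    return 2.0 * inter / total, inter / union


# Toy example: two classes (background, structure) over four voxels.
prob_a = np.array([[0.9, 0.6, 0.4, 0.2],
                   [0.1, 0.4, 0.6, 0.8]])
prob_b = np.array([[0.8, 0.3, 0.3, 0.1],
                   [0.2, 0.7, 0.7, 0.9]])
truth = np.array([0, 0, 1, 1])

labels = ensemble_softmax(prob_a, prob_b)      # label map [0, 1, 1, 1]
ds, jc = dice_and_jaccard(labels, truth, label=1)
print(f"DS={ds:.3f}, JC={jc:.3f}")             # DS=0.800, JC=0.667
```

In this toy case the second voxel is labeled background by the first model alone but structure after averaging, which shows how an ensemble can differ from either of its members.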
Results: Ensembling nnUNet or nnFormer configurations yielded the best results for automatic segmentation of all cardiac substructures except the aortic arch. nnUNet_ensemble_3d_fullres_3d_cascade_fullres provided the most accurate segmentations for SVC (DS: 0.833±0.041, JC: 0.716±0.059 [mean ± SD]), RV (DS: 0.850±0.033, JC: 0.741±0.048), and LAD (DS: 0.439±0.162, JC: 0.294±0.129); nnUNet_ensemble_3d_fullres_3d_lowres for LV (DS: 0.896±0.028, JC: 0.813±0.044); nnFormer_ensemble_2d_3d_cascade_fullres for LA (DS: 0.843±0.045, JC: 0.730±0.065) and PA (DS: 0.835±0.046, JC: 0.719±0.065); nnUNet_ensemble_2d_3d_lowres for RA (DS: 0.837±0.038, JC: 0.720±0.056); nnUNet_ensemble_2d_3d_fullres for AA (DS: 0.892±0.024, JC: 0.806±0.038); nnFormer_ensemble_2d_3d_fullres for DA (DS: 0.919±0.019, JC: 0.850±0.033); and nnUNet_2d for aortic arch (DS: 0.872±0.068, JC: 0.778±0.102). By contrast, nnUNet_2d/nnFormer_2d generated the least accurate segmentations for SVC, LV, LA, RV, RA, PA, and LAD, while nnUNet_3d_lowres/nnFormer_3d_lowres generated the least accurate segmentations for AA, DA, and aortic arch.
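As a reading aid for the paired DS and JC values above: for a single predicted mask A and ground-truth mask B, the two metrics are linked by a standard identity, so they rank individual cases identically. The notation A, B is introduced here for illustration only.

```latex
\mathrm{DS} = \frac{2\,|A \cap B|}{|A| + |B|},
\qquad
\mathrm{JC} = \frac{|A \cap B|}{|A \cup B|} = \frac{\mathrm{DS}}{2 - \mathrm{DS}}
```

Because the reported values are means over patients, the identity holds only approximately for them. For DA, 0.919 / (2 - 0.919) ≈ 0.850, which matches the reported mean JC; for LAD, where the spread is large, the mean JC (0.294) departs further from the value implied by the mean DS.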
Conclusion: Self-configuring nnUNet/nnFormer produce accurate predictions of cardiac substructures in strong agreement with manual segmentation, yet LAD segmentation remains challenging. Ensemble configurations can offer superior predictions for improved cardiac-sparing radiotherapy planning.
Funding Support, Disclosures, and Conflict of Interest: This project was supported by grants U24CA180803 (IROC) and U10CA180868 (NRG) from the National Cancer Institute.