
Session: Deep Learning for Image-guided Therapy

Transformer-Based Deep Learning Architecture for Improved Cardiac Substructure Segmentation

N Summerfield (1,2)*, J Qiu (3), S Hossain (1), M Dong (3), C Glide-Hurst (1,2); (1) University of Wisconsin Madison, Department of Human Oncology, (2) University of Wisconsin Madison, Department of Medical Physics, (3) Wayne State University, MI, Department of Computer Science


WE-C1030-IePD-F2-2 (Wednesday, 7/13/2022) 10:30 AM - 11:00 AM [Eastern Time (GMT-4)]

Exhibit Hall | Forum 2

Purpose: Cardiac substructures are highly radiosensitive, yet accurate segmentation for treatment planning remains challenging, particularly for the coronary arteries. This work optimized a state-of-the-art transformer-based 3D-UNet deep learning architecture (3D-UNETR) that captures both global and local dependencies, with the aim of improving volumetric substructure segmentation accuracy in both diagnostic and radiation oncology applications.

Methods: 3D-UNETR implements an encoder-decoder architecture based on self-attention layers, in which each transformer block includes a multi-head attention layer, a feed-forward neural network, shortcut connections, and layer normalization. Initial training was conducted using 20 (16 training, 4 test) labeled 3D whole-heart cardiac diagnostic MRI datasets from the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge (left/right ventricles, myocardium, ascending aorta, etc.). To support multi-modality needs in radiation oncology, the 3D-UNETR was then trained using 25 T2-weighted MRI/CT simulations with ground truth labels of 12 cardiac substructures (great vessels, coronary arteries, etc.) and compared against 3D-UNet. After data augmentation and training, predictions were evaluated against ground truth via the Dice similarity coefficient (DSC) and mean distance to agreement (MDA).
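The two evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each segmentation is given as a set of integer (x, y, z) voxel coordinates, and it assumes one common definition of MDA (the symmetric mean of nearest-surface distances), since the abstract does not state which variant was used. The function names and the `spacing` parameter are hypothetical.

```python
import math

# 6-connected neighbourhood used to decide whether a voxel lies on the surface.
_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly by convention
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def surface(voxels):
    """Voxels with at least one 6-connected neighbour outside the mask."""
    return {
        v for v in voxels
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in voxels
               for dx, dy, dz in _OFFSETS)
    }


def mean_distance_to_agreement(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean nearest-surface distance (assumed MDA definition)."""
    sp, st = surface(pred), surface(truth)
    if not sp or not st:
        raise ValueError("both masks must be non-empty")

    def dist(a, b):
        return math.sqrt(sum(((a[i] - b[i]) * spacing[i]) ** 2 for i in range(3)))

    # Brute-force nearest neighbour; adequate for a small illustration only.
    d_pred = [min(dist(p, t) for t in st) for p in sp]
    d_truth = [min(dist(t, p) for p in sp) for t in st]
    return (sum(d_pred) + sum(d_truth)) / (len(d_pred) + len(d_truth))


# Toy example: a 2x2x2 cube and the same cube shifted by one voxel in x.
truth = {(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)}
pred = {(x + 1, y, z) for (x, y, z) in truth}
dsc_value = dice(pred, truth)
mda_value = mean_distance_to_agreement(pred, truth)
```

In the toy example, half of each cube overlaps the other, so both the DSC and the MDA (in voxel units) come out to 0.5. For clinical volumes a distance-transform approach would be far faster than the brute-force search used here.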

Results: The MM-WHS MR-only model yielded an average DSC of 0.76 across all substructures (ranging from 0.65 for the pulmonary artery to 0.83-0.85 for the chambers). For the multi-modality radiation oncology model, 3D-UNETR validation histories demonstrated learning of global and local dependencies, yielding more accurate substructure localization and segmentation than the traditional 3D-UNet. Paired t-tests showed a statistically significant improvement in MDA for the entire model with 3D-UNETR relative to 3D-UNet (p < 0.05), and 3D-UNETR produced higher DSC for the left anterior descending artery predictions than 3D-UNet (p < 0.05).
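The paired comparison reported above can be sketched as follows. This is an illustration of the paired t statistic only, not the authors' analysis. The function name is hypothetical, and the input numbers are placeholders invented for the example, not values from this study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder per-case metric values for two models (NOT study data).
model_a = [2.0, 3.0, 4.0, 5.0]
model_b = [1.0, 1.0, 2.0, 2.0]
t_stat, dof = paired_t_statistic(model_a, model_b)
```

The resulting t statistic is then compared against a t distribution with `dof` degrees of freedom to obtain a p-value. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and returns the p-value directly.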

Conclusion: By incorporating self-attention, 3D-UNETR more precisely segmented small, complex substructures than 3D-UNet while maintaining acceptable segmentation of the larger substructures. The methodology appears promising for both diagnostic MRI and multi-modality MRI/CT infrastructures. Future work will involve merging training to develop an image-agnostic pipeline that can be widely applicable for cardiac applications.

Funding Support, Disclosures, and Conflict of Interest: Research collaborations with Philips Healthcare, GE Healthcare, ViewRay, Inc., and Modus Medical. Research partially supported by the National Cancer Institute of the National Institutes of Health under Award Numbers R01CA204189 and R01HL153720.


Segmentation, Heart


IM/TH- Image Segmentation Techniques: Modality: MRI
