Session: AI in Imaging

Deep Learning Prostate Segmentation in 3D Ultrasound and the Impact of Image Quality and Training Dataset Size

N Orlando1,2*, I Gyacskov2, D Gillies3, D Cool1,3, D Hoover1,3, A Fenster1,2, (1) Western University, London, ON, CA, (2) Robarts Research Institute, London, ON, CA, (3) London Health Sciences Centre, London, ON, CA


TH-D-207-3 (Thursday, 7/14/2022) 11:00 AM - 12:00 PM [Eastern Time (GMT-4)]

Room 207

Purpose: While deep learning offers promising results for image segmentation tasks, including prostate segmentation in ultrasound, large medical image datasets are rare, making widespread clinical translation difficult. In addition, image quality is highly variable in ultrasound imaging, and there is currently no method for quantifying image quality in transrectal ultrasound (TRUS). We have previously proposed a 2D radial deep learning plus 3D reconstruction approach for prostate segmentation. To examine the efficiency of our method, we assessed segmentation performance as a function of training dataset size. An image quality grading scale specific to TRUS prostate imaging was developed to explore its impact on segmentation performance.

Methods: Our complete dataset, consisting of 206 3D TRUS images from clinical biopsy (end-fire) and brachytherapy (side-fire) procedures, was resliced into 6761 2D images for training a U-Net++ network. Split end-fire, side-fire, and mixed datasets were reduced in size to 1000, 500, 250, and 100 2D images. The testing dataset consisted of 20 end-fire and 20 side-fire 3D TRUS images unseen during training. A TRUS prostate image quality grading scale was developed with three independent factors (acquisition quality, anatomy artifact severity, and boundary visibility), each graded on a 5-point scale.
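As an illustration, the three-factor grading scale could be encoded as follows. This is a minimal sketch only: the class name, field names, score orientation (1 = worst, 5 = best), and the summed total are our assumptions, not details published with the scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TRUSQualityGrade:
    """Hypothetical encoding of the three independent TRUS quality factors,
    each graded on a 5-point scale (assumed 1 = worst, 5 = best)."""
    acquisition_quality: int
    anatomy_artifact_severity: int
    boundary_visibility: int

    def __post_init__(self):
        # Reject scores outside the 5-point scale.
        for name, score in vars(self).items():
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be in 1-5, got {score}")

    @property
    def total(self) -> int:
        # Sum of the three factors (range 3-15). How the factors are
        # combined is not specified in the abstract; this is an assumption.
        return (self.acquisition_quality
                + self.anatomy_artifact_severity
                + self.boundary_visibility)
```

A grade such as `TRUSQualityGrade(4, 3, 5)` then yields a total of 12, and out-of-range scores raise a `ValueError` at construction time.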

Results: Compared to the full training dataset, Dice score was significantly lower only when training datasets were reduced to 500 images, with segmentation performance plateauing at 1000 images. For our specific dataset, image quality had no impact on performance for end-fire images, while boundary visibility and acquisition quality had a significant effect for side-fire images.
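The Dice score reported here is the standard Dice similarity coefficient, 2|A∩B| / (|A| + |B|) for binary masks A and B. A minimal sketch of that textbook definition (not code from the study) is:

```python
import numpy as np


def dice_score(pred, truth):
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```

Identical masks score 1.0 and disjoint masks score 0.0, so higher is better.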

Conclusion: High performance with training datasets as small as 1000 2D images highlights the efficiency of our approach and could increase access to automated segmentation even when data are scarce. The image quality grading scale provides a tool for assessing segmentation performance, allowing comparison between networks trained on different datasets.

Funding Support, Disclosures, and Conflict of Interest: The authors are grateful for funding support from the Ontario Institute for Cancer Research, the Canadian Institutes of Health Research, and the Natural Sciences and Engineering Research Council. This work was also supported by the London Regional Cancer Program's Catalyst Grants program using funds raised by the London Health Sciences Foundation.


Segmentation, Ultrasonics, Image Processing


IM- Ultrasound : Machine learning, computer vision
