Purpose: Inter-observer variation is an important issue for automated segmentation in the clinic. To account for inter-observer variability in delineating the prostate gland, we employed a deep neural network architecture composed of a U-net and a variational autoencoder (VAE). The VAE encodes inter-observer variability and produces, for each CT image slice, multiple segmentations that are robust to inter-observer uncertainty. To our knowledge, this is the first such application of a VAE.
Methods: The source dataset contained CT images of 300 patients in which the prostate was delineated by a single physician. The target dataset, used to incorporate inter-observer variation, comprised CT datasets of 10 patients, with the prostate of each patient segmented by 5 independent physicians. Data augmentation using random rotation (<5 degrees), cropping, and horizontal flipping was applied to each dataset to increase the sample size by a factor of 100. A probabilistic hierarchical U-net with VAE (PHiSeg) was pre-trained on the source dataset for 30 epochs. The model weights were then transferred, and PHiSeg was fine-tuned on the target dataset for 100 epochs (transfer learning). Ten iterations of random permutation sampling of the augmented target dataset (training/validation/test ratio of 6:1:3) were performed.
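The repeated 6:1:3 random permutation sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `permutation_split`, the sample count of 1000, and the use of the iteration index as the random seed are all assumptions made for the example.

```python
import numpy as np


def permutation_split(n_samples, ratios=(6, 1, 3), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    `ratios` follows the 6:1:3 split stated in the abstract; `seed` is an
    assumed convenience so that each iteration is reproducible.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Ten iterations, each with a different permutation (hypothetical sample count).
splits = [permutation_split(1000, seed=i) for i in range(10)]
```

Each entry of `splits` holds three disjoint index arrays. If augmented copies of the same slice should not be shared between training and test sets, the permutation would need to be drawn over patients or original slices instead of over augmented samples; the abstract does not state which was done.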
Results: Average results for models trained with the target dataset only versus the source followed by the target dataset (transfer learning) were as follows: Dice score = 0.76±0.03 vs. 0.80±0.02 (p=0.001); Hausdorff distance (mm) = 11.48±2.28 vs. 10.18±1.35 (p=0.019); normalized cross-correlation = 0.52±0.10 vs. 0.62±0.06 (p=0.006); generalized energy distance = 0.33±0.09 vs. 0.26±0.06 (p=0.002). All four metrics improved significantly with the proposed VAE and transfer learning approach.
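Two of the reported metrics can be illustrated with a short sketch. The Dice score is computed in its standard form. For the generalized energy distance, the abstract does not state which distance between segmentations was used, so the code assumes the common choice d = 1 − IoU; the function names are hypothetical, and treating two empty masks as identical is a convention chosen for this example.

```python
import numpy as np


def dice(a, b):
    """Dice score between two binary masks (1.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom


def iou_distance(a, b):
    """Assumed distance d = 1 - IoU (0.0 if both masks are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_pairwise(xs, ys):
    """Mean distance over all pairs drawn from the two sets of masks."""
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))


def generalized_energy_distance(samples, references):
    """GED^2 = 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')]."""
    return (2.0 * mean_pairwise(samples, references)
            - mean_pairwise(samples, samples)
            - mean_pairwise(references, references))


# Toy 2x2 masks standing in for model samples and physician contours.
a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
```

Here `samples` would be the multiple segmentations drawn from the model for one slice and `references` the contours from the five physicians; a lower value indicates that the two sets of contours are distributed more similarly.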
Conclusion: A VAE combined with a hierarchical U-net shows promise for accounting for inter-observer variability in automatic prostate segmentation. Multiple segmentations for each CT slice enable users to weigh clinical tradeoffs in selecting the “best fitting” contour, an advantage over a standard U-net, whose output is limited to a single contour.
Funding Support, Disclosures, and Conflict of Interest: This work was supported in part by a grant from Varian Medical Systems (Palo Alto, CA).