Purpose: Liver resection planning could benefit from automated tools that contour the liver segments. We developed and validated deep learning-based models, using multiple architectures and training datasets, to auto-contour liver segments for surgical planning.
Methods: The data comprised liver segments manually contoured on 81 CT scans of internal patients diagnosed with liver cancer, and 193 CT scans from task 8 of the Medical Imaging Decathlon (MID) dataset, annotated by a 2018 MID participant (https://github.com/GLCUnet/dataset). A patch-based 3D Attention U-Net architecture (paUNet) and the 3D full-resolution nnUNet architecture were used to train 5 models on combinations of the internal, MID, and combined datasets. For paUNet, the tuned hyperparameters were the number of filters (N=64, 48, 32, 16), the number of blocks (N=2, 4), and the learning rate schedule (cyclic or stable), giving 16 different models. Segments 1, 2, 3, and 4 and the combined segments 2-3 and 5-8 were evaluated. Models were evaluated quantitatively on 10 withheld internal CT scans using the Dice similarity coefficient (DSC), mean distance to agreement (MDA), and volumetric percent difference (PDV). The intraobserver variability of DSC was calculated for 5 cases.
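The three evaluation metrics can be sketched for binary 3D masks as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define MDA or PDV precisely, so we assume MDA is the symmetric mean surface-to-surface distance and PDV is the absolute volume difference relative to the manual (reference) contour. The function names and the `spacing` parameter are hypothetical; `scipy.ndimage.distance_transform_edt` and `scipy.ndimage.binary_erosion` are standard SciPy functions.

```python
import numpy as np
from scipy import ndimage


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def percent_volume_difference(pred: np.ndarray, ref: np.ndarray) -> float:
    """Assumed PDV: absolute volume difference as a percentage of the reference volume."""
    ref_vol = ref.astype(bool).sum()
    return float(100.0 * abs(pred.astype(bool).sum() - ref_vol) / ref_vol)


def _surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: the mask minus its binary erosion."""
    return mask & ~ndimage.binary_erosion(mask)


def mean_distance_to_agreement(pred, ref, spacing=(1.0, 1.0, 1.0)) -> float:
    """Assumed MDA: symmetric mean surface distance, in the units of `spacing`.

    Both masks must be non-empty.
    """
    pred, ref = pred.astype(bool), ref.astype(bool)
    s_pred, s_ref = _surface(pred), _surface(ref)
    # Distance from every voxel to the nearest surface voxel of each mask
    dt_ref = ndimage.distance_transform_edt(~s_ref, sampling=spacing)
    dt_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing)
    # Pool pred-surface-to-ref and ref-surface-to-pred distances
    d = np.concatenate([dt_ref[s_pred], dt_pred[s_ref]])
    return float(d.mean())
```

Passing the CT voxel spacing in cm as `spacing` gives MDA in cm, matching the units reported in the results. The same `dice` function applies to intraobserver variability by comparing two manual contours of the same case.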
Results: The best results were obtained with nnUNet_internal and nnUNet_combined, with median (standard deviation) DSC, MDA, and PDV of 0.85 (0.16), 0.25 cm (0.36 cm), and 13.57% (40%) and of 0.86 (0.16), 0.26 cm (0.36 cm), and 13.90% (41%), respectively. Segments 5-8 showed the best results, with average DSC, MDA, and PDV of 0.96 (0.05), 0.15 cm (0.18 cm), and 3% (12%). For paUNet, the best hyperparameters were 64 filters, 2 blocks, and a cyclic learning rate, with an overall average DSC of 0.75. The average intraobserver variability over all segments was DSC 0.84 (0.07), MDA 0.28 cm (0.14 cm), and PDV 6.5% (17%).
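The paUNet search space (4 filter counts, 2 block counts, 2 learning-rate schedules) multiplies out to the 16 models tuned. A minimal sketch of that grid and of a cyclic schedule follows, assuming a triangular cyclic learning rate and reading "stable" as a constant rate. The schedule shape, `base_lr`, `max_lr`, and `step_size` are illustrative assumptions not given in the abstract, and all helper names are hypothetical.

```python
import itertools
import math

# Search space from the Methods: 4 filter counts x 2 block counts x 2 LR schedules
FILTERS = (64, 48, 32, 16)
BLOCKS = (2, 4)
LR_SCHEDULES = ("cyclic", "stable")


def search_space():
    """Yield every paUNet configuration in the grid (16 in total)."""
    for n_filters, n_blocks, schedule in itertools.product(FILTERS, BLOCKS, LR_SCHEDULES):
        yield {"filters": n_filters, "blocks": n_blocks, "lr_schedule": schedule}


def triangular_lr(step, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Assumed triangular cyclic LR: ramps base -> max -> base every 2 * step_size steps."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


def learning_rate(schedule, step, base_lr=1e-4):
    """Hypothetical helper: constant LR for 'stable', triangular cycle for 'cyclic'."""
    if schedule == "stable":
        return base_lr
    return triangular_lr(step, base_lr=base_lr)
```

Under this sketch, the best paUNet setting reported above is the grid entry `{"filters": 64, "blocks": 2, "lr_schedule": "cyclic"}`.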
Conclusion: nnUNet-based models provided the most accurate liver segment contours. Training on multi-institutional data did not improve segmentation results on single-institution test data. The auto-contouring accuracy was comparable to the intraobserver variability of manual contouring.
Funding Support, Disclosures, and Conflict of Interest: Research reported in this publication was supported in part by the National Cancer Institute of the National Institutes of Health under award number 1R01CA221971