Purpose: Recently, there has been renewed interest in using object detection algorithms to localize the treatment target on X-ray fluoroscopy, to overcome the long processing time of template matching. However, a major obstacle for deep-learning-based object detection is that accuracy drops severely when the target is occluded by vertebral bodies. In this study, we set out to verify the usefulness of a modified YOLOv3 framework, combined with training data augmentation strategies, in mitigating this problem.
Methods: To enable comparison with a previous study applying the Faster-RCNN framework, we repeated those experiments with the dynamic thorax phantom (CIRS), in which a tumor surrogate was programmed to move independently of the chest. In addition, we verified the reproducibility of the proposed method using datasets provided by the MArkerless Lung Target Tracking CHallenge (MATCH). These datasets consist of CT and X-ray imaging files of a 3D-printed anthropomorphic phantom, together with files recording the 3D trajectories of the complex phantom motion. In both experiments, the programmed 3D positions were mapped onto the 2D X-ray images to serve as the ground truth.
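Mapping the programmed 3D positions onto the 2D imager can be sketched with a point-source projection model, as is standard for fluoroscopy geometry. The sketch below is illustrative only: the source-to-axis distance (`sad`) and source-to-imager distance (`sid`) values, the beam-axis orientation, and the function name are assumptions, not taken from the paper.

```python
import numpy as np

def project_to_imager(p_3d, sad=1000.0, sid=1500.0):
    """Project a 3D point (mm, isocenter coordinates) onto the 2D imager
    plane of a point-source imaging geometry.

    Assumed geometry (illustrative, not the paper's): the beam axis runs
    along +y, the source sits at (0, -sad, 0), and the imager plane is
    perpendicular to the beam at distance sid from the source.
    """
    x, y, z = p_3d
    # A point at depth y from the isocenter lies (sad + y) mm from the
    # source, so its divergent-beam magnification on the imager is:
    mag = sid / (sad + y)
    return np.array([x * mag, z * mag])

# A point at the isocenter projects at unit lateral offset times sid/sad:
print(project_to_imager((10.0, 0.0, 20.0)))  # → [15. 30.]
```

With the magnification handled per frame, the projected 2D trajectory can be compared directly against the detected bounding-box centers.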
Results: For the CIRS phantom, the mislocalization problem was resolved when the surrogate size was 18 mm or 27 mm. When the size decreased to 12 mm, the localization failure rate (LFR) was 16%, compared with 41% for Faster-RCNN. On the MATCH datasets, for target 1, the targets in all frames were successfully detected; 86% of the results had errors of less than 1 mm, with an average error of (0.58 ± 1.7 mm, -0.35 ± 2.4 mm). For target 2, the LFR was 7%; 72% of the results had errors of less than 1 mm, with an average error of (0.74 ± 1.9 mm, 1.32 ± 1.47 mm). The overall end-to-end processing time per image is less than 60 ms (~16.7 FPS).
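The reported metrics can be sketched as follows. Note the failure criterion here (no detection, or error beyond a fixed threshold) and the threshold value are assumptions for illustration; the paper's exact LFR definition may differ.

```python
import numpy as np

def localization_stats(pred, gt, fail_thresh=2.0):
    """Compute localization failure rate (LFR) and per-axis error stats.

    pred: list of detected (x, y) centers in mm, or None for missed frames.
    gt:   list of ground-truth (x, y) positions in mm.
    A frame counts as a failure when no detection is returned or the 2D
    error exceeds fail_thresh mm (illustrative criterion only).
    """
    errs, failures = [], 0
    for p, g in zip(pred, gt):
        if p is None:
            failures += 1
            continue
        e = np.subtract(p, g)
        if np.linalg.norm(e) > fail_thresh:
            failures += 1
        errs.append(e)
    errs = np.array(errs)
    lfr = failures / len(gt)
    # Fraction of returned detections with sub-millimeter error:
    sub_mm = float(np.mean(np.linalg.norm(errs, axis=1) < 1.0)) if len(errs) else 0.0
    return lfr, sub_mm, errs.mean(axis=0), errs.std(axis=0)
```

For example, with four frames of which one is missed and one deviates by 5 mm, the function reports an LFR of 0.5 alongside the mean and standard deviation of the per-axis errors, which correspond to the "(x ± sx, y ± sy)" format quoted above.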
Conclusion: The proposed method is a clear improvement, considering the trade-off between accuracy and localization latency. It largely resolves the mislocalization problem while moving closer to real-time operation.