Purpose: Physicians assign widely varying custom names to radiotherapy structures in the prostate and lungs. Standardizing the physician-given names of Organs at Risk (OAR), Target Organs (PTV), and other organs inside the area of interest is a significant challenge for radiotherapy quality assurance. Prior work in this area considered either image or textual data, but not both. We designed and evaluated an integrated model that uses both types of data, compiled from the radiotherapy centers administered by the VHA and from the Department of Radiation Oncology at VCU.
Methods: The combined VCU and VHA radiation oncology data comprise 16,290 prostate and 13,999 lung structures. Within each structure set name, all alphanumeric characters were converted to lower case. The data were stratified and split 70:30 into training and testing sets. BioBERT was used to tokenize the text, and the embedding layer was based on our corpus. Because the textual data performed better than the feature-reduced image data, the textual features were assigned higher importance. Two parallel CNNs were designed for the textual data and a third CNN for the feature-reduced image data (1000 features); the three outputs were concatenated and fed into two dense layers to standardize the structure names.
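The lower-casing and stratified 70:30 split described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`normalize_name`, `stratified_split`), the random seed, and the rounding rule for the per-class cut are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def normalize_name(name):
    """Lower-case a physician-given structure name; digits are left unchanged."""
    return name.lower()


def stratified_split(names, labels, train_frac=0.7, seed=0):
    """Split (name, label) pairs so each label keeps about train_frac in training.

    Hypothetical helper: the per-class cut is rounded to the nearest integer,
    which is an assumption, not a detail taken from the abstract.
    """
    # Group sample indices by their standardized label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)  # shuffle within the class before cutting
        cut = round(len(idxs) * train_frac)
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])

    train = [(normalize_name(names[i]), labels[i]) for i in train_idx]
    test = [(normalize_name(names[i]), labels[i]) for i in test_idx]
    return train, test
```

Splitting within each label, rather than over the pooled data, keeps rare structure classes represented in both sets, which matters when most structures are neither OARs nor targets.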
Results: Macro-averaged precision, recall, and F1-score were 0.886, 0.94, and 0.911 for prostate and 0.896, 0.878, and 0.915 for lung, respectively.
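For readers unfamiliar with macro averaging, the sketch below computes the three metrics as unweighted means of per-class values. It assumes the per-class-average definition (the same convention as scikit-learn's `average='macro'`); the abstract does not state which definition was used, and `macro_scores` is a hypothetical name.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (no predictions or no true samples) are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Under this definition every class counts equally regardless of its size, and the macro F1 is the mean of per-class F1 values rather than the harmonic mean of macro precision and macro recall, so it need not lie between those two numbers.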
Conclusion: CNN-based integration of image and text data yields higher F1-scores than models trained on either data type alone, delivering state-of-the-art performance for structure name standardization. The majority of structures in real clinical datasets are not true OARs or planning targets, which adds class imbalance and leads to lower overall accuracy despite improved F1-scores. Addressing the large number of non-OAR and non-PTV structures is an important direction for future research.
Funding Support, Disclosures, and Conflict of Interest: This work is funded by the US VHA National Radiation Oncology Program (NROP). The authors have no conflicts of interest.