Purpose: As machine learning (ML) systems become increasingly widely used, it is critical to build mechanisms for them to report the confidence level of their predictions. For organ-at-risk (OAR) autosegmentation, one measure of confidence is whether the shape produced at prediction time is similar to the shapes used to train the model. Here, we develop an ML method to characterise the distribution of OAR shapes and use it to find erroneous training data and anomalous shapes in simulated prediction data.
Methods: We built a 13-layer 3D ResNet-style classifier using TensorFlow(TM) 2.3.1 and Keras, and trained it to classify 11 head and neck OAR from a publicly available dataset (Wee, L., & Dekker, A. (2019). Data from Head-Neck-Radiomics-HN1 [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/tcia.2019.8kap372n). Twenty examples of each OAR were used. The trained model was then used in inference mode to classify 878 OAR from the same dataset. The shape distribution was determined by reducing the dimensionality of weights in intermediate layers of the classifier to 2, using principal component analysis and t-SNE. Outliers were detected by visual inspection or by calculating the distance of each data point from its cluster's center of mass. The t-SNE perplexity and the classifier layer used were varied.
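The dimensionality reduction and distance-based outlier step described in the Methods could be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the per-structure feature vectors from an intermediate classifier layer are already available as a NumPy array, and the function names (embed_features, flag_outliers), the number of PCA components, and the "mean plus n_sigma standard deviations" threshold are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_features(features, n_pca=10, perplexity=15.0, seed=0):
    """Reduce (n_samples, n_features) layer features to 2-D: PCA first, then t-SNE."""
    # PCA cannot return more components than samples or features.
    n_pca = min(n_pca, features.shape[0], features.shape[1])
    reduced = PCA(n_components=n_pca, random_state=seed).fit_transform(features)
    # perplexity must be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(reduced)


def flag_outliers(embedding, labels, n_sigma=3.0):
    """Flag points that lie far from their own class centroid in the 2-D embedding."""
    labels = np.asarray(labels)
    flags = np.zeros(len(labels), dtype=bool)
    for label in np.unique(labels):
        idx = np.where(labels == label)[0]
        centre = embedding[idx].mean(axis=0)  # cluster center of mass
        dist = np.linalg.norm(embedding[idx] - centre, axis=1)
        # Assumed threshold (not from the abstract): mean distance + n_sigma * std.
        flags[idx] = dist > dist.mean() + n_sigma * dist.std()
    return flags
```

In use, embed_features would be called once per classifier layer and perplexity setting under study, and the structures flagged by flag_outliers would then be reviewed by eye alongside a scatter plot of the embedding.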
Results: Examples of detected anomalies include mislabelling (both cochleae labelled 'left cochlea' in training data); unusual anatomy (oral cavity of a patient without teeth); patient positioning differences (brain of a patient with reduced neck extension); and contouring styles (volume of cochlea, inferior limit of spine contour). One OAR (oral cavity) showed two distinct clusters, indicating variation in contouring styles.
Conclusion: Clustering of OAR shapes is useful for cleaning ML training data and flagging potentially incorrect predictions. This may be used to help autosegmentation programs determine prediction confidence levels.