Purpose: To present a new two-stage clustering algorithm, which overcomes the shortcomings of current standard clustering algorithms for radiomics applications.
Methods: The task was to classify M groups, comprising N samples described by J variables, into L classes based on similarity. In stage 1, we reduced the number of variables from J to p using principal component analysis (PCA) and then applied the k-means algorithm to cluster the N samples into K subgroups. This yielded a probability P(j,k) that a sample of the j-th group belongs to subgroup k. In stage 2, using P(j,k) as the K new variables, we classified the M groups into L classes by hierarchical clustering. We tested the method on 1175 (=N) contours of 22 (=M) anatomical structures obtained from the head-and-neck IMRT/VMAT plans of 36 patients, with 174 (=J) radiomics features calculated using the SIBEX program. The final results showed six (=L) classes of the 22 anatomical structures. The results were compared with those of the consensus clustering (CC) algorithm, a newer clustering method. Note that for the CC algorithm, we used the mean values of the 174 radiomics features for each of the 22 structures.
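The link between the two stages can be illustrated with a minimal Python sketch. It assumes that stage 1 (PCA followed by k-means) has already assigned each sample a subgroup label, that P(j,k) is estimated as the fraction of group j's samples falling in subgroup k, and that stage 2 uses average linkage with Euclidean distance; the abstract specifies neither the estimator nor the linkage. The function names `membership_matrix` and `agglomerate` and the toy labels are hypothetical and are not the study data.

```python
from collections import Counter
from itertools import combinations
from math import dist


def membership_matrix(group_ids, subgroup_labels, n_subgroups):
    """Return {group: [P(group, k) for k in range(n_subgroups)]}.

    Assumption: P(j, k) is the fraction of group j's samples that
    stage 1 assigned to subgroup k.
    """
    counts = {}
    for g, k in zip(group_ids, subgroup_labels):
        counts.setdefault(g, Counter())[k] += 1
    return {
        g: [c[k] / sum(c.values()) for k in range(n_subgroups)]
        for g, c in counts.items()
    }


def agglomerate(profiles, n_classes):
    """Merge groups bottom-up on their P(j, .) rows until n_classes remain.

    Assumption: average linkage with Euclidean distance.
    """
    clusters = [[g] for g in profiles]

    def linkage(a, b):
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_classes:
        # Find the closest pair of clusters and merge it.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Toy stage-1 output (hypothetical): one group name and one subgroup label per sample.
group_ids = ["PTV", "PTV", "PTV", "PTV", "CTV", "CTV",
             "Mandible", "Mandible", "Cochlea", "Cochlea"]
subgroup_labels = [0, 0, 0, 1, 0, 0, 2, 2, 2, 1]

P = membership_matrix(group_ids, subgroup_labels, n_subgroups=3)
classes = agglomerate(P, n_classes=2)
```

Because each group is summarized by how its samples are distributed over the K subgroups, no averaging of feature values across the samples of a group is needed before stage 2.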
Results: The hyperparameters p and K were determined to be 5 and 6, respectively, by optimizing the performance of PCA and k-means clustering. The final classification of the two-stage clustering method, presented as a dendrogram, showed six classes whose members shared similar characteristics. Class 1 contained CTV and PTV. Class 2 included bony structures such as the mandibles and cochleae. The CC algorithm led to the same classification results for six clusters. However, it is noteworthy that the two-stage algorithm did not use radiomics feature values averaged over the contours of the same structure.
Conclusion: Our algorithm could successfully cluster many samples into classes that share common characteristics.