Purpose: Radiation oncology is experiencing rapid adoption of artificial intelligence (AI)-based tools for organ-at-risk (OAR) autosegmentation. To support related efforts at our institution, we developed a framework for qualitative review of autosegmentation model results, applicable to custom model development, model comparison, or clinical translation.
Methods: We leveraged MIM Maestro™ (MIM Software, Inc.) and developed a MIM Workflow (custom script) to guide experts through qualitative review of 42 modeled OARs on head and neck (H&N) planning CTs (HNpCT). On a per-OAR basis within the MIM environment, HU window-leveling was customized and automated navigation was performed to localize and resize the axial, sagittal, and coronal CT views; a progression of questions was then asked, with user entry based on a fixed qualitative coding scheme (details provided in Supporting Document). Responses, including automated capture of time spent, were recorded in real time and output to a .csv file, labeled automatically with HNpCT and reviewer identifiers. Custom MATLAB (The MathWorks, Inc.) tools were developed to aggregate findings over the HNpCT cohort and produce summary plots. Two preliminary H&N AI model versions under custom development with an industrial partner were reviewed within this framework. We present findings for the later of these two models, based on 81 HNpCTs (not used for model training) and five expert reviewers (two radiation therapists who received specialized H&N OAR segmentation training, one medical dosimetrist, and two H&N radiation oncologists).
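The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB tooling: the column names (`oar`, `score`, `seconds`), the score labels, and the identifiers `CT001`, `R1`, and `R2` are hypothetical, since the abstract does not specify the .csv layout or the coding scheme. The sketch only shows one way per-review files tagged with HNpCT and reviewer identifiers could be pooled into per-OAR summaries.

```python
import csv
import io
from collections import Counter, defaultdict


def parse_review_csv(text, ct_id, reviewer_id):
    """Parse one review file and tag each row with its CT and reviewer IDs.

    The column names 'oar', 'score', and 'seconds' are assumptions made
    for this sketch; the real export format is not given in the abstract.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "ct_id": ct_id,
            "reviewer_id": reviewer_id,
            "oar": row["oar"],
            "score": row["score"],
            "seconds": float(row["seconds"]),
        })
    return rows


def summarize_by_oar(rows):
    """Pool rows across CTs and reviewers into per-OAR summaries."""
    seconds = defaultdict(list)
    scores = defaultdict(Counter)
    for r in rows:
        seconds[r["oar"]].append(r["seconds"])
        scores[r["oar"]][r["score"]] += 1
    return {
        oar: {
            "n": len(times),                                # number of reviews
            "mean_minutes": sum(times) / len(times) / 60.0,  # mean review time
            "scores": scores[oar],                           # count per score label
        }
        for oar, times in seconds.items()
    }


# Made-up example data (not results from the study).
file_a = "oar,score,seconds\nBrainstem,acceptable,60\nMandible,minor_edit,90\n"
file_b = "oar,score,seconds\nBrainstem,minor_edit,30\nMandible,minor_edit,30\n"
rows = parse_review_csv(file_a, "CT001", "R1") + parse_review_csv(file_b, "CT001", "R2")
summary = summarize_by_oar(rows)
```

Keeping the CT and reviewer identifiers on every row means the same pooled list could also be grouped by reviewer or by CT, for example to look at inter-reviewer differences.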
Results: H&N OARs were separated into high and low clinical priority (Nhigh=25; Nlow=17) based on dosimetric relevance. High-priority OARs were reviewed on all 81 HNpCTs by all five experts, requiring 1.08 minutes/OAR on average; low-priority OARs were reviewed on a subset of 46 HNpCTs by four experts, requiring 0.93 minutes/OAR on average.
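To put the per-OAR times in context, the short sketch below estimates total reviewer effort from the reported figures. It assumes that every reviewer scored every OAR on every HNpCT in the corresponding set, which the abstract does not state explicitly; the function name is illustrative.

```python
def total_review_minutes(n_oars, n_cts, n_reviewers, minutes_per_oar):
    """Estimated total effort, assuming every reviewer scored every OAR on every CT."""
    return n_oars * n_cts * n_reviewers * minutes_per_oar


# Figures taken from the Results; full coverage per reviewer is an assumption.
high_minutes = total_review_minutes(25, 81, 5, 1.08)
low_minutes = total_review_minutes(17, 46, 4, 0.93)
high_hours = high_minutes / 60.0
low_hours = low_minutes / 60.0
```

Under that assumption, the estimate comes to roughly 182 reviewer-hours for the high-priority OARs and roughly 48 reviewer-hours for the low-priority OARs.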
Conclusion: This tool streamlines the review process and minimizes reviewer effort, making it a valuable counterpart to quantitative model evaluation.
Segmentation, Quality Assurance, Modeling