Session: Data Science Robustness, Performance, and Data Harmonization

The Medical Imaging and Data Resource Center (MIDRC) Technology Development Project (TDP) 3c: Developing Tools to Assist in Task-Specific Performance Evaluation for Machine Learning Algorithms Employing MIDRC Data

K Drukker1, B Sahiner2, T Hu2, G Kim3, H Whitney1,4, N Baughan1, K Myers5, M Giger1, M McNitt-Gray3*, (1) University of Chicago, Chicago, IL, (2) US Food and Drug Administration, Silver Spring, MD, (3) David Geffen School of Medicine at UCLA, Los Angeles, CA, (4) Wheaton College, Wheaton, IL, (5) US Food and Drug Administration (retired), Phoenix, AZ


SU-H430-IePD-F6-4 (Sunday, 7/10/2022) 4:30 PM - 5:00 PM [Eastern Time (GMT-4)]

Exhibit Hall | Forum 6

Purpose: One of the aims of MIDRC, in response to the COVID-19 pandemic, is to facilitate machine learning research on the early detection, diagnosis, prognosis, and assessment of treatment response for COVID-19. The purpose of our technology development project (TDP) is to create accessible resources that help researchers select appropriate metrics for task-specific performance evaluation of their machine learning algorithms.

Methods: The TDP 3c team identified multiple use cases of clinical tasks that could be approached using MIDRC data. For each task, the team identified the expected type of 1) image data, 2) reference standard, and 3) machine learning output, and provided recommendations on performance evaluation approaches and metrics that would be appropriate. An interactive decision tree has been developed in which users can select the type of task, the nature of the reference standard, and the content of the algorithm output and then obtain recommendations regarding appropriate performance evaluation approaches and metrics, including literature references, short video tutorials, and links to available software when applicable.
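
The selection logic described above can be pictured as a lookup keyed on the three user choices. The following Python sketch is illustrative only: the table entries, the key strings, and the function name `recommend` are assumptions made for this example and do not reproduce the actual MIDRC decision tree or its recommendations.

```python
from typing import Dict, List, Tuple

# Hypothetical table: (task, reference standard, algorithm output) -> suggested metrics.
# The entries are placeholders chosen for illustration, not MIDRC content.
RECOMMENDATIONS: Dict[Tuple[str, str, str], List[str]] = {
    ("two-class classification", "negligible variability", "binary"): [
        "sensitivity",
        "specificity",
    ],
    ("two-class classification", "negligible variability", "continuous"): [
        "ROC curve",
        "area under the ROC curve (AUC)",
    ],
}


def recommend(task: str, reference_standard: str, output_type: str) -> List[str]:
    """Return suggested metrics for one path through the tree.

    An empty list means the combination is not covered by this sketch.
    """
    key = (task, reference_standard, output_type)
    return list(RECOMMENDATIONS.get(key, []))


if __name__ == "__main__":
    print(recommend("two-class classification", "negligible variability", "continuous"))
```

A flat lookup is used here for brevity; a tool with many branches would more plausibly use a nested structure so that each choice narrows the options shown next.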

Results: To date, the decision tree has been made public for two-class classification problems (e.g., COVID-19 positive vs. negative) where the reference standard has negligible variability (e.g., using a PCR test for COVID-19 status). The tree provides suggestions for situations where the algorithm produces either binary or non-binary (e.g., continuous) output. Performance evaluation approaches and metrics, as well as references and links to software, are provided.
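
For the two-class case, the difference between binary and continuous output can be illustrated with standard metrics. The sketch below, in plain Python, computes sensitivity and specificity for binary output and the area under the ROC curve (via the Mann-Whitney statistic) for continuous output. These metrics are common choices for such outputs and are assumed here for illustration; the abstract does not list which metrics the decision tree recommends, and the function names and sample data are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical AUC: the fraction of positive/negative pairs in which the
    positive case scores higher, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up data: reference standard (e.g., PCR status) and algorithm scores.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # binary output at an arbitrary threshold
    print(sensitivity_specificity(y_true, y_pred))
    print(auc_mann_whitney(y_true, scores))
```

The pairwise AUC computation is quadratic in the number of cases and reports no uncertainty estimate; for real evaluations, an established statistical package that also provides confidence intervals would be preferable.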

Conclusion: As part of the MIDRC resource environment, tools are being developed to assist researchers in conducting task-specific performance evaluation of their machine learning algorithms. The decision tree currently addresses two-class classification problems and is being expanded to include multi-class classification (such as assessing disease severity), disease localization, segmentation, time-to-event, and estimation analyses.

Funding Support, Disclosures, and Conflict of Interest: This research was funded through MIDRC, which is funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under contracts 75N92020C00008 and 75N92020C00021. The authors reported no other conflicts relating to this work.


Quantitative Imaging, Statistical Analysis, Image Analysis


IM- Dataset Analysis/Biomathematics: Machine learning
