
Session: Multi-Disciplinary General ePoster Viewing

Evaluation of Diversity in the Medical Imaging and Data Resource Center (MIDRC) Open Data Commons

H Whitney1,2*, N Baughan1, K Drukker1, K Myers3, M Giger1, MIDRC Bias and Diversity Working Group1, (1) University of Chicago, Chicago, IL, (2) Wheaton College, Wheaton, IL, (3) Food and Drug Administration, retired


PO-GePV-M-344 (Sunday, 7/10/2022)   [Eastern Time (GMT-4)]

ePoster Forums

Purpose: The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional initiative that aims to equitably collect, curate, and share medical images and associated data resources for the COVID-19 pandemic and beyond. To ensure that MIDRC provides data resources for the development of robust, generalizable models, MIDRC routinely evaluates the diversity of the data as they are ingested into the repository by monitoring demographic metrics.

Methods: The MIDRC Bias & Diversity Working Group analyzes data distributions by the key demographic elements of age, sex, race, ethnicity, and COVID-19 status at the time of imaging for the primary imaging-based datasets Open-A1 and Open-R1 at The distributions of these demographic elements are routinely assessed for both the current and the cumulative MIDRC data commons. In this study, selected current data distributions are compared with US Census Bureau (USCB) data to characterize the data and identify areas where they are not representative of the US population.

Results: Since the data commons opened in October 2021, the data have shown moderate variation from USCB data, including a higher proportion of patients aged 50 and older (currently 71% in MIDRC vs. 36% USCB), a higher female-to-male ratio (55:44 MIDRC vs. 51:49 USCB), and a higher proportion of Black or African-American patients (24% MIDRC vs. 12% USCB). The disparities in age and race may reflect the disproportionate impact of COVID-19 on older and/or minoritized patients. The race data may also be affected by the large percentage of records for which no race is reported (15.3%) and by the limited, though growing, number of contributing institutions.
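The census comparison in the Results comes down to simple arithmetic on proportions. The sketch below is minimal and assumes each demographic category is summarized as a single percentage in MIDRC and in the USCB reference. It computes the percentage-point gap and the ratio of the two shares from the figures reported above. The class and metric names (`Category`, `difference_pp`, `representation_ratio`) are hypothetical. This is not MIDRC's actual analysis pipeline.

```python
"""Compare the MIDRC proportions quoted in the abstract against USCB reference values.

Hypothetical illustration: names and metrics are not taken from MIDRC tooling.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    midrc_pct: float  # share of MIDRC data in this category, in percent
    uscb_pct: float   # share of the US population per USCB, in percent

    @property
    def difference_pp(self) -> float:
        # Gap in percentage points (positive = larger share in MIDRC than in USCB)
        return self.midrc_pct - self.uscb_pct

    @property
    def representation_ratio(self) -> float:
        # Ratio of shares; values above 1 mean a larger share in MIDRC than in USCB
        return self.midrc_pct / self.uscb_pct


# Figures as reported in the Results paragraph
REPORTED = [
    Category("age 50 and older", 71.0, 36.0),
    Category("female", 55.0, 51.0),
    Category("male", 44.0, 49.0),
    Category("Black or African-American", 24.0, 12.0),
]

if __name__ == "__main__":
    for c in REPORTED:
        print(f"{c.name:28s} {c.difference_pp:+6.1f} pp  ratio {c.representation_ratio:.2f}")
```

Each category is compared independently rather than as a full distribution. This matters because the reported shares need not sum to 100%: the MIDRC sex split is given as 55:44.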

Conclusion: To approach demographics comparable to those of the US population, more data are needed to supplement under-represented categories. This highlights the importance of a diverse set of data contributors and of including demographic data elements with data contributions. Future studies will also compare the demographics with those of the populations expected to be impacted by COVID-19.

Funding Support, Disclosures, and Conflict of Interest: Research reported is part of MIDRC and was made possible by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under contracts 75N92020C00008 and 75N92020C00021.


CAD, Diagnostic Radiology, Quantitative Imaging


IM- Dataset Analysis/Biomathematics: Machine learning
