Scalable De-Identification Pipeline for Radiation Therapy Machine Learning Research

D Moseley*, S Seetamsetty, E Tryggestad, S Shiraishi, Mayo Clinic, Rochester, MN

D Moseley

Presentations

PO-GePV-M-30 (Sunday, 7/10/2022) [Eastern Time (GMT-4)]

ePoster Forums

Purpose: The purpose of this study was to construct and evaluate a scalable in-house tool for DICOM-RT de-identification.

Methods: Our custom de-identification (DeID) tool was written in C#. Alteration of DICOM tags is controlled by a customizable template, which can include targeted string or character redaction. Unique Identifiers (UIDs) are translated through a salted one-way hash function. UID traceability is maintained through a private database, which can be queried to decode transformed patient identifiers. The tool was deployed as a DICOM adaptor that can interface with any DICOM node or picture archiving and communication system (PACS) and can readily handle parallel data transfer (i.e., multiple DICOM associations). DeID output is subsequently directed to another DICOM node or to a file storage system such as network-attached storage. We evaluated the tool on a specific use case, data curation for deep learning (DL) applications, which required DeID of approximately 1000 radiotherapy (RT) DICOM datasets including DICOM CT, RT Structure Sets, RT Plans, RT Doses and Spatial Registration Objects. The output was reviewed by medical physicists, machine learning (ML) scientists and an external privacy expert.
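The template-driven tag alteration and salted UID hashing described in the Methods can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C# implementation. The hash algorithm (SHA-256), the `2.25` UID root, the template layout, the action names, and the in-memory `uid_map` standing in for the private database are all assumptions made for illustration.

```python
import hashlib
import re

UID_ROOT = "2.25"  # assumption: UUID-style root; keeps results under 64 characters


def remap_uid(uid: str, salt: bytes) -> str:
    """Return a pseudonymous UID from a salted one-way hash (SHA-256 assumed)."""
    digest = hashlib.sha256(salt + uid.encode("ascii")).digest()
    # Use the first 128 bits as a decimal integer so the result is a valid UID.
    return f"{UID_ROOT}.{int.from_bytes(digest[:16], 'big')}"


# Hypothetical template: tag keyword -> (action, argument).
TEMPLATE = {
    "PatientName": ("replace", "ANONYMOUS"),
    "StudyInstanceUID": ("hash_uid", None),
    "SOPInstanceUID": ("hash_uid", None),
    "StudyDescription": ("redact", r"Dr\.?\s+\w+"),
}


def deidentify(record: dict, template: dict, salt: bytes, uid_map: dict) -> dict:
    """Apply the template to a flat {keyword: value} record."""
    out = dict(record)
    for keyword, (action, arg) in template.items():
        if keyword not in out:
            continue
        if action == "replace":
            out[keyword] = arg
        elif action == "hash_uid":
            new_uid = remap_uid(out[keyword], salt)
            uid_map[new_uid] = out[keyword]  # stand-in for the private database
            out[keyword] = new_uid
        elif action == "redact":
            # Targeted string redaction inside a free-text value.
            out[keyword] = re.sub(arg, "[REDACTED]", out[keyword])
    return out


# Hypothetical usage: two objects from the same study keep a shared study UID.
salt = b"site-secret"
uid_map = {}
ct = {"PatientName": "DOE^JANE", "StudyInstanceUID": "1.2.3", "SOPInstanceUID": "1.2.3.1"}
plan = {"PatientName": "DOE^JANE", "StudyInstanceUID": "1.2.3", "SOPInstanceUID": "1.2.3.2"}
ct_out = deidentify(ct, TEMPLATE, salt, uid_map)
plan_out = deidentify(plan, TEMPLATE, salt, uid_map)
```

Because the hash is deterministic for a fixed salt, the same original UID always maps to the same output UID, so references between objects stay consistent across files. A real DICOM dataset also carries UIDs inside nested sequences, which this flat-record sketch does not traverse.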

Results: The external expert concluded that no protected health information (PHI) was present after de-identification and that our pipeline met HIPAA Safe Harbor standards. The tool met the needs of the ML data curation use case and, because it required no manual interaction, yielded substantial efficiency gains. This allowed the curation team to focus on the curation itself and reduced the otherwise potentially large DeID burden.

Conclusion: The DeID tool ensures proper mapping of DICOM Instance UIDs and facilitates de-identification of DICOM data while preserving the longitudinal relations and linkage between files. The tool was also able to redact identifiable information in free-text DICOM tags.

Keywords

DICOM-RT, Computer Software, Image Processing

Taxonomy

IM- Dataset Analysis/Biomathematics: Machine learning
