This repository provides tools for validating and harmonizing datasets submitted to the RADx-rad Data Coordinating Center (DCC) for integration into the NIH RADx Data Hub. It includes utilities to process and convert raw submission files into standardized formats for downstream use.
The following RADx-rad datasets have been harmonized using this toolkit and are available in the NIH RADx Data Hub.
RADx-rad study datasets must follow this structure before harmonization:
data_harmonized/
└── rad_xxx_yyy-zz/                 # Unique study directory
    └── preorigcopy/                # Raw submitted files
        ├── rad_xxx_yyy-zz_label_DATA_preorigcopy.csv
        ├── rad_xxx_yyy-zz_label_DICT_preorigcopy.csv
        ├── rad_xxx_yyy-zz_label_META_preorigcopy.csv
        └── ...
Each `label` is a unique, user-defined string that identifies one triplet of files (data, dictionary, metadata).
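A quick pre-check of the triplet convention can catch missing files before the harmonizer is run. The sketch below is not part of the toolkit; the function name `find_incomplete_triplets` is hypothetical, and the file name pattern is an assumption inferred from the directory layout shown above.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern, inferred from the layout above:
#   <study>_<label>_<DATA|DICT|META>_preorigcopy.csv
KINDS = ("DATA", "DICT", "META")


def find_incomplete_triplets(preorigcopy_dir, study_id):
    """Return {label: [missing kinds]} for every label without a full triplet."""
    pattern = re.compile(
        rf"^{re.escape(study_id)}_(?P<label>.+)_(?P<kind>DATA|DICT|META)_preorigcopy\.csv$"
    )
    found = defaultdict(set)
    for path in Path(preorigcopy_dir).glob("*.csv"):
        match = pattern.match(path.name)
        if match:
            found[match.group("label")].add(match.group("kind"))
    # Keep only labels where at least one of the three files is absent.
    return {
        label: [kind for kind in KINDS if kind not in kinds]
        for label, kinds in sorted(found.items())
        if len(kinds) < len(KINDS)
    }
```

For example, `find_incomplete_triplets("data_harmonized/rad_xxx_yyy-zz/preorigcopy", "rad_xxx_yyy-zz")` returns an empty dictionary when every label has its data, dictionary, and metadata file. Files that do not match the assumed pattern are ignored rather than reported.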
Run the following steps for each study (`rad_xxx_yyy-zz`), fixing any reported errors along the way.
cd src
python phase1.py -include rad_xxx_yyy-zz
- Output: `work/phase1_errors.csv`
- Fix files in `preorigcopy/` and rerun if needed.
python phase2.py -include rad_xxx_yyy-zz
- Output: `work/phase2_errors.csv`
- Fix files in `work/` and rerun if needed.
python phase3.py -include rad_xxx_yyy-zz
- Output directories:
  - `origcopy/`: Harmonized raw submission files
  - `transformcopy/`: Globally harmonized Tier 1 files (optional)
- Errors: `work/phase3_errors.csv`
Submit the `origcopy/` and, if available, `transformcopy/` directories to the NIH RADx Data Hub.
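The run-fix-rerun loop above can be scripted so that a phase is only followed by the next one once its error report is empty. This is a minimal sketch, not part of the toolkit: `count_error_rows` and `run_phase` are hypothetical helpers, and the code assumes each `phaseN_errors.csv` has one header row followed by one row per error, which the README does not confirm.

```python
import csv
import subprocess
import sys
from pathlib import Path


def count_error_rows(errors_csv):
    """Count data rows in an error report.

    Assumption: the file has a single header row and one row per error.
    """
    with Path(errors_csv).open(newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    return max(len(rows) - 1, 0)


def run_phase(phase, study_id, errors_csv, src_dir="src"):
    """Run phase<N>.py for one study and return the number of reported errors."""
    subprocess.run(
        [sys.executable, f"phase{phase}.py", "-include", study_id],
        cwd=src_dir,
        check=True,  # stop if the script itself exits with an error
    )
    return count_error_rows(errors_csv)
```

The path to the error report is passed explicitly because the text above does not state which directory `work/` is relative to. A caller might loop over phases 1 to 3 and stop at the first phase for which `run_phase` returns a non-zero count.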
- Miniconda3
- Git
- Java 17
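Because the validator and compiler tools are distributed as JAR files, it can help to confirm which Java version is on the `PATH` before continuing. The sketch below is an illustration only; `parse_java_major` and `installed_java_major` are hypothetical helper names, not part of this repository.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_391").
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)
```

Calling `installed_java_major()` should return `17` on a machine that meets the prerequisite. It raises `FileNotFoundError` if no `java` executable is found.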
# Update Conda and install prerequisites
conda update conda
# Install git if not present
conda install git -n base -c anaconda
# Install Java 17 if not present
git clone https://github.com/radxrad/metadata.git
git clone https://github.com/radxrad/radx-harmonizer.git
cd radx-harmonizer
mkdir source
# Data Dictionary Validator
wget -P source/ https://github.com/bmir-radx/radx-data-dictionary-validator/releases/download/v1.3.4/radx-data-dictionary-validator-app-1.3.4.jar
# Metadata Validator
wget -P source/ https://github.com/bmir-radx/radx-metadata-validator/releases/download/v1.0.6/radx-metadata-validator-app-1.0.6.jar
# Metadata Compiler
wget -P source/ https://github.com/bmir-radx/radx-rad-metadata-compiler/releases/download/v1.0.3/radx-rad-metadata-compiler-1.0.3.jar
mkdir reference
# Metadata Specification
wget -P reference/ https://github.com/bmir-radx/radx-metadata-validator/releases/download/v1.0.6/RADxMetadataSpecification.json
# Global Tier1 Dictionary
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-global_tier1_dict_2025-06-24.csv
# RADx-rad Tier1 and Tier2 Dictionaries
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_tier1_dict_2025-06-24.csv
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_tier2_dict_2025-06-24.csv
# Legacy Dictionary
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_legacy_dict_2025-06-24.csv
mkdir meta
cp ../metadata/metadata_templates/*.csv meta
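After the downloads, a short check can confirm that every file landed in the expected directory. This is a sketch under the assumption that the file names are exactly the last path segments of the URLs in the commands above; `missing_downloads` is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path

# File names taken from the download commands above.
EXPECTED = {
    "source": [
        "radx-data-dictionary-validator-app-1.3.4.jar",
        "radx-metadata-validator-app-1.0.6.jar",
        "radx-rad-metadata-compiler-1.0.3.jar",
    ],
    "reference": [
        "RADxMetadataSpecification.json",
        "RADx-global_tier1_dict_2025-06-24.csv",
        "RADx-rad_tier1_dict_2025-06-24.csv",
        "RADx-rad_tier2_dict_2025-06-24.csv",
        "RADx-rad_legacy_dict_2025-06-24.csv",
    ],
}


def missing_downloads(repo_root="."):
    """Return relative paths of expected files that are not present."""
    root = Path(repo_root)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]
```

Run from the `radx-harmonizer` directory, `missing_downloads()` returns an empty list when all eight files are in place. The check covers presence only and does not verify file contents.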
Create and activate the project environment using the provided `environment.yml`.
conda env create -f environment.yml
conda activate radx-harmonizer
To deactivate:
conda deactivate
Resource | Description |
---|---|
RADx Data Dictionary Specification | Specification of the RADx Data Dictionary format |
RADx-rad Data Dictionaries | Tier 1 (RADx global) and Tier 2 (RADx-rad-specific) data elements |
RADx-rad Metadata | Study-specific metadata files |
RADx-rad Publications | List of publications related to RADx-rad objectives |
RADx-rad Tech Data Organization | Description of how data for diagnostic methods development are organized |
Peter W. Rose, RADx-rad Harmonizer: Data Validation and Harmonization Toolkit for Data Submissions, Available online: https://github.com/radxrad/radx-harmonizer (2025)
Supported by the Office of the Director, National Institutes of Health under:
RADx-Rad Discoveries & Data: Consortium Coordination Center Program Organization
Grant: 7U24LM013755