This repository contains the complete analysis pipeline for the study "Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps".
- Overview
- Installation
- Repository Structure
- UDIP-FA Model Usage
- GWAS & Post-Analysis
- Reproducibility
- Citation
- Contact
This study introduces UDIP-FA (Unsupervised Deep Image Phenotyping of Fractional Anisotropy), a novel deep learning approach for analyzing white matter microstructure in brain imaging data. The pipeline includes:
- Deep representation learning of FA maps using customized 3D AutoEncoders.
- Genome-wide association studies (GWAS) on learned endophenotypes.
- Polygenic risk score (PRS) associations with brain disorders.
- Network-based drug targeting analysis.
- Python 3.8 or higher
- R 4.0 or higher
- Git
We recommend using a virtual environment (conda or venv).
# Create and activate environment
conda create -n udip-fa python=3.8
conda activate udip-fa
# Install dependencies from requirements.txt
pip install -r requirements.txt
Note: Ensure that the installed PyTorch version is compatible with your CUDA driver.
# CRAN packages
install.packages(c("ggplot2", "dplyr", "tidyr", "data.table",
                   "circlize", "RColorBrewer",
                   "cowplot", "ggpubr", "pheatmap"))
# Bioconductor packages (ComplexHeatmap is distributed through Bioconductor, not CRAN)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "clusterProfiler", "org.Hs.eg.db", "DOSE"))
UDIP-FA/
├── Model/                          # Deep Learning Model & Scripts
│   ├── model.py                    # AutoEncoder Architecture (PyTorch)
│   ├── dataset.py                  # Dataset Loading Logic
│   ├── Train.py                    # Training Script (PyTorch Lightning)
│   ├── inference.py                # Inference Script for generating embeddings
│   └── model_compare.py            # Analysis & Visualization scripts
├── FA_GWAS_all.ipynb               # Main GWAS Analysis Notebook
├── FA_all.R                        # Post-GWAS Analysis (R)
├── FA_network_drug_analysis.R      # Network & Drug Analysis (R)
├── requirements.txt                # Python Project Dependencies
└── README.md                       # Project Documentation
The deep learning model is located in the Model/ directory.
Input data should be affine-registered MRI images in NIfTI format.
Prepare a CSV file containing the paths to your images under a column named mri_names (or specify your column name during inference).
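As an illustration of the CSV preparation step, the sketch below collects NIfTI files from a folder and writes them under an mri_names column. The function name write_image_csv and the idea of scanning a single directory are assumptions made for this example; they are not part of the repository, and your images may be organized differently.

```python
import csv
from pathlib import Path


def write_image_csv(image_dir, output_csv, column="mri_names"):
    """Write a one-column CSV listing every NIfTI file found under image_dir.

    Hypothetical helper for illustration; not part of the UDIP-FA repository.
    """
    image_dir = Path(image_dir)
    # Collect compressed (.nii.gz) and uncompressed (.nii) files, sorted so
    # that the row order is stable between runs.
    paths = sorted(
        p for p in image_dir.rglob("*")
        if p.is_file() and p.name.endswith((".nii", ".nii.gz"))
    )
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])  # header row with the column name
        for path in paths:
            writer.writerow([str(path.resolve())])
    return len(paths)
```

Calling write_image_csv("/path/to/images", "/path/to/data.csv") would return the number of images listed. Absolute paths are written so the CSV can be used from any working directory.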
To train the AutoEncoder from scratch:
python Model/Train.py
Note: Model/Train.py is configured to use PyTorch Lightning. Adjust hyperparameters (learning rate, batch size, GPUs) directly in the file or by modifying the LitAutoEncoder class.
To generate latent representations (endophenotypes) from a trained model:
python Model/inference.py --input_csv /path/to/data.csv \
--checkpoint /path/to/model.ckpt \
--output_dir /path/to/results
Common Arguments:
- --input_csv: Path to the CSV file with image paths.
- --checkpoint: Path to the .ckpt model file.
- --output_dir: Folder to save the output pickle files.
- --device: cuda:0 or cpu.
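If inference needs to be run for several checkpoints or input files, the command can be assembled programmatically. The sketch below only builds the argument list from the four flags documented above; the helper name build_inference_command is hypothetical, and the script is assumed to be launched from the repository root.

```python
import sys


def build_inference_command(input_csv, checkpoint, output_dir, device="cpu"):
    """Return the argv list for one run of Model/inference.py.

    Hypothetical helper; it uses only the flags listed in this README.
    """
    return [
        sys.executable, "Model/inference.py",
        "--input_csv", str(input_csv),
        "--checkpoint", str(checkpoint),
        "--output_dir", str(output_dir),
        "--device", device,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), which raises an error if the script exits with a non-zero status.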
For performing analysis on significant SNPs and feature correlations:
python Model/model_compare.py
This script includes functions to:
- Plot significant SNPs across different thresholds.
- Compute and visualize pairwise correlations (CCA, Pearson) between feature sets.
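To make the pairwise Pearson comparison concrete, here is a minimal pure-Python sketch that correlates every feature of one set with every feature of another. It is not the implementation in model_compare.py, whose internals are not described here, and it does not cover CCA. The function names and the list-of-rows input layout are assumptions for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant feature
    return cov / math.sqrt(var_x * var_y)


def pairwise_pearson(features_a, features_b):
    """Correlate every column of features_a with every column of features_b.

    Both inputs are lists of rows (one row per subject, same subject order).
    The result has one row per column of features_a and one column per
    column of features_b.
    """
    cols_a = list(zip(*features_a))
    cols_b = list(zip(*features_b))
    return [[pearson(ca, cb) for cb in cols_b] for ca in cols_a]
```

For real embedding matrices with many subjects and features, a vectorized routine such as numpy.corrcoef would be considerably faster; the loop form is shown only because it makes the computation explicit.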
The repository includes comprehensive scripts for the genetic analysis stages:
The Jupyter notebook FA_GWAS_all.ipynb serves as the main entry point for the genetic analysis, covering:
- UDIP-FA feature association analyses: Correlating deep learning features with genetic variants.
- Polygenic Risk Score (PRS) associations: Investigating links between learned features and brain disorders.
- Model Explainability: Interpretability assessments of the autoencoder features.
- Comparative Analysis: Benchmarking against previous white matter studies.
FA_all.R is an R script dedicated to post-GWAS statistical processing:
- Result Aggregation: Filtering and summarizing GWAS statistics.
- Figure Generation: Producing publication-ready plots (Manhattan plots, QQ plots).
- Meta-analysis: Effect size calculations and statistical validation.
FA_network_drug_analysis.R performs network analysis for biological insights:
- Gene-Drug Interaction: Constructing networks to identify potential drug targets.
- Therapeutic Targets: Highlighting genes actionable by existing drugs.
- Mechanism of Action: Pathway analysis to understand underlying biological mechanisms.
The pretrained model can be accessed at this Google Drive Link.
Random seeds are set as follows:
- Python: np.random.seed(42)
- R: set.seed(42)
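On the Python side, several random number generators may be in play besides NumPy's. The sketch below shows one way to seed them together; the helper name set_seed is hypothetical, and seeding Python's random module and PyTorch is an assumption that goes beyond the np.random.seed(42) call listed above.

```python
import random


def set_seed(seed=42):
    """Seed Python's RNG and, when they are installed, NumPy and PyTorch.

    Hypothetical helper for illustration; not part of the repository.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # the seed call documented in this README
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # assumption: PyTorch is seeded the same way
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed
```

Seeding alone may not make GPU training bit-for-bit reproducible, since some CUDA operations are nondeterministic by default.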
If you use this code in your research, please cite:
@article{zhao2025udip,
title={Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps},
author={Zhao, Xingzhong and Xie, Ziqian and He, Wei and Fornage, Myriam and Zhi, Degui},
journal={medRxiv},
year={2025},
doi={10.1101/2025.07.04.25330856}
}
- Xingzhong Zhao - xingzhong.zhao@uth.tmc.edu
Keywords: white matter, fractional anisotropy, deep learning, GWAS, neuroimaging, brain imaging, genetics, biomarker
