Official Python implementation for "Targeted Cause Discovery with Data-Driven Learning". This repository provides inference codes for TCD-DL model.
conda create --name tcd python=3.10
conda activate tcd
pip install -r requirements.txt
- To use CUDA, download PyTorch as
pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
.
We provide jupyter notebook file for human cell analysis conducted in our paper
- Download all files in Download link and open
leukemia_analysis.ipynb
.
The files above contain the cause scores obtained by our model. You can obtain the scores using your own computing resources by following the steps below.
We release preprocessed Perturb-seq dataset in Google Drive. Download the k562 folder from the link, and set DATAPATH
in args.py
to point to the directory containing the k562 folder.
Obtain the predicted cause scores on the Perturb-seq dataset by running
python -B run.py --data_file human
- You can select target variable by
--target_idx [int]
. The gene list is provided in the jupyter notebook above. - Results will be saved at
SAVEPATH
inargs.py
- Write data loading codes at
data/load.py
-load_test_data
function. You need to load observation data matrix (n_obs x n_var), intervention matrix (n_obs x n_var, boolean), and a causal adjacency matrix (n_var x n_var). - Run the code
python -B run.py --data_file custom
The default model is well-suited for UMI count data. You can also test other types of trained models, including those trained on imputed data or continuous data:
- Download the model checkpoints from the Google Drive and locate the folders at
./ckpt
. - Run the
run.py
. Set--model_type continuous
to use model trained on continuous data, and set--model_type impute
to use model trained on imputed data.
We release training codes and simulator codes in ./train
. Due to storage limitations, we are currently not distributing the training data for our model. (You can generate the data by running our simulation codes.) Please use the training code for reference only.
@article{kim2024targeted,
title={Targeted Cause Discovery with Data-Driven Learning},
author={Kim, Jang-Hyun and Gibbs, Claudia Skok and Yun, Sangdoo and Song, Hyun Oh and Cho, Kyunghyun},
journal={arXiv preprint arXiv:2408.16218},
year={2024}
}