Targeted-Cause-Discovery

Official Python implementation for "Targeted Cause Discovery with Data-Driven Learning". This repository provides inference codes for TCD-DL model.

Setup

conda create --name tcd python=3.10
conda activate tcd
pip install -r requirements.txt

To use CUDA, download PyTorch as pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116.

Human cell analysis

We provide jupyter notebook file for human cell analysis conducted in our paper

Download all files in Download link and open leukemia_analysis.ipynb.

The files above contain the cause scores obtained by our model. You can obtain the scores using your own computing resources by following the steps below.

1. Download data (Perturb-seq)

We release preprocessed Perturb-seq dataset in Google Drive. Download the k562 folder from the link, and set DATAPATH in args.py to point to the directory containing the k562 folder.

2. Run our model

Obtain the predicted cause scores on the Perturb-seq dataset by running

python -B run.py --data_file human

You can select target variable by --target_idx [int]. The gene list is provided in the jupyter notebook above.
Results will be saved at SAVEPATH in args.py

Inference on custom data

Write data loading codes at data/load.py - load_test_data function. You need to load observation data matrix (n_obs x n_var), intervention matrix (n_obs x n_var, boolean), and a causal adjacency matrix (n_var x n_var).
Run the code
```
python -B run.py --data_file custom
```

The default model is well-suited for UMI count data. You can also test other types of trained models, including those trained on imputed data or continuous data:

Download the model checkpoints from the Google Drive and locate the folders at ./ckpt.
Run the run.py. Set --model_type continuous to use model trained on continuous data, and set --model_type impute to use model trained on imputed data.

Training a model

We release training codes and simulator codes in ./train. Due to storage limitations, we are currently not distributing the training data for our model. (You can generate the data by running our simulation codes.) Please use the training code for reference only.

Citation

@article{kim2024targeted,
    title={Targeted Cause Discovery with Data-Driven Learning},
    author={Kim, Jang-Hyun and Gibbs, Claudia Skok and Yun, Sangdoo and Song, Hyun Oh and Cho, Kyunghyun},
    journal={arXiv preprint arXiv:2408.16218},
    year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
ckpt/discrete		ckpt/discrete
data		data
image		image
model		model
train		train
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
args.py		args.py
inference.py		inference.py
requirements.txt		requirements.txt
run.py		run.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Targeted-Cause-Discovery

Setup

Human cell analysis

1. Download data (Perturb-seq)

2. Run our model

Inference on custom data

Training a model

Citation

About

Releases

Packages

Languages

License

snu-mllab/Targeted-Cause-Discovery

Folders and files

Latest commit

History

Repository files navigation

Targeted-Cause-Discovery

Setup

Human cell analysis

1. Download data (Perturb-seq)

2. Run our model

Inference on custom data

Training a model

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages