We are proud to announce that our work on SENA-discrepancy-VAE has been accepted for publication at the International Conference on Learning Representations (ICLR) 2025 (link).
SENA-discrepancy-VAE is a Causal Representation Learning (CRL) model designed to predict the impact of genomic and drug perturbations on cellular function by mapping biological processes to latent causal factors. The model improves interpretability by leveraging biological processes (BPs) as prior knowledge, allowing the prediction of unseen perturbations while inferring biologically meaningful causal factors. This method extends the discrepancy-VAE (https://github.com/uhlerlab/discrepancy_vae).
---
Clone the repository:

```bash
git clone https://github.com/ML4BM-Lab/SENA
# enter the repository folder
cd SENA
```
---
The model has been evaluated on a large-scale Perturb-seq dataset, which profiles gene expression changes in leukemia cells under genetic perturbations. We use the preprocessed version from CPA (https://github.com/facebookresearch/CPA), which we provide in the data folder as a zip file:
```bash
wget https://dl.fbaipublicfiles.com/dlp/cpa_binaries.tar
tar -xvf cpa_binaries.tar
cp Norman2019_raw.h5ad data/.
rm cpa_binaries.tar
```

Alternatively, we have generated a reduced version of this dataset, which you can find as a .zip file in the data folder. This will be the dataset used by default.

```bash
# this will generate a Norman2019_reduced.h5ad file
unzip Norman2019_reduced.zip
```
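As a quick sanity check before training, you can inspect the resulting `.h5ad` file. The snippet below is only a sketch and assumes a Python environment with the `anndata` package available (for instance, inside the docker container described in the next section):

```bash
# print a summary (dimensions and annotations) of the reduced dataset
python3 -c "import anndata; print(anndata.read_h5ad('data/Norman2019_reduced.h5ad'))"
```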
---
This project runs in a docker container. Run the following commands to build the image and start the container:
```bash
# move to the dockerfile folder (important)
cd dockerfile

# build the image
docker build -t <image_name> .

# move to the root directory
cd ..

# run the container, creating a virtual link to your SENA folder
docker run -dt -v .:/wdir/ --gpus all --name <container_name> <image_name>

# access the docker container (we will assume it's called sena_vae)
docker exec -it sena_vae bash
```
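Optionally, you can verify that the GPU is visible from inside the container before training. This is only a sanity check and assumes the NVIDIA container runtime and PyTorch are available in the image:

```bash
# inside the container: list visible GPUs and check that PyTorch detects them
nvidia-smi
python3 -c "import torch; print(torch.cuda.is_available())"
```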
---
The metrics are reported to a local mlflow server. To run it:
```bash
# assuming you are placed within the SENA folder and port 5678 is unused
docker run -dt -p 5678:5678 -v .:/wdir/ --gpus all --name <container_name> <image_name>

# access the docker container on a second terminal
docker exec -it sena_vae bash

# start the server
mlflow ui --host 0.0.0.0 --port 5678
```
You can now access the server through your browser at `localhost:5678`.
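To verify from the command line that the server is up (assuming `curl` is available on the host), a simple request should return the MLflow UI page:

```bash
# should print the beginning of the mlflow UI landing page
curl -s localhost:5678 | head
```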
Note: these commands were tested using docker version 27.2.0.

---
Now you can run SENA-discrepancy-VAE with:
```bash
python3 src/sena_discrepancy_vae/main.py
```
---
Finally, you can retrieve the metrics of the trained model:
```bash
# You can choose the folds to evaluate (e.g. only double)
python3 src/sena_discrepancy_vae/inference.py --savedir results/example --evaluation train test double
```
The script accepts several command-line arguments to customize the training process. Below is a detailed list of all available parameters (a worked example follows the list):

- `-s`, `--savedir` (str): Directory to save the results (the folder will be created automatically). Default: `'./results/'`.
- `--device` (str): Device to run the training on (cpu or cuda). Default: `'cuda:0'`.
- `--model` (str): Model to use for training, either `'sena'` or `'original'`. Default: `'sena'`.
- `--dataset` (str): Name of the dataset used. Default: `'Norman2019_reduced'`.
- `--name` (str): Name of the run, used for organizing output files. Default: `'example'`.
- `--log` (bool): Whether to use a local mlflow server to visualize training. Default: `False`.
- `--seed` (int): Random seed for reproducibility. Default: `42`.
- `--epochs` (int): Number of training epochs. Default: `100`.
- `--batch_size` (int): Batch size for training. Default is set within the script (`32`), but can be modified.
- `--lr` (float): Learning rate. Default is set within the script (`1e-3`), but can be modified.
- `--sena_lambda` (float): SENA λ value. Default: `0`.
- `--latdim` (int): Latent dimension size, equal to the number of perturbations/knockout types. Default: `105`.
- `--grad_clip` (bool): Whether to apply gradient clipping during training. Default: `False`.
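For illustration only, a training run that sets some of these arguments explicitly could look like the sketch below; the values shown are simply the documented defaults, and the exact set of accepted flags is defined in `main.py`:

```bash
# illustrative training run with explicit (default) argument values
python3 src/sena_discrepancy_vae/main.py \
    --savedir ./results/ \
    --device cuda:0 \
    --model sena \
    --dataset Norman2019_reduced \
    --name example \
    --seed 42 \
    --epochs 100 \
    --latdim 105
```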
For the ablation studies, refer to `README_ablation.md` in the `src` folder.
If you use SENA-discrepancy-VAE in your research, please cite our work:
BibTeX:
```bibtex
@inproceedings{deinterpretable,
  title={Interpretable Causal Representation Learning for Biological Data in the Pathway Space},
  author={de la Fuente Cede\~no, Jesus and Lehmann, Robert and Ruiz-Arenas, Carlos and Voges, Jan and Mar\'{\i}n-Go\~ni, Irene and de Morentin, Xabier Martinez and Gomez-Cabrero, David and Ochoa, Idoia and Tegn\'er, Jesper and Lagani, Vincenzo and others},
  booktitle={The Thirteenth International Conference on Learning Representations}
}
```
This project is licensed under the MIT License. See the LICENSE file for more details.