# AlphaFold3 Workflow

This workflow supports separate execution of the **CPU** and **GPU** steps. It also distributes inference runs across multiple GPU devices using **GNU parallel**.
|
## Steps to Set Up & Execute
|
### 1. Build the Singularity Container
|
Run the following command to build the Singularity container that supports parallel inference runs:

```bash
singularity build alphafold3_parallel.sif docker://ntnn19/alphafold3:latest_parallel_a100_40gb
```
|
**Notes**
- Set `<number_of_inference_job_lists>` to `1` for local runs.
- For SLURM runs, set `<number_of_inference_job_lists>` to `n`, where `n` is the number of nodes with GPUs.
- Make sure to download the required [AlphaFold3 databases](https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md#obtaining-genetic-databases) and [weights](https://forms.gle/svvpY4u2jsHEwWYS6) before proceeding.
|
### 2. Clone This Repository
|
Clone this repository into your project directory. After cloning, your project structure should look like this:
|
```text
.                      <-- your current location
├── dataset_1
│   ├── af_input
│   ├── data_pipeline
│   └── <your_input_json_file>
├── example
│   └── example.json
├── README.md
└── workflow
    ├── scripts
    │   ├── create_job_list.py
    │   ├── parallel.sh
    │   └── split_json_and_create_job_list.py
    └── Snakefile
```

An example JSON file is available at `example/example.json`.
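For orientation, AlphaFold3 inputs follow the upstream AlphaFold3 input format; `example/example.json` in this repo is the authoritative reference. A minimal single-protein input looks approximately like this (the values below are placeholders, not taken from this repository):

```json
{
  "name": "my_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVLSPADKTNVKAAW"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
```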
|
### 3. Create and Activate the Snakemake Environment
|
Install mamba or micromamba if not already installed, then create and activate the environment from the repository's `environment.yml`:

```bash
mamba env create -p $(pwd)/env -f environment.yml
```
```bash
mamba activate $(pwd)/env
```
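Use the `environment.yml` shipped with this repo. As a rough sketch of what such a file typically contains for a Snakemake ≥8 workflow with SLURM support (the names and pins below are assumptions, not taken from the repository):

```yaml
# Hypothetical sketch -- defer to the repository's actual environment.yml.
name: af3-workflow
channels:
  - conda-forge
  - bioconda
dependencies:
  - python>=3.11
  - snakemake>=8
  - snakemake-executor-plugin-slurm   # required for `--executor slurm` in Snakemake 8+
```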
|
### 4. Run the Workflow

**Dry run (local)**
```bash
snakemake --use-singularity \
  --config af3_container=<path_to_your_alphafold3_container> \
  --singularity-args '--nv -B <alphafold3_weights_dir>:/root/models -B $(pwd)/<dataset_directory>/af_input:/root/af_input -B $(pwd)/<dataset_directory>/af_output:/root/af_output -B <path_to_alphafold3_db_directory>:/root/public_databases' \
  -c all \
  --set-scatter split=<number_of_inference_job_lists> -n
```

**Dry run (SLURM)**
```bash
snakemake --use-singularity \
  --config af3_container=<path_to_your_alphafold3_container> \
  --singularity-args '--nv -B <alphafold3_weights_dir>:/root/models -B $(pwd)/<dataset_directory>/af_input:/root/af_input -B $(pwd)/<dataset_directory>/af_output:/root/af_output -B <path_to_alphafold3_db_directory>:/root/public_databases' \
  -j 99 \
  --executor slurm \
  --set-scatter split=<number_of_inference_job_lists> -n
```

**Local run**
```bash
snakemake --use-singularity \
  --config af3_container=<path_to_your_alphafold3_container> \
  --singularity-args '--nv -B <alphafold3_weights_dir>:/root/models -B $(pwd)/<dataset_directory>/af_input:/root/af_input -B $(pwd)/<dataset_directory>/af_output:/root/af_output -B <path_to_alphafold3_db_directory>:/root/public_databases' \
  -c all \
  --set-scatter split=<number_of_inference_job_lists>
```
|
**SLURM run**
```bash
snakemake --use-singularity \
  --config af3_container=<path_to_your_alphafold3_container> \
  --singularity-args '--nv -B <alphafold3_weights_dir>:/root/models -B $(pwd)/<dataset_directory>/af_input:/root/af_input -B $(pwd)/<dataset_directory>/af_output:/root/af_output -B <path_to_alphafold3_db_directory>:/root/public_databases' \
  -j 99 \
  --executor slurm \
  --set-scatter split=<number_of_inference_job_lists>
```
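After a run completes, each input should have produced a corresponding directory under `af_output`. A small hypothetical helper to spot inputs with missing outputs (the function and the one-output-directory-per-input naming assumption are mine, not part of the workflow):

```python
from pathlib import Path

def missing_outputs(af_input_dir, af_output_dir):
    """Return input JSON stems that have no matching output directory.

    Assumes one output subdirectory per input JSON, named after the job --
    adapt to your actual output layout if it differs.
    """
    inputs = {p.stem for p in Path(af_input_dir).glob("*.json")}
    outputs = {p.name for p in Path(af_output_dir).iterdir() if p.is_dir()}
    return sorted(inputs - outputs)

# Usage: print(missing_outputs("dataset_1/af_input", "dataset_1/af_output"))
```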