update cli docs for clarity and accuracy #283
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,17 +1,20 @@ | ||
| # Relative Free Energies with the OpenFE CLI | ||
| # Binding Free Energies with the OpenFE CLI | ||
|
|
||
| This tutorial will show how to use the OpenFE CLI (Command Line Interface) to calculate | ||
| free energies - with no Python at all! This CLI works for simple setups, but you | ||
| may need to use the Python API for more complicated setups. | ||
| This tutorial demonstrates how to use the OpenFE CLI (Command Line Interface) to calculate free energies - with no Python at all! | ||
|
|
||
| The entire process of running the campaign of simulations is split into 3 | ||
| stages, each of which corresponds to a CLI command: | ||
| The CLI is useful for simple setups, but you may need to use the Python API for more complicated setups. | ||
|
|
||
| 1. Setting up the files necessary to run each of the simulations | ||
| 2. Running the simulations | ||
| 3. Gathering the results of the simulations into a single table | ||
| RBFE calculations with ``openfe`` are split into 3 steps: plan, run, and gather, each of which corresponds to a CLI command: | ||
|
|
||
| To work through this tutorial, start out with a fresh directory. You can download the tutorial materials (including these instructions) using the command: | ||
| 1. ``openfe plan-rbfe-network``: Define the systems and prepare simulations to be run. | ||
| 2. ``openfe quickrun``: Run the simulations. | ||
| 3. ``openfe gather``: Gather and analyze simulation results to generate a table of free energies. | ||
|
|
||
| ## 0. Collect input files | ||
|
|
||
| To work through this tutorial, start out with a fresh directory. | ||
|
|
||
| You can download the tutorial materials (including these instructions) using the command: | ||
|
|
||
| ```bash | ||
| openfe fetch rbfe-tutorial | ||
|
|
@@ -23,46 +26,43 @@ Then when you run `ls`, you should see that your directory has: | |
| - `python_tutorial.ipynb`: a notebook detailing how to do this analysis using the Python API, instead of the CLI shown here. | ||
| - `tyk2_ligands.sdf` and `tyk2_protein.pdb` : files containing the molecules we'll use in this tutorial. | ||
|
|
||
| ## Setting up the campaign | ||
| ## 1. Set up the campaign | ||
|
|
||
| The CLI makes setting up the simulation very easy - it's just a single CLI | ||
| command. There are separate commands for relative binding free energy (RBFE) | ||
| and relative hydration free energy setups (RHFE). | ||
| command. | ||
| There are separate commands for relative binding free energy (RBFE) and relative hydration free energy setups (RHFE). | ||
|
|
||
| For RBFE campaigns, the relevant command is `openfe plan-rbfe-network`. For | ||
| RHFE, the command is `openfe plan-rhfe-network`. They work mostly the same, | ||
| except that the RHFE planner does not take a protein. In this tutorial, we'll | ||
| do an RBFE calculation. The only difference for RHFE is in the setup stage - | ||
| running the simulations and gathering the results are the same. | ||
| For RBFE campaigns, the relevant command is `openfe plan-rbfe-network`. | ||
| For RHFE, the command is `openfe plan-rhfe-network`. | ||
| They work mostly the same, except that the RHFE planner does not take a protein. | ||
| In this tutorial, we'll perform an RBFE calculation. | ||
| The only difference for RHFE is in the setup stage - running the simulations and gathering the results are the same. | ||
|
|
||
| With the single command: | ||
| The single command: | ||
|
|
||
| ```bash | ||
| openfe plan-rbfe-network -M tyk2_ligands.sdf -p tyk2_protein.pdb -o network_setup --n-protocol-repeats 1 | ||
| openfe plan-rbfe-network -M tyk2_ligands.sdf -p tyk2_protein.pdb -o network_setup/ --n-protocol-repeats 1 | ||
| ``` | ||
|
|
||
| we do the following: | ||
| performs the following steps: | ||
|
|
||
| - Read all the ligands from the SDF by giving | ||
| the option `-M tyk2_ligands.sdf`. You can also use `-M` with a directory, and | ||
| it will load all molecules found in any SDF or MOL2 file in that directory. | ||
| - Read all the ligands from the SDF by giving the option `-M tyk2_ligands.sdf`. | ||
| You can also use `-M` with a directory, and it will load all molecules found in any SDF or MOL2 file in that directory. | ||
| - Pass a PDB of the protein target (TYK2) with `-p tyk2_protein.pdb`. | ||
| - Instruct `openfe` to output files into a directory called `network_setup` | ||
| with the `-o network_setup` option. | ||
| - Instruct `openfe` to only run one repeat of the alchemical simulation per | ||
| `quickrun` call using `--n-protocol-repeats 1`. | ||
| **Note:** `openfe`'s default behaviour is to use three | ||
| repeats to calculate the uncertainty (i.e. standard deviation) in an estimate. When | ||
| setting `--n-protocol-repeats 1`, you must execute the transformation multiple times - at minimum 2, but best practie is 3 independent repeats. | ||
| - Create transformation JSONs, stored in the directory `network_setup/`, that contain all information needed to run simulations with `openfe quickrun`. | ||
| - Instruct `openfe` to only run one repeat of the alchemical simulation per `quickrun` call using `--n-protocol-repeats 1`. | ||
|
|
||
| **Note:** `openfe`'s default behaviour is to use three repeats to calculate the uncertainty (i.e. standard deviation) in an estimate. | ||
| When setting `--n-protocol-repeats 1`, you must execute the transformation multiple times - at minimum 2, but best practice is 3 independent repeats. | ||
|
|
||
| Planning the campaign may take some time due to the complex series of tasks involved: | ||
|
|
||
| - partial charges are generated for each of the ligands to ensure reproducibility; by default this requires a semi-empirical quantum | ||
| chemical calculation to calculate `am1bcc` charges | ||
| - atom mappings are created and scored based on the perceived difficulty for all possible ligand pairs | ||
| - atom mappings are created and scored based on the perceived difficulty for all possible ligand pairs | ||
| - an optimal network is extracted from all possible pairwise transformations which balances edge redundancy and the total difficulty score of the network | ||
|
|
||
| The partial charge generation can take advantage of multiprocessing which offers a significant speed-up, you can specify | ||
| The partial charge generation can take advantage of multiprocessing, which offers a significant speed-up. You can specify | ||
| the number of processors available using the `-n` flag: | ||
|
|
||
| ```bash | ||
|
|
@@ -88,64 +88,80 @@ network_setup | |
| ... | ||
| ``` | ||
|
|
||
| The `ligand_network.graphml` file describes the atom mappings between the | ||
| ligands. We can visualize it with the `openfe view-ligand-network` command: | ||
| The `ligand_network.graphml` file describes the network of ligands connected by atom mappings. | ||
|
|
||
| We can visualize this network with the `openfe view-ligand-network` command: | ||
|
|
||
| ```bash | ||
| openfe view-ligand-network network_setup/ligand_network.graphml | ||
| ``` | ||
|
|
||
| This opens an interactive viewer. You can move the ligand names around to get a | ||
| better view of the structure, and if you click on the edge, you will see the | ||
| This opens an interactive viewer. | ||
| You can move the ligand names around to get a better view of the structure, and if you click on the edge, you will see the | ||
| mapping for that edge. | ||
|
|
||
| The files that describe each individual simulation we will run are located within | ||
| `network_setup/transformations/`. Each JSON file represents a single alchemical | ||
| leg to run and contains all the necessary information to run that leg. | ||
| Filenames indicate ligand names as taken from the SDF; for example, the file | ||
| `rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json` is the leg | ||
| associated with the transformation of the ligand `lig_ejm_31` into `lig_ejm_42` | ||
| while in complex with the protein. | ||
|
|
||
| A single RBFE between a pair of ligands requires running two legs of an alchemical cycle (JSON files): | ||
| one for the ligand in solvent, and one for the ligand complexed with the | ||
| protein. The results from these two simulations can then be combined to obtained a single $\Delta\Delta G$ relative binding free energy value. | ||
|
|
||
| Note that this specific setup makes a number of choices for you. All of | ||
| these choices can be customized in the Python API. Here are the specifics on | ||
| how these simulation are set up: | ||
|
|
||
| 1. LOMAP is used to generate the atom mappings between ligands, with a | ||
| 20-second timeout, core-core element changes disallowed, and max3d set to 1. | ||
| 2. The network is a minimal spanning tree, with the default LOMAP score used to | ||
| score the mappings. | ||
| 3. Solvent is water with NaCl at an ionic strength of 0.15 M (neutralized) with a | ||
| minimum distance of 1.2 nm from the solute to the edge of the box. | ||
| The files that describe each individual simulation we will run are located within `network_setup/transformations/`. | ||
| Each JSON file represents a single alchemical leg to run and contains all the necessary information to run that leg. | ||
| Filenames indicate ligand names as taken from the SDF; for example, the file `rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json` is the leg associated with the transformation of the ligand `lig_ejm_31` into `lig_ejm_42` while in complex with the protein. | ||
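
As a rough illustration of this naming convention, the sketch below splits a transformation filename into its two ligand names and the leg type using only POSIX shell parameter expansion. The `describe_leg` helper is hypothetical (it is not part of `openfe`), and it assumes the `rbfe_<ligandA>_<leg>_<ligandB>_<leg>.json` pattern seen in the filenames in this tutorial; ligand names that themselves contain `_complex_` or `_solvent_` would confuse it.

```bash
# Hypothetical helper, not part of openfe: describe one transformation file.
# Assumes names of the form rbfe_<ligandA>_<leg>_<ligandB>_<leg>.json,
# where <leg> is either "complex" or "solvent".
describe_leg() {
  name=${1##*/}          # drop any leading directories
  name=${name%.json}     # drop the ".json" extension
  name=${name#rbfe_}     # drop the "rbfe_" prefix
  case $name in
    *_complex) leg=complex ;;
    *_solvent) leg=solvent ;;
    *) echo "unrecognized filename: $1" >&2; return 1 ;;
  esac
  name=${name%_$leg}               # drop the trailing leg label
  ligand_a=${name%%_${leg}_*}      # text before the first "_<leg>_"
  ligand_b=${name#*_${leg}_}       # text after the first "_<leg>_"
  echo "$ligand_a -> $ligand_b ($leg)"
}

describe_leg network_setup/transformations/rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
# prints: lig_ejm_31 -> lig_ejm_42 (complex)
```
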
|
|
||
| A single RBFE between a pair of ligands requires running two legs of an alchemical cycle (JSON files) - one for the ligand in solvent, and one for the ligand complexed with the | ||
| protein. | ||
| The results from these two simulations can then be combined in the next step (``openfe gather``) to obtain a single $\Delta\Delta G$ relative binding free energy value. | ||
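
Written out for a transformation from ligand $A$ to ligand $B$, the two legs close a thermodynamic cycle. The subscripts below are labels chosen here for illustration, not names used by `openfe`:

$$
\Delta\Delta G_{\mathrm{bind}}(A \to B) = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B) = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
$$

Under this sign convention, a negative $\Delta\Delta G_{\mathrm{bind}}(A \to B)$ means ligand $B$ is predicted to bind more strongly than ligand $A$.
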
|
|
||
| Note that this specific setup makes a number of choices for you, from filenames to default values. | ||
| All of these choices can be customized in the Python API. | ||
| Here are the specifics on how these simulations are set up: | ||
|
|
||
| 1. **kartograf** is used to generate the atom mappings between ligands. | ||
| 2. The ligand network is a minimal spanning tree, with the default LOMAP scorer used to score the mappings. | ||
| 3. Solvent is water with NaCl at an ionic strength of 0.15 M (neutralized) with a minimum distance of 1.2 nm from the solute to the edge of the box. | ||
| 4. The protocol used is OpenFE's OpenMM-based Hybrid Topology RFE protocol, with [default settings](https://docs.openfree.energy/en/stable/reference/api/openmm_rfe.html#protocol-settings). | ||
|
|
||
| ## Customize your campaign setup | ||
| ### Optional step: Customize your campaign setup | ||
|
|
||
| OpenFE contains many different options and methods for setting up a simulation campaign. | ||
| The options can be easily accessed and modified by providing a settings | ||
| file in the `.yaml` format. | ||
| Let's assume you want to exchange the LOMAP atom mapper with the Kartograf | ||
| atom mapper, the Minimal Spanning Tree | ||
| Network Planner with the Maximal Network Planner and the am1bcc charge method with the am1bccelf10 version from openeye, | ||
| then you could do the following: | ||
| While less flexible than using the API, some options can be modified by providing a settings file in the `.yaml` format. | ||
|
|
||
| The default settings, represented in YAML format, are as follows: | ||
|
|
||
| ``` yaml | ||
| mapper: kartograf | ||
| settings: | ||
| atom_max_distance: 0.95 | ||
| atom_map_hydrogens: true | ||
| map_hydrogens_on_hydrogens_only: true | ||
| map_exact_ring_matches_only: true | ||
| allow_partial_fused_rings: true | ||
| allow_bond_breaks: false | ||
|
|
||
| network: | ||
| method: generate_minimal_spanning_network | ||
|
|
||
| partial_charge: | ||
| method: am1bcc | ||
| settings: | ||
| off_toolkit_backend: ambertools | ||
| number_of_conformers: None | ||
| nagl_model: None | ||
|
|
||
| ``` | ||
|
|
||
| Let's assume you want to exchange the kartograf atom mapper with the LOMAP atom mapper, the Minimal Spanning Tree | ||
| Network Planner with the Maximal Network Planner and the am1bcc charge method with [OpenFF NAGL](https://docs.openforcefield.org/projects/nagl/): | ||
|
Member
Jumping into 1/2/3 seems to be missing something here. Maybe in this sentence you should add something like "to achieve this, you would do the following N steps:"
||
|
|
||
| 1. Provide a file like `settings.yaml` with the desired changes: | ||
|
|
||
| ```yaml | ||
| mapper: | ||
| method: kartograf | ||
| method: lomap | ||
|
|
||
| network: | ||
| method: generate_maximal_network | ||
|
|
||
| partial_charge: | ||
| method: am1bccelf10 | ||
| method: nagl | ||
| settings: | ||
| off_toolkit_backend: openeye | ||
| nagl_model: null # null specifies the use of the latest nagl model | ||
| ``` | ||
|
|
||
| 2. Plan your rbfe network with an additional `-s` flag for passing the settings: | ||
|
|
@@ -160,7 +176,7 @@ openfe plan-rbfe-network -M tyk2_ligands.sdf -p tyk2_protein.pdb -o network_setu | |
| RBFE-NETWORK PLANNER | ||
| ______________________ | ||
|
|
||
| Parsing in Files: | ||
| Parsing in Files: | ||
| Got input: | ||
| Small Molecules: SmallMoleculeComponent(name=lig_ejm_31) SmallMoleculeComponent(name=lig_ejm_42) SmallMoleculeComponent(name=lig_ejm_43) SmallMoleculeComponent(name=lig_ejm_46) SmallMoleculeComponent(name=lig_ejm_47) SmallMoleculeComponent(name=lig_ejm_48) SmallMoleculeComponent(name=lig_ejm_50) SmallMoleculeComponent(name=lig_jmc_23) SmallMoleculeComponent(name=lig_jmc_27) SmallMoleculeComponent(name=lig_jmc_28) | ||
| Protein: ProteinComponent(name=) | ||
|
|
@@ -176,75 +192,38 @@ Using Options: | |
| n_protocol_repeats=1 (1 simulation repeat(s) per transformation) | ||
| ``` | ||
|
|
||
| That concludes the straightforward process of tailoring your OpenFE setup to your specifications. | ||
| Additionally, we've provided a snippet for generating YAML files with | ||
| various of the current options for your convenience. | ||
|
|
||
| Option Examples: | ||
|
|
||
| ```yaml | ||
| mapper: | ||
| method: lomap | ||
| # method: kartograf | ||
|
|
||
| network: | ||
| method: generate_minimal_spanning_network | ||
| # method: generate_radial_network | ||
| # method: generate_maximal_network | ||
| # method: generate_minimal_redundant_network | ||
|
|
||
| partial_charge: | ||
| method: am1bcc | ||
| # method: am1bccelf10 | ||
| # settings: | ||
| # off_toolkit_backend: openeye # required for the am1bccelf10 method | ||
| ``` | ||
|
|
||
| **Customize away!** | ||
| To see all settings customizable by YAML input, run `openfe plan-rbfe-network -h`. | ||
|
|
||
| ## Running the simulations | ||
| ## 2. Run the simulations | ||
|
|
||
| For this tutorial, we have precalculated data that you can load, since | ||
| running the simulations can take a long time. However, you could, in principle, | ||
| run each simulation on your local machine. | ||
| For this tutorial, we have precalculated data that you can load, since running the simulations can take a long time. | ||
| However, you could, in principle, run each simulation on your local machine. | ||
|
|
||
| You can run each leg individually by using the `openfe quickrun` command. It | ||
| takes a transformation JSON as input, and the flags `-o` to give the final | ||
| output JSON file and `-d` for the directory where simulation results should be | ||
| stored. For example, | ||
| You can run each leg individually by using the `openfe quickrun` command: | ||
|
|
||
| ```bash | ||
| openfe quickrun path/to/transformation.json -o results.json -d working-directory | ||
| ``` | ||
|
|
||
| where `path/to/transformation.json` is the path to one of the files created above. | ||
| where | ||
|
|
||
| - `path/to/transformation.json` is the path to one of the transformation files created by ``openfe plan-rbfe-network`` in the prior step | ||
| - `-o results.json` sets the final output JSON file, and `-d working-directory` sets the directory where simulation results should be stored. | ||
|
|
||
| When running a complete network of simulations, it is important to ensure that | ||
| the file name for the result JSON and name of the working directory are | ||
| different for each leg and each repeat, otherwise you'll overwrite results. We recommend doing | ||
| that with something like the following, which uses the fact that the JSON files | ||
| in `network_setup/transformations/` have unique names, and creates directories | ||
| and result JSON files based on those names. To run all legs sequentially (not | ||
| recommended) you could do something like: | ||
| To run one simulation from the tutorial data, a command might look like: | ||
|
|
||
| ```bash | ||
| # this will take a very long time! don't actually do it! | ||
| for file in network_setup/transformations/*.json; do | ||
| relpath=${file:30} # strip off "network_setup/transformations/" | ||
| dirpath=${relpath%.*} # strip off final ".json" | ||
| # loop over three repeats | ||
| for repeat in {1..3}; do | ||
| openfe quickrun $file -o results/repeat${repeat}/$relpath -d results/repeat${repeat}/$dirpath | ||
| done | ||
| done | ||
| openfe quickrun transformations/rbfe_lig_ejm_31_solvent_lig_ejm_42_solvent.json -o results/rbfe_lig_ejm_31_solvent_lig_ejm_42_solvent.json -d results/rbfe_lig_ejm_31_solvent_lig_ejm_42_solvent/ | ||
| ``` | ||
|
|
||
| In practice, you probably want to submit these to a queue. In that case, you'll | ||
| want to create a new job script for each simulation JSON file, and the core of | ||
| that job script will be to run the `openfe quickrun` command above. | ||
| When running a complete network of simulations, it is important to ensure that the file name for the result JSON and the name of the working directory are different for each leg and each repeat; otherwise you'll overwrite results. | ||
| We recommend doing this programmatically, such as the example below, which uses the fact that the JSON files in `network_setup/transformations/` have unique names, and creates directories | ||
|
Member
I don't think it's a good idea to just have the slurm example be the one that shows this off. Partly this is because a lot of folks just don't have any clue about slurm. I would re-add the bash example that demonstrates how to do this and loop over all the files.
||
| and result JSON files based on those names. | ||
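
A minimal sketch of such a loop, adapted from the bash example this PR removes, is shown here as a dry run: it only prints the `openfe quickrun` commands rather than executing them. The `print_quickrun_commands` function name and the `results/repeat<N>/` layout are illustrative choices, not something `openfe` requires.

```bash
# Dry run: print one `openfe quickrun` command per leg and per repeat.
# Nothing is executed here; the commands are only written to stdout.
print_quickrun_commands() {
  transformations_dir=$1   # e.g. network_setup/transformations
  n_repeats=$2             # e.g. 3 independent repeats
  for file in "$transformations_dir"/*.json; do
    [ -e "$file" ] || continue          # skip if the glob matched nothing
    relpath=${file##*/}                 # strip off the leading directories
    dirpath=${relpath%.json}            # strip off the final ".json"
    repeat=1
    while [ "$repeat" -le "$n_repeats" ]; do
      # unique result file and working directory for each leg and repeat
      echo "openfe quickrun $file -o results/repeat${repeat}/$relpath -d results/repeat${repeat}/$dirpath"
      repeat=$((repeat + 1))
    done
  done
}

print_quickrun_commands network_setup/transformations 3
```

Piping the output to `sh` would run every leg sequentially, which takes a very long time for a full network; on a cluster, each printed command would instead go into its own job script.
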
|
|
||
| In practice, you probably want to submit these to an HPC queue. | ||
| In that case, you'll want to create a new job script for each simulation JSON file, and the core of that job script will be to run the `openfe quickrun` command above. | ||
|
|
||
| Details of what information is needed in that job script will depend on your | ||
| computing center. Here is an example of a very simple script that will create | ||
| Details of what information is needed in that job script will depend on your computing center, but below is an example of a very simple script that will create | ||
| and submit a job script for the simplest SLURM use case: | ||
|
|
||
| ```bash | ||
|
|
@@ -260,29 +239,23 @@ for file in network_setup/transformations/*.json; do | |
| done | ||
| ``` | ||
|
|
||
| Note that the exact structure of the results directory is not important, as | ||
| long as all result JSON files are contained within a single directory tree. The | ||
| approach listed here is what was used for the example results that we'll | ||
| download in the next section. | ||
| The approach listed here is what was used for the example results that we'll download in the next section. | ||
|
|
||
| ## Gathering the results | ||
| ## 3. Gather the results | ||
|
|
||
| To get example data, use the following commands: | ||
| To get example simulation output data, use the following commands: | ||
|
|
||
| ```bash | ||
| openfe fetch rbfe-tutorial-results | ||
| tar xzf rbfe_results.tar.gz | ||
| ``` | ||
|
|
||
| This will create a directory called `results/` that contains files with the file | ||
| structure you would get from running the calculations as above. The result JSON | ||
| files are the actual results of a simulation. Other files that are generated | ||
| during the simulation (such as detailed simulation information) have been | ||
| replaced by empty files to keep the size smaller. The structure looks something | ||
| like this: | ||
| openfe | ||
| <!-- take the top lines from `tree results` --> | ||
| This will create a directory called `results/` that contains files with the file structure you would get from running the calculations as above. | ||
| The result JSON files are the actual results of a simulation. | ||
| To keep this example data a reasonable size, files typically generated during the simulation (such as detailed simulation information) have been replaced by empty files. | ||
| The structure should look something like this: | ||
|
|
||
| <!-- take the top lines from `tree results` --> | ||
| ```text | ||
| results | ||
| ├── replicate_0 | ||
|
|
@@ -323,21 +296,18 @@ results | |
| ... | ||
| ``` | ||
|
|
||
| The JSON results file contains not only the calculated $\Delta G$, and | ||
| uncertainty estimate, but also important metadata about what happened during | ||
| the simulation. In particular, it will contain information about any errors or | ||
| failures that occurred -- these errors will not cause the entire campaign to | ||
| fail, and will be recorded so you can later analyze what went wrong. | ||
| The JSON results file contains not only the calculated $\Delta G$, and uncertainty estimate, but also important metadata about what happened during the simulation. | ||
| In particular, it will contain information about any errors or failures that occurred -- these errors will not cause the entire campaign to fail, and will be recorded so you can later analyze what went wrong. | ||
|
|
||
| To gather all the $\Delta G$ estimates into a single file, use the `openfe | ||
| gather` command from within the working directory used above: | ||
| To gather all the $\Delta G$ estimates into a single file, use the `openfe gather` command from within the working directory used above: | ||
|
|
||
| ```bash | ||
| openfe gather results/ --report dg -o final_results.tsv | ||
| ``` | ||
|
|
||
| This will write out a tab-separated table of results where the results | ||
| reported are controlled by the `--report` option: | ||
| Note that if you have multiple results directories, you can pass multiple directories, e.g. ``openfe gather results_0/ results_1/``. | ||
|
|
||
| This will write out a tab-separated table of results where the results reported are controlled by the `--report` option: | ||
|
|
||
| - `dg` (default) reports the ligand and the results are the maximum | ||
| likelihood estimate of its absolute free energy, and the associated | ||
|
|
||
move this to the sentence on line 93? It feels a bit jarring to just have the end of a sentence after the method is being shown.