Ligand parametrization #1
Comments
Hi @AminaMnhr! Everything was run through Parameterize, and no changes were made before or after. Once the ligand was parameterized with that tool, it went into the RBFE part of the protocol. In theory you could run AceForce with different parameters (ligand force field and/or charges), but for now the only results we have are for the reported combination.
Amazing, thank you for the fast answer!
Thanks a lot!
We only set the flags for RESP-PSI4 and for GAFF2 as the force field; the rest is left at default. In the future we plan to run tests with other combinations of parameters, but these runs take some time. If you think that another set of parameters will work better for your system, feel free to test it.
Hi, thanks a lot for your answer! It is very kind of you!
Parameterize with GAFF2 and RESP-PSI4 doesn't take that long. Maybe a couple (or 15) minutes per ligand, I would assume. I think @qsabanes meant that we set Dihedral fitting to "Off" on the web interface.
Thank you both so much for your help, it is definitely clearer now! I will get back to you if I have any questions on the workflow. Thanks a lot!
Hi, I would now like to reproduce the tutorial presented in your GitHub for QuantumbindRBFE: https://github.com/Acellera/quantumbind_rbfe/blob/main/tutorial/tutorial_atm.ipynb
acemd does not implement the ATM force, as you said, so it is my understanding that the minimization, equilibration and production run are done with AToM-OpenMM as presented here: https://github.com/Gallicchio-Lab/AToM-OpenMM
Moreover, the NN is no longer specified in a .yaml file (as for acemd) but in a .cntl file, the required input for AToM-OpenMM.
Sorry for my numerous questions, as I am not yet familiar with the workflow.
Question 1: I have created my environment with htmd, pytorch-cuda, and integrated AToM-OpenMM for the ATM method as well as openmm-ml and nnpops for the NN:
However, in your tutorial you run the equilibration, production and UWHAM analysis with: from atm.rbfe_structprep import rbfe_structprep, from atm.rbfe_production import rbfe_production and from atm.uwham import calculate_uwham.
Question 2: The quantumbindRBFE tutorial seems to work on an input file with the .yaml extension. Is that not supposed to be a .cntl extension, as the simulations are done with AToM-OpenMM?
Question 3: The system equilibration in your tutorial does not include the NN in the parameters. Is this step running without the potential computed by the NN? So the NN is only used to compute the potential during the production step?
Question 4: In the ATM production of the tutorial, you wrote:
Does it also need to be adapted somehow if I want to use the SLURM scheduler for my simulations? I am not sure about the localhost and tmp notations.
Question 5: If I want to integrate the workflow with a SLURM scheduler, you wrote: "We can also create a run.sh script to run the equilibration with a queue system later". I am not sure I understand this either. Does run.sh just contain the SLURM resources and this line, to run the structprep with SLURM? And the same for the production step, I would guess?
Thanks a lot
Amina
Hi @AminaMnhr
Apologies, as part of your issues are on our side. I see that we didn't specify clearly that you need this repository: Acellera-ATM. By installing it with HTMD, you'll have the commands you are missing in your environment, as well as all the required
By the way, to avoid any confusion, Acellera-ATM (and all the runs shown in the paper) were run entirely with OpenMM; there is no need to use Acemd here.
Now, on to the other questions:
Question 2: This will be solved with Acellera-ATM, which uses .yaml files as inputs.
Question 3: Another mistake on my side, I forgot to include it. However, we've tried both scenarios (equilibration with/without NNP, production with NNP) and we didn't observe any effect on performance. In any case, I'll update the tutorial now to also include NNPs in equilibration. You just need to add the same parameters to the config file as for the production step.
Question 4: No, we also use SLURM and we did not have to adapt it. You can keep it as it is.
Question 5: This is just an example including a simple
Hi Adria, thank you, it is much clearer now, thank you so much!
A small clarification: I installed the Acellera-ATM package correctly; however, the command conda install acellera-atm python=3.10 -c acellera -c conda-forge does not install PyTorch with the CUDA packages but the CPU versions of pytorch, torchvision etc. I think this is due to the python=3.10 specification. So I first created an environment with pytorch-cuda, Python 3.12 and CUDA 11.8, and then installed acellera-atm without specifying python=3.10 (otherwise it downgraded my packages to the CPU ones, and torch was no longer detected by my env). I have no idea whether the workflow can work like that with Python 3.12 instead of 3.10, but I will let you know after trying your quantumbindRBFE tutorial.
For Question 2: Just to be sure, does your yaml input file follow the cntl nomenclature, with all the input variables in uppercase (https://www.compmolbiophysbc.org/atom-openmm/atom-control-file)? Does it have the same parameter settings (job settings, alchemical transfer schedule settings, binding site restraint settings)? Or does it follow the acemd yaml nomenclature (https://software.acellera.com/acemd/manual.html) with the same acemd parameter settings?
Thank you so much for your help so far. I will keep you updated on my attempt at the tutorial.
The command I've recently used to create fresh environments with Acellera-ATM and HTMD, as well as all the libraries required to load AceForce as an NNP, is this one: Try it and see if that works for you. Perhaps I'll add this to the README as well.
Question 2: It has the same parameter settings as the AToM-OpenMM control file. We removed some that are not used in our version, but the ones in there are equivalent to those in .cntl files.
Thank you so much for the fast answer! I will try this with the tutorial and will let you know how it goes! Thanks. |
Hi, I tried the tutorial and the commands went well, so the environment should be alright. However, I have some questions about the workflow, as I did not get exactly the same results. Sorry again, as it is quite a lot of questions (hopefully they are quick ones):
Results reproduction
This is far from your value, and also why is the value between production runs not the same? Is that because of the very few replicas?
Workflow
However, I do not understand where 30 comes from, as ligand 1 and ligand 2 in the tutorial have more than 30 atoms. Also, is this parameter supposed to be the number of atoms of ligand 1, or the sum of atoms for both ligands?
Hi @AminaMnhr,
If you followed the tutorial then you only ran 2 samples, which is around 2ns for the total ensemble. This is far from what is needed for your system to converge. The tutorial is set up like this to test that the system runs, but for a proper production you need to run for longer. As you can see in our https://github.com/Acellera/quantumbind_rbfe/tree/main/inputs folder, we run the systems for 400 samples, which corresponds to approximately 70ns for the total ensemble. We have observed convergence with that simulation time in most cases, but there are cases in which you would need more, or in which less is sufficient. Another sign that something is wrong with your simulations is that you have a high error.
Yes, indeed. It is better to run replicas because sometimes you can get outliers. The more replicas you do, the more consistent an average ddG you'll get. This is a time/cost balance that the user has to decide on.
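As an illustration of averaging over independent repeats of the same edge, here is a minimal sketch. The function name `summarize_repeats` and the three ddG values are hypothetical and are not taken from the benchmark or from the Acellera-ATM package.

```python
import statistics


def summarize_repeats(ddg_values):
    """Return (mean, sample std, standard error) for ddG values in kcal/mol."""
    n = len(ddg_values)
    if n < 2:
        raise ValueError("need at least two independent runs to estimate a spread")
    mean = statistics.mean(ddg_values)
    std = statistics.stdev(ddg_values)  # sample standard deviation (n - 1)
    sem = std / n ** 0.5                # standard error of the mean
    return mean, std, sem


# Hypothetical ddG values from three independent runs of one edge
mean, std, sem = summarize_repeats([0.5, 0.9, 0.7])
print(f"ddG = {mean:.2f} +/- {sem:.2f} kcal/mol (std {std:.2f}, n = 3)")
```

A repeat that sits far from the others stands out in the standard deviation, which is one simple way to spot the outliers mentioned above.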
Yes
Exactly, we used the inputs as is
These systems are already equilibrated, you just need to run production on them.
A way to measure the robustness of your setup and calculations is to test the simulations in both directions like you propose. For these benchmarks the edges/ligand pairs are already defined. But I suggest you check out Lomap or Konnektor (or similar) if you want to build your own perturbation network.
Parameterize relabels all ligands with more than 3 characters to MOL. That's why we have to relabel them.
On our inputs we have an automatic approach to select the displacement vector, but you can select it manually. Just try to ensure that the ligand outside of the pocket is more than 10-15 A away from the protein.
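When picking a displacement vector by hand, the distance criterion above can be checked with a few lines of plain Python. This is only a sketch: the function name, the toy coordinates and the 15 Å threshold are illustrative assumptions, and it is not the automatic approach used for the published inputs.

```python
import math


def min_distance_after_displacement(ligand_xyz, protein_xyz, displacement):
    """Smallest ligand-protein atom distance (in Angstrom) after shifting the ligand."""
    dx, dy, dz = displacement
    moved = [(x + dx, y + dy, z + dz) for x, y, z in ligand_xyz]
    return min(math.dist(a, b) for a in moved for b in protein_xyz)


# Toy coordinates in Angstrom (hypothetical, for illustration only)
protein_xyz = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ligand_xyz = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
displacement = (25.0, 0.0, 0.0)  # candidate displacement vector

closest = min_distance_after_displacement(ligand_xyz, protein_xyz, displacement)
far_enough = closest > 15.0  # assumed threshold, upper end of the 10-15 A guideline
print(f"closest contact: {closest:.1f} A, far enough: {far_enough}")
```

In practice the coordinates would come from the prepared structure, and a larger displacement also means a bigger box and slower simulations.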
These are indexes from the ligand .sdf file
Yes with pymol you can do label > rank and the numbers you see correspond to the atom indexes
Yes, a poor selection of restraint atoms can have a large influence on the DDG value. I try to ensure that the ligands pre- and post-equilibration are as similar as possible (no weird torsions or displacements observed)
I think the comments in the tutorial explain it fine:
OK, this should be set to 40, as that is what we did in the paper; we'll change that. It should take into account the number of atoms of the bigger ligand. The issue is that the higher you set the number of neighbours, the slower the calculation. So even if we set it to 40, there are cases where the ligands are bigger. This is something that we still have to evaluate once we have the time.
Thanks a lot for the fast answer! Happy to have the feedbacks today, as I asked a lot. Thanks again!
Okay that is what I was thinking. So the fact that I am not getting the same value is due to the MAX SAMPLES = 2. Similarly, then I should not be worried that my value is far from the one in the tutorial?
Thanks for the answer. However, my question was more about the number of times you executed the equilibration and production for the same pair. Replicas are already run inside the production step itself (with the max samples), but I was wondering whether you would also launch the commands rbfe_structprep(input) and rbfe_production(input) several times for the same pair, and then take the average of the ddG values obtained?
Thanks that is good to know as well! May I ask what would you check after running the simulations in both directions, to know which one to take? Also, are all my pairs for a target supposed to have the same sign accordingly (I remember seeing something like that for FEP practices but not sure) ? Okay I see, are the benchmarks pairs the one defined by Schrodinger directly?
Good to know too thanks! As Parameterize takes an sdf or mol2, which property in the sdf did you change for the relabeling, is that just the first line of the sdf file for the ligand?
Okay thanks a lot! We would like to try quantumbindRBFE on new pairs/new targets, so would it make sense if I put for example the ligand 15 Angstrom away from the furthest protein atom in X dimension?
Also good to know, thanks a lot! Just to make sure, the pre-equilibration is directly the structure.pdb after having built the system, right. Do you check the TYK2_ejm31_jmc28_0.pdb file?
Yep, so for this : config_params["LIGAND1_CM_ATOMS"] = [int(lig1_restr_idx[0] + lig1_atoms[0])]
Thanks! I guess that this parameter also would influence a lot the ddG? Thanks for your help. |
Exactly, if you simulate for longer you should get closer values along the different runs.
Here by replicas I meant independent runs. I can understand this can be confusing since it has the same name.
If you do first A --> B and then B --> A then you should expect that DDG(A-->B) is similar to -DDG(B-->A)
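The forward/reverse relation above can be turned into a small consistency check. This is a hedged sketch: the helper names, the example values and the 1.0 kcal/mol tolerance are assumptions chosen for illustration, not a criterion stated by the authors.

```python
def cycle_hysteresis(ddg_forward, ddg_reverse):
    """Return DDG(A->B) + DDG(B->A); this is zero for perfectly consistent runs."""
    return ddg_forward + ddg_reverse


def is_consistent(ddg_forward, ddg_reverse, tolerance=1.0):
    """True if the two directions agree within `tolerance` kcal/mol (assumed value)."""
    return abs(cycle_hysteresis(ddg_forward, ddg_reverse)) <= tolerance


# Hypothetical results for one ligand pair, in kcal/mol
forward = 0.75   # A --> B
reverse = -0.5   # B --> A
print(cycle_hysteresis(forward, reverse), is_consistent(forward, reverse))
```

A large residual suggests that at least one direction is not converged or that the setup (restraints, displacement) needs another look.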
Ligand name, first line, yes. But if you prepare/parameterize your ligand on your own you should be able to not change it and make it work as well.
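Since the ligand name lives on the first line of the SDF record, relabeling can be sketched without any chemistry toolkit. The function name, the toy record and the new name "L01" are hypothetical; the sketch assumes a single-molecule file.

```python
def rename_sdf_title(sdf_text, new_name):
    """Replace the title (first) line of a single-molecule SDF record."""
    _old_title, newline, rest = sdf_text.partition("\n")
    return new_name + newline + rest


# Toy single-molecule record (hypothetical, atom block omitted for brevity)
sdf_text = (
    "MOL\n"
    "  hypothetical-program\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "$$$$\n"
)
renamed = rename_sdf_title(sdf_text, "L01")
print(renamed.splitlines()[0])
```

For a multi-molecule SDF the title line follows each `$$$$` separator, so the same replacement would have to be applied per record.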
Play around with it. But take into account that the bigger the box --> more atoms --> slower sims
Do you check the TYK2_ejm31_jmc28_0.pdb file? Yes
In general this value is the first atom you select in the index restraints
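The expression quoted in the question can be read as an index offset. The sketch below assumes that `lig1_atoms` holds the indexes of ligand 1 within the full system and that `lig1_restr_idx` holds the restraint atom indexes within the ligand file; both lists and the helper name `ligand_cm_atoms` are hypothetical, and only the `config_params["LIGAND1_CM_ATOMS"]` line comes from the tutorial.

```python
def ligand_cm_atoms(lig_atoms, lig_restr_idx):
    """Map the first restraint atom from ligand-local to system-wide indexing."""
    return [int(lig_restr_idx[0] + lig_atoms[0])]


# Hypothetical values: ligand 1 occupies system indexes 4500..4534
lig1_atoms = list(range(4500, 4535))
# Hypothetical restraint atoms, indexed within the ligand .sdf file
lig1_restr_idx = [12, 7, 3]

config_params = {}
config_params["LIGAND1_CM_ATOMS"] = ligand_cm_atoms(lig1_atoms, lig1_restr_idx)
print(config_params["LIGAND1_CM_ATOMS"])
```

Under these assumptions the first restraint atom is shifted by the system index of the ligand's first atom.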
Iirc jumping from 30 to 40 neighbours increased the computational cost by 30%, but I don't know if this is an exponential, linear or another kind of increase |
Thanks for the answer!!
Okay thanks. So for a same ligand pair, I should run rbfe_structprep and rbfe_production several times then. Do you just re-run rbfe_production() only keeping the same equilibration files, or do you also re-run the equilibration each time? For each time you would repeat independent run, do you change some settings of the yaml file or everything should remain the same?
config_params["LIGAND1_CM_ATOMS"] = [int(lig1_restr_idx[0] + lig1_atoms[0])] Also:
we re-run everything
Same protocol
I am pretty sure that the protocol takes care of it, but maybe @AdriaPerezCulubret can confirm
Is it possible to restart an equilibration or a production for jobs that ended before completing, i.e. to restart the job from the last step before it ended?
The equilibration script does not have checkpointing; production does.
You can run with more than one GPU by modifying the nodefile file and adding the info/ID of the additional GPUs you want to use.
Thank you for the answer! I am trying to reproduce the results for TYK2 and will let you know if I managed to reproduce the benchmark. Some questions for speed of calculations and eq steps:
However, if I want to do the equilibration without reducing the steps, should I just remove those parameters altogether? Those parameters are not in your input.yaml template example, so I was wondering whether I still need to set those keys for the equilibration and assign a value to each. If I do not have those keys in the yaml file, what would the default values be?
Thanks so much. |
Default values should be:
This depends on your GPU, size of the system and amount of time you want to run it
You can reduce the number of samples of course, but I cannot promise that the simulations would be converged. This is very system/transformation dependent
We haven't tested multi-GPU runs. We prefer to run several calculations in parallel
Sadly, I have no experience working with macrocycles and RBFE. Furthermore, I'm not really sure whether AceForce 1.0 is adequate for this kind of ligand, but feel free to test them. At the moment we have only evaluated AceForce with ligands from the JACS/Schrodinger dataset.
Thanks for the answer:
Is that correctly 150000 and not 15000? Are they also the default values you found for minimal convergence of your system?
Hi, I did try to reproduce the benchmark and here are my results:
Edge,ddG_my_value,E,ddG_std,ddG_your_value,ddG_exp
* I ran each pair once. For the pairs that I have put in italics, my results are really far from yours, to the point that even the sign of ddG differs. Do you know why the sign is not consistent?
* Also, as yours is an average over different runs, how many runs of eq + prod did you do per edge in general?
Thank you
Hi Amina, as for the reasons, probably the main one is that RBFE calculations can be inconsistent, hence the importance of running several repeats to identify potential outliers. I don't know if you have read this paper, which relates to the approach we base QuantumBind on: https://pubs.acs.org/doi/full/10.1021/acs.jctc.3c01250
Thanks for your answer! I had not seen this paper, I will check it.
* As for the difference in sign, is that also part of the inconsistency of the method? (For example for A02_A09, A08_A02 etc., my value has the opposite sign to yours.)
* And if the edge A02_A09 = 0.73, does that mean dG(A02) - dG(A09) = 0.73? (I just want to make sure which way it has been computed.)
I am also wondering how we rank the compounds given the edges. (So how do we go from the RBFE values to absolute delta G in order to rank them?)
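One common way to go from edges to a ranking is to fix a reference ligand at zero and add up the ddG values along the network. The sketch below is an illustration only: it assumes that an edge (a, b) stores dG(b) - dG(a), a sign convention this thread does not confirm, and it uses hypothetical ligand names and values. It follows one path per ligand and does not average over cycles.

```python
from collections import deque


def relative_dg_from_edges(edges, reference):
    """Propagate ddG values from `reference` (set to 0.0) through the network.

    `edges` maps (a, b) -> ddG, ASSUMED here to mean dG(b) - dG(a).
    """
    neighbours = {}
    for (a, b), ddg in edges.items():
        neighbours.setdefault(a, []).append((b, ddg))
        neighbours.setdefault(b, []).append((a, -ddg))  # reverse edge flips the sign

    relative = {reference: 0.0}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        for nxt, ddg in neighbours.get(current, []):
            if nxt not in relative:  # first path found wins, cycles are ignored
                relative[nxt] = relative[current] + ddg
                queue.append(nxt)
    return relative


# Hypothetical edges in kcal/mol
edges = {("L1", "L2"): -0.5, ("L2", "L3"): 1.25}
relative = relative_dg_from_edges(edges, "L1")
ranking = sorted(relative, key=relative.get)  # most negative (strongest binder) first
print(relative, ranking)
```

Adding the experimental dG of the reference ligand would shift all values to an absolute scale without changing the ranking.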
I am interested in using the quantumbindRBFE method to compute RBFE for our protein datasets, using the workflow with AceForce 1.0 that you just published.
I would like to use the same protocol as described in the paper but I have questions about the parameterization of the ligand in your pipeline and the use of AceForce 1.0 as I am not sure at which step it is used.
For the parameterization of the ligand:
To produce the files for the ligand (.cif and .frcmod) do you parameterize the ligands using just the interface of Parameterize tool (https://open.playmolecule.org/tools/parameterize?docs=true), using the force field GAFF2 and charge-fitting method RESP-PSI4 of Parameterize or did you compute the charges before / after parameterization with a different tool ? Was there any dihedral fitting using the Parameterize tool or was it turned off ?
Do you first do the partial charges with openFFRecharge Package and then use just Parameterize without any charges fitting ? Or do you use Parameterize tool first with charges fitting (like AM1-BCC, Gasteiger) and then use openFF Recharge Package to do the RESP ?
Thank you so much for your help, I am excited to hear from you.