The MHCBI pipeline
Welcome to the BindingInteraction wiki!
The MHCBI pipeline has four stages (the last one is optional), each with its own algorithm. The stages are called steps here, and each has one main task:
First step modifies the input PDB structure by adding waters with the Dowser program and performs optimization procedures to refine the PDB geometry. At the end of this step, a PDB file and an .arc file (MOPAC output) are passed as input to the Second step.
Second step uses the First step output and listm.log to build new structures according to the instructions written in listm.log. These new structures are called mutations here.
Each mutation is built with the Dunbrack library in the chimera program, and it is possible to make multiple amino acid changes in the same PDB structure. In short, the Second step creates new, similar protein geometries from the initial PDB structure of the First step. After creating the new geometries, the Second step performs partial optimizations.
The listm.log file contains the instructions for creating new PDB structures. Read listm.log completely before writing your mutation list. It includes two examples of mutation lists, one for mono-substitutions and one for multiple substitutions. The only part you need to change is explained in listm.log itself and lies between these two markers (changes must be made here):
*** Begin
*** End
Inside this block there is an example that you can use or remove as needed. The example contains ten mono-substitutions named P01, P02, and so on up to P10. The amino acid used in all ten cases is lysine, substituted at positions 1 to 10 of chain P of the PDB structure.
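The naming scheme of the example can be sketched in a few lines of Python. This only reproduces the labels (chain letter plus zero-padded position) and the residue used in the example; it does not write listm.log, whose actual syntax is documented in the file itself. The function name `substitution_labels` and the three-letter code "LYS" are illustrative assumptions.

```python
from typing import Dict, List


def substitution_labels(chain: str, first: int, last: int) -> List[str]:
    """Return labels such as P01..P10 for positions first..last of a chain."""
    return [f"{chain}{position:02d}" for position in range(first, last + 1)]


# Ten mono-substitutions on chain P, positions 1 to 10, as in the example.
labels = substitution_labels("P", 1, 10)

# Every position is replaced by lysine in the example ("LYS" is an assumed code).
plan: Dict[str, str] = {label: "LYS" for label in labels}

print(labels[0], labels[-1], len(plan))
```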
At the end of the Second step, the output is written to the tobe_charged folder. That folder contains all the new, partially optimized PDB structures, the initial PDB structure, and the list of residues within 8 angstroms of the ligand (the main part of the protein binding region) that could be protonated or deprotonated.
Third step takes the tobe_charged folder (Second step output) as input. It uses propka 3.1 to protonate and deprotonate residues in each PDB structure (the set of geometries) and runs single-point calculations with MOPAC, splitting each PDB structure into Complex, Receptor and Ligand. The output is written to the final_pdbs and be_outputs folders. The first folder holds the PDB files used to create the FMO GAMESS input (optional), and the second holds the .arc files, which report the energy as:
HEAT OF FORMATION = (energy value in Kcal/mol)
You then use these energies to compute binding energies.
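As a minimal sketch of that last step, the snippet below reads the HEAT OF FORMATION value from the text of three .arc files and combines them as E(Complex) - E(Receptor) - E(Ligand). This difference is a common definition of a binding energy, but the wiki does not state which formula to apply, so treat it as an assumption. The function names are hypothetical and the numbers in the usage example are made up for illustration.

```python
import re

# Matches a line such as "HEAT OF FORMATION = -1234.5 KCAL/MOL".
HOF_RE = re.compile(r"HEAT OF FORMATION\s*=\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def heat_of_formation(arc_text: str) -> float:
    """Return the first heat of formation (kcal/mol) found in .arc text."""
    match = HOF_RE.search(arc_text)
    if match is None:
        raise ValueError("no HEAT OF FORMATION line found")
    return float(match.group(1))


def binding_energy(complex_arc: str, receptor_arc: str, ligand_arc: str) -> float:
    """Assumed definition: E(Complex) - E(Receptor) - E(Ligand), in kcal/mol."""
    return (
        heat_of_formation(complex_arc)
        - heat_of_formation(receptor_arc)
        - heat_of_formation(ligand_arc)
    )


# Made-up values, only to show the call.
complex_text = "HEAT OF FORMATION = -1500.0 KCAL/MOL"
receptor_text = "HEAT OF FORMATION = -1200.0 KCAL/MOL"
ligand_text = "HEAT OF FORMATION = -250.0 KCAL/MOL"
print(binding_energy(complex_text, receptor_text, ligand_text))
```

In practice you would pass the contents of the Complex, Receptor and Ligand .arc files from be_outputs, for example via `Path(...).read_text()`.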
The optional Fourth step takes the final_pdbs folder as input to create the input for GAMESS computations. This step requires user supervision because it relies on the Facio GUI environment. Place your GAMESS inputs in the same location as the PDB structures. After running GAMESS, you can extract the energies from the output file (.log file) by looking, at the end of the file, for:
The first energy printed below is the best in FMO/PCM
Free unco+D energy in solvent= (energy value in hartree)
You then use these energies to compute binding energies. The full folder hierarchy of each step is explained below (go to hierarchy).
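The lookup described above can be sketched as follows. The code searches for the last occurrence of the marker line and then reads the first "Free unco+D energy in solvent=" value after it. It assumes the two lines appear in the .log file exactly as quoted in this wiki; the function names are hypothetical, the sample value is made up, and 627.5095 is the usual approximate hartree to kcal/mol factor.

```python
import re

MARKER = "The first energy printed below is the best in FMO/PCM"
ENERGY_RE = re.compile(r"Free unco\+D energy in solvent\s*=\s*(-?\d+(?:\.\d+)?)")
HARTREE_TO_KCAL_MOL = 627.5095  # approximate conversion factor


def fmo_pcm_energy(log_text: str) -> float:
    """Return the first energy (hartree) printed after the last marker line."""
    start = log_text.rfind(MARKER)
    if start == -1:
        raise ValueError("marker line not found in log text")
    match = ENERGY_RE.search(log_text, start)
    if match is None:
        raise ValueError("no energy line found after the marker")
    return float(match.group(1))


def hartree_to_kcal_mol(energy_hartree: float) -> float:
    """Convert hartree to kcal/mol so values compare with MOPAC results."""
    return energy_hartree * HARTREE_TO_KCAL_MOL


# Made-up log fragment, only to show the call.
sample_log = (
    "... earlier output ...\n"
    " The first energy printed below is the best in FMO/PCM\n"
    " Free unco+D energy in solvent=      -100.5\n"
)
energy = fmo_pcm_energy(sample_log)
print(energy, hartree_to_kcal_mol(energy))
```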
After completing all the steps in the installation instructions, you can run the pipeline on a new project:
$ ./pre-run.sh
Complete all the pre-run steps in order. The first step sets the paths needed before running a new project:
Note: A new project is a job performed by the MHCBI pipeline that passes through the three stages: optimizations, mutations and calculations. All folders, inputs and outputs created during the whole process make up the new project. Each individual PDB structure has its own new project.
PDB structure path is the directory where the PDB structure for the new project is located.
PDB structure name is the name of the PDB structure file located in the previous path. PDB files can be laid out in several valid ways, but the MHCBI pipeline expects the following format:
- PDB structures with an MHC-like receptor-ligand complex
The receptor must be named as chains A and B and the ligand as chain P. If there are crystallographic water molecules, all of them must be named as chain V.
- PDB structures with a 12-column format
The test folder contains two examples (shortened 1BX2 and 3OXS) with the 12-column format that every PDB structure in any project should follow. In the PDB structure for a new project, remove all information not related to the 3D atomic coordinates.
- PDB structure names
PDB structure names must contain only alphanumeric characters and no blank spaces. Remove any special character (#$/!@'´{}}}}), any kind of hyphen or underscore (-_), and any punctuation (.,:;) from your PDB structure name.
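The naming and chain rules in this list can be checked before starting a project. The sketch below is not part of MHCBI; all function names are hypothetical. It assumes ASCII letters and digits for the name (without the file extension) and standard fixed-column PDB records, where the chain identifier sits in column 22 of ATOM and HETATM lines.

```python
import re

ALLOWED_CHAINS = {"A", "B", "P", "V"}  # receptor A and B, ligand P, waters V


def is_valid_structure_name(name: str) -> bool:
    """True if the name (without extension) has only ASCII letters and digits."""
    return re.fullmatch(r"[A-Za-z0-9]+", name) is not None


def chain_ids(pdb_text: str) -> set:
    """Collect chain identifiers from column 22 of ATOM/HETATM records."""
    chains = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains.add(line[21])
    return chains


def unexpected_chains(pdb_text: str) -> set:
    """Return chain identifiers that are not A, B, P or V."""
    return chain_ids(pdb_text) - ALLOWED_CHAINS


def demo_atom_line(chain: str, serial: int = 1) -> str:
    """Build a truncated ATOM record for demonstration only."""
    return f"{'ATOM':<6}{serial:>5} {' N':<4} {'GLY':>3} {chain}"


demo_pdb = "\n".join(demo_atom_line(chain) for chain in ("A", "B", "P", "C"))
print(is_valid_structure_name("1BX2"), is_valid_structure_name("1BX2_short"))
print(unexpected_chains(demo_pdb))
```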
to-do work path is the directory where you want to place the new project.
Name of your work is the name of your new project.
The setup and pre-run scripts create .log files that hold all the required paths. You can edit these files by hand instead of using the pipeline, but you must not change any word that belongs to the file format. It is advisable to set the paths through the pipeline.
In the pre-run script, the second step configures the pipeline core in all the new folders that make up the new project. If MHCBI cannot detect one of the required paths, the configuration step is not performed.
After configuring the new project, you can run the pipeline. The third pre-run step offers two options for running MHCBI on a PDB structure:
- Performing the three-stage methodology in a single step.
- Performing the three-stage methodology step by step.
| Step | Instruction | Command |
|------|-------------|---------|
| 1 | Set work and PDB directories | ./pre-run.sh (option 1) |
| 2 | Configure folders and scripts for the new project | ./pre-run.sh (option 2) |
| 3 | Run the MHCBI pipeline | ./pre-run.sh (option 3) |
Now you can analyse your results!
The MHCBI pipeline uses several paths. The first is the directory where you download and unpack the MHCBI pipeline (manually or with git clone); this directory is called the git path. You can configure and install the pipeline in the git path or in another path. If you plan to modify scripts to suit your needs, it is advisable to configure and install the pipeline in another location.
If you configure and install the pipeline in another location, that second path is the MHCBI path; otherwise the git path and the MHCBI path are the same. In the MHCBI path you can test the pipeline and configure your own projects.
The pipeline test remains in the MHCBI path, and all configuration programs and settings are kept in this directory as well.
When you configure the paths for a new project, you must use another directory to hold the project. This new directory is called the workdir path.
The git path contains the MHCBI pipeline as it appears in the GitHub repository. In this directory you can modify, add and remove any lines of code as you see fit. If you think your changes are important and could improve the MHCBI pipeline, open a pull request explaining them. Every time you change the code, repeat the installation and configuration process, starting by running ./clean.sh in the git path.
This path contains the following folders:
- BindingInteraction
- docs
- misc
- source
- test
The first folder (BindingInteraction) holds the manager scripts that control all the scripts in the fourth folder (source). Detailed documentation about the MHCBI pipeline is in the second folder (docs), the DFTB3 parameters are in the third (misc), and the pipeline test is in the fifth (test).
For further information, go to each folder and read its README file.
The MHCBI path is the directory where the pipeline runs. Do not change any line of code in this directory (see git path). The test folder in this directory changes when the test is run: a new folder containing the test results is added.
The following folders have the same content as those in the git path:
- misc
- source
- test
The workdir path is the directory where your new project is placed. The test folders in the MHCBI path and the workdir path folders have the same hierarchy.
In this directory there are the following folders:
- optimizations
- mutations
- calculations
- fmo-calculations
- misc
- source
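A quick way to confirm that a project directory has this layout is sketched below. The helper `missing_folders` is hypothetical and not part of MHCBI; it simply reports which of the folders listed above are absent. The demonstration runs on a temporary directory rather than on a real project.

```python
import tempfile
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = (
    "optimizations",
    "mutations",
    "calculations",
    "fmo-calculations",
    "misc",
    "source",
)


def missing_folders(project_dir) -> list:
    """Return the expected folder names that do not exist in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


# Demonstration on a throwaway directory holding only two of the folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("optimizations", "mutations"):
        (Path(tmp) / name).mkdir()
    print(missing_folders(tmp))
```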
When the pipeline finishes, you can explore the following folders:
1. optimizations
This folder corresponds to the whole First step. You can follow the methodology step by step inside the folder that the pipeline names after your new project. The output folder of this step contains the result that the Second step will use. Inside the folder named after your new project there are six steps:
- Dowser (putting waters)
- Conversion to PDB structure for MOPAC
- Assigning hydrogens
- Optimizing hydrogens
- Optimizing all PDB structure
- Removing all waters
2. mutations
This folder corresponds to the whole Second step, and you can explore it by looking into each mutation folder. There are two folders for each mutation previously written in listm.log. Each pair of folders is named as follows:
name_of_substitution folder and mutation_name_of_substitution folder.
For instance, if the substitution name is P02, its pair of folders is named P02 and mutation_P02.
The first folder contains the whole substitution process and the assignment of new hydrogens. The second folder contains the partial optimization process for this substitution. So if you set ten mutations, you will find twenty folders (two per substitution).
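The folder naming rule can be written down as a short sketch, which is handy when scripting over the results. The function name `mutation_folders` is hypothetical; the rule itself (the substitution name plus the same name prefixed with mutation_) comes from the description above.

```python
from typing import List


def mutation_folders(substitution_names: List[str]) -> List[str]:
    """Return the two folder names expected for each substitution."""
    folders = []
    for name in substitution_names:
        folders.append(name)                # substitution and new hydrogens
        folders.append(f"mutation_{name}")  # partial optimization
    return folders


print(mutation_folders(["P02"]))

# Ten substitutions give twenty folders.
ten_names = [f"P{position:02d}" for position in range(1, 11)]
print(len(mutation_folders(ten_names)))
```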
Additionally, there is a folder named tobe_charged, which is the Second step output.
3. calculations
This folder corresponds to the whole Third step, and you can explore it by looking into each mutation folder. There is one folder for each PDB file in the tobe_charged folder (which is the input at this stage).
Additionally, there are two more folders (final_pdbs and be_outputs), which are the Third step output.
4. fmo-calculations
This folder corresponds to the whole optional Fourth step and contains two folders. The first, named input_pdbs, is part of the Third step output (final_pdbs). The second, named fmo_molecules, holds all the FMO GAMESS inputs and output results.
5. misc
This folder is the same as the one in the git path.
6. source
This folder is the same as the one in the git path.
To understand each file in each folder in detail, you need to study the scripts in the source code.
Some Open Babel operations generate messages that may look alarming. They refer to atoms (ATOM) in PDB format that Dowser operations name as HETATM, so there is no need to worry about them.
When Dowser places waters, some Open Babel conversions generate warnings that look like this:
=============================
*** Open Babel Warning in parseAtomRecord
WARNING: Problems reading a PDB file
Problems reading a HETATM or ATOM record.
According to the PDB specification,
columns 77-78 should contain the element symbol of an atom.
but OpenBabel found ' ' (atom 3)
==============================
If Dowser does not find any waters, messages like these can appear:
PDB file '/home/work_path/work_name/../dowser/1_iter/placed_waters_1.pdb' contains no atoms.
PDB file '/home/work_path/work_name/../dowser/1_iter/placed_waters_2.pdb' contains no atoms.
PDB file '/home/work_path/work_name/../dowser/1_iter/placed_waters_3.pdb' contains no atoms.
1 molecule converted
The following message can appear because of a non-existent binary path related to another process in VMD:
rlwrap: Command not found.
That message does not affect MHCBI performance.
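When reading long run logs, it can help to separate the harmless messages described above from anything unexpected. The sketch below is an illustrative filter, not part of MHCBI. The patterns are taken from the messages quoted in this section, and the helper names are hypothetical. It matches single lines only, so the continuation lines of a multi-line Open Babel warning are not recognised.

```python
import re

# Patterns for the harmless messages quoted in this section.
BENIGN_PATTERNS = (
    re.compile(r"Open Babel Warning\s+in parseAtomRecord"),
    re.compile(r"contains no atoms"),
    re.compile(r"rlwrap: Command not found"),
)


def is_known_benign(line: str) -> bool:
    """True if the line matches one of the harmless messages listed above."""
    return any(pattern.search(line) for pattern in BENIGN_PATTERNS)


def other_lines(log_text: str) -> list:
    """Return non-empty lines that are not recognised as harmless."""
    return [
        line
        for line in log_text.splitlines()
        if line.strip() and not is_known_benign(line)
    ]


demo_log = "\n".join(
    [
        "*** Open Babel Warning in parseAtomRecord",
        "rlwrap: Command not found.",
        "some other message",
    ]
)
print(other_lines(demo_log))
```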