Monomer Structure Prediction
This page introduces the AWSEM simulation package and shows how to use simulated annealing simulations to predict the structure of a monomer. The designed protein Top7 (PDB ID: 1QYS) is the target in this example.
The following input files are needed:
• FASTA sequence (ID.fasta.text from the PDB; rename it to ID.fasta)
• PDB file (ID.pdb)
It is important to check the sequence in the FASTA file and edit it where necessary so that it contains only the portion of the sequence that has coordinates in the PDB file.
For example, the first 2 residues and the last 12 residues in the FASTA file of Top7 (PDB ID: 1QYS) were removed to match the sequence in the PDB file.
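The trimming step can be sketched in Python. This is a minimal illustration, not part of the AWSEM tools: the function name trim_fasta and the placeholder sequence are assumptions, and the number of residues to remove has to be determined by comparing the FASTA file with the PDB file.

```python
def trim_fasta(fasta_text, n_start, n_end):
    """Drop n_start leading and n_end trailing residues from a single-record FASTA string.

    Hypothetical helper for illustration; it is not one of the AWSEM scripts.
    """
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    header = lines[0] if lines and lines[0].startswith(">") else None
    # Join all sequence lines so that wrapped FASTA records are handled.
    sequence = "".join(ln for ln in lines if not ln.startswith(">"))
    if n_start < 0 or n_end < 0 or n_start + n_end >= len(sequence):
        raise ValueError("invalid trim: nothing would remain of the sequence")
    trimmed = sequence[n_start:len(sequence) - n_end]
    return (header + "\n" if header else "") + trimmed + "\n"


if __name__ == "__main__":
    # Placeholder record (not the real Top7 sequence).
    example = ">placeholder\nAACDEFGHIK\n"
    print(trim_fasta(example, 2, 3), end="")
```

For Top7, the call would use n_start=2 and n_end=12 on the contents of ID.fasta, following the example above.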
Generate ssweight file using JPRED prediction tool
• Go to the JPRED homepage: http://www.compbio.dundee.ac.uk/www-jpred/
• Feed the sequence from the FASTA file into JPRED. Choose to continue carrying out a Jpred prediction. When the prediction is complete, look at the prediction in “ViewSimple”.
• Copy the JPRED prediction into a new text file called IDjpred.
• Run the following command to generate the ssweight file:
python /awsemmd/tools/create_project_tools/GenSsweight.py IDjpred ssweight
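To illustrate the kind of conversion this step performs, the sketch below maps a JPRED string of H (helix), E (strand), and - (other) symbols to per-residue helix and strand weights. The two-column layout and the 1.0/0.0 values are assumptions made for illustration, and jpred_to_weights is a hypothetical name; the script above remains the authoritative way to produce ssweight.

```python
def jpred_to_weights(prediction):
    """Map a JPRED secondary-structure string to (helix, strand) weight pairs.

    Assumed layout for illustration only: one pair per residue,
    with 1.0 marking the predicted class and 0.0 otherwise.
    """
    rows = []
    for symbol in prediction.strip():
        if symbol == "H":
            rows.append((1.0, 0.0))   # predicted helix
        elif symbol == "E":
            rows.append((0.0, 1.0))   # predicted strand
        else:
            rows.append((0.0, 0.0))   # coil or unassigned
    return rows


if __name__ == "__main__":
    # Placeholder prediction string, not a real JPRED result.
    for helix, strand in jpred_to_weights("--HHHH--EEE-"):
        print(f"{helix:.1f} {strand:.1f}")
```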
Generate ssweight file using STRIDE server
• Access the Stride Web interface: http://webclu.bio.wzw.tum.de/stride/
• Output the result (in plain text format) and save as ssweight.stride
• Issue the following command to generate the ssweight file from the STRIDE assignment:
python /awsemmd/tools/create_project_tools/stride2ssweight.py > ssweight
rnative.dat can be generated by the following command:
python /awsemmd/tools/create_project_tools/GetCACADistancesFile.py ID rnative.dat
nativecoords.dat can be generated by the following command:
python /awsemmd/tools/create_project_tools/GetCACoordinatesFromPDB.py ID nativecoords.dat
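As the script names suggest, both files are derived from the Cα atoms of the PDB structure. The following sketch shows how Cα coordinates and their pairwise distances can be read from PDB text using the fixed-column ATOM record layout. The function names are hypothetical, and the exact output format written by the AWSEM scripts is not reproduced here.

```python
import math


def ca_coordinates(pdb_text, chain=None):
    """Return a list of (x, y, z) tuples for the CA atoms in ATOM records."""
    coords = []
    for line in pdb_text.splitlines():
        # Only ATOM records count; this skips HETATM entries such as calcium ions.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        # Keep the first alternate location only (blank or 'A').
        if line[16] not in (" ", "A"):
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def ca_distance_matrix(coords):
    """Return the full matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

A typical use would be to read ID.pdb into a string, call ca_coordinates on it, and compare the number of coordinates with the length of the trimmed FASTA sequence.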
Obtaining and Preparing Protein Database
• You can obtain your own database of structures with desired resolution and maximum mutual sequence identity by using the PISCES Protein Sequence Culling Server (http://dunbrack.fccc.edu/PISCES.php). The server will give you a FASTA file as output.
• To generate the fragment library, you need a database of well-defined structures in BLASTable format and a FASTA file that contains the sequences of those structures. The FASTA file should have the same prefix as the database. If you already have a FASTA file, you can convert it to BLASTable format using the makeblastdb executable:
makeblastdb -in database-prefix.fasta -out database-prefix -dbtype prot
Output: database-prefix.phr, database-prefix.pin, and database-prefix.psq
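Before building the fragment library, it can be useful to confirm that the FASTA file and the three BLAST files share one prefix and sit in the same directory. The sketch below is a hypothetical helper (missing_blast_files is not part of AWSEM or BLAST) and assumes a single-volume database with exactly the three files listed above.

```python
from pathlib import Path


def missing_blast_files(prefix, directory="."):
    """Return the expected database files that are not present in directory."""
    expected = [f"{prefix}.fasta", f"{prefix}.phr", f"{prefix}.pin", f"{prefix}.psq"]
    return [name for name in expected if not (Path(directory) / name).is_file()]


if __name__ == "__main__":
    missing = missing_blast_files("database-prefix")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Database files found.")
```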
Generating Fragment Library
You can now run the following script to generate the fragment library for a single-chain simulation.
python /awsemmd/tools/frag_mem_tools/prepFragsLAMW_index.py database-prefix ID.fasta 20 1/0
Here, 20 is a typical choice for the number of memories per position. The last argument selects whether homologs are excluded (1) or allowed (0). Excluding homologs is used for de novo structure prediction, in which all sequence homologs are excluded from the search.
The script produces the ./fraglib/ directory and the fragsLAMW.mem file. The fragsLAMW.mem file contains a [Memories] section with a one-line description of each memory found. The coordinate files (with .gro extensions) are also generated by the script and can be found in the ./fraglib/ directory.
[Memories]
./fraglib/2q3xa.gro 1 1462 6 1
./fraglib/3s3ea.gro 2 169 7 1
./fraglib/3l48a.gro 2 783 8 1
./fraglib/1q3oa.gro 1 646 9 1
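Because each line under [Memories] begins with the path to a .gro file, a quick check that every path resolves can catch a moved or renamed fraglib directory before a run. The sketch below is an illustration under assumptions: the function names are hypothetical, only the first column of each line is interpreted, and any other bracketed section in the file is simply skipped.

```python
from pathlib import Path


def memory_paths(mem_text):
    """Return the .gro paths listed in the [Memories] section of a .mem file."""
    paths = []
    in_memories = False
    for raw in mem_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A bracketed line starts a new section; only [Memories] is read.
            in_memories = (line == "[Memories]")
            continue
        if in_memories:
            paths.append(line.split()[0])
    return paths


def missing_memories(mem_text, base_dir="."):
    """Return the listed paths that do not exist relative to base_dir."""
    return [p for p in memory_paths(mem_text) if not (Path(base_dir) / p).is_file()]


if __name__ == "__main__":
    sample = "[Memories]\n./fraglib/2q3xa.gro 1 1462 6 1\n./fraglib/3s3ea.gro 2 169 7 1\n"
    print(missing_memories(sample))
```

In practice, mem_text would be the contents of fragsLAMW.mem and base_dir the directory from which the simulation is launched.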
When running homolog-excluded simulations, turn on Memory or Memory Table in fix_backbone_coeff.data; the latter uses precomputed tables for routine energy and force computations and is much faster than Memory.
Run the following script to generate the data file, sequence file, and input file.
PdbCoords2Lammps.sh ID project_name
It produces three output files: data.project_name, project_name.seq, and project_name.in.
Below is the list of files needed to run monomer structure prediction:
• anti_HB, anti_NHB, anti_one, para_HB, para_one, gamma.dat, burial_gamma.dat, uniform_gamma, fix_backbone_coeff.data. These files can be obtained from the /parameters/ directory of the AWSEM source code.
• ssweight
• rnative.dat, nativecoords.dat
• fragsLAMW.mem, with correct paths to the .gro files in /fraglib/
• data.project_name, project_name.seq, project_name.in
Simulated annealing then proceeds in two stages:
• Run a short equilibration at a constant temperature above the folding temperature (Tf) to unfold the protein into a long, extended chain.
• Run a long simulation (typically 10 million time steps of 2 fs each) while slowly lowering the temperature from above Tf to below Tf.
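The cooling stage can be pictured as a linear temperature ramp over the run. The sketch below tabulates such a ramp and the nominal simulated time; the start and end temperatures (600 and 200) are placeholders chosen for illustration, not recommended values, and the function names are hypothetical. The actual ramp is set in the simulation input file, not by this code.

```python
def linear_annealing_schedule(t_start, t_end, n_steps, n_points):
    """Return (step, temperature) pairs for a linear ramp from t_start to t_end."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    schedule = []
    for i in range(n_points):
        fraction = i / (n_points - 1)
        step = round(fraction * n_steps)
        temperature = t_start + fraction * (t_end - t_start)
        schedule.append((step, temperature))
    return schedule


def total_time_ns(n_steps, timestep_fs):
    """Convert a number of time steps to nanoseconds (1 ns = 1e6 fs)."""
    return n_steps * timestep_fs / 1e6


if __name__ == "__main__":
    # 10 million steps of 2 fs, as in the typical run described above.
    print("Nominal run length (ns):", total_time_ns(10_000_000, 2.0))
    # Placeholder temperatures; choose values that bracket Tf for your system.
    for step, temperature in linear_annealing_schedule(600.0, 200.0, 10_000_000, 5):
        print(step, temperature)
```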