-
Notifications
You must be signed in to change notification settings - Fork 32
AMHGo Model And Umbrella Sampling
This page introduces the use of AMH-Go, a non-additve structure based model included in the AWSEM simulation package, and the use of umbrella sampling over a collective coordinate for the purposes of performing free energy profile and expectation value calculations. AMH-Go is a non-additive structure based model, which is described in detail in: Eastwood, Michael P., and Peter G. Wolynes. "Role of explicitly cooperative interactions in protein folding funnels: a simulation study." The Journal of Chemical Physics 114 (2001): 4702.
Obtain a structure of your protein of interest from the PDB (www.pdb.org) or elsewhere.
The structure should have coordinates for at least the backbone atoms and C-beta for all residues that you are interested in simulating. For the purposes of illustration, we will use Ubiquitin (pdb id: 1UBQ). Download the pdb file to a local directory. Use some tools, or edit the pdb file using a text editor, so that it only includes the chain that you want to simulate. In this case we will be simulating chain A, the only chain in this particular pdb file. If you are following along in this tutorial, rename this modified file to 1ubqA.pdb.
Next, run the AWSEM script to start generating the input files:
PdbCoords2Lammps.sh 1ubqA 1ubqA
The first argument is the name of the pdb file that you generated and modified but without the ".pdb" extension. The second argument will serve as the pre/postfix for the input files that are generated by the script. In this case, we have chosen to use the same name for both, but this is not necessary. Note that many of the scripts, if run with no arguments, will show the basic syntax, such as:
$ PdbCoords2Lammps.sh
> /home/ns24/opt/script/PdbCoords2Lammps.sh pdb_id project_name
The files generated include coord, data, seq and in files. The coord file is a temporary file used only for constructing the data file and may be safely deleted. The in file is the LAMMPS input script. The seq file contains the one letter amino acid sequence, and the data file contains generic information about the number of atoms and bonds in the simulation, the number of atom and bond types, the dimensions of the simulation box, the masses of the atoms, the atom IDS, chain IDs, residue IDs, atom types, atom charges and x, y and z coordinates of the atoms as well as bond coefficients and a list of bonds between atoms.
Next, copy amh-go.gamma and fix_backbone_coeff.data from the parameters directory to your local directory. Edit the fix_backbone_coeff.data file so that it contains only the following blocks, and make sure that none of the blocks have a "-" after the closing bracket, as this will inactivate the corresponding potential:
[Chain]
10.0 10.0 30.0
2.45798 2.50665 2.44973
[Chi]
20.0 -0.83
[Epsilon]
1.0
[Rama]
2.0
5
1.3149 15.398 0.15 1.74 0.65 -2.138
1.32016 49.0521 0.25 1.265 0.45 0.318
1.0264 49.0954 0.65 -1.041 0.25 -0.78
2.0 419.0 1.0 0.995 1.0 0.820
2.0 15.398 1.0 2.25 1.0 -2.16
[Rama_P]
3
0.0 0.0 1.0 0.0 1.0 0.0
2.17 105.52 1.0 1.153 0.15 -2.4
2.15 109.09 1.0 0.95 0.15 0.218
0.0 0.0 1.0 0.0 2.0 0.0
0.0 0.0 1.0 0.0 2.0 0.0
[SSWeight]
0 0 0 1 1 0
0 0 0 0 0 0
[ABC]
0.483 0.703 -0.186
0.444 0.235 0.321
0.841 0.893 -0.734
[AMH-Go]
1.0
2.5
8.0
In the [AMH-Go] block, the first parameter is the scaling of this term relative to the other terms. The second parameter is the "p-value", or non-additivity exponent. Higher p-values give high barriers and more cooperative folding. Values in the range of 2-3 have been found to give realistic levels of cooperativity. The third and final parameter is the cutoff for defining native contacts between residues, and is in units of Angstroms. For the purposes of this tutorial, we will use the values p=2.5 and a cutoff of 8 angstroms.
The amh-go.gamma file contains information about the strength of interactions between residues of different identities and sequence separation classes. The default file is consistent with those parameters used in the original study where the model was introduced and subsequent studies. The model is a "vanilla" model is the sense that the strength of interaction between residues does not depend on their identity. This can be easily changed by modifying the amhgo-gamma file, but that is beyond the immediate scope of this tutorial.
We now need to generate the amh-go.gro file, which will be used to determine which residues are in contact in the native state. Make sure that you renumber the PDB file that you will be using to generate the amh-go.gro file such that the first residue index is 1. Then, the file can be generated using:
python /path/to/awsemmd/tools/frag_mem_tools/Pdb2Gro.py 1ubqA amh-go.gro
where, again, this assumes that you have a file called 1ubqA.pdb and creates a file callled amh-go.gro. The gro file contains the number of atoms, residue IDs, residue types, atom types, atom IDs, and x, y and z coordinates of the atoms.
Finally, it is necesary to generate an "ssweight" file, which contains information about how to bias the secondary structure using the Ramachandran potentials. Since we don't want to provide any extra bias to the native structure in this case, we can just use a file containing 76 (the number of residues in 1UBQ chain A) of "0.0 0.0", meaning that the extra bias to an alpha or beta conformation are both zero for all residues.
$ for i in {1..76}; do echo "0.0 0.0"; done > ssweight
You are now ready to run your first simulation with the AMH-Go model. By issuing the command
$ lmp_serial < 1ubqA.in
you should see output that looks something like
LAMMPS (9 Oct 2012)
Scanning data file ...
3 = max bonds/atom
Reading data file ...
orthogonal box = (-10000 -10000 -10000) to (10000 10000 10000)
1 by 1 by 1 MPI processor grid
228 atoms
302 bonds
Finding 1-2 1-3 1-4 neighbors ...
5 = max # of 1-2 neighbors
5 = max # of special neighbors
76 atoms in group alpha_carbons
76 atoms in group beta_atoms
76 atoms in group oxygens
Chain flag on
Chi flag on
Rama flag on
Rama_P flag on
SSWeight flag on
ABC flag on
AMH-Go flag on
Setting up run ...
Memory usage per processor = 0.891418 Mbytes
Step Temp E_pair E_mol TotEng Press Volume
0 300 69.685911 2804.7907 2656.0812 -19470.21 37318.056
1000 309.6291 3.2541839 154.84377 -57.57167 377.89729 37318.056
2000 317.92045 4.5022927 145.15843 -46.009096 -513.3088 34061.896
3000 289.04461 4.2000368 136.29229 -80.36671 -53.054637 34061.896
4000 284.42523 3.7390817 122.59595 -81.326705 -701.07713 32570.479
5000 338.64666 4.4512755 122.41484 -38.536759 533.1893 35143.81
6000 284.88008 1.6497933 105.21846 -101.70821 708.35035 37609.352
7000 309.85035 2.2742725 107.32425 -71.231702 319.249 36253.327
8000 288.93468 4.5076069 114.54767 -69.546086 144.3742 37828.582
9000 318.63316 3.7868783 101.90289 -60.801079 -147.73095 37141.499
10000 317.62052 2.9372909 106.25488 -54.292209 172.91441 36277.457
Loop time of 5.3855 on 1 procs for 10000 steps with 228 atoms
Pair time (%) = 0.740053 (13.7416)
Bond time (%) = 0.0497167 (0.923159)
Neigh time (%) = 0.00521493 (0.0968328)
Comm time (%) = 0.00129962 (0.0241318)
Outpt time (%) = 0.00178814 (0.0332029)
Other time (%) = 4.58743 (85.1811)
Nlocal: 228 ave 228 max 228 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 10891 ave 10891 max 10891 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 21782 ave 21782 max 21782 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 21782
Ave neighs/atom = 95.5351
Ave special neighs/atom = 2.64912
Neighbor list builds = 10
Dangerous builds = 0
In order to sample different parts of the landscape, it is often useful to apply biases to your simulation Hamiltonian. In this case, we will apply a bias to the Q coordinate. Low values of Q correspond to unfolded structures, while high values of Q correspond to folded, near-native structures. To apply a bias to your molecule, you can add code like this to your input file:
# apply a qbias
fix qbias alpha_carbons qbias fix_qbias_coeff.data # apply a Q-biasing potential as specified in fix_qbias_coeff.data
fix_modify qbias energy no # do not add the energy from the Q-biasing potential to the total potential energy
variable biasinge equal f_qbias
See the LAMMPS manual for general information about how "fixes" work. In this case, the first line applies a fix with the label "qbias" to the "alpha_carbons" group, which is of type "qbias" and takes as input the file "fix_qbias_coeff.data". The qbias fix assumes that you have a file named "rnative.dat" in the current directory which contains a square matrix of native CA-CA distances. Qbias fix uses that matrix to calculate the instantaneous Q value of the simulated structure, the biasing energy, and the biasing forces at every timestep. The procedure for generating rnative.dat is described below. The second line, starting with "fix_modify", prevents the potential energy coming from the bias from being automatically added to the total potential energy for the purposes of output. Whether or not you want to automatically include the biasing energy into the total potential energy depends on how you will be subtracting out the bias later. A complete explanation of this is beyond the scope of this tutorial, but most WHAM-like algorithms explicitly calculate the biasing energy internally and add it to the unbiased total potential energy, so including it here would be tantamount to double-counting. The last line creates a variable so that the biasing energy can be output separately if so desired.
The fix_qbias_coeff.data file has a format similar to that of fix_backbone_coeff.data discussed previously. An example fix_qbias_coeff.data file looks like:
[QBias_Exp]
KQBIAS
Q0
2
0.15
In this case KQBIAS should be replaced by a floating point number that dictates the strength of the biasing potential. For models with larger barriers it is necessary to apply larger biasing potentials, but applying too larger of a biasing potentials will reduce the overlap between adjacent sampling "umbrellas" and thereby impact the quality of the unbiasing calculation. In general it is necessary to check whether or not your sampling is sufficient by (at least) checking for significant overlap between sampling umbrellas. Values of KQBIAS between 100 and 1000 are appropriate for different models/proteins/purposes when using AWSEM. The Q0 parameter should replaced by a number between 0 and 1 and specifies the center of the biasing potential. The next two lines indicate that the biasing potential is harmonic (an exponent of 2) and that the width of the gaussian well used for biasing increases with increasing sequence separation according to |i-j|^0.15.
AWSEM and LAMMPS can be used to compute some collective coordinates on-the-fly during the simulation using LAMMPS' compute functionality. An example block of input code that computes several collective coordinates including a Q value, radius of gyration, contact Q, the number of total contacts, and the total potential energy, is given below (replace SAMPLETIME with how often you want to output to the wham.dat file, e.g., 1000):
# output wham.dat, a file with collective variables that can be used to compute pmfs
compute qw alpha_carbons qwolynes rnative.dat 2 0.15
variable qw equal c_qw
compute rg alpha_carbons gyration
variable rg equal c_rg
compute qo alpha_carbons qonuchic cutoff 12.0 nativecoords.dat 1.2
variable qo equal c_qo
compute tc beta_atoms totalcontacts 6.5 2
variable tc equal c_tc
variable step equal step
variable E_P equal pe
fix wham all print SAMPLETIME "${step} ${qw} ${rg} ${qo} ${tc} ${E_P}" file wham.dat screen no title "# timestep qw rg qo tc energy"
In order to use these compute functions, it is necessary to prepare the rnative.dat and nativecoords.dat files. This can be done using:
python /path/to/awsemmd/tools/create_project_tools/GetCACADistancesFile.py 1ubqA rnative.dat
python /path/to/awsemmd/tools/create_project_tools/GetCACoordinatesFromPDB.py 1ubqA nativecoords.dat
Note:Use GetCACADistanceFile_multiChain.py for multi-chain simulation. The result of running a simulation with code like that above in the input file should be a file called wham.dat in the directory where the simulation was run.
Once these extra items have been added to the input file, you are ready to run umbrella sampling by specifying an array of Q0 values over a set of directories. All of these simulations can be run in parallel, and once the simulations have finished you will want to collect the wham.dat files, and perhaps other files such as energy.log, in order to compute 1D and 2D free energy profiles and expectation value plots. Performing these calculations is beyond the scope of this tutorial, but you may find the following tools to be useful: WHAM.jar pyMBAR Grossfield WHAM