Contact: Darren Hsu (darrenhsu2015 at u.northwestern.edu)
In time-resolved X-ray solution scattering, researchers obtain difference signal as a function of system evolution. In my case, a protein assumes different states and gives distinct difference scattering signals. We want to find some structures that give the difference signal that match the difference signal.
We input the X-ray scattering signal as a constraint in the MD simulation. The program calculates X-ray scattering signals at each defined snapshot, compare it to the reference signal and calculate the difference signal. It then compare the calculated difference signal to the input one, and it uses the deviation between them to derive force that acts on each atom. The math is described in our paper.
-
It takes care of scattering intensity changes due to surface area variation. This is necessary if you have an (un)folding process to explore. For example, a study focusing on molten globule states will benefit a lot.
-
It is fast. If force is evaluated every 50 steps, the computational overhead is about 3 - 4 %.
-
Only one simulation box is required. You solvate the protein, equlibrate it, fit to the static scattering data, and then fit to the difference data.
-
It is compatible with some enhanced sampling methods such as metadynamics. This again comes in handy when exploring the conformational space for an unfolding reaction, while the correction force keeps the structure from deviating from the difference signal.
This program is written in CUDA C and runs only on Nvidia GPUs. It assumes that you have access to an Nvidia GPU. At the time of writing, this code has been tested on K40, K80, P100, and Quadro 6000 cards.
-
For some parts GNU Scientific Library (GSL) >= 2.5 is required. Your cluster may have it installed. In that case, set the path in the Makefile to it. Otherwise, you need to install a local copy of GSL and point to the directory in the Makefile. More on that in the Installation section.
-
Nvidia's nvcc compiler is required. Consult your system administrator about cuda availability on your machine. At the time of writing we are using cuda/9.1.85.
-
SWIG >= 3.0.x interface is required. It's likely already installed on your machine. See SWIG website for more inforamtion about installing it locally on your server.
-
There is a basic python (python 3, numpy, scipy) files for converting pdb to input C code. You should not need an env for running the provided python script (input.py).
Since the code is intended to be a module of NAMD, NAMD (CUDA build) should be
installed on your machine. This code is tested with NAMD 2.11.
You need to know the path to executable namd2
.
Note about NAMD 2.12 and 2.13: It seems that the pre-built binary version
of NAMD 2.12 and 2.13 (both multicore-CUDA and ibverbs builds) have a different
behavior when its TCL script loads XSMD.so
which generates errors regarding
undefined symbols such as Tcl_GetStringFromObj
. To run on those versions you need
to compile the NAMD from source code using the TCL library they provide.
Basically follow the instruction in their release note.
Simply download this repo by git clone https://github.com/darrenjhsu/XSNAMD
on your machine. It creates a directory called XSNAMD/ in your working directory.
Now follow the READMEs in both root directory and example/ for more information.
Example datasets and a tutorial can be found in the example/ folder. The input files are prepared for you, so you can try out the program.
-
Prepare a run-able NAMD simulation system (see NAMD tutorials for more information if you are not familiar with NAMD simulations)
-
After that, prepare (see below) experimental data so that you can compile the
XSMD.so
file. -
Re-run the NAMD simulation with tclForce turned on and points to the
XSMD.so
while running simulations.
XSNAMD/
| input.py
| make_input
| Makefile
| Readme.md
|---bin/
| |---PROT1/
| speedtest.out
| fit_initial.out
| fit_traj_initial.out
| 1e-5/ # Each k_chi will have its own directory. Might change.
| XSMD.so
| backup_code/
| |---PROT2/
| # Similar structures as PROT1/
|---data/
| |---PROT1/
| PROT1.psf
| PROT1.pdb
| S_exp.txt
| dS_exp.txt
| mol_param.cu/hh # Will be generated along prep
| scat_param.cu/hh # Will be generated along prep
| env_param.cu/hh # Will be generated along prep
| coord_ref.cu/hh # Will be generated along prep
| |---PROT2/
| # Similar structures as PROT2
|---include/
| *.hh files
|---lib/
|---src/
| *.cu files
If you have two known structures, and you want to test if the program can drive one structure to another
You will calculate the scattering signals from both structures and prepare a mock dataset of q, S_ref, and dS.
Good for test runs and for understanding how the program works. Refer to the README in example/ for more information.
It is encouraged that you at least do an equilibrium run for the structure and fit the average of that run to the static measurement for c, c1 and c2. To do that see next section.
-
Assuming the input data are prepared in data/PROT1/, where PROT1 is your protein's name.
-
An input python file input.py contains pointers to the PDB and PSF files (that you probably generated through
psfgen
in VMD), absolute scattering profiles (q, S(q), Serr(q)), difference scattering profiles (q, dS(q), and dSerr(q)) if you have it, and a few parameters. This is the file to edit when changing the parameters. Please refer to the comments and instructions in the file. -
Run
python input.py
to parse the PDB and PSF files and generatemol_param.cu/hh
andcoord_ref.cu/hh
-
Run
make fit DSET=PROT1
to compilefit_initial.cu
. In doing so an .out file calledfit_initial.out
will be placed inbin/PROT1/
. The Makefile reads thesrc
and alsodata/PROT1
for all relevant source code files. -
Execute
fit_initial.out
to calculate c1 and c2 that best fit your data. See the Input arguments section below for input arguments for this .out file. On an HPC this is usually done through a job submission script. Yours may vary. This will also generate thescat_param.cu/hh
with the data you provided properly scaled to the calculated curve, in the directorydata/PROT1
. -
Run
make KCHI=1e-5 DSET=PROT1
to finally compile theXSMD.cu
to a shared object that NAMD will call every step. DSET is the name of your dataset. KCHI can be just a string. It does not need to be the actual spring constant. It simply determines where in thebin
theXSMD.so
will be placed. The actual spring constant is dictate in thedata/PROT1/env_param.cu
. For example, KCHI can be20kcal
so you know you are supplying 20 kcal/mol of initial X-ray scattering related potential. -
When running NAMD, add these keywords in your NAMD config files. See the examples for more information.
tclForces on
set opt 0 # 1 when you are ready to turn it on
tclforcesscript XSMD.tcl # See /doc/XSMD.tcl for details
set XSMDrestartFreq 5000 # Will write to .restart.XSMDscat and restart.XSMDEMA files
set XSMDoutputName $outputname
set XSMDrestart 0 # 1 if it's a continuing XSMD run
if {$XSMDrestart} {
set XSMDrestartScat $outputname.restart.XSMDscat
set XSMDrestartEMA $outputname.restart.XSMDEMA
}
During the simulation, look at the log file. It should show the chi square decreasing gradually.
This is the ideal setting and is the typical case of a real application.
-
Assuming the input data are prepared in data/PROT1/, where PROT1 is your protein's name.
-
An input python file input.py contains pointers to the PDB and PSF files (that you probably generated through
psfgen
in VMD), scattering profiles (q, S_q, dS_q and S_err if you have it), and a few parameters. This is the file to edit when changing the parameters. Please refer to the comments and instructions in the file. -
Run
python input.py
to parse the PDB and PSF files and generatemol_param.cu/hh
andcoord_ref.cu/hh
. -
Export the atomic coordinates of the solute (a.k.a. the atoms you're using to calculate the scattering pattern) from the equilibrium run as a multiframe xyz file named
trajectory.xyz
to the root folder. -
Run
make fit_traj DSET=PROT1
to compilefit_traj_initial.cu
. This will compile files indata/PROT1/
andsrc
to producebin/PROT1/fit_traj_initial.out
. -
Execute
fit_traj_initial.out
to calculate c1 and c2 that best fit your data using the trajectory as the ground state ensemble. See the Input arguments section below for input arguments for this .out file. On an HPC this is usually done through a job submission script. Yours may vary. This will also generate thescat_param.cu/hh
with the data you provided properly scaled to the calculated curve, in the directorydata/PROT1
. -
Run
make KCHI=1e-5 DSET=PROT1
to finally compile theXSMD.cu
to a shared object that NAMD will call every step. DSET is the name of your dataset. KCHI can be just a string. It does not need to be the actual spring constant. It simply determines where in thebin
theXSMD.so
will be placed. The actual spring constant is dictate in thedata/PROT1/env_param.cu
. For example, KCHI can be20kcal
so you know you are supplying 20 kcal/mol of initial X-ray scattering related potential. -
When running NAMD, add these keywords in your NAMD config files.
tclForces on
set opt 0 # 1 when you are ready to turn it on
tclforcesscript XSMD.tcl # See /doc/XSMD.tcl for details
set XSMDrestartFreq 5000 # Will write to .XSMDscat and .XSMDEMA files
set XSMDoutputName $outputname
set XSMDrestart 0 # 1 if it's a continuing XSMD run
if {$XSMDrestart} {
set XSMDrestartScat $outputname.restart.XSMDscat
set XSMDrestartEMA $outputname.restart.XSMDEMA
}
During the simulation, look at the log file. It should show the chi square decreasing gradually.
Manuals coming ...
-
fit_initial.out
takes 6 arguments (c1 initial, c1 step, c1 end, c2 initial, c2 step, and c2 end) to define a scan range. For example:./fit_initial.out 0.95 0.01 1.05 0.0 0.1 4.0
-
fit_initial.out
takes 9 arguments (trajectory file name, nubmer of frames in that file, how many frames (from the last) to use for fitting, c1 initial, c1 step, c1 end, c2 initial, c2 step, and c2 end) to define a scan range. For example:./fit_traj_initial.out mytraj.xyz 2000 500 0.95 0.01 1.05 0.0 0.1 4.0
You can cheat this program to use a middle section by setting the number of frames smaller than actual number of frames. For example, you have a 2000 frame trajectory
mytraj.xyz
, you can use the frames 1000 - 1500 by:./fit_traj_initial.out mytraj.xyz 1500 500 0.95 0.01 1.05 0.0 0.1 4.0
-
XSMD.so
takes 7 arguments while being called by the XSMD.tcl. They are: (1) an array of coordinates, (2) an array of force, (3) an array of old force (not used), (4) and array of scattering intensity, (5) frame number, (6) normalizing constant of exponential moving averaging, and (7) whether XSMD is a restart run. Ideally you don't need to worry about these arguments.
Contains a function which will be compiled into a .so shared object file. In this function kernels will be sequentially called to calculate solute surface area, scattering patterns, and forces that should be applied to the solute atoms.
Contains kernel functions that actually perform scattering calculations.
The interface file that goes along with SWIG, which enables tcl to call a C function. It also provides some helper functions such as declaring float arrays in C from tcl side.
A file that looks like XSMD.cu
but reads in coordinates from coord.cu
. This
allows you to test the functionality of these kernels (as well as do speed
tests if you use nvprof in the HPC job submission.)
Tells number of atoms and atom types. Atom types are defined in WaasKirf.cu and
are used in the FF_calc
kernel in kernel.cu
. This file is different for
every system you want to run simulations on.
Tells some environmental parameters such as k the spring constant (that is
inherited all the way from the initial input.py
) or rho the solvent electron
density (0.334 for water at 20 deg C). delta_t and tau are defined in the
paper. This file is different for every system you want to run simulations on.
Contains number of q points, c1, c2, c, scattering patterns of the reference structure (or ensemble run), the difference pattern scaled to the scattering magnitude of reference signal, and error of the difference signal also scaled. Note that c is not used in the actual calculation. This file is different for every system you want to run simulations on.
This project is to use GPU to accelerate X-ray scattering calculation with Debye formula looping over atoms and use it in MD simulation. Concepts were taken from Bjorling et al. JCTC 2015, 11, 780 and exponential moving average is taken from Chen & Hub, Biophysics J, 2015.
In short, at each interval we evaluate the scattering profile, and take the negative gradient (definition of force) of chi square, which is a function of all coordinates (only).
The calculation of scattering profile is the same as in FoXS but the form factors are calculated explicitly. The solvation shell contrast coefficient is currently set as adjustable just as in FoXS: Schneidman-Duhovny et al, NAR 2010.
The atomic form factors in vacuum are calculated using Waasmaier-Kirfel table.
The volume for dummy atom calcualtion were taken from Svergun 1995 J Appl Crystallgr paper, which refers to Fraser 1978 J Appl Crystallgr paper and International Tables for X-ray Crystallography (1968).
The surface area calculation is done numerically following J Appl Crystallgr 1983 Connolly "Analytical Molecular Surface Calculation." Rasterized points sample the vdW sphere, which has to be outside of any other vdW spheres of other atoms. Extended by solvent radius, the point (solvent center) must also be far enough from the vdW spheres of other atoms.
In the surface area calculation part the spiral generating function is from Bauer 1998 Guidance Navigation and Control Conference and Exhibit paper.