These jupyer
notebooks, and associated python
3 source code, can be used to
generate the figures in Motor-like properties of non-motor enzymes by Slochower and Gilson, 2018. We have several
other notebooks used to explore the data -- that are not hosted here -- and are
happy to share them. Please feel free to contact slochower@gmail.com
for
details.
Archived and current versions of the manuscript are available on bioRxiv.
- Version 1, April 19 2017
- Version 2, May 26 2017
- Version 3, November 7 2017
- Version 4, February 7 2018 (to be published in Biophysical Journal)
This code does not prepare or perform any of the MD simulations. We assume that two separate equilibrium simulations have been completed for each system: one in the apo (substrate-not-bound) form, and one in the substrate-bound form. In the system we tested, 1 μs of simulation time was sufficient for the dihedral histograms to converge, but this likely depends on the protein and simulation details.
generate-dihedral-histogram-input-files.pl
can be used to generate an input
file for cpptraj
that will create one file per torsion of interest. The torsions are defined in the perl
script and use AMBER FF14SB atom types. The output files from cpptraj
contain the equilibrium probability distribution of each torsion for a given trajectory; the script will need to be run separately for apo and substrate-bound trajectories.
The easiest way to run the perl
script is to start with a PDB file for one protein structure and get a list of the residues by grepping for the CA
atoms:
echo "parm <system.prmtop>" > cpptraj.in
echo "trajin <system.trajectory>" >> cpptraj.in
grep CA <system.pdb> | awk '{print $5,$4}' | xargs -L 1 \
perl generate-dihedral-histogram-input-files.pl >> cpptraj.in
- There is an AMBER style parameter file and AMBER format trajectory. As far as I know, this can also be accomplished using Gromacs tools.
- The apo and substrate-bound structures are a perfect residue-for-residue match.
- There are no non-standard amino acids (they will simply be omitted).
cpptraj
may complain about the phi and psi angles of the first and last residue.- Like the previous point, if there are discontinuities in the protein, the input file may require adjustment.
- Files of format
{torsion}{RES}{ID}.dat
that contain two columns: the first column is the angular value (in degrees) of bin$i$ , and the second column is the probability of being between bin$i$ and$i+1$ averaged over the trajectory.
Create a conda environment using conda env create -f build/environment.yaml
.
All of the data processing and model-building is carried out in simulation.py
using the class Simulation()
. For each system, four pieces of information need to be configured:
- The path to the torsion histograms (produced in step 1 or otherwise). If the format of the histograms is not as described, the
np.genfromtxt
calls in the functionsimulate()
can be modified. - The parameter
C_intersurface
-- which is the same ask_on
in the manuscript -- the bimolecular on rate for the substrate and enzyme. This should be specified in units of per molar per second. - The parameter
offset_factor
-- which is the same asmu_offset^*
in the manuscript -- the effective free energy offset of the apo system. This should be specified in units of kcal per mol. - The parameter
catalytic_rate
-- which is the same ask_cat
in the manuscript -- the catalytic rate of the enzyme. This should be specified in units of per second.
These four parameters are defined in the __init()__
function of the class and have corresponding values for each data_source
. They are already populated with default values for PKA, ADK, and HIVP. See Supplementary Tables 1 and 2 in the manuscript for more information.
Note that these parameters can be overridden at run time, as shown below.
There are basically two ways to interact with this code. The first is to poke around and look at individual files. The second is to generate the flux, power, and torque data for all the files over a range of parameters (e.g., substrate concentration), store or save that data, and then take specific views of the sliced data. This approach was used to generate the threshold plots in the manuscript.
To look at a particular torsion, we need only to specify the data set and the name of the torsion.
this = Simulation(data_source = 'adk_md_data')
this.name = 'chi2THR175'
this.simulate()
By calling plot_energy(this)
, we can see the free energy surface:
...and by calling plot_flux(this)
, we get the probability flux along each surface:
Other plotting routines are in plot.py
. Any of the class properties can be accessed after calling simulate()
; here is a partial list:
unbound_population
,bound_population
: The normalized probability distributions along either surface, read from the input files.tm
: The Markov transition matrix that contains the fractional probability flow in timedt
.ss
: The nonequilibrium steady-state across both surfaces.flux_u, flux_b
: Probability flux along the apo and bound surface.flux_ub
: Probability flux between the two surfaces.
The easiest way to look at distributions in the data is to first scan through the files, record the interesting results, and then plot. The notebook scan-concentrations
is a good template for how to populate a pandas
dataframe with flux, power, and torque for a group of files over a range of substrate concentration. These can then be used to make complex views into the data, such as the number of torsions with
To facilitate data exploration, I've included pickle
files used to make the figures in the manuscript.
adk-concentration-scan.pickle
pka-concentration-scan.pickle
hiv-concentration-scan-catalytic-rate-10.pickle
hiv-concentration-scan-catalytic-rate-200.pickle
Then, it is easy to slice the data for particular quantities of interest:
df = pd.read_pickle('adk-concentration-scan.pickle')
fig = plt.figure(figsize = (6 * 1.2, 6))
ax = df[df['Directional flux'].abs() > 1]['Directional flux'].plot(kind='hist')
ax.set_xlabel('Directional flux, given $|J| > 1$')
ax.set_ylabel('Count')
paper_plot(fig)
...or make a facet plot across a concentration range:
tmp = tmp[(tmp['Concentration'] == -6) |
(tmp['Concentration'] == -5) |
(tmp['Concentration'] == -4) |
(tmp['Concentration'] == -3) |
(tmp['Concentration'] == -2) |
(tmp['Concentration'] == -1) ]
g = sns.FacetGrid(tmp, col='Concentration', col_wrap=3, sharex=False)
sns.set(style="ticks", font_scale=1.5)
g = g.map(plt.hist, 'Driven flux')\
.set_axis_labels(r'Reciprocating flux', r'Number')\
.set_titles(r'[S] = $10^{{ {col_name} }}$ M')
g.fig.tight_layout(w_pad=1)
Another useful feature is the ability to filter on the dataframe. Here is a list of angles that have directional flux magnitude below 1 cycle per second with driven flux greater than 1 cycle per second, at a substrate concentration of 0.001 M:
tmp = df.round({'Concentration': 2})
tmp = tmp[tmp['Concentration'] == -3.0]
tmp[(tmp['Directional flux'].abs() < 1) & (tmp['Driven flux'] > 1)].head(5)
Concentration | Directional flux | Driven flux | File | Intersurface flux | Max load | Max power | ResID | |
---|---|---|---|---|---|---|---|---|
27090 | -3.0 | -0.144208 | 10.573240 | chi1ALA11 | 3.059189 | 0.0 | 0.0 | 11 |
27091 | -3.0 | -0.007508 | 3.124329 | chi1ALA127 | 1.412363 | 0.0 | 0.0 | 127 |
27092 | -3.0 | -0.032975 | 7.072458 | chi1ALA17 | 2.718294 | 0.0 | 0.0 | 17 |
27093 | -3.0 | 0.014157 | 4.480657 | chi1ALA176 | 1.624334 | 0.0 | 0.0 | 176 |
27094 | -3.0 | 0.013930 | 1.639636 | chi1ALA186 | 0.580522 | 0.0 | 0.0 | 186 |
It is easy to change the default parameters before running simulate()
. For example, to change the catalytic rate for ADK:
this = Simulation(data_source = 'adk_md_data')
this.name = 'chi2THR175'
this.catalytic_rate = 100 # Instead of 312 per second
this.simulate()
Another way to explore the data is by using toy models. One example of this may be setting
the apo and bound populations to something simple. Note that the other parameters still need
to be specified when using data_source='manual'
.
this = Simulation(data_source='manual')
this.unbound_population = [abs(np.cos(i)) for i in np.linspace(0, 2*np.pi, 60)]
this.bound_population = [abs(np.sin(i)) for i in np.linspace(0, 2*np.pi, 60)]
this.offset_factor = 0
this.C_intersurface = 10**6
this.catalytic_rate = 100
this.cSubstrate = 10**6
this.simulate()
plot_input(this)
Using data_source = 'manual'
makes it easy to bypass specifying populations entirely
and directly work with energies. This can be handy for testing the model without MD data.
We accomplish this by passing the parameter user_energies=True
so the function
data_to_energy()
isn't called by simulate()
.
this = Simulation(data_source='manual')
this.unbound = [abs(np.cos(i)) for i in np.linspace(0, 2*np.pi, 60)]
this.bound = [abs(np.sin(i)) for i in np.linspace(0, 2*np.pi, 60)]
this.offset_factor = 0
this.C_intersurface = 10**6
this.catalytic_rate = 100
this.cSubstrate = 10**6
this.simulate(user_energies=True)
plot_energy(this)
build/
- Directory containing script to setup aconda
environment calledmotors
that is used in the notebooksdihedral-timeseries/
- Data to check the convergence of the angle population histogramsimages/
- Figures contained in this file (README.md
)md-data/
- Raw angle population histogram data extracted from the simulationsprecomputed-pickles/
- Python pickle format files containing flux, torque, and power data across a range of concentrations for the angles