Releases: wehs7661/ensemble_md
Version 1.0.0
Most of the important changes made since version 0.9.0 lie in implementing unit tests and improving the documentation for the package. Below are some more details of the changes. The full changelog can be found here.
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/1.0.0/
- Added unit tests for all functionalities, including those that require MPI. This enhanced the code coverage of the entire package from ~50% to >95%.
- Added Tutorial 1: Standard REXEE simulations to the documentation.
- Added the function `calc_t_relax` to `analyze_matrix.py`.
- Added `synthesize_data.py` to `ensemble_md.analysis`, which includes functions `synthesize_traj` and `synthesize_transmtx`.
- Modified the function `calc_spectral_gap` to allow uncertainty estimation.
- Removed `count_transitions` from `clustering.py`.
- Removed the class `FiltUtils` from `gmx_parser.py`.
- Refactored the class `MDP` in `utils.py`.
- Renamed the function `autoconvert` as `_conver_to_numeric` in `utils.py` and refactored the function.
- Added simulation inputs for the test systems presented in the JCTC paper.
- Tweaked `.circleci/config.yaml`. For context, please visit Issue #55.
- Refined the docstrings for all functionalities in the package.
- Updated the documentation for the package.
- Updated the README file.
- Minor bug fixes.
Version 0.9.0
Here we list important changes made since version 0.8.0. The full changelog can be found here. In the next release, we plan to significantly improve the code coverage and documentation.
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.9.0/
- We decided to rename the method as REXEE, which is short for Replica EXchange of Expanded Ensemble that combines the working principles of replica exchange (REX) and expanded ensemble (EE). Accordingly, we have modified all relevant files to swap out EEXE with REXEE, including in the filenames, file contents, and documentation. (For more details, please see PR #27.)
- We corrected the code for calculating the acceptance ratio. This would influence the result of weight-updating REXEE simulations but not fixed-weight ones. (For more details, please see PR #28.)
- We simplified the options available in REXEE simulations, especially for those related to weight combinations. This includes the following. (For more details, please see PR #31.)
  - Removing the functionality relevant to `rmse_cutoff`
  - Removing the option final from the YAML parameter `w_combine`
    - Turning the YAML parameter `w_combine` into a boolean
    - Removing the function `prepare_weights`
  - Deprecating the schemes of multiple swaps.
  - Provide a YAML parameter w_mean_type to specify using simple means or weighted means in weight combination
  - Compartmentalize the histogram correction method
  - Update the documentation
  - Update the unit tests
- The following YAML parameter `acceptance` was removed, and the documentation has been updated accordingly.
- The function `get_delta_w_updates` was added.
- The following functions were tweaked. Some details can be found in the PR #36.
  - `get_g_evolution`
  - `_calculate_weighted_df`, which was later renamed as `_combine_df_adjacent`
  - `convert_npy2xvg`
  - `stitch_time_series`
  - `cluster_traj`
  - The CLI `analyze_REXEE`
- Minor bugfixes.
- Updated README and the documentation
Version 0.8.0
Here we list important changes made since version 0.7.0.
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.8.0/
- The following functions were renamed:
  - The function `histogram_correction` is renamed as `weight_correction`.
  - The functions `stitch_trajs` and `stitch_trajs_for_sim` are renamed `stitch_time_series` and `stitch_time_series_for_sim`, respectively.
- The following functions/modules were developed:
  - The function `prepare_weights` was added to `run_EEXE.py`.
  - A new function `stitch_trajs` was added to `analyze_traj.py`. (Note that the original function `stitch_trajs` was renamed.)
  - The function `calc_rmse` was added to `utils.py`.
  - The function `convert_npy2xvg` was added to `analyze_traj.py`.
  - The module `clustering.py` was added to `ensemble_md.analysis`.
- The following functions were moved to a new place:
  - The function `compare_MDPs` was moved from `utils` to `gmx_parser`.
  - The function `run_gmx_cmd` was moved from `Ensemble_EXE.py` to `utils.py`.
  - The function `get_g_evolution` was added to `analyze_traj.py`.
  - The function `calculate_hist_rmse` was added to `analyze_traj.py`.
- The following YAML parameters are added/modified:
  - Added a new YAML parameter `rmse_cutoff` and modified `w_combine`.
  - Added a new YAML parameter `subsampling_avg` and modified `process_data` accordingly.
  - Added the YAML parameter `rm_cpt` and improved YAML parameter handling.
  - Added the YAML parameter `df_ref` and enabled RMSE calculations of free energy profiles.
- The following functions have been converted to internal functions (by adding a `_` prefix):
  - `utils.autoconvert`
  - `utils.get_subplot_dimension`
  - `run_grompp` in `Ensemble_EXE.py`
  - `run_mdrun` in `Ensemble_EXE.py`
- The main function of `run_EEXE.py` was modified to:
  - Use `prepare_weights`.
  - Allow backing up simulation output gro files before using `modify_coords`.
- Implemented the method for histogram correction in `combine_weights`.
- Refactored `preprocess_data` for higher flexibility and enabled it to deal with multiple datasets.
- Fixed a minor bug in removing `state.cpt`.
- Resolved the issue of exiting loops for weight-updating EEXE simulations.
- Fixed a minor bug in `combine_weights` for the case of 0 histogram counts.
- Fixed a minor bug in `compare_MDPs`.
- Fixed minor bugs in `plot_transit_time`.
- Improved error handling in `run_EEXE.py`.
- Updated README and the documentation
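As a rough illustration of the RMSE calculation mentioned above for free energy profiles, a root-mean-square error between a computed profile and a reference profile can be sketched as below. The function name `rmse` and its signature are hypothetical and are not claimed to match `calc_rmse` or the handling of `df_ref` in the package.

```python
import math


def rmse(data, ref):
    """Root-mean-square error between two equal-length sequences.

    Hypothetical helper for illustration only; not the package's calc_rmse.
    """
    if len(data) != len(ref):
        raise ValueError("data and ref must have the same length")
    # Mean of squared deviations, then square root
    return math.sqrt(sum((d - r) ** 2 for d, r in zip(data, ref)) / len(data))
```

Here `data` would hold the free energy profile estimated from a simulation and `ref` the reference profile, both in the same units and over the same states.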
Version 0.7.0
Several major changes have been made since the last release of version 0.6.0. Below we list all changes that would influence the way `ensemble_md` is used, in the order of decreasing importance.
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.7.0/
- `ensemble_md` has been enabled to work with MPI-enabled GROMACS.
- Enabled coordinate modification/relative free energy calculations in the EEXE framework
  - See the project page to find the corresponding issues and PRs relevant to the work.
  - Added the YAML parameter `modify_coords` for specifying the external module for coordinate manipulation.
  - Added a predefined contract for the external module `modify_coords`.
  - Enabled specifying different GRO and TOP input files.
  - Enabled specifying different MDP parameters for different replicas by adding the YAML parameter `mdp_args`.
  - Added the YAML parameter `add_swappables` to enable relative free energy calculations using EEXE.
- Better parameter handling
  - Added restrictions for specifying `nstlog` and `nstdhdl` in the input MDP template.
  - Added `get_ref_dist` to the class `EnsembleEXE` and modified `update_MDP` to fix the reference distance(s) for all iterations when distance restraint(s) are used. This solved the issue of poor sampling for the CB7-10 test system.
- Enabled inverse variance weighting in weight combination
- Added new functionalities to `analyze_traj.py`, including `plot_g_vecs`, `get_swaps`, `plot_swaps`, and `stitch_trajs_for_sim`
- Modified functionalities
  - Converted `identify_swappable_pairs` into a static method
  - Modified `explore_EEXE` to allow estimating the chance of not having any swappable pairs
- Removed `map_lambda2state` and the attributes `lambda_dict` and `lambda_ranges`
- Fixed the bug so that executions on all ranks are terminated whenever there is a GROMACS error in any child process
- Several minor bug fixes in several functions, including `get_averaged_weights`, `reformat_MDP`, `parse_log`, and `autoconvert`
- Updated the documentation
In the next release, we will focus more on refining functionalities for data analysis.
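For readers unfamiliar with inverse variance weighting, the generic textbook form is sketched below: each estimate is weighted by the reciprocal of its variance. This is only an illustration of the general technique. The function name and arguments are hypothetical, and how the package applies the weighting to the weights of different replicas is not described in these notes.

```python
def inverse_variance_mean(values, errors):
    """Combine estimates using inverse-variance weights.

    values: estimates of the same quantity (e.g., from different replicas)
    errors: standard errors of those estimates (all assumed to be nonzero)
    Returns the weighted mean and its propagated standard error.
    """
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    err = (1.0 / total) ** 0.5
    return mean, err
```

With equal errors this reduces to a simple mean; an estimate with a smaller error pulls the combined value toward itself.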
Version 0.6.0
Several major changes have been made since the last release of version 0.5.0. Below we list all changes that would influence the way `ensemble_md` should be used, in the order of decreasing importance.
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.6.0/
- We have completely removed the use of `gmxapi` from the implementation of EEXE. This includes the following (see the PR #9):
  - Using `subprocess.run` to replace `gmxapi.commandline_operation` and `gmxapi.mdrun`.
  - Removing all functions and variables using `gmxapi` and their unit tests.
  - Updating `setup.py` and the installation instructions in the documentation.
  - Updating `docs/conf.py` and `.circleci/config.yaml`.
  - Updating the code for extending an EEXE simulation.
- Modifications made to the weight-combining methods:
  - The methods `mean` and `geo-mean` have been disabled, which left only one method (`g-diff`).
  - The weight-combining method is now based on time-averaged weights, mainly enabled by `get_averaged_weights`.
  - The function `combine_weights` was modified so that the simulation will end once the weights of all replicas get equilibrated.
- The following bugs have been fixed:
  - The bug of using weights updated by `get_swapping_pattern` in `update_MDP`.
  - A minor bug in `autoconvert` triggered by parsing an MDP file with the parameter `pull-coord1-dim` specified.
- Modified `ensemble_EXE.py` so that the initial histogram counts for the next iteration can now be specified.
- Tweaked the available parameters in the input YAML file:
  - Added parameters `grompp_args` and `gmx_executable`.
  - Removed the parameter `parallel`.
  - Renamed the parameter `w_scheme` as `w_combine`.
- Added the attribute `n_empty_swappable` to the class `EnsembleEXE`.
- Developed `get_time_metrics` and `analyze_EEXE_time`.
- Eliminated the need to copy over gro and top files for the next iteration.
- Developed unit tests for `get_averaged_weights` and removed unit tests for obsolete functions.
- Updated the documentation.
  - Removed any description about `gmxapi`.
  - Added more description about the CLI `run_EEXE`.
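The switch from `gmxapi` to `subprocess.run` noted above amounts to launching GROMACS as an ordinary external command. A minimal sketch of that pattern follows. The helper `run_command` is hypothetical (it is not the package's own wrapper), and the example invokes the Python interpreter instead of GROMACS so that it runs on any machine.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command and return (returncode, stdout, stderr).

    Hypothetical helper for illustration; args is a list such as
    ["gmx", "grompp", ...] when GROMACS is installed.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in command so the sketch does not require GROMACS
code, out, err = run_command([sys.executable, "-c", "print('ok')"])
```

Checking the return code (and surfacing `stderr` when it is nonzero) is what lets a driver script stop cleanly when a GROMACS command fails.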
Version 0.5.0
Compared to the last release, the following changes have been made to version 0.5.0:
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.5.0/
- Fixed the bug in updating the replica-space trajectories and exchanging replicas in `get_swapping_pattern`.
- Renamed the parameter `mc_scheme` as `acceptance`.
- Added a new parameter `proposal` that can be specified in the input YAML file and transferred a part of functionalities of `n_ex` to it.
- Re-implemented the method of multiple swaps so that the swappable pairs are re-identified whenever an attempted swap is accepted.
- Implemented the method of exhaustive swap in `get_swapping_pattern`.
- Refactored functions including `calc_prob_acc` and `get_swapping_pattern` in `ensemble_EXE.py`
- Updated the documentation
  - Updated the instruction about installation via PyPI
  - Added a section about the available proposal schemes
- Developed/Refined unit tests for the following modules to improve the code coverage:
  - `ensemble_EXE.py`
  - `gmx_parser.py`
  - `analyze_matrix.py`
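The re-implemented multiple-swap scheme listed above (re-identifying the swappable pairs whenever an attempted swap is accepted) can be sketched roughly as follows. This is not the code in `get_swapping_pattern`: the function name, the `find_swappables` and `accept` callbacks, and the `n_attempts` argument are all hypothetical placeholders for whatever swappability rule and acceptance criterion are in use.

```python
import random


def multiple_swaps(configs, find_swappables, accept, n_attempts, rng=random):
    """Attempt several swaps, refreshing the swappable list after each accepted swap.

    configs: list where configs[r] labels the configuration held by replica r
    find_swappables: callable returning a list of (i, j) replica pairs (hypothetical)
    accept: callable (i, j) -> bool deciding whether a proposed swap is accepted
    """
    configs = list(configs)
    swappables = find_swappables(configs)
    for _ in range(n_attempts):
        if not swappables:
            break  # nothing left to propose
        i, j = rng.choice(swappables)
        if accept(i, j):
            configs[i], configs[j] = configs[j], configs[i]
            # The set of swappable pairs may change after an accepted swap
            swappables = find_swappables(configs)
    return configs
```

Rejected proposals leave the swappable list untouched, so the list is only recomputed when the replica arrangement actually changes.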
Version 0.4.0
Compared to the last version, the following changes have been made in version 0.4.0:
- Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.4.0/
- Developed the CLI `explore_EEXE`.
- Fixed the error for building the documentation.
- Developed `plot_state_hist` in `analyze_traj.py`.
- Developed unit tests for `ensemble_md.utils`.
- Improved the docstrings.
- Switched from GitHub Actions to CircleCI for continuous integration.
- Other minor bug fixes.
Version 0.3.0
The following improvements have been made in version 0.3.0:
- Entry points including `run_EEXE` and `analyze_EEXE` were added to allow command-line interfaces.
- Methods for analyzing EEXE were developed, with all the relevant code residing in the folder `analysis`. Relevant modules include `analyze_free_energy.py`, `analyze_matrix.py`, `analyze_traj.py`, and `msm_analysis.py`.
- More EEXE parameters are now allowed to be specified in `params.yaml` to reflect the changes made in the code.
- Methods for checkpointing and extending an EEXE simulation have also been implemented.
- One more weight-combining method is added: `g-diff`.
- We have improved the flexibility of parsing parameters from GROMACS MDP files and input YAML files.
- We have improved the code coverage to 82%. However, CI performed by GitHub Actions has been constantly failing (hanging until timeout) whenever `test_run_EEXE` is run. The reason is unknown, but we will change to CircleCI or other similar services in the next release.
- Documentation was restructured and updated to reflect most changes in the code.
In the next release, we aim to complete at least the following action items.
- Improve some of the existing unit tests.
- Fix the issues in building the package documentation, especially the issue that readthedocs was not able to capture the docstrings of classes that used `gmxapi`.
- Make sure that the package constantly passes the continuous integration and linting tests.
- Update the documentation (see issue #2).
- Deploy the package to PyPI.
Version 0.2.0
In this version, we've finished all the action items mentioned in the previous release note. This includes
- Implemented a new method for combining weights from multiple replicas using the probability ratios, with the old methods removed. This also automatically circumvents the issue of reference selection.
- Implemented a new method for carrying out multiple swaps and removed the old method. That is, now multiple exchanges can be drawn from the swappable list in one attempt with replacement. Neighboring swapping has also been implemented.
- Added descriptions about the newly implemented methods mentioned above.
- Passed continuous integration and improved code coverage to 74%. The development of the unit tests is still a work in progress.
Version 0.1.0
In this version, we have implemented a preliminary version of `ensemble_md` that allows users to run an ensemble of expanded ensemble in GROMACS. We are still finalizing our choice of algorithms implemented within the method, including methods for swapping multiple pairs of simulations and combining weights. We have implemented these methods in a basic and valid way, but they are subject to changes in later versions.
Multiple swaps
Here is how we perform multiple swaps in version 0.1.0: From
Weight-combining methods
Weight-shifting method
In the GROMACS log file of an expanded ensemble simulation, all weights are shifted such that the weight of the first state is always 0. That is, simulations with different
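| State | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Original i | 0.0 | 2.1 | 4.0 | 3.7 | 4.8 | X | X |
| Original j | X | X | 0.0 | -0.4 | 0.7 | 1.5 | 2.4 |
| Shifted j | X | X | 4.0 | 3.6 | 4.7 | 5.5 | 6.4 |
| Shifted i | -4.0 | -1.9 | 0.0 | -0.3 | 0.8 | X | X |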
For simplicity, here we are just taking simple averages (although it doesn't make sense), so that the weights of simulations
| State | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Modified i | 0.0 | 2.1 | 4.0 | 3.65 | 4.75 | X | X |
| Modified j | X | X | 0.0 | -0.35 | 0.75 | 1.5 | 2.4 |
Therefore, instead of having `[0.0, 2.1, 4.0, 3.7, 4.8]` specified as `init-lambda-weights` in the next iteration of simulation i, we specify `[0.0, 2.1, 4.0, 3.65, 4.75]`. Notably, using simple averages, the choice of the reference does not influence the calculation of the acceptance ratio (even if the values of the individual weights are changed). However, with exponential averaging, the choice of reference will influence the value of the acceptance ratio, but it's hard to justify which reference is the best. This is the problem that we want to solve in the next version.
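The arithmetic in the tables above can be reproduced with a short sketch. It assumes, as the tables suggest, that each simulation's weights are shifted to match the other's at the first overlapping state (state 2 here) and that simple averages are taken over the overlap. The function and argument names are hypothetical and do not come from the package.

```python
def shift(weights, offset):
    """Add a constant offset to every weight."""
    return [w + offset for w in weights]


def average_overlap(g_i, g_j, start_j):
    """Average the weights of two simulations over their overlapping states.

    g_i covers global states 0..len(g_i)-1; g_j starts at global state start_j.
    Returns the modified weights of i and j, each in its own reference frame.
    """
    n_overlap = len(g_i) - start_j
    # Express j in the frame of i, and i in the frame of j
    j_in_i = shift(g_j, g_i[start_j] - g_j[0])
    i_in_j = shift(g_i, g_j[0] - g_i[start_j])
    new_i, new_j = list(g_i), list(g_j)
    for k in range(n_overlap):
        new_i[start_j + k] = (g_i[start_j + k] + j_in_i[k]) / 2
        new_j[k] = (g_j[k] + i_in_j[start_j + k]) / 2
    return new_i, new_j


new_i, new_j = average_overlap(
    [0.0, 2.1, 4.0, 3.7, 4.8], [0.0, -0.4, 0.7, 1.5, 2.4], start_j=2
)
```

Up to floating-point rounding, `new_i` and `new_j` match the "Modified i" and "Modified j" rows of the table.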
Weight combination
In version 0.1.0, weights are combined across the pair of simulations to be exchanged after the swap is either accepted or rejected. Currently, the following three methods have been implemented.
- `none`: In this case, the simulations in the new iterations will just inherit the final weights of the previous iteration. No weights will be modified.
- `exp-avg`: In the limit that all $\lambda$ states are equally sampled, the $\lambda$ weight of a state is equal to the dimensionless free energy of that state. That is, we can write $g(\lambda)=-\ln p(\lambda)$, or $p(\lambda)=\exp(-g(\lambda))$ (in the units of kT). Given this, one intuitive way is to average the probability of the two simulations, i.e. $p=\frac{1}{2}(p_1 + p_2)$. Or in terms of free energy, we have $\exp(-g)=\frac{1}{2}(\exp(-g_1) + \exp(-g_2))$. By re-arrangement, we have $$g = -\ln\left(\frac{e^{-g_1} + e^{-g_2}}{2}\right)$$
- `histogram-exp-avg`: In the course of the simulation, the histogram is generally not flat, so $g$ as an estimate of the free energy in the expression of `exp-avg` could be off. To account for the flatness level of the histogram, I propose introducing a simple correction term of $\ln(N_{k-1}/N_k)$. That is, $$g_k'=g_k + \ln\left(\frac{N_{k-1}}{N_k}\right)$$ where $g_k'$ is the corrected $\lambda$ weight and $N_{k-1}$ and $N_k$ are the histogram counts of states $k-1$ and $k$. Combining this with `exp-avg`, we have $$g = -\ln\left(\frac{e^{-g_1'} + e^{-g_2'}}{2}\right)$$ In this case, we might need to take care of 0 counts in some cases though.
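The two formulas above translate directly into a few lines of Python. The sketch below is illustrative only and is not the package's implementation. The function names are hypothetical, and two choices are assumptions made here because the notes leave them open: the weight of the first state is left uncorrected (it has no state $k-1$), and states with a zero count on either side are left uncorrected.

```python
import math


def exp_avg(g1, g2):
    """Combine two weight vectors via g = -ln((exp(-g1) + exp(-g2)) / 2)."""
    return [-math.log((math.exp(-a) + math.exp(-b)) / 2) for a, b in zip(g1, g2)]


def histogram_correction(g, counts):
    """Apply g_k' = g_k + ln(N_{k-1} / N_k) for k >= 1.

    Assumptions (not from the release notes): state 0 is left unchanged, and
    the correction is skipped whenever either count is zero.
    """
    corrected = [g[0]]
    for k in range(1, len(g)):
        if counts[k - 1] > 0 and counts[k] > 0:
            corrected.append(g[k] + math.log(counts[k - 1] / counts[k]))
        else:
            corrected.append(g[k])
    return corrected
```

For `histogram-exp-avg`, one would apply `histogram_correction` to the weights of each simulation first and then pass the corrected vectors to `exp_avg`.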
Future work
In the next version, we expect the following changes:
- Change how multiple swaps are carried out, using the method used in replica exchange in GROMACS.
- Combine weights from multiple simulations as long as there are overlapping states instead of combining weights only across the 2 simulations to be exchanged
- Solve or circumvent the problem incurred by different choices of references in weight-shifting
- Add more unit tests to improve the code coverage
- Pass continuous integration at least for Ubuntu systems
- Add descriptions about implemented methods in the documentation page