
Releases: wehs7661/ensemble_md

Version 1.0.0

05 Aug 06:27
5357b8c

Most of the important changes made since version 0.9.0 lie in implementing unit tests and improving the documentation for the package. Below are some more details of the changes. The full changelog can be found here.

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/1.0.0/
  • Added unit tests for all functionalities, including those that require MPI. This enhanced the code coverage of the entire package from ~50% to >95%.
  • Added Tutorial 1: Standard REXEE simulations to the documentation.
  • Added the function calc_t_relax to analyze_matrix.py.
  • Added synthesize_data.py to ensemble_md.analysis, which includes functions synthesize_traj and synthesize_transmtx.
  • Modified the function calc_spectral_gap to allow uncertainty estimation.
  • Removed count_transitions from clustering.py.
  • Removed the class FiltUtils from gmx_parser.py.
  • Refactored the class MDP in utils.py.
  • Renamed the function autoconvert as _convert_to_numeric in utils.py and refactored the function.
  • Added simulation inputs for the test systems presented in the JCTC paper.
  • Tweaked .circleci/config.yaml. For context, please visit Issue #55.
  • Refined the docstrings for all functionalities in the package.
  • Updated the documentation for the package.
  • Updated the README file.
  • Minor bug fixes.

Version 0.9.0

29 Mar 09:20

Here we list important changes made since version 0.8.0. The full changelog can be found here. In the next release, we plan to significantly improve the code coverage and documentation.

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.9.0/
  • We decided to rename the method as REXEE, which is short for Replica EXchange of Expanded Ensemble that combines the working principles of replica exchange (REX) and expanded ensemble (EE). Accordingly, we have modified all relevant files to swap out EEXE with REXEE, including in the filenames, file contents, and documentation. (For more details, please see PR #27.)
  • We corrected the code for calculating the acceptance ratio. This would influence the result of weight-updating REXEE simulations but not fixed-weight ones. (For more details, please see PR #28.)
  • We simplified the options available in REXEE simulations, especially for those related to weight combinations. This includes the following. (For more details, please see PR #31.)
    • Removing the functionality relevant to rmse_cutoff
    • Removing the option final from the YAML parameter w_combine
      • Turning the YAML parameter w_combine into a boolean
      • Removing the function prepare_weights
    • Deprecating the schemes of multiple swaps.
    • Providing a YAML parameter w_mean_type to specify whether simple means or weighted means are used in weight combination
    • Compartmentalizing the histogram correction method
    • Updating the documentation
    • Updating the unit tests
  • The YAML parameter acceptance was removed, and the documentation has been updated accordingly.
  • The function get_delta_w_updates was added.
  • The following functions were tweaked. Some details can be found in PR #36.
    • get_g_evolution
    • _calculate_weighted_df, which was later renamed as _combine_df_adjacent
    • convert_npy2xvg
    • stitch_time_series
    • cluster_traj
    • The CLI analyze_REXEE
  • Minor bugfixes.
  • Updated README and the documentation

Version 0.8.0

25 Oct 16:43
9cb910f

Here we list important changes made since version 0.7.0.

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.8.0/
  • The following functions were renamed:
    • The function histogram_correction was renamed as weight_correction.
    • The functions stitch_trajs and stitch_trajs_for_sim were renamed as stitch_time_series and stitch_time_series_for_sim, respectively.
  • The following functions/modules were developed:
    • The function prepare_weights was added to run_EEXE.py
    • A new function stitch_trajs was added to analyze_traj.py. (Note that the original function stitch_trajs was renamed.)
    • The function calc_rmse was added to utils.py.
    • The function convert_npy2xvg was added to analyze_traj.py
    • The module clustering.py was added to ensemble_md.analysis.
  • The following functions were moved to a new place:
    • The function compare_MDPs was moved from utils to gmx_parser.
    • The function run_gmx_cmd was moved from Ensemble_EXE.py to utils.py.
    • The function get_g_evolution was added to analyze_traj.py
    • The function calculate_hist_rmse was added to analyze_traj.py
  • The following YAML parameters are added/modified:
    • Added a new YAML parameter rmse_cutoff and modified w_combine.
    • Added a new YAML parameter subsampling_avg and modified process_data accordingly.
    • Added the YAML parameter rm_cpt and improved YAML parameter handling.
    • Added the YAML parameter df_ref and enabled RMSE calculations of free energy profiles.
  • The following functions have been converted to internal functions (by adding a _ prefix):
    • utils.autoconvert
    • utils.get_subplot_dimension
    • run_grompp in Ensemble_EXE.py
    • run_mdrun in Ensemble_EXE.py
  • The main function of run_EEXE.py was modified to
    • Use prepare_weights.
    • Allow backing up simulation output gro files before using modify_coords.
  • Implemented the method for histogram correction in combine_weights.
  • Refactored preprocess_data for higher flexibility and enabled it to deal with multiple datasets.
  • Fixed a minor bug in removing state.cpt
  • Resolved the issue of exiting loops for weight-updating EEXE simulations.
  • Fixed a minor bug in combine_weights for the case of 0 histogram counts.
  • Fixed a minor bug in compare_MDPs.
  • Fixed minor bugs in plot_transit_time
  • Improved error handling in run_EEXE.py.
  • Updated README and the documentation

Version 0.7.0

02 Aug 22:23
a4bae8d

Several major changes have been made since the last release of version 0.6.0. Below we list all changes that would influence the way ensemble_md is used, in the order of decreasing importance.

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.7.0/
  • ensemble_md has been enabled to work with MPI-enabled GROMACS.
  • Enabled coordinate modification/relative free energy calculations in the EEXE framework
    • See the project page to find the corresponding issues and PRs relevant to the work.
    • Added the YAML parameter modify_coords for specifying the external module for coordinate manipulation.
    • Added a predefined contract for the external module modify_coords
    • Enabled specifying different GRO and TOP input files.
    • Enabled specifying different MDP parameters for different replicas by adding the YAML parameter mdp_args.
    • Added the YAML parameter add_swappables to enable relative free energy calculations using EEXE.
  • Better parameter handling
    • Added restrictions for specifying nstlog and nstdhdl in the input MDP template
    • Added get_ref_dist to the class EnsembleEXE and modified update_MDP to fix the reference distance(s) for all iterations when distance restraint(s) are used. This solved the issue of poor sampling for the CB7-10 test system.
  • Enabled inverse variance weighting in weight combination
  • Added new functionalities to analyze_traj.py, including plot_g_vecs, get_swaps, plot_swaps, and stitch_trajs_for_sim
  • Modified functionalities
    • Converted identify_swappable_pairs into a static method
    • Modified explore_EEXE to allow estimating the chance of not having any swappable pairs
  • Removed map_lambda2state and the attributes lambda_dict and lambda_ranges
  • Fixed the bug so that executions on all ranks are terminated whenever there is a GROMACS error in any child process
  • Several minor bug fixes in several functions, including get_averaged_weights, reformat_MDP, parse_log, and autoconvert
  • Updated the documentation

In the next release, we will focus more on refining functionalities for data analysis.

Version 0.6.0

01 May 19:21
239776a

Several major changes have been made since the last release of version 0.5.0. Below we list all changes that would influence the way ensemble_md should be used, in the order of decreasing importance.

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.6.0/
  • We have completely removed the use of gmxapi from the implementation of EEXE. This includes the following (see the PR #9 ):
    • Using subprocess.run to replace gmxapi.commandline_operation and gmxapi.mdrun.
    • Removing all functions and variables using gmxapi and their unit tests.
    • Updating setup.py and the installation instruction in the documentation.
    • Updating docs/conf.py and .circleci/config.yaml.
    • Updating the code for extending an EEXE simulation.
  • Modifications made to the weight-combining methods:
    • The methods mean and geo-mean have been disabled, leaving only one method (g-diff).
    • The weight-combining method is now based on time-averaged weights, mainly enabled by get_averaged_weights.
    • The function combine_weights was modified so that the simulation ends once the weights of all replicas are equilibrated.
  • The following bugs have been fixed:
    • The bug of using weights updated by get_swapping_pattern in update_MDP.
    • A minor bug in autoconvert triggered by parsing an MDP file with the parameter pull-coord1-dim specified.
  • Modified ensemble_EXE.py so that the initial histogram counts for the next iteration can be specified.
  • Tweaked the available parameters in the input YAML file:
    • Added parameters grompp_args and gmx_executable.
    • Removed the parameter parallel.
    • Renamed the parameter w_scheme as w_combine.
  • Added the attribute n_empty_swappable to the class EnsembleEXE.
  • Developed get_time_metrics and analyze_EEXE_time
  • Eliminated the need to copy over gro and top files for the next iteration.
  • Developed unit tests for get_averaged_weights and removed unit tests for obsolete functions.
  • Updated the documentation.
    • Removed any description about gmxapi.
    • Added more description about the CLI run_EEXE

Version 0.5.0

28 Mar 17:11

Compared to the last release, the following changes have been made to version 0.5.0:

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.5.0/
  • Fixed the bug in updating the replica-space trajectories and exchanging replicas in get_swapping_pattern.
  • Renamed the parameter mc_scheme as acceptance.
  • Added a new parameter proposal that can be specified in the input YAML file and transferred a part of functionalities of n_ex to it.
  • Re-implemented the method of multiple swaps so that the swappable pairs are re-identified whenever an attempted swap is accepted.
  • Implemented the method of exhaustive swap in get_swapping_pattern.
  • Refactored functions including calc_prob_acc and get_swapping_pattern in ensemble_EXE.py
  • Updated the documentation
    • Updated the instruction about installation via PyPI
    • Added a section about the available proposal schemes
  • Developed/Refined unit tests for the following modules to improve the code coverage:
    • ensemble_EXE.py
    • gmx_parser.py
    • analyze_matrix.py

Version 0.4.0

28 Feb 03:05

Compared to the last version, the following changes have been made in version 0.4.0:

  • Deployed the package to PyPI: https://pypi.org/project/ensemble-md/0.4.0/
  • Developed the CLI explore_EEXE.
  • Fixed the error for building the documentation.
  • Developed plot_state_hist in analyze_traj.py.
  • Developed unit tests for ensemble_md.utils.
  • Improved the docstrings.
  • Switched from GitHub Actions to CircleCI for continuous integration.
  • Other minor bug fixes.

Version 0.3.0

03 Feb 04:13

The following improvements have been made in version 0.3.0:

  • Entry points including run_EEXE and analyze_EEXE were added to allow command-line interfaces.
  • Methods for analyzing EEXE were developed, with all the relevant codes residing in the folder analysis. Relevant codes include analyze_free_energy.py, analyze_matrix.py, analyze_traj.py, and msm_analysis.py.
  • More EEXE parameters are now allowed to be specified in params.yaml to reflect the changes made in the codes.
  • Methods for checkpointing and extending an EEXE simulation have also been implemented.
  • One more weight-combining method is added: g-diff.
  • We have improved the flexibility of parsing parameters from GROMACS MDP files and input YAML files.
  • We have improved the code coverage to 82%. However, CI performed by GitHub Actions has been consistently failing (hanging until timeout) whenever test_run_EEXE is run. The reason is unknown, but we will change to CircleCI or another similar service in the next release.
  • Documentation was restructured and updated to reflect most changes in the code.

In the next release, we aim to complete at least the following action items.

  • Improve some of the existing unit tests.
  • Fix the issues in building the package documentation, especially the issue that readthedocs was not able to capture the docstrings of classes that used gmxapi.
  • Make sure that the package consistently passes the continuous integration and linting tests.
  • Update the documentation (see issue #2).
  • Deploy the package to PyPI.

Version 0.2.0

09 Aug 07:41

In this version, we've finished all the action items mentioned in the previous release note. This includes the following:

  • Implemented a new method for combining weights from multiple replicas using the probability ratios, with the old methods removed. This also automatically circumvents the issue of reference selection.
  • Implemented a new method for carrying out multiple swaps and removed the old method. That is, now multiple exchanges can be drawn from the swappable list in one attempt with replacement. Neighboring swapping has also been implemented.
  • Added descriptions about the newly implemented methods mentioned above.
  • Passed continuous integration and improved code coverage to 74%. The development of the unit tests is still a work in progress.

Version 0.1.0

04 Aug 23:42

In this version, we have implemented a preliminary version of ensemble_md that allows users to run an ensemble of expanded ensembles in GROMACS. We are still finalizing our choice of algorithms implemented within the method, including methods for swapping multiple pairs of simulations and combining weights. We have implemented these methods in a basic and valid way, but they are subject to change in later versions.

Multiple swaps

Here is how we perform multiple swaps in version 0.1.0: From the $N_\text{swap}$ swappable pairs, randomly choose 1 pair and repeat $N_\text{ex}$ times. Pairs are drawn without replacement, and after every draw we update the swappable list so that no replica is involved in more than one pair. Therefore, the maximum value of $N_\text{ex}$ is half the number of replicas. Note that with this method, the specified $N_\text{ex}$ may not be reached because the swappable list can run out of choices. For example, given 4 replicas, it is possible that the swappable pairs only include (0, 1), (1, 2), (2, 3) because the other pairs of simulations are in states not present in the alchemical range of their exchanging partners. Then, with $N_\text{ex}=2$, if we choose (1, 2) in the first round, there would be no pair left to draw in the second round.
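
The drawing procedure described above can be sketched as follows. This is only an illustrative sketch, not the code in ensemble_md: the function name draw_swaps and its arguments are hypothetical, and the swappable list is assumed to be given as a list of replica-index pairs.

```python
import random


def draw_swaps(swappables, n_ex, rng=None):
    """Draw up to n_ex pairs without replacement (hypothetical helper).

    After each draw, every pair sharing a replica with the chosen pair is
    removed, so no replica appears in more than one chosen pair.
    """
    if rng is None:
        rng = random.Random()
    remaining = list(swappables)
    chosen = []
    for _ in range(n_ex):
        if not remaining:
            break  # ran out of choices before reaching n_ex
        pair = rng.choice(remaining)
        chosen.append(pair)
        # Update the swappable list: drop pairs involving either replica.
        remaining = [p for p in remaining if not set(p) & set(pair)]
    return chosen


# The 4-replica example from the text, with N_ex = 2.
swaps = draw_swaps([(0, 1), (1, 2), (2, 3)], n_ex=2)
```

If (1, 2) happens to be drawn first, the returned list contains only that pair, which is the shortfall described in the example.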

Weight-combining methods

Weight-shifting method

In the GROMACS log file of an expanded ensemble simulation, all weights are shifted such that the weight of the first state is always 0. That is, simulations with different $\lambda$ ranges will have different references for the weights. To account for this, we shift the weights of one of the simulations such that the first states in the overlapped ranges have the same weights. After shifting, we then combine weights of the overlapped states across the simulations. Below is a schematic example for adjusting weights for simulations $i$ and $j$:

State            0        1        2        3        4        5        6
Original i       0.0      2.1      4.0      3.7      4.8      X        X         
Original j       X        X        0.0      -0.4     0.7      1.5      2.4
Shifted j        X        X        4.0      3.6      4.7      5.5      6.4
Shifted i        -4.0     -1.9     0.0      -0.3     0.8      X        X    

For simplicity, here we just take simple averages (although this does not make sense), so that the weights of simulations $i$ and $j$ will be as follows after adjustment:

State            0        1        2        3        4        5        6
Modified i       0.0      2.1      4.0      3.65     4.75     X        X         
Modified j       X        X        0.0      -0.35    0.75     1.5      2.4

Therefore, instead of having [0.0, 2.1, 4.0, 3.7, 4.8] specified as init-lambda-weights in the next iteration of simulation $i$, we use [0.0, 2.1, 4.0, 3.65, 4.75]. Notably, using simple averages, the choice of the reference does not influence the calculation of the acceptance ratio (even if the values of the individual weights are changed). However, with exponential averaging, the choice of reference will influence the value of the acceptance ratio, but it's hard to justify which reference is the best. This is the problem that we want to solve in the next version.
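
As a check on the numbers above, the shift-then-average procedure can be reproduced with a short NumPy sketch. The function name combine_simple_average is hypothetical and not part of ensemble_md, and states outside a simulation's $\lambda$ range (marked X in the tables) are represented here as NaN by assumption.

```python
import numpy as np

# Weights from the schematic example; NaN marks states outside the range (X).
g_i = np.array([0.0, 2.1, 4.0, 3.7, 4.8, np.nan, np.nan])
g_j = np.array([np.nan, np.nan, 0.0, -0.4, 0.7, 1.5, 2.4])


def combine_simple_average(g_ref, g_other):
    """Average g_ref with g_other over their overlap (hypothetical helper).

    g_other is first shifted so that the first overlapping state has the
    same weight in both simulations; g_ref keeps its own reference.
    """
    overlap = ~np.isnan(g_ref) & ~np.isnan(g_other)
    first = np.flatnonzero(overlap)[0]
    shifted = g_other + (g_ref[first] - g_other[first])
    combined = g_ref.copy()
    combined[overlap] = 0.5 * (g_ref[overlap] + shifted[overlap])
    return combined


modified_i = combine_simple_average(g_i, g_j)  # overlap becomes 4.0, 3.65, 4.75
modified_j = combine_simple_average(g_j, g_i)  # overlap becomes 0.0, -0.35, 0.75
```

Within the overlapping states, the weight differences relative to the first overlapping state are identical in modified_i and modified_j, which is why the choice of reference does not matter for simple averages.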

Weight combination

In version 0.1.0, weights are combined across the pair of simulations to be exchanged after the swap is either accepted or rejected. Currently, the following three methods have been implemented.

  • none: In this case, the simulations in the new iterations will just inherit the final weights of the previous iteration. No weights will be modified.
  • exp-avg: In the limit that all $\lambda$ states are equally sampled, the $\lambda$ weight of a state is equal to the dimensionless free energy of that state. That is, we can write $g(\lambda)=-\ln p(\lambda)$, or $p(\lambda)=\exp(-g(\lambda))$ (in the units of kT). Given this, one intuitive way is to average the probability of the two simulations, i.e. $p=\frac{1}{2}(p_1 + p_2)$. Or in terms of free energy, we have $\exp(-g)=\frac{1}{2}(\exp(-g_1) + \exp(-g_2))$. By re-arrangement, we have $$g = -\ln\left(\frac{e^{-g_1} + e^{-g_2}}{2}\right)$$.
  • histogram-exp-avg: In the course of the simulation, the histogram is generally not flat, so $g$ as an estimate of the free energy in the expression of exp-avg could be off. To account for the flatness level of the histogram, I propose introducing a simple correction term of $\ln(N_{k-1}/N_k)$. That is, $$g_k'=g_k + \ln\left(\frac{N_{k-1}}{N_k}\right)$$ where $g_k'$ is the corrected $\lambda$ weight and $N_{k-1}$ and $N_k$ are the histogram counts of states $k-1$ and $k$. Combining this with exp-avg, we have $$g = -\ln\left(\frac{e^{-g_1'} + e^{-g_2'}}{2}\right)$$ In this case, we might need to take care of 0 counts in some cases though.
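
The two formulas above can be sketched in NumPy as follows. The function names exp_avg and histogram_correct are hypothetical and not the ensemble_md API. The inputs are assumed to be weights of overlapping states already shifted to a common reference. Leaving state 0 uncorrected and raising an error on zero counts are placeholder choices made only for this sketch, since the text leaves both open.

```python
import numpy as np


def exp_avg(g1, g2):
    """Return g = -ln((exp(-g1) + exp(-g2)) / 2), evaluated stably."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    # logaddexp(a, b) = ln(exp(a) + exp(b)) without overflow.
    return -(np.logaddexp(-g1, -g2) - np.log(2.0))


def histogram_correct(g, counts):
    """Return g'_k = g_k + ln(N_{k-1} / N_k) for k >= 1.

    Assumption: state 0 has no predecessor, so it is left unchanged.
    """
    g = np.asarray(g, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if np.any(counts <= 0):
        # Placeholder handling of the 0-count case mentioned in the text.
        raise ValueError("histogram counts must be positive")
    corrected = g.copy()
    corrected[1:] += np.log(counts[:-1] / counts[1:])
    return corrected


# histogram-exp-avg: correct each set of weights, then average.
g1 = histogram_correct([0.0, 1.2, 2.5], [120, 100, 80])
g2 = histogram_correct([0.0, 1.0, 2.9], [90, 110, 100])
g = exp_avg(g1, g2)
```

With a flat histogram the correction term vanishes, so histogram-exp-avg reduces to exp-avg.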

Future work

In the next version, we expect the following changes:

  • Change how multiple swaps are carried out, using the method used in replica exchange in GROMACS.
  • Combine weights from multiple simulations as long as there are overlapping states instead of combining weights only across the 2 simulations to be exchanged
  • Solve or circumvent the problem incurred by different choices of references in weight-shifting
  • Add more unit tests to improve the code coverage
  • Pass continuous integration at least for Ubuntu systems
  • Add descriptions about implemented methods in the documentation page