
Releases: ATOMScience-org/AMPL

1.7.0 Release

20 May 18:27
b89515e


Highlights

  • Added seed (random state) and sampling features to AMPL (#344):
    • imbalanced-learn sampling
    • Seed for reproducibility
  • Changes to control model sparsity and improvements to MultitaskScaffoldSplitter (#331):
    • Sped up MultitaskScaffoldSplitter and changed its implementation to allow better optimization of validation & test set difference from training set.
    • Added split_diagnostic_plots module for visualizing aspects of split quality.
    • Added L1 and L2 penalty parameters to XGBoost models to control model sparsity.
    • Added hyperopt search domain parameters for NN and XGBoost model sparsity parameters.
  • Resolved a bug in Transformers fitting where the Transformers for normalizing inputs, outputs, and weights were trained on the entire dataset instead of only the training dataset, potentially causing data leakage. (#385)
  • Added MODAC API Client + Example Docs (#361)
  • Incorporates CodeCov into the CI/CD pipeline to generate code coverage reports for enhancing code quality. (#372, #373)
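
The seeding and data-leakage items above can be illustrated with a minimal, standard-library sketch. It is not AMPL's implementation: the helper names (split_indices, fit_standardizer, standardize) and the sample values are hypothetical. It shows a split made reproducible by a fixed seed, and normalization statistics fitted on the training subset only and then reused for the held-out subset.

```python
import random
import statistics


def split_indices(n_rows, train_frac=0.8, seed=42):
    """Shuffle row indices with a fixed seed and split them into train/test."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = int(n_rows * train_frac)
    return indices[:n_train], indices[n_train:]


def fit_standardizer(values):
    """Return (mean, std) computed from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for zero variance
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to any subset."""
    return [(v - mean) / std for v in values]


# Hypothetical response values, for illustration only.
responses = [0.5, 1.2, 3.3, 2.1, 9.8, 4.4, 0.9, 7.6, 5.0, 2.7]
train_idx, test_idx = split_indices(len(responses), seed=42)
train = [responses[i] for i in train_idx]
test = [responses[i] for i in test_idx]

# Fit on the training subset only, then reuse the same statistics for the
# test subset so that no test-set information leaks into the transformer.
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
```

Calling split_indices twice with the same seed returns the same partition, which is the property a seed parameter is meant to provide.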

Enhancements

  • Fixed a bug in running predictions on classification models with balancing weight transformers, which require MinimalDataset weights for the prediction data. Previously, get_multitask_perf_from_files_new returned NaN metrics for single-task models mixed with multitask models; now, both return correct metrics. (#387)
  • Allows users to combine calculated features with embedded features from pre-trained models (#395)
  • Logs exceptions generated during a HyperOpt search. Previously they were silently swallowed. (#392)
  • Added compute_drug_likeness function to the rdkit_easy module to compute various drug-likeness criteria for compounds in a data frame (Lipinski rule of 5, Ghose and Veber filters, QED), along with the descriptors used to derive them. (#384)
  • AD calculation improvements: fixed an error in the calculation and added the ability to query for the nearest training set neighbors of each compound for which predictions are run. (#378)
  • Integrates the MODAC unit tests and automates their execution on GitHub CI/CD Actions (#371)
  • Implemented unit tests for plotting packages using matplotcheck and the PlotTester API to perform plot validation. (#394)
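
As an illustration of one of the drug-likeness criteria named above, here is a minimal sketch of the Lipinski rule of 5 applied to precomputed descriptors. It is not the compute_drug_likeness function and does not reflect its signature: the function names, the descriptor inputs, and the example values are hypothetical, and the Ghose and Veber filters and QED are not covered. The thresholds are the standard ones (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), with the common convention of tolerating one violation.

```python
def lipinski_violations(mol_wt, logp, h_donors, h_acceptors):
    """List which rule-of-5 thresholds a compound exceeds."""
    violations = []
    if mol_wt > 500:
        violations.append("MW > 500")
    if logp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(mol_wt, logp, h_donors, h_acceptors, max_violations=1):
    """Return True if the compound has at most max_violations violations."""
    found = lipinski_violations(mol_wt, logp, h_donors, h_acceptors)
    return len(found) <= max_violations


# Hypothetical descriptor values, for illustration only.
small_molecule = dict(mol_wt=180.2, logp=1.2, h_donors=1, h_acceptors=4)
large_lipophilic = dict(mol_wt=650.0, logp=6.1, h_donors=2, h_acceptors=8)

small_ok = passes_rule_of_five(**small_molecule)
large_ok = passes_rule_of_five(**large_lipophilic)
```

In practice the descriptors would come from a cheminformatics toolkit such as RDKit rather than being typed in by hand.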

Maintenance

  • System clean up:
    • Improved the CI test pipeline to eliminate duplicate job executions.
    • Added markers to indicate the resources used in certain tests.
    • Expanded the range of tests executed in the CI pipeline; only those that require LLNL resources are excluded. (#393)

Bug Fixes

  • Corrected the AD index calculation for Mordred features containing NaN values (#390)

1.6.3 Release

06 Sep 19:53
7d4ae2a


Highlights

  • Automated GitHub CI Actions:
    • Integrated Ruff Linter to identify potential low-risk errors and code defects.
    • Configured Pytest to automatically run AMPL unit and integration tests with every commit or pull request to ensure code validation.
    • Automated the Docker build and push to DockerHub upon the publication of a release.

Enhancements:

[Image: ruff_linter_ampl]

Bug Fixes:

  • Updated code according to Linter reports (commits: bc34dcf, 86f826e, 32d8b9d)
  • Handled cases where the split subset is empty (commit: a001100)

AMPL 1.6.2 release

01 Aug 22:19
c509945


Bug fixes/Misc changes

  • Fixed issue 332: plot_pred_vs_actual not working for multitask models
  • Updates to plotting predictions for visualizing uncertainty
  • Pinned rdkit 2023.3.3 in pip install to work around rdkit/rdkit#7364 until 2024.3.5 is released on PyPI
  • Updated pip README to include 'mchip_requirements.txt'

1.6.1 Release

28 Jun 00:25
fb86eea


Highlights

  • Created a core tutorial series that represents the end-to-end modeling pipeline to build a machine learning model
  • Numerous improvements to visualizations in perf_plots module:
    • Modified all plots to use color vision deficiency (CVD) friendly colors
    • Added functions to visualize confusion matrices and model performance metrics
    • Improved layout of plots produced by plot_perf_vs_epoch and plot_pred_vs_actual and added parameter to control plot size
    • Reimplemented plot_prec_recall_curve to produce smoother curves.
  • Enhancements to multitask scaffold splitter: faster performance and optimization for response value distribution matching
  • Redesigned the AMPL readthedocs for easier end-user navigation.

Enhancements

  • Added ability to optimize multitask scaffold split for similarity of response value distributions across split subsets, using Wasserstein distance as the dissimilarity metric; controlled by new parameter mtss_response_distr_weight. Also made the MTSS code much faster.
  • Added perf_plots functions plot_confusion_matrices, plot_model_metrics, get_metrics_from_model_pipeline and get_metrics_from_model_file to visualize and provide access to model performance metrics.
  • Modified plot_pred_vs_actual_from_file to make the output more consistent with plot_pred_from_actual; changed plot_pred_from_actual so that it accepts either a ModelPipeline or a model file path as its argument.
  • Reimplemented plot_prec_recall_curve with sklearn PrecisionRecallDisplay, with better handling of multitask models.
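
To make the Wasserstein distance criterion above concrete, the following is a minimal pure-Python sketch of the one-dimensional Wasserstein (earth mover's) distance between two samples of response values. It is not the splitter's code and does not show how mtss_response_distr_weight is applied; the function name and the sample values are hypothetical. The distance is computed as the area between the two empirical CDFs, the same quantity that scipy.stats.wasserstein_distance returns for unweighted samples.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """Area between the empirical CDFs of two 1-D samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    all_values = sorted(a + b)
    total = 0.0
    # Between consecutive pooled values both CDFs are constant, so the area
    # is the CDF gap at the left endpoint times the interval width.
    for left, right in zip(all_values[:-1], all_values[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical response values for a training and a test subset.
train_responses = [1.0, 2.0, 3.0]
test_responses = [2.0, 3.0, 4.0]
shift = wasserstein_1d(train_responses, test_responses)
```

A smaller distance means the subsets have more similar response distributions, which is what a split optimizer using this metric would favor.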

Bug Fixes

  • Fixed bug when number of scaffolds < number of superscaffolds requested
  • Fixed plot_pred_vs_actual_from_file so that it works on models trained with k-fold CV.
  • Fixed the % active calculation to exclude NaNs.

1.6.0

23 Feb 16:13
a319ccf


Highlights

  • Minimal pip install requirements
    • Provide separate installations for different CPU/GPU runtimes (cpu, cuda, ROCm)

Python compatibility

Python 3.9.x

Documentation

  • Improved the AMPL README with better logic flow and topic grouping.
  • Enhanced the API documentation.
    • Removed private modules from the API list
    • Updated all Python code to PEP 257 / Google docstring convention for consistent formatting
      and so that all public modules and functions are included in API documentation.

Enhancements:

  • Provided Dockerfile for a local AMPL Docker image build.
  • Added a parameter to train a model in production mode, where all data are used to train the model.
  • Added full support for all XGBoost model parameters, including in hyperopt searches.
  • Added split_strategy output column to compare_models.get_filesystem_perf_results.
  • Added script for patching model tarballs to point to local copy of training data (needed for AD computation).
  • Save the class_number parameter for multiclass classification models.
  • Added option to map SMILES strings to canonical tautomers in standardization functions rdkit_smiles_from_smiles and base_smiles_from_smiles.
  • Added model_file_reader module to simplify extraction of saved model metadata.
  • Added function to plot predicted vs actual responses with saved regression models.
  • Added module to plot nearest neighbor Tanimoto distance distributions between training and validation/test sets.
  • Added module to plot response value distributions for split subsets.
  • Updated diversity_plots to allow a user-specified color palette and increase the resolution of the figure

Bug Fixes:

  • Made get_featurized_data() check whether all the SMILES strings in a dataset are represented in the prefeaturized data
  • Fixed bug in setting response column weights to make it consistent across featurizers.
  • Fixed error handling in rdkit_easy.mol_to_html to return empty string rather than None.
  • Fixed the Tanimoto distance plot to reflect the nearest neighbor distance instead of all distances.
  • Fixed freq_table's handling of NaNs in selected columns
  • Fixed bug in EmbeddingFeaturization where descriptors were not transformed before input to embedding model.

AMPL 1.5.1 release

01 Mar 17:36
dc8f9df


Fixed the readthedocs build issue.

AMPL 1.5.0 release

01 Mar 00:43
57f2220


  • Updated AMPL to deepchem 2.7.1 and the related libraries
    -- Python 3.8.x
    -- numpy 1.21.6
    -- rdkit 2022.9.3
    -- rdkit-pypi 2022.3.5

  • Changed the environment setup from a mixture of conda and pip packages to pip exclusively
    -- Updated the related document to reflect the change
    -- Removed unused packages from the requirements list

  • Feature enhancements/code clean-up
    -- Added ability to highlight substructures and SMARTS pattern matches in molecules rendered with rdkit_easy functions mol_to_svg, mol_to_html, etc.
    -- Updated hyper_perf_plots.py to work with minimal examples
    -- Changed splitting code to allow many-to-one mapping from compound IDs to SMILES strings
    -- Added support for AD index computation for graphconv models using embeddings as features
    -- Added max_dataset_rows parameter to limit number of training set records used for AD index computation, so that AD computation is feasible for models trained on large datasets.
    -- Replaced all uses of deepchem.data.DiskDataset with NumpyDataset to boost performance and reduce creation of temporary files
    -- Added workaround for DeepChem issue #1821, which was causing predictions to fail on single-compound batches.
    -- Implemented tar archive safe extract to fix vulnerability CVE-2007-4559
    -- Turned off uncertainty for multi_class_config_delaney_fit_NN_graphconv.json
    -- Refined AMPL version/model version compatibility checking to define groups of compatible versions according to whether the associated DeepChem versions have the same format of model checkpoint files. The current compatibility groups are:
    -- Group1: '1.2', '1.3'
    -- Group2: '1.4'
    -- Group3: '1.5'

  • Bug fixes
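
The tar archive safe extract item above refers to CVE-2007-4559, where a crafted member name such as ../../etc/passwd can write outside the destination directory. The following is a minimal sketch of the widely used path-check pattern, not necessarily the code AMPL adopted; the function names are hypothetical. It only validates member paths and does not inspect symlink or hard-link targets. Recent Python versions also offer the filter="data" argument to tarfile extraction as a built-in alternative.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar, path="."):
    """Extract an open tarfile.TarFile only if every member stays inside path."""
    for member in tar.getmembers():
        # os.path.join returns member.name unchanged when it is absolute,
        # so absolute member names are rejected by the same check.
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

Typical usage would be to open the archive with tarfile.open(...) in a with block and pass the resulting object to safe_extract along with the destination directory.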

AMPL 1.4.2 release

22 Aug 22:50


  • Added the EmbeddingFeaturization class to support transfer learning from NN models.
  • Added multitaskscaffold to the list of splitters that require SMILES strings as (temporary) IDs.
  • Added basic hyperparameter plotting functions
  • Bug fixes (fixed multitask models bug, etc.)
  • Set up GitHub CI workflow to automate test jobs on push

AMPL 1.4.1 release

16 Jun 21:00
f67b716


  • Reverted ipython version from 7.16.3 to 7.16.1 to work with jedi.
  • Updated the README.md with two install options for AMPL.

AMPL 1.4.0 release

15 Jun 22:18
2ff4e51


  • Updated AMPL with deepchem 2.6.1 and the related libraries
    o NumPy to 1.21.0
    o IPython to 7.16.3
    o PyYAML to 5.4
    o TensorFlow to 2.8.0
    o Switched to the PyTorch implementation of fully connected neural networks
  • Added multitaskscaffold split to the pipeline
  • Updated plot_tani_dist_distr() and hyperparameter shortlist splitting code to include fingerprint splitter.
  • Added function curate_data.remove_outlier_replicates as part of the standard curation pipeline. Miscellaneous other improvements to data curation functions.
  • Removed hard-coded random seed from the code
  • Bug fixes.
  • Updated test code