20 May 18:27

mauvais2

b89515e

1.7.0 Release Latest

1.7.0 Release

Highlights

Added random seed (random state) and sampling features to AMPL (#344). The features are:
- Imbalance-learn sampling
- Seed for Reproducibility
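
To illustrate how sampling and seeding interact, here is a minimal sketch of random oversampling driven by an explicit seed. It uses only the Python standard library; it is not AMPL's or imbalanced-learn's implementation, and the function name `random_oversample` and its parameters are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed):
    """Duplicate minority-class samples at random until all classes have equal size.

    Hypothetical helper for illustration only; not part of AMPL or imbalanced-learn.
    """
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls in sorted(counts):  # fixed class order keeps the draw sequence deterministic
        members = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(members) for _ in range(target - counts[cls])]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


samples = ["c1", "c2", "c3", "c4", "c5", "c6"]
labels = [0, 0, 0, 0, 1, 1]
run_a = random_oversample(samples, labels, seed=42)
run_b = random_oversample(samples, labels, seed=42)
```

With the same seed, `run_a` and `run_b` are identical, which is the property a reproducibility seed is meant to provide.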
Changes to control model sparsity and improvements to MultitaskScaffoldSplitter (#331):
- Sped up MultitaskScaffoldSplitter and changed its implementation to allow better optimization of validation & test set difference from training set.
- Added split_diagnostic_plots module for visualizing aspects of split quality.
- Added L1 and L2 penalty parameters to XGBoost models to control model sparsity.
- Added hyperopt search domain parameters for NN and XGBoost model sparsity parameters.
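
For readers unfamiliar with how these penalties act, the sketch below shows their effect on a single leaf weight under the standard regularized objective for gradient-boosted trees: the L1 penalty (exposed by XGBoost as `reg_alpha`) soft-thresholds the summed gradient, and the L2 penalty (`reg_lambda`) is added to the summed Hessian. The function `leaf_weight` is a hypothetical illustration, not AMPL code, and the names of the corresponding AMPL parameters are not shown here.

```python
def leaf_weight(grad_sum, hess_sum, l1_penalty=0.0, l2_penalty=1.0):
    """Optimal leaf weight for summed gradient/Hessian under L1 and L2 penalties.

    Illustrative only: a larger l1_penalty pushes small weights to exactly zero
    (sparsity), while a larger l2_penalty shrinks all weights toward zero.
    """
    # Soft-threshold the summed gradient by the L1 penalty.
    if grad_sum > l1_penalty:
        shrunk = grad_sum - l1_penalty
    elif grad_sum < -l1_penalty:
        shrunk = grad_sum + l1_penalty
    else:
        shrunk = 0.0
    # The L2 penalty enlarges the denominator, shrinking the weight.
    return -shrunk / (hess_sum + l2_penalty)


no_l1 = leaf_weight(4.0, 3.0, l1_penalty=0.0, l2_penalty=1.0)
with_l1 = leaf_weight(-4.0, 3.0, l1_penalty=1.0, l2_penalty=1.0)
zeroed = leaf_weight(0.5, 3.0, l1_penalty=1.0, l2_penalty=1.0)
```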
Resolved a bug in Transformers fitting where the Transformers for normalizing inputs, outputs, and weights were trained on the entire dataset instead of only the training dataset, potentially causing data leakage. (#385)
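
The leakage described above can be sketched with a toy normalizer. The class `Standardizer` below is hypothetical and stands in for any normalizing transformer; it is not AMPL's or DeepChem's implementation. The point is only that the mean and standard deviation must be estimated from the training subset and then applied unchanged to the other subsets.

```python
from statistics import mean, pstdev


class Standardizer:
    """Toy z-score normalizer (hypothetical; for illustration only)."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # avoid division by zero for constant data
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [1.0, 2.0, 3.0]
test = [10.0]

# Correct: statistics come from the training subset only.
scaler = Standardizer().fit(train)
test_scaled = scaler.transform(test)

# Leaky: the test value influences the statistics used to scale the training data.
leaky = Standardizer().fit(train + test)
```

Here `scaler.mean_` is 2.0 while `leaky.mean_` is 4.0, so the leaky fit lets information about the test point shift the normalization seen during training.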
Added MODAC API Client + Example Docs (#361)
Incorporates CodeCov into the CI/CD pipeline to generate code coverage reports for enhancing code quality. (#372, #373)

Enhancements

Fixed a bug that occurred when running predictions on classification models with balancing weight transformers, which require MinimalDataset weights for the prediction data. Also, get_multitask_perf_from_files_new previously returned NaN metrics for single-task models mixed with multitask models; both now return correct metrics. (#387)
Allows users to combine calculated features with embedded features from pre-trained models. (#395)
Logs exceptions generated during a HyperOpt search. Previously they were silently swallowed. (#392)
Added the compute_drug_likeness function to the rdkit_easy module to compute various drug-likeness criteria for compounds in a data frame (Lipinski rule of 5, Ghose and Veber filters, QED), along with the descriptors used to derive them. (#384)
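
As a rough illustration of two of these criteria, the sketch below applies the commonly cited Lipinski and Veber thresholds to precomputed descriptor values. The function names and arguments are hypothetical, the descriptor values are made up, and compute_drug_likeness itself may define or compute the criteria differently.

```python
def lipinski_violations(mol_wt, log_p, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of 5 from precomputed descriptors."""
    rules = [
        mol_wt <= 500,      # molecular weight in daltons
        log_p <= 5,         # octanol-water partition coefficient
        h_donors <= 5,      # hydrogen bond donors
        h_acceptors <= 10,  # hydrogen bond acceptors
    ]
    return sum(not ok for ok in rules)


def passes_veber(rotatable_bonds, tpsa):
    """Veber filter: at most 10 rotatable bonds and polar surface area <= 140 A^2."""
    return rotatable_bonds <= 10 and tpsa <= 140


# Made-up descriptor values for two hypothetical compounds.
small = lipinski_violations(mol_wt=350.0, log_p=2.5, h_donors=2, h_acceptors=5)
large = lipinski_violations(mol_wt=650.0, log_p=6.2, h_donors=2, h_acceptors=5)
veber_ok = passes_veber(rotatable_bonds=4, tpsa=90.0)
```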
AD calculation improvements: fixed an error in the calculation and added the ability to query for the nearest training set neighbors of each compound for which predictions are run. (#378)
Integrates the MODAC unit tests and automates their execution on GitHub CI/CD Actions (#371)
Implemented unit tests for plotting packages using matplotcheck and the PlotTester API to perform plot validation. (#394)

Maintenance

System clean up:
- Improved the CI test pipeline to eliminate duplicate job executions.
- Added markers to indicate the resources used in certain tests.
- Expanded the range of tests executed in the CI pipeline; only those that require LLNL resources are excluded. (#393)

Bug Fixes

Corrected the AD index calculation for Mordred features containing NaN values. (#390)

06 Sep 19:53

mauvais2

1.6.3

7d4ae2a

1.6.3 Release

Highlights

Automated GitHub CI Actions:
- Integrated Ruff Linter to identify potential low-risk errors and code defects.
- Configured Pytest to automatically run AMPL unit and integration tests with every commit or pull request to ensure code validation.
- Automated the Docker build and push to DockerHub upon the publication of a release.

Enhancements:

Defined separate Dockerfiles for CPU and GPU configurations.
Introduced a Makefile to streamline the management of Docker builds, pulls, and Jupyter session execution.
Pip requirements:
- Introduced a new dev_requirements.txt for development and testing.
- Updated rdkit to version 2024.3.5, which resolves the PandasTools patching error in rdkit_easy and the descriptor calculation issue reported in rdkit/rdkit#7364.
- Use tensorflow-cpu in cpu_requirements.txt to install the appropriate platform package.
Ruff Linter findings:
- Updated and fixed code based on Ruff reports, addressing common issues such as:
  - F401 unused-import
  - F405 undefined-local-with-import-star-usage
  - F841 unused-variable
  - F601 multi-value-repeated-key-literal
  - E721 type-comparison
  - E402 module-import-not-at-top-of-file
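
For context, two of the rule codes above can be illustrated with small, hypothetical snippets (not taken from the AMPL code base). Both "flagged" forms are valid Python, which is why a linter is needed to catch them.

```python
# E721 type-comparison: comparing types directly instead of using isinstance().
def is_int_flagged(value):
    return type(value) == int  # Ruff reports E721 here


def is_int_fixed(value):
    return isinstance(value, int)


# F601 multi-value-repeated-key-literal: the same key appears twice in a dict
# literal with different values, so the first value is silently overwritten.
params_flagged = {"lr": 0.1, "lr": 0.01}  # Ruff reports F601 here
params_fixed = {"lr": 0.01}
```

Note that the E721 fix is not always behavior-preserving: `isinstance(True, int)` is true because `bool` subclasses `int`, while `type(True) == int` is false, so each such change deserves a quick review.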

Bug Fixes:

Updated code according to Linter reports (commits: bc34dcf, 86f826e, 32d8b9d)
Handled cases where the split subset is empty (commit: a001100)

01 Aug 22:19

mauvais2

1.6.2

c509945

AMPL 1.6.2 release

Bug fixes/Misc changes

Fixed issue 332: plot_pred_vs_actual not working for multitask models
Updates to plotting predictions for visualizing uncertainty
Pinned rdkit 2023.3.3 in pip install to work around rdkit/rdkit#7364 until 2024.3.5 is released on PyPI
Updated pip README to include 'mchip_requirements.txt'

28 Jun 00:25

mauvais2

1.6.1

fb86eea

1.6.1 Release

Highlights

Created a core tutorial series that represents the end-to-end modeling pipeline to build a machine learning model
Numerous improvements to visualizations in perf_plots module:
- Modified all plots to use color vision deficiency (CVD) friendly colors
- Added functions to visualize confusion matrices and model performance metrics
- Improved layout of plots produced by plot_perf_vs_epoch and plot_pred_vs_actual and added parameter to control plot size
- Reimplemented plot_prec_recall_curve to produce smoother curves.
Enhancements to multitask scaffold splitter: faster performance and optimization for response value distribution matching
Redesigned the AMPL readthedocs for easier end-user navigation.

Enhancements

Added ability to optimize multitask scaffold split for similarity of response value distributions across split subsets, using Wasserstein distance as dissimilarity metric; controlled by new parameter mtss_response_distr_weight. Improved performance of MTSS code to be much faster.
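
As background on the dissimilarity metric named above, the following is a minimal standard-library sketch of the one-dimensional Wasserstein distance between two unweighted samples, computed as the area between their empirical CDFs (the same quantity that `scipy.stats.wasserstein_distance` returns for unweighted inputs). The function name `wasserstein_1d` is hypothetical, and how the splitter combines this value with mtss_response_distr_weight is not shown.

```python
from bisect import bisect_right


def wasserstein_1d(u_values, v_values):
    """Area between the empirical CDFs of two unweighted 1-D samples."""
    u = sorted(u_values)
    v = sorted(v_values)
    grid = sorted(u + v)
    total = 0.0
    for lo, hi in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on the interval [lo, hi).
        cdf_u = bisect_right(u, lo) / len(u)
        cdf_v = bisect_right(v, lo) / len(v)
        total += abs(cdf_u - cdf_v) * (hi - lo)
    return total


# Made-up response values for a training subset and a test subset.
train_responses = [0.0, 1.0, 3.0]
test_responses = [5.0, 6.0, 8.0]
distance = wasserstein_1d(train_responses, test_responses)
```

A distance of zero means the two subsets have identical response value distributions; larger values indicate a greater mismatch.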
Added perf_plots functions plot_confusion_matrices, plot_model_metrics, get_metrics_from_model_pipeline and get_metrics_from_model_file to visualize and provide access to model performance metrics.
Modified plot_pred_vs_actual_from_file to make the output more consistent with plot_pred_vs_actual; changed plot_pred_vs_actual so that it accepts either a ModelPipeline or a model file path as its argument.
Reimplemented plot_prec_recall_curve with sklearn PrecisionRecallDisplay, with better handling of multitask models.

Bug Fixes

Fixed bug when number of scaffolds < number of superscaffolds requested
Fixed plot_pred_vs_actual_from_file so that it works on models trained with k-fold CV.
Fixed the % active calculation to exclude NaNs.

23 Feb 16:13

mauvais2

1.6.0

a319ccf

1.6.0

Highlights

Minimal pip install requirements
- Provide separate installations for different CPU/GPU runtimes (cpu, cuda, ROCm)

Python compatibility

Python 3.9.x

Documentation

Improved the AMPL README with better logic flow and topic grouping.
Enhanced the API documentation.
- Removed private modules from the API list
- Updated all Python code to PEP 257 / Google docstring convention for consistent formatting
  and so that all public modules and functions are included in API documentation.

Enhancements:

Provided Dockerfile for a local AMPL Docker image build.
Added a parameter to train a model in production mode, where all data are used to train the model.
Added full support for all XGBoost model parameters, including in hyperopt searches.
Added split_strategy output column to compare_models.get_filesystem_perf_results.
Added script for patching model tarballs to point to local copy of training data (needed for AD computation).
Save the class_number parameter for multiclass classification models.
Added option to map SMILES strings to canonical tautomers in standardization functions rdkit_smiles_from_smiles and base_smiles_from_smiles.
Added model_file_reader module to simplify extraction of saved model metadata.
Added function to plot predicted vs actual responses with saved regression models.
Added module to plot nearest neighbor Tanimoto distance distributions between training and validation/test sets.
Added module to plot response value distributions for split subsets.
Updated diversity_plots to allow a user-specified color palette and increase the resolution of the figure

Bug Fixes:

Made get_featurized_data() check whether all the SMILES strings in a dataset are represented in the prefeaturized data.
Fixed bug in setting response column weights to make it consistent across featurizers.
Fixed error handling in rdkit_easy.mol_to_html to return empty string rather than None.
Fixed the Tanimoto distance plot to reflect the nearest neighbor distance instead of all distances.
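
The difference between nearest-neighbor distances and all pairwise distances can be sketched as follows. Fingerprints are represented as Python sets of on-bit indices; the function names and the fingerprint values are hypothetical and are not taken from AMPL.

```python
def tanimoto_distance(fp_a, fp_b):
    """One minus the ratio of shared on-bits to total on-bits of two fingerprints."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def nearest_neighbor_distances(query_fps, train_fps):
    """For each query compound, the distance to its closest training compound."""
    return [min(tanimoto_distance(q, t) for t in train_fps) for q in query_fps]


# Made-up fingerprints for two training and two test compounds.
train_fps = [{1, 2, 3, 4}, {10, 11, 12}]
test_fps = [{1, 2, 3}, {10, 11, 12, 13}]

nn_distances = nearest_neighbor_distances(test_fps, train_fps)
all_distances = [tanimoto_distance(q, t) for q in test_fps for t in train_fps]
```

In this toy case every test compound has a close training neighbor, which `nn_distances` shows directly, whereas `all_distances` also contains the distances to unrelated training compounds and would make the subsets look less similar than they are.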
Fixed freq_table's handling of NaNs in selected columns.
Fixed bug in EmbeddingFeaturization where descriptors were not transformed before input to embedding model.

01 Mar 17:36

mauvais2

1.5.1

dc8f9df

AMPL 1.5.1 release

Fixed the readthedocs build issue.

01 Mar 00:43

mauvais2

1.5.0

57f2220

AMPL 1.5.0 release

Updated AMPL to deepchem 2.7.1 and the related libraries
-- Python 3.8.x
-- numpy 1.21.6
-- rdkit 2022.9.3
-- rdkit-pypi 2022.3.5
Changed the environment setup from a mixture of conda and pip packages to pip exclusively
-- Updated the related document to reflect the change
-- Removed unused packages from the requirements list
Feature enhancements/code clean-up
-- Added ability to highlight substructures and SMARTS pattern matches in molecules rendered with rdkit_easy functions mol_to_svg, mol_to_html, etc.
-- Updated hyper_perf_plots.py to work with minimal examples
-- Changed splitting code to allow many-to-one mapping from compound IDs to SMILES strings
-- Change to support AD index computation for graphconv models using embeddings as features
-- Added max_dataset_rows parameter to limit number of training set records used for AD index computation, so that AD computation is feasible for models trained on large datasets.
-- Replaced all uses of deepchem.data.DiskDataset with NumpyDataset to boost performance and reduce creation of temporary files
-- Added workaround for DeepChem issue #1821, which was causing predictions to fail on single-compound batches.
-- Implemented tar archive safe extract to fix vulnerability CVE-2007-4559
-- Turned off uncertainty for multi_class_config_delaney_fit_NN_graphconv.json
-- Refined AMPL version/model version compatibility checking to define groups of compatible versions according to whether the associated DeepChem versions have the same format of model checkpoint files. The current compatibility groups are:
-- Group1: '1.2', '1.3'
-- Group2: '1.4'
-- Group3: '1.5'
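
The safe tar extraction item above refers to CVE-2007-4559, in which a crafted archive member name such as `../evil.txt` causes files to be written outside the intended directory. A minimal sketch of the usual mitigation is shown below; the helper names are hypothetical and this is not necessarily how AMPL implements it. The sketch checks member paths only and does not inspect symbolic or hard links inside the archive.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a location inside directory."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar, path="."):
    """Extract a tarfile.TarFile only if every member stays inside path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

A caller would open the archive with `tarfile.open(...)` and pass the resulting object to `safe_extract` instead of calling `extractall` directly.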
Bug fixes

22 Aug 22:50

mauvais2

1.4.2

b24ffc2

AMPL 1.4.2 release

Added the EmbeddingFeaturization class to support transfer learning from NN models.
Added multitaskscaffold to the list of splitters that require SMILES strings as (temporary) IDs.
Added basic hyperparameter plotting functions
Bug fixes (fixed multitask models bug, etc.)
Set up GitHub CI workflow to automate test jobs on push

16 Jun 21:00

mauvais2

1.4.1

f67b716

AMPL 1.4.1 release

Reverted ipython version from 7.16.3 to 7.16.1 to work with jedi.
Updated the README.md with two install options for AMPL.

15 Jun 22:18

mauvais2

1.4.0

2ff4e51

AMPL 1.4.0 release

Updated AMPL with deepchem 2.6.1 and the related libraries
- NumPy to 1.21.0
- IPython to 7.16.3
- PyYAML to 5.4
- TensorFlow to 2.8.0
- Switched to PyTorch implementation of fully connected neural networks
Added multitaskscaffold split to the pipeline
Updated plot_tani_dist_distr() and hyperparameter shortlist splitting code to include fingerprint splitter.
Added function curate_data.remove_outlier_replicates as part of the standard curation pipeline. Miscellaneous other improvements to data curation functions.
Removed hard-coded random seed from the code
Bug fixes.
Updated test code

Releases: ATOMScience-org/AMPL
