Bio-QSARs 2.0

This repository includes data, code, and models from the publication "Bio-QSARs 2.0: Unlocking a new level of predictive power for machine learning-based ecotoxicity predictions by exploiting chemical and biological information".

Freshwater fish and invertebrate models

The second-generation Bio-QSAR models provided here allow the prediction of toxicity in freshwater fish and invertebrates. For information on development, use, and limitations, please see our associated publication.

Models were built using R version 4.1.3 and the packages tidyverse (version 2.0.0), gpboost (version 1.2.1), and SHAPforxgboost (version 0.1.3).

The script Example.R includes examples of how to make predictions with the models, how to apply the respective applicability domain, and how to analyse individual predictions locally using SHAP.
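Example.R shows how the applicability domain is applied to these models. Purely to illustrate what a distance-based applicability domain check does, the Python sketch below flags a query compound as in-domain when its mean Euclidean distance to its k nearest training compounds is at or below a cutoff. The function names, k = 3, the cutoff, and the toy two-dimensional descriptors are hypothetical, and the domain definition used for Bio-QSARs 2.0 may differ.

```python
import math

def mean_knn_distance(query, training, k):
    """Mean Euclidean distance from `query` to its k nearest rows in `training`."""
    dists = sorted(math.dist(query, row) for row in training)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def in_applicability_domain(query, training, cutoff, k=3):
    """Hypothetical check: True if the query lies within `cutoff` of its k neighbours."""
    return mean_knn_distance(query, training, k) <= cutoff

# Toy descriptor space (made-up values for illustration only).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_applicability_domain((0.5, 0.5), training, cutoff=1.0))  # True
print(in_applicability_domain((5.0, 5.0), training, cutoff=1.0))  # False
```

In practice such a cutoff is usually derived from the distribution of the same distance within the training set; see the publication for how the domain is defined for these models.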

Updated algorithm for multicollinearity correction

Additionally, we provide an updated R version of an algorithmic approach to correct datasets for multicollinearity that was presented in a blog post by Brian Pietracatella. This approach is intended to avoid dropping more variables than necessary, and thus losing an unnecessarily large amount of information, while still eliminating multicollinearity. For more information, see our associated publication.
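The specific algorithm from the blog post is not reproduced in this README. As a rough illustration of the general idea, the Python sketch below shows one common greedy strategy, similar in spirit to `findCorrelation()` in caret: repeatedly take the most strongly correlated pair of variables and drop only the member that is, on average, more correlated with the remaining variables. The function names, the 0.9 threshold, and the toy data are assumptions for illustration; this is not the code in multicoll_sol.R.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(data, threshold=0.9):
    """Greedily drop variables until no retained pair has |r| > threshold.

    `data` maps variable names to value lists (complete, non-constant columns).
    Returns the retained names in their original order.
    """
    names = list(data)
    abs_r = {frozenset(p): abs(pearson(data[p[0]], data[p[1]]))
             for p in combinations(names, 2)}
    kept = names[:]

    def mean_abs_r(v):
        # Average |r| between v and every other retained variable.
        others = [w for w in kept if w != v]
        return sum(abs_r[frozenset((v, w))] for w in others) / len(others)

    while len(kept) > 1:
        a, b = max(combinations(kept, 2), key=lambda p: abs_r[frozenset(p)])
        if abs_r[frozenset((a, b))] <= threshold:
            break
        # Of the most strongly correlated pair, drop only the member that is
        # more correlated with the other retained variables on average.
        kept.remove(a if mean_abs_r(a) >= mean_abs_r(b) else b)
    return kept

data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],
    "x3": [5, 3, 4, 1, 2],
}
print(drop_collinear(data))  # ['x2', 'x3']
```

Only one variable from each offending pair is removed per step, so variables that are merely correlated with an already-dropped variable get a chance to stay.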

The function was built using R version 4.1.3 and the packages tidyverse (version 2.0.0) and caret (version 6.0.94).

The function in multicoll_sol.R now handles missing data by computing correlations on pairwise complete observations. An example of usage can be found here.
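Using pairwise complete observations means that each correlation is computed only from the rows in which both variables are present, so a missing value in one variable does not remove that row from the correlations between other variables. This corresponds to the `use = "pairwise.complete.obs"` option of R's `cor()`. The Python sketch below illustrates the idea for a single pair of variables; the function name and data are hypothetical, and missing values are represented as `None`.

```python
import math

def pairwise_complete_corr(x, y):
    """Pearson r of x and y using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return float("nan")  # not enough complete pairs to estimate r
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, 4.0, 6.0, None, 10.0]
# Only rows 0, 1 and 4 are complete; on those rows y == 2 * x.
print(pairwise_complete_corr(x, y))
```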

syngenta/bio-qsar