Word embedding-based virtual screening (WEBVS) is a computational screening strategy that classifies bioactive compounds and plants in a semantic space generated by a word embedding algorithm. This repository stores the data and R scripts used to generate the results and figures for the corresponding research [1].
The R scripts require the following packages: word2vec, e1071.
Please install all dependencies before running the code. The code presented here was implemented and tested in R version 4.2.2.
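The following is a minimal sketch of a dependency check you could run before starting. It only reports what is missing and does not install anything. The variable names (`required`, `missing_pkgs`) are illustrative and are not part of the repository's scripts.

```r
# Packages named in this README as requirements.
required <- c("word2vec", "e1071")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  # Print the install command rather than running it automatically.
  message("Missing packages. Install with: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}

# The code was tested in R 4.2.2; older versions are untested.
if (getRversion() < "4.2.2") {
  message("Note: running ", R.version.string, ", which is older than 4.2.2.")
}
```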
- Download this repository.
- Decompress and extract the "data.tar.bz2" file (bunzip2 followed by tar, or `tar -xjf data.tar.bz2`) to create the "data" folder, which holds two files, "plant_sen" and "label".
  - "plant_sen": pre-processed literature data
  - "label": label data for the compounds and plants that appear in the "plant_sen" file. The labels are as follows:
    - "1": antimicrobial compounds
    - "4": antimicrobial plant genera listed in a systematic review [2]
    - "3": other plant genera
- Set your R working directory to the root directory of the project.
- Run the R script "src/run_WEBVS.R".
  - The script performs 5-fold cross-validation and plots an enrichment curve for WEBVS.
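To illustrate what 5-fold cross-validation followed by an enrichment curve involves, here is a small self-contained sketch. It is not the code in "src/run_WEBVS.R". It uses synthetic two-dimensional data in place of word embeddings and base-R logistic regression (`glm`) as a stand-in classifier, since this README does not state which model the script fits. All variable names are hypothetical.

```r
# Illustrative only: synthetic data and a stand-in classifier (logistic regression).
set.seed(1)

n <- 200
# 40 "active" items (y = 1) and 160 "inactive" items (y = 0).
y <- rep(c(1, 0), times = c(40, 160))
# Two made-up feature dimensions; actives are shifted so they are partly separable.
x <- data.frame(d1 = rnorm(n, mean = y), d2 = rnorm(n, mean = 0.5 * y))
dat <- cbind(x, y = y)

# Randomly assign every item to one of k = 5 folds.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Train on four folds, score the held-out fold, and collect out-of-fold scores.
score <- numeric(n)
for (i in seq_len(k)) {
  held_out <- fold == i
  fit <- glm(y ~ d1 + d2, data = dat[!held_out, ], family = binomial())
  score[held_out] <- predict(fit, newdata = x[held_out, ], type = "response")
}

# Enrichment curve: rank items by score (best first) and track the
# cumulative fraction of actives recovered as more items are screened.
ord <- order(score, decreasing = TRUE)
frac_screened <- seq_len(n) / n
frac_found <- cumsum(y[ord]) / sum(y)

plot(frac_screened, frac_found, type = "l",
     xlab = "Fraction of items screened",
     ylab = "Fraction of actives found")
abline(0, 1, lty = 2)  # diagonal: expectation for a random ranking
```

A curve that rises well above the dashed diagonal indicates that actives are concentrated near the top of the ranking. To run the actual analysis, set the working directory to the project root and call `source("src/run_WEBVS.R")` from an R session.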
Footnotes
1. Yabuuchi H et al. Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space. PLoS ONE. 2023; 18(5): e0285716. doi: 10.1371/journal.pone.0285716.
2. Chassagne F et al. A systematic review of plants with antibacterial activities: A taxonomic and phylogenetic perspective. Front Pharmacol. 2021; 11: 586548. doi: 10.3389/fphar.2020.586548.