Text classification of PubMed abstracts on anticancer activity

This repository contains the files and information for step 1 of the Kaphta Architecture: text classification of PubMed abstracts on anticancer activity. The classification uses an ensemble method. The ensemble was built (trained and tested) from the four machine learning algorithms with the best accuracy. The files are described below:

Rotulated-corpus.rar: labeled corpus of PubMed abstracts used to train and test the machine learning algorithms from which the ensemble is built. Save this file in the same folder as the training-and-text-classification-gh.R script, which needs it to run.
training-and-text-classification-gh.R: R script that creates the ensemble for text classification of PubMed abstracts on anticancer activity.
db_total_project.db: SQLite database needed to run all R scripts of the Kaphta Architecture steps. It contains tables with the entity dictionary, the full corpus of PubMed abstracts, and the PubMed abstracts classified as positive by the text classification. Save this file in the same folder as the training-and-text-classification-gh.R script, which needs it to run.
Entities-dictionary: folder with files and details about the entity dictionary created for the Kaphta Architecture.
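
The file placement described above can be checked before running the script. The following is a minimal sketch, not part of the repository: the helper name missing_inputs is hypothetical, and the use of the DBI and RSQLite packages to open db_total_project.db is an assumption about how one might inspect the database. Table names are not documented here, so the sketch only lists whatever tables exist.

```r
# Files the README says must sit in the same folder as the script.
required_files <- c("Rotulated-corpus.rar", "db_total_project.db")

# Hypothetical helper: return the required files that are missing from `dir`.
missing_inputs <- function(dir = ".") {
  required_files[!file.exists(file.path(dir, required_files))]
}

missing <- missing_inputs(".")
if (length(missing) > 0) {
  message("Missing input files: ", paste(missing, collapse = ", "))
} else if (requireNamespace("DBI", quietly = TRUE) &&
           requireNamespace("RSQLite", quietly = TRUE)) {
  # Assumption: DBI and RSQLite are installed; list the tables without
  # guessing their names.
  con <- DBI::dbConnect(RSQLite::SQLite(), "db_total_project.db")
  print(DBI::dbListTables(con))
  DBI::dbDisconnect(con)
}
```

Running this from the folder that holds training-and-text-classification-gh.R should report either the missing files or the tables found in the database.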

For more information about this and other steps of the Kaphta Architecture, see the Kaphta Web Tool at https://portal.ifsuldeminas.edu.br/kaphtawebtool/.

Results of Text Classification

PubMed-PMID-abstracts-positives.tsv: TSV file with the PubMed abstracts classified as positive by the ensemble-based text classification. Note: these abstracts are also available in the db_total_project.db SQLite file.
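
The results file can be loaded into R for inspection. This is a minimal sketch under stated assumptions: the helper name read_positives is hypothetical, and the file is assumed to have a header row, which this README does not confirm. Column names are not documented, so the sketch prints the structure instead of selecting columns.

```r
# Hypothetical helper: read a tab-separated results file.
# quote = "" keeps quotation marks inside abstracts from being treated
# as field delimiters.
read_positives <- function(path = "PubMed-PMID-abstracts-positives.tsv") {
  read.delim(path, header = TRUE, quote = "", stringsAsFactors = FALSE)
}

if (file.exists("PubMed-PMID-abstracts-positives.tsv")) {
  positives <- read_positives()
  str(positives)  # inspect the actual columns before relying on any name
}
```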

Results of training of machine learning algorithms

The table below reports the measures obtained when training the supervised machine learning algorithms. The ensemble was built by combining the four classifiers with the best accuracy: LogitBoost, Random Forest, Support Vector Machine, and Maximum Entropy.
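
This README does not state how the four classifiers' predictions are combined. One common option is majority voting, sketched below in base R. The function majority_vote, the tie_label fallback, and the example predictions are all hypothetical illustrations, not taken from training-and-text-classification-gh.R.

```r
# Hypothetical majority vote over per-classifier labels.
# `votes` has one row per abstract and one column per classifier.
# With four voters a 2-2 tie is possible; ties fall back to `tie_label`.
majority_vote <- function(votes, tie_label = "negative") {
  unname(apply(votes, 1, function(row) {
    counts <- table(row)
    winners <- names(counts)[counts == max(counts)]
    if (length(winners) == 1) winners else tie_label
  }))
}

# Made-up predictions for three abstracts (illustration only).
votes <- data.frame(
  logitboost    = c("positive", "negative", "positive"),
  random_forest = c("positive", "negative", "negative"),
  svm           = c("positive", "positive", "negative"),
  maxent        = c("negative", "negative", "positive"),
  stringsAsFactors = FALSE
)

print(majority_vote(votes))
```

In this example the third abstract receives two votes per class, so the result depends entirely on the chosen tie_label. Any real ensemble needs an explicit tie rule of this kind when the number of classifiers is even.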

Table with results of the training of machine learning algorithms

About

Releases

Packages

Languages

ramongsilva/Text-classification-of-pubmed-abstracts-on-polyphenols-anticancer-activity

Folders and files

Latest commit

History

Repository files navigation

Text classification of PubMed abstracts on anticancer activity

Results of Text Classification

Results of training of machine learning algorithms

Table with results of the training of machine learning algorithms

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages