Repository for student projects within biomedical text mining from Lund University

SalmaKazemiRashed/BioNLP

 
 

BioNLP

Resources

1. NLP Python packages

scispaCy
scispaCy is a Python package containing spaCy models for processing biomedical, scientific or clinical text
https://allenai.github.io/scispacy/
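
As a rough illustration of how a scispaCy model might be used to find entity mentions, the sketch below assumes that `scispacy` and the small model `en_core_sci_sm` have already been installed as described on the scispaCy page. The helper names `load_pipeline` and `extract_entities` are hypothetical and are not part of scispaCy.

```python
def load_pipeline(model_name="en_core_sci_sm"):
    """Load a spaCy pipeline; the named model must already be installed."""
    import spacy  # imported lazily so extract_entities can be used without spaCy
    return spacy.load(model_name)


def extract_entities(nlp, text):
    """Return (mention, start_char, end_char) for each entity span found."""
    doc = nlp(text)
    return [(ent.text, ent.start_char, ent.end_char) for ent in doc.ents]


# Example call (requires the model to be installed):
# nlp = load_pipeline()
# print(extract_entities(nlp, "BRCA1 was sequenced in all samples."))
```

Which spans are returned depends on the model that is loaded, so no expected output is shown here.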

2. Resources for dictionaries

UniProtKB
To create a dictionary of gene and protein names (contains protein and gene names, including synonyms)
https://www.uniprot.org/

PubChem
To create a dictionary of chemicals/drugs (contains small molecules, but also larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically modified macromolecules)
https://pubchem.ncbi.nlm.nih.gov/

Disease Ontology
http://disease-ontology.org/
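
One way a dictionary built from these resources could be applied to text is plain string matching over names and synonyms. The following is a minimal sketch under that assumption; the identifiers (`ID_TP53`, `ID_ASPIRIN`), the toy dictionary, the example sentence, and the function names are all made up for illustration and do not come from UniProtKB or PubChem exports.

```python
import re


def build_matcher(dictionary):
    """Compile one regex from {canonical_id: [names and synonyms]}."""
    lookup = {}
    for canonical_id, names in dictionary.items():
        for name in names:
            lookup[name.lower()] = canonical_id
    # Longest names first so "tumor protein p53" wins over "p53".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(n) for n in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def annotate(text, pattern, lookup):
    """Return (start, end, mention, canonical_id) for every dictionary hit."""
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Toy dictionary with hypothetical identifiers.
toy_dictionary = {
    "ID_TP53": ["TP53", "p53", "tumor protein p53"],
    "ID_ASPIRIN": ["aspirin", "acetylsalicylic acid"],
}
pattern, lookup = build_matcher(toy_dictionary)
hits = annotate(
    "The sample was treated with aspirin before tumor protein p53 was measured.",
    pattern,
    lookup,
)
for hit in hits:
    print(hit)
```

Case-insensitive matching is a deliberate simplification: short gene symbols often collide with ordinary words, so a real dictionary would likely need filtering or case-sensitive handling for such entries.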

3. Corpora for training and validation

GeneTag
To evaluate the dictionary approach and to train a model for annotating proteins; also available in BioC format and as an updated version, GeneTag-05
https://www.ncbi.nlm.nih.gov/pubmed/15960837
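
Evaluating a dictionary approach against a gold-standard corpus usually comes down to comparing predicted spans with annotated spans. The sketch below assumes exact-span matching on `(sentence_id, start, end)` tuples, which is a simplification and not necessarily the scoring scheme distributed with GeneTag; the spans and the `evaluate` function are invented for illustration.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, f1) for exact-match span sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: (sentence_id, start_offset, end_offset)
gold_spans = {("s1", 0, 4), ("s1", 20, 25), ("s2", 7, 11)}
predicted_spans = {("s1", 0, 4), ("s2", 7, 11), ("s2", 30, 34), ("s2", 40, 44)}

precision, recall, f1 = evaluate(predicted_spans, gold_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.50 recall=0.67 f1=0.57
```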

Corpora in BioC format
http://bioc.sourceforge.net/

DrugBank
Database with drugs and known protein targets (including references for the interaction) => for validation
www.drugbank.ca

OMIM
Database with human diseases and known genes => for validation

KEGG
Database with known protein signalling pathways => for validation

Gene Ontology
Database with known function of genes => extract list of cell death genes for validation
http://geneontology.org/
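
Extracting a gene list for a given function could, for example, start from a Gene Ontology annotation file in the tab-separated GAF format, where the third column holds the gene symbol and the fifth the GO term ID. The sketch below makes several assumptions that should be checked: `GO:0008219` is assumed to be the "cell death" term, the toy rows are invented and truncated to five columns, and only direct annotations to the listed IDs are collected (descendant terms in the ontology are not followed). The function name `genes_for_terms` is hypothetical.

```python
def genes_for_terms(gaf_lines, go_ids):
    """Collect gene symbols annotated directly with any of the given GO IDs."""
    wanted = set(go_ids)
    symbols = set()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is the symbol, column 5 (index 4) the GO ID.
        if len(cols) >= 5 and cols[4] in wanted:
            symbols.add(cols[2])
    return symbols


# Invented rows, truncated to the first five GAF columns.
toy_gaf = [
    "!gaf-version: 2.2",
    "UniProtKB\tX0001\tGENE_A\tinvolved_in\tGO:0008219",
    "UniProtKB\tX0002\tGENE_B\tinvolved_in\tGO:0000001",
    "UniProtKB\tX0003\tGENE_C\tinvolved_in\tGO:0008219",
]

# Assumption: GO:0008219 is the "cell death" term; verify before use.
cell_death_genes = genes_for_terms(toy_gaf, {"GO:0008219"})
print(sorted(cell_death_genes))
```

The resulting set can then be compared with genes found by text mining, for instance by taking the intersection of the two sets.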

4. Biomedical text sources

PubMed abstracts

PubMed Central full-length articles

bioRxiv

Wikipedia
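
For PubMed abstracts, one possible route (an assumption, not something this repository prescribes) is the NCBI E-utilities `efetch` endpoint. The sketch below only builds the request URL; `fetch_abstracts` is defined but not called, since it needs network access and should respect NCBI's usage guidelines. The function names are hypothetical, and the PubMed ID used is the one from the GeneTag link above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an efetch URL that requests plain-text abstracts for the PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_abstracts(pmids):
    """Download the abstracts as one text blob (requires network access)."""
    with urlopen(build_efetch_url(pmids)) as response:
        return response.read().decode("utf-8")


url = build_efetch_url([15960837])
print(url)
```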

5. Other (relevant blogs, discussion forums, etc)

Developer blog post with working example code for relation extraction from medical text
https://www.microsoft.com/developerblog/2016/09/13/training-a-classifier-for-relation-extraction-from-medical-literature/

BioStars bioinformatics forum
https://www.biostars.org/

Link list with many resources
https://www2.informatik.hu-berlin.de/~hakenber/links/benchmarks.html
