Repository for student projects in biomedical text mining at Lund University.
scispaCy
scispaCy is a Python package containing spaCy models for processing biomedical, scientific or clinical text
https://allenai.github.io/scispacy/
UniProtKB
To create a dictionary of gene/protein names (contains protein and gene names, including synonyms)
https://www.uniprot.org/
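A minimal sketch of the dictionary approach, assuming the names and synonyms have already been exported into (entry ID, list of names) pairs. That layout, the function names `build_dictionary` and `tag_text`, and the two toy entries are illustrative assumptions, not the UniProtKB export format or code from this repository.

```python
import re

def build_dictionary(rows):
    """Map each lower-cased name or synonym to its entry ID.

    `rows` is an iterable of (entry_id, [names...]) pairs. This layout is
    a hypothetical simplification, not a real UniProtKB file format.
    """
    lookup = {}
    for entry_id, names in rows:
        for name in names:
            # Keep the first ID seen for an ambiguous name.
            lookup.setdefault(name.lower(), entry_id)
    return lookup

def tag_text(text, lookup):
    """Return (start, end, matched_text, entry_id) for each dictionary hit."""
    if not lookup:
        return []
    # Longest names first, so a full protein name wins over a short synonym.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Toy entries for illustration only.
rows = [
    ("P04637", ["TP53", "p53", "Cellular tumor antigen p53"]),
    ("P38398", ["BRCA1", "Breast cancer type 1 susceptibility protein"]),
]
lookup = build_dictionary(rows)
hits = tag_text(
    "Mutations in TP53 and BRCA1 are common; p53 is a tumor suppressor.",
    lookup,
)
for hit in hits:
    print(hit)
```

Case-insensitive matching with word boundaries is a design choice that favours recall. Short or ambiguous symbols will produce false positives, which is one reason to evaluate the dictionary against an annotated corpus.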
PubChem
To create a dictionary of chemicals/drugs (contains small molecules, but also larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically modified macromolecules)
https://pubchem.ncbi.nlm.nih.gov/
Disease Ontology
http://disease-ontology.org/
GeneTag
Annotated corpus to evaluate the dictionary approach and to train a model for annotating proteins; also available in BioC format and as the updated version GeneTag-05
https://www.ncbi.nlm.nih.gov/pubmed/15960837
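A minimal sketch of how predicted annotations could be scored against gold annotations, assuming both are available as (start, end) character offsets. Exact-match scoring and the function name `evaluate_spans` are assumptions made for illustration; the corpus's own evaluation rules may differ, for example by accepting alternative span boundaries.

```python
def evaluate_spans(gold, predicted):
    """Return (precision, recall, f1) using exact span matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy offsets for illustration only.
gold = {(0, 4), (10, 15), (20, 23)}
predicted = {(0, 4), (10, 14), (20, 23), (30, 35)}
precision, recall, f1 = evaluate_spans(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case the span (10, 14) counts as both a false positive and a missed gold span, because its boundary is off by one character.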
Corpora in BioC format
http://bioc.sourceforge.net/
DrugBank
Database with drugs and known protein targets (including references for the interaction) => for validation
www.drugbank.ca
OMIM
Database with human diseases and known genes => for validation
KEGG
Database with known protein signalling pathways => for validation
Gene Ontology
Database with known function of genes => extract list of cell death genes for validation
http://geneontology.org/
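A minimal sketch of extracting a gene list from Gene Ontology annotations, assuming the annotations are in the tab-separated GAF format (comment lines start with `!`, column 3 is the gene symbol, column 4 the qualifier, column 5 the GO ID). The identifier GO:0008219 is used here as an assumed ID for "cell death" and should be checked against the current ontology; the function name and the sample lines are made up for illustration.

```python
def genes_for_go_terms(gaf_lines, go_ids):
    """Collect gene symbols directly annotated with any of `go_ids`."""
    wanted = set(go_ids)
    symbols = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed line
        # Skip negated annotations such as "NOT|involved_in".
        if "NOT" in cols[3].split("|"):
            continue
        if cols[4] in wanted:
            symbols.add(cols[2])
    return sorted(symbols)

# Made-up annotation lines, truncated to the first five columns.
sample = [
    "!gaf-version: 2.2",
    "DB\tID1\tGENE_A\tinvolved_in\tGO:0008219",
    "DB\tID2\tGENE_B\tinvolved_in\tGO:0006412",
    "DB\tID3\tGENE_C\tNOT|involved_in\tGO:0008219",
    "DB\tID4\tGENE_D\tinvolved_in\tGO:0008219",
]
cell_death_genes = genes_for_go_terms(sample, {"GO:0008219"})
print(cell_death_genes)
```

This sketch only finds direct annotations. Genes annotated to more specific child terms would require walking the ontology graph to collect descendant IDs first.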
PubMed abstracts
PubMed Central full-length articles
bioRxiv
Wikipedia
Developer blog post with working example code for relation extraction from medical text
https://www.microsoft.com/developerblog/2016/09/13/training-a-classifier-for-relation-extraction-from-medical-literature/
BioStars bioinformatics forum
https://www.biostars.org/
Link list with many resources
https://www2.informatik.hu-berlin.de/~hakenber/links/benchmarks.html