A platform to create crowd-sourced gene function gold standards with Amazon Mechanical Turk
- Make sure you have all requirements: Python 2, pipenv, and Java (tested with OpenJDK 1.8; Java is needed to run NobleCoder).
- Download the repository.
- Change into it and run `pipenv install` to install the Python dependencies.
- Launch NobleCoder from `tools/NobleCoder-1.0.jar` and import the Gene Ontology (download from here) under the name `go`. The `process.py` script runs NobleCoder on your abstracts and tells it to use the ontology "go", so if you choose a different name you will have to adapt the script.
- Put the PubMed IDs of the abstracts you're interested in into `data/pmid_list.txt`.
- Run `pipenv run python process.py`.
- Output is in `data/abstracts` and `data/brat-input`. Put all files from these folders together into a single folder of your brat installation. In that same folder you will also need a file `annotation.conf`, which could look like this (more information here):

  ```
  [entities]
  Gene
  Function

  [relations]
  Does	Arg1:Gene, Arg2:Function
  Does	Arg1:Function, Arg2:Gene
  DoesNot	Arg1:Function, Arg2:Gene
  DoesNot	Arg1:Gene, Arg2:Function

  [attributes]

  [events]
  ```

  There will also be a file `data/statistics.cvs` containing the number of words, genes, and functions for each abstract.
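For the step that fills `data/pmid_list.txt`, here is a minimal sketch of a helper that writes the file. It assumes the file holds one numeric PubMed ID per line; the README does not state the format, so check `process.py` before relying on it. `write_pmid_list` is a hypothetical helper, not part of this repository.

```python
import os
import re

# Assumption: data/pmid_list.txt holds one PubMed ID per line.
# The README does not document the format; verify against process.py.
PMID_PATTERN = re.compile(r"^\d+$")


def write_pmid_list(pmids, path="data/pmid_list.txt"):
    """Write PubMed IDs one per line, skipping blanks and duplicates.

    Hypothetical helper for illustration only.
    Returns the list of IDs that were written, in input order.
    """
    seen = set()
    cleaned = []
    for pmid in pmids:
        pmid = str(pmid).strip()
        if not pmid or pmid in seen:
            continue  # drop empty entries and repeated IDs
        if not PMID_PATTERN.match(pmid):
            raise ValueError("not a numeric PubMed ID: %r" % pmid)
        seen.add(pmid)
        cleaned.append(pmid)
    # Create the target directory if it does not exist yet.
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as handle:
        for pmid in cleaned:
            handle.write(pmid + "\n")
    return cleaned
```

The code avoids f-strings and other Python 3 only syntax, so it should also run under the Python 2 interpreter the project uses.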
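The step of putting the files from `data/abstracts` and `data/brat-input` into one brat folder can be scripted. The sketch below is a hypothetical helper (`collect_for_brat` is not part of this repository) that copies every regular file from both folders into a target folder you choose inside your brat installation.

```python
import os
import shutil

# Folders that process.py writes its output to, per the README.
SOURCE_DIRS = ("data/abstracts", "data/brat-input")


def collect_for_brat(target_dir, source_dirs=SOURCE_DIRS):
    """Copy every regular file from source_dirs into target_dir.

    Hypothetical helper for illustration only. Returns the sorted list of
    copied file names. Raises ValueError if two source folders contain a
    file with the same name, since one copy would overwrite the other.
    """
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    copied = {}  # file name -> source folder it came from
    for source in source_dirs:
        for name in sorted(os.listdir(source)):
            path = os.path.join(source, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-files
            if name in copied:
                raise ValueError(
                    "%s exists in both %s and %s" % (name, copied[name], source))
            shutil.copy2(path, os.path.join(target_dir, name))
            copied[name] = source
    return sorted(copied)
```

Refusing to overwrite on a name clash is a deliberate choice in this sketch; if the two folders are expected to share names, replace the `ValueError` with whatever precedence you want. The `annotation.conf` file still has to be placed in the target folder separately.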
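To work with the per-abstract counts in `data/statistics.cvs`, a reader along these lines may help. The layout is an assumption, since the README only says which quantities are recorded: comma-separated rows of an abstract ID followed by the word, gene, and function counts, optionally preceded by a header row. `read_statistics` and `totals` are hypothetical helpers.

```python
import csv


def read_statistics(path="data/statistics.cvs"):
    """Return {abstract_id: (words, genes, functions)}.

    Assumed layout (not documented in the README): comma-separated rows of
    abstract ID followed by the three counts, optionally with a header row.
    """
    stats = {}
    with open(path) as handle:
        for row in csv.reader(handle):
            if len(row) < 4:
                continue  # skip blank or short lines
            try:
                counts = tuple(int(value) for value in row[1:4])
            except ValueError:
                continue  # non-numeric counts: most likely the header row
            stats[row[0].strip()] = counts
    return stats


def totals(stats):
    """Sum words, genes, and functions over all abstracts."""
    words = sum(c[0] for c in stats.values())
    genes = sum(c[1] for c in stats.values())
    functions = sum(c[2] for c in stats.values())
    return words, genes, functions
```

If the real file uses a different delimiter or column order, adjust the `csv.reader` call and the `row[1:4]` slice accordingly.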