Aleksandar Bojchevski edited this page Jun 16, 2015 · 24 revisions

Welcome to the carsten-bthesis wiki!

Introductory talk: Online presentation Thesis

Pipeline diagram: view the pipeline visualization externally at a higher resolution here.

Datasets

  • IDP4 (so far using Alex's annotations; ideally combining)
  • tmVar
  • TODO: remember the 10 full texts

Resources

Dev Stack

  • We use Python 3 because:
    • it will be better supported in the future
    • it uses UTF-8/Unicode by default
    • it is difficult to write software that works on both Python 2 and 3

Data structure / Database

  • We store in a text file the list of PMIDs that were analyzed to get sentences for annotation (those with a high probability of including mutation mentions)
  • We store in ann.json files who made each annotation (either ml: for automatic or user: for manual) and its confidence. When an automatic annotation had to be manually reviewed, the list of annotators will be ml:..., user:...
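
A minimal sketch of the two stores described above. It assumes one PMID per line in the text file, and an annotation record with a `who` list and a `prob` confidence value. The names `parse_pmids`, `was_reviewed`, `who`, `prob`, `ml:tagger` and `user:alice` are hypothetical and not taken from the project.

```python
import json


def parse_pmids(text):
    """Return the PMIDs from text holding one PMID per line (assumed layout)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def was_reviewed(annotation):
    """True if an automatic (ml:) annotation was also reviewed by a user."""
    who = annotation.get("who", [])  # "who" is a hypothetical field name
    has_ml = any(w.startswith("ml:") for w in who)
    has_user = any(w.startswith("user:") for w in who)
    return has_ml and has_user


# Hypothetical record: an automatic annotation that a user later reviewed.
record = json.loads(
    '{"text": "p.A123T", "who": ["ml:tagger", "user:alice"], "prob": 0.8}'
)
pmids = parse_pmids("12345\n\n67890\n")
```

Keeping the annotators as an ordered list preserves the fact that the manual review came after the automatic annotation.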

(As for how to filter annotations by confidence, we either do it ourselves or use a possible tagtog feature.)
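
If the filtering is done on our side, it could look roughly like the sketch below. This assumes each annotation is a dict with a numeric `prob` confidence; the field names, the threshold, and the sample annotations are illustrative assumptions, not the project's actual data.

```python
def filter_by_confidence(annotations, min_prob=0.5):
    """Keep annotations whose confidence is at least min_prob.

    Annotations without a "prob" field (hypothetical name) are treated as
    having confidence 0.0 and are therefore dropped.
    """
    return [a for a in annotations if a.get("prob", 0.0) >= min_prob]


# Made-up sample annotations for illustration only.
annotations = [
    {"text": "V600E", "who": ["ml:tagger"], "prob": 0.35},
    {"text": "c.35G>A", "who": ["ml:tagger", "user:alice"], "prob": 1.0},
]
kept = filter_by_confidence(annotations, min_prob=0.5)
```

Treating a missing confidence as 0.0 is a conservative choice; the opposite default may suit manual annotations better.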
