Fact extraction project #311

Open
draciti opened this issue Oct 21, 2024 · 0 comments
Comments

@draciti
Collaborator

draciti commented Oct 21, 2024

Need to think about the next step after sentence classification, i.e. fact extraction

  1. An assessment of how our methods perform with respect to annotation.
     Set up a workflow comparing three methods:
     • Extract sentences with BioBERT (current production pipeline), then feed the sentences to an LLM (GPT-4o), asking it to extract an annotation
     • Use the full text and ask an LLM (GPT-4o) to extract an annotation
     • A curator creates an annotation from the text

What are the differences between the three methods?
How do we evaluate/score the results? Annotation-based?
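One annotation-based scoring option is exact-match precision/recall of each method's extracted annotations against the curator's gold set. A minimal sketch, assuming annotations can be normalized to (gene, term, evidence) triples (that representation is an assumption for illustration, not an agreed project format):

```python
from typing import Set, Tuple

# Assumed illustrative format: an annotation as a (gene, term, evidence) triple.
Annotation = Tuple[str, str, str]

def score(predicted: Set[Annotation], gold: Set[Annotation]) -> dict:
    """Exact-match precision/recall/F1 of a method's annotations
    against a curator-created gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy example: one correct prediction, one spurious, one gold annotation missed.
gold = {("daf-16", "GO:0008340", "IMP"), ("daf-2", "GO:0008340", "IMP")}
pred = {("daf-16", "GO:0008340", "IMP"), ("daf-16", "GO:0040024", "IMP")}
print(score(pred, gold))  # precision 0.5, recall 0.5, f1 0.5
```

Exact matching is strict; partial-credit variants (e.g. gene-only or gene+term matches) could be layered on the same comparison if LLM outputs turn out to be near misses rather than wrong.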

  2. An assessment of how our methods perform on literature from other organisms.
     • Focus on co-published species?
     • Prioritize SGD > ZFIN > Xenbase
     • How do we get the list of copublished papers from postgres?
     • A more direct comparison with other methods, e.g. rule-based methods such as Textpresso category searches or RLIMS-P (they have an API)
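If papers are linked to species somewhere in postgres, the copublished list could come from a self-join on that link table. A hypothetical sketch: the table and column names (`paper_species`, `paper_id`, `taxon_id`) and taxon identifiers are assumptions to be replaced with the actual schema:

```python
# Hypothetical query builder: papers annotated to both C. elegans and
# another species. Table/column names are placeholders, not the real schema.
ELEGANS = "NCBITaxon:6239"

def copublished_query(other_taxon: str) -> str:
    """Return SQL selecting papers linked to both C. elegans and other_taxon."""
    return f"""
SELECT a.paper_id
FROM paper_species a
JOIN paper_species b ON a.paper_id = b.paper_id
WHERE a.taxon_id = '{ELEGANS}'
  AND b.taxon_id = '{other_taxon}';
""".strip()

# e.g. S. cerevisiae (SGD) as the co-published species
print(copublished_query("NCBITaxon:4932"))
```

In a real pipeline the taxon values should be passed as query parameters (e.g. `cursor.execute(sql, params)` in psycopg2) rather than interpolated into the string.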