Add FlairNLP Sequence Tagging #55
Thanks for the PR. Could you please add the same license header that we use in the other files? It would also be nice if you could add this to the table here: https://github.com/inception-project/inception-external-recommender?tab=readme-ov-file#contrib-models. It would be best to also update the requirements directly as necessary, so the PR can be merged as is.
I added the content you asked for, but the requirements show a conflict:
@raykyn I have relaxed the version restriction on itertools in cassis, and it looks like the tests all work with the new range. I guess we need a release of cassis now, right?
I believe so; otherwise the dependency won't be updated for anyone using pip install to get the dependencies.
Roger, I'll probably run a release tonight.
Cassis 0.9.1 is available.
…h flair 0.13.1 (both can use more-itertools 0.8.14 now)
Perfect! It works now, but there's just one thing: if someone already has flair installed, installing dkpro-cassis will still show a warning, because the requirement is still set to a version below 0.9 (and more-itertools is now above version 0.10). But I don't think that's too big of a problem? I've also tried adding a test, but I can't get the tests to run (not only my flair test, but also the spaCy one); I always get the error
I did install the test dependencies as described in the README.
By the way, I'm already using the flair recommender on my INCEpTION instance and it really speeds up the annotation. Thank you for your efforts, @reckart! (There's a bug that I can work around, but my INCEpTION instance is a few versions behind, so before I write an issue I'll update and see if it's resolved.)
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
- Coverage   55.13%   52.61%   -2.52%
==========================================
  Files          22       23       +1
  Lines         838      878      +40
==========================================
  Hits          462      462
- Misses        376      416      +40
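The figures in the coverage diff are consistent with coverage = hits / lines. The sketch below reproduces them, assuming Codecov's default of rounding down to two decimals (with round-half-up, 462/878 would show as 52.62% rather than 52.61%). The helper name `coverage_pct` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Percentage of coverable lines that were hit, truncated to 2 decimals.

    Assumption: Codecov rounds *down* by default; this is not taken from the PR.
    """
    raw = Decimal(hits) * 100 / Decimal(lines)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(462, 838)  # main: 462 hits out of 838 lines
head = coverage_pct(462, 878)  # PR #55: 40 new lines, none of them hit
# Misses are simply lines - hits: 376 on main, 416 on the PR.
print(base, head, head - base, 838 - 462, 878 - 462)
```

The 40 added lines are all misses, which is why hits stay at 462 while coverage drops by 2.52 points.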
This pull request adds a script to the contribs that enables the use of the FlairNLP (https://flairnlp.github.io/) sequence tagger (not necessarily only for NER). The class can either use SegTok sentence splitting or simply pass the whole document as a single Sentence object (do not use the latter for very long documents).
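Whichever mode is used, spans the tagger predicts relative to a sentence must be shifted by that sentence's start offset before they can be written back to the document. The following is a minimal, flair-free sketch of that bookkeeping. `split_sentences` is a naive regex stand-in for SegTok, and `tag_document` and `all_caps_tagger` are hypothetical helpers, not the script's actual API.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (begin, end, label)


def split_sentences(text: str) -> List[Tuple[int, int]]:
    # Naive stand-in for SegTok: a sentence runs up to and including . ! or ?
    return [(m.start(), m.end()) for m in re.finditer(r"\S[^.!?]*(?:[.!?]+|$)", text)]


def tag_document(text: str, tagger: Callable[[str], List[Span]], split: bool = True) -> List[Span]:
    # Either split into sentences, or treat the whole document as one "sentence".
    sentences = split_sentences(text) if split else [(0, len(text))]
    result: List[Span] = []
    for s_begin, s_end in sentences:
        for begin, end, label in tagger(text[s_begin:s_end]):
            # Shift sentence-relative offsets to document offsets.
            result.append((s_begin + begin, s_begin + end, label))
    return result


def all_caps_tagger(sentence: str) -> List[Span]:
    # Hypothetical toy tagger standing in for a flair SequenceTagger.
    return [(m.start(), m.end(), "ORG") for m in re.finditer(r"\b[A-Z]{2,}\b", sentence)]


text = "Anna lives in Bern. She works at ETH!"
print(tag_document(text, all_caps_tagger), tag_document(text, all_caps_tagger, split=False))
```

Both modes should produce the same document offsets; splitting only keeps each tagger input short.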
I had to implement a workaround for the case where the CAS sentence nodes are not used, because INCEpTION performs an internal tokenization in which punctuation marks are represented as separate tokens, even if they are not separated by whitespace.
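To illustrate the mismatch, the sketch below approximates a tokenization that splits punctuation into its own tokens and snaps a predicted character span outward to those token boundaries. This is a hypothetical illustration, not the workaround actually implemented in the PR and not INCEpTION's real tokenizer; `tokenize_with_offsets` and `snap_to_tokens` are invented names.

```python
import re
from typing import List, Optional, Tuple

# Assumption: every run of word characters is a token, and every
# punctuation character is its own token, whitespace or not.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text: str) -> List[Tuple[int, int]]:
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def snap_to_tokens(begin: int, end: int, tokens: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Expand a character span so it starts and ends on token boundaries."""
    covered = [(b, e) for b, e in tokens if b < end and e > begin]
    if not covered:
        return None  # the span only touches whitespace
    return covered[0][0], covered[-1][1]


text = "Hello, Zürich!"
tokens = tokenize_with_offsets(text)
print(tokens)                         # "Hello", ",", "Zürich", "!"
print(snap_to_tokens(8, 12, tokens))  # partial word expanded to "Zürich"
```

A whitespace tokenizer would instead yield "Hello," and "Zürich!", so offsets coming from such tokens would not line up with the punctuation-split tokens.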
I tested it with and without sentence splitting, and with both local and remote models. It works well on my server (still on version 26.8, but I assume it should work on newer versions as well).
If this script gets added, the requirements of the package will also need updating; I tested it with flair version 0.13.1.