Add FlairNLP Sequence Tagging #55

Merged
merged 6 commits into from
Mar 5, 2024

Conversation

raykyn
Contributor

@raykyn raykyn commented Feb 20, 2024

This pull request adds a script to the contribs that enables use of the FlairNLP (https://flairnlp.github.io/) sequence tagger (not necessarily only for NER). The class can either split the document into sentences with SegTok or pass the whole document to the tagger as a single Sentence object (do not use the latter for very long documents).

I had to implement a workaround for the case where the CAS sentence nodes are not used, because INCEpTION performs an internal tokenization in which punctuation marks are represented as their own tokens, even when they are not separated from the neighboring word by whitespace.
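The tokenization mismatch described above can be illustrated with a small sketch. This is not the code from this PR: the function name `tokens_covering_span` and the sample offsets are hypothetical, and the sketch assumes that token boundaries are available as character offsets. It maps a character span predicted on a whitespace-delimited token back to the finer-grained tokens that overlap it.

```python
from typing import List, Tuple

Offsets = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def tokens_covering_span(token_offsets: List[Offsets], span: Offsets) -> List[int]:
    """Return the indices of all tokens that overlap the character span."""
    begin, end = span
    return [
        index
        for index, (tok_begin, tok_end) in enumerate(token_offsets)
        if tok_begin < end and tok_end > begin
    ]


text = "Obama visited Paris."

# Hypothetical fine-grained tokens: "Obama", "visited", "Paris", "."
# (punctuation is its own token although no whitespace precedes it).
fine_tokens = [(0, 5), (6, 13), (14, 19), (19, 20)]

# A whitespace tokenizer would see "Paris." as one token covering (14, 20).
whitespace_span = (14, 20)
# A label that should only cover the word itself covers (14, 19).
entity_span = (14, 19)

covered_by_whitespace_token = tokens_covering_span(fine_tokens, whitespace_span)
covered_by_entity = tokens_covering_span(fine_tokens, entity_span)
```

In this sample, the whitespace token overlaps two fine-grained tokens, while the trimmed entity span overlaps only one, which is why predictions made on one tokenization need to be realigned before they are written back.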

I tested it with and without sentence splitting, and with local and remote models. It works well on my server (still on version 26.8, but I assume it should work on newer versions as well).

If this script gets added, the requirements of the package will also need updating. I tested it with flair version 0.13.1.

@reckart
Member

reckart commented Feb 27, 2024

Thanks for the PR. Could you please add the same license header that we also use in the other files?

It would also be nice if you could add this to the table here: https://github.com/inception-project/inception-external-recommender?tab=readme-ov-file#contrib-models

It would be best to also update the requirements directly as necessary so the PR can be merged "as is".

@reckart reckart added the ⭐️ Enhancement New feature or request label Feb 27, 2024
@raykyn
Contributor Author

raykyn commented Feb 28, 2024

I added the content you asked for, but the requirements show a conflict:
Flair 0.13.1 needs more-itertools >=8.13.0, but dkpro-cassis is very strict in requiring version 8.12.*. I didn't run into any problems when using the newer (flair-compatible) version of more-itertools, though. How should I proceed, @reckart?
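Why these two constraints cannot be satisfied together can be shown with a small sketch. It hand-rolls the two checks on version tuples purely for illustration (the helper names and the candidate list are assumptions; a real resolver such as pip evaluates the full specifier grammar).

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def satisfies_flair(version: Version) -> bool:
    """Constraint quoted above for flair 0.13.1: more-itertools >=8.13.0."""
    return version >= (8, 13, 0)


def satisfies_cassis(version: Version) -> bool:
    """Constraint quoted above for dkpro-cassis: more-itertools 8.12.*."""
    return version[:2] == (8, 12)


# Illustrative candidate versions, not an exhaustive release list.
candidates: List[Version] = [(8, 12, 0), (8, 13, 0), (8, 14, 0), (10, 2, 0)]

compatible = [
    version
    for version in candidates
    if satisfies_flair(version) and satisfies_cassis(version)
]
```

The `compatible` list is empty for any candidate set: every 8.12.* version is below 8.13.0, so one of the two requirements has to be relaxed.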

@reckart
Member

reckart commented Feb 28, 2024

@raykyn I have relaxed the version restriction on more-itertools in cassis; it looks like the tests all pass with the new range:

dkpro/dkpro-cassis#305

I guess we need a release of cassis now, right?

@raykyn
Contributor Author

raykyn commented Feb 29, 2024

I believe so, otherwise the dependency won't be updated for anyone using pip install to get the dependencies.

@reckart
Member

reckart commented Feb 29, 2024

Roger, I'll run a release tonight probably.

@reckart
Member

reckart commented Feb 29, 2024

Cassis 0.9.1 is available

…h flair 0.13.1 (both can use more-itertools 0.8.14 now)
@raykyn
Contributor Author

raykyn commented Mar 1, 2024

Perfect!

It works now, with just one caveat: if someone already has flair installed, installing dkpro-cassis will still show a warning, because the requirement is still set to a version below 0.9 (and more-itertools is now over version 0.10). But I don't think that's too big of a problem?

I've also tried adding a test, but I can't get the tests to run (not only my flair test, but also the spacy one). I always get this error:

Traceback (most recent call last):
  File "(my path)/inception-external-recommender/tests/test_spacy_recommender.py", line 21, in <module>
    from tests.util import load_obama, PREDICTED_TYPE, PREDICTED_FEATURE, PROJECT_ID, USER
ImportError: cannot import name 'load_obama' from 'tests.util' (/home/iprada/anaconda3/lib/python3.11/site-packages/tests/util.py)

I did install the test dependencies as required in the README.
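The traceback above shows `tests.util` being resolved from `site-packages` rather than from the repository checkout. A minimal sketch for checking where Python would load a top-level package from, using only the standard library (the helper name `locate_package` is hypothetical):

```python
import importlib.util
from typing import Optional


def locate_package(name: str) -> Optional[str]:
    """Return the file a top-level package or module would be loaded from.

    Returns None if the name cannot be found. Namespace packages also
    yield None, because they have no single origin file.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Example with a standard-library package; run the same call with "tests"
# from the repository root to see which location wins on sys.path.
json_location = locate_package("json")
missing_location = locate_package("surely_not_an_installed_package_xyz")
```

If the reported path for `tests` points into `site-packages`, some installed distribution ships its own top-level `tests` package and shadows the local one.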

@raykyn
Contributor Author

raykyn commented Mar 1, 2024

Btw, I'm already using the flair recommender on my INCEpTION instance and it really speeds up the annotation. Thank you for your efforts, @reckart! (There's a bug that I can work around, but my INCEpTION instance is a few versions behind, so before I write an issue I'll update and see if it's resolved.)


codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 0%, with 40 lines in your changes missing coverage. Please review.

Project coverage is 52.61%. Comparing base (d4fea9a) to head (5fe369f).

| Files | Patch % | Lines |
|---|---|---|
| ariadne/contrib/flair.py | 0.00% | 40 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
- Coverage   55.13%   52.61%   -2.52%     
==========================================
  Files          22       23       +1     
  Lines         838      878      +40     
==========================================
  Hits          462      462              
- Misses        376      416      +40     
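
The percentages in the report follow from the hit and line counts. A short sketch of the arithmetic; note that truncating to two decimals (rather than rounding) is an assumption made here because it reproduces the reported figures, not a statement about how Codecov computes them.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


# Counts taken from the coverage diff above.
base = coverage_percent(462, 838)  # before the PR
head = coverage_percent(462, 878)  # after adding 40 uncovered lines
change = round(head - base, 2)
```

With these counts, `base` is 55.13, `head` is 52.61, and `change` is -2.52: the number of hits is unchanged, so the drop comes entirely from the 40 new lines without tests.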


@reckart reckart merged commit 23d7d8f into inception-project:main Mar 5, 2024
4 of 6 checks passed
@reckart reckart self-requested a review March 5, 2024 08:37
2 participants