Create HuggingFaceTransformer.py #35
Here is the code for adding any HuggingFace Transformer model over INCEpTION
tokenizer = AutoTokenizer.from_pretrained(self._model)
model = AutoModelForTokenClassification.from_pretrained(self._model)
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="max")
for c, sentence in enumerate(cas.select(SENTENCE_TYPE)):
I can't see the `c` being used. If it is not needed, I guess the `enumerate` is not needed either?
Yes, true. I was using them for other purposes and forgot to remove them.
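For illustration, the loop could look roughly like the sketch below once the unused index is dropped. This is not the PR's code: `Sentence`, `fake_ner`, and `predict_spans` are hypothetical stand-ins so that the example runs without INCEpTION or a model download. It also assumes that the pipeline reports `start` and `end` relative to the sentence text it was given, so the sentence's `begin` offset is added to get document offsets.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical stand-in for a sentence annotation with character offsets."""
    begin: int
    end: int


def fake_ner(text):
    """Hypothetical stand-in for the NER pipeline; offsets are sentence-relative."""
    start = text.find("Berlin")
    if start == -1:
        return []
    return [{"entity_group": "LOC", "start": start, "end": start + len("Berlin")}]


def predict_spans(document_text, sentences, ner):
    """Collect (begin, end, label) tuples in document coordinates."""
    spans = []
    for sentence in sentences:  # plain iteration: the index is not needed
        sentence_text = document_text[sentence.begin:sentence.end]
        for entity in ner(sentence_text):
            spans.append((
                sentence.begin + entity["start"],
                sentence.begin + entity["end"],
                entity["entity_group"],
            ))
    return spans


document = "I like tea. She lives in Berlin."
sentences = [Sentence(0, 11), Sentence(12, 32)]
spans = predict_spans(document, sentences, fake_ner)
```

In the real recommender, `ner` would be the `nlp_ner` pipeline and the sentences would come from `cas.select(SENTENCE_TYPE)`.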
Thank you for the PR! The basics look good to me. A few points:
1. huggingface/transformers#14305
2. I also do not understand why you would need pandas here; it is certainly possible to do it without.
3. The file name does not fit with the others; Python and we typically use snake case for files.
4. It would be nice to have a unit test, even if it only does smoke testing.
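A smoke test of the kind requested here might look something like the following sketch. It is an assumption about how such a test could be set up, not the project's actual test layout: `predict_labels` is a hypothetical function under test, and `fake_ner` is a stub that replaces the real pipeline so the test runs without downloading a model.

```python
import unittest


def predict_labels(ner, texts):
    """Hypothetical function under test: collect entity labels per text."""
    return [[entity["entity_group"] for entity in ner(text)] for text in texts]


def fake_ner(text):
    """Stub replacing the real pipeline so no model download is needed."""
    if "Ada" in text:
        start = text.index("Ada")
        return [{"entity_group": "PER", "score": 0.9, "word": "Ada",
                 "start": start, "end": start + 3}]
    return []


class PredictLabelsSmokeTest(unittest.TestCase):
    def test_handles_text_with_and_without_entities(self):
        labels = predict_labels(fake_ner, ["Ada wrote code.", ""])
        self.assertEqual(labels, [["PER"], []])


# Run the test case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictLabelsSmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test like this only checks that the code path runs and returns the expected shape; it says nothing about model quality.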
Regarding the first point: yes, I ran into this problem yesterday and used character-level granularity, as suggested by Richard. That resolved my problem, and I don't think I have the time to do this alignment now. I just wanted to share my solution to a problem I was facing, especially since the Adapter code isn't working, which was misleading, to be honest. I believe that INCEpTION is a very powerful tool and that it should definitely have examples for HuggingFace classifiers. For the second point, I need pandas because the output of the pipeline in my case is a list of lists of dictionaries. A sample of the pipeline output looks like this
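The following is not the author's sample. It is a sketch with invented data, meant to show how a list of lists of dictionaries could be flattened with plain Python instead of pandas. It assumes the dictionaries carry the keys that the aggregated `ner` pipeline normally returns (`entity_group`, `score`, `word`, `start`, `end`); `flatten_entities` and `min_score` are hypothetical names.

```python
# Invented example data: one inner list per input sentence.
batch_output = [
    [{"entity_group": "PER", "score": 0.99, "word": "Ada Lovelace",
      "start": 0, "end": 12}],
    [
        {"entity_group": "LOC", "score": 0.98, "word": "London",
         "start": 9, "end": 15},
        {"entity_group": "ORG", "score": 0.42, "word": "Acme",
         "start": 24, "end": 28},
    ],
]


def flatten_entities(batch, min_score=0.0):
    """Flatten per-sentence entity lists into (sentence_index, start, end, label) rows."""
    return [
        (sentence_index, entity["start"], entity["end"], entity["entity_group"])
        for sentence_index, entities in enumerate(batch)
        for entity in entities
        if entity["score"] >= min_score
    ]


rows = flatten_entities(batch_output, min_score=0.5)
```

Here `enumerate` is actually used, because the sentence index is kept in each row; filtering by score is optional and only included to show that the usual DataFrame-style operations have simple equivalents.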
@mobashgr Sorry for getting back to you late. Could you please add the same Apache License header to the file that we use in the other files? I believe it should not be a big problem if the recommender uses a different tokenization. If the recommender creates a suggestion that does not fit the layer settings in INCEpTION, it will be ignored; it should not cause trouble.
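If someone did want to align the model's character spans with existing token boundaries rather than rely on mismatched suggestions being ignored, one possible approach is sketched below. It is an illustration only: `snap_to_tokens` is a hypothetical helper, and tokens are assumed to be available as `(begin, end)` character offsets.

```python
def snap_to_tokens(tokens, start, end):
    """Expand the span [start, end) to the boundaries of the tokens it overlaps.

    Returns None when the span overlaps no token (for example, pure whitespace).
    """
    overlapping = [
        (begin, stop) for begin, stop in tokens
        if begin < end and stop > start
    ]
    if not overlapping:
        return None
    return overlapping[0][0], overlapping[-1][1]


text = "Visit New-York today"
tokens = [(0, 5), (6, 14), (15, 20)]  # "Visit", "New-York", "today"

# A span covering only "York" is widened to the full token "New-York".
aligned = snap_to_tokens(tokens, 10, 14)
# A span covering only the space between tokens has nothing to snap to.
unaligned = snap_to_tokens(tokens, 14, 15)
```

Whether widening a span like this is desirable depends on the annotation layer; dropping the suggestion instead is an equally reasonable choice.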