
Create HuggingFaceTransformer.py #35

Open

mobashgr wants to merge 2 commits into main
Conversation

mobashgr commented Feb 7, 2022

Here is the code for adding any HuggingFace Transformer model on top of INCEpTION.
Review comment on ariadne/contrib/HuggingFaceTransformer.py (outdated):
tokenizer = AutoTokenizer.from_pretrained(self._model)
model = AutoModelForTokenClassification.from_pretrained(self._model)
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="max")
for c, sentence in enumerate(cas.select(SENTENCE_TYPE)):
A member commented on this line:

I can't see `c` being used. If it is not needed, I guess the `enumerate` is not needed either?

mobashgr (Author) replied:

Yes, true. I was using them for other purposes and forgot to remove them.
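With the counter removed, the loop would simply be the following (get_covered_text() is assumed from the dkpro-cassis API that ariadne recommenders build on):

for sentence in cas.select(SENTENCE_TYPE):
    entities = nlp_ner(sentence.get_covered_text())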

jcklie (Contributor) commented Feb 7, 2022

Thank you for the PR! The basics look good to me.

This code still has the issue that it does not use INCEpTION's tokenization, which then requires you to use character-level granularity. That is not so nice later when exporting the corpus and using it downstream. In the other recommenders, we align the recommenders' predictions to the INCEpTION tokenization, which you would need to do here as well before I would merge it, tbh. Examples and hints can be found in the links below; a sketch of the alignment follows them.

huggingface/transformers#14305
https://huggingface.co/docs/transformers/custom_datasets?highlight=offset_mapping#token-classification-with-wnut-emerging-entities
https://discuss.huggingface.co/t/predicting-with-token-classifier-on-data-with-no-gold-labels/9373
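A minimal sketch of such an alignment in plain Python (function and variable names here are illustrative, not from the PR; it assumes the pipeline's character offsets in "start"/"end" and a sorted list of (begin, end) token offsets):

def align_to_tokens(entities, tokens):
    # Snap character-offset entity spans to whole-token boundaries.
    aligned = []
    for ent in entities:
        # Tokens that overlap the predicted character span.
        covered = [(b, e) for (b, e) in tokens if b < ent["end"] and e > ent["start"]]
        if covered:
            # Widen the span to the first and last covering token.
            aligned.append({**ent, "start": covered[0][0], "end": covered[-1][1]})
    return aligned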

I also do not understand why you need pandas here; it is certainly possible to do this without it.
As you only support token classification here, I would also name it TransformerTokenClassifier or so; the current name suggests that it is a general implementation.

The file name also does not fit with the rest of the project; for Python files, we typically use snake case.

It would be nice to have a unit test, even if it just does smoke testing.

mobashgr commented Feb 7, 2022

Regarding the first point: yes, I was facing this problem yesterday and used character-level granularity as suggested by Richard. My problem was resolved, and I don't think I have the time to do this alignment now. I just wanted to share what I have as a solution to a problem I was facing, especially since the Adapter code isn't working and was misleading, TBH. I believe that INCEpTION is a very powerful tool, and it should definitely have examples for HuggingFace classifiers.

For the second point, I need pandas because the output of the pipeline in my case is a list of lists of dictionaries. A sample of the pipeline output looks like this:

[{'entity_group': 'Chemical', 'score': 0.9996301, 'word': 'acety', 'start': 66, 'end': 71}, {'entity_group': 'Chemical', 'score': 0.99999845, 'word': 'nicotine', 'start': 98, 'end': 106}, {'entity_group': 'Chemical', 'score': 0.99911577, 'word': 'la dicine evised', 'start': 122, 'end': 144}, {'entity_group': 'Chemical', 'score': 0.9999038, 'word': 'alpha - only hete', 'start': 308, 'end': 325}]

So I prefer to convert it into a DataFrame.
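For reference, the same output can also be consumed without pandas by iterating the list of dictionaries directly. A minimal sketch using the fields from the sample above (the confidence cutoff is illustrative, not from the PR):

predictions = [
    {"entity_group": "Chemical", "score": 0.9996301, "word": "acety", "start": 66, "end": 71},
    {"entity_group": "Chemical", "score": 0.99999845, "word": "nicotine", "start": 98, "end": 106},
]

for ent in predictions:
    if ent["score"] >= 0.5:  # illustrative threshold
        print(ent["entity_group"], ent["start"], ent["end"])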

reckart (Member) commented Feb 27, 2024

@mobashgr Sorry for getting back to you late. Could you please add the same Apache License header to the file that we use in the other files?

I believe it should not be a big problem if the recommender uses a different tokenization. If the recommender creates a suggestion that does not fit the layer settings in INCEpTION, it will be ignored; it should not cause trouble.

reckart added the ⭐️ Enhancement (New feature or request) label on Feb 27, 2024