SklearnMentionDetector error in BIO encoding #29

david-waterworth · 2021-04-21T05:43:51Z

I think the line below

inception-external-recommender/ariadne/contrib/sklearn.py

Line 92 in 41d894c

if token.begin >= annotation.begin and annotation.end:

should be

if token.begin >= annotation.begin and token.end <= annotation.end:
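A minimal sketch of why the original condition is too permissive. `Span` is a hypothetical stand-in for the token and annotation objects, assuming they expose only integer `begin` and `end` offsets. Because `annotation.end` is tested for truthiness only, any annotation with a non-zero end offset passes the check:

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical stand-in for a token or annotation; only offsets are modeled.
    begin: int
    end: int


def covered_buggy(token: Span, annotation: Span) -> bool:
    # `annotation.end` is only checked for truthiness, so any non-zero end passes.
    return bool(token.begin >= annotation.begin and annotation.end)


def covered_fixed(token: Span, annotation: Span) -> bool:
    # The token must lie entirely inside the annotation.
    return token.begin >= annotation.begin and token.end <= annotation.end


annotation = Span(0, 5)
token_after = Span(6, 10)  # starts after the annotation has ended

print(covered_buggy(token_after, annotation))  # True
print(covered_fixed(token_after, annotation))  # False
```

With the original condition, every token that starts at or after the annotation's start is treated as part of it, including tokens far beyond its end.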


david-waterworth · 2021-04-21T06:15:10Z

I also think the state machine is wrong: if a single annotation spans more than 2 tokens, the result is BIBI rather than BIII. The code only generates an I-MENTION if the preceding token is B-MENTION. What it should do is generate I-MENTION if the previous token is B-MENTION or I-MENTION and we're still in the same annotation.

I replaced lines 88-103 with the following. I'm not 100% sure it's correct or robust, though:

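for token in tokens:
    tag = "O"  # default: token is outside every annotation
    for annotation in annotations:
        # Token lies entirely inside this annotation
        if token.begin >= annotation.begin and token.end <= annotation.end:
            if token.begin == annotation.begin:
                tag = "B-MENTION"  # first token of the annotation
            else:
                tag = "I-MENTION"  # any later token of the same annotation
            break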
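For reference, here is a self-contained sketch of the same loop that can be run on its own. `Span` and `bio_tags` are hypothetical names introduced for illustration, not part of the recommender, and the offsets are made-up character positions for a five-token sentence whose first three tokens belong to one annotation:

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    # Hypothetical stand-in for a token or annotation with character offsets.
    begin: int
    end: int


def bio_tags(tokens: List[Span], annotations: List[Span]) -> List[str]:
    """Assign one BIO tag per token based on containment in an annotation."""
    tags = []
    for token in tokens:
        tag = "O"
        for annotation in annotations:
            if token.begin >= annotation.begin and token.end <= annotation.end:
                # The first token of the annotation gets B, all later ones get I.
                tag = "B-MENTION" if token.begin == annotation.begin else "I-MENTION"
                break
        tags.append(tag)
    return tags


tokens = [Span(0, 3), Span(4, 8), Span(9, 13), Span(14, 16), Span(17, 20)]
annotations = [Span(0, 13)]  # covers the first three tokens

print(bio_tags(tokens, annotations))
# ['B-MENTION', 'I-MENTION', 'I-MENTION', 'O', 'O']
```

The tag depends only on the token's position relative to the annotation, not on the previous token's tag, so a three-token mention comes out as BII rather than BIB. The sketch assumes annotations do not overlap; with overlapping annotations the first match in list order wins.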

jcklie · 2021-04-21T07:50:23Z

I will have a look. I never really used this recommender so there certainly might be bugs in there.
