SklearnMentionDetector error in BIO encoding #29

Open
david-waterworth opened this issue Apr 21, 2021 · 2 comments

Comments

@david-waterworth

I think the line below

if token.begin >= annotation.begin and annotation.end:

should be

if token.begin >= annotation.begin and token.end <= annotation.end:
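To show why the original condition is too permissive: in Python, `x and annotation.end` only tests whether `annotation.end` is non-zero, so every token that starts at or after the annotation's start is treated as a match. Here is a minimal sketch, assuming a hypothetical `Span` tuple with `begin`/`end` offsets in place of the project's real token and annotation classes:

```python
from collections import namedtuple

# Hypothetical stand-in for tokens and annotations; only begin/end are assumed.
Span = namedtuple("Span", ["begin", "end"])


def matches_original(token, annotation):
    # Original condition: `annotation.end` is only checked for truthiness.
    return bool(token.begin >= annotation.begin and annotation.end)


def matches_fixed(token, annotation):
    # Proposed condition: the token must lie entirely inside the annotation.
    return token.begin >= annotation.begin and token.end <= annotation.end


annotation = Span(0, 5)
token_after = Span(10, 15)  # starts well after the annotation has ended

print(matches_original(token_after, annotation))
print(matches_fixed(token_after, annotation))
```

With the original condition the token after the annotation still matches; with the proposed one it does not.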

@david-waterworth
Author

david-waterworth commented Apr 21, 2021

Also, I think the state machine is wrong: if a single annotation spans more than two tokens, the result is BIBI rather than BIII. The code only generates an I-MENTION if the preceding token is B-MENTION. What it should do is generate I-MENTION if the previous token is B-MENTION or I-MENTION and we're still in the same annotation.

I replaced lines 88-103 with the following, though I'm not 100% sure it's correct or robust:

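for token in tokens:
    tag = "O"  # default: token is outside every annotation
    for annotation in annotations:
        # Only consider tokens that lie entirely within this annotation.
        if token.begin >= annotation.begin and token.end <= annotation.end:
            if token.begin == annotation.begin:
                tag = "B-MENTION"  # first token of the annotation
            else:
                tag = "I-MENTION"  # any later token of the same annotation
            break  # tag comes from the first annotation containing the token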
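To sanity-check the loop outside the recommender, here is a self-contained sketch. It assumes a hypothetical `Span` tuple and a hypothetical `bio_tags` helper that collects the tags into a list; neither is part of the project's code, and the offsets are made up for the sentence "New York City is big".

```python
from collections import namedtuple

# Hypothetical stand-in for tokens and annotations; only begin/end are assumed.
Span = namedtuple("Span", ["begin", "end"])


def bio_tags(tokens, annotations):
    """Return one BIO tag per token (hypothetical helper, not the project's API)."""
    tags = []
    for token in tokens:
        tag = "O"
        for annotation in annotations:
            # The token must lie entirely inside the annotation.
            if token.begin >= annotation.begin and token.end <= annotation.end:
                starts_here = token.begin == annotation.begin
                tag = "B-MENTION" if starts_here else "I-MENTION"
                break
        tags.append(tag)
    return tags


# "New York City is big": one annotation covering the first three tokens.
tokens = [Span(0, 3), Span(4, 8), Span(9, 13), Span(14, 16), Span(17, 20)]
annotations = [Span(0, 13)]

print(bio_tags(tokens, annotations))
# ['B-MENTION', 'I-MENTION', 'I-MENTION', 'O', 'O']
```

Two caveats with this approach: a token that straddles an annotation boundary is tagged `O`, and if annotations overlap, the first one in the list decides the tag.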

@jcklie
Contributor

jcklie commented Apr 21, 2021

I will have a look. I never really used this recommender, so there certainly might be bugs in there.
