
potential incorrect annotation #7

Open
chijames opened this issue Aug 2, 2021 · 7 comments

Comments

@chijames

chijames commented Aug 2, 2021

Hi,

I am manually checking the data annotations. I randomly picked one file from the test set, Bed008.json, and in my opinion its annotations are problematic. Here is a detailed analysis:

  1. Discussion about future meeting: the first 154 utterances have nothing to do with a future meeting. In addition, PHD F never appears in the transcript.
  2. What was said on getting fluent English speakers: they are discussing the network, not English speakers. Also, where is Postdoc E?
  3. What were the options that were discussed on the location of the recording equipment: I cannot see how this question relates to turns 157-160.
  4. What were Grad B...: where is Grad B?
    ...

It seems to me that the transcripts and the questions/summaries have been mismatched. Please correct me if I have misunderstood anything!

Thanks.

@chijames changed the title from "incorrect annotation" to "potential incorrect annotation" on Aug 2, 2021
@chijames
Author

chijames commented Aug 2, 2021

I picked another file, Bmr006.json, and its annotations also seem somewhat, if not very, problematic. After a bit of searching, I found that train/Bed005.json and test/Bmr006.json share exactly the same topic_list and queries but have different meeting transcripts.
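A check like the one described above can be sketched as follows: group every meeting file by a fingerprint of its annotation fields only (ignoring the transcript), and report any group with more than one file. This is a minimal sketch, not the script used in this thread. It assumes that the annotations live in the keys `topic_list`, `general_query_list` and `specific_query_list`, and that the files sit in split subdirectories under one root; both are assumptions to adjust against the real dataset layout. The helper names are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path

# Assumed annotation keys (hypothetical; adjust to the actual JSON schema).
# The transcript field is deliberately left out so that files with identical
# annotations but different transcripts still land in the same group.
ANNOTATION_KEYS = ("topic_list", "general_query_list", "specific_query_list")


def load_meetings(root):
    """Map 'split/File.json' -> parsed JSON for every *.json file under root."""
    root = Path(root)
    return {
        str(p.relative_to(root)): json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.json"))
    }


def annotation_fingerprint(meeting):
    """Canonical JSON string built from the annotation fields only."""
    subset = {k: meeting.get(k) for k in ANNOTATION_KEYS}
    return json.dumps(subset, sort_keys=True, ensure_ascii=False)


def find_shared_annotations(meetings):
    """Return sorted groups of file names whose annotations are identical."""
    groups = defaultdict(list)
    for name, meeting in meetings.items():
        # Skip files with none of the annotation fields, so they are not
        # all lumped together as "identical".
        if all(meeting.get(k) is None for k in ANNOTATION_KEYS):
            continue
        groups[annotation_fingerprint(meeting)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Usage, with a hypothetical root path: `for group in find_shared_annotations(load_meetings("data/ALL")): print(group)`. Each printed group is a candidate for the kind of transcript/annotation mismatch reported here and still needs a manual look.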

@chijames
Author

chijames commented Aug 3, 2021

@maszhongming Any updates on this?

@maszhongming
Collaborator

Sorry about the potentially problematic annotations!

In fact, @WadeYin9712 and I were responsible for reviewing the Product domain and (part of) the Committee domain of the QMSum dataset. It seems that the data in the Academic domain has various problems. We will contact the corresponding annotator and reviewer and try to fix them one by one.

Have you found similar problems in the data of the other two domains?

@chijames
Author

chijames commented Aug 4, 2021

Thanks for the reply! So far I haven't found similar issues in the other domains. Is there an estimated timeline for the fix?

@chijames
Author

chijames commented Aug 9, 2021

@maszhongming Any updates?

@WadeYin9712
Collaborator

Sorry for the late reply! We're trying to recruit another batch of qualified annotators, but it may take a long time to find them, train them, and finish the re-annotation. We plan to fix the problematic meetings such as Bed008.json this week and will let you know once we have finished re-annotating them. Thanks!

@maszhongming
Collaborator

Hi, we have updated the annotations of Bmr006 and Bed008. We will continue to look for problematic annotations and fix them.
