Is the task multi-choice or free-response? And how to evaluate on the val split? #9

@Molly-3000

Description

Hi,

Excellent work and thank you for making the resources publicly available.

While conducting evaluation experiments on your dataset, I encountered some issues.

In the published paper, you stated:

Evaluation. We use answer prediction accuracy as the metric and evaluate model performance on answering different types of questions. The answer vocabulary consists of 42 possible answers (22 objects, 12 counting choices, 6 location types, and yes/no) to different types of questions in the dataset. For training, we use one single model to handle all questions without training separated models for each type. So the accuracy with random choice is 1/42 ≈ 2.4%. Additionally, all models are trained on our AVQA dataset using the same features for a fair comparison.

It seems that the task uses multiple-choice questions, right? Could you provide a list of the 42 possible answers?

Any additional details regarding how to perform the evaluation would be helpful. Thank you.
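For concreteness, here is how I currently understand the setup from the quoted paragraph: prediction is classification over a fixed 42-answer vocabulary, scored by exact-match accuracy. A minimal sketch of that scoring (the function and variable names are my own assumptions, not from your code):

```python
def accuracy(predictions, ground_truth):
    """Fraction of questions whose predicted answer exactly matches the label."""
    assert len(predictions) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(predictions)

# Random-choice baseline over a 42-answer vocabulary, as stated in the paper.
random_baseline = 1 / 42
print(f"{random_baseline:.1%}")  # prints 2.4%
```

Is this the intended evaluation, or is there additional normalization (e.g. of answer strings) before matching?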

@DTaoo @ayameyao
