Hi,
Excellent work and thank you for making the resources publicly available.
While conducting evaluation experiments on your dataset, I encountered some issues.
In the published paper, you stated:
> Evaluation. We use answer prediction accuracy as the metric and evaluate model performance on answering different types of questions. The answer vocabulary consists of 42 possible answers (22 objects, 12 counting choices, 6 location types, and yes/no) to different types of questions in the dataset. For training, we use one single model to handle all questions without training separated models for each type. So the accuracy with random choice is 1/42 ≈ 2.4%. Additionally, all models are trained on our AVQA dataset using the same features for a fair comparison.
It seems that the task is framed as multiple-choice, i.e., a 42-way classification over a closed answer set, right? Could you provide the list of the 42 possible answers?
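For reference, below is a minimal sketch of how I am currently computing accuracy, assuming the task is 42-way classification over a fixed answer vocabulary with per-type and overall accuracy reported. All names here (`answer_vocab`, `samples`, the field names) are my own placeholders, not your actual code or data format, so please correct me if this interpretation is wrong.

```python
from collections import defaultdict

# Hypothetical closed answer vocabulary: 22 objects + 12 counting choices
# + 6 location types + yes/no = 42 entries. The full list is what I am asking for;
# this short list is only a placeholder.
answer_vocab = ["yes", "no", "one", "two", "three"]

def evaluate(samples):
    """samples: iterable of dicts with 'question_type', 'gt_answer', 'pred_scores'.

    'pred_scores' is assumed to be a list of model scores, one per vocabulary entry.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        # Prediction = argmax over the answer classes in the vocabulary.
        pred_idx = max(range(len(answer_vocab)), key=lambda i: s["pred_scores"][i])
        pred_answer = answer_vocab[pred_idx]
        qtype = s["question_type"]
        total[qtype] += 1
        correct[qtype] += int(pred_answer == s["gt_answer"])
    # Per-question-type accuracy plus overall accuracy across all questions.
    per_type = {t: correct[t] / total[t] for t in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_type, overall
```

Is this the intended evaluation protocol, or is there an official evaluation script I should use instead?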
Any additional details regarding how to perform the evaluation would be helpful. Thank you.
@DTaoo @ayameyao