Hello, and thank you for this great work! I have a question about how the training dataset was constructed:
“To enable our model can decide when high resolution is necessary, we collect corresponding VQA samples, including both cases requiring high-resolution images and cases adequately answered using downsampled images.”
However, I wasn’t able to find where the paper describes the origin of these samples. Could you please clarify:
- Data source
  - Which VQA dataset(s) or external benchmarks were used to gather these examples?
- Selection strategy
  - What criteria or heuristics did you apply to filter cases from the original data?
  - Was this labeling done manually, via pre-defined rules, or by some automated process?
If I’ve simply overlooked this detail in the manuscript, my apologies. Could you point me to the relevant section or appendix? Thank you for your time and clarification!