Reproducing full resolution Swin-T baseline from FastVQA paper #42

sh-r · 2024-02-09T10:13:33Z

Hello. Thank you for your great work!
I have a question about the full-resolution Swin-T baseline reported in the FastVQA paper. The paper mentions that fixed recognition features were regressed to obtain the baseline. Does this mean that all frames of the video were used (no temporal sampling), with no fragmentation or resizing? Or was the temporally sampled video the input to the Swin-T model when generating the fixed features?
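To make the two readings concrete, here is a minimal sketch of the frame indices each would feed to Swin-T. The function names and the `clip_len` / `frame_interval` parameters are my own placeholders for illustration. They are not taken from the FastVQA code or paper, and the sampling scheme shown is only an assumed example.

```python
def all_frame_indices(num_frames):
    """Reading A: every frame is used, with no temporal sampling."""
    return list(range(num_frames))


def sampled_frame_indices(num_frames, clip_len, frame_interval):
    """Reading B: one clip of `clip_len` frames, one every `frame_interval` frames.

    Hypothetical scheme for illustration only; indices past the end of the
    video are clamped to the last frame.
    """
    span = clip_len * frame_interval
    return [min(i, num_frames - 1) for i in range(0, span, frame_interval)]


if __name__ == "__main__":
    print(len(all_frame_indices(240)))        # Reading A: all 240 frames
    print(sampled_frame_indices(240, 32, 2))  # Reading B: 32 sampled frames
```

In both readings the frames themselves would stay at full resolution (no fragments, no resizing). The only difference I am asking about is which of these index sets was used.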
