shape error #7
+1 Did you end up finding a way of resolving this issue and running the demo?
Follow the instructions the repo gives for open_clip installation and you should not get this error. If you use …
For reference, yes, the installation of open_clip was the issue, but it was caused by the …
Hello, why do I still get a shape error after removing the square brackets on line 92?
Not sure if this is just a pasting error in your question, but if you're using the above command exactly as you've written it there, the …
Traceback (most recent call last): … I got this error, do you know why?
I reckon it could be related to the "batch_first" argument in a relatively newer version of torch. You can try removing the two "permute" operations in TextEncoder's forward function at model/maxvqa.py:27 and 29.
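A minimal sketch of what that suggestion amounts to, assuming TextEncoder wraps a plain torch transformer; the constructor and class body here are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, width=512, layers=2, heads=8):
        super().__init__()
        # Older torch/open_clip builds default to the (seq, batch, dim)
        # layout (batch_first=False), which the permutes below compensate for.
        layer = nn.TransformerEncoderLayer(d_model=width, nhead=heads, batch_first=False)
        self.transformer = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):            # x: (batch, seq, dim)
        x = x.permute(1, 0, 2)       # -> (seq, batch, dim), needed when batch_first=False
        x = self.transformer(x)
        x = x.permute(1, 0, 2)       # -> back to (batch, seq, dim)
        # If your transformer was built with batch_first=True, drop both
        # permutes above; otherwise the sequence and batch axes get swapped
        # and downstream reshapes fail.
        return x
```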
Hi, when I run demo_maxvqa.py for a test, something is wrong with the shape:
Traceback (most recent call last):
File "E:/MaxVQA-master/demo_maxvqa.py", line 167, in
a = inference(video)
File "E:/MaxVQA-master/demo_maxvqa.py", line 160, in inference
vis_feats = visual_encoder(data["aesthetic"].to(device), data["technical"].to(device))
File "D:\tools\Anaconda\set\envs\python37tf\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\MaxVQA-master\model\visual.py", line 19, in forward
clip_feats = clip_feats[1:].reshape(7,7,-1,1024).permute(3,2,0,1)
RuntimeError: shape '[7, 7, -1, 1024]' is invalid for input of size 64512
The call that fails is vis_feats = visual_encoder(data["aesthetic"].to(device), data["technical"].to(device)), with input shapes:
data["aesthetic"]: [3, 64, 224, 224]
data["technical"]: [3, 128, 224, 224]
The specific problem is in the following two lines of model/visual.py:
clip_feats = self.clip_visual(x_aes)
clip_feats = clip_feats[1:].reshape(7,7,-1,1024).permute(3,2,0,1)
However, the shape of clip_feats here is [64, 1024].
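A quick shape check makes the mismatch concrete. Judging from the [1:] slice and the reshape(7,7,-1,1024), visual.py appears to expect token-level CLIP features of shape [1 + 7*7, N, 1024] (a CLS token plus a 7x7 grid, token axis first), while a stock open_clip forward returns one pooled 1024-d vector per frame, giving [64, 1024] here. This is a sketch of the arithmetic, not the repo's code:

```python
import torch

# What a stock open_clip apparently returns here: one pooled 1024-d
# vector per frame (64 frames in this clip).
pooled = torch.randn(64, 1024)
print(pooled[1:].numel())  # 63 * 1024 = 64512, not a multiple of 7*7*1024 = 50176
# pooled[1:].reshape(7, 7, -1, 1024)  # -> RuntimeError: shape '[7, 7, -1, 1024]'
#                                     #    is invalid for input of size 64512

# What the reshape expects: CLS + 49 grid tokens, token axis first.
tokens = torch.randn(50, 64, 1024)
grid = tokens[1:].reshape(7, 7, -1, 1024).permute(3, 2, 0, 1)
print(grid.shape)  # torch.Size([1024, 64, 7, 7])
```

This is consistent with the earlier suggestion that installing open_clip per the repo's instructions (rather than a stock build) resolves the error.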