Replies: 2 comments
-
Try to replace the VGGish features with zeros. Note that this is neither supported nor documented. Also, check out the discussion in #34 – you might want to collaborate with the author of that issue.
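For reference, a minimal sketch of what that workaround could look like, assuming the single-video prediction script reads the audio features from a `.npy` file; the file path and the segment count below are placeholders, not values taken from the repo:

```python
import numpy as np

# VGGish yields one 128-dimensional embedding per ~0.96 s audio segment.
# Pick a segment count that roughly matches the video's duration;
# the value here is only illustrative.
num_segments = 31        # e.g. roughly a 30-second clip
vggish_dim = 128         # size of a single VGGish embedding

# All-zero "audio" features for a silent video.
zero_audio = np.zeros((num_segments, vggish_dim), dtype=np.float32)

# Save the zero features where the prediction script expects the VGGish
# feature file (the path below is a placeholder).
np.save('sample/my_video_vggish.npy', zero_audio)
```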
-
Thanks a lot for your prompt reply. I will give it a try based on your advice. And again, I am deeply impressed by your patient and kind help for everyone interested in this work. 👍 All the best!
-
Hi @v-iashin,
Thanks for sharing your wonderful work and the detailed instructions on its usage! I want to caption my own video with the provided pre-trained model, but the video has no audio track. So I wonder if I can directly follow the instructions in the "Single Video Prediction" section of the README. My concerns mainly lie in (1) the feature extraction module (VGGish): should I skip the VGGish features, or run the extraction anyway (an error may occur?) even though the video has no audio; and (2) can I use the pre-trained model directly, or do I have to retrain the model without audio information? (I just want to build a small application, and retraining is time-consuming.)
Thanks and best regards!