Missing sparse video feature extraction module #2
Comments
Hi, thanks for the interest. I have uploaded the related code (for reference only). To extract region features, you need to sample frames in the same way and use the tool provided by BUTD.
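For reference, a minimal sketch of what uniform sparse frame sampling could look like; `sample_frame_ids`, the clip count, and the frames-per-clip value here are illustrative assumptions, not the repo's confirmed settings.

```python
import numpy as np

def sample_frame_ids(num_frames, num_clips=8, frames_per_clip=4):
    """Split the video into num_clips equal segments and pick
    frames_per_clip evenly spaced frame indices inside each segment.
    (Clip/frame counts are assumptions, not the repo's settings.)"""
    ids = []
    bounds = np.linspace(0, num_frames, num_clips + 1, dtype=int)
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = np.linspace(start, max(start, end - 1), frames_per_clip, dtype=int)
        ids.extend(seg.tolist())
    return ids

print(sample_frame_ids(120))  # 8 clips x 4 frames = 32 sampled indices
```

Whatever the exact counts are, the key point is that the same sampling must be applied both when extracting appearance features and when extracting region features, so the two stay aligned frame-for-frame.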
Thank you very much for providing them. It would also be helpful if you could add some documentation to the files and functions, so that we can better understand the starting point and the steps to follow in order to extract features properly.
Basically, you can follow a coarse pipeline: extract_video.py (decode mp4 into frames) -> preprocess_feature.py (sample and encode frames into CNN representations) -> split_dataset_feat.py (split the features into train/val/test).
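To make that pipeline concrete, here is a hedged sketch of the first two steps; `decode_video`, `encode_frames`, and the torchvision ResNet-101 encoder are illustrative stand-ins for the actual scripts, not their confirmed contents.

```python
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

def decode_video(path):
    """Step 1 (extract_video.py): decode an mp4 into a list of RGB frames."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

# Step 2 (preprocess_feature.py): sample frame indices, then encode each
# sampled frame with an ImageNet-pretrained CNN (ResNet-101 assumed here).
resnet = models.resnet101(pretrained=True)
resnet.fc = torch.nn.Identity()  # drop the classifier, keep 2048-d features
resnet.eval()
transform = T.Compose([
    T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def encode_frames(frames, ids):
    batch = torch.stack([transform(frames[i]) for i in ids])
    return resnet(batch)  # (len(ids), 2048) appearance features

# Step 3 (split_dataset_feat.py) then partitions the per-video features
# into train/val/test files according to the dataset's split lists.
```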
That's so helpful. Thanks for explaining it.
Which mode, 'caffe' or 'd2', did you use to extract the regional features?
Please choose resnet-101 with d2.
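For readers unfamiliar with the options: 'd2' refers to the detectron2 backend of the BUTD extractor (the actual tool uses its own Visual-Genome-trained weights). Below is a rough sketch of detecting region boxes with a ResNet-101 detectron2 model; the COCO config here is a stand-in assumption, not the repo's exact setup.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Assumed stand-in config: a ResNet-101 Faster R-CNN from the d2 model zoo.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.2  # keep more candidate boxes

predictor = DefaultPredictor(cfg)
frame = cv2.imread("frame_0001.jpg")               # one sampled frame (BGR)
instances = predictor(frame)["instances"]
boxes = instances.pred_boxes.tensor.cpu().numpy()  # (N, 4) xyxy boxes
scores = instances.scores.cpu().numpy()            # (N,) confidences
```

Running this per sampled frame, then keeping a fixed number of top-scoring boxes and their pooled region features, is the general shape of what the BUTD tool produces.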
@doc-doc It seems that object_align.py does not give a complete method for obtaining the bounding boxes, but directly reads region_8c10b_{}.h5. Is there any complete code that detects the bounding boxes and then writes them to region_8c10b_{}.h5?
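While that question is open, here is a rough sketch of how boxes and region features might be packed into an h5 file of that kind; the dataset names and dimensions are guesses inferred from the file name, not the repo's confirmed layout.

```python
import h5py
import numpy as np

# Illustrative shapes only: e.g. 8 clips x 4 frames = 32 sampled frames,
# 10 boxes per frame, 2048-d region features ("8c10b" may encode this,
# but that reading is an assumption).
num_videos, num_frames, num_boxes, feat_dim = 100, 32, 10, 2048
feats = np.zeros((num_videos, num_frames, num_boxes, feat_dim), np.float32)
bboxes = np.zeros((num_videos, num_frames, num_boxes, 4), np.float32)

with h5py.File("region_8c10b_train.h5", "w") as f:
    f.create_dataset("feat", data=feats)   # region features per frame
    f.create_dataset("bbox", data=bboxes)  # matching box coordinates
    f.create_dataset("ids", data=np.arange(num_videos))  # video-id lookup
```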
To run inference on a custom video dataset, we need to sparsely extract video features in the same way you do in order to get good results. It would be great if you could make that module accessible in the repo.