# requirements #
- cuda
- pytorch 0.3.1
- python3 (untested) or python2 (tested; preferably standardize on python2)
- ffmpeg (can be installed via anaconda)
# usage #
1. 2D feature extraction, e.g. with resnet101 or nasnet (see the sketch after this block)
```bash
sh ./2d_extract_feat.sh
# model: which CNN to extract features with
# n_frame_steps: how many frames to sample per video; 80 is a sensible default
```
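
For illustration, a minimal Python sketch of the kind of extraction a script like `2d_extract_feat.sh` drives, assuming frames have already been dumped to disk with ffmpeg (the `data/frames/video0/` dump directory, the video id, and the use of torchvision's resnet101 are assumptions, not the script's exact behavior):

```python
import glob

import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

# Drop the final classification layer to keep the 2048-dim pooled feature.
cnn = torch.nn.Sequential(*list(models.resnet101(pretrained=True).children())[:-1])
cnn.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Sample n_frame_steps frames evenly across the video (80, as suggested above).
frames = sorted(glob.glob('data/frames/video0/*.jpg'))  # hypothetical dump dir
idx = np.linspace(0, len(frames) - 1, num=80).astype(int)
batch = torch.stack([preprocess(Image.open(frames[i]).convert('RGB')) for i in idx])

with torch.no_grad():
    feats = cnn(batch).squeeze(-1).squeeze(-1)  # (80, 2048)
np.save('data/feats/resnet101/video0.npy', feats.numpy())
```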
2. 3D feature extraction (see the clip-layout sketch after this block)
```bash
cd c3d_feat_extract
sh ./c3d_feat_extract.sh
# --mode feature: run in feature-extraction mode; no need to change
# adjust the options below to match the chosen model
# --model_name resnext \
# --model_depth 101 \
# --resnext_cardinality 32 \
# --resnet_shortcut B \
# --model pretrained_models/resnext-101-64f-kinetics.pth
```
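
As a rough sketch of the input layout such 3D models expect: frames are grouped into fixed-length clips and batched as a 5D tensor `(n_clips, channels, time, height, width)`. The 16-frame clip length and 112×112 crop follow the Kinetics-pretrained models above; the helper itself is hypothetical:

```python
import torch

def make_clips(frames, clip_len=16):
    """Group preprocessed frames, shape (n_frames, 3, 112, 112), into
    non-overlapping clips shaped (n_clips, 3, clip_len, 112, 112)."""
    clips = [frames[i:i + clip_len]
             for i in range(0, frames.size(0) - clip_len + 1, clip_len)]
    # Stack to (n_clips, clip_len, 3, 112, 112), then move channels before time.
    return torch.stack(clips).permute(0, 2, 1, 3, 4)

clips = make_clips(torch.randn(80, 3, 112, 112))  # -> (5, 3, 16, 112, 112)
```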
3. Training
```bash
./train_s2vt.sh
# set the options for your setup; see opts.py for what each one means
```
4. Testing and scoring
```bash
./eval_s2vt.sh
# set the options for your setup; see eval.py for what each one means
```
# file tree #
Download the required files (a loading sketch follows the tree):
Link: https://pan.baidu.com/s/1RDNygrWtz_PtVH8nh4vG3w Password: nxyk
```
data
│ all_caption.json
│ all_info.json
│ all_videodatainfo_2017.json
└───feats
│ └───nasnet
│ │ │ videoxxx.npy
│ │ │ ...
│ └───resnet
│ │ │ videoxxx.npy
│ │ │ ...
│ └───xxnet
│ │ videoxxx.npy
│ │ ...
└───videos
│ │ videoxxx.mp4
│ │ ...
Create these directories as well:
log
checkpoint
result
```
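
For reference, a minimal sketch of how the `feats` layout above is typically consumed; the per-video `.npy` shape is an assumption based on the extraction steps, and the helper is hypothetical:

```python
import os
import numpy as np

def load_feat(feats_dir, video_id):
    """Load one video's frame features, e.g. data/feats/resnet/video0.npy."""
    return np.load(os.path.join(feats_dir, video_id + '.npy'))

fc_feat = load_feat('data/feats/resnet', 'video0')  # shape (n_frames, feat_dim)
```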
# pytorch implementation of video captioning
We recommend installing PyTorch and the Python packages using Anaconda.
### python packages
- tqdm
- pillow
- pretrainedmodels
- nltk
## Data
MSR-VTT. The test videos don't have captions, so I split train-video into train/val/test. Extract the archives and put them in the `./data/` directory.
- train-video: [download link](https://drive.google.com/file/d/1Qi6Gn_l93SzrvmKQQu-drI90L-x8B0ly/view?usp=sharing)
- test-video: [download link](https://drive.google.com/file/d/10fPbEhD-ENVQihrRvKFvxcMzkDlhvf4Q/view?usp=sharing)
- json info of train-video: [download link](https://drive.google.com/file/d/1LcTtsAvfnHhUfHMiI4YkDgN7lF1-_-m7/view?usp=sharing)
- json info of test-video: [download link](https://drive.google.com/file/d/1Kgra0uMKDQssclNZXRLfbj9UQgBv-1YE/view?usp=sharing)
## Options
All default options are defined in opts.py or the corresponding code file; change them as you like.
## Usage
### (Optional) c3d features
You can use [video-classification-3d-cnn-pytorch](https://github.com/kenshohara/video-classification-3d-cnn-pytorch) to extract features from videos, then mean-pool them to get a 2048-dim feature for each video (a sketch follows).
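
A minimal sketch of that mean pooling, assuming the 3D CNN saved one `(n_clips, 2048)` array per video (the file names are hypothetical):

```python
import numpy as np

clip_feats = np.load('video0_c3d_clips.npy')  # per-clip features, (n_clips, 2048)
video_feat = clip_feats.mean(axis=0)          # one 2048-dim feature per video
np.save('data/feats/c3d/video0.npy', video_feat)
```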
### Steps
1. Preprocess videos and labels (see the vocabulary sketch after this block)
This step takes about 3 hours for the MSR-VTT dataset on one Titan XP GPU
```bash
python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152 --n_frame_steps 40 --gpu 4,5
python prepro_vocab.py
```
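
For intuition, a hedged sketch of the kind of vocabulary building a `prepro_vocab.py`-style step performs; the frequency threshold, special tokens, and helper are assumptions, not the script's exact behavior:

```python
from collections import Counter

def build_vocab(captions, threshold=1):
    """Map words that appear more than `threshold` times to integer ids."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    words = ['<pad>', '<sos>', '<eos>', '<unk>']
    words += sorted(w for w, c in counts.items() if c > threshold)
    return {w: i for i, w in enumerate(words)}

vocab = build_vocab(['a man is cooking', 'a man is singing'])
```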
2. Training a model
```bash
python train.py --gpu 5,6,7 --epochs 9001 --batch_size 450 --checkpoint_path data/save --feats_dir data/feats/resnet152 --dim_vid 2048 --model S2VTAttModel
```
3. Test
opt_info.json is written to the same directory as the saved model (see the sketch after this block).
```bash
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_1000.pth --batch_size 100 --gpu 1,0
```
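
`--recover_opt` presumably just reloads the options saved during training so evaluation can rebuild the same model; a minimal sketch (the printed values echo the training command above, and the exact keys are assumptions):

```python
import json

# opt_info.json sits next to the checkpoint, as noted above.
with open('data/save/opt_info.json') as f:
    opt = json.load(f)
print(opt.get('model'), opt.get('dim_vid'))  # e.g. S2VTAttModel 2048
```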
## Metrics
I forked [XgDuan's coco-caption](https://github.com/XgDuan/coco-caption/tree/python3). Thanks to him for porting it to Python 3.
## TODO
- lstm
- beam search
- reinforcement learning
## Note
This repository is no longer maintained; please see my other repository [video-caption-openNMT.pytorch](https://github.com/xiadingZ/video-caption-openNMT.pytorch). It has higher performance and better test scores.