This is the PyTorch implementation of our ECCV 2018 paper: Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation. This project is based on our previous work: Multi-level Scene Description Network.
- Guide for Project Setup
- Guide for Model Evaluation with pretrained model
- Guide for Model Training
- Uploading pretrained model and format-compatible datasets.
- Update the Model link for VG-DR-Net (We will upload a new model by Aug. 27).
- Update the Dataset link for VG-DR-Net.
- A demonstration of our Factorizable Net
- Migrate to PyTorch 1.0.1
- Multi-GPU support (beta version): one image per GPU
- Feb 26, 2019: We now release the beta [Multi-GPU] version of Factorizable Net. The stable version is on branch 0.3.1.
- Aug 28, 2018: Bug fix for running the evaluation with `--use_gt_boxes`. VG-DR-Net has some self-relations, e.g. A-relation-A; previously, we assumed no such relations exist. This commit may affect model performance on Scene Graph Generation (see the sketch below).
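To make the fix concrete, here is a minimal, hypothetical sketch (not the repository's evaluation code) of the assumption that changed: when enumerating candidate (subject, object) pairs, self-relations such as A-relation-A require keeping pairs whose two indices coincide.

```python
from itertools import product

def candidate_pairs(num_boxes, allow_self_relations=True):
    """Enumerate (subject, object) index pairs over a set of boxes."""
    pairs = product(range(num_boxes), repeat=2)
    if allow_self_relations:
        return list(pairs)  # keeps (i, i) pairs such as A-relation-A
    return [(i, j) for i, j in pairs if i != j]  # the old assumption

print(len(candidate_pairs(3)))         # 9: includes (0,0), (1,1), (2,2)
print(len(candidate_pairs(3, False)))  # 6: self-relations dropped
```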
- Install the requirements (you can use pip or Anaconda):

  ```bash
  conda install pip pyyaml sympy h5py cython numpy scipy click
  conda install -c menpo opencv3
  conda install pytorch torchvision cudatoolkit=8.0 -c pytorch
  pip install easydict
  ```
- Clone the Factorizable Net repository:

  ```bash
  git clone git@github.com:yikang-li/FactorizableNet.git
  ```
- Build the Cython modules for the NMS, RoI pooling, and RoI align modules:

  ```bash
  cd lib
  make all
  cd ..
  ```
- Download the three datasets VG-MSDN, VG-DR-Net, and VRD to `F-Net/data`, and extract the folders with `tar xzvf ${Dataset}.tgz`. We have converted the original annotations to `json` format.
- Download Visual Genome images and VRD images.
- Link the image data folder to the target folder:

  ```bash
  ln -s /path/to/images F-Net/data/${Dataset}/images
  ```

  p.s. You can change the default data directory by modifying `dir` in `options/data_xxx.json`; a short sketch of doing so programmatically follows.
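  As an illustration only (this helper is not part of the repository), the `dir` field can be rewritten with a few lines of Python. The filename below is a hypothetical instance of the `data_xxx.json` pattern; adjust it to your dataset.

  ```python
  import json

  opt_file = "options/data_VRD.json"  # hypothetical instance of data_xxx.json
  with open(opt_file) as f:
      opts = json.load(f)
  opts["dir"] = "/path/to/my/datasets/VRD"  # the "dir" key mentioned above
  with open(opt_file, "w") as f:
      json.dump(opts, f, indent=2)
  ```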
- [optional] Download the pretrained RPN for Visual Genome and VRD, and place the weights into `output/`.
- [optional] Download the pretrained Factorizable Net models on VG-MSDN, VG-DR-Net and VG-VRD, and place them into `output/trained_models/`.
The project contains several subfolders:

- `lib`: dataset loaders, NMS, RoI pooling, evaluation metrics, etc.
- `options`: configurations for `Data`, `RPN`, `F-Net` and hyperparameters.
- `models`: model definitions for `RPN`, `Factorizable` and related modules.
- `data`: contains VG-DR-Net (`svg/`), VG-MSDN (`visual_genome/`) and VRD (`VRD/`).
- `output`: stores the trained models, checkpoints and loggers.
Pretrained models on VG-MSDN, VG-DR-Net and VG-VRD are provided. `--evaluate` enables evaluation mode. Additionally, we also provide `--use_gt_boxes` to feed the ground-truth object bounding boxes to the model instead of RPN proposals.
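For instance, a ground-truth-box evaluation on VG-MSDN would presumably mirror the commands below with the flag added (an assumed invocation; the reported recalls will differ from the proposal-based numbers):

```bash
CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --use_gt_boxes --dataset_option=normal \
    --path_opt options/models/VG-MSDN.yaml \
    --pretrained_model output/trained_models/Model-VG-MSDN.h5
```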
- Evaluation on VG-MSDN with the pretrained model. Scene Graph Generation results: Recall@50: 12.984%, Recall@100: 16.506%.

  ```bash
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --dataset_option=normal \
      --path_opt options/models/VG-MSDN.yaml \
      --pretrained_model output/trained_models/Model-VG-MSDN.h5
  ```
- Evaluation on VG-VRD with the pretrained model. Scene Graph Generation results: Recall@50: 19.453%, Recall@100: 24.640%.

  ```bash
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate \
      --path_opt options/models/VRD.yaml \
      --pretrained_model output/trained_models/Model-VRD.h5
  ```
- Evaluation on VG-DR-Net with the pretrained model. Scene Graph Generation results: Recall@50: 19.807%, Recall@100: 25.488%.

  ```bash
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --dataset_option=normal \
      --path_opt options/models/VG-DR-Net.yaml \
      --pretrained_model output/trained_models/Model-VG-DR-Net.h5
  ```
- Training the Region Proposal Network (RPN). The shared conv layers are fixed; a sketch of this freezing pattern follows the commands. We also provide pretrained RPN weights for Visual Genome and VRD.

  ```bash
  # Train RPN for VG-MSDN and VG-DR-Net
  CUDA_VISIBLE_DEVICES=0 python train_rpn.py --dataset_option=normal

  # Train RPN for VRD
  CUDA_VISIBLE_DEVICES=0 python train_rpn_VRD.py
  ```
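  As an aside, "the shared conv layers are fixed" follows the usual PyTorch freezing pattern sketched below. This is an illustration with a stock VGG16, not the repository's actual trainer, and the optimizer settings are placeholders.

  ```python
  import torch.optim as optim
  import torchvision

  vgg = torchvision.models.vgg16(pretrained=True)  # stand-in for the shared backbone
  for p in vgg.features.parameters():
      p.requires_grad = False  # freeze the shared convolutional layers

  # Only the still-trainable parameters are handed to the optimizer,
  # so the frozen backbone never receives updates.
  trainable = [p for p in vgg.parameters() if p.requires_grad]
  optimizer = optim.SGD(trainable, lr=0.01, momentum=0.9)  # illustrative values
  ```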
- Training Factorizable Net: detailed training options are included in `options/models/`.

  ```bash
  # Train F-Net on VG-MSDN:
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
      --path_opt options/models/VG-MSDN.yaml --rpn output/RPN.h5

  # Train F-Net on VRD:
  CUDA_VISIBLE_DEVICES=0 python train_FN.py \
      --path_opt options/models/VRD.yaml --rpn output/RPN_VRD.h5

  # Train F-Net on VG-DR-Net:
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
      --path_opt options/models/VG-DR-Net.yaml --rpn output/RPN.h5
  ```

  `--rpn xxx.h5` can be omitted when training end-to-end from a pretrained VGG16. Sometimes, unexpected and confusing errors appear; ignore them and restart the training.
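  For reference, an end-to-end run would presumably be the same command with the `--rpn` argument dropped (an assumed invocation):

  ```bash
  # Hypothetical end-to-end F-Net training on VG-MSDN (no pretrained RPN):
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
      --path_opt options/models/VG-MSDN.yaml
  ```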
- For better results, we usually re-train the model for additional epochs by resuming the training from the checkpoint with `--resume ckpt`:

  ```bash
  # Resume F-Net training on VG-MSDN:
  CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
      --path_opt options/models/VG-MSDN.yaml --resume ckpt --epochs 30
  ```
We thank longcw for his generous release of the PyTorch Implementation of Faster R-CNN.
If you find our project helpful, your citations are highly appreciated:
@inproceedings{li2018fnet,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Shi, Jianping and Zhang, Chao and Wang, Xiaogang},
title={Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation},
booktitle = {ECCV},
year = {2018}
}
We also have two other papers on scene graph generation / visual relationship detection:
@inproceedings{li2017msdn,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={Scene graph generation from objects, phrases and region captions},
booktitle = {ICCV},
year = {2017}
}
@inproceedings{li2017vip,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={ViP-CNN: Visual Phrase Guided Convolutional Neural Network},
booktitle = {CVPR},
year = {2017}
}
The pre-trained models and the Factorizable Network technique are released for non-commercial use.
Contact Yikang LI if you have questions.