Official repository for the ACM MM 2021 paper "Cross Modal Compression: Towards Human-comprehensible Semantic Compression".
- linux
- python 3.5 (not tested on other versions)
- pytorch 1.3+
- torchaudio 0.3
- librosa, pysoundfile
- json, tqdm, logging
- download the CUB-200-2011 and MS COCO 2014 datasets from their official sites
- download our json file for MS COCO from here (Google Drive; Baidu Netdisk, extraction code: c31g)
- download our pretrained models from here (Google Drive; Baidu Netdisk, extraction code: i0un)
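The package requirements listed above can be checked before training with a short script. This is a minimal sketch, not part of the repository: the helper name `missing_modules` and the list `REQUIRED_MODULES` are hypothetical, and the import names are assumptions based on how these packages are normally imported (PySoundFile is imported as `soundfile`). It only checks that the modules can be found, not their versions.

```python
import importlib.util
import sys

# Import names for the packages listed above (assumed, not taken from the repo).
# PySoundFile is imported as "soundfile"; json and logging ship with Python.
REQUIRED_MODULES = ["torch", "torchaudio", "librosa", "soundfile", "tqdm"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("python", ".".join(str(part) for part in sys.version_info[:3]))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all listed packages can be found")
```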
export PYTHONPATH=path_for_this_project
# for CUB dataset
python ./TextImage/pretrain_DAMSM.py --cfg ./cfg/bird_DAMSM.yml --data_dir ./data/birds --dataset bird --output_dir ./output/TextImage --no_dist
# for COCO
python ./TextImage/pretrain_DAMSM.py --cfg ./cfg/coco_DAMSM.yml --data_dir ./data/coco --dataset coco --output_dir ./output/TextImage --no_dist
# for CUB
python ./ImageText/train.py --cfg ./cfg/bird_train.yml --data_dir ./data/birds --dataset bird --output_dir ./output/ImageText
# for COCO
python ./ImageText/train.py --cfg ./cfg/coco_train.yml --data_dir ./data/coco --dataset coco --output_dir ./output/ImageText
# for CUB
# first set the text encoder path in ./cfg/bird_train.yml: TRAIN.NET_E
python ./TextImage/train.py --cfg ./cfg/bird_train.yml --data_dir ./data/birds --dataset bird --output_dir ./output/TextImage
# for COCO
# first set the text encoder path in ./cfg/coco_train.yml: TRAIN.NET_E
python ./TextImage/train.py --cfg ./cfg/coco_train.yml --data_dir ./data/coco --dataset coco --output_dir ./output/TextImage
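The three training steps above differ only in the script, the config file, and the output directory, so they can be generated per dataset. The following is a sketch under stated assumptions, not a script shipped with this repository: `training_commands` and `DATA_DIRS` are hypothetical names, all config files are assumed to live under `./cfg/` (as the comments above indicate), and `TRAIN.NET_E` is assumed to have been set by hand before the last command is run.

```python
import shlex

# Data directories as used in the commands above.
DATA_DIRS = {"bird": "./data/birds", "coco": "./data/coco"}


def training_commands(dataset):
    """Return the three training commands for `dataset`, in the order listed above."""
    common = ["--data_dir", DATA_DIRS[dataset], "--dataset", dataset]
    damsm_cfg = "./cfg/{}_DAMSM.yml".format(dataset)
    train_cfg = "./cfg/{}_train.yml".format(dataset)
    return [
        ["python", "./TextImage/pretrain_DAMSM.py", "--cfg", damsm_cfg]
        + common + ["--output_dir", "./output/TextImage", "--no_dist"],
        ["python", "./ImageText/train.py", "--cfg", train_cfg]
        + common + ["--output_dir", "./output/ImageText"],
        # Assumes TRAIN.NET_E in train_cfg already points at the text encoder.
        ["python", "./TextImage/train.py", "--cfg", train_cfg]
        + common + ["--output_dir", "./output/TextImage"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in training_commands("bird"):
        print(" ".join(shlex.quote(arg) for arg in command))
```

Each command is returned as an argument list, so it could also be passed to `subprocess.run` once the printed form looks right.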
- write the paths of the pretrained models in cfg/bird_eval.yml for the CUB-200-2011 dataset, or in cfg/coco_eval.yml for the MS COCO dataset
- run
python ./ImageText/end_to_end_test.py --cfg cfg/coco_eval.yml --data_dir COCO_PATH --output_dir ./output/end_to_end_coco_test
for MS COCO, or
python ./ImageText/end_to_end_test.py --cfg cfg/bird_eval.yml --data_dir CUB_PATH --output_dir ./output/end_to_end_bird_test
for CUB-200-2011
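Because the evaluation command depends on a dataset-specific config file and an existing data directory, a quick pre-flight check can catch a mismatched pair before the test starts. This is only an illustrative sketch: `preflight_problems` and `EVAL_CFG` are hypothetical names, and the check looks at file and directory existence only, not at the model paths written inside the config.

```python
import os

# One eval config per dataset, as described above.
EVAL_CFG = {"coco": "cfg/coco_eval.yml", "bird": "cfg/bird_eval.yml"}


def preflight_problems(dataset, data_dir, project_root="."):
    """Return a list of human-readable problems; an empty list means none found."""
    if dataset not in EVAL_CFG:
        return ["unknown dataset: {!r}".format(dataset)]
    problems = []
    cfg_path = os.path.join(project_root, EVAL_CFG[dataset])
    if not os.path.isfile(cfg_path):
        problems.append("missing config file: " + cfg_path)
    if not os.path.isdir(data_dir):
        problems.append("missing data directory: " + data_dir)
    return problems


if __name__ == "__main__":
    # "./data/coco" stands in for COCO_PATH here.
    for problem in preflight_problems("coco", "./data/coco"):
        print(problem)
```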
Project main page: https://smallflyingpig.github.io/cross_modal_compression_mainpage/main.html
Feel free to email me at jiguo.li@vipl.ict.ac.cn or jgli@pku.edu.cn if you have any questions about this project.
Thanks to Junlong Gao for the valuable discussions. Thanks also to the open-source projects COCO API, AttnGAN, and a-PyTorch-Tutorial-to-Image-Captioning.
Note that this work is only for research. Please do not use it for illegal purposes.