This project implements a deep learning neural network model to retrieve the target image from a given image and a text containing desired modifications in the input image. The text and the image are composed using the TIRG model, use this composed feature to retrieve the target image. An experimental study to generate the target image is also implemented using StackGAN. This project is implemented as part of course project for Deep Learning Course at University of Victoria.
More details could be found at: 'Report.pdf'.
- Download the dataset and give the location of the dataset in config.py.
- Change the following according to the needs: batch_size, epochs, dataset location.
- Load the pre-trained model in 'save' directory if needed.
bash download.sh
unzip CSS.zip
python train_TIRG.py --batch_size=64 --num_iters=210000
-
Training consist of two stages, stage I and Stage II.
-
Train the TIRG model or use the pretrained TIRG weights before training.
-
To train Stage I:
python train_GAN.py --stage=1 --batch_size=64
- To train Stage II:
python train_GAN.py --stage=2 --batch_size=8
- Change the following according to the needs: batch_size, epochs, lr (learning rate).
- Load the pre-trained model in 'save' directory if needed.
To train in TensorFlow:
bash download.sh
unzip CSS.zip
python main.py
- CSS 16K (1 GB):
- Extract and load the dataset outside the folder.
- TIRG. Pretrained model is trained on 1 NVIDIA GeForce GTX 1080Ti for 20 hours(450 epoches).
- StackGAN Stage I. Pretrained model is trained on 2 NVIDIA Tesla T4 (120 epoches).
- StackGAN Stage II. Pretrained model is trained on 2 NVIDIA Tesla T4 (90 epoches).
This project was implemented taking reference from the following papers:
Composing Text and Image for Image Retrieval - An Empirical Odyssey (arXiv 2018) **[Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays]
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (arXiv 2016) **[Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas]