This repository modifies the original StackGAN code from GitHub.
Use the MSCOCO dataset
- Download MSCOCO dataset and annotations including captions and instances
- Download pretrained char-CNN-RNN embedding of MSCOCO.
- misc/preprocess_mscoco.py preprocesses the images into different sizes for the selected supercategory and writes them into a tfrecords file along with the corresponding caption embeddings.
- Use the MSCOCO Python API
- Data loader that loads tfrecords from MSCOCO
- Image augmentation including cropping, flipping, and standardization (when downsampling the image, use the INTER_AREA method)
- Sample from multiple caption embeddings; visualize embedding distributions
- Negative examples (use the inner product of caption embeddings; see the CLSGAN method)
- Filter images based on object classes and their areas
- Enlarge the capacity of the generator network by adding 3 residual blocks.
- Change ReLU to leaky ReLU
- Option to disable batch norm in the discriminator
- Increase or reduce the final dimension of the discriminator
- Option to train with vanilla GAN
- Option to train with WGAN (weight clipping is not applied to batch norm parameters)
- Option to train with LSGAN
- Option to train with CLSGAN, a continuous least square GAN whose discriminator estimates the inner product between the right caption embedding and the wrong caption embedding.
- Option to train with BGAN (not implemented yet)
- Label each image in MSCOCO with multiple labels for objects whose area is larger than a threshold
- Transfer ResNet from Caffe to TensorFlow
- Train ResNet to classify the 80 categories of objects in MSCOCO
- StackGAN
- text2image
- char-CNN-RNN
- WGAN
- LSGAN
- BGAN
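
The discriminator objectives behind the training options listed above (vanilla GAN, WGAN, LSGAN, CLSGAN) can be sketched in plain NumPy as follows. This is only an illustration under stated assumptions, not the code used in this repository: all function names are hypothetical, the `batch_norm` name filter is an assumed naming convention, and the CLSGAN target is one possible reading of "estimates the inner products of embeddings", namely that the score for a (real image, wrong caption) pair is regressed onto the raw inner product of the right and wrong caption embeddings.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def vanilla_d_loss(real_logits, fake_logits, eps=1e-8):
    # Binary cross-entropy: real samples are labeled 1, fake samples 0.
    real_term = -np.mean(np.log(sigmoid(real_logits) + eps))
    fake_term = -np.mean(np.log(1.0 - sigmoid(fake_logits) + eps))
    return float(real_term + fake_term)


def wgan_d_loss(real_scores, fake_scores):
    # The critic minimizes mean(fake) - mean(real).
    return float(np.mean(fake_scores) - np.mean(real_scores))


def clip_weights(params, c=0.01, skip="batch_norm"):
    # WGAN weight clipping that leaves batch norm parameters untouched.
    # Assumption: batch norm variables contain `skip` in their name.
    return {name: (w if skip in name else np.clip(w, -c, c))
            for name, w in params.items()}


def lsgan_d_loss(real_scores, fake_scores, real_target=1.0, fake_target=0.0):
    # Least squares loss against fixed targets.
    real_term = 0.5 * np.mean((real_scores - real_target) ** 2)
    fake_term = 0.5 * np.mean((fake_scores - fake_target) ** 2)
    return float(real_term + fake_term)


def clsgan_wrong_targets(right_emb, wrong_emb):
    # Assumed CLSGAN target: row-wise inner product between the right and
    # the wrong caption embedding, instead of a fixed target of 0.
    return np.sum(right_emb * wrong_emb, axis=1)


def clsgan_wrong_loss(wrong_scores, right_emb, wrong_emb):
    # Least squares loss of the discriminator score on mismatched pairs.
    targets = clsgan_wrong_targets(right_emb, wrong_emb)
    return float(0.5 * np.mean((wrong_scores - targets) ** 2))
```

With this reading, a wrong caption that is close to the right one yields a target near the right caption's own squared norm, so mismatched pairs are penalized in proportion to how different the captions actually are.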