hanzhanggit · Redcof · May 5, 2023 · May 5, 2023 · May 9, 2023 · May 10, 2023
diff --git a/.gitignore b/.gitignore
@@ -3,4 +3,9 @@ code/*.pyc
 code/miscc/*.pyc
 backup
 .DS_Store
-.idea/
+.idea/
+/.ipynb_checkpoints/
+__pycache__
+/test_stage1.sh
+/test_stage2.sh
+/train_stage2.sh
diff --git a/README.md b/README.md
@@ -1,65 +1,110 @@
 # StackGAN-pytorch
+
+Original Repository [hanzhanggit/StackGAN-Pytorch](https://github.com/hanzhanggit/StackGAN-Pytorch)
+
 - [Tensorflow implementation](https://github.com/hanzhanggit/StackGAN)
 
 - [Inception score evaluation](https://github.com/hanzhanggit/StackGAN-inception-model)
 
 - [StackGAN-v2-pytorch](https://github.com/hanzhanggit/StackGAN-v2)
 
-Pytorch implementation for reproducing COCO results in the paper [StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/pdf/1612.03242v2.pdf) by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas. The network structure is slightly different from the tensorflow implementation. 
+Pytorch implementation for reproducing COCO results in the
+paper [StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/pdf/1612.03242v2.pdf)
+by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas. The network
+structure is slightly different from the tensorflow implementation.
 
 <img src="examples/framework.jpg" width="850px" height="370px"/>
 
+# Environment Setup (Linux)
 
-### Dependencies
-python 2.7
+## Install conda (if not available)
 
-Pytorch
+- `git clone https://github.com/Redcof/StackGAN-Pytorch.git`
+- `wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.3.1-0-Linux-x86_64.sh`
+- `bash Miniconda3-py38_23.3.1-0-Linux-x86_64.sh -b`
+- `$HOME/miniconda3/bin/conda init`
+- `source $HOME/.bashrc`
 
-In addition, please add the project folder to PYTHONPATH and `pip install` the following packages:
-- `tensorboard`
-- `python-dateutil`
-- `easydict`
-- `pandas`
-- `torchfile`
+## Create environment
 
+- `conda create -n ganenv python=3.8`
+- `conda activate ganenv`
 
+## Install dependencies
 
-**Data**
+- `pip install -r requirements.txt`
+- `conda install -c conda-forge fasttext`
+- `conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia`
 
-1. Download our preprocessed char-CNN-RNN text embeddings for [training coco](https://drive.google.com/open?id=0B3y_msrWZaXLQXVzOENCY2E3TlU) and  [evaluating coco](https://drive.google.com/open?id=0B3y_msrWZaXLeEs5MTg0RC1fa0U), save them to `data/coco`.
-  - [Optional] Follow the instructions [reedscot/icml2016](https://github.com/reedscot/icml2016) to download the pretrained char-CNN-RNN text encoders and extract text embeddings.
-2. Download the [coco](http://cocodataset.org/#download) image data. Extract them to `data/coco/`.
+## Install CUDA drivers (if not available)
 
+**How to check?**
 
+```cmd
+python cuda_test.py # should return True
+```
 
-**Training**
-- The steps to train a StackGAN model on the COCO dataset using our preprocessed embeddings.
-  - Step 1: train Stage-I GAN (e.g., for 120 epochs) `python main.py --cfg cfg/coco_s1.yml --gpu 0`
-  - Step 2: train Stage-II GAN (e.g., for another 120 epochs) `python main.py --cfg cfg/coco_s2.yml --gpu 1`
-- `*.yml` files are example configuration files for training/evaluating our models.
-- If you want to try your own datasets, [here](https://github.com/soumith/ganhacks) are some good tips about how to train GAN. Also, we encourage to try different hyper-parameters and architectures, especially for more complex datasets.
+**Check OS architecture**
+`cat /etc/os-release` returns the OS name, and `uname -m` returns the machine architecture. For us, it was `x86_64`.
 
+**Downloading Toolkit**
+[https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux](https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux)
 
+We chose the online (network repository) install:
+```commandline
+sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
+sudo dnf clean all
+sudo dnf -y module install nvidia-driver:latest-dkms
+sudo dnf -y install cuda
+```
 
-**Pretrained Model**
-- [StackGAN for coco](https://drive.google.com/open?id=0B3y_msrWZaXLYjNra2ZSSmtVQlE). Download and save it to `models/coco`.
-- **Our current implementation has a higher inception score(10.62±0.19) than reported in the StackGAN paper**
+**Data - Text**
+
+1. Download our preprocessed char-CNN-RNN text embeddings
+   for [training coco](https://drive.google.com/open?id=0B3y_msrWZaXLQXVzOENCY2E3TlU)
+   and [evaluating coco](https://drive.google.com/open?id=0B3y_msrWZaXLeEs5MTg0RC1fa0U), and save them to `data/coco`.
+
+2. [Optional] Follow the instructions in [reedscot/icml2016](https://github.com/reedscot/icml2016) to download the
+   pretrained char-CNN-RNN text encoders and extract text embeddings.
+
+**Data - Image**
 
+1. Download the [coco](http://cocodataset.org/#download) image data and extract it to `data/coco/`.
 
+**Custom Dataset**
+
+1. See the `data/README.md` file.
+
+**Training COCO**
+
+- The steps to train a StackGAN model on the COCO dataset using our preprocessed embeddings.
+    - Step 1: train Stage-I GAN (e.g., for 120 epochs) `python code/main.py --cfg cfg/coco_s1.yml --gpu 0`
+    - Step 2: train Stage-II GAN (e.g., for another 120 epochs) `python code/main.py --cfg cfg/coco_s2.yml --gpu 1`
+- `*.yml` files are example configuration files for training/evaluating our models.
+- If you want to try your own datasets, [here](https://github.com/soumith/ganhacks) are some good tips about how to
+  train GANs. We also encourage you to try different hyper-parameters and architectures, especially for more complex
+  datasets.
+
+**Pretrained Model**
+
+- [StackGAN for coco](https://drive.google.com/open?id=0B3y_msrWZaXLYjNra2ZSSmtVQlE). Download and save it
+  to `models/coco`.
+- **Our current implementation has a higher inception score (10.62±0.19) than reported in the StackGAN paper**
 
 **Evaluating**
-- Run `python main.py --cfg cfg/coco_eval.yml --gpu 2` to generate samples from captions in COCO validation set.
+
+- Run `python code/main.py --cfg cfg/coco_eval.yml --gpu 2` to generate samples from captions in the COCO validation set.
 
 Examples for COCO:
- 
+
 ![](examples/coco_2.png)
 ![](examples/coco_3.png)
 
-Save your favorite pictures generated by our models since the randomness from noise z and conditioning augmentation makes them creative enough to generate objects with different poses and viewpoints from the same discription :smiley:
-
-
+Save your favorite pictures generated by our models, since the randomness from noise z and conditioning augmentation
+makes them creative enough to generate objects with different poses and viewpoints from the same description :smiley:
 
 ### Citing StackGAN
+
 If you find StackGAN useful in your research, please consider citing:
 
 ```
@@ -71,14 +116,14 @@ booktitle = {{ICCV}},
 }
 ```
 
-
 **Our follow-up work**
 
 - [StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/abs/1710.10916)
 - [AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks](https://arxiv.org/abs/1711.10485) [[supplementary]](https://1drv.ms/b/s!Aj4exx_cRA4ghK5-kUG-EqH7hgknUA)[[code]](https://github.com/taoxugit/AttnGAN)
 
-
 **References**
 
-- Generative Adversarial Text-to-Image Synthesis [Paper](https://arxiv.org/abs/1605.05396) [Code](https://github.com/reedscot/icml2016)
-- Learning Deep Representations of Fine-grained Visual Descriptions [Paper](https://arxiv.org/abs/1605.05395) [Code](https://github.com/reedscot/cvpr2016)
+- Generative Adversarial Text-to-Image
+  Synthesis [Paper](https://arxiv.org/abs/1605.05396) [Code](https://github.com/reedscot/icml2016)
+- Learning Deep Representations of Fine-grained Visual
+  Descriptions [Paper](https://arxiv.org/abs/1605.05395) [Code](https://github.com/reedscot/cvpr2016)
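
The README's CUDA check runs `cuda_test.py`, but that script is not part of this diff. A minimal sketch of what such a check could look like is below. The function name `cuda_available` is hypothetical, and the real script may differ; the only library call used is `torch.cuda.is_available()`.

```python
"""Hypothetical stand-in for cuda_test.py (the real script is not shown in this diff)."""


def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # imported lazily so a missing install reports False instead of crashing
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    # Prints True when the driver, toolkit, and PyTorch build line up.
    print(cuda_available())
```

A `False` result on a machine with a GPU usually points to a driver problem or a CPU-only PyTorch build, which is what the driver installation steps in the README are meant to address.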
diff --git a/TODO.txt b/TODO.txt
@@ -0,0 +1,7 @@
+May-25
+[+] Share image with - Tejas
+[*] Generate data for testing and test it
+[ ] Image superresolution with S1 images
+[ ] Run training with fasttext self-trained + latest captions
+[ ] VQGAN
+[ ] CLIP
diff --git a/code/__pycache__/model.cpython-38.pyc b/code/__pycache__/model.cpython-38.pyc
diff --git a/code/__pycache__/trainer.cpython-38.pyc b/code/__pycache__/trainer.cpython-38.pyc
diff --git a/code/cfg/sixray_500_s1.yml b/code/cfg/sixray_500_s1.yml
@@ -0,0 +1,34 @@
+CONFIG_NAME: 'stage_1'
+
+DATASET_NAME: 'sixray_2381_charcnnrnn_1536_jemb'
+EMBEDDING_TYPE: 'embedding_bulk_word_1536_jemb.pickle'
+GPU_ID: '0,1'
+Z_DIM: 200
+DATA_DIR: 'data/sixray_2381'
+IMSIZE: 64
+WORKERS: 4
+STAGE: 1
+TRAIN:
+  FLAG: True
+  BATCH_SIZE: 64
+  BATCH_DROP_LAST: True
+  MAX_EPOCH: 1000
+  LR_DECAY_EPOCH: 20
+  SNAPSHOT_INTERVAL: 5
+  DISCRIMINATOR_LR: 0.0004
+  GENERATOR_LR: 0.0004
+  COEFF:
+    KL: 2.0
+  FINETUNE:
+    FLAG: False
+    EPOCH_START: 1001
+    NET_G: '/home/icmore_acc/Downloads/StackGAN-Pytorch/output/sixray_500_stage_1_2023_05_19_15_38_51/Model/netG_epoch_300.pth'
+    NET_D: '/home/icmore_acc/Downloads/StackGAN-Pytorch/output/sixray_500_stage_1_2023_05_19_15_38_51/Model/netD_epoch_last.pth'
+
+GAN:
+  CONDITION_DIM: 128
+  DF_DIM: 96
+  GF_DIM: 192
+
+TEXT:
+  DIMENSION: 1536
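
The config above pairs `LR_DECAY_EPOCH: 20` with `MAX_EPOCH: 1000`. The sketch below shows what that combination would imply, assuming the trainer halves both learning rates every `LR_DECAY_EPOCH` epochs, as the upstream StackGAN-Pytorch trainer does. Whether this fork keeps that rule is an assumption, since `trainer.py` is not shown in this diff, and `effective_lr` is a hypothetical helper.

```python
def effective_lr(base_lr: float, decay_every: int, epoch: int) -> float:
    """Learning rate in effect at `epoch`, assuming one halving per `decay_every` epochs.

    Assumption: the rate is multiplied by 0.5 each time the epoch counter
    reaches a positive multiple of `decay_every`.
    """
    if decay_every <= 0:
        raise ValueError("decay_every must be a positive number of epochs")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    halvings = epoch // decay_every
    return base_lr * 0.5 ** halvings


if __name__ == "__main__":
    # Values taken from sixray_500_s1.yml: GENERATOR_LR 0.0004, LR_DECAY_EPOCH 20.
    for epoch in (0, 20, 100, 300, 999):
        print(epoch, effective_lr(0.0004, 20, epoch))
```

Under that assumption the rate has been halved 49 times by the final epoch, so most of a 1000-epoch run would train with a vanishingly small step size. A larger `LR_DECAY_EPOCH` may be worth considering if that is not intended.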
diff --git a/code/cfg/sixray_500_s2.yml b/code/cfg/sixray_500_s2.yml
@@ -0,0 +1,36 @@
+CONFIG_NAME: 'stage_2'
+
+DATASET_NAME: 'sixray_2381_charcnnrnn_1536_jemb'
+EMBEDDING_TYPE: 'embedding_bulk_word_1536_jemb.pickle'
+GPU_ID: '0,1'
+Z_DIM: 200
+DATA_DIR: 'data/sixray_2381'
+IMSIZE: 256
+WORKERS: 4
+STAGE: 2
+STAGE1_G: ''
+TRAIN:
+  FLAG: True
+  BATCH_SIZE: 64
+  BATCH_DROP_LAST: True
+  MAX_EPOCH: 1000
+  LR_DECAY_EPOCH: 20
+  SNAPSHOT_INTERVAL: 5
+  DISCRIMINATOR_LR: 0.0008
+  GENERATOR_LR: 0.0008
+  COEFF:
+    KL: 1.0
+  FINETUNE:
+    FLAG: False
+    EPOCH_START: 1001
+    NET_G: 'output/experiment15/sixray_2381_ftt_1024D_cbow_nocrop_batch_double_stage_2_train_2023_06_12_18_16_28/Model/netG_epoch_1000.pth'
+    NET_D: 'output/experiment15/sixray_2381_ftt_1024D_cbow_nocrop_batch_double_stage_2_train_2023_06_12_18_16_28/Model/netD_epoch_last.pth'
+
+GAN:
+  CONDITION_DIM: 128
+  DF_DIM: 96
+  GF_DIM: 192
+  R_NUM: 2
+
+TEXT:
+  DIMENSION: 1536
diff --git a/code/cfg/sixray_s1.yml b/code/cfg/sixray_s1.yml
@@ -0,0 +1,34 @@
+CONFIG_NAME: 'stage1'
+
+DATASET_NAME: 'sixray_sample'
+EMBEDDING_TYPE: 'embeddings_cc.en.300.bin_300D.pickle'
+GPU_ID: '0,1'
+Z_DIM: 100
+DATA_DIR: '../data/sixray_sample'
+IMSIZE: 64
+WORKERS: 4
+STAGE: 1
+TRAIN:
+  FLAG: True
+  BATCH_SIZE: 6
+  BATCH_DROP_LAST: True
+  MAX_EPOCH: 300
+  LR_DECAY_EPOCH: 20
+  SNAPSHOT_INTERVAL: 10
+  DISCRIMINATOR_LR: 0.0002
+  GENERATOR_LR: 0.0002
+  COEFF:
+    KL: 2.0
+  FINETUNE:
+    FLAG: False
+    EPOCH_START: 0
+    NET_G: ''
+    NET_D: ''
+
+GAN:
+  CONDITION_DIM: 128
+  DF_DIM: 96
+  GF_DIM: 192
+
+TEXT:
+  DIMENSION: 300
diff --git a/code/cfg/sixray_s2.yml b/code/cfg/sixray_s2.yml
@@ -0,0 +1,36 @@
+CONFIG_NAME: 'stage2'
+
+DATASET_NAME: 'sixray_sample'
+EMBEDDING_TYPE: 'embeddings_cc.en.300.bin_300D.pickle'
+GPU_ID: '0,1'
+Z_DIM: 100
+STAGE1_G: 'output/sixray_sample_stage1_2023_05_12_19_17_04/Model/netG_epoch_300.pth'
+DATA_DIR: 'data/sixray_sample'
+WORKERS: 4
+IMSIZE: 256
+STAGE: 2
+TRAIN:
+  FLAG: True
+  BATCH_SIZE: 6
+  BATCH_DROP_LAST: True
+  MAX_EPOCH: 500
+  LR_DECAY_EPOCH: 20
+  SNAPSHOT_INTERVAL: 5
+  DISCRIMINATOR_LR: 0.0002
+  GENERATOR_LR: 0.0002
+  COEFF:
+    KL: 2.0
+  FINETUNE:
+    FLAG: False
+    EPOCH_START: 0
+    NET_G: ''
+    NET_D: ''
+
+GAN:
+  CONDITION_DIM: 128
+  DF_DIM: 96
+  GF_DIM: 192
+  R_NUM: 2
+
+TEXT:
+  DIMENSION: 300
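
The four configs added here share one layout, so a small pre-flight check can catch inconsistent settings before a long run. The sketch below works on an already-parsed config dictionary. The helper `check_stage_config` is hypothetical, and its rules are assumptions drawn from these files: Stage-II needs a Stage-I generator checkpoint unless it resumes from `FINETUNE` weights, and `IMSIZE` is 64 for Stage-I and 256 for Stage-II.

```python
from typing import Any, Dict, List


def check_stage_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for a parsed StackGAN-style config dict."""
    warnings: List[str] = []
    stage = cfg.get("STAGE")
    finetune = cfg.get("TRAIN", {}).get("FINETUNE", {})
    resuming = bool(finetune.get("FLAG"))

    # Assumed rule: Stage-II builds on a trained Stage-I generator.
    if stage == 2 and not cfg.get("STAGE1_G") and not resuming:
        warnings.append("STAGE is 2 but STAGE1_G is empty and FINETUNE.FLAG is off")

    # Resuming needs a generator checkpoint to load.
    if resuming and not finetune.get("NET_G"):
        warnings.append("FINETUNE.FLAG is True but FINETUNE.NET_G is empty")

    # Assumed convention from the configs in this diff: 64 px, then 256 px.
    expected_imsize = {1: 64, 2: 256}.get(stage)
    if expected_imsize is not None and cfg.get("IMSIZE") != expected_imsize:
        warnings.append(f"IMSIZE {cfg.get('IMSIZE')} is unusual for stage {stage}")

    return warnings


if __name__ == "__main__":
    # Relevant fields copied from sixray_500_s2.yml.
    sixray_500_s2 = {
        "STAGE": 2,
        "STAGE1_G": "",
        "IMSIZE": 256,
        "TRAIN": {"FINETUNE": {"FLAG": False, "NET_G": "output/..."}},
    }
    for message in check_stage_config(sixray_500_s2):
        print("warning:", message)
```

Run against the values in `sixray_500_s2.yml`, this flags the empty `STAGE1_G`; the `sixray_s2.yml` values, which set a Stage-I checkpoint path, produce no warnings.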