pix2pix: Image-to-image translation

1) Overview

  1. This repository illustrates the process of constructing and training a pix2pix conditional generative adversarial network (cGAN).
  2. The objective of this network, as outlined in the paper "Image-to-image translation with conditional adversarial networks", is to learn how to convert input images into corresponding output images.
  3. The versatility of pix2pix allows it to be employed for various tasks such as producing photos from label maps, adding color to grayscale images, converting Google Maps images into aerial views, and changing sketches into realistic photographs.

2) Architecture

The network architecture contains:

  1. A generator with a U-Net-based architecture.
  2. A discriminator represented by a convolutional PatchGAN classifier (proposed in the pix2pix paper).
  3. Note that each epoch can take around 15 seconds on a single V100 GPU.

Below are some examples of the output generated by the pix2pix cGAN after training for 200 epochs on the facades dataset (80k steps).


3) Libraries

  1. import tensorflow as tf
  2. import os
  3. import pathlib
  4. import time
  5. import datetime
  6. from matplotlib import pyplot as plt
  7. from IPython import display

4) Build the generator

The generator of your pix2pix cGAN is a modified U-Net. A U-Net consists of an encoder (downsampler) and a decoder (upsampler).

  1. Each block in the encoder is: Convolution -> Batch normalization -> Leaky ReLU
  2. Each block in the decoder is: Transposed convolution -> Batch normalization -> Dropout (applied to the first 3 blocks) -> ReLU
  3. There are skip connections between the encoder and decoder (as in the U-Net).
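The block structure above can be expressed with two small helper functions. The sketch below follows the TensorFlow pix2pix tutorial that this repository is based on; the helper names downsample and upsample, the 0.02 initializer, and the dropout rate of 0.5 are assumptions and may differ slightly from the code in this repository.

```python
import tensorflow as tf

def downsample(filters, size, apply_batchnorm=True):
    """Encoder block: Convolution -> Batch normalization -> Leaky ReLU."""
    initializer = tf.random_normal_initializer(0., 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                                     kernel_initializer=initializer, use_bias=False))
    if apply_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.LeakyReLU())
    return block

def upsample(filters, size, apply_dropout=False):
    """Decoder block: Transposed convolution -> Batch normalization -> (Dropout) -> ReLU."""
    initializer = tf.random_normal_initializer(0., 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2, padding='same',
                                              kernel_initializer=initializer, use_bias=False))
    block.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block
```

In the full generator, a stack of downsample blocks feeds a stack of upsample blocks, with skip connections concatenating each encoder output onto the matching decoder input.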

5) Define the generator loss

GANs learn a loss that adapts to the data, while cGANs learn a structured loss that penalizes structure in the network output that differs from the target image, as described in the pix2pix paper.

  1. The generator loss is a sigmoid cross-entropy loss of the generated images and an array of ones.
  2. The pix2pix paper also mentions the L1 loss, which is a MAE (mean absolute error) between the generated image and the target image.
  3. This allows the generated image to become structurally similar to the target image.
  4. The formula to calculate the total generator loss is gan_loss + LAMBDA * l1_loss, where LAMBDA = 100. This value was decided by the authors of the paper.
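A minimal sketch of this loss, assuming the standard from-logits sigmoid cross-entropy object (loss_object) used in the TensorFlow pix2pix tutorial; the function signature is an assumption and may differ from the repository's exact code.

```python
import tensorflow as tf

LAMBDA = 100  # weight of the L1 term, as chosen by the paper's authors
loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(disc_generated_output, gen_output, target):
    # Sigmoid cross-entropy between the discriminator's verdict on the
    # generated image and an array of ones ("fool the discriminator").
    gan_loss = loss_object(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 loss (mean absolute error) between the generated image and the target.
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    # Total generator loss: gan_loss + LAMBDA * l1_loss.
    total_gen_loss = gan_loss + (LAMBDA * l1_loss)
    return total_gen_loss, gan_loss, l1_loss
```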

6) Build the discriminator

The discriminator in the pix2pix cGAN is a convolutional PatchGAN classifier—it tries to classify if each image patch is real or not real, as described in the pix2pix paper.

  1. Each block in the discriminator is: Convolution -> Batch normalization -> Leaky ReLU.
  2. The shape of the output after the last layer is (batch_size, 30, 30, 1).
  3. Each 30 x 30 image patch of the output classifies a 70 x 70 portion of the input image.
  4. The discriminator receives 2 inputs:
     - The input image and the target image, which it should classify as real.
     - The input image and the generated image (the output of the generator), which it should classify as fake.
  5. Use tf.concat([inp, tar], axis=-1) to concatenate these 2 inputs together.
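A hedged sketch of the PatchGAN discriminator for 256 x 256 x 3 inputs, reusing the downsample helper from the generator sketch above; the layer sizes follow the TensorFlow pix2pix tutorial and are assumptions about this repository's exact code. The tf.keras.layers.concatenate layer used here is the functional-API equivalent of tf.concat along the channel axis.

```python
import tensorflow as tf

def Discriminator():
    initializer = tf.random_normal_initializer(0., 0.02)
    inp = tf.keras.layers.Input(shape=[256, 256, 3], name='input_image')
    tar = tf.keras.layers.Input(shape=[256, 256, 3], name='target_image')

    # Concatenate the input image with the target or generated image.
    x = tf.keras.layers.concatenate([inp, tar])          # (batch_size, 256, 256, 6)

    down1 = downsample(64, 4, apply_batchnorm=False)(x)  # (batch_size, 128, 128, 64)
    down2 = downsample(128, 4)(down1)                    # (batch_size, 64, 64, 128)
    down3 = downsample(256, 4)(down2)                    # (batch_size, 32, 32, 256)

    zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)   # (batch_size, 34, 34, 256)
    conv = tf.keras.layers.Conv2D(512, 4, strides=1,
                                  kernel_initializer=initializer,
                                  use_bias=False)(zero_pad1)        # (batch_size, 31, 31, 512)
    batchnorm1 = tf.keras.layers.BatchNormalization()(conv)
    leaky_relu = tf.keras.layers.LeakyReLU()(batchnorm1)

    zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)         # (batch_size, 33, 33, 512)
    # Each unit of this 30 x 30 output judges a 70 x 70 patch of the input.
    last = tf.keras.layers.Conv2D(1, 4, strides=1,
                                  kernel_initializer=initializer)(zero_pad2)  # (batch_size, 30, 30, 1)

    return tf.keras.Model(inputs=[inp, tar], outputs=last)
```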

7) Define the discriminator loss

  1. The discriminator_loss function takes 2 inputs: real images and generated images.
  2. real_loss is a sigmoid cross-entropy loss of the real images and an array of ones (since these are the real images).
  3. generated_loss is a sigmoid cross-entropy loss of the generated images and an array of zeros (since these are the fake images).
  4. The total_loss is the sum of real_loss and generated_loss.
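A minimal sketch of this loss, reusing the from-logits loss_object defined alongside the generator-loss sketch above; the names are assumptions based on the TensorFlow pix2pix tutorial.

```python
import tensorflow as tf

def discriminator_loss(disc_real_output, disc_generated_output):
    # Real images should be classified as ones.
    real_loss = loss_object(tf.ones_like(disc_real_output), disc_real_output)
    # Generated images should be classified as zeros.
    generated_loss = loss_object(tf.zeros_like(disc_generated_output), disc_generated_output)
    # Total discriminator loss is the sum of the two.
    total_disc_loss = real_loss + generated_loss
    return total_disc_loss
```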

8) Training

  1. For each example input, the generator produces an output.
  2. The discriminator receives the input_image and the generated image as the first input. The second input is the input_image and the target_image.
  3. Next, calculate the generator and the discriminator loss.
  4. Then, calculate the gradients of the losses with respect to both the generator and the discriminator variables (inputs) and apply them to the optimizers.
  5. Finally, log the losses to TensorBoard.
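The five steps above can be collected into a single train_step. The sketch below assumes that generator, discriminator, two Adam optimizers, and a tf.summary file writer (summary_writer) have already been created; those names follow the TensorFlow pix2pix tutorial and are assumptions here.

```python
import tensorflow as tf

@tf.function
def train_step(input_image, target, step):
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # 1. The generator produces an output for the example input.
        gen_output = generator(input_image, training=True)

        # 2. The discriminator sees (input, target) and (input, generated).
        disc_real_output = discriminator([input_image, target], training=True)
        disc_generated_output = discriminator([input_image, gen_output], training=True)

        # 3. Compute the generator and discriminator losses.
        gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(
            disc_generated_output, gen_output, target)
        disc_loss = discriminator_loss(disc_real_output, disc_generated_output)

    # 4. Compute gradients and apply them with the respective optimizers.
    generator_gradients = gen_tape.gradient(gen_total_loss, generator.trainable_variables)
    discriminator_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(generator_gradients, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(discriminator_gradients, discriminator.trainable_variables))

    # 5. Log the losses to TensorBoard.
    with summary_writer.as_default():
        tf.summary.scalar('gen_total_loss', gen_total_loss, step=step // 1000)
        tf.summary.scalar('gen_gan_loss', gen_gan_loss, step=step // 1000)
        tf.summary.scalar('gen_l1_loss', gen_l1_loss, step=step // 1000)
        tf.summary.scalar('disc_loss', disc_loss, step=step // 1000)
```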

The actual training loop: since this code can run on more than one dataset, and the datasets vary greatly in size, the training loop is set up to work in steps instead of epochs.

  1. Iterate over the number of steps.
  2. Every 10 steps, print a dot (.).
  3. Every 1k steps, clear the display and run generate_images to show the progress.
  4. Every 5k steps, save a checkpoint.
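A hedged sketch of this step-based loop; it assumes that train_ds, test_ds, a generate_images plotting helper, and a checkpoint/checkpoint_prefix pair already exist (names follow the TensorFlow tutorial and are assumptions about this repository's code).

```python
import time
from IPython import display

def fit(train_ds, test_ds, steps):
    example_input, example_target = next(iter(test_ds.take(1)))
    start = time.time()

    for step, (input_image, target) in train_ds.repeat().take(steps).enumerate():
        if step % 1000 == 0:
            # Every 1k steps: clear the display and show current progress.
            display.clear_output(wait=True)
            if step != 0:
                print(f'Time taken for 1000 steps: {time.time() - start:.2f} sec\n')
            start = time.time()
            generate_images(generator, example_input, example_target)
            print(f"Step: {step // 1000}k")

        train_step(input_image, target, step)

        # Every 10 steps: print a dot to show progress.
        if (step + 1) % 10 == 0:
            print('.', end='', flush=True)

        # Every 5k steps: save a checkpoint.
        if (step + 1) % 5000 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)
```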

Interpreting the logs is more subtle when training a GAN (or a cGAN like pix2pix) compared to a simple classification or regression model. Things to look for:

  1. Check that neither the generator nor the discriminator model has "won". If either the gen_gan_loss or the disc_loss gets very low, it's an indicator that this model is dominating the other, and you are not successfully training the combined model.
  2. The value log(2) ≈ 0.69 is a good reference point for these losses, as it indicates a perplexity of 2: the discriminator is, on average, equally uncertain about the two options.
  3. For the disc_loss, a value below 0.69 means the discriminator is doing better than random on the combined set of real and generated images.
  4. For the gen_gan_loss, a value below 0.69 means the generator is doing better than random at fooling the discriminator.
  5. As training progresses, the gen_l1_loss should go down.
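As a quick sanity check of the log(2) reference point, the short snippet below shows that a discriminator emitting logit 0 (probability 0.5) for every patch incurs a sigmoid cross-entropy of ln(2) ≈ 0.693 against either label.

```python
import math
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
logits = tf.zeros([1, 30, 30, 1])                 # sigmoid(0) = 0.5 for every patch
print(float(bce(tf.ones_like(logits), logits)))   # ≈ 0.6931
print(math.log(2))                                # 0.6931...
```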
