Training MoCoGAN

Installation

We are releasing a docker image with everything preinstalled. You can build the docker image yourself or pull it from Docker Hub. To start, install docker and nvidia-docker for your system.

To pull the docker image from Docker Hub:

docker pull stulyakov/mocogan

or clone the repository and build the image yourself:

git clone https://github.com/sergeytulyakov/mocogan.git
cd mocogan/docker
docker build . -t stulyakov/mocogan

Now start a docker container with the following command:

nvidia-docker run -ti --shm-size 12G stulyakov/mocogan /bin/bash

It is important to specify --shm-size 12G, since the pytorch dataloader uses multiprocessing, which requires shared memory. Otherwise you will get errors. If you get a permission denied error, make sure you have added your username to the docker group.
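
On most Linux systems you can add yourself to the docker group with the following command (illustrative; log out and back in afterwards for the change to take effect):

sudo usermod -aG docker $USER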

Alternatively, you can install all the dependencies locally and train/test mocogan without using a docker image. Install python and python-pip on your system. For Ubuntu:

sudo apt-get install python python-dev python-pip

and the necessary dependencies:

pip install -U docopt pyyaml numpy matplotlib tqdm Pillow tensorflow scipy
pip install http://download.pytorch.org/whl/cu80/torch-0.2.0.post3-cp27-cp27mu-manylinux1_x86_64.whl && \
pip install torchvision

Note that tensorflow is required to use the tensorboard visualizations.

Preparing the data

The MoCoGAN training script uses a jpg video representation in which all frames of a video are concatenated either horizontally or vertically. The training script automatically determines whether a video is horizontal or vertical (an illustrative snippet for reading this format is given after the directory layout below). Examples are shown below:

Horizontal video example

If your videos have categories (such as facial expressions or human actions), organize your data into folders so that all videos sharing a category are located in the same folder (see data/shapes or data/actions for examples):

data/
  actions/
    0/ <- category 0
      00000002.jpg
      .........
    1/ <- category 1
      00000001.jpg
  ..........
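
To make the concatenated-frame format concrete, here is a minimal, illustrative Python sketch (not the repository's actual loader) that splits such a jpg back into individual square frames:

# Illustrative sketch only; the actual dataloader in the repository may differ.
from PIL import Image
import numpy as np

def split_video_jpg(path):
    """Split a jpg of horizontally or vertically concatenated square frames."""
    image = Image.open(path)
    width, height = image.size
    frames = []
    if width >= height:
        # Horizontal video: frames are laid out left to right.
        for i in range(width // height):
            box = (i * height, 0, (i + 1) * height, height)
            frames.append(np.asarray(image.crop(box)))
    else:
        # Vertical video: frames are stacked top to bottom.
        for i in range(height // width):
            box = (0, i * width, width, (i + 1) * width)
            frames.append(np.asarray(image.crop(box)))
    return np.stack(frames)  # shape: (n_frames, frame_size, frame_size, n_channels)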

Training

Whether you are using the docker image (preferred) or training without it, go to the mocogan/src folder:

cd mocogan/src

python train.py --help

Copyright (C) 2017 NVIDIA Corporation.  All rights reserved.
Licensed under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode).

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, MoCoGAN: Decomposing Motion and Content for Video Generation
https://arxiv.org/abs/1707.04993

Usage:
    train.py [options] <dataset> <log_folder>

Options:
    --image_dataset=<path>          specifies a separate dataset to train for images [default: ]
    --image_batch=<count>           number of images in image batch [default: 10]
    --video_batch=<count>           number of videos in video batch [default: 3]

    --image_size=<int>              resize all frames to this size [default: 64]

    --use_infogan                   when specified infogan loss is used

    --use_categories                when specified ground truth categories are used to
                                    train CategoricalVideoDiscriminator

    --use_noise                     when specified instance noise is used
    --noise_sigma=<float>           when use_noise is specified, noise_sigma controls
                                    the magnitude of the noise [default: 0]

    --image_discriminator=<type>    specifies image discriminator type (see models.py for a
                                    list of available models) [default: PatchImageDiscriminator]

    --video_discriminator=<type>    specifies video discriminator type (see models.py for a
                                    list of available models) [default: CategoricalVideoDiscriminator]

    --video_length=<len>            length of the video [default: 16]
    --print_every=<count>           print every iterations [default: 1]
    --n_channels=<count>            number of channels in the input data [default: 3]
    --every_nth=<count>             sample training videos using every nth frame [default: 4]
    --batches=<count>               specify number of batches to train [default: 100000]

    --dim_z_content=<count>         dimensionality of the content input, ie hidden space [default: 50]
    --dim_z_motion=<count>          dimensionality of the motion input [default: 10]
    --dim_z_category=<count>        dimensionality of categorical input [default: 6]

We provide two datasets as examples: a synthetic shapes dataset and a human actions dataset (the latter has already been preprocessed). These datasets are located in the data/shapes and data/actions folders, respectively.

Unsupervised training using motion categories

To do this, specify the number of categories with --dim_z_category and pass --use_infogan.

To train on human actions run the following:

python train.py  \
    --image_batch 32 \
    --video_batch 32 \
    --use_infogan \
    --use_noise \
    --noise_sigma 0.1 \
    --image_discriminator PatchImageDiscriminator \
    --video_discriminator CategoricalVideoDiscriminator \
    --print_every 100 \
    --every_nth 2 \
    --dim_z_content 50 \
    --dim_z_motion 10 \
    --dim_z_category 4 \
    ../data/actions ../logs/actions

To train on synthetic shape motions execute the following:

python train.py  \
    --image_batch 32 \
    --video_batch 32 \
    --use_infogan \
    --use_noise \
    --noise_sigma 0.1 \
    --image_discriminator PatchImageDiscriminator \
    --video_discriminator CategoricalVideoDiscriminator \
    --print_every 100 \
    --every_nth 2 \
    --dim_z_content 50 \
    --dim_z_motion 10 \
    --dim_z_category 2 \
    ../data/shapes ../logs/shapes

Training without categories

To train without categories, run the following:

python train.py  \
    --image_batch 32 \
    --video_batch 32 \
    --use_noise \
    --noise_sigma 0.1 \
    --image_discriminator PatchImageDiscriminator \
    --video_discriminator PatchVideoDiscriminator \
    --print_every 100 \
    --every_nth 2 \
    --dim_z_content 50 \
    --dim_z_motion 10 \
    ../data/actions ../logs/actions-wo-categories

Generating videos using a trained model

To generate videos using a previously trained model, use the generate_videos.py script:

python generate_videos.py --help

Copyright (C) 2017 NVIDIA Corporation.  All rights reserved.
Licensed under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode).

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, MoCoGAN: Decomposing Motion and Content for Video Generation
https://arxiv.org/abs/1707.04993

Generates multiple videos given a model and saves them as video files using ffmpeg

Usage:
    generate_videos.py [options] <model> <output_folder>

Options:
    -n, --num_videos=<count>                number of videos to generate [default: 10]
    -o, --output_format=<ext>               save videos as [default: gif]
    -f, --number_of_frames=<count>          generate videos with that many frames [default: 16]

    --ffmpeg=<str>                          ffmpeg executable (on windows should be ffmpeg.exe). Make sure
                                            the executable is in your PATH [default: ffmpeg]

Note that you need to have ffmpeg installed on your system.
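
For example, the following invocation would generate five gif files (the checkpoint path is a placeholder; point it at the generator model file written by your training run):

python generate_videos.py \
    --num_videos 5 \
    --output_format gif \
    --number_of_frames 16 \
    ../logs/actions/<generator checkpoint> \
    ../output/actions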

Visualizing training using tensorboard

The training script samples the model every --print_every batches. You can visualize images, videos, and loss values using tensorboard:

cd ../logs/
tensorboard --logdir=./ --port=8890