Training MoCoGAN
We are releasing a docker image with everything preinstalled. You can build the docker image yourself or pull it from Docker Hub. To start, install docker and nvidia-docker for your system.
To pull a docker image from the docker-hub:
docker pull stulyakov/mocogan
or clone the repository and build the image yourself:
git clone https://github.com/sergeytulyakov/mocogan.git
cd mocogan/docker
docker build . -t stulyakov/mocogan
Now start a docker container with the following command:
nvidia-docker run -ti --shm-size 12G stulyakov/mocogan /bin/bash
It is important to specify --shm-size 12G, since the PyTorch dataloader uses multiprocessing, which requires shared memory; otherwise you will get errors. If you get a permission denied error, make sure you have added your username to the docker group.
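For example, on most Linux distributions you can add your user to the docker group with the command below (you will need to log out and back in for the change to take effect):
sudo usermod -aG docker $USER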
Alternatively, you can install all the dependencies locally and train/test MoCoGAN without using a docker image. Install python and python-pip on your system. For Ubuntu:
sudo apt-get install python python-dev python-pip
and the necessary dependencies:
pip install -U docopt pyyaml numpy matplotlib tqdm Pillow tensorflow scipy
pip install http://download.pytorch.org/whl/cu80/torch-0.2.0.post3-cp27-cp27mu-manylinux1_x86_64.whl && \
pip install torchvision
tensorflow is required to use tensorboard visualizations.
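Before training, you may want to verify that PyTorch can see your GPU, for example:
python -c "import torch; print(torch.cuda.is_available())"
If this prints False, CUDA is not available to PyTorch.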
The MoCoGAN training script uses a jpg video representation, in which all frames of a video are concatenated either horizontally or vertically into a single image. The training script automatically determines whether a video is horizontal or vertical. Example videos in this format can be found in the data/shapes and data/actions folders.
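As an illustration of this format (not part of the training code), here is a minimal sketch that splits such a concatenated jpg back into individual frames, assuming square frames so that the orientation can be inferred from the aspect ratio:
from PIL import Image

def split_video_jpg(path):
    image = Image.open(path)
    width, height = image.size
    if width > height:
        # frames concatenated horizontally: each frame is height x height
        n_frames = width // height
        return [image.crop((i * height, 0, (i + 1) * height, height)) for i in range(n_frames)]
    else:
        # frames concatenated vertically: each frame is width x width
        n_frames = height // width
        return [image.crop((0, i * width, width, (i + 1) * width)) for i in range(n_frames)]

frames = split_video_jpg('path/to/video.jpg')  # replace with one of your video jpg files
print(len(frames), 'frames of size', frames[0].size)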
If your videos have categories (such as facial expressions or human actions), make sure you put your data into folders where all videos that share a category are located under the same folder (see data/shapes or data/actions for an example):
data/
actions/
0/ <- category 0
00000002.jpg
.........
1/ <- category 1
00000001.jpg
..........
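The number of category folders typically corresponds to the value passed as --dim_z_category in the training examples below. A quick way to count them, assuming the dataset folder contains only category subfolders (paths are relative to mocogan/src):
import os
categories = sorted(os.listdir('../data/actions'))
print(len(categories), categories)  # this count is what --dim_z_category usually corresponds to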
Whether you are using a docker image (preferred) or training without it, go to the mocogan/src folder:
cd mocogan/src
python train.py --help
Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode).
Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, MoCoGAN: Decomposing Motion and Content for Video Generation
https://arxiv.org/abs/1707.04993
Usage:
train.py [options] <dataset> <log_folder>
Options:
--image_dataset=<path> specifies a separate dataset to train for images [default: ]
--image_batch=<count> number of images in image batch [default: 10]
--video_batch=<count> number of videos in video batch [default: 3]
--image_size=<int> resize all frames to this size [default: 64]
--use_infogan when specified infogan loss is used
--use_categories when specified ground truth categories are used to
train CategoricalVideoDiscriminator
--use_noise when specified instance noise is used
--noise_sigma=<float> when use_noise is specified, noise_sigma controls
the magnitude of the noise [default: 0]
--image_discriminator=<type> specifies image discriminator type (see models.py for a
list of available models) [default: PatchImageDiscriminator]
--video_discriminator=<type> specifies video discriminator type (see models.py for a
list of available models) [default: CategoricalVideoDiscriminator]
--video_length=<len> length of the video [default: 16]
--print_every=<count> print every <count> iterations [default: 1]
--n_channels=<count> number of channels in the input data [default: 3]
--every_nth=<count> sample training videos using every nth frame [default: 4]
--batches=<count> specify number of batches to train [default: 100000]
--dim_z_content=<count> dimensionality of the content input, ie hidden space [default: 50]
--dim_z_motion=<count> dimensionality of the motion input [default: 10]
--dim_z_category=<count> dimensionality of categorical input [default: 6]
We provide two datasets as examples: the synthetic shape dataset and the human action dataset; the latter has been preprocessed. These datasets are located in the data/shapes and data/actions folders, respectively.
To train with categories, you need to specify the number of categories via --dim_z_category and pass --use_infogan.
To train on human actions run the following:
python train.py \
--image_batch 32 \
--video_batch 32 \
--use_infogan \
--use_noise \
--noise_sigma 0.1 \
--image_discriminator PatchImageDiscriminator \
--video_discriminator CategoricalVideoDiscriminator \
--print_every 100 \
--every_nth 2 \
--dim_z_content 50 \
--dim_z_motion 10 \
--dim_z_category 4 \
../data/actions ../logs/actions
To train on synthetic shape motions execute the following:
python train.py \
--image_batch 32 \
--video_batch 32 \
--use_infogan \
--use_noise \
--noise_sigma 0.1 \
--image_discriminator PatchImageDiscriminator \
--video_discriminator CategoricalVideoDiscriminator \
--print_every 100 \
--every_nth 2 \
--dim_z_content 50 \
--dim_z_motion 10 \
--dim_z_category 2 \
../data/shapes ../logs/shapes
You can also train without categories by omitting --use_infogan and --dim_z_category and using the PatchVideoDiscriminator. To do this, run the following:
python train.py \
--image_batch 32 \
--video_batch 32 \
--use_noise \
--noise_sigma 0.1 \
--image_discriminator PatchImageDiscriminator \
--video_discriminator PatchVideoDiscriminator \
--print_every 100 \
--every_nth 2 \
--dim_z_content 50 \
--dim_z_motion 10 \
../data/actions ../logs/actions-wo-categories
To generate videos using a previously trained model, use the generate_videos.py script:
python generate_videos.py --help
Copyright (C) 2017 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode).
Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, MoCoGAN: Decomposing Motion and Content for Video Generation
https://arxiv.org/abs/1707.04993
Generates multiple videos given a model and saves them as video files using ffmpeg
Usage:
generate_videos.py [options] <model> <output_folder>
Options:
-n, --num_videos=<count> number of videos to generate [default: 10]
-o, --output_format=<ext> save videos as [default: gif]
-f, --number_of_frames=<count> generate videos with that many frames [default: 16]
--ffmpeg=<str> ffmpeg executable (on windows should be ffmpeg.exe). Make sure
the executable is in your PATH [default: ffmpeg]
Note that you need to have ffmpeg installed on your system.
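For example, a call might look like the following, where the generator checkpoint path and the output folder are placeholders to replace with the checkpoint produced by your own training run and a folder of your choice:
python generate_videos.py \
    --num_videos 10 \
    --output_format gif \
    --number_of_frames 16 \
    <path-to-generator-checkpoint> <output_folder>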
The training script samples the model every --print_every
batches. You can visualize images and videos as well as loss values using tensorboard:
cd ../logs/
tensorboard --logdir=./ --port=8890