VideoComposer based on MindSpore

This is an unofficial implementation of VideoComposer based on MindSpore.

Progress

What's done

TODOs

Training
Speed Up
Graph Mode
AMP

Limits

Only runs in PyNative mode.

Setup Environment

Create virtual environment

conda create -n ms2.0 python=3.9
conda activate ms2.0

Install requirements
```
pip install -r requirements.txt
```

Prepare Model Weights

Download

The weights must be downloaded into ${PROJECT_ROOT}/model_weights, where ${PROJECT_ROOT} is the root directory of the project. Download them from the official website and arrange them as follows:

|--model_weights
|    |--non_ema_228000.pth
|    |--midas_v3_dpt_large.pth
|    |--open_clip_pytorch_model.bin (todo)
|    |--sketch_simplification_gan.pth
|    |--table5_pidinet.pth
|    |--v2-1_512-ema-pruned.ckpt
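
Before converting, it can help to confirm that every file in the tree above is in place. The following is a minimal sketch using only the Python standard library. The helper name `missing_weights` is hypothetical and not part of this repository; the file names are copied from the tree above.

```python
from pathlib import Path

# File names copied from the directory tree above.
# open_clip_pytorch_model.bin is marked "(todo)" in the tree; it is kept here
# only because it is listed, so drop it if it is not needed yet.
EXPECTED_WEIGHTS = [
    "non_ema_228000.pth",
    "midas_v3_dpt_large.pth",
    "open_clip_pytorch_model.bin",
    "sketch_simplification_gan.pth",
    "table5_pidinet.pth",
    "v2-1_512-ema-pruned.ckpt",
]


def missing_weights(root="model_weights"):
    """Return the expected weight files that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All expected weight files are present.")
```

Run it from the project root so that the relative path `model_weights` resolves correctly.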

Convert

In a separate virtual environment with torch installed, run the following script:

python vc/utils/pt2ms.py

The converted weight files are written to the same directory under the same names, ending with npy.

Demos

bash run_net.sh

The following content is the readme of the original repository.

VideoComposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

Please see Project Page for more examples.

We are searching for talented, motivated, and imaginative researchers to join our team. If you are interested, please don't hesitate to send us your resume via email yingya.zyy@alibaba-inc.com

VideoComposer is a controllable video diffusion model that allows users to flexibly control the spatial and temporal patterns of a synthesized video simultaneously, using conditions in various forms, such as a text description, a sketch sequence, a reference video, or even simple handcrafted motions and hand drawings.

TODO

Release our technical papers and webpage.
Release code and pretrained model.
Release Gradio UI on ModelScope and Hugging Face.
Release pretrained model that can generate 8s videos without watermark.

Method

Running by Yourself

1. Installation

Requirements:

Python==3.8
ffmpeg (for motion vector extraction)
torch==1.12.0+cu113
torchvision==0.13.0+cu113
open-clip-torch==2.0.2
transformers==4.18.0
flash-attn==0.2
xformers==0.0.13
motion-vector-extractor==1.0.6 (for motion vector extraction)

You can also create the same environment as ours with the following command:

conda env create -f environment.yaml
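
To see whether an existing environment already matches the pinned versions listed above, a small check along these lines may be useful. This is a sketch under assumptions: it assumes each distribution is installed under the same name as in the list, and it compares version strings exactly, so a package that reports `0.2.0` would be flagged against a pin of `0.2`. The name `check_pins` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above (distribution name -> version).
PINNED = {
    "torch": "1.12.0+cu113",
    "torchvision": "0.13.0+cu113",
    "open-clip-torch": "2.0.2",
    "transformers": "4.18.0",
    "flash-attn": "0.2",
    "xformers": "0.0.13",
    "motion-vector-extractor": "1.0.6",
}


def check_pins(pins):
    """Map each mismatched package to (expected, installed), with None if absent."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The check does not cover ffmpeg, which is a system tool rather than a Python package.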

2. Download model weights

Download all the model weights via the following command:

!pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('damo/VideoComposer', cache_dir='model_weights/', revision='v1.0.0')

Next, place these models in the model_weights folder following the file structure shown below.

|--model_weights/
|    |--non_ema_228000.pth
|    |--midas_v3_dpt_large.pth
|    |--open_clip_pytorch_model.bin
|    |--sketch_simplification_gan.pth
|    |--table5_pidinet.pth
|    |--v2-1_512-ema-pruned.ckpt

You can also download some of them from their original projects:

"midas_v3_dpt_large.pth" in MiDaS
"open_clip_pytorch_model.bin" in Open Clip
"sketch_simplification_gan.pth" and "table5_pidinet.pth" in Pidinet
"v2-1_512-ema-pruned.ckpt" in Stable Diffusion.

For convenience, we provide a download link in this repo.

3. Running

In this project, we provide two implementations that can help you better understand our method.

3.1 Inference with Customized Inputs

You can run the code with the following command:

python run_net.py\
    --cfg configs/exp02_motion_transfer.yaml\
    --seed 9999\
    --input_video "demo_video/motion_transfer.mp4"\
    --image_path "demo_video/moon_on_water.jpg"\
    --input_text_desc "A beautiful big moon on the water at night"

The results are saved in the outputs/exp02_motion_transfer-S09999 folder.
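
The folder name in this example appears to combine the config file name (without its extension) with the seed, zero-padded to five digits. The sketch below encodes that pattern so results can be located from a script. The pattern is an assumption inferred from this single example, not taken from the code, and `expected_output_dir` is a hypothetical helper.

```python
from pathlib import Path


def expected_output_dir(cfg_path, seed, root="outputs"):
    """Guess the results folder as <root>/<config stem>-S<seed padded to 5 digits>.

    Assumption: the naming pattern is inferred from one example in this README.
    """
    return Path(root) / f"{Path(cfg_path).stem}-S{seed:05d}"


if __name__ == "__main__":
    folder = expected_output_dir("configs/exp02_motion_transfer.yaml", 9999)
    print(folder.as_posix(), "exists" if folder.is_dir() else "not found")
```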

If you notice a significant color shift in the results, you can use the style condition to adjust the color distribution with the following command:

python run_net.py\
    --cfg configs/exp02_motion_transfer_vs_style.yaml\
    --seed 9999\
    --input_video "demo_video/motion_transfer.mp4"\
    --image_path "demo_video/moon_on_water.jpg"\
    --style_image "demo_video/moon_on_water.jpg"\
    --input_text_desc "A beautiful big moon on the water at night"

python run_net.py\
    --cfg configs/exp03_sketch2video_style.yaml\
    --seed 8888\
    --sketch_path "demo_video/src_single_sketch.png"\
    --style_image "demo_video/style/qibaishi_01.png"\
    --input_text_desc "Red-backed Shrike lanius collurio"

python run_net.py\
    --cfg configs/exp04_sketch2video_wo_style.yaml\
    --seed 144\
    --sketch_path "demo_video/src_single_sketch.png"\
    --input_text_desc "A Red-backed Shrike lanius collurio is on the branch"

python run_net.py\
    --cfg configs/exp05_text_depths_wo_style.yaml\
    --seed 9999\
    --input_video demo_video/video_8800.mp4\
    --input_text_desc "A glittering and translucent fish swimming in a small glass bowl with multicolored piece of stone, like a glass fish"

python run_net.py\
    --cfg configs/exp06_text_depths_vs_style.yaml\
    --seed 9999\
    --input_video demo_video/video_8800.mp4\
    --style_image "demo_video/style/qibaishi_01.png"\
    --input_text_desc "A glittering and translucent fish swimming in a small glass bowl with multicolored piece of stone, like a glass fish"

3.2 Inference on a Video

You can simply run the code with the following command:

python run_net.py \
    --cfg configs/exp01_vidcomposer_full.yaml \
    --input_video "demo_video/blackswan.mp4" \
    --input_text_desc "A black swan swam in the water" \
    --seed 9999

This command extracts the different conditions of the input video, e.g., depth, sketch, and motion vectors, for the subsequent video generation and saves them in the outputs folder. The task list is predefined in inference_multi.py.

In addition to the above use cases, you can explore further possibilities with this code and model. Because samples generated by the diffusion model vary from run to run, try different seeds to obtain better results.
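
Trying several seeds by hand gets tedious, so one option is to generate the command lines from a script. The sketch below only uses flags that appear in the commands above (`--cfg`, `--seed`, `--input_video`, `--input_text_desc`); the helper `build_command` and the chosen seed values are illustrative assumptions. It prints the commands rather than running them.

```python
import shlex
import sys


def build_command(cfg, seed, **inputs):
    """Build the argv list for one run_net.py call; flags mirror the commands above."""
    argv = [sys.executable, "run_net.py", "--cfg", cfg, "--seed", str(seed)]
    for flag, value in inputs.items():
        argv += [f"--{flag}", str(value)]
    return argv


if __name__ == "__main__":
    for seed in (9999, 8888, 144):  # arbitrary seeds reused from the examples above
        argv = build_command(
            "configs/exp01_vidcomposer_full.yaml",
            seed,
            input_video="demo_video/blackswan.mp4",
            input_text_desc="A black swan swam in the water",
        )
        # Printed for review; to launch, pass argv to subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Building the arguments as a list avoids shell quoting problems with text descriptions that contain spaces or punctuation.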

We hope you enjoy using it! 😀

BibTeX

If this repo is useful to you, please cite our technical paper.

@article{2023videocomposer,
  title={VideoComposer: Compositional Video Synthesis with Motion Controllability},
  author={Wang, Xiang* and Yuan, Hangjie* and Zhang, Shiwei* and Chen, Dayou* and Wang, Jiuniu and Zhang, Yingya and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
  journal={arXiv preprint arXiv:2306.02018},
  year={2023}
}

Acknowledgement

We would like to express our gratitude for the contributions of several previous works to the development of VideoComposer. These include, but are not limited to, Composer, ModelScopeT2V, Stable Diffusion, OpenCLIP, WebVid-10M, LAION-400M, Pidinet and MiDaS. We are committed to building upon these foundations in a way that respects their original contributions.

Disclaimer

This open-source model is trained on the WebVid-10M and LAION-400M datasets and is intended for RESEARCH/NON-COMMERCIAL USE ONLY. We have also trained more powerful models using internal video data, which can be used in the future.
