- Tested on Python 3.9, CUDA 11.7
- Requires torch==1.13.1
- Install dependencies from `requirements.txt`: `pip install -r requirements.txt`
To run inference:

- For Instruct-Pix2Pix: `python pix2pix.py pix2pix_config.json`
- For the edge-conditioned ControlNet-based approach: `python controlnet.py controlnet_config.json`
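Both scripts take their JSON config file as the only argument. As a minimal sketch (not the repository's actual loading code), assuming the parameters documented below are top-level keys of that JSON file, the config could be read like this:

```python
import json
import sys

# Hypothetical sketch: read pix2pix_config.json / controlnet_config.json and
# pull out the parameters documented below, falling back to their defaults.
with open(sys.argv[1]) as f:
    cfg = json.load(f)

start_t = cfg.get("start_t", 0)                  # start time in seconds (0 = beginning)
end_t = cfg.get("end_t", -1)                     # end time in seconds (-1 = until the end)
out_fps = cfg.get("out_fps", -1)                 # output fps (-1 = keep original fps)
chunk_size = cfg.get("chunk_size", 8)            # frames processed at once
low_threshold = cfg.get("low_threshold", 100)    # Canny low threshold
high_threshold = cfg.get("high_threshold", 180)  # Canny high threshold (> low_threshold)
```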
Configuration parameters:

- `start_t` (default: 0): Start time in seconds for video processing. A value of 0 starts processing from the beginning of the video.
- `end_t` (default: -1): End time in seconds for video processing. A value of -1 processes until the end of the video.
- `out_fps` (default: -1): Frames per second for the processed video. A value of -1 keeps the original video's fps.
- `chunk_size` (default: 8): Number of frames processed at once. A smaller chunk size reduces memory usage, while a larger one may speed up processing at the cost of more memory.
- `low_threshold` (default: 100): Canny edge detection parameter.
- `high_threshold` (default: 180): Canny edge detection parameter. Make sure that `high_threshold` > `low_threshold`.
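For intuition, the sketch below shows one plausible way these parameters could drive the frame pipeline: `start_t`/`end_t` select a time window, `out_fps` subsamples frames, `chunk_size` batches them, and the two thresholds feed OpenCV's Canny detector to build the edge condition for the ControlNet variant. This is an illustrative assumption, not the repository's implementation:

```python
import cv2

def iter_chunks(video_path, start_t=0, end_t=-1, out_fps=-1, chunk_size=8):
    """Yield lists of at most chunk_size frames from the selected time window."""
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS)
    fps = src_fps if out_fps <= 0 else out_fps
    step = max(1, round(src_fps / fps))  # keep every `step`-th frame to approximate out_fps
    chunk, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = idx / src_fps                # timestamp of this frame in seconds
        keep = t >= start_t and (end_t < 0 or t <= end_t) and idx % step == 0
        idx += 1
        if not keep:
            continue
        chunk.append(frame)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    cap.release()
    if chunk:
        yield chunk

def canny_condition(frame, low_threshold=100, high_threshold=180):
    """Edge map used as the ControlNet condition; high_threshold must exceed low_threshold."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, low_threshold, high_threshold)
```

Each chunk of frames would then be edited together by the diffusion model, which is why memory usage scales with `chunk_size`.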
Demo videos:

- Input: `demo_input_cropped.mp4`
- Instruct-Pix2Pix output (`demo_output_pix2pix.mp4`), prompt: "Turn the woman's clothes to superman costume"
- ControlNet output (`demo_output_controlnet.mp4`), prompt: "Beautiful girl in superman costume in front of white background, a high-quality, detailed, and professional photo"
- `full_facecontrol_canny_80_150_new_res.mp4`
Adapted from Text2Video-Zero.