STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution


¹Nanjing University, ²ByteDance, ³Southwest University

🔆 Updates

  • 2025.01.09 The online demo of STAR is now live! Please note that due to the duration limitation of ZeroGPU, the running time may exceed the allocated GPU duration. If you'd like to try it, you can duplicate the demo and assign a paid GPU.

  • 2025.01.07 The pretrained STAR model (I2VGen-XL and CogVideoX-5B versions) and inference code have been released.

📑 TODO

  • [x] Inference code
  • [x] Online demo
  • [ ] Training code

🔎 Method Overview

[Figure: overview of the STAR method]

📷 Results Display

[Figures: STAR visual results]

👀 More visual results can be found on our Project Page and in the Video Demo.

⚙️ Dependencies and Installation

## clone this repository
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR

## create an environment and install the dependencies
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
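
After installation, a quick sanity check can confirm that the environment matches the steps above. The following is a minimal sketch, not part of the STAR repository: the function name `check_environment` is hypothetical, and it only verifies the Python version (3.10, as used in the conda command) and that `ffmpeg` is reachable on `PATH`.

```python
import shutil
import sys


def check_environment(tools=("ffmpeg",), min_python=(3, 10)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it does not inspect the packages from requirements.txt.
    """
    problems = []
    # Compare only (major, minor) against the version used to create the conda env.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems
```

Calling `check_environment()` inside the activated `star` environment should return an empty list if the installation steps succeeded.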

🚀 Inference

Model Weight

| Base Model | Type | URL |
| --- | --- | --- |
| I2VGen-XL | Light Degradation | 🔗 |
| I2VGen-XL | Heavy Degradation | 🔗 |
| CogVideoX-5B | Heavy Degradation | 🔗 |

1. I2VGen-XL-based

Step 1: Download the pretrained STAR model from HuggingFace.

We provide two versions of the I2VGen-XL-based model: heavy_deg.pt for heavily degraded videos and light_deg.pt for lightly degraded videos (e.g., low-resolution videos downloaded from video websites).

Place the weights in pretrained_weight/.

Step 2: Prepare testing data

Place the test videos in input/video/.

For the prompt, there are three options: 1. No prompt. 2. Automatically generate a prompt (e.g., using Pllava). 3. Manually write the prompt. Place the txt file in input/text/.

Step 3: Change the path

Change the paths in video_super_resolution/scripts/inference_sr.sh to the corresponding local paths, including video_folder_path, txt_file_path, model_path, and save_dir.
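
Before launching inference, it can help to verify the four paths you set. The sketch below is an illustration only and is not part of the STAR repository: the function name `preflight` and the list of video extensions are assumptions, and an empty `txt_file_path` is treated as the "no prompt" option.

```python
from pathlib import Path

# Assumption: common video container extensions; adjust to your data.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def preflight(video_folder_path, txt_file_path, model_path, save_dir):
    """Return a list of problems with the configured paths (empty list = OK)."""
    problems = []

    video_dir = Path(video_folder_path)
    if not video_dir.is_dir():
        problems.append(f"video_folder_path is not a directory: {video_dir}")
    elif not any(p.suffix.lower() in VIDEO_EXTENSIONS for p in video_dir.iterdir()):
        problems.append(f"no video files found in: {video_dir}")

    # Assumption: an empty or None prompt path means "no prompt".
    if txt_file_path and not Path(txt_file_path).is_file():
        problems.append(f"txt_file_path does not exist: {txt_file_path}")

    if not Path(model_path).is_file():
        problems.append(f"model_path does not exist: {model_path}")

    # save_dir may not exist yet; only flag it if it is an existing non-directory.
    save = Path(save_dir)
    if save.exists() and not save.is_dir():
        problems.append(f"save_dir exists but is not a directory: {save}")

    return problems
```

For example, `preflight("input/video", "input/text/prompt.txt", "pretrained_weight/light_deg.pt", "results")` returns an empty list when everything is in place; the file name `prompt.txt` and the `results` directory are placeholders here.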

Step 4: Run the inference command

bash video_super_resolution/scripts/inference_sr.sh

If you run out of GPU memory (OOM), set a smaller frame_length in inference_sr.sh.
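
To see why a smaller frame_length can help with memory, assume (this is not confirmed by the text above) that frame_length is the number of frames processed together in one pass. Under that assumption, a video is handled as consecutive chunks, and reducing frame_length trades more passes for fewer frames held at once. The helper `split_frames` below is hypothetical and only illustrates the arithmetic.

```python
def split_frames(num_frames, frame_length):
    """Split frame indices [0, num_frames) into consecutive (start, end) chunks.

    Each chunk holds at most `frame_length` frames; `end` is exclusive.
    Illustrative only: the real script may chunk frames differently.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    return [
        (start, min(start + frame_length, num_frames))
        for start in range(0, num_frames, frame_length)
    ]


# A 70-frame clip with frame_length=32 is processed in three passes.
print(split_frames(70, 32))  # [(0, 32), (32, 64), (64, 70)]
```

Halving frame_length to 16 gives five passes for the same clip, each holding at most 16 frames.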

2. CogVideoX-based

Refer to these instructions for inference with the CogVideoX-5B-based model.

Please note that the CogVideoX-5B-based model supports only 720x480 input.
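
If your source video has a different resolution, one option is to resize it with ffmpeg before inference. The sketch below only builds the command; it assumes 720x480 means width 720 and height 480, and the helper name `build_resize_command` is hypothetical. The `-vf scale=W:H` filter is standard ffmpeg, but note that it stretches the frame if the source aspect ratio is not 3:2.

```python
import shlex


def build_resize_command(src, dst, width=720, height=480):
    """Build an ffmpeg command that rescales `src` to width x height.

    Returns the argument list without running it, so it can be inspected first.
    """
    return [
        "ffmpeg",
        "-i", str(src),
        "-vf", f"scale={width}:{height}",  # stretches if aspect ratios differ
        str(dst),
    ]


cmd = build_resize_command("input/video/clip.mp4", "input/video/clip_720x480.mp4")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The file names in the usage lines are placeholders; returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.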

❤️ Acknowledgments

This project is based on I2VGen-XL, VEnhancer, CogVideoX, and OpenVid-1M. Thanks to the authors for their awesome work.

🎓 Citations

If our project helps your research or work, please consider citing our paper:

@misc{xie2025starspatialtemporalaugmentationtexttovideo,
      title={STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution}, 
      author={Rui Xie and Yinhong Liu and Penghao Zhou and Chen Zhao and Jun Zhou and Kai Zhang and Zhenyu Zhang and Jian Yang and Zhenheng Yang and Ying Tai},
      year={2025},
      eprint={2501.02976},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.02976}, 
}

📧 Contact

If you have any inquiries, please don't hesitate to reach out via email at ruixie0097@gmail.com

📄 License

I2VGen-XL-based models are distributed under the terms of the MIT License.

The CogVideoX-5B-based model is distributed under the terms of the CogVideoX License.
