This repository contains the evaluation code for EditVerseBench, the instruction-based video editing benchmark from the paper "EditVerse: A Unified Framework for Editing and Generation via In-Context Learning".
Xuan Ju<sup>1,2</sup>, Tianyu Wang<sup>1</sup>, Yuqian Zhou<sup>1</sup>, He Zhang<sup>1</sup>, Qing Liu<sup>1</sup>, Nanxuan Zhao<sup>1</sup>, Zhifei Zhang<sup>1</sup>, Yijun Li<sup>1</sup>, Yuanhao Cai<sup>3</sup>, Shaoteng Liu<sup>1</sup>, Daniil Pakhomov<sup>1</sup>, Zhe Lin<sup>1</sup>, Soo Ye Kim<sup>1*</sup>, Qiang Xu<sup>2*</sup>
<sup>1</sup>Adobe Research <sup>2</sup>The Chinese University of Hong Kong <sup>3</sup>Johns Hopkins University <sup>*</sup>Corresponding Author
🌐 Project Page | 📜 Arxiv | 🤗 Benchmark | 📹 Slides | 👀 Comparison
(Optional) Create a Conda environment
conda create -n EditVerse python=3.10
conda activate EditVerse
Install PyTorch
(You may adjust the version or CUDA support depending on your hardware)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
Install required packages
pip install -r requirements.txt
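(Optional) A quick sanity check that the installed PyTorch build matches your hardware, i.e. that a CUDA device is visible:

```python
import torch

# Print the installed version and whether a CUDA device is visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```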
Download benchmark dataset
git lfs install
git clone https://huggingface.co/datasets/sooyek/EditVerseBench
Download the videos
The source videos cannot be directly distributed due to licensing restrictions. Instead, you can download them using the provided script with the Pixabay API. (The network connection may occasionally fail, so you might need to run the script multiple times.)
⚠️ Note: Remember to replace the API key in download_source_video.py with your own key. You can find the API key here (shown under Parameters → key (required) on that page). The API is free, but you need to sign up for an account to obtain a key.
cd EditVerseBench
python download_source_video.py
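For reference, the Pixabay videos API is a simple JSON endpoint, so clips that failed to download can also be re-fetched individually. The snippet below is only an illustrative sketch, not part of the official script: the helper name, the choice of the "large" rendition, and the assumption that the leading number in each benchmark file name is the Pixabay video ID are all assumptions.

```python
import requests

PIXABAY_API_KEY = "YOUR_API_KEY"  # replace with your own key (see the note above)

def download_pixabay_video(video_id: str, out_path: str) -> None:
    """Fetch one video's metadata from the Pixabay videos API and download the file."""
    resp = requests.get(
        "https://pixabay.com/api/videos/",
        params={"key": PIXABAY_API_KEY, "id": video_id},
        timeout=30,
    )
    resp.raise_for_status()
    hits = resp.json().get("hits", [])
    if not hits:
        raise RuntimeError(f"No Pixabay video found for id {video_id}")
    # Pick one of the available renditions; "large" is assumed here.
    url = hits[0]["videos"]["large"]["url"]
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)

# Example: the leading number in "videos/174008-850361316.mp4" looks like the Pixabay ID.
# download_pixabay_video("174008", "videos/174008-850361316.mp4")
```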
After downloading, the benchmark file structure should look like this:
EditVerseBench/
  ├── test.json
  ├── depths/
  │   ├── xx.mp4
  ├── edited_first_frame/
  │   ├── xx.mp4
  ├── images/
  │   ├── xx.mp4
  ├── inpaint_video_and_masks/
  │   ├── xx.mp4
  ├── poses/
  │   ├── xx.mp4
  ├── sketchs/
  │   ├── xx.mp4
  ├── videos/
  │   ├── xx.mp4
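Because the download script can fail intermittently, it is worth checking that every file referenced by test.json actually exists before running the evaluation. A minimal sketch, assuming that tag keys such as <video1> store paths relative to the folder that contains test.json:

```python
import json
from pathlib import Path

# Paths referenced in test.json are assumed to be relative to this folder.
root = Path("EditVerseBench/EditVerseBench")
entries = json.loads((root / "test.json").read_text())

missing = []
for idx, entry in entries.items():
    for key, value in entry.items():
        # Tag keys such as "<video1>" or "<image1>" hold relative media paths;
        # "<text>" holds the instruction and is skipped.
        if key.startswith("<") and key.endswith(">") and key != "<text>":
            if not (root / value).exists():
                missing.append((idx, value))

print(f"{len(missing)} referenced files are missing")
for idx, path in missing:
    print(f"  entry {idx}: {path}")
```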
Unpack comparison results
cd EditVerseBench
tar -zxvf EditVerse_Comparison_Results.tar.gz
rm EditVerse_Comparison_Results.tar.gz
Command
python eval.py --metrics [metrics] \
--test_json_path EditVerseBench/EditVerseBench/test.json \
--generate_results_dir [results_dir] \
--output_csv [output_csv] \
--gpt_api_key [your_api_key]
Arguments
- metrics: Use all to evaluate every metric. To select specific metrics, provide a comma-separated list (no spaces), e.g. clip_temporal_consistency,dino_temporal_consistency. Supported metrics:
  - clip_temporal_consistency (a sketch of the idea behind the temporal-consistency metrics follows this list)
  - dino_temporal_consistency
  - frame_text_alignment
  - video_text_alignment
  - pick_score_video_quality
  - editing_vlm_evaluation
- test_json_path: Path to the benchmark entrypoint JSON file.
- generate_results_dir: Directory containing the generated results (must follow the required structure).
- output_csv: Path where the evaluation CSV file will be saved.
- gpt_api_key: OpenAI API key (required for editing_vlm_evaluation).
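As a rough illustration of what the temporal-consistency metrics measure: clip_temporal_consistency is commonly computed as the average cosine similarity of CLIP image features between consecutive frames. The sketch below shows that idea only; it is not the repository's exact implementation, and the model choice and frame handling are assumptions.

```python
import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

def clip_temporal_consistency(video_path: str, stride: int = 1) -> float:
    """Average cosine similarity of CLIP features between consecutive frames."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Read frames (BGR -> RGB) from the video.
    cap, frames = cv2.VideoCapture(video_path), []
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ok, frame = cap.read()
    cap.release()
    frames = frames[::stride]

    with torch.no_grad():
        inputs = processor(images=frames, return_tensors="pt")
        feats = model.get_image_features(**inputs)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        # Cosine similarity between each frame and the next one.
        sims = (feats[:-1] * feats[1:]).sum(dim=-1)
    return sims.mean().item()
```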
Example
Evaluate the provided EditVerse results and save output to EditVerse_eval.csv:
python eval.py --metrics all \
--test_json_path EditVerseBench/EditVerseBench/test.json \
--generate_results_dir EditVerseBench/EditVerse_Comparison_Results/EditVerse \
--output_csv EditVerse_eval.csv \
--gpt_api_key [Your API key]
👉 Pre-computed evaluation results for EditVerse and previous methods are available at: EditVerseBench/automatic_evaluation_results.
You can also evaluate your model outputs by following the same format.
Step 1: Refer to the benchmark JSON format
See EditVerseBench/EditVerseBench/test.json for reference.
Each entry looks like this:
{
    "0": {
        "<text>": "<video1> Add a small golden crown ...",
        "<video1>": "videos/174008-850361316.mp4",
        "<video1> link": "https://pixabay.com/videos/woman-smile-communication-gesture-174008/",
        "direction": "horizontal",
        "target_prompt": "A young woman stands outside in front of ...",
        "type": "add object",
        "source_prompt": "A young woman stands outside in front of ..."
    },
    "1": {
        ...
    },
    ...
}
Key fields:
- `<text>`: A natural-language instruction describing the required edit in an interleaved format. The instruction may include special tags such as `<video1>`, `<video2>`, or `<image1>`; each tag corresponds to a key of the same name in the JSON entry.
- `<video1>`: The local file path of the source video.
- `<video1> link`: The reference URL pointing to the source video's original location.
- direction: horizontal or vertical.
- target_prompt: A detailed textual description of the desired edited video.
- type: The category of the edit.
- source_prompt: A description of the original, unedited video.
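As an example of consuming this format, a model wrapper might load an entry, look up the media referenced by the tags embedded in `<text>`, and pass the interleaved instruction to the model. A minimal sketch, where the tag-matching regex and paths are illustrative assumptions rather than part of eval.py:

```python
import json
import re
from pathlib import Path

bench_root = Path("EditVerseBench/EditVerseBench")
entries = json.loads((bench_root / "test.json").read_text())

entry = entries["0"]
instruction = entry["<text>"]

# Find tags such as <video1>, <video2>, <image1> inside the instruction
# and look up the media path stored under the same key in the entry.
tags = re.findall(r"<(?:video|image)\d+>", instruction)
media = {tag: bench_root / entry[tag] for tag in tags if tag in entry}

print(instruction)
print(media)  # e.g. {"<video1>": ".../videos/174008-850361316.mp4"}
print(entry["type"], entry["direction"])
```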
Step 2: Format your results
After generating results with your model, arrange files as follows:
Your_Folder/
  ├── 0/
  │   ├── generate.mp4   # model-generated video
  │   └── video1.mp4     # source video
  ├── 1/
  │   ├── generate.mp4
  │   └── video1.mp4
  ...
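A sketch of how such a folder could be assembled from the benchmark JSON; run_my_model is a hypothetical placeholder for your own inference code, not a function provided by this repository.

```python
import json
import shutil
from pathlib import Path

bench_root = Path("EditVerseBench/EditVerseBench")
out_root = Path("Your_Folder")
entries = json.loads((bench_root / "test.json").read_text())

for idx, entry in entries.items():
    sample_dir = out_root / idx
    sample_dir.mkdir(parents=True, exist_ok=True)

    # Copy the source video next to the generated result so eval.py can find both.
    src_video = bench_root / entry["<video1>"]
    shutil.copy(src_video, sample_dir / "video1.mp4")

    # run_my_model should write the edited video to the given path.
    # run_my_model(entry["<text>"], src_video, sample_dir / "generate.mp4")
```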
Step 3: Run evaluation
python eval.py --metrics all \
--test_json_path EditVerseBench/EditVerseBench/test.json \
--generate_results_dir [Your_Folder] \
--output_csv [Your_Results.csv] \
--gpt_api_key [your_api_key]
| Method | VLM Evaluation: Editing Quality ↑ | Video Quality: Pick Score ↑ | Text Alignment: Frame ↑ | Text Alignment: Video ↑ | Temporal Consistency: CLIP ↑ | Temporal Consistency: DINO ↑ |
|---|---|---|---|---|---|---|
| Attention Manipulation (Training-free) | ||||||
| TokenFlow | 5.26 | 19.73 | 25.57 | 22.70 | 98.36 | 98.09 | 
| STDF | 4.41 | 19.45 | 25.24 | 22.26 | 96.04 | 95.22 | 
| First-Frame Propagation (w/ End-to-End Training) | ||||||
| Señorita-2M | 6.97 | 19.71 | 26.34 | 23.24 | 98.05 | 97.99 | 
| Instruction-Guided (w/ End-to-End Training) | ||||||
| InsV2V | 5.21 | 19.39 | 24.99 | 22.54 | 97.15 | 96.57 | 
| Lucy Edit | 5.89 | 19.67 | 26.00 | 23.11 | 98.49 | 98.38 | 
| EditVerse (Ours) | 7.65 | 20.07 | 26.73 | 23.93 | 98.56 | 98.42 |
Files under ./automatic_evaluation/viclip are from InternVideo and under Apache 2.0 License. Files under ./automatic_evaluation except for those under the folder viclip are modified from awesome-diffusion-v2v under MIT License and modifications by Adobe are under Adobe Research License. All other materials are licensed under Adobe Research License.
If you find our work useful for your research, please consider citing our paper:
@article{ju2025editverse,
  title   = {EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning},
  author  = {Xuan Ju and Tianyu Wang and Yuqian Zhou and He Zhang and Qing Liu and Nanxuan Zhao and Zhifei Zhang and Yijun Li and Yuanhao Cai and Shaoteng Liu and Daniil Pakhomov and Zhe Lin and Soo Ye Kim and Qiang Xu},
  journal = {arXiv preprint arXiv:2509.20360},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.20360}
}