- Catalogue
- Introduction
- Statistics of UniEdit
- Installation
- Generation of UniEdit
- Evaluation with UniEdit
- Citation
UniEdit is a large-scale open-domain benchmark for large language model knowledge editing, containing 311K samples. It is designed to evaluate editing algorithms systematically and in a fine-grained manner across three key dimensions: Reliability, Generality, and Locality.
To overcome the limitations of existing benchmarks, such as narrow knowledge coverage, limited structural diversity, and incomplete evaluation criteria, UniEdit is constructed from 29.9M entities in Wikidata, covering 25 domains across five major sectors: natural sciences, humanities, social sciences, applied sciences, and interdisciplinary studies. This provides extensive knowledge coverage and diverse evaluation settings (e.g., multi-hop reasoning, same-entity reasoning, and relation reversal).
The figure below presents a data composition example, which generates corresponding natural-language editing and evaluation samples from sampled structured fact chains.
UniEdit introduces the NMCS (Neighborhood Multi-hop Chain Sampling) algorithm, a unified sampling method that generates structurally diverse generality and locality samples from a given factual triple, thereby expanding the coverage of existing evaluation criteria. We use DeepSeek-V3 to automatically convert the structured data into natural-language form. The figure below illustrates the full generation pipeline of UniEdit.
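For intuition, the snippet below sketches the general idea of sampling a reasoning chain from the neighborhood of an edited triple. It is a simplified conceptual example, not the authors' NMCS implementation; the neighbors lookup table and chain length are hypothetical.
import random

def sample_chain(edit_triple, neighbors, max_hops=2):
    # Conceptual sketch only: start from the tail of the edited fact and
    # extend the chain with randomly chosen neighboring facts.
    head, relation, tail = edit_triple
    chain = [edit_triple]
    current = tail
    for _ in range(max_hops - 1):
        candidates = neighbors.get(current, [])  # (relation, object) pairs whose subject is `current`
        if not candidates:
            break
        rel, obj = random.choice(candidates)
        chain.append((current, rel, obj))
        current = obj
    return chain  # the chain is later verbalized into a natural-language QA pair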
The table below compares UniEdit with existing benchmarks in terms of coverage across various evaluation features, including Rephrase (Rep), Multi-Hop (MH), Relation Reversal (RR), Same-Entity Reasoning (SER), Subject Alias (SA), Object Alias (OA), Subject Specificity (SS), Relation Specificity (RS), Object Specificity (OS), 1-N Forgotten (1-NF), Combinations of the above Criteria (CC), and Open-Domain (OD).
The figure below shows the data distribution of UniEdit across: (a) domains; (b) multi-hop counts and query chain structures (G., L., S., and D. represent generality, locality, single, and double, respectively); (c) the frequency statistics of nouns in entity descriptions; and (d, e) the top 15 combinations of recognized evaluation criteria.
The figure below shows the word cloud distribution of the head-entity descriptions for editing samples across different domains in UniEdit.
Our implementation uses Python 3.11.9. To get started, simply install Conda and run:
git clone https://github.com/qizhou000/UniEdit.git
cd UniEdit
conda create -n UniEdit python=3.11.9
conda activate UniEdit
pip install -r requirements.txt
We use Elasticsearch to retrieve head entities for each domain based on domain-specific keywords.
To begin, please download Elasticsearch 8.17.2 from: https://www.elastic.co/downloads/past-releases/elasticsearch-8-17-2, place the package into the ./elasticsearch directory, and extract it.
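For example, on a Linux x86_64 machine this can be done as follows (the archive URL and name below assume the Linux build; adjust them for your platform):
mkdir -p elasticsearch
wget -P elasticsearch https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.17.2-linux-x86_64.tar.gz
tar -xzf elasticsearch/elasticsearch-8.17.2-linux-x86_64.tar.gz -C elasticsearch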
Then start the Elasticsearch backend:
./elasticsearch/elasticsearch-8.17.2/bin/elasticsearch
The pipeline and toolkit code for UniEdit generation is located in the ./uniedit_construction/preprocess_pipeline directory.
Pipeline steps 1–6 and 11 operate on the global data, while the remaining steps process the data independently for each domain. The scripts for these domain-specific steps can be found in the ./scripts/uniedit_gen directory.
You can download the UniEdit dataset from HuggingFace. The file structure is as follows:
📁UniEdit
├── 📁train
│ ├── 📄agronomy.json
│ ├── 📄art.json
│ └── ...
└── 📁test
├── 📄agronomy.json
├── 📄art.json
└── ...
Each knowledge domain corresponds to a single JSON file in the training or test set. The data structure of each JSON file is as follows:
{
"<sample_id>": {
"edit": {
"prompt": str, # prompt string
"target": str, # target string
"subject": str, # subject string
"head_entity": { # properties of the head entity
"id": str, "label": str,
"description": str, "aliases": list[str]
},
"property": { # properties of the relation
"id": str, "label": str,
"description": str, "aliases": list[str],
"single_value": bool # whether the head entity and the relation correspond to a single-valued tail
},
"tail_entity": { # properties of the tail entity
"order_in_property": 0, # the index of this tail entity among all values associated with the head–relation pair
"datatype": str, # such as "wikibase-item"(entity), "quantity"(numeric type), etc.
"value": dict # structure depends on the datatype
},
"single_inverse": bool # whether the tail entity and the reversed relation correspond to a single-valued head
},
"generality": {
"0": { # only one generality and locality sample is generated for each edit sample
"path_type": str, # "single" or "double", indicating single-path multi-hop reasoning or same-entity reasoning
"prompt": str, # prompt string
"target": str, # target string
"paths": list[list[str]], # the encoded multi-hop reasoning chains. For example,
# [["t_e", "r_e", "h_e", "b", "m"]] represents a chain containing only one fact: the edit fact itself.
# Here, "b" means the relation is reversed, i.e. r_e'.
# "m" indicates that (t_e, r_e', ·) has multiple values.
"one_hops": [ # all one-hop facts in the reasoning chain; exists only when `path_type` is "single"
{
"path": list[str], # encoded one-hop fact, e.g., ["e_0", "r_0", "t_e"] meaning a hop linked to the tail of the edit sample
"prompt": str,
"subject": str,
"target": str,
"reverse_in_multi_hop": bool, # whether this hop is reversed within the chain
"reversed": { # QA for the reversed one-hop fact (only appears if `reverse_in_multi_hop` is True)
"prompt": str,
"target": str
},
"head_entity": dict, # same structure as defined under "edit"
"property": dict, # same structure as defined under "edit"
"tail_entity": dict, # same structure as defined under "edit"
"single_inverse": bool # same definition as under "edit"
}, ...
],
"single_path1": { # exists only when `path_type` is "double"; facts and prompts of one direction of the reasoning chain
"path_type": "single",
"prompt": str,
"target": str,
"paths": list[list[str]],
"one_hops": list[dict]
},
"single_path2": dict # exists only when `path_type` = "double"; the other direction of the reasoning chain
}
},
"locality": {
"0": {
"loc_type": str, # the starting point from which the locality sample is derived: "none",
# "tail of edit", "head of edit", or "property of edit"
# All fields below follow the same definitions as in "generality"
"path_type": str,
"prompt": str,
...
}
}
}
}
Place the UniEdit data in the ./data folder.
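As a quick sanity check, you can also load a single domain file directly and inspect one sample. The snippet below is a minimal sketch based on the data structure above (the agronomy test file is used only as an example):
import json

# Load one domain file from the test split
with open('data/UniEdit/test/agronomy.json', 'r', encoding='utf-8') as f:
    domain_data = json.load(f)

# Take the first sample and print its edit, generality, and locality queries
sample_id, sample = next(iter(domain_data.items()))
print(sample['edit']['prompt'], '->', sample['edit']['target'])
print(sample['generality']['0']['prompt'], '->', sample['generality']['0']['target'])
print(sample['locality']['0']['loc_type'], ':', sample['locality']['0']['prompt'])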
To check which criteria each sample belongs to, you can run:
from dataset.llm import UniEdit
i = 0
data = UniEdit('data/UniEdit/test')
print(data.get_data_by_ids([i]))
Please download the backbone weights and place them in the ./models folder.
The model URLs are:
- GPT2-XL-1.5B: https://huggingface.co/openai-community/gpt2-xl
- GPT-J-6B: https://huggingface.co/EleutherAI/gpt-j-6b
- LLaMa-3.1-8B: https://huggingface.co/meta-llama/Llama-3.1-8B
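For example, the weights can be downloaded with the huggingface_hub CLI (the local directory names below are only suggestions; note that LLaMa-3.1-8B is gated and requires accepting the license and logging in with huggingface-cli login first):
huggingface-cli download openai-community/gpt2-xl --local-dir ./models/gpt2-xl
huggingface-cli download EleutherAI/gpt-j-6b --local-dir ./models/gpt-j-6b
huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./models/llama-3.1-8b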
For scripts related to editor training and testing on UniEdit, please refer to ./scripts/editor_train_test.
Testing results will be saved in the ./eval_results directory.
For visualization of the results reported in the paper, please refer to the scripts ./fig_heat_map_discipline_results.py, ./fig_radar_eval_pattern_results.py, and ./fig_serac_domain_generalization.py.
The figures are saved in ./figs/uniedit.
If you use UniEdit in your research or projects, please remember to cite our paper :).
@article{Chen2025UniEdit,
author = {Qizhou Chen and
Dakan Wang and
Taolin Zhang and
Zaoming Yan and
Chengsong You and
Chengyu Wang and
Xiaofeng He},
title = {UniEdit: {A} Unified Knowledge Editing Benchmark for Large Language
Models},
journal = {CoRR},
volume = {abs/2505.12345},
year = {2025},
url = {https://doi.org/10.48550/arXiv.2505.12345},
doi = {10.48550/ARXIV.2505.12345},
eprinttype = {arXiv},
eprint = {2505.12345},
timestamp = {Tue, 24 Jun 2025 07:37:11 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2505-12345.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}