
DCP-Bench-Open

DCP-Bench-Open is a collaborative benchmark of Discrete Combinatorial Problems, involving only integer and Boolean decision variables. Many problem formulations come from the Constraint Programming (CP) community, with others drawn from the Integer Programming (no continuous variables), Pseudo-Boolean, and Satisfiability communities.

This benchmark has two primary goals:

  1. To provide a centralized repository of discrete combinatorial optimisation and satisfaction problems, including clear natural language descriptions, corresponding data instances and ground-truth constraint models. You are more than welcome to contribute new problems and/or problem instances for existing problems (please see the Contributing Guide for more details).
  2. To serve as a framework for evaluating generative AI systems (e.g. LLMs) on their ability to generate executable constraint models from natural language descriptions.

In the repository, the ground-truth models are (currently) written using the CPMpy library. The evaluation framework can (currently) evaluate generated models in CPMpy, MiniZinc, or OR-Tools CP-SAT; more frameworks can be added with limited effort.

This benchmark is an open-source project that welcomes additional problems, data instances, and evaluation tooling from interested developers. For reproducibility, always use a specific 'Release' in your research (see below). This project started as an extension of the original CP-Bench published at ECAI 2025.

Getting the Dataset

There are two main ways to get the benchmark dataset:

1. Recommended: Download from a Release

This is the best way to get a stable, versioned copy of the dataset and all corresponding evaluation scripts.

  1. Visit the project's GitHub Releases page.
  2. From the latest release, download the dcp-bench-open.jsonl file (and, if you need additional files such as the evaluation scripts, download the Source code archive as well).

2. Generate from Source

If you want to use the very latest (unreleased) version of the problems, you can generate the dataset file yourself.

  1. Clone the repository.
  2. Run python jsonl_convert.py to create dcp-bench-open.jsonl.

Repository Structure

The dataset/ directory contains the source problems for the benchmark. Each problem consists of two files:

  • <problem_name>/<problem_name>.cpmpy.py: A Python script containing the natural language problem description, a sample instance, and a ground-truth CPMpy model (a rough sketch is shown after this list).
  • <problem_name>/<problem_name>.json: A JSON file containing one or more instances for the problem, compatible with the python script.
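As a rough illustration, a problem script could look something like the sketch below; the instance data, variable names, and model are hypothetical placeholders, not the repository's actual template.

import json
import cpmpy as cp

# Hypothetical instance data; real instances are loaded from the
# accompanying <problem_name>.json file.
data = {"n": 4}

# Build the ground-truth model from the instance data.
n = data["n"]
x = cp.intvar(1, n, shape=n, name="x")
model = cp.Model(cp.AllDifferent(x))

if model.solve():
    # Print the solution as JSON, with keys matching the problem's
    # decision_variables specification.
    print(json.dumps({"x": [int(v) for v in x.value()]}))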

Evaluation framework

To use the evaluation framework, you will need Python 3.12 and the libraries listed in requirements.txt.

Verifying Problem Consistency

The self_consistency.py script ensures that the example solution provided in each problem's .py file is valid and executes correctly. It works by adding the generated solution as a constraint and re-solving the model.

To run the check on all problems:

python self_consistency.py
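
Conceptually, the check boils down to something like the following sketch; the model and example solution here are toy placeholders, not the script's actual code.

import cpmpy as cp

# Toy ground-truth model; in the repository this comes from the
# problem's .cpmpy.py file.
x = cp.intvar(1, 4, shape=4, name="x")
model = cp.Model(cp.AllDifferent(x))

# Example solution stored alongside the model (illustrative values).
example_solution = [2, 1, 4, 3]

# Add the example solution as constraints and re-solve: if the model
# is still satisfiable, the example solution is consistent with it.
check = cp.Model(model.constraints)
check += [xi == vi for xi, vi in zip(x, example_solution)]
assert check.solve(), "example solution violates the ground-truth model"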

Generating a JSONL file containing all problems

To generate a new JSONL file containing all problems in the dataset/ directory, you can use the jsonl_convert.py script:

python jsonl_convert.py

Automated evaluation of solution accuracy

This dataset is primarily designed to evaluate systems that generate constraint models from natural language. A generated model is considered correct if it produces a valid solution for a given instance. This can be verified by checking if the solution satisfies the constraints of the ground-truth model provided in the dataset. For optimization problems, the objective value must also match.
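
In CPMpy terms, that check amounts to something like the sketch below; the model, the printed solution, and the variable names are toy placeholders, not the actual eval.py implementation.

import json
import cpmpy as cp

# Toy ground-truth optimisation model.
x = cp.intvar(0, 5, shape=3, name="x")
gt = cp.Model([cp.AllDifferent(x), cp.sum(x) <= 9])
gt.maximize(cp.sum(x))
gt.solve()
best_objective = gt.objective_value()

# Solution printed (as JSON) by a submitted model.
printed = '{"x": [1, 3, 5]}'
sol = json.loads(printed)

# Check feasibility by fixing the printed values in the ground-truth
# constraints; for optimisation problems, also compare objective values.
check = cp.Model(gt.constraints)
check += [xi == vi for xi, vi in zip(x, sol["x"])]
correct = check.solve() and sum(sol["x"]) == best_objective
print("correct:", correct)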

The eval.py script automates this evaluation process, provided you have a file containing the generated models. For example, if you have the file sample_test.jsonl with generated models, you can run:

 python eval.py --dataset_file dcp-bench-open.jsonl --test_file sample_test.jsonl --modelling_framework CPMpy

Here, --dataset_file specifies the path to the jsonl version of the benchmark, --test_file specifies the path to the file with generated models, and --modelling_framework indicates the modelling framework used in the generated models.

Creating your test file with models to be evaluated

Regarding the test file, each line should be a JSON object with two keys: id and model.

  • id: The ID of the problem exactly as it appears in the dataset (e.g., csplib_001_car_sequencing).
  • model: The generated model for the problem, as a string of runnable code. Make sure it eventually prints the solution as JSON, with keys as described in the problem's decision_variables entry and values of the types the problem expects. This is part of the evaluation: unexpected keys or value types are considered incorrect, because the automatic evaluation is based on the solution printed by the submitted model.

An example test file with just 5 generated models (sample_test.jsonl) is included in this repository for reference.
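
For illustration, a single line of such a test file could be written as follows; the model string is a trivial placeholder rather than an actual solution to the car sequencing problem.

import json

# A single test-file line: "id" must match a problem id in the dataset,
# and "model" holds the generated code as a string. The model code below
# is a trivial placeholder, not a real solution.
entry = {
    "id": "csplib_001_car_sequencing",
    "model": (
        "import json\n"
        "import cpmpy as cp\n"
        "x = cp.intvar(1, 3, shape=3, name='x')\n"
        "m = cp.Model(cp.AllDifferent(x))\n"
        "m.solve()\n"
        "print(json.dumps({'x': [int(v) for v in x.value()]}))\n"
    ),
}

with open("sample_test.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")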

Finally, for now, evaluation assumes that the first instance of each (multi-instance) problem is used.

Benchmark Design

  • Satisfiable Problems: All problems are designed to have at least one feasible solution. For multi-instance problems, at least the first instance is guaranteed to be satisfiable; the remaining instances are not guaranteed to be solvable in a short amount of time, and information about runtimes is currently not part of the dataset.
  • Human-Readable Descriptions: Problem descriptions are written to be clear and preferably non-technical.
  • Clear Output Format: The required output format for each problem is explicitly specified to facilitate automated evaluation of solution accuracy.

How to Contribute

We encourage contributions to expand the benchmark! If you have a new problem you'd like to add, please follow the guidelines outlined in our Contributing Guide.

Citation

Feel free to cite our work as follows:

@dataset{dcpbenchopen,
  author       = {Michailidis, K. and Tsouros, D. and Guns, T.},
  title        = {DCP-Bench-Open},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17800138},
  url          = {https://doi.org/10.5281/zenodo.17800138}
}

or (APA format):

Michailidis, K., Tsouros, D., & Guns, T. (2025). DCP-Bench-Open [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17800138
