Curating the best open reasoning datasets
A collaboration led by Bespoke Labs and the DataComp community

Our first goal is to curate a reasoning dataset to train state-of-the-art small reasoning models that surpass DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-7B on math and code reasoning benchmarks.

News

[2025/02/02] 🎉 OpenThoughts-114k dataset is the #1 trending dataset on Hugging Face.
[2025/01/30] 🎉 Reasoning benchmarks are added to Evalchemy and compared to publicly reported scores.
[2025/01/28] 🎉 Open Thoughts launches with OpenThoughts-114k dataset and OpenThinker-7B model.
[2025/01/27] 🎉 Bespoke-Stratos-17k dataset is the #2 trending dataset on Hugging Face.
[2025/01/22] 🎉 Bespoke-Stratos-17k dataset and Bespoke-Stratos-32B model are announced.

Results

The scores in the table below were computed with our open-source evaluation tool, Evalchemy.

Model	AIME24	MATH500	GPQA-Diamond	LCBv2 Easy	LCBv2 Medium	LCBv2 Hard	LCBv2 All
OpenThinker-7B	31.3	83.0	42.4	75.3	28.6	6.5	39.9
Bespoke-Stratos-7B	22.7	79.6	38.9	71.4	25.2	0.8	35.8
DeepSeek-R1-Distill-Qwen-7B	60	88.2	46.9	79.7	45.1	14.6	50.1
gpt-4o-0513	8.6	75.8	46.5	87.4	42.7	8.9	50.5
o1-mini	64.0	85.6	60	92.8	74.7	39.8	72.8

Note: The AIME24 dataset has a small sample size, resulting in high variance in evaluation accuracy. To mitigate this, we updated the code to compute the average score over five evaluation runs with different seeds. No system prompt is used, the maximum token length is set to 32,768, and temperature is 0.7.
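To illustrate why averaging over seeds helps, here is a minimal sketch of the aggregation step. It is not the Evalchemy implementation: the function name `summarize_runs` and the per-seed counts are hypothetical, chosen only to show the arithmetic. With 30 problems, a single problem moves the score by about 3.3 points, so one run alone is noisy.

```python
import statistics

NUM_PROBLEMS = 30  # AIME24 consists of 30 problems


def summarize_runs(correct_counts, num_problems=NUM_PROBLEMS):
    """Return (mean, sample std) of accuracy in percent across seeded runs."""
    if not correct_counts:
        raise ValueError("need at least one evaluation run")
    accuracies = [100.0 * c / num_problems for c in correct_counts]
    mean = statistics.fmean(accuracies)
    # The sample standard deviation is undefined for a single run.
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std


# Hypothetical number of problems solved in each of five seeded runs.
# These are illustrative values, not measured results.
counts = [8, 10, 9, 11, 12]
mean, std = summarize_runs(counts)
print(f"AIME24 accuracy: {mean:.1f} +/- {std:.1f} over {len(counts)} seeds")
```

Reporting the spread next to the mean makes it easier to judge whether a gap between two models on AIME24 is larger than the run-to-run noise.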

We are fully open-source. Our model weights, datasets, data generation code, evaluation code, and training code are all publicly available.

Model	Open Weights	Open Data	Open Code
OpenThinker-7B	✅	✅	✅
Bespoke-Stratos-7B	✅	✅	✅
DeepSeek-R1-Distill-Qwen-7B	✅	❌	❌
gpt-4o-0513	❌	❌	❌
o1-mini	❌	❌	❌

Installation

make install
poetry shell

Set the DeepSeek API key:

export DEEPSEEK_API_KEY=your_api_key

Set HF_ORG to your organization id. Set HF_PRIVATE=true if you want to push to a private repo.

export HF_ORG=your_org_id
export HF_PRIVATE=false
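As a rough sketch of how a script could consume these variables, the snippet below reads `HF_ORG` and `HF_PRIVATE` and builds a Hub repository id. How the repository itself parses them is not shown here, so the helper names (`HubConfig`, `load_hub_config`, `repo_id`), the accepted truthy strings, and the dataset name `my-dataset` are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HubConfig:
    org: str
    private: bool


def load_hub_config(env=None):
    """Build a HubConfig from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    org = env.get("HF_ORG", "").strip()
    if not org:
        raise RuntimeError("HF_ORG is not set")
    # Assumption: treat "1", "true", and "yes" (any case) as true.
    private = env.get("HF_PRIVATE", "false").strip().lower() in {"1", "true", "yes"}
    return HubConfig(org=org, private=private)


def repo_id(config, dataset_name):
    """Return the '<org>/<name>' id used for repositories on the Hub."""
    return f"{config.org}/{dataset_name}"


# A plain dict stands in for the shell environment in this example.
cfg = load_hub_config({"HF_ORG": "your_org_id", "HF_PRIVATE": "false"})
print(repo_id(cfg, "my-dataset"), cfg.private)
```

Failing early when `HF_ORG` is missing avoids discovering the problem only after a long generation run has finished.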

Data Generation

Currently, we are generating data for the following domains:

Code
Math
Science
Puzzle

The recipe is outlined in the diagram below.

[Figure: Data Curation Recipe]

More instructions are in open_thoughts/README.md.

Training and Evaluation

Training and evaluation code coming soon.

Links

Citation

@misc{openthoughts,
  author = {Open Thoughts Team},
  month = jan,
  title = {{Open Thoughts}},
  year = {2025}
}

About Us

We are a team of researchers and engineers from Bespoke Labs, Stanford, University of California, Berkeley, University of Washington, Juelich Supercomputing Center (JSC), LAION, UCLA, UNC Chapel Hill, UT Austin, and Toyota Research Institute, united around building the best datasets (and thus the best models). See our previous work at datacomp.ai and mlfoundations.
