Logic-RL

📢 Our detailed technical report is released!

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Main results

Benchmark

Model	2ppl	3ppl	4ppl	5ppl	6ppl	7ppl	8ppl
o3-mini-high	0.99	0.98	0.97	0.95	0.94	0.89	0.83
o1-2024-12-17	0.83	0.51	0.38	0.38	0.35	0.30	0.20
GPT-4o	0.68	0.57	0.49	0.32	0.23	0.21	0.11
Deepseek-Math-7b	0.35	0.21	0.08	0.06	0.02	0.00	0.00
Qwen2.5-7B-Instruct-1M	0.49	0.40	0.25	0.11	0.02	0.06	0.01
Qwen2.5-7B-Logic-RL (ours)	0.99	0.99	0.94	0.92	0.91	0.80	0.67

Installation

conda create -n logic python=3.9
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib

数据准备部分

我们可以直接使用 data 文件夹下面的数据进行处理。

调试模式

将 ray 设置为本地模式，将ray的初始化代码中添加 local_mode=True，开启本地模式。

ray.init(runtime_env={'env_vars': {'TOKENIZERS_PARALLELISM': 'true', 'NCCL_DEBUG': 'WARN'}}, local_mode=True)

将 vs code中调试参数开启 justMyCode=False，使用vs code调试代码时可以进入到安装包中进行调试

For your own data generation, here's a demo:

Base Model

python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Instruct Model

python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Training Execution

conda activate logic
bash main_grpo.sh  # 4×A100 80G

⚙️ Implementation Details

Component	Location
Reward Modeling	`verl/utils/reward_score/kk.py`
Data Preprocessing	`examples/data_preprocess/kk.py`

Citation

@misc{xie2025logicrlunleashingllmreasoning,
      title={Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning}, 
      author={Tian Xie and Zitian Gao and Qingnan Ren and Haoming Luo and Yuqian Hong and Bryan Dai and Joey Zhou and Kai Qiu and Zhirong Wu and Chong Luo},
      year={2025},
      eprint={2502.14768},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14768}, 
}

Acknowledgements

Verl 🔗
TinyZero 🔗
Knights and Knaves (K&K) puzzles dataset 🔗

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
data/kk/instruct		data/kk/instruct
docker		docker
docs		docs
eval_kk		eval_kk
examples		examples
math_eval		math_eval
patches		patches
pics		pics
scripts		scripts
tests		tests
verl		verl
.gitignore		.gitignore
LICENSE		LICENSE
Notice.txt		Notice.txt
README.md		README.md
main_grpo.sh		main_grpo.sh
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Logic-RL

📢 Our detailed technical report is released!

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Benchmark

Installation

数据准备部分

调试模式

Base Model

Instruct Model

Training Execution

⚙️ Implementation Details

Citation

Acknowledgements

Star History

About

Releases

Packages

Languages

License

LeoMax-Xiong/Logic-RL

Folders and files

Latest commit

History

Repository files navigation

Logic-RL

📢 Our detailed technical report is released!

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Benchmark

Installation

数据准备部分

调试模式

Base Model

Instruct Model

Training Execution

⚙️ Implementation Details

Citation

Acknowledgements

Star History

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages