LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation
Third-place solution to the XLLM@ACL2025 Shared Task-III: LLM for Structural Reasoning
Contact: jamse_yuan@163.com
If you find this project helpful, please consider giving us a star to support the latest updates.
- **2025.06.15**: We're thrilled to announce that our technical report *Less is More*, which earned 3rd place, has been officially accepted to the LLMSR@XLLM ACL 2025 Workshop!
- **2025.05.16**: Excited to share that our earlier work *Reversal of Thought* has been accepted to the ACL 2025 main conference!
- **2025.05.01**: Honored to announce that our ECNU-Passion team won 3rd place in the XLLM@ACL 2025 Shared Task III: LLM-SR!
- **2025.04.23**: Released all source code to the public to support transparency and reproducibility.
- **2025.04.23**: Published our ECNU-Passion Team technical report *Less is More* based on our submission to the XLLM@ACL 2025 Shared Task III.
If you find our work useful for your research, please cite our papers as follows:
```bibtex
@inproceedings{yuan2025reversal,
  title     = {Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up},
  author    = {Yuan, Jiahao and Du, Dehui and Zhang, Hao and Di, Zixiang and Naseem, Usman},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages     = {19442--19459},
  year      = {2025}
}

@inproceedings{yuan2025llmsr,
  title     = {LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation},
  author    = {Yuan, Jiahao and Sun, Xingzhe and Yu, Xing and Wang, Jingwen and Du, Dehui and Cui, Zhiqing and Di, Zixiang},
  booktitle = {Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)},
  pages     = {274--282},
  year      = {2025}
}
```
This repository provides the official full implementation of our "Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation" framework, which distills high-quality structured reasoning data into multi-agent LLaMA-3 modules. It addresses low-resource structured reasoning by combining:
- Reverse-prompted task induction
- Retrieval-augmented CoT generation
- Reward-guided filtering for faithful and interpretable supervision

Key features:

- **Modular Agents**: specialized models for question parsing, CoT decomposition, and verification
- **Semantic ICL Retrieval**: top-k demos fetched via BGE-M3 embeddings (see the sketch after this list)
- **Reward Filtering**: a LLaMA3.2-based reward model filters reasoning quality
- **LoRA+ Fine-tuning**: efficient SFT on each role using ms-swift
- **Structured Output**: JSON-compatible format for downstream use
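For the semantic ICL retrieval, the idea is to embed the query and the demo pool with BGE-M3 and keep the top-k most similar demonstrations. Below is a minimal sketch using sentence-transformers; the repo's own retrieval lives in `utils/llm_utils.py` and may differ in detail:

```python
# Top-k demo retrieval for semantic ICL via BGE-M3 dense embeddings.
# Illustrative sketch only; the repo's actual retrieval code may differ.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

def topk_demos(query: str, demo_pool: list[str], k: int = 5) -> list[str]:
    """Return the k demonstrations most cosine-similar to the query."""
    # With normalized embeddings, the dot product equals cosine similarity.
    q = model.encode([query], normalize_embeddings=True)
    d = model.encode(demo_pool, normalize_embeddings=True)
    sims = d @ q[0]
    return [demo_pool[i] for i in np.argsort(-sims)[:k]]
```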
```bash
git clone https://github.com/JhCircle/Less-is-More.git
cd Less-is-More
pip install -r requirements.txt
```
```
├── data/                              # Raw and processed data
│   ├── train.txt                      # Raw LogiQA-style questions
│   ├── All_Train_With_Scores.jsonl    # CoT scoring results
│   ├── train/{strategy}_filtered.jsonl  # Filtered by reward
│   ├── test/test_question_parsing_role.jsonl
│   ├── test/test_cot_parsing_role.jsonl
│   └── test/test_cot_verify_role_role.jsonl
│
├── utils/
│   ├── prompt.py                      # Prompt templates
│   └── llm_utils.py                   # Inference / pipeline tools
│
├── data_synthesize.py                 # Generate CoT + parsing
├── reward_filter.py                   # Score CoT quality using reward model
├── extract_train_role.py              # Extract instruction-role data for training
├── extract_test_role.py               # Extract data for evaluation
├── train_qp.sh                        # Shell script for LoRA+ training on Question Parsing
├── train_cp.sh                        # Shell script for LoRA+ training on CoT Parsing
├── train_cv.sh                        # Shell script for LoRA+ training on CoT Verify (Statement + Verification)
├── infer.sh                           # Full structured inference pipeline
│
└── README.md
```

Generate high-quality Question Parsing (QP), Chain-of-Thought Parsing (CP), and CoT Verification (CV: both statement extraction and logical validation) from raw LogiQA questions using GPT-4o via Retrieval-Augmented In-Context Learning.
```bash
python data_synthesize.py \
  --demo_pool demo_pool.json \
  --logiqa_file data/train.txt \
  --output_file data/Train_LogicQA.jsonl \
  --embedding_model BAAI/bge-m3 \
  --tokenizer_name BAAI/bge-m3 \
  --model_id gpt-4o-2024-08-06 \
  --api_key YOUR_API_KEY \
  --base_url YOUR_OPENAI_API
```

Use a reward model to evaluate CoT quality and retain only samples with reward > 0.
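Conceptually, the scoring step looks like the following. This is a minimal sketch, assuming the GRM reward checkpoint loads as a standard Hugging Face sequence-classification model emitting one scalar per (question, CoT) pair; the repo's actual logic lives in `reward_filter.py`:

```python
# Minimal reward-scoring sketch; illustrative, not the repo's reward_filter.py.
# Assumes the GRM reward model loads as a sequence-classification head that
# returns a single scalar logit for a (question, chain-of-thought) pair.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "Ray2333/GRM-Llama3.2-3B-rewardmodel-ft"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

def score_cot(question: str, cot: str) -> float:
    """Score a chain-of-thought answer; higher means more faithful reasoning."""
    messages = [
        {"role": "user", "content": question},
        {"role": "assistant", "content": cot},
    ]
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0, 0].item()

# Retain only samples the reward model scores positively (reward > 0).
# kept = [s for s in samples if score_cot(s["question"], s["cot"]) > 0]
```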
```bash
python reward_filter.py
```

| Strategy | Description |
|---|---|
| `with_few_shot` | Select samples with high reward under few-shot prompting (reward > 0) |
| `without_few_shot` | Select samples with high reward under zero-shot prompting (reward > 0) |
| `average` (default) | Select samples with the highest average reward across both settings (reward > 0) |
Generates:

- `data/All_Train_With_Scores.jsonl`
- `data/with_few_shot_filtered.jsonl`
- `data/without_few_shot_filtered.jsonl`
- `data/average_filtered.jsonl`
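For reference, the default `average` strategy simply keeps samples whose mean reward across the few-shot and zero-shot scoring passes is positive. A sketch under assumed field names (the actual keys are defined in `reward_filter.py`):

```python
# Sketch of the `average` strategy; the reward field names are assumptions,
# not necessarily the keys reward_filter.py actually writes.
import json

with open("data/All_Train_With_Scores.jsonl") as f:
    samples = [json.loads(line) for line in f]

# Keep samples whose mean reward across both prompting settings is positive.
kept = [
    s for s in samples
    if (s["reward_few_shot"] + s["reward_zero_shot"]) / 2 > 0
]

with open("data/average_filtered.jsonl", "w") as f:
    for s in kept:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```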
Convert filtered CoT data into structured instruction formats for each role. Each file is used to train a different role agent (QP / CP / CV).
```bash
python scripts/extract_train_role.py
python scripts/extract_test_role.py
```

Outputs:

- `data/train/{strategy}/training_question_parsing_role.jsonl`
- `data/train/{strategy}/training_cot_parsing_role.jsonl`
- `data/train/{strategy}/training_cot_verify_role.jsonl`

Train each role agent (Question Parsing / CoT Parsing / CoT Verify) using the reward-filtered data.
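Each line in these role files pairs a role-specific instruction with its target structured output. A hypothetical QP record is sketched below; the field names follow common instruction-tuning conventions, and the exact schema is defined in `extract_train_role.py`:

```python
# Hypothetical shape of one QP (Question Parsing) training record.
# Field names are illustrative; check extract_train_role.py for the real schema.
record = {
    "instruction": "Extract the constraints and facts stated in the question.",
    "input": "Fair use refers to the non-commercial use of works published by others ...",
    "output": [
        "Fair use refers to the non-commercial use of works published by others ...",
        "The cases specified in the law mainly include: (1) Personal study, research or appreciation ...",
    ],
}
```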
```bash
bash train_qp.sh
bash train_cp.sh
bash train_cv.sh
```

To switch the filtering strategy (`with_few_shot`, `without_few_shot`, `average`, `all`), change this line in the corresponding .sh file:
strategy="average"| Role Agent | Input File | Task |
|---|---|---|
| QP (Parser) | training_question_parsing_role.jsonl |
Extract constraints/facts |
| CP (Parser) | training_cot_parsing_role.jsonl |
Break CoT into statements |
| CV (Verifier) | training_cot_verify_role.jsonl |
Find evidence + verify logic |
Use the trained role agents to perform structured reasoning on new questions.
```bash
bash infer.sh
```

infer.sh wraps the full pipeline:

```bash
#!/bin/bash
TEST_FILE="test.jsonl"
QP_MODEL_PATH="./Question_Parsing"
CP_MODEL_PATH="./CoT_Parsing"
CV_MODEL_PATH="./CoT_Verify"
EMBEDDING_MODEL="BAAI/bge-m3"

python inference_pipeline.py \
  --test_file "$TEST_FILE" \
  --qp_model_id_or_path "$QP_MODEL_PATH" \
  --cp_model_id_or_path "$CP_MODEL_PATH" \
  --cv_model_id_or_path "$CV_MODEL_PATH" \
  --icl_embedding "$EMBEDDING_MODEL"
```
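Under the hood, the three role agents run in sequence: QP parses the question into constraints, CP splits the chain of thought into statements, and CV attaches evidence and a verdict to each statement. A minimal sketch of that staged flow, assuming each agent is a fine-tuned causal LM loaded with Hugging Face transformers (the repo's actual prompts and decoding settings live in `inference_pipeline.py`):

```python
# Staged multi-agent inference sketch; illustrative, not the repo's exact code.
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_agent(path: str):
    """Load one fine-tuned role agent (tokenizer + model)."""
    tok = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
    return tok, model

def run_agent(tok, model, prompt: str, max_new_tokens: int = 512) -> str:
    """Chat-format the prompt, generate, and return only the newly generated text."""
    messages = [{"role": "user", "content": prompt}]
    input_ids = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tok.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True)

qp = load_agent("./Question_Parsing")
cp = load_agent("./CoT_Parsing")
cv = load_agent("./CoT_Verify")

question, cot = "...", "..."  # one test sample
constraints = run_agent(*qp, f"Parse the question into constraints/facts:\n{question}")
statements = run_agent(*cp, f"Break the reasoning into statements:\n{cot}")
verdicts = run_agent(*cv, f"Verify each statement against the question:\n{question}\n{statements}")
```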
Produces `results.json` in the following structure:

```json
[
{
"question": "Fair use refers to the non-commercial use of works published by others without the permission of the copyright owner, and without having to pay remuneration under the circumstances specified in the law.The \"cases specified in the law\" mainly include: (1) Personal study, research or appreciation, using published works of others; (2) performing published works for free; (3) copying, painting, photography, video recording of artistic works installed or displayed in outdoor public places; (4) Translate published works created in Chinese and written into minority languages and publish works for publication.\nAccording to the above provisions, Which of the following are fair use:\nA.A sang an unpublished song at the class party\nB.B translates an English work into Mongolian work and publishes it\nC.Company C took the sculptures in the public square and made them into pictures.\nD.Ding Wei wrote a paper and copied a paper published by Geng in a journal for reference",
"question_parsing": [
"Fair use refers to the non-commercial use of works published by others without the permission of the copyright owner, and without having to pay remuneration under the circumstances specified in the law",
"The cases specified in the law mainly include: (1) Personal study, research or appreciation, using published works of others",
"Performing published works for free",
"Copying, painting, photography, video recording of artistic works installed or displayed in outdoor public places",
"Translate published works created in Chinese and written into minority languages and publish works for publication"
],
"answer": "d",
"id": 2021,
"cot": "e definition of fair use mentions that it is the non-commercial use of works published by others without permission, and the main cases specified in the law include personal study, research, or appreciation, performing published works for free, copying artistic works displayed in outdoor public places, and translating published works into minority languages. Options B and D seem to fit into the category of fair use, as they involve translating and using published works for non-commercial purposes. Option C involves copying artistic works displayed in public places, which is also a specified case of fair use. Option A, however, involves singing an unpublished song, which is not a specified case of fair use.",
"cot_parsing": [
{
"statement": "Options B and D fit into the category of fair use.",
"evidence": "Options B and D seem to fit into the category of fair use, as they involve translating and using published works for non-commercial purposes.",
"Verification": "true"
},
{
"statement": "Option C involves fair use.",
"evidence": "Option C involves copying artistic works displayed in public places, which is a specified case of fair use.",
"Verification": "true"
},
{
"statement": "Option A does not involve fair use.",
"evidence": "Singing an unpublished song is not a specified case of fair use.",
"Verification": "false"
}
]
}
]
```

| Setting | Question_F1 | Statement_F1 | Evidence_F1 | Reasoning_F1 |
|---|---|---|---|---|
| Structure Filtered | 56.87 | 36.72 | 10.80 | 5.20 |
| 0-shot Reward | 62.76 | 38.05 | 12.79 | 7.15 |
| 5-shot Reward | 65.89 | 38.26 | 14.45 | 7.70 |
| **Avg. Reward (Ours)** | 66.71 | 39.21 | 14.92 | 8.98 |
For any questions, suggestions, or collaborations, feel free to open an issue or start a discussion. We'd love to hear from you and are always open to feedback and collaboration ideas!
Contact: Jiahao Yuan
We sincerely thank the organizers of the XLLM@ACL2025 Shared Task for providing an open and challenging platform on LLM for Structural Reasoning.
This work has greatly benefited from the generous contributions of the open-source community. In particular, we acknowledge the following resources:
- LogiQA: a dataset for evaluating logical reasoning in QA tasks
- BAAI/bge-m3: a powerful multilingual embedding model
- Ray2333/GRM-Llama3.2-3B-rewardmodel-ft: a high-performing LLaMA3-based reward model
- modelscope/ms-swift: a Scalable lightWeight Infrastructure for Fine-Tuning
We are truly grateful to the community for making such impactful resources openly available.

