Overview • News • Agent Family • Research • Models & Datasets • Citation • Contact
Welcome to the RUC-NLPIR Agent Family! Our mission is to develop general-purpose, scalable, powerful, and secure intelligent agents. This repository encompasses 12+ cutting-edge agent systems across multiple research directions:
- 🤖 Agentic Reinforcement Learning: State-of-the-art RL algorithms for agent training (ARPO, AEPO)
- 🔍 Deep Search & Research Agents: Advanced information seeking, synthesis, and report generation
- 🛠️ Multi-Tool Reasoning Agents: Autonomous tool discovery, optimization, and execution
- 🎯 Domain-Specific Agents: Finance, video understanding, and multimodal applications
- 📊 Comprehensive Benchmarks: Evaluation datasets and standardized protocols
> [!TIP]
> ⭐ Star us on GitHub to stay updated with the latest releases and improvements!
- [Oct 31, 2025] HiRA Updated! Hierarchical reasoning framework for decoupled planning and execution in deep search; latest revision available. [Arxiv] [Code]
- [Oct 24, 2025] DeepAgent Released! A general reasoning agent with scalable toolsets for autonomous thinking, tool discovery, and action execution. [Arxiv] [Code]
- [Oct 21, 2025] 🔥 VideoExplorer Updated! Think with videos for agentic long-video understanding; latest revision available. [Arxiv] [Code]
- [Oct 19, 2025] 💰 FinSight Released! Multi-agent framework for real-world financial deep research and report generation. [Arxiv]
- [Oct 16, 2025] AEPO Released! Entropy-balanced agentic RL algorithm with superior performance on GAIA, HLE, and AIME. [Arxiv] [Code] [🤗 HuggingFace] [Blog]
- [Oct 13, 2025] WebThinker Updated! Deep research capability for LRMs with autonomous web search and report drafting; accepted by NeurIPS 2025. [Arxiv] [Code]
- [Sep 30, 2025] 💡 Tool-Light Updated! Self-evolved preference learning for effective tool-integrated reasoning; latest revision available. [Arxiv]
- [Aug 11, 2025] 🏢 HierSearch Released! Hierarchical enterprise deep-search framework integrating local and web searches. [Arxiv] [Code]
- [Jul 26, 2025] 🎯 ARPO Released! Agentic reinforced policy optimization for multi-turn LLM-based agents with entropy-based adaptive rollout. [Arxiv] [Code]
- [May 22, 2025] ⭐ Tool-Star Released! Empowering an LLM-brained multi-tool reasoner via reinforcement learning with six types of tools. [Arxiv] [Code]
- [Jan 9, 2025] Search-o1 Released! Agentic search-enhanced large reasoning models with dynamic knowledge retrieval and document reasoning; accepted by EMNLP 2025. [Arxiv] [Code]
- **AEPO: Agentic Entropy-Balanced Policy Optimization** (HuggingFace Daily Paper #2). Advanced agentic RL algorithm that balances entropy across the rollout and policy-update phases for superior stability and performance.
- **ARPO: Agentic Reinforced Policy Optimization** (HuggingFace Weekly Paper #1). Pioneering agentic RL with entropy-driven adaptive branching for enhanced exploration during tool calls.
- **Search-o1: Agentic Search-Enhanced Large Reasoning Models** (EMNLP 2025 Oral). Prompt-based reasoning with autonomous knowledge retrieval integrated via agentic RAG.
- **WebThinker: Empowering Large Reasoning Models with Deep Research** (NeurIPS 2025). Deep research agent that thinks, searches, and writes reports simultaneously.
- **HiRA: Hierarchical Reasoning for Deep Search**. Decouples planning from execution with a strategic planning module and domain-specific execution modules.
- **HierSearch: Hierarchical Enterprise Deep Search** (AAAI 2026). Hierarchical search across local and online knowledge sources for comprehensive information retrieval.
- **DeepAgent: General Reasoning with Scalable Toolsets** (HuggingFace Daily Paper #1). End-to-end reasoning agent with autonomous thinking, tool discovery, and brain-inspired memory folding.
- **Tool-Star: LLM-Brained Multi-Tool Reasoner** (HuggingFace Daily Paper #2). Multi-tool collaboration with Self-Critic RL for autonomous tool interaction and coordination.
- **ToolScope: Agentic Search & Reasoning Framework**. An agentic framework for vision-guided and long-horizon tool use.
- **Tool-Light: Self-Evolved Preference Learning**. Lightweight optimization strategies that encourage efficient tool calling with minimal overhead.
- **FinSight: Real-World Financial Deep Research**. Specialized agent for financial report generation, analysis, and investment-research automation.
- **VideoExplorer: Agentic Long-Video Understanding**. Deep-research methodology for comprehensive long-form video analysis and question answering.
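The RL entries above (ARPO, AEPO) are built around entropy-based adaptive rollout: spawning extra exploration branches when the model is uncertain after a tool call. A minimal sketch of that idea, using a simplified token-level Shannon entropy and a hypothetical branching threshold (not the papers' actual algorithm):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_branch(probs, threshold=1.0):
    """Hypothetical rule: spawn extra rollout branches when the model is
    uncertain (high entropy), e.g. right after receiving a tool result."""
    return token_entropy(probs) > threshold

# A peaked distribution (confident model) stays on a single rollout,
# while a near-uniform one (uncertain model) triggers branching.
peaked = [0.97, 0.01, 0.01, 0.01]   # entropy ~0.17 nats -> no branch
uniform = [0.25, 0.25, 0.25, 0.25]  # entropy ~1.39 nats -> branch
```

In the actual methods the entropy signal is computed over generated token distributions during rollout; the toy distributions here merely stand in for that.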
```mermaid
graph TB
    A[RUC-NLPIR Agent Family] --> B[🤖 Agentic RL]
    A --> C[🔍 Deep Research]
    A --> D[🛠️ Multi-Tool]
    A --> E[🎯 Domain-Specific]
    B --> B1[ARPO<br/>Weekly #1]
    B --> B2[AEPO<br/>Daily #2]
    C --> C1[Search-o1<br/>EMNLP 2025]
    C --> C2[WebThinker<br/>NeurIPS 2025]
    C --> C3[HiRA]
    C --> C4[HierSearch]
    D --> D1[DeepAgent]
    D --> D2[Tool-Star]
    D --> D3[ToolScope]
    D --> D4[Tool-Light]
    E --> E1[FinSight<br/>Finance]
    E --> E2[VideoExplorer<br/>Video]
    style A fill:#e1f5ff,stroke:#0066cc,stroke-width:3px
    style B fill:#fff0e6,stroke:#ff8c00,stroke-width:2px
    style C fill:#e6f7ff,stroke:#1890ff,stroke-width:2px
    style D fill:#f0ffe6,stroke:#52c41a,stroke-width:2px
    style E fill:#fff0f6,stroke:#eb2f96,stroke-width:2px
```
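The "decoupled planning and execution" pattern shared by HiRA and HierSearch separates a strategic planner (which only decomposes tasks) from executors (which only carry out sub-goals). A minimal illustrative sketch; every name below is hypothetical and not the projects' actual API:

```python
def plan(task: str) -> list[str]:
    """Strategic planner: break a deep-search task into ordered sub-goals.
    (Toy decomposition; a real planner would be an LLM call.)"""
    return [f"search: {task}", f"synthesize: {task}"]

def execute(step: str) -> str:
    """Domain-specific executor: carry out exactly one sub-goal."""
    action, _, payload = step.partition(": ")
    return f"[{action}] done for '{payload}'"

def run(task: str) -> list[str]:
    """Key property of the decoupling: the planner never executes,
    and the executor never re-plans."""
    return [execute(step) for step in plan(task)]
```

The design point is the interface between the two halves: the executor only sees individual sub-goals, so planning and execution modules can be swapped or scaled independently.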
If you find our work helpful, please cite the relevant papers:
🤖 Agentic Reinforcement Learning
```bibtex
@article{dong2025arpo,
  title   = {Agentic Reinforced Policy Optimization},
  author  = {Dong, Guanting and Mao, Hangyu and Ma, Kai and Bao, Licheng and
             Chen, Yifei and Wang, Zhongyuan and Chen, Zhongxia and Du, Jiazhen and
             Wang, Huiyang and Zhang, Fuzheng and Zhou, Guorui and Zhu, Yutao and
             Wen, Ji-Rong and Dou, Zhicheng},
  journal = {arXiv preprint arXiv:2507.19849},
  year    = {2025}
}

@article{dong2025aepo,
  title   = {Agentic Entropy-Balanced Policy Optimization},
  author  = {Dong, Guanting and Bao, Licheng and Wang, Zhongyuan and Zhao, Kangzhi and
             Li, Xiaoxi and Jin, Jiajie and Yang, Jinghan and Mao, Hangyu and
             Zhang, Fuzheng and Gai, Kun and Zhou, Guorui and Zhu, Yutao and
             Wen, Ji-Rong and Dou, Zhicheng},
  journal = {arXiv preprint arXiv:2510.14545},
  year    = {2025}
}
```

🔍 Deep Search & Research Agents
```bibtex
@inproceedings{li2025searcho1,
  title     = {Search-o1: Agentic Search-Enhanced Large Reasoning Models},
  author    = {Li, Xiaoxi and Dong, Guanting and Jin, Jiajie and Zhang, Yuyao and
               Zhou, Yujia and Zhu, Yutao and Zhang, Peitian and Dou, Zhicheng},
  booktitle = {EMNLP},
  year      = {2025}
}

@inproceedings{li2025webthinker,
  title     = {WebThinker: Empowering Large Reasoning Models with Deep Research Capability},
  author    = {Li, Xiaoxi and Jin, Jiajie and Dong, Guanting and Qian, Hongjin and
               Zhu, Yutao and Wu, Yongkang and Zhao, Yang and Dou, Zhicheng and Wen, Ji-Rong},
  booktitle = {NeurIPS},
  year      = {2025}
}

@article{jin2025hira,
  title   = {Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search},
  author  = {Jin, Jiajie and Li, Xiaoxi and Dong, Guanting and Zhang, Yuyao and
             Zhu, Yutao and Zhao, Yang and Qian, Hongjin and Dou, Zhicheng},
  journal = {arXiv preprint arXiv:2507.02652},
  year    = {2025}
}

@article{tan2025hiersearch,
  title   = {HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches},
  author  = {Tan, Jiejun and Dou, Zhicheng and Yu, Yan and Cheng, Jiehan and
             Zhao, Yang and Qian, Hongjin and Zhu, Yutao and Wen, Ji-Rong},
  journal = {arXiv preprint arXiv:2508.08088},
  year    = {2025}
}
```

🛠️ Multi-Tool & Multimodal Reasoning
```bibtex
@article{dong2025toolstar,
  title   = {Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Self-Critic RL},
  author  = {Dong, Guanting and Chen, Yifei and Li, Xiaoxi and Jin, Jiajie and
             Qian, Hongjin and Zhu, Yutao and Zhao, Yang and Dou, Zhicheng and Wen, Ji-Rong},
  journal = {arXiv preprint arXiv:2505.16410},
  year    = {2025}
}

@article{li2025deepagent,
  title   = {DeepAgent: A General Reasoning Agent with Scalable Toolsets},
  author  = {Li, Xiaoxi and Jiao, Wenxiang and Jin, Jiarui and Dong, Guanting and
             Jin, Jiajie and Wang, Yinuo and Wang, Hao and Zhu, Yutao and
             Wen, Ji-Rong and Lu, Yuan},
  journal = {arXiv preprint arXiv:2510.21618},
  year    = {2025}
}

@article{chen2025toollight,
  title   = {Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning},
  author  = {Chen, Yifei and others},
  journal = {arXiv preprint arXiv:2509.23285},
  year    = {2025}
}
```

🎯 Domain-Specific Agents
```bibtex
@article{jin2025finsight,
  title   = {FinSight: Towards Real-World Financial Deep Research},
  author  = {Jin, Jiajie and Zhang, Yuyao and Xu, Yimeng and Qian, Hongjin and
             Zhu, Yutao and Dou, Zhicheng},
  journal = {arXiv preprint arXiv:2510.16844},
  year    = {2025}
}

@article{yuan2025videoexplorer,
  title   = {Think With Videos For Agentic Long-Video Understanding},
  author  = {Yuan, Huaying and Liu, Zheng and Zhou, Junjie and Qian, Hongjin and
             Shu, Yan and Sebe, Nicu and Wen, Ji-Rong and Dou, Zhicheng},
  journal = {arXiv preprint arXiv:2506.10821},
  year    = {2025}
}
```

We welcome contributions from the community! Please see our Contributing Guidelines for details on:
- 🐛 Bug reports and feature requests
- 💻 Code contributions and pull requests
- 📖 Documentation improvements
- 🧪 New benchmarks and datasets
This project is released under the MIT License. Feel free to use our code and models for research and commercial purposes.
For questions, collaborations, or feedback, please reach out:
📧 Email: dou@ruc.edu.cn
🌐 Website: RUC-NLPIR Lab
💬 GitHub Issues: Report Issues
We thank all contributors and the open-source community for their invaluable support:
- 🤗 HuggingFace for hosting our models and datasets
- 🏢 OpenAI, Anthropic, Alibaba for foundational model research
- 🎓 Academic Community for valuable feedback and collaboration
- 👥 All Contributors who have helped improve this project
Made with ❤️ by RUC-NLPIR Lab