Web-CogReasoner introduces a paradigm shift from simply enhancing web agents to systematically building their cognitive abilities from the ground up. Inspired by Bloom's Taxonomy, we decompose agent capabilities into knowledge content learning (Factual, Conceptual) and cognitive processes (Procedural), enabling interpretable and goal-directed behavior. Built upon large multimodal models, Web-CogReasoner performs knowledge-driven Chain-of-Thought (CoT) reasoning across complex web tasks, where each reasoning step is transparently grounded in a specific knowledge type, ensuring both interpretability and robust generalization.
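As an illustration of what "grounded in a specific knowledge type" means in practice, here is a minimal, hypothetical sketch of a knowledge-tagged reasoning step; the class, field names, and example values are our own assumptions, not the released Web-CogReasoner interface.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Illustrative only: one knowledge-grounded CoT step.
# Names and fields are assumptions, not the released Web-CogReasoner API.
KnowledgeType = Literal["factual", "conceptual", "procedural"]

@dataclass
class ReasoningStep:
    knowledge_type: KnowledgeType       # which knowledge level grounds this step
    evidence: str                       # what the agent observed on the page
    thought: str                        # the natural-language inference drawn from it
    action: Optional[str] = None        # a concrete web action, emitted on procedural steps

# Example trace for a task like "add the cheapest laptop to the cart":
trace = [
    ReasoningStep("factual", "A 'Sort by price' control is visible above the results",
                  "The results page supports sorting by price."),
    ReasoningStep("conceptual", "Results are now sorted in ascending order",
                  "The first item in an ascending price sort is the cheapest laptop."),
    ReasoningStep("procedural", "The first result card exposes an 'Add to cart' button",
                  "Click 'Add to cart' on the first result.",
                  action="CLICK(element_id=17)"),
]
```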
To support this, we introduce:
- Web-CogKnowledge Framework: A Bloom's Taxonomy-inspired two-stage training paradigm (Knowledge Content Learning → Cognitive Reasoning) for enhancing web agents' cognitive abilities (a minimal training sketch follows this list).
- Web-CogReasoner: A knowledge-driven multimodal agent trained via imitation learning on our Web-CogDataset.
- Web-CogDataset: A curriculum-style dataset with 12 fine-grained tasks across 3 knowledge levels (Factual, Conceptual, Procedural), enabling stepwise skill acquisition.
- Web-CogBench: A dedicated benchmark for evaluating whether a web agent possesses the requisite prior knowledge and cognitive capabilities for effective web navigation.
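To give a concrete picture of the two-stage paradigm above, the following is a minimal, hypothetical sketch of a curriculum-style supervised fine-tuning loop; the stage names, dataset splits, and the `load_split`/`sft_update` helpers are illustrative assumptions rather than our released training code.

```python
# Hypothetical curriculum sketch of the two-stage paradigm:
# Stage 1 learns knowledge content (factual, conceptual); Stage 2 learns
# cognitive/procedural reasoning via imitation on demonstrated trajectories.
# `load_split` and `sft_update` are placeholder callables, not a released API.

CURRICULUM = [
    ("knowledge_content", ["factual", "conceptual"]),   # Stage 1
    ("cognitive_reasoning", ["procedural"]),            # Stage 2
]

def train(model, load_split, sft_update, epochs_per_stage=1):
    for stage_name, knowledge_levels in CURRICULUM:
        examples = [ex for level in knowledge_levels for ex in load_split(level)]
        for _ in range(epochs_per_stage):
            for ex in examples:
                # Standard supervised (next-token) imitation loss on the
                # demonstrated reasoning trace and action, given the observation.
                sft_update(model, prompt=ex["observation"], target=ex["trace"])
        print(f"finished {stage_name}: {len(examples)} examples")
```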
Last Updated: 2025-08-05 13:08 UTC+8
- Paper: Release the full research paper on arXiv.
- Code: Open-source the complete code for training and inference.
- Model: Publish the official Web-CogReasoner model weights.
- Dataset: Make the Web-CogDataset publicly available for community research.
- Benchmark: Launch a public online evaluation server for Web-CogBench to ensure fair comparisons.
[2025-08-05] Released the full research paper on arXiv.
Cognitive & Visual Benchmarks
This comparison highlights our model's strength in reasoning, a crucial capability that visual-centric models may lack.
| Model | Web-CogBench (Cognition) | VisualWebBench (Vision) |
|---|---|---|
| Proprietary Models | ||
| Claude Sonnet 4 | 76.8% | 85.9% |
| Gemini 2.5 Pro | 80.2% | 86.6% |
| Open-Source Models | ||
| Qwen2.5-VL-7B | 69.8% | 76.0% |
| UI-TARS-7B-SFT | 46.4% | 86.0% |
| Web-CogReasoner (Ours) | 82.9% | 86.3% |
Key Insight: While some models like UI-TARS excel at visual tasks (VisualWebBench: 86.0%), they struggle with reasoning-intensive tasks (Web-CogBench: 46.4%). This highlights that strong visual perception does not guarantee advanced cognitive capabilities, a gap our work aims to fill.
Online Web Task
This section evaluates the models' ability to perform complex, multi-step tasks in live web environments.
| Model | WebVoyager (Generalization) | Mind2Web (Cross-task) | Mind2Web (Cross-web) |
|---|---|---|---|
| Proprietary Models | |||
| Claude Sonnet 4 | 47.7% | 40.2% | 21.7% |
| Gemini 2.5 Pro | 54.9% | 37.5% | 25.5% |
| Open-Source Models | |||
| Qwen2.5-VL-7B | 2.2% | 1.0% | 1.0% |
| OpenWebVoyagerIL | 18.1% | 6.3% | 6.6% |
| Web-CogReasoner (Ours) | 30.2% | 17.0% | 10.1% |
Coming soon
@article{guo2025web,
title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
journal={arXiv preprint arXiv:2508.01858},
year={2025}
}

