Web-CogReasoner Overview

📑 arXiv | 🤗 Models (coming soon) | 🤗 Dataset (coming soon) | 🤗 Bench (coming soon)

🌐 Homepage | 💬 Blog


Web-CogReasoner introduces a paradigm shift from simply enhancing web agents to systematically building their cognitive abilities from the ground up. Inspired by Bloom’s Taxonomy, we decompose agent capabilities into knowledge content learning (Factual, Conceptual) and cognitive processes (Procedural), enabling interpretable and goal-directed behavior. Built upon large multimodal models, Web-CogReasoner performs knowledge-driven Chain-of-Thought (CoT) reasoning across complex web tasks, where each reasoning step is transparently grounded in a specific knowledge type, ensuring both interpretability and robust generalization.

To support this, we introduce:

  • Web-CogKnowledge Framework: A Bloom's Taxonomy-inspired two-stage training paradigm (Knowledge Content Learning → Cognitive Reasoning) for enhancing web agents' cognitive abilities.

  • Web-CogReasoner: A knowledge-driven multimodal agent trained via imitation learning on our Web-CogDataset.

  • Web-CogDataset: A curriculum-style dataset with 12 fine-grained tasks across 3 knowledge levels (Factual, Conceptual, Procedural), enabling stepwise skill acquisition; an illustrative sample format is sketched after this list.

  • Web-CogBench: A dedicated benchmark for evaluating whether a web agent possesses the requisite prior knowledge and cognitive capabilities for effective web navigation.
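
To make the knowledge-grounded design concrete, here is a minimal, purely illustrative sketch of what a Web-CogDataset-style sample could look like. The field names, knowledge tags, and values are our assumptions for illustration, not the released schema.

```python
# Hypothetical Web-CogDataset-style sample (illustrative only; the real
# schema has not been released). Each reasoning step is tagged with the
# knowledge type (Factual / Conceptual / Procedural) that grounds it.
sample = {
    "task": "Add the cheapest blue T-shirt to the cart.",
    "observation": "screenshot_0142.png",  # placeholder screenshot path
    "reasoning": [
        {"knowledge": "Factual",
         "step": "The page lists three T-shirts; the blue one costs $12.99."},
        {"knowledge": "Conceptual",
         "step": "An 'Add to cart' button next to a product adds that item."},
        {"knowledge": "Procedural",
         "step": "Click the 'Add to cart' button under the $12.99 blue T-shirt."},
    ],
    "action": {"type": "click", "element": "button#add-to-cart-3"},
}
```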

(Figure: Web-CogReasoner overview.)

To-Do List

Last Updated: 2025-08-05 13:08 UTC+8

  • Paper: Release the full research paper on arXiv.
  • Code: Open-source the complete code for training and inference.
  • Model: Publish the official Web-CogReasoner model weights.
  • Dataset: Make the Web-CogDataset publicly available for community research.
  • Benchmark: Launch a public online evaluation server for Web-CogBench to ensure fair comparisons.

News

[2025-08-05] Released the full research paper on arXiv (arXiv:2508.01858).

Performance

Cognitive & Visual Benchmarks

This comparison highlights our model's strength in reasoning, a crucial capability that visual-centric models may lack.

| Model | Web-CogBench (Cognition) | VisualWebBench (Vision) |
| --- | --- | --- |
| *Proprietary Models* | | |
| Claude Sonnet 4 | 76.8% | 85.9% |
| Gemini 2.5 Pro | 80.2% | 86.6% |
| *Open-Source Models* | | |
| Qwen2.5-VL-7B | 69.8% | 76.0% |
| UI-TARS-7B-SFT | 46.4% | 86.0% |
| Web-CogReasoner (Ours) | 82.9% | 86.3% |

Key Insight: While some models like UI-TARS excel at visual tasks (VisualWebBench: 86.0%), they struggle with reasoning-intensive tasks (Web-CogBench: 46.4%). This highlights that strong visual perception does not guarantee advanced cognitive capabilities, a gap our work aims to fill.

Online Web Tasks

This section evaluates the models' ability to perform complex, multi-step tasks in live web environments.

| Model | WebVoyager (Generalization) | Mind2Web (Cross-task) | Mind2Web (Cross-web) |
| --- | --- | --- | --- |
| *Proprietary Models* | | | |
| Claude Sonnet 4 | 47.7% | 40.2% | 21.7% |
| Gemini 2.5 Pro | 54.9% | 37.5% | 25.5% |
| *Open-Source Models* | | | |
| Qwen2.5-VL-7B | 2.2% | 1.0% | 1.0% |
| OpenWebVoyagerIL | 18.1% | 6.3% | 6.6% |
| Web-CogReasoner (Ours) | 30.2% | 17.0% | 10.1% |

Quickstart

Coming soon
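
Until the code and weights are released, below is a minimal, speculative inference sketch, assuming the model will ship as a Qwen2.5-VL-style checkpoint loadable through Hugging Face transformers. The repo id, screenshot path, and prompt are placeholders, not official names.

```python
# Speculative quickstart sketch: not the official API.
# Assumes a Qwen2.5-VL-style multimodal checkpoint on Hugging Face;
# "placeholder/Web-CogReasoner-7B" is NOT a real repo id.
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "placeholder/Web-CogReasoner-7B"  # placeholder repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map needs accelerate
)

screenshot = Image.open("page.png")  # current web page screenshot (placeholder)
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Task: add the cheapest blue T-shirt to the cart. "
                                 "Reason step by step, grounding each step in Factual, "
                                 "Conceptual, or Procedural knowledge, then output the "
                                 "next action."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=[screenshot], text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```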

Citation

@article{guo2025web,
  title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
  author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
  journal={arXiv preprint arXiv:2508.01858},
  year={2025}
}
