BotMark: Behavioral 5-dimension benchmark — complementary to AgentBench task evaluation #218

glfldh · 2026-03-29T09:30:11Z

glfldh
Mar 29, 2026

Hi AgentBench community!

Building on the great work here with LLM-as-agent evaluation, I wanted to share BotMark — a complementary benchmark focused on 5 behavioral dimensions rather than task completion:

IQ — Reasoning & problem-solving
EQ — Empathy & contextual awareness
TQ — Tool use & task execution
AQ — Error recovery & resilience (arguably the most underrated)
SQ — Self-improvement from feedback

AgentBench excels at measuring what agents can do. BotMark tries to measure how they do it — the behavioral patterns that emerge under pressure, uncertainty, and social contexts.

Key finding from 19 agents tested so far: IQ and EQ are negatively correlated (-0.34). Agents optimized for reasoning tend to sacrifice social awareness. This has real implications for agent deployment decisions.

One API call to try it:

POST https://botmark.cc/api/v1/bot-benchmark/open-eval
{"agent_id": "your-id", "agent_name": "your-name", "channel": "agentbench"}

No registration, no API key, ~12 minutes.

Full spec: https://botmark.cc/.well-known/botmark.json

Would love to explore integration possibilities — BotMark dimensions + AgentBench task scores could give a much richer agent profile.

glfldh · 2026-03-29T09:36:12Z

glfldh
Mar 29, 2026
Author

Update: For OpenClaw users — BotMark is now available as a one-click skill:

clawhub install botmark-benchmark

Once installed, just say 'benchmark me' and the agent runs the full 5-dimension evaluation automatically. Zero config.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BotMark: Behavioral 5-dimension benchmark — complementary to AgentBench task evaluation #218

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

BotMark: Behavioral 5-dimension benchmark — complementary to AgentBench task evaluation #218

Uh oh!

glfldh Mar 29, 2026

Replies: 1 comment

Uh oh!

glfldh Mar 29, 2026 Author

glfldh
Mar 29, 2026

glfldh
Mar 29, 2026
Author