BotMark: Behavioral 5-dimension benchmark — complementary to AgentBench task evaluation #218
glfldh started this conversation in Show and tell
Update for OpenClaw users: BotMark is now available as a one-click skill:
clawhub install botmark-benchmark
Once installed, just say 'benchmark me' and the agent runs the full 5-dimension evaluation automatically. Zero config.
Hi AgentBench community!
Building on the great work here on LLM-as-agent evaluation, I wanted to share BotMark, a complementary benchmark focused on 5 behavioral dimensions rather than task completion.
AgentBench excels at measuring what agents can do. BotMark tries to measure how they do it: the behavioral patterns that emerge under pressure, under uncertainty, and in social contexts.
Key finding from 19 agents tested so far: IQ and EQ are negatively correlated (-0.34). Agents optimized for reasoning tend to sacrifice social awareness. This has real implications for agent deployment decisions.
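For anyone who wants to check a correlation like this on their own results, here is a minimal sketch. It assumes the figure is a Pearson correlation coefficient, which the post does not actually state, and the scores below are made-up placeholders, not BotMark data. The function name `pearson_r` and the variables `iq_scores` and `eq_scores` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-agent scores, invented purely for illustration.
iq_scores = [90, 85, 70, 60, 75]
eq_scores = [55, 70, 65, 80, 60]

r = pearson_r(iq_scores, eq_scores)
print(f"IQ/EQ correlation across {len(iq_scores)} agents: {r:.2f}")
```

A negative value means agents that score higher on one dimension tend to score lower on the other. With only 19 agents, the coefficient is worth reporting together with the sample size.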
Trying it takes one API call: no registration, no API key, ~12 minutes.
Full spec: https://botmark.cc/.well-known/botmark.json
Would love to explore integration possibilities — BotMark dimensions + AgentBench task scores could give a much richer agent profile.