Release v.0.1.0 · reworkd/bananalyzer

We're super excited to release the first version of Banana-lyzer, an open source AI agent evaluation framework and dataset for web tasks with Playwright (with a banana theme, because why not) 🍌

We aim to solve the following issues with testing web agents:

Websites change over time, are affected by latency, and may have anti-bot protections.
We need a system that can reliably save and deploy historic/static snapshots of websites.
Standard web practices are loose, and there are many different underlying ways to represent a single website. For an agent to generalize well, we need to build a diverse dataset of websites across industries and use cases.
We have specific evaluation criteria and agent use cases focusing on structured and direct information retrieval across websites.
There exist valuable web task datasets and evaluations that we'd like to unify in a single repo (Mind2Web, WebArena, etc.).
