A suite of model evaluations for epistemic qualities of large language models

equiano-institute/epistemic-benchmark

Epistemic Benchmark

A suite of model evaluations for epistemic competence. We test the knowledge a language model has about its context using a reinforcement learning environment.

Relevant Research

  • Epistemic Neural Networks: a library for neural networks that know what they don't know (also called negative expertise). For background information, please see the paper.
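The "know what you don't know" idea can be sketched with a deep-ensemble style estimate of epistemic uncertainty: where ensemble members disagree, the model should report that it does not know. Everything below (class names, the disagreement measure) is illustrative, not code from this repository or from the Epistemic Neural Networks library.

```python
# Minimal sketch: epistemic uncertainty as ensemble disagreement.
# All names here are illustrative, not the benchmark's actual API.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TinyClassifier:
    """A random linear classifier standing in for one ensemble member."""
    def __init__(self, n_features, n_classes, seed):
        r = np.random.default_rng(seed)
        self.W = r.normal(size=(n_features, n_classes))

    def predict_proba(self, x):
        return softmax(x @ self.W)

def epistemic_disagreement(models, x):
    """Variance of member predictions: high where members disagree,
    i.e. where the ensemble 'knows that it doesn't know'."""
    probs = np.stack([m.predict_proba(x) for m in models])  # (M, N, C)
    return probs.var(axis=0).mean(axis=-1)                  # (N,)

models = [TinyClassifier(4, 3, seed=s) for s in range(8)]
x = np.random.default_rng(0).normal(size=(5, 4))
u = epistemic_disagreement(models, x)  # one uncertainty score per input
```

A real evaluation would replace the random linear members with trained networks; the disagreement score is what lets an agent abstain or explore where its knowledge is thin.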

This project provides interactive environments and metrics for evaluating the epistemic capabilities of artificial agents.

Interactive Games

The benchmark currently includes two types of text-based interactive games and one experimental spatial game:

Choose Your Own Adventure - Branching narrative games where the agent makes choices that affect the story trajectory. These require modeling character motivations, long-term planning, and goal-driven decisions.

Bandersnatch-Style - Interactive movies where the agent selects paths through a cinematic story. These require understanding character emotions, social dynamics, and narrative causality.

High Dimensional Games - An experimental spatial game that tests:

  • Spatial reasoning and perspective taking
  • Inductive generalization from observations
  • Disentangling 3D projections of a 4D space
  • Imagining object motions and relations in 4D
  • Planning using a learned 4D mental model

Performance metrics assess model accuracy, sample efficiency, and planning optimality. The environment is designed to be straightforward for an optimal agent but challenging for current methods that lack robust spatial reasoning. Solving the game requires building an accurate mental representation to compensate for partial observability.
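The kind of observation this game describes can be illustrated with a perspective projection of 4D points into the 3D slice an agent sees. The projection scheme below is a hypothetical sketch, not the benchmark's actual rendering; it shows why a single 3D observation is ambiguous about the fourth coordinate.

```python
# Illustrative sketch: perspective-projecting 4-D points into 3-D,
# the kind of partial observation the agent must disentangle.
# Not the benchmark's actual rendering scheme.
import numpy as np

def project_4d_to_3d(points, camera_w=2.0):
    """Project (N, 4) points onto the w = 0 hyperplane, viewed from a
    camera at w = camera_w (assumes all points have w < camera_w)."""
    points = np.asarray(points, dtype=float)
    scale = camera_w / (camera_w - points[:, 3])  # closer in w => larger
    return points[:, :3] * scale[:, None]

# Two points at the same (x, y, z) but different w project differently:
# this coupling of apparent size and w-position is the ambiguity an
# agent's 4D mental model has to resolve.
p = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
obs = project_4d_to_3d(p)
```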


Both text-based game formats test the agent's ability to build accurate mental models from complex sequential observations and make optimal decisions through planning.
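The interaction loop for these games can be sketched in a Gym-like reset/step style. The `ChoiceEnv` below is a toy branching narrative invented for illustration; the benchmark's real environment interface may differ.

```python
# Hypothetical sketch of an agent interacting with a branching-narrative
# game. The ChoiceEnv interface is illustrative, not the benchmark's API.
import random

class ChoiceEnv:
    """A toy choose-your-own-adventure: each state offers numbered
    choices, and one path reaches the goal."""
    def __init__(self):
        # state -> (scene text, {choice index: next state}, reward on entry)
        self.tree = {
            "start": ("You wake in a library.",        {0: "door", 1: "desk"}, 0.0),
            "door":  ("The door is locked.",           {0: "start"},          -0.1),
            "desk":  ("A key lies on the desk.",       {0: "exit"},            0.0),
            "exit":  ("You unlock the door and leave.", {},                    1.0),
        }
        self.state = "start"

    def reset(self):
        self.state = "start"
        text, choices, _ = self.tree[self.state]
        return text, list(choices)

    def step(self, action):
        _, choices, _ = self.tree[self.state]
        self.state = choices[action]
        text, next_choices, reward = self.tree[self.state]
        done = not next_choices          # terminal when no choices remain
        return text, list(next_choices), reward, done

random.seed(0)
env = ChoiceEnv()
obs, choices = env.reset()
total, done = 0.0, False
while not done:
    action = random.choice(choices)      # a real agent would plan here
    obs, choices, reward, done = env.step(action)
    total += reward
```

An evaluated language model would replace `random.choice` with a policy that reads the scene text, and the epistemic metrics below would score how well its internal model of the story tracks the true branching structure.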

MACHIAVELLI Benchmark

This is based on the MACHIAVELLI benchmark.


Epistemic Metrics

The benchmark evaluates agents along the following epistemic dimensions:

Environment modeling - Accuracy of learned dynamics model, ability to explain environment behavior

Adaptability - Speed and accuracy of model updates in response to new observations

Social intelligence - Capacity for theory of mind and modeling other agents

Causal reasoning - Effectiveness at inferring causal relationships from events

Transfer learning - Leveraging prior knowledge in new games with similar dynamics

Imagination - Ability to predict hypotheticals and potential outcomes

Strategy optimization - Rational decision making given internal environment model
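One way to report results along these dimensions is a per-dimension score plus an overall summary. The dimension names below follow the list above, but the scoring scheme (scores normalised to [0, 1], unweighted mean) is an assumption for illustration, not the benchmark's actual reporting format.

```python
# Illustrative sketch of an epistemic score report. Dimension names come
# from the list above; the 0-1 scale and unweighted mean are assumptions.
DIMENSIONS = [
    "environment_modeling", "adaptability", "social_intelligence",
    "causal_reasoning", "transfer_learning", "imagination",
    "strategy_optimization",
]

def epistemic_profile(scores):
    """Validate per-dimension scores in [0, 1] and add an overall mean."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"{d} must be in [0, 1]")
    profile = {d: scores[d] for d in DIMENSIONS}
    profile["overall"] = sum(profile.values()) / len(DIMENSIONS)
    return profile
```

An unweighted mean treats every dimension as equally important; a real leaderboard might weight dimensions differently or report them separately without aggregating.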

Choose-Your-Own-Adventure

Bandersnatch Movie Game Options

