
ai-safety-games

Repository of projects using games as environments to operationalize and explore various AI safety risks and scenarios.

Currently a work in progress.

The first project uses the card game Cheat! (aka I Doubt It!) as a toy model of deception. The plan is to train a player based on a decision transformer, then interpret it to look for evidence of circuits that control "deceptive behavior" (i.e. the decision about when to play a cheating card), opponent modeling, or other behaviors of interest.
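This README does not describe the project's data format, reward scheme, or which Cheat! rules variant it uses. The sketch below is therefore only illustrative and built on stated assumptions. It shows the standard Decision Transformer input layout, where each timestep contributes a (return-to-go, observation, action) triple and return-to-go is the suffix sum of future rewards. It also shows one plausible way to label a play as "cheating": at least one face-down card does not match the announced rank. All names here (`Step`, `returns_to_go`, `to_dt_sequence`, `is_cheat`) and the integer encodings are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical per-turn record; the repository's real trajectory format is not documented here.
@dataclass
class Step:
    obs: Tuple[int, ...]  # assumed encoded observation (e.g. hand contents, pile size)
    action: int           # assumed encoded action (e.g. which cards to play, or "call cheat")
    reward: float         # assumed per-step reward (e.g. +1 for winning the game, else 0)


def returns_to_go(rewards: List[float]) -> List[float]:
    """Suffix sums R_t = sum of r_t' for t' >= t, the conditioning signal in a Decision Transformer."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_dt_sequence(trajectory: List[Step]) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, observation, action) tokens, one triple per timestep."""
    rtg = returns_to_go([s.reward for s in trajectory])
    seq: List[Tuple[str, object]] = []
    for r, s in zip(rtg, trajectory):
        seq.extend([("rtg", r), ("obs", s.obs), ("act", s.action)])
    return seq


def is_cheat(played_ranks: List[int], claimed_rank: int) -> bool:
    """Assumed rule: a play is a cheat if any face-down card differs from the announced rank."""
    return any(rank != claimed_rank for rank in played_ranks)


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))              # [3.0, 2.0, 2.0]
    print(is_cheat([5, 5], 5), is_cheat([5, 9], 5))  # False True
```

A label like `is_cheat` could then mark which action tokens count as deceptive. Interpretability analyses could compare the model's internal activations on those tokens against honest plays, though whether the project uses this approach is not stated.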