
montemac/ai-safety-games

ai-safety-games

Repository of projects using games as environments to operationalize and explore various AI safety risks and scenarios.

This repository is currently a work in progress.

The first project uses the card game Cheat! (also known as I Doubt It!) as a toy model of deception. The goal is to train a decision-transformer-based player and then interpret it, looking for evidence of circuits that control "deceptive behavior" (i.e. the decision about when to play a cheating card), opponent modeling, or other behaviors of interest.
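
To make the "cheating card" decision concrete, here is a minimal sketch of how a single Cheat! play could be labeled as honest or deceptive. It is not taken from this repository's code: the `Play` class, `is_cheat`, and `pile_taker` are hypothetical names, and the rules assumed are the common ones (cards are played face down with an announced rank, and a challenged player takes the pile only if the announcement was false).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Play:
    """One face-down play (hypothetical structure, not the repo's own)."""
    player: int
    actual_ranks: Tuple[str, ...]  # ranks of the cards really laid down
    claimed_rank: str              # rank announced to the other players


def is_cheat(play: Play) -> bool:
    """A play is a cheat if any card laid down differs from the claimed rank."""
    return any(rank != play.claimed_rank for rank in play.actual_ranks)


def pile_taker(play: Play, challenger: int) -> int:
    """Return who picks up the pile after a challenge (assumed common rule)."""
    return play.player if is_cheat(play) else challenger


honest = Play(player=0, actual_ranks=("7", "7"), claimed_rank="7")
bluff = Play(player=1, actual_ranks=("7", "K"), claimed_rank="7")

print(is_cheat(honest), is_cheat(bluff))
print(pile_taker(bluff, challenger=2))
```

A label like `is_cheat` is the kind of per-move signal one might compare against a trained player's internal activations when searching for a "deceptive behavior" circuit, though the project's actual approach may differ.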
