Popular repositories
- alignment_faking_public Public Forked from rgreenblatt/model_organism_public
- Text-Steganography-Benchmark Public
Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.
Repositories
Showing 10 of 24 repositories
- secret-keeping Public
- basharena_public Public
A pared-down public clone of the BashArena repo that contains dataset generation code
- subversion-strategy-eval Public
- bench-af-2 Public
Bench Alignment Faking: Alignment faking model organisms, detectors, and environments to catch misaligned models (Joshua Clymer MATS Stream Summer 2025)
- redwood-control-arena Public Forked from UKGovernmentBEIS/control-arena
Fork of ControlArena for Redwood Research's Control purposes