Kai Fronsdal (kaifronsdal)

safety-research/petri

An alignment auditing agent capable of quickly exploring alignment hypotheses

Python, 889 stars, 127 forks

UKGovernmentBEIS/inspect_ai

Inspect: A framework for large language model evaluations

Python, 1.7k stars, 395 forks

Self-Reasoning-Evals

MISR: Measuring Instrumental Self-Reasoning in Frontier Models

Jupyter Notebook, 10 stars

VLM-Reward-Hacking

Python, 1 star, 1 fork