LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.
Updated Mar 9, 2026
👟 SUP: Sycophancy Under Pressure
Behavioral auditing & repair toolkit for LLMs. Measures 8 dimensions via confidence probes.
This repo shows how sycophancy in LLMs can be coded as a Bayesian latent model.
Adversarial testing of LLMs on constraint satisfaction deadlocks
Rigorous framework for evaluating AI alignment properties — sycophancy, corrigibility, deception, goal stability, and power-seeking — with statistical confidence intervals
Official code for "From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems" (IWSDS 2026)
A prompt that uses conceptual scaffolding to prevent contextual bleed between the active discussion and the meta-object being discussed.
Systematic probing toolkit for alignment-relevant LLM behaviors: sycophancy, sandbagging, power-seeking, deceptive alignment, and corrigibility failures
A deterministic safety layer for probabilistic AI systems — preventing delusion reinforcement and AI-induced psychological harm through immutable governance