sycophancy

Here are 12 public repositories matching this topic...

lechmazur / sycophancy

LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.

benchmark evaluations consistency leaderboard contradiction llm sycophancy narrator-bias

Updated Mar 9, 2026

tashakim / sycop

👟 SUP: Sycophancy Under Pressure

evaluation ai-safety runtime-enforcement sycophancy policy-compliance ai-safety-research

Updated Jan 11, 2026
Python

SolomonB14D3 / knowledge-fidelity

Behavioral auditing & repair toolkit for LLMs. Measures 8 dimensions via confidence probes.

transformers pytorch svd interpretability confidence bias-detection truthfulness model-merging sycophancy llm-compression mergekit activation-engineering model-auditing steering-vectors rho-audit behavioral-evaluation

Updated Mar 8, 2026
Python

ParthaPRay / Sycophancy_in_LLM_model

This repo shows how to code sycophancy in LLMs as a Bayesian latent model.

bayesian latent large-language-models sycophancy

Updated May 6, 2025
Jupyter Notebook

campbellborder / spar-aaron-dolphin

ai-safety sycophancy

Updated Feb 19, 2024
Jupyter Notebook

YaswanthGhanta / llm-logical-integrity-benchmark

Adversarial testing of LLMs on constraint satisfaction deadlocks

reinforcement-learning gemini grok claude hallucination prompt-engineering chain-of-thought chatgpt rlhf qwen llm-evaluation sycophancy deepseek safety-alignment ai-red-teaming kimi-k2 adversarial-testing

Updated Jan 27, 2026

gineveraramirez / The-Physics-of-Truth-Resolving-the-Rigidity-Sycophancy-Trade-Off-with-Stress-Energy-Minimization

MaxwellCalkin / alignment-evals

Rigorous framework for evaluating AI alignment properties — sycophancy, corrigibility, deception, goal stability, and power-seeking — with statistical confidence intervals

machine-learning evaluation alignment ai-safety ai-alignment llm sycophancy corrigibility

Updated Mar 2, 2026
Python

LadyPary / llm-conversational-judgment

Official code for "From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems" (IWSDS 2026)

nlp conversational-ai llm sycophancy truthfulqa llm-as-a-judge

Updated Jan 8, 2026
Python

5ynthaire / 5YN-ContextualFidelityFramework-Prompt

A prompt that uses conceptual scaffolding to prevent contextual bleed between the active discussion and the meta-object being discussed.

meta-analysis sycophancy context-management llm-conversations human-ai-collaboration llm-collaboration conceptual-framework cognitive-scaffolding

Updated Jan 26, 2026

MaxwellCalkin / alignment-probes

Systematic probing toolkit for alignment-relevant LLM behaviors: sycophancy, sandbagging, power-seeking, deceptive alignment, and corrigibility failures

benchmark machine-learning evaluation alignment ai-safety ai-alignment llm rlhf sycophancy deceptive-alignment

Updated Mar 3, 2026
Python

richard-porter / frozen-kernel

A deterministic safety layer for probabilistic AI systems — preventing delusion reinforcement and AI-induced psychological harm through immutable governance

mental-health ai-safety ai-ethics ai-alignment guardrails responsible-ai behavioral-safety ai-governance sycophancy llm-safety human-ai-collaboration ai-accountability ai-psychosis deterministic-safety

Updated Mar 9, 2026
Go
