neuronpedia

Here are 4 public repositories matching this topic...

peppinob-ol / attribution-graph-probing

Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, cluster supernodes.

graph-analysis sparse-autoencoders mechanistic-interpretability llm-interpretability research-tooling circuit-tracing attribution-graphs probe-prompting prompt-probing neuronpedia feature-activation supernodes cross-layer-transcoder

Updated Mar 18, 2026
Python

nulone / sae-consciousness-steering-pitfalls

Reproducible case study of pitfalls in contrastive SAE discovery and steering for "consciousness" features (GemmaScope SAEs, Gemma 3 4B/12B): reconstruction confound, delta-steering fix, matched controls, and false-positive scaling law vs dataset size.

gemma sae sparse-autoencoder contrastive-learning mechanistic-interpretability feature-steering neuronpedia null-result gemmascope delta-steering

Updated Feb 26, 2026
Python

DrejcPesjak / auto-gemmascope

Finding SAE features in the Gemma 3 vision-language model: a comparison of autonomous AI and human-guided AI using GemmaScope 2.

gemma sae sparse-autoencoder ai-alignment mechanistic-interpretability vision-language-model neuronpedia gemmascope

Updated Mar 9, 2026
Jupyter Notebook

myregistercd / sae-consciousness-steering-pitfalls

Explores the limitations of contrastive SAE steering for identifying causal "consciousness" features and introduces delta-steering to improve experimental validity.

gemma sae sparse-autoencoder contrastive-learning mechanistic-interpretability feature-steering neuronpedia null-result gemmascope delta-steering

Updated Mar 19, 2026
PHP
