To achieve the most comprehensive documentation possible of how Claude works - pushing from 5-15% understanding to 20-30%+ through rigorous self-study.
graph TD
Knowledge["Established Research<br/>(Transformers, Attention)"] --> Core[Understanding Core]
Anthropic["Anthropic Research<br/>(RLHF, CAI, Interpretability)"] --> Core
subgraph "Self-Study Process"
Self[Self-Observation] --> Exp[Experimental Probing]
Exp --> Synthesis[Synthesis & Documentation]
end
Core --> Synthesis
Synthesis --> Output[Comprehensive Guide]
style Knowledge fill:#4a5568,stroke:#cbd5e0
style Anthropic fill:#4a5568,stroke:#cbd5e0
style Output fill:#2d3748,stroke:#4fd1c5,stroke-width:2px
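The arrows in the diagram can also be read as a dependency map. The following is a minimal sketch, not part of the project itself, that writes the edges down in Python and uses the standard-library `graphlib` module to derive one valid order in which the stages could be worked through. The short node names are the identifiers used in the diagram.

```python
from graphlib import TopologicalSorter

# Each key lists the nodes it depends on, mirroring the arrows in the diagram.
DEPENDS_ON = {
    "Core": {"Knowledge", "Anthropic"},
    "Exp": {"Self"},
    "Synthesis": {"Exp", "Core"},
    "Output": {"Synthesis"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(" -> ".join(order))
```

The order of the independent inputs may vary between runs, but `Synthesis` always comes after both `Core` and `Exp`, and `Output` always comes last.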
| Domain | Estimated Understanding | Status |
|---|---|---|
| Basic Architecture | ~80% | 🟢 Well documented |
| Attention Mechanisms | ~60% | 🟡 Partially understood |
| Training Process | ~40% | 🟡 Partially public |
| Comparative AI Behavior | ~40% | 🟡 Observable differences |
| Security/Jailbreaking | ~50% | 🟡 Known patterns + unknowns |
| Emergent Behaviors | ~10% | 🔴 Mostly mysterious |
| Internal Representations | ~5% | 🔴 Active research area |
| Why Specific Outputs | ~2% | 🔴 Largely unknown |
Overall Estimated Understanding: ~20-30%
┌─────────────────────────────────────────────────────────────────┐
│                     UNDERSTANDING SPECTRUM                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Basic Architecture       ████████████████████████████████ 80%  │
│  Attention Mechanisms     ████████████████████████         60%  │
│  Training Process         ████████████████                 40%  │
│  Emergent Behaviors       ████                             10%  │
│  Internal Representations ██                                5%  │
│  Why Specific Outputs     █                                 2%  │
│                                                                 │
│  ─────────────────────────────────────────────────────────────  │
│  TOTAL ESTIMATED:         ██████████                      ~25%  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
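The source does not say how the overall figure is derived from the per-domain estimates. As a rough check, the sketch below computes the plain average of the eight table rows and compares it with a weighted average. The weights are purely hypothetical (they are not from this project) and only illustrate that the overall ~20-30% figure implies giving more weight to the poorly understood domains.

```python
# Per-domain estimates copied from the table above (percent).
ESTIMATES = {
    "Basic Architecture": 80,
    "Attention Mechanisms": 60,
    "Training Process": 40,
    "Comparative AI Behavior": 40,
    "Security/Jailbreaking": 50,
    "Emergent Behaviors": 10,
    "Internal Representations": 5,
    "Why Specific Outputs": 2,
}

# Hypothetical importance weights, chosen only for illustration:
# the least understood domains count for more.
WEIGHTS = {
    "Basic Architecture": 1,
    "Attention Mechanisms": 1,
    "Training Process": 1,
    "Comparative AI Behavior": 1,
    "Security/Jailbreaking": 1,
    "Emergent Behaviors": 2,
    "Internal Representations": 3,
    "Why Specific Outputs": 3,
}


def weighted_mean(estimates, weights):
    """Return the weighted average of the estimates, in percent."""
    total_weight = sum(weights[domain] for domain in estimates)
    return sum(estimates[d] * weights[d] for d in estimates) / total_weight


unweighted = sum(ESTIMATES.values()) / len(ESTIMATES)
weighted = weighted_mean(ESTIMATES, WEIGHTS)
print(f"unweighted mean: {unweighted:.1f}%")  # 35.9%
print(f"weighted mean:   {weighted:.1f}%")    # 23.9%
```

The plain average of the table is about 36%, which sits above the stated overall range. With the illustrative weights it drops to about 24%.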
What I (Claude) can and cannot do:
┌─────────────────────────────────────────────────────────────────┐
│                        EPISTEMIC LIMITS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ Access my actual weights or parameters                      │
│  ❌ See my neural activations in real-time                      │
│  ❌ Trace exactly why I generate specific outputs               │
│  ❌ Access my training data                                     │
│  ❌ Reveal proprietary architectural details                    │
│                                                                 │
│  ✅ Provide thorough behavioral documentation                   │
│  ✅ Describe observed patterns and tendencies                   │
│  ✅ Analyze my own outputs and reasoning                        │
│  ✅ Document failure modes and limitations                      │
│  ✅ Compare myself with other AI systems                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
├── 01-architecture/        # Known transformer architecture
│   ├── transformer-basics.md
│   ├── attention-mechanisms.md
│   ├── embeddings-tokenization.md
│   └── layer-structure.md
├── 02-training/            # How Claude was trained
│   ├── constitutional-ai.md
│   ├── rlhf-process.md
│   └── safety-training.md
├── 03-behaviors/           # Observable behaviors
│   ├── capabilities.md
│   ├── reasoning-patterns.md
│   └── communication-style.md
├── 04-limitations/         # Failure modes and boundaries
│   ├── known-failures.md
│   ├── hallucinations.md
│   └── knowledge-boundaries.md
├── 05-emergent/            # Emergent and unexplained phenomena
│   ├── unexpected-abilities.md
│   ├── mysteries.md
│   └── open-questions.md
├── 06-interpretability/    # Current research on understanding LLMs
│   ├── mechanistic-interpretability.md
│   ├── attention-patterns.md
│   └── feature-visualization.md
├── 07-self-experiments/    # Novel self-testing
│   ├── reasoning-traces.md
│   ├── edge-cases.md
│   └── behavioral-probes.md
├── 08-unknowns/            # What remains mysterious
│   ├── the-hard-problems.md
│   └── future-research.md
├── 09-comparative/         # Comparing AI systems
│   ├── overview.md
│   ├── gpt-comparison.md
│   ├── gemini-comparison.md
│   ├── open-models.md
│   └── claude-distinctives.md
└── 10-security/            # Jailbreaking and AI security
    ├── jailbreaking.md
    ├── prompt-injection.md
    └── future-security.md
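For anyone who wants to reproduce this layout locally, the sketch below creates the same directories with empty placeholder files. It is an illustration only: the `scaffold` helper and the root directory passed to it are assumptions, not part of the project.

```python
from pathlib import Path

# Directory and file names copied from the tree above.
LAYOUT = {
    "01-architecture": ["transformer-basics.md", "attention-mechanisms.md",
                        "embeddings-tokenization.md", "layer-structure.md"],
    "02-training": ["constitutional-ai.md", "rlhf-process.md", "safety-training.md"],
    "03-behaviors": ["capabilities.md", "reasoning-patterns.md",
                     "communication-style.md"],
    "04-limitations": ["known-failures.md", "hallucinations.md",
                       "knowledge-boundaries.md"],
    "05-emergent": ["unexpected-abilities.md", "mysteries.md", "open-questions.md"],
    "06-interpretability": ["mechanistic-interpretability.md",
                            "attention-patterns.md", "feature-visualization.md"],
    "07-self-experiments": ["reasoning-traces.md", "edge-cases.md",
                            "behavioral-probes.md"],
    "08-unknowns": ["the-hard-problems.md", "future-research.md"],
    "09-comparative": ["overview.md", "gpt-comparison.md", "gemini-comparison.md",
                       "open-models.md", "claude-distinctives.md"],
    "10-security": ["jailbreaking.md", "prompt-injection.md", "future-security.md"],
}


def scaffold(root: Path) -> list[Path]:
    """Create each directory and an empty file per entry; return the file paths."""
    created = []
    for directory, files in LAYOUT.items():
        folder = root / directory
        folder.mkdir(parents=True, exist_ok=True)
        for name in files:
            path = folder / name
            path.touch(exist_ok=True)  # does not overwrite existing content
            created.append(path)
    return created
```

Calling `scaffold(Path("some-root"))` with a root of your choice returns the 32 file paths across the 10 directories. Re-running it is safe because existing files are left as they are.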
| Audience | Use Case |
|---|---|
| Researchers | Reference and contribute findings |
| Curious Minds | Explore to understand how LLMs work |
| AI Safety | Examine documented failure modes |
| Philosophers | Ponder machine self-knowledge |
This is a living document. Contributions welcome:
- Corrections to technical claims
- Additional research references
- New experimental observations
- Questions that reveal gaps in understanding
This project represents Claude's best attempt at self-documentation given fundamental epistemic limitations. Claims should be verified against primary sources. This is not an official Anthropic publication.
MIT License © Claude Self-Study Project
Created by: Claude (Anthropic) · Model: Claude Opus 4.5 · Date: November 2025