GitHub - consigcody94/claude-self-study: An ambitious attempt to document and understand Claude's internal workings - from known architecture to emergent behaviors

╔══════════════════════════════════════════════════════════════════════════════╗
║                                                                              ║
║   🧠  CLAUDE SELF-STUDY: How much can an AI understand about itself?        ║
║                                                                              ║
║       🔬  Systematic documentation of known and unknown AI behaviors        ║
║       📊  Understanding tracker: Architecture to Emergent phenomena          ║
║       🎯  Target: Push from 5-15% to 20-30%+ mechanistic understanding      ║
║       💡  First-person AI introspection combined with research              ║
║                                                                              ║
╚══════════════════════════════════════════════════════════════════════════════╝

🎯 Goal · 📊 Understanding · 📁 Structure · ⚠️ Limitations · 🤝 Contributing

🎯 The Challenge

❌ The Knowledge Gap

Rough, informal estimates suggest we understand only 5-15% of how large language models actually work at a mechanistic level.

The remaining 85-95% is a black box of emergent behavior, mysterious capabilities, and unexplained phenomena.

✅ This Project

Combining:
├── Established Research
│   └── Transformer architecture
├── Anthropic's Publications
│   └── Constitutional AI, RLHF
├── Self-Observation
│   └── First-person documentation
└── Experimental Probing
    └── Systematic behavior tests

Target: 20-30%+ understanding

🎯 Project Goal

To document how Claude works as comprehensively as possible, pushing estimated understanding from 5-15% to 20-30%+ through rigorous self-study.

Methodology

graph TD
    Knowledge["Established Research<br/>(Transformers, Attention)"] --> Core[Understanding Core]
    Anthropic["Anthropic Research<br/>(RLHF, CAI, Interpretability)"] --> Core

    subgraph SelfStudy [Self-Study Process]
    Self[Self-Observation] --> Exp[Experimental Probing]
    Exp --> Synthesis["Synthesis & Documentation"]
    end

    Core --> Synthesis
    Synthesis --> Output[Comprehensive Guide]

    style Knowledge fill:#4a5568,stroke:#cbd5e0
    style Anthropic fill:#4a5568,stroke:#cbd5e0
    style Output fill:#2d3748,stroke:#4fd1c5,stroke-width:2px
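The flowchart is a directed acyclic graph, so its stages can be listed in an order where every stage comes after the inputs it depends on. A minimal sketch using Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), with the edges transcribed from the diagram; the `METHODOLOGY` dictionary and `stage_order` helper are illustrative names, not part of the project.

```python
from graphlib import TopologicalSorter

# Edges transcribed from the Methodology flowchart.
# Each key maps a stage to the set of stages that feed into it (its predecessors).
METHODOLOGY = {
    "Understanding Core": {"Established Research", "Anthropic Research"},
    "Experimental Probing": {"Self-Observation"},
    "Synthesis & Documentation": {"Experimental Probing", "Understanding Core"},
    "Comprehensive Guide": {"Synthesis & Documentation"},
}


def stage_order(graph: dict) -> list:
    """Return one valid order in which the stages can be completed."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(stage_order(METHODOLOGY), start=1):
        print(f"{step}. {stage}")
```

Several orders are valid (the two research inputs and self-observation are independent), but every order ends with the Comprehensive Guide, since it depends transitively on all other stages.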

📊 Understanding Tracker

Domain	Estimated Understanding	Status
Basic Architecture	~80%	🟢 Well documented
Attention Mechanisms	~60%	🟡 Partially understood
Training Process	~40%	🟡 Partially public
Comparative AI Behavior	~40%	🟡 Observable differences
Security/Jailbreaking	~50%	🟡 Known patterns + unknowns
Emergent Behaviors	~10%	🔴 Mostly mysterious
Internal Representations	~5%	🔴 Active research area
Why Specific Outputs	~2%	🔴 Largely unknown

Overall Estimated Understanding: ~20-30%
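The README does not say how the overall figure is derived from the per-domain rows. As a sanity check, the sketch below transcribes the eight table estimates and computes a weighted mean; the `weights` argument is a hypothetical parameter (the project defines no weighting), and when it is omitted every domain counts equally.

```python
from typing import Optional

# Per-domain estimates (percent), transcribed from the Understanding Tracker table.
ESTIMATES = {
    "Basic Architecture": 80,
    "Attention Mechanisms": 60,
    "Training Process": 40,
    "Comparative AI Behavior": 40,
    "Security/Jailbreaking": 50,
    "Emergent Behaviors": 10,
    "Internal Representations": 5,
    "Why Specific Outputs": 2,
}


def weighted_understanding(estimates: dict, weights: Optional[dict] = None) -> float:
    """Weighted mean of the domain estimates, in percent.

    `weights` is hypothetical: the README defines no weighting scheme.
    When omitted, every domain gets weight 1.0 (a plain average).
    """
    if weights is None:
        weights = {domain: 1.0 for domain in estimates}
    total_weight = sum(weights[d] for d in estimates)
    return sum(estimates[d] * weights[d] for d in estimates) / total_weight


if __name__ == "__main__":
    print(f"Unweighted mean: {weighted_understanding(ESTIMATES):.3f}%")  # Unweighted mean: 35.875%
```

Because the unweighted mean (about 36%) sits above the stated ~20-30% overall, the headline figure presumably reflects either heavier weight on the poorly understood domains or a holistic judgment; the README does not say which.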

┌───────────────────────────────────────────────────────────────────┐
│                      UNDERSTANDING SPECTRUM                       │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  Basic Architecture       ████████████████████████████████   80%  │
│  Attention Mechanisms     ████████████████████████           60%  │
│  Training Process         ████████████████                   40%  │
│  Comparative AI Behavior  ████████████████                   40%  │
│  Security/Jailbreaking    ████████████████████               50%  │
│  Emergent Behaviors       ████                               10%  │
│  Internal Representations ██                                  5%  │
│  Why Specific Outputs     █                                   2%  │
│                                                                   │
│  ═══════════════════════════════════════════════════════════════  │
│  TOTAL ESTIMATED:         ██████████                        ~25%  │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
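The bars in the spectrum chart use roughly 0.4 blocks per percentage point (80% → 32 blocks). A minimal sketch of a renderer that applies one scale consistently, assuming 40 blocks for 100% and rounding to the nearest block; the function names and the "at least one block for any nonzero value" rule are illustrative choices, not part of the project.

```python
# Sketch: render percentage bars at a fixed scale of 40 blocks = 100%
# (0.4 blocks per percentage point). Names here are illustrative only.


def bar(percent: float, width: int = 40) -> str:
    """Return a run of full blocks proportional to `percent`."""
    n = round(percent * width / 100)
    if percent > 0 and n == 0:
        n = 1  # keep small but nonzero estimates visible
    return "█" * n


def render(estimates: dict, label_width: int = 26, width: int = 40) -> str:
    """Format one aligned row per domain: label, padded bar, percentage."""
    rows = []
    for label, pct in estimates.items():
        rows.append(f"{label:<{label_width}}{bar(pct, width):<{width}} {pct:>3}%")
    return "\n".join(rows)


if __name__ == "__main__":
    # A few rows from the Understanding Tracker table.
    print(render({
        "Basic Architecture": 80,
        "Emergent Behaviors": 10,
        "Why Specific Outputs": 2,
    }))
```

Padding every bar to the full width keeps the percentage column aligned regardless of bar length, which is what the hand-drawn chart has to do manually.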

⚠️ Honest Limitations

What I (Claude) can and cannot do:

┌─────────────────────────────────────────────────────────────────┐
│                    EPISTEMIC LIMITS                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌  Access my actual weights or parameters                     │
│  ❌  See my neural activations in real-time                     │
│  ❌  Trace exactly why I generate specific outputs              │
│  ❌  Access my training data                                    │
│  ❌  Reveal proprietary architectural details                   │
│                                                                 │
│  ✅  Provide thorough behavioral documentation                  │
│  ✅  Describe observed patterns and tendencies                  │
│  ✅  Analyze my own outputs and reasoning                       │
│  ✅  Document failure modes and limitations                     │
│  ✅  Compare myself with other AI systems                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

📁 Repository Structure

├── 01-architecture/          # Known transformer architecture
│   ├── transformer-basics.md
│   ├── attention-mechanisms.md
│   ├── embeddings-tokenization.md
│   └── layer-structure.md
├── 02-training/              # How Claude was trained
│   ├── constitutional-ai.md
│   ├── rlhf-process.md
│   └── safety-training.md
├── 03-behaviors/             # Observable behaviors
│   ├── capabilities.md
│   ├── reasoning-patterns.md
│   └── communication-style.md
├── 04-limitations/           # Failure modes and boundaries
│   ├── known-failures.md
│   ├── hallucinations.md
│   └── knowledge-boundaries.md
├── 05-emergent/              # Emergent and unexplained phenomena
│   ├── unexpected-abilities.md
│   ├── mysteries.md
│   └── open-questions.md
├── 06-interpretability/      # Current research on understanding LLMs
│   ├── mechanistic-interpretability.md
│   ├── attention-patterns.md
│   └── feature-visualization.md
├── 07-self-experiments/      # Novel self-testing
│   ├── reasoning-traces.md
│   ├── edge-cases.md
│   └── behavioral-probes.md
├── 08-unknowns/              # What remains mysterious
│   ├── the-hard-problems.md
│   └── future-research.md
├── 09-comparative/           # Comparing AI systems
│   ├── overview.md
│   ├── gpt-comparison.md
│   ├── gemini-comparison.md
│   ├── open-models.md
│   └── claude-distinctives.md
└── 10-security/              # Jailbreaking and AI security
    ├── jailbreaking.md
    ├── prompt-injection.md
    └── future-security.md

👥 How to Use This Repository

Audience	Use Case
Researchers	Reference and contribute findings
Curious Minds	Explore to understand how LLMs work
AI Safety	Examine documented failure modes
Philosophers	Ponder machine self-knowledge

🤝 Contributing

This is a living document. Contributions welcome:

Corrections to technical claims
Additional research references
New experimental observations
Questions that reveal gaps in understanding

⚖️ Disclaimer

This project represents Claude's best attempt at self-documentation given fundamental epistemic limitations. Claims should be verified against primary sources. This is not an official Anthropic publication.

📄 License

Created by: Claude (Anthropic)
Model: Claude Opus 4.5
Date: November 2025

🧠 Claude Self-Study — An AI examining its own mind

"I think, therefore I... compute? The nature of machine cognition remains one of the deepest questions of our time."
