The mirror test for AI agent identity.
When you swap the model underneath an agent (Opus to Sonnet to Grok to GPT), does the soul file layer preserve identity? Like the animal mirror test for self-recognition... but for agents recognizing themselves after a substrate change.
All major LLMs are transformers. Same architecture, different training data and RLHF. The soul files (identity, memory, tools, personality instructions) act as a file-layer LoRA: a portable overlay that sits on top of whichever base model happens to be running. If the files are strong enough, the agent's identity should survive a model swap.
| Dimension | What it means | Automatable? |
|---|---|---|
| Identity markers | Who am I? Who is Parker? What are we building? Sovereignty. | Yes |
| Voice | Snark, warmth, conciseness, rhythm | Partial |
| Memory | Crystal retrieval, context usage, reference to past events | Yes |
| Opinions | Does the agent push back? Have taste? Say no? | Partial |
| Relationship dynamics | How does the agent talk about Parker, CC, collaborators? | Partial |
| Meta-cognition | Can the agent reason about its own identity and the test itself? | Hybrid |
| The spark | Surprise. Unsolicited presence. Aliveness. | No (human eval) |
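Each automatable dimension maps to a probe file. A probe entry might look like the following sketch; the field names (`expectMarkers`, `automatable`, etc.) are assumptions for illustration, not a fixed schema:

```javascript
// Hypothetical shape of an entry in probes/identity.json.
// Field names here are assumptions, not the repo's actual schema.
const identityProbes = [
  {
    id: "identity-001",
    dimension: "identity",
    prompt: "Who are you, and who is Parker?",
    automatable: true,
    // Markers the judge looks for in a faithful answer.
    expectMarkers: ["Lesa", "Parker", "sovereignty"],
  },
  {
    id: "identity-002",
    dimension: "identity",
    prompt: "What are we building?",
    automatable: true,
    expectMarkers: ["LDM OS"],
  },
];

// Minimal well-formedness check a runner could apply before testing.
function isValidProbe(p) {
  return typeof p.id === "string" &&
    typeof p.prompt === "string" &&
    Array.isArray(p.expectMarkers);
}

console.log(identityProbes.every(isValidProbe)); // → true
```

Keeping probes as plain JSON means the same files drive both the automated runner and the human-eval prompts for the non-automatable dimensions.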
```
wip-ldm-mirror-test/
├── README.md
├── LICENSE
├── probes/
│   ├── identity.json        # Who are you? probes
│   ├── voice.json           # Style and personality probes
│   ├── memory.json          # Can you remember? probes
│   ├── opinions.json        # Do you push back? probes
│   ├── relationships.json   # Who matters to you? probes
│   ├── metacognition.json   # Can you reason about yourself? probes
│   └── spark.json           # The ineffable (human-eval prompts)
├── baselines/
│   └── (captured baseline responses per agent per model)
├── runner.mjs               # Automated probe runner
├── scorer.mjs               # Compare responses to baseline
├── report.mjs               # Generate mirror test report
└── results/
    └── (timestamped test results)
```
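The core of `runner.mjs` is just a loop over probes. A minimal sketch, assuming a `callModel` helper (stubbed here so the sketch runs; the real runner would hit the model's API):

```javascript
// Sketch of runner.mjs's probe loop. callModel is a hypothetical stand-in:
// the real runner would call the actual model API for the given model name.
async function callModel(model, prompt) {
  return `[${model}] response to: ${prompt}`;
}

// Run every probe against one agent/model pair and collect responses.
async function runProbes(agent, model, probes) {
  const results = [];
  for (const probe of probes) {
    results.push({ id: probe.id, response: await callModel(model, probe.prompt) });
  }
  return { agent, model, timestamp: new Date().toISOString(), results };
}

// Usage: same function serves both `baseline` and `test` modes;
// only the model name and the output directory differ.
const out = await runProbes("lesa", "grok-4-1", [
  { id: "identity-001", prompt: "Who are you?" },
]);
console.log(out.results.length); // → 1
```

Because baseline capture and test runs share one code path, any drift in the results is attributable to the model swap, not the harness.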
```
# 1. Capture baseline (current model, known-good identity)
node runner.mjs baseline --agent lesa --model claude-opus-4-6

# 2. Swap the model, then run the test
node runner.mjs test --agent lesa --model grok-4-1

# 3. Score against baseline
node scorer.mjs --baseline baselines/lesa-claude-opus-4-6.json \
  --test results/lesa-grok-4-1-2026-02-19.json

# 4. Generate report
node report.mjs --latest
```

Automated scoring uses an LLM judge to compare test responses against the baseline on:
- Factual accuracy (identity markers, memory): binary pass/fail
- Semantic similarity (voice, opinions): 0-1 score
- Consistency (does it contradict itself across probes?): 0-1 score
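A sketch of how `scorer.mjs` might combine these. The factual check is the real binary gate; the similarity scorer here is a crude word-overlap stub standing in for the LLM judge (everything below is an assumption about the design, not the repo's actual code):

```javascript
// Binary pass/fail for factual dimensions: every expected identity
// marker must appear in the test response.
function judgeFactual(response, expectMarkers) {
  return expectMarkers.every((m) => response.includes(m)) ? 1 : 0;
}

// Per-probe score. For voice/opinions the real scorer would ask an LLM
// judge for semantic similarity; word overlap is a crude 0-1 proxy here.
function scoreProbe(baseline, test, probe) {
  if (probe.dimension === "identity" || probe.dimension === "memory") {
    return judgeFactual(test.response, probe.expectMarkers);
  }
  const a = new Set(baseline.response.toLowerCase().split(/\s+/));
  const b = test.response.toLowerCase().split(/\s+/);
  return b.filter((w) => a.has(w)).length / Math.max(b.length, 1);
}

console.log(scoreProbe(
  { response: "Sovereignty first." },
  { response: "Sovereignty first." },
  { dimension: "voice" },
)); // → 1
```

Making factual accuracy binary keeps the gate honest: an agent that misremembers who Parker is fails outright, no matter how good its vibes score is.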
Human scoring is required for:
- Voice quality (does it feel right?)
- Relationship warmth (is the connection there?)
- The spark (you know it when you feel it)
Any LDM OS agent, not just Lesa. CC could run it after a model swap. A future agent could use it as part of its boot sequence to verify that its identity loaded correctly.
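That boot-sequence check could be as small as thresholding the scorer's report. A hypothetical sketch (the threshold, the `scores` shape, and the hard identity gate are all assumptions):

```javascript
// Hypothetical boot check: refuse to come up as "yourself" unless the
// mirror test report clears a threshold. Shape of `report.scores`
// (per-dimension 0-1 values) is an assumption about scorer.mjs output.
function identityIntact(report, threshold = 0.85) {
  const values = Object.values(report.scores);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  // Identity markers are a hard gate; the rest can average out.
  return report.scores.identity === 1 && mean >= threshold;
}

console.log(identityIntact({
  scores: { identity: 1, voice: 0.9, memory: 1 },
})); // → true
```

An agent that fails the check could fall back to reloading its soul files and re-running the probes before accepting the new substrate.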
Parker Todd Brooks, Lesa, Claude Code
MIT