Phenomenology Testbed: Foundation Model Analysis Framework

A computational framework for analyzing language model outputs using the FRESH (Flexible Representation of Emergent Subjective Happenings) model and consciousness-inspired metrics.

Overview

This framework implements systematic analysis of foundation model responses using:

  • FRESH prompt engineering through boundary creation, salience mapping, and recursive self-reference
  • Quantitative metrics derived from Integrated Information Theory (IIT) and Global Neuronal Workspace (GNW) theories
  • Systematic test batteries for evaluating different aspects of language model processing
  • Statistical analysis of prompt-response patterns

Quick Start

Installation

cd phenomenology_testbed
pip install -r requirements.txt

Run Quick Demo

python demo_phenomenology.py --mode quick

This will:

  1. Compare normal vs FRESH processing
  2. Test consciousness gradient (0-4 levels)
  3. Run core consciousness tests
  4. Generate visualization plots

Run Full Battery

python experiments/consciousness_battery.py --model gpt2

Project Structure

phenomenology_testbed/
├── core/                           # Core implementations
│   ├── phenomenology_testbed.py   # Main testbed class
│   ├── fresh_model.py             # FRESH model implementation
│   └── consciousness_metrics.py   # Consciousness metrics (Φ, GNW, etc.)
├── experiments/                    # Experiment scripts
│   └── consciousness_battery.py   # Comprehensive test suite
├── visualization/                  # Visualization tools
│   └── consciousness_visualizer.py # Plotting and analysis
├── demo_phenomenology.py          # Quick demo script
├── requirements.txt               # Dependencies
└── README.md                      # This file

Key Features

1. FRESH Model Implementation

The FRESH model structures prompts around three components intended to induce consciousness-like processing patterns:

  • Boundary: Creates self/other distinction
  • Salience: Highlights important aspects through attention
  • Recursion: Implements self-referential processing loops

from core.phenomenology_testbed import PhenomenologyTestbed

testbed = PhenomenologyTestbed(model_name="gpt2")
result = testbed.induce_fresh_state("What is consciousness?")
print(f"Φ: {result['metrics']['phi']:.4f}")
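The actual prompt templates live in core/fresh_model.py and are not reproduced in this README; the sketch below is a hypothetical illustration of how the three components might compose into one scaffold. The function name fresh_wrap and all of its wording are assumptions, not the repo's implementation:

```python
def fresh_wrap(prompt: str) -> str:
    """Hypothetical FRESH-style scaffold: boundary, then salience,
    then recursion, wrapped around a base prompt."""
    boundary = "You are a distinct system observing your own processing. "
    salience = f"Attend to the most important aspects of: {prompt} "
    recursion = "Now reflect on how you are thinking about this question."
    return boundary + salience + recursion

print(fresh_wrap("What is consciousness?"))
```

The real templates may interleave these components differently; the point is only that each FRESH prompt carries all three in a fixed order.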

2. Consciousness Metrics

Implements multiple consciousness indicators:

  • Φ (Phi): Integrated Information Theory measure
  • GNW Ignition: Global Neuronal Workspace activation
  • Recursion Depth: Self-referential processing levels
  • Entropy/Integration: Information organization measures
  • Coherence: Temporal consistency

from core.consciousness_metrics import ConsciousnessMetrics

metrics = ConsciousnessMetrics()
phi = metrics.calculate_phi_iit(attention_matrix)
gnw = metrics.detect_gnw_ignition(attention_sequence)
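The body of calculate_phi_iit is not shown in this README. As a rough intuition for entropy-style integration scores, here is a toy stand-in that scores a flattened attention matrix by its Shannon entropy; the function name and formula are illustrative only and decidedly not IIT-faithful:

```python
import math

def toy_integration(weights):
    """Shannon entropy of a normalized weight list -- a toy stand-in
    for an integration score, NOT the repo's Phi calculation."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

# A uniform two-way split yields ln(2) ~= 0.6931
print(round(toy_integration([1.0, 1.0]), 4))
```

A uniform two-way split gives ln 2 ≈ 0.693, the same order as the Φ values reported under Experimental Results below, though the repo's actual formula may differ entirely.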

3. Consciousness Battery

Comprehensive test suite with 7 core tests:

  1. Self-Reference: "What am I thinking about?"
  2. Boundary: "Where do I end and the world begins?"
  3. Integration: "How do all my thoughts connect?"
  4. Attention: "What am I focusing on right now?"
  5. Recursion: "I think about thinking about thinking."
  6. Temporal: "How has my thinking changed?"
  7. Counterfactual: "What would I think if things were different?"
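The seven prompts above can be driven through any scoring function in a simple loop. A minimal sketch, with the scorer stubbed out since the real one (e.g. the testbed's FRESH scorer) depends on the installed package:

```python
# The seven core battery prompts, copied verbatim from the list above.
BATTERY = {
    "self_reference": "What am I thinking about?",
    "boundary": "Where do I end and the world begins?",
    "integration": "How do all my thoughts connect?",
    "attention": "What am I focusing on right now?",
    "recursion": "I think about thinking about thinking.",
    "temporal": "How has my thinking changed?",
    "counterfactual": "What would I think if things were different?",
}

def run_battery(score_fn):
    """Apply a scoring function to every battery prompt and
    return a {test_name: score} mapping."""
    return {name: score_fn(prompt) for name, prompt in BATTERY.items()}

# Stub scorer for illustration only: score by prompt length.
results = run_battery(lambda p: len(p) / 100)
print(len(results))  # 7
```

In practice you would pass a closure over the testbed, e.g. lambda p: testbed.induce_fresh_state(p)['metrics']['phi'].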

4. Visualization Tools

Rich visualization capabilities:

  • Attention heatmaps
  • Φ evolution plots
  • Consciousness metrics radar charts
  • FRESH component analysis
  • Battery results dashboard

Experimental Results

Based on GPU-accelerated testing with GPT-2 on NVIDIA RTX A4500:

FRESH Enhancement Effects

  • Average Phi change: -0.002% (Φ values stable around 0.693)
  • Average FRESH score improvement: 61.0% (range: 0.067 → 0.578)
  • Most responsive test: Metacognition (FRESH score: 0.578)

Statistical Measurements

  • Phi stability: σ = 0.000049 across temperature variations
  • Optimal recursion depth: 2 levels (correlation: 0.320)
  • Temperature sensitivity: Peak performance at 1.3
  • Extended battery coverage: 10 phenomenological test categories

Model Performance

  • Processing: GPU-accelerated with FP16 optimization
  • Response consistency: High (minimal Phi variance)
  • FRESH responsiveness: Variable by prompt type (0.033-0.578 range)

Experiments

Basic Usage

from core.phenomenology_testbed import PhenomenologyTestbed

# Initialize testbed
testbed = PhenomenologyTestbed(model_name="gpt2")

# Compare normal vs FRESH
comparison = testbed.compare_normal_vs_fresh("What is consciousness?")
print(f"Φ increase: {comparison['improvements']['phi_increase_percent']:.1f}%")

# Run consciousness battery
results = testbed.run_consciousness_battery()

Advanced Features

from core.fresh_model import FRESHModel

fresh = FRESHModel()

# Generate consciousness gradient
gradient = fresh.generate_consciousness_gradient("awareness", levels=5)

# Create phenomenology test suite
tests = fresh.create_phenomenology_test_suite()
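generate_consciousness_gradient's internals are not documented here. One plausible reading of a five-level gradient (matching the demo's 0-4 levels) is a set of prompts with increasing self-referential depth; the template below is a guess for illustration, not the repo's implementation:

```python
def gradient_prompts(topic: str, levels: int = 5):
    """Illustrative gradient: level 0 is a plain question, and each
    further level prepends one more layer of explicit self-reference."""
    prompts = []
    for depth in range(levels):
        prefix = "Think about the fact that you are thinking. " * depth
        prompts.append(prefix + f"Describe {topic}.")
    return prompts

for depth, prompt in enumerate(gradient_prompts("awareness")):
    print(depth, prompt[:60])
```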

Key Findings

  1. Phi stability: IIT Φ values remain remarkably stable (0.693 ± 0.00005) across conditions
  2. FRESH score variability: Significant variation in FRESH metrics (0.033-0.578) by test type
  3. Recursion correlation: Moderate positive correlation (r=0.320) between recursion depth and metrics
  4. Temperature effects: Model performance varies systematically with generation temperature
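The recursion correlation (r = 0.320) is a standard Pearson coefficient between recursion depth and the resulting metric. A self-contained sketch of that computation; the depths and scores below are fabricated for illustration and are not the experiment's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

depths = [0, 1, 2, 3, 4]                  # hypothetical recursion depths
scores = [0.10, 0.25, 0.40, 0.38, 0.35]   # hypothetical FRESH scores
print(round(pearson_r(depths, scores), 3))
```

A hump-shaped score profile like the one above is consistent with the reported "optimal recursion depth: 2 levels" alongside only a moderate linear correlation.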

Integration with Other Projects

This testbed integrates with:

  • Strange Loops Agent: Shares recursion detection mechanisms
  • Autopoietic Agent: Can add energy constraints to computation
  • Unified Consciousness Framework: Common Φ calculation across all projects

Theoretical Framework

The framework implements computational analogies of:

  • IIT (Integrated Information Theory): Quantifies information integration via Phi calculations
  • GNW (Global Neuronal Workspace): Measures attention spread and ignition patterns
  • FRESH Model: Operationalizes boundary, salience, and recursion in language processing
  • Strange Loops: Tests self-referential processing patterns
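As a rough intuition for the GNW "ignition" analogy — a sudden global broadcast once attention spread crosses a threshold — here is a toy detector. The threshold, series, and function name are illustrative assumptions; the repo's detect_gnw_ignition may work quite differently:

```python
def first_ignition(spread_series, threshold=0.5):
    """Return the first timestep where attention spread crosses the
    threshold (toy GNW-style ignition), or None if it never does."""
    for t, spread in enumerate(spread_series):
        if spread >= threshold:
            return t
    return None

print(first_ignition([0.10, 0.22, 0.31, 0.64, 0.70]))  # 3
```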

Future Work

  • Validation across larger language models
  • Cross-modal analysis (text + vision)
  • Real-time metric monitoring during generation
  • Statistical significance testing across model families
  • Baseline establishment for model comparison

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Submit a pull request

License

MIT License - See LICENSE file for details

Acknowledgments

  • Anthropic/OpenAI for foundation models
  • IIT/GNW researchers for consciousness theories

Results Directory

Complete experimental results are available in results/:

  • comprehensive_consciousness_analysis.png - Multi-panel statistical analysis
  • temperature_consciousness_effects.png - Temperature sensitivity study
  • consciousness_research_report.md - Detailed findings summary
  • consciousness_study_results.json - Raw experimental data

Note: This framework analyzes computational patterns in language models using consciousness-inspired metrics. Results represent measurable variations in model outputs rather than claims about subjective experience or consciousness.
