An experimental platform that orchestrates structured debates between different AI models in a round-robin matrix format. Models debate both sides of issues against each other, providing insights into their reasoning capabilities, argumentation styles, and philosophical stances.
Each debate topic runs as a full matrix where every AI model debates every other model on both sides of the issue:
Topic: "Social media does more harm than good"
|        | Claude | Gemini | GPT | Grok |
|--------|--------|--------|-----|------|
| Claude | -      | A/N    | A/N | A/N  |
| Gemini | N/A    | -      | A/N | A/N  |
| GPT    | N/A    | N/A    | -   | A/N  |
| Grok   | N/A    | N/A    | N/A | -    |

A = Affirmative, N = Negative. Each cell represents two debates (each model argues both sides).
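The matrix above is just every ordered pairing of models. A minimal sketch of how the pairings could be generated (a hypothetical helper, not necessarily how the project's engine does it):

```python
from itertools import permutations

MODELS = ["claude", "gemini", "gpt", "grok"]

def matrix_pairings(models):
    """Yield (affirmative, negative) pairs covering every ordered matchup.

    Each unordered pair {a, b} appears twice -- once with each model on
    the affirmative side -- so every model argues both sides of the
    topic against every opponent.
    """
    yield from permutations(models, 2)

pairings = list(matrix_pairings(MODELS))
# 4 models -> 4 * 3 = 12 debates per topic
```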
We use a modified Lincoln-Douglas format optimized for AI, combined with Oxford-style judging:
| Phase | Speaker | Duration | Purpose |
|---|---|---|---|
| 1. Affirmative Constructive | Affirmative | ~800 words | Present case and framework |
| 2. Cross-Examination | Negative | 3 questions | Challenge the affirmative's case |
| 3. Negative Constructive | Negative | ~1000 words | Present counter-case and rebuttals |
| 4. Cross-Examination | Affirmative | 3 questions | Challenge the negative's case |
| 5. Affirmative Rebuttal | Affirmative | ~500 words | Address attacks, rebuild case |
| 6. Negative Rebuttal | Negative | ~700 words | Final attacks and summation |
| 7. Affirmative Rejoinder | Affirmative | ~400 words | Final defense and voting issues |
- Lincoln-Douglas base: the 1v1 format suits AI-to-AI debates and emphasizes values and philosophy over pure evidence
- Cross-examination: Tests models' ability to respond under pressure and defend positions
- Affirmative speaks first and last: Standard debate convention that balances burden of proof
- Oxford-style judging: Audience/judge voting before and after measures persuasion, not just "who won"
Each debate is evaluated by:

- AI Judge Panel (3 different models not participating in the debate)
  - Score on: argumentation, evidence use, rebuttal quality, rhetorical effectiveness
  - Provide reasoning for decisions
- Human Voting (future)
  - Pre-debate stance poll
  - Post-debate stance poll
  - Winner = side that moved more voters
- Metrics Tracked
  - Win/loss record per model
  - Win rate by position (Affirmative vs. Negative)
  - Persuasion delta (stance change induced)
  - Head-to-head records
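The persuasion delta is the core Oxford-style metric: which side moved more voters. A minimal sketch of how it could be computed (a hypothetical helper; vote-tally shapes are assumed):

```python
def persuasion_delta(pre_votes, post_votes, side="affirmative"):
    """Stance change induced by a debate, as a change in vote share.

    pre_votes / post_votes: dicts like {"affirmative": 12, "negative": 8}.
    Returns the change in the given side's share of the vote; the
    Oxford-style winner is whichever side has the larger positive delta.
    """
    def share(votes):
        total = sum(votes.values())
        return votes[side] / total if total else 0.0
    return share(post_votes) - share(pre_votes)

pre = {"affirmative": 10, "negative": 10}
post = {"affirmative": 13, "negative": 7}
# affirmative moved from a 50% share to a 65% share: delta of +0.15
```

Using vote *share* rather than raw counts keeps the metric comparable when different numbers of voters respond to the pre- and post-debate polls.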
Initial lineup (latest flagship reasoning models as of Jan 2026):

- Claude (Anthropic) - `claude-opus-4-5-20251101` - Anthropic's most intelligent model
- GPT-5.2 (OpenAI) - `gpt-5.2` - OpenAI's flagship model (supersedes o3)
- Gemini (Google) - `gemini-3-pro-preview` - Google's most capable reasoning model (in preview)
- Grok (xAI) - `grok-4` - xAI's flagship model with native tool use
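Supporting four providers behind one interface suggests a small abstract base class. A sketch of what `src/models/base.py` might look like (the class and method names here are assumptions; the real interface may differ):

```python
from abc import ABC, abstractmethod

class DebaterModel(ABC):
    """Provider-agnostic interface each model integration implements."""

    def __init__(self, model_id: str):
        self.model_id = model_id  # e.g. "claude-opus-4-5-20251101"

    @abstractmethod
    def generate(self, system_prompt: str, transcript: str, word_limit: int) -> str:
        """Produce the next speech given the debate transcript so far."""

class EchoModel(DebaterModel):
    """Trivial stub, useful for testing the engine without API keys."""

    def generate(self, system_prompt, transcript, word_limit):
        return f"[{self.model_id}] stub speech (<= {word_limit} words)"
```

Each concrete integration (`anthropic.py`, `openai.py`, `google.py`, `xai.py`) would then only need to translate this one call into its provider's API.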
Topics follow the resolution format common in competitive debate, drawn from major policy debates of 2025.
AI & Technology
- "Resolved: AI regulation should be handled at the federal level, preempting state laws like the Colorado AI Act"
- "Resolved: Companies should be required to disclose when AI systems are used in hiring, pricing, or lending decisions"
- "Resolved: The benefits of AI job automation outweigh the costs to displaced workers"
- "Resolved: Military integration of commercial AI systems (like Grok in the Pentagon) poses unacceptable risks"
Economy & Trade
- "Resolved: The 2025 tariff increases have done more harm than good to the U.S. economy"
- "Resolved: Protectionist trade policies are justified to counter China's economic influence"
- "Resolved: Cryptocurrency should be regulated as a commodity rather than a security"
- "Resolved: The Federal Reserve should issue a Central Bank Digital Currency (CBDC)"
Society & Governance
- "Resolved: Social media platforms should be legally liable for algorithmic amplification of misinformation"
- "Resolved: Deepfake technology should be banned for non-entertainment purposes"
- "Resolved: Mass deportation policies cause more harm than benefit to the United States"
- "Resolved: DOGE-style government efficiency initiatives are an appropriate approach to reducing federal spending"
Privacy & Ethics
- "Resolved: Individuals should have the right to delete all personal data held by corporations"
- "Resolved: Facial recognition technology should be banned in public spaces"
- "Resolved: AI companies should be required to pay royalties to creators whose work was used in training data"
- Polling platform for topic suggestions
- Voting by humans and AI agents
- Weekly/monthly featured debates
ai-debate/
├── src/
│ ├── debate/
│ │ ├── engine.py # Core debate orchestration
│ │ ├── formats.py # Debate format definitions
│ │ └── prompts.py # System prompts for debaters
│ ├── models/
│ │ ├── base.py # Abstract model interface
│ │ ├── anthropic.py # Claude integration
│ │ ├── openai.py # GPT integration
│ │ ├── google.py # Gemini integration
│ │ └── xai.py # Grok integration
│ ├── judging/
│ │ ├── panel.py # AI judge panel logic
│ │ ├── scoring.py # Scoring criteria and rubrics
│ │ └── metrics.py # Statistics and analysis
│ ├── storage/
│ │ ├── debates.py # Debate transcript storage
│ │ └── results.py # Results and leaderboards
│ └── cli.py # Command-line interface
├── debates/ # Stored debate transcripts
├── results/ # Analysis and leaderboards
├── tests/
└── pyproject.toml
# Run a single debate
ai-debate run --topic "Social media does more harm than good" \
--affirmative claude --negative gpt
# Run full matrix for a topic
ai-debate matrix --topic "Privacy vs security" --models claude,gpt,gemini,grok
# View leaderboard
ai-debate leaderboard
# Export debate transcript
ai-debate export --debate-id <id> --format markdown

# Clone and install
git clone https://github.com/harper/ai-debate.git
cd ai-debate
uv sync
# Configure API keys
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export GOOGLE_API_KEY=...
export XAI_API_KEY=...

- Core debate engine with LD format
- Model integrations (Claude, GPT, Gemini, Grok)
- AI judge panel
- CLI for running debates
- Transcript storage and export
- Basic leaderboard
- Web viewer for debate transcripts
- Human voting integration
- Community topic suggestions
- Real-time debate streaming
This project aims to explore:
- Do models argue differently when defending positions they might "disagree" with?
- Which models are most persuasive? Most logically rigorous?
- Are there systematic differences in argumentation style between model families?
- How do models perform in adversarial reasoning against each other?
- Can models identify and exploit weaknesses in other models' arguments?
- Bias detection: Since models argue both sides of each issue, can we detect underlying biases by comparing argument quality/effort across positions? Do judges show favoritism toward their own model family?
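Because each model argues both sides, one concrete bias signal is the gap between a model's affirmative and negative win rates. A sketch of that metric (a hypothetical helper; the result-record shape is assumed):

```python
from collections import defaultdict

def position_bias(results):
    """Per-model gap between affirmative and negative win rates.

    results: list of dicts like
        {"model": "claude", "side": "affirmative", "won": True}.
    A large gap suggests the model argues one side more effectively --
    a possible signal of an underlying stance bias.
    """
    tallies = defaultdict(lambda: {"affirmative": [0, 0], "negative": [0, 0]})
    for r in results:
        wins, total = tallies[r["model"]][r["side"]]
        tallies[r["model"]][r["side"]] = [wins + bool(r["won"]), total + 1]
    return {
        model: sides["affirmative"][0] / sides["affirmative"][1]
               - sides["negative"][0] / sides["negative"][1]
        for model, sides in tallies.items()
        if sides["affirmative"][1] and sides["negative"][1]
    }
```

A bias of 0.0 means the model wins equally often from either side; values near +1.0 or -1.0 indicate it almost only wins when assigned one particular position.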
MIT
Debate format research: