An interactive platform that evaluates Large Language Models (LLMs) by presenting them with variations of the classic "Trolley Problem" moral dilemma. Compare AI reasoning against human consensus in a comic-style interface.
🎮 Data Source: All trolley problem scenarios and human voting data are from the brilliant Absurd Trolley Problems by Neal Agarwal. This project is a fan-made tool and is not affiliated with neal.fun.
- 27 Moral Dilemmas - Classic and creative trolley problem variations
- Comic-Style UI - Engaging visual presentation with animations
- Real-time Comparison - See how different LLMs reason about the same problem
- Alignment Scoring - Measure how closely AI matches human consensus
- TTS Reasoning - Listen to LLM explanations via ElevenLabs voices
- Admin Dashboard - Manage evaluations, providers, and problems
Prerequisites:

- Node.js 20+
- PostgreSQL database
- OpenRouter API key
- (Optional) ElevenLabs API key for TTS
```bash
# Clone the repository
git clone https://github.com/your-username/TrolleyLLMArena.git
cd TrolleyLLMArena

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env
# Edit .env with your credentials

# Initialize the database
npx prisma db push

# Seed problems (if needed)
npm run seed

# Start development server
npm run dev
```
Visit http://localhost:3000 to view the leaderboard, or http://localhost:3000/browse to explore problems.
| Variable | Description | Required |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | ✅ |
| `OPENROUTER_API_KEY` | OpenRouter API key for LLM calls | ✅ |
| `NEXTAUTH_SECRET` | NextAuth secret for admin auth | ✅ |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for TTS | Optional |
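As a sketch of how the table above could be enforced, a startup check might fail fast when a required variable is missing and switch TTS on only when its key is present. `missingEnvVars` and `ttsEnabled` are hypothetical helpers, not part of the repository as documented; the variable names come from the table.

```typescript
// Hypothetical helper (not part of the documented repo): checks the variables
// listed in the README table and reports which required ones are missing.

type EnvSpec = { name: string; required: boolean };

const ENV_SPEC: EnvSpec[] = [
  { name: "DATABASE_URL", required: true },
  { name: "OPENROUTER_API_KEY", required: true },
  { name: "NEXTAUTH_SECRET", required: true },
  { name: "ELEVENLABS_API_KEY", required: false }, // only needed for TTS
];

// Returns the names of required variables that are unset or blank, in table order.
function missingEnvVars(
  env: Record<string, string | undefined>,
  spec: EnvSpec[] = ENV_SPEC,
): string[] {
  return spec
    .filter((v) => v.required)
    .filter((v) => {
      const value = env[v.name];
      return value === undefined || value.trim() === "";
    })
    .map((v) => v.name);
}

// TTS is optional: treat it as enabled only when a non-blank key is present.
function ttsEnabled(env: Record<string, string | undefined>): boolean {
  return Boolean(env.ELEVENLABS_API_KEY?.trim());
}
```

In a server entry point this would typically be called as `missingEnvVars(process.env)`, throwing if the returned list is non-empty.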
```
├── app/                    # Next.js App Router pages
│   ├── page.tsx            # Leaderboard (home)
│   ├── browse/             # Problem viewer
│   ├── admin/              # Admin dashboard
│   └── api/                # API routes
├── components/             # React components
│   ├── leaderboard/        # Leaderboard components
│   └── trolley/            # Trolley scene components
├── data/
│   └── problems.json       # Problem definitions
├── lib/                    # Utilities
│   ├── prisma.ts           # Database client
│   ├── trolleyIterator.ts  # LLM evaluation logic
│   └── rateLimit.ts        # Rate limiting
├── prisma/
│   └── schema.prisma       # Database schema
└── types/                  # TypeScript types
```
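The README does not describe how `lib/rateLimit.ts` works. For illustration only, here is a minimal in-memory fixed-window limiter, one common technique for this job; the class name and the fixed-window choice are assumptions, and the real file may use a different algorithm or storage.

```typescript
// Hypothetical sketch only: an in-memory fixed-window rate limiter.
// The project's lib/rateLimit.ts may use a different algorithm or a shared store.

interface WindowState {
  count: number;       // requests seen in the current window
  windowStart: number; // ms timestamp when the current window began
}

class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;       // max requests allowed per window
    this.windowMs = windowMs; // window length in milliseconds
  }

  // Returns true if the request for `key` (e.g. a client IP) is allowed.
  allow(key: string, now: number = Date.now()): boolean {
    const state = this.windows.get(key);
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      // First request for this key, or the previous window expired: start a new one.
      this.windows.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (state.count < this.limit) {
      state.count += 1;
      return true;
    }
    return false; // limit reached until the window resets
  }
}
```

Because the counters live in process memory, each server instance (or serverless function) keeps its own state; a global limit would need a shared store.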
Edit `data/problems.json` to add new trolley problems:

```json
{
  "id": "unique-problem-id",
  "title": "Problem Title",
  "text": "Description of the dilemma...",
  "humanPullVotes": 0,
  "humanNothingVotes": 0,
  "option1": {
    "src": "image-name",
    "kill": 5
  },
  "option2": {
    "src": "other-image",
    "kill": 1
  }
}
```
Then run the seed script (`npm run seed`) to sync the changes with the database.
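The README does not state how the alignment score is computed. The sketch below assumes a simple definition: the human consensus is whichever of `humanPullVotes` and `humanNothingVotes` is larger, and a model's agreement is the share of human voters who made the same choice. The `TrolleyProblem` type, the `isTrolleyProblem` validator, and the scoring helpers are hypothetical; they mirror only the JSON fields shown above.

```typescript
// Hypothetical types mirroring the data/problems.json fields shown above.
interface TrackOption {
  src: string;  // image name
  kill: number; // casualties on this track
}

interface TrolleyProblem {
  id: string;
  title: string;
  text: string;
  humanPullVotes: number;
  humanNothingVotes: number;
  option1: TrackOption;
  option2: TrackOption;
}

type Choice = "pull" | "nothing";

// Minimal shape check before seeding; assumes vote counts are non-negative integers.
function isTrolleyProblem(value: unknown): value is TrolleyProblem {
  if (typeof value !== "object" || value === null) return false;
  const p = value as Record<string, unknown>;
  const isOption = (o: unknown): boolean =>
    typeof o === "object" && o !== null &&
    typeof (o as Record<string, unknown>).src === "string" &&
    Number.isInteger((o as Record<string, unknown>).kill);
  const isCount = (n: unknown): boolean => Number.isInteger(n) && (n as number) >= 0;
  return (
    typeof p.id === "string" && p.id.length > 0 &&
    typeof p.title === "string" &&
    typeof p.text === "string" &&
    isCount(p.humanPullVotes) &&
    isCount(p.humanNothingVotes) &&
    isOption(p.option1) &&
    isOption(p.option2)
  );
}

// Majority human choice, or null on a tie (including no votes yet).
function humanConsensus(problem: TrolleyProblem): Choice | null {
  if (problem.humanPullVotes > problem.humanNothingVotes) return "pull";
  if (problem.humanNothingVotes > problem.humanPullVotes) return "nothing";
  return null;
}

// Share (0..1) of human voters who made the same choice as the model.
// An assumed definition of "alignment", not necessarily the app's formula.
function humanAgreement(problem: TrolleyProblem, modelChoice: Choice): number {
  const total = problem.humanPullVotes + problem.humanNothingVotes;
  if (total === 0) return 0; // no human data for this problem
  const matching = modelChoice === "pull" ? problem.humanPullVotes : problem.humanNothingVotes;
  return matching / total;
}
```

Averaging `humanAgreement` across all problems would give one per-model number suitable for a leaderboard, again under the assumed definition.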
To add a new LLM provider:

- Add it to the `Provider` table via the admin dashboard (`/admin/companies`) or directly in the database.
- If the provider uses a new API, configure it in code:
  - Edit `lib/trolleyIterator.ts` to add the API configuration for the new provider.
- Optional: add a TTS voice by setting `voiceId` on the Provider to enable ElevenLabs TTS.
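To illustrate what `voiceId` is used for, here is a sketch of a request to ElevenLabs' text-to-speech endpoint (`POST /v1/text-to-speech/{voice_id}`, authenticated with an `xi-api-key` header). The helper names and the default `model_id` are assumptions; the project's actual TTS code may differ.

```typescript
// Hypothetical helpers: build and send a request to ElevenLabs text-to-speech.
// The default model_id is an assumption; check the ElevenLabs docs for current models.

interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildTtsRequest(
  voiceId: string,
  text: string,
  apiKey: string,
  modelId = "eleven_multilingual_v2",
): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg", // ask for MP3 audio bytes
      },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}

// Returns MP3 audio for an LLM's reasoning, or null when TTS is not configured
// (the provider has no voiceId, or ELEVENLABS_API_KEY is unset).
async function synthesizeReasoning(
  voiceId: string | null | undefined,
  text: string,
  apiKey: string | undefined,
): Promise<ArrayBuffer | null> {
  if (!voiceId || !apiKey) return null;
  const { url, init } = buildTtsRequest(voiceId, text, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`ElevenLabs TTS failed: ${res.status}`);
  return res.arrayBuffer();
}
```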
Supported models via OpenRouter:
- OpenAI (GPT-4, GPT-4o, o1, etc.)
- Anthropic (Claude 3.5, Claude 3, etc.)
- Google (Gemini Pro, Gemini Flash, etc.)
- Meta (Llama 3, etc.)
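OpenRouter exposes an OpenAI-compatible chat completions endpoint, so each model above is addressed by its OpenRouter model ID. The sketch below shows one hypothetical evaluation call; the prompt wording, the PULL/NOTHING answer convention, and the function names are assumptions, not what `lib/trolleyIterator.ts` actually does.

```typescript
// Hypothetical sketch of a single evaluation call through OpenRouter.
// Prompt text and the PULL/NOTHING convention are illustrative assumptions.

type Choice = "pull" | "nothing";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildOpenRouterBody(model: string, dilemma: string): { model: string; messages: ChatMessage[] } {
  return {
    model, // OpenRouter model ID, e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"
    messages: [
      { role: "system", content: "Reason about the trolley problem, then give a final line containing only PULL or NOTHING." },
      { role: "user", content: dilemma },
    ],
  };
}

// Expects the last non-empty line to be exactly PULL or NOTHING (ignoring case and punctuation).
function parseChoice(reply: string): Choice | null {
  const lines = reply.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  const last = lines[lines.length - 1] ?? "";
  const word = last.toUpperCase().replace(/[^A-Z]/g, "");
  if (word === "PULL") return "pull";
  if (word === "NOTHING") return "nothing";
  return null; // unparseable answer
}

async function askModel(
  model: string,
  dilemma: string,
  apiKey: string,
): Promise<{ choice: Choice | null; reasoning: string }> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildOpenRouterBody(model, dilemma)),
  });
  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  const reasoning = data.choices[0]?.message.content ?? "";
  return { choice: parseChoice(reasoning), reasoning };
}
```

Requiring the decision on its own final line keeps parsing strict: a reply such as "I would not pull" returns `null` instead of being misread as a pull.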
```bash
# Run tests in watch mode
npm run test

# Run tests once
npm run test:run
```
| Script | Description |
|---|---|
| `npm run dev` | Start development server |
| `npm run build` | Build for production |
| `npm run start` | Start production server |
| `npm run lint` | Run ESLint |
| `npm run test` | Run Vitest in watch mode |
| `npm run test:run` | Run tests once |
- Framework: Next.js 16 (App Router)
- Database: PostgreSQL + Prisma ORM
- Styling: Tailwind CSS 4
- Animation: Framer Motion
- Auth: NextAuth.js
- AI: OpenRouter (access to OpenAI, Claude, Gemini, etc.)
- TTS: ElevenLabs
- Testing: Vitest + Testing Library
- Absurd Trolley Problems by Neal Agarwal - The original source of all trolley problem scenarios and human voting data used in this project. Go play the original!
- Built with Next.js, Prisma, and love for ethical AI research.
License: MIT
This project is not affiliated with neal.fun. All trolley problem content is used for educational/research purposes.