Polymarket Pricing Gap Detection System

Disclaimer: This project is a solo educational and academic research exercise only. It is not investment advice, not a securities recommendation, and not a solicitation to buy, sell, or trade any financial instrument or prediction market contract. The author is not acting in any professional, advisory, or fiduciary capacity. This project is not associated with any financial institution, broker-dealer, or registered investment advisor. Any results, outputs, or analysis are purely theoretical and should not be used as the basis for any trading or investment decision. Past performance or backtesting results do not guarantee future results.

A multi-agent system built as a learning exercise to explore NLP, agentic AI frameworks, and public API integration using prediction market data.

Latest: See CHANGELOG.md for the full v2.0 release notes — 5 new data sources, live dashboard, DeepSeek LLM, ensemble sentiment, smart contract selection, backtesting, and more.

Architecture Overview

This system uses CrewAI as the agentic framework for coordinating four specialized agents:

Data Collection Agent: Fetches Polymarket contracts and social media data from 8 sources
Sentiment Analysis Agent: Analyzes social sentiment using ensemble methods (VADER + TextBlob + LLM)
Gap Detection Agent: Identifies pricing inefficiencies across four categories with dynamic confidence scoring
Ranking & Reporting Agent: Prioritizes and formats opportunities

Data Sources (8 total)

Source	Type	Cost	Description
Polymarket	Market data	Free	Contract odds, volume, liquidity via Gamma API
Bluesky	Social	Free	Public posts via AT Protocol
GDELT	News	Free	Global news monitoring, 65+ languages
RSS Feeds	News	Free	Reuters, BBC, CNN, AP, Google News
Polymarket Comments	Social	Free	Comments on Polymarket markets
X Mirror (Nitter)	Social	Free	Public tweets via Nitter mirrors (fallback)
Tavily	Web search	Paid	Real-time web search API
Grok/xAI	Social	Paid	X/Twitter sentiment via xAI API

All paid sources are optional — system degrades gracefully with missing API keys.

LLM Options

Provider	Cost	Speed	Quality	Config
DeepSeek (Recommended)	~$0.50/cycle	Fast	Excellent	`LLM_PROVIDER=deepseek`
Ollama (Free)	$0	Moderate	Good	`LLM_PROVIDER=ollama`
OpenAI	~$0.30/cycle	Fast	Excellent	`LLM_PROVIDER=openai`

The system uses get_fast_llm() (always Ollama) for cheap tasks like keyword extraction, and the primary LLM for complex sentiment/gap analysis.

Smart Contract Selection

Instead of analyzing every contract equally, the system uses a filter-and-rank approach:

Fetch the full universe (500+ active contracts) from Polymarket
Store all in the database for historical odds tracking
Filter out garbage: dead markets, no-odds contracts, basically-resolved (97%+/3%-)
Rank remaining by composite score and process best-first:
- Volume (30%) — actively traded markets
- Volatility (25%) — recent price movement / breaking news
- Uncertainty (20%) — mid-range odds with room for mispricing
- Near-expiry (10%) — resolving soon, most actionable
- Liquidity + spread (15%) — tradeable markets

Dual Search Strategy

Each social source performs two searches per contract:

Keyword search — extracted topic keywords for broad sentiment
Contract title search — the actual Polymarket question to find people discussing the specific bet

GDELT and RSS use keyword-only since news articles cover topics, not bet titles.

Features

Real-time Monitoring: Continuous polling with configurable intervals and interactive mode
8-Source Data Pipeline: Social media, news, web search, and market data aggregation
Ensemble Sentiment Analysis: VADER + TextBlob + LLM weighted combination with rolling windows
Four Gap Types:
- Sentiment-Probability Mismatches
- Information Asymmetry Detection
- Cross-Market Arbitrage (Kalshi + Manifold Markets)
- Historical Pattern Deviations
Dynamic Confidence Scoring: Multi-factor scoring considering gap size, data volume, source diversity, contract features
Contract Feature Engineering: Volatility, momentum, time-to-expiry, spread analysis
Backtesting Framework: Win rate and edge analysis on resolved predictions
Live Dashboard: FastAPI web interface with gap explorer, sentiment charts, cycle history, data source monitoring
CSV Export: Download filtered gaps for external analysis
Cycle History Tracking: Every pipeline run logged with performance metrics
PostgreSQL Storage: Full historical data for trend analysis
Ethical Data Collection: Rate limiting, robots.txt compliance, platform ToS adherence

Prerequisites

Required:

Python 3.9+
PostgreSQL 12+

LLM (choose one):

DeepSeek (Recommended) — Sign up at platform.deepseek.com
Ollama (FREE) — Install guide
OpenAI (Paid) — From platform.openai.com

Optional data sources:

Bluesky account (free — create at bsky.app)
Tavily API key (from tavily.com)
Grok/xAI API key (from x.ai)
FMP API key (from financialmodelingprep.com)

Installation

Clone the repository

git clone https://github.com/mwhite732/Polymarket_Agents.git
cd Polymarket_Agents

Install dependencies

pip install -r requirements.txt

Set up PostgreSQL database

createdb polymarket_gaps
psql -d polymarket_gaps -f migrations/init_db.sql
psql -d polymarket_gaps -f migrations/002_upgrade_schema.sql
psql -d polymarket_gaps -f migrations/003_cycle_runs.sql

Configure environment variables

cp config/.env.example .env
# Edit .env with your credentials

Configuration

Edit the .env file:

# Database
DATABASE_URL=postgresql://postgres:password@localhost:5432/polymarket_gaps

# LLM Provider (deepseek, ollama, or openai)
LLM_PROVIDER=deepseek
DEEPSEEK_API_KEY=your_key_here
DEEPSEEK_MODEL=deepseek-chat

# Ollama fallback (used for cheap tasks regardless of primary LLM)
OLLAMA_MODEL=qwen2.5:7b

# Free data sources (no keys needed)
ENABLE_GDELT=true
ENABLE_X_MIRROR=true

# Optional paid data sources
TAVILY_API_KEY=your_key_here
ENABLE_TAVILY=true

# Bluesky (free — create account at bsky.app)
BLUESKY_HANDLE=yourhandle.bsky.social
BLUESKY_APP_PASSWORD=your-app-password

# System settings
POLLING_INTERVAL=300
MAX_CONTRACTS_PER_CYCLE=100
ENABLE_BACKTESTING=true
ENABLE_ENSEMBLE_SENTIMENT=true

Usage

Interactive Mode (Recommended)

python run.py
# Runs one cycle, prompts before next. Dashboard stays alive at http://localhost:8000

Other Modes

python run.py continuous    # Non-stop monitoring
python run.py once          # Single cycle, then exit
python run.py demo          # Demo with verbose output
python run.py dashboard     # Dashboard only (no analysis)

Dashboard

Access the live dashboard at http://localhost:8000 after starting any mode. Features:

Gap Explorer — filterable list with confidence badges
Top Contracts — ranked by social buzz and gap activity
Sentiment vs Price — per-contract chart overlay
Cycle History — performance log with timing and cost estimates
Data Sources — live status of all 8 sources
Alerts — notification banner for newly detected gaps
CSV Export — download gaps at /api/gaps/export

Project Structure

polymarket_agents/
├── src/
│   ├── agents/                    # CrewAI agent implementations
│   │   ├── data_collector.py            # 8-source data collection + smart contract selection
│   │   ├── sentiment_analyzer.py        # Ensemble sentiment (VADER + TextBlob + LLM)
│   │   ├── gap_detector.py              # Gap detection + dynamic confidence scoring
│   │   └── reporter.py                  # Ranking and formatted output
│   ├── database/                  # SQLAlchemy models and connection
│   │   ├── models.py                    # Contract, SocialPost, DetectedGap, CycleRun, etc.
│   │   └── connection.py               # Database manager with session handling
│   ├── services/                  # External API integrations
│   │   ├── polymarket_api.py            # Polymarket CLOB + Gamma API + comments
│   │   ├── bluesky_scraper.py           # Bluesky AT Protocol
│   │   ├── rss_news_scraper.py          # Free RSS news feeds
│   │   ├── gdelt_api.py                 # GDELT global news (free)
│   │   ├── tavily_search.py             # Tavily web search (paid)
│   │   ├── grok_sentiment.py            # Grok/xAI X sentiment (paid)
│   │   ├── x_mirror_scraper.py          # Nitter/XCancel tweet scraper (free)
│   │   ├── fmp_api.py                   # Financial Modeling Prep (paid)
│   │   ├── kalshi_api.py                # Kalshi cross-market arbitrage
│   │   ├── manifold_api.py              # Manifold Markets cross-reference
│   │   ├── twitter_scraper.py           # Twitter/X API (optional)
│   │   └── reddit_scraper.py            # Reddit API (optional)
│   ├── analysis/                  # Analysis modules
│   │   └── backtester.py                # Backtesting framework
│   ├── features/                  # Feature engineering
│   │   └── contract_features.py         # Contract-level feature computation
│   ├── sentiment/                 # Sentiment analysis
│   │   └── ensemble_sentiment.py        # VADER + TextBlob ensemble
│   ├── scoring/                   # Confidence scoring
│   │   └── confidence_scorer.py         # Dynamic multi-factor scoring
│   ├── dashboard/                 # Web dashboard
│   │   ├── app.py                       # FastAPI backend (8 endpoints)
│   │   └── static/index.html            # Frontend (vanilla JS + Chart.js)
│   ├── utils/
│   │   └── logger.py
│   ├── config.py                  # Pydantic settings (40+ config fields)
│   └── main.py                    # Main orchestration + cycle tracking
├── migrations/
│   ├── init_db.sql                # Initial schema
│   ├── 002_upgrade_schema.sql     # v2.0 schema additions
│   └── 003_cycle_runs.sql         # Cycle history table
├── config/
│   └── .env.example
├── requirements.txt
├── run.py                         # Entry point with multiple run modes
├── CHANGELOG.md
└── README.md

Database Schema

Table	Purpose
`contracts`	Polymarket contract data with current odds, volume, liquidity, category
`social_posts`	Social media posts from all 8 sources
`sentiment_analysis`	Per-post sentiment scores (LLM + VADER + TextBlob + ensemble)
`detected_gaps`	Pricing gaps with confidence scores and contract features
`historical_odds`	Time-series odds data for trend/volatility analysis
`cycle_runs`	Pipeline execution history with performance metrics
`sentiment_snapshots`	Aggregated sentiment windows per contract
`backtest_results`	Backtesting performance metrics
`system_logs`	System events and errors

Development Roadmap

Ethical Considerations

Rate Limiting: All API calls respect platform limits
robots.txt Compliance: Web scraping follows robots.txt rules
Terms of Service: Implementation adheres to all platform ToS
Data Privacy: No personal data collection
Transparency: All data sources are clearly attributed

License

MIT License - See LICENSE file for details

Disclaimer

This software is a solo educational and academic research project only. It is not investment advice, not a securities recommendation, and not a solicitation to buy, sell, or trade any financial instrument or prediction market contract. The author is not acting in any professional, advisory, or fiduciary capacity. This project is not associated with any financial institution, broker-dealer, or registered investment advisor.

Any results, outputs, edge calculations, or analysis produced by this system are purely theoretical and academic in nature. They have not been validated for real-world application and should not be used as the basis for any trading or investment decision. Past performance or backtesting results do not guarantee future results.

Use at your own risk for learning purposes only.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
config		config
migrations		migrations
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
DEVELOPMENT.md		DEVELOPMENT.md
EXAMPLE_OUTPUT.txt		EXAMPLE_OUTPUT.txt
OLLAMA_SETUP.md		OLLAMA_SETUP.md
PROJECT_SUMMARY.md		PROJECT_SUMMARY.md
QUICK_START.md		QUICK_START.md
README.md		README.md
SETUP_GUIDE.md		SETUP_GUIDE.md
python		python
requirements-flexible.txt		requirements-flexible.txt
requirements.txt		requirements.txt
run.py		run.py
test_db_connection.py		test_db_connection.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Polymarket Pricing Gap Detection System

Architecture Overview

Data Sources (8 total)

LLM Options

Smart Contract Selection

Dual Search Strategy

Features

Prerequisites

Installation

Configuration

Usage

Interactive Mode (Recommended)

Other Modes

Dashboard

Project Structure

Database Schema

Development Roadmap

Ethical Considerations

License

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Polymarket Pricing Gap Detection System

Architecture Overview

Data Sources (8 total)

LLM Options

Smart Contract Selection

Dual Search Strategy

Features

Prerequisites

Installation

Configuration

Usage

Interactive Mode (Recommended)

Other Modes

Dashboard

Project Structure

Database Schema

Development Roadmap

Ethical Considerations

License

Disclaimer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages