A Defensive Engagement & Threat Intelligence Research Laboratory (Email-first)
Last updated: 2026-02-02 | Data period: December 2025 - February 2026
ScamBuster turns inbound scam emails into actionable threat intelligence through controlled, policy-driven engagement.
The project serves defensive security, fraud prevention, and applied research purposes (not offensive use). It extracts IOCs, maps campaigns, measures engagement effectiveness, and exports intelligence in STIX/MISP formats. All workflows are safety-gated, cost-aware, and fully auditable.
This repository is a public preview (documentation only). Operational assets remain private to prevent misuse.
Email scams operate at massive scale. Most security programs are forced into a block-and-forget posture: the message is removed, but the attacker infrastructure, financial rails, and campaign signals remain largely unobserved. Industry estimates and sourced figures are documented in Problem Statement.
This creates a structural gap. There is little to no attribution across messages and campaigns, limited visibility into evolving TTPs and infrastructure reuse, and slow feedback loops on what actually works. Most organizations miss opportunities to generate intelligence from real-world interaction with threat actors.
ScamBuster explores this gap by converting scam emails into measurable threat intelligence, safely and at scale.
ScamBuster is a research laboratory that transforms email scams into actionable intelligence through controlled AI engagement.
Instead of discarding scam emails, ScamBuster creates an observatory that answers critical questions:
| Question | ScamBuster Insight |
|---|---|
| What scam types are trending? | Real-time classification across 13 categories |
| Which personas maximize engagement? | Adaptive learning identifies optimal strategies per scam type |
| What IOCs do scammers reveal? | Automatic extraction of 34 indicator types |
| How do campaigns evolve? | Clustering and attribution over time |
| What works against different scammers? | Data-driven optimization, not intuition |
| ScamBuster Research Laboratory | Purpose |
|---|---|
| Conversational Laboratory | Test which personas work best for each scam type |
| Intelligence Extraction | Analyze how and when IOCs are revealed during conversations |
| Adaptive Learning | Automatically optimize strategies via reinforcement learning |
| Metric | Value | Notes |
|---|---|---|
| Conversations | 1K+ | Real scammers engaged |
| IOCs Extracted | 20K+ | Emails, phones, IBANs, crypto wallets |
| IOC Precision | 100% on audited sample (N=107) | vs 44% with regex-only baseline |
| System Uptime | 60 days | Zero incidents, fully automated |
| Operational Cost | €5.2 | Total LLM API cost over the 60-day run |
| Cost per IOC | €0.0002 | Negligible operational expense |
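A 44% precision for the regex-only baseline means more than half of its hits were false positives. The sketch below is a generic illustration of that gap, not ScamBuster's IocExtractor (whose internals are private): it pairs a loose IBAN regex with the ISO 13616 mod-97 checksum, so strings that match the pattern but fail the checksum are dropped. The pattern and function names are hypothetical.

```python
import re

# Hypothetical loose pattern: country code, 2 check digits, then 11-30 alphanumerics,
# optionally grouped with single spaces (as IBANs are often written in emails).
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def iban_checksum_ok(raw: str) -> bool:
    """ISO 13616 mod-97 check: a well-formed IBAN leaves remainder 1."""
    iban = raw.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    # Move country code + check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # Map letters to numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_ibans(text: str) -> tuple[list[str], list[str]]:
    """Return (regex_only_hits, checksum_validated_hits)."""
    hits = [m.group(0) for m in IBAN_CANDIDATE.finditer(text)]
    return hits, [h for h in hits if iban_checksum_ok(h)]


if __name__ == "__main__":
    text = "Send to GB82 WEST 1234 5698 7654 32 or GB00 WEST 1234 5698 7654 32 today"
    print(extract_ibans(text))
```

A checksum only filters malformed or mistyped strings (roughly 1 in 97 random strings still pass); it says nothing about whether the account exists, so it complements rather than replaces contextual validation.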
Metrics scope & definitions
Figures come from a controlled live deployment (December 2025 - February 2026):
- 60-day run: Used for stability, scale, and ROI indicators (1K+ conversations, 20K+ IOCs, 60 days of uptime)
- Controlled validation run: Used for precision analysis and campaign-level attribution
IOC precision of 100% means no false positives were found in the audited sample (N=107 messages), where precision = TP / (TP + FP). Sample-based validation details are documented in Evaluation Methodology.
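A 100% point estimate on a finite sample does not mean the extractor never errs, so a lower confidence bound is a useful companion figure. The sketch below is a generic illustration, not the project's evaluation code: it computes precision from audited counts and, for the all-correct case, the exact one-sided (Clopper-Pearson) lower bound, which reduces to solving p^n = α. The bound should be taken over the number of audited IOCs, which may differ from the 107 audited messages; the counts used here are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP) over audited extractions."""
    if tp < 0 or fp < 0 or tp + fp == 0:
        raise ValueError("need a non-empty set of audited extractions")
    return tp / (tp + fp)


def lower_bound_all_correct(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided lower bound on precision when all n audited items are TPs.

    With zero false positives, Clopper-Pearson reduces to solving
    P(all n correct | p) = p**n = alpha for p.
    """
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("n must be positive and alpha in (0, 1)")
    return alpha ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical audit: 107 extracted IOCs reviewed, none rejected.
    print(precision(tp=107, fp=0))                 # 1.0
    print(round(lower_bound_all_correct(107), 3))  # 0.972
```

The bound tightens toward 1 as more items are audited, which is one way to quantify how much a larger audit would strengthen the precision claim.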
Adaptive strategy selection was validated on 2,221 synthetic conversations with statistically significant results (p < 0.001). Full methodology and statistical details are available in Evaluation Methodology.
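The test behind p < 0.001 is documented in Evaluation Methodology and is not stated here. Purely as an illustration of how such a comparison can be run, the sketch below applies a two-sided two-proportion z-test to a binary outcome (for example, "the conversation yielded at least one IOC") under adaptive versus random strategy assignment. The outcome definition and all counts are hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p1 == p2 using the pooled proportion.

    x1/n1: successes/trials under adaptive selection (hypothetical outcome).
    x2/n2: successes/trials under random assignment.
    Returns (z, p_value).
    """
    if min(n1, n2) <= 0:
        raise ValueError("both groups need at least one trial")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups all-success or all-failure: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(600, 1000, 500, 1000)
    print(f"z={z:.2f}, p={p:.1e}")
```

With these made-up counts (60% vs 50% over 1,000 conversations each), z is about 4.49 and p falls well below 0.001. The z-test assumes independent conversations and a binary outcome; for count outcomes such as IOCs per conversation, a permutation test or Mann-Whitney U test would be a more natural fit.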
Strategy Performance Varies Significantly by Scam Type
The adaptive system discovered that:
- Optimal strategy differs significantly across scam categories
- Human intuition about "best" approaches is often wrong
- Data-driven selection outperforms random assignment
Campaign Attribution
Across 1K+ conversations, the system identified coordinated operations through:
- Shared infrastructure (same IBANs across conversations)
- Common TTPs (message templates, escalation patterns)
- Geographic clustering (phone number prefixes)
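Linking conversations through shared infrastructure can be framed as finding connected components in a graph whose edges are shared IOC values. A minimal sketch, assuming each conversation is represented as a set of normalized, type-prefixed IOC strings (the input format and function name are hypothetical), uses union-find to group conversations that share an IBAN, email address, or phone number, directly or transitively:

```python
from collections import defaultdict


def cluster_by_shared_iocs(conversations: dict[str, set[str]]) -> list[set[str]]:
    """Group conversation IDs that share at least one IOC, transitively."""
    parent = {cid: cid for cid in conversations}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: dict[str, str] = {}  # IOC value -> first conversation containing it
    for cid, iocs in conversations.items():
        for ioc in iocs:
            if ioc in first_seen:
                union(first_seen[ioc], cid)
            else:
                first_seen[ioc] = cid

    clusters: dict[str, set[str]] = defaultdict(set)
    for cid in conversations:
        clusters[find(cid)].add(cid)
    return list(clusters.values())


if __name__ == "__main__":
    convs = {
        "c1": {"iban:GB82WEST12345698765432", "email:payouts@example.net"},
        "c2": {"iban:GB82WEST12345698765432"},
        "c3": {"email:payouts@example.net", "phone:+15555550100"},
        "c4": {"phone:+15555550199"},
    }
    print(cluster_by_shared_iocs(convs))  # c1, c2, c3 linked; c4 alone
```

Exact-match linking is deliberately conservative. A low-specificity value such as a free-mail domain would over-merge unrelated conversations if used as an edge, so softer signals like phone prefixes are better treated as cluster features than as links.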
Five specialized AI agents work in concert:
| Agent | Role | Achievement |
|---|---|---|
| ScamClassifier | Categorize incoming scams | 82% auto-classification, 13 types |
| IocExtractor | Extract threat indicators | 100% precision on audited sample, 34 IOC types |
| Generator | Create contextual responses | +35% IOCs post-IBAN detection |
| Validator | Ensure safety & quality | 95% approval rate |
| Orchestrator | Coordinate & optimize costs | <€0.0002/message |
ScamBuster does not rely on a single fixed "best" conversational approach. Instead, it uses adaptive strategy selection to learn, per scam category, which safe persona/response patterns maximize intelligence yield under strict constraints.
Strategies are selected based on scam type (BEC, lottery, romance, refund, etc.). The system optimizes for defensive signals such as indicators revealed, validated artifacts, and sustained interaction, while controlling cost and safety. Every response is gated by validation rules and policy checks before being sent. Performance is monitored over time, enabling data-driven iteration rather than intuition.
| Aspect | Summary |
|---|---|
| Approach | Contextual bandit / adaptive experimentation |
| Context | One policy per scam category (extensible) |
| Strategy space | Persona & response patterns (kept private to prevent misuse) |
| Objectives | Intelligence yield, safety compliance, and cost efficiency |
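Phase 3 of the roadmap names Thompson Sampling as the selection method. A minimal sketch of per-category Beta-Bernoulli Thompson Sampling follows, assuming a binary reward (for example, "the reply elicited a new validated IOC"). The reward definition, category labels, and opaque arm IDs are hypothetical, since the real strategy space and reward design are private; the structure mirrors the "one policy per scam category" context above.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BetaArm:
    alpha: float = 1.0  # 1 + observed successes (uniform Beta(1, 1) prior)
    beta: float = 1.0   # 1 + observed failures


@dataclass
class CategoryPolicy:
    """Independent Beta-Bernoulli Thompson Sampling policy for one scam category."""
    arms: dict[str, BetaArm] = field(default_factory=dict)

    def select(self, rng: random.Random) -> str:
        # Draw one plausible success rate per arm from its posterior; play the best draw.
        return max(self.arms, key=lambda a: rng.betavariate(self.arms[a].alpha, self.arms[a].beta))

    def update(self, arm: str, reward: bool) -> None:
        if reward:
            self.arms[arm].alpha += 1
        else:
            self.arms[arm].beta += 1


def make_policies(categories: list[str], arm_ids: list[str]) -> dict[str, CategoryPolicy]:
    return {c: CategoryPolicy({a: BetaArm() for a in arm_ids}) for c in categories}


if __name__ == "__main__":
    rng = random.Random(7)
    policies = make_policies(["lottery", "bec"], ["arm_a", "arm_b", "arm_c"])
    true_rates = {"arm_a": 0.10, "arm_b": 0.30, "arm_c": 0.15}  # simulated, unknown to the policy
    pulls = {a: 0 for a in true_rates}
    for _ in range(3000):
        arm = policies["lottery"].select(rng)
        pulls[arm] += 1
        policies["lottery"].update(arm, rng.random() < true_rates[arm])
    print(pulls)
```

Compared with ε-greedy (Phase 2), which explores uniformly at a fixed rate, Thompson Sampling explores in proportion to posterior uncertainty, so clearly weak arms are tried less often as evidence accumulates.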
| For SOC & threat intelligence teams | Benefit |
|---|---|
| Automated IOC feeds | STIX 2.1 / MISP-compatible exports |
| Campaign attribution | Link individual scams to organized operations |
| Early warning | Identify emerging threats before they scale |
| Reduced analyst workload | Automated extraction vs manual review |
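For the STIX 2.1 export, each extracted indicator can be wrapped in a STIX Indicator object. A minimal standard-library sketch follows, assuming an email-address IOC; it emits the properties STIX 2.1 requires for an Indicator (`type`, `spec_version`, `id`, `created`, `modified`, `pattern`, `pattern_type`, `valid_from`), while the `name` text and helper names are illustrative. A production exporter would more likely use the OASIS `stix2` Python library.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt: datetime) -> str:
    """RFC 3339 timestamp in UTC with a trailing 'Z', at millisecond precision."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def escape_pattern_literal(value: str) -> str:
    # STIX pattern string literals are single-quoted: escape backslashes, then quotes.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def email_indicator(address: str, seen: datetime) -> dict:
    ts = stix_timestamp(seen)
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "name": "Scam sender address",  # illustrative label
        "pattern": f"[email-addr:value = '{escape_pattern_literal(address)}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


def bundle(objects: list[dict]) -> dict:
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    seen = datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc)
    print(json.dumps(bundle([email_indicator("payouts@example.net", seen)]), indent=2))
```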
| For MSSPs & security vendors | Benefit |
|---|---|
| Differentiation | Proactive TI service vs reactive blocking |
| Scalability | One deployment serves multiple clients |
| ROI demonstration | Quantifiable intelligence value |
| For financial institutions | Benefit |
|---|---|
| BEC detection | Early identification of business email compromise |
| Account protection | Report fraudulent accounts to consortium |
| Fraud prevention | Intelligence on active money mule networks |
| For researchers | Benefit |
|---|---|
| Reproducible methodology | Published protocol for evaluation |
| Dataset | Anonymized corpus (Feb 2026) |
| Collaboration | Open platform for strategy experimentation |
| Document | Description |
|---|---|
| Problem Statement | The €12.5B scam problem in depth |
| Value Proposition | Technical differentiators and ROI |
| Architecture | High-level system design |
| Security & Ethics | Defensive principles, GDPR, safety |
| Evaluation | Metrics, validation, statistical methods |
| Roadmap | Timeline and milestones |
| FAQ | Common questions |
To prevent misuse by adversaries, this repository contains documentation only:
- No engagement prompts or persona definitions
- No automation workflows or scripts
- No operational playbooks or tactics
- No real conversation data or scammer identifiers
- No API keys, secrets, or operational configurations
- No information enabling offensive use or replication without governance
| Phase | Status | Timeline |
|---|---|---|
| Phase 1: Multi-agent LLM architecture | ✅ Complete | Oct-Nov 2025 |
| Phase 2: Adaptive engagement (ε-greedy) | ✅ Complete | Nov-Dec 2025 |
| Phase 3: Thompson Sampling V2 | ✅ Feature-complete (rollout in progress) | Dec 2025 |
| Phase 4: Scale & Dashboards | 🔄 In Progress | Dec 2025 |
| Phase 5: A/B Testing | 📅 Planned | Jan 2026 |
| Phase 6: Publication & Dataset Release | 📅 Planned | Feb 2026 |
Status note: "Feature-complete" means core functionality is implemented and tested. "Rollout in progress" means gradual activation in production is ongoing. See Roadmap for week-by-week detail.
What you'll see:
- End-to-end flow (ingestion → engagement → extraction → export)
- Live dashboard with convergence visualization
- Sanitized sample outputs and IOC examples
What we need from you:
- Your role and organization context
- Specific use case or evaluation criteria
- Any compliance constraints (optional)
Eligibility: Access is granted for defensive security, research, or fraud prevention purposes only. No access for offensive use, scam operations, or purposes that conflict with the project's ethical guidelines.
Operational boundaries: The system only responds to scam emails already received and never initiates contact. There is no impersonation of real organizations, brands, or individuals (personas are synthetic role patterns, non-identifying). There is no unauthorized access, no hack-back, and no exploitation of scammer infrastructure.
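Boundaries like these lend themselves to a hard gate that runs before any reply leaves the system, independent of the LLM. The sketch below is hypothetical: the `Draft` fields, the protected-name list, and the budget parameters are illustrative assumptions, not ScamBuster's actual policy checks (which are private). It encodes the reply-only and no-impersonation rules, the Validator's verdict, and a cost cap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Draft:
    in_reply_to: Optional[str]  # Message-ID of the received scam email, if any
    body: str
    validator_approved: bool    # verdict from the Validator agent
    estimated_cost_eur: float


def gate(draft: Draft, received_ids: set[str], protected_names: set[str],
         spent_eur: float, budget_eur: float) -> list[str]:
    """Return every violated rule; an empty list means the draft may be sent."""
    violations = []
    # Reply-only: never initiate contact, only answer a message already received.
    if draft.in_reply_to is None or draft.in_reply_to not in received_ids:
        violations.append("not a reply to a received scam email")
    # No impersonation: naive substring check against real organization/brand names.
    lowered = draft.body.lower()
    if any(name.lower() in lowered for name in protected_names):
        violations.append("mentions a protected real-world name")
    if not draft.validator_approved:
        violations.append("rejected by validator")
    if spent_eur + draft.estimated_cost_eur > budget_eur:
        violations.append("cost budget exceeded")
    return violations
```

Returning all violations rather than stopping at the first makes each blocked reply easier to audit.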
Evaluate in your environment:
- Time-boxed deployment (4-8 weeks typical)
- Defined scope and success criteria
- Security and compliance review available
- Integration assessment with existing tools
- SOC/MSSP: SIEM/SOAR integration pilots
- Research: Dataset sharing, methodology validation
- Commercial: Enterprise licensing discussions
| Contact | Details |
|---|---|
| Project lead | Laurent Giovannoni |
| LinkedIn | linkedin.com/in/giovannonilaurent |
| Context | E-MSc Cybersecurity, Master's Thesis |
| Demo request | Open a GitHub Issue (private requests welcome) |
| Security | See SECURITY.md for responsible disclosure |
| Layer | Technology |
|---|---|
| Backend | PHP 8.3, Symfony 7, DDD architecture |
| Database | PostgreSQL, Redis |
| LLM | OpenAI API (GPT-4o-mini) |
| Orchestration | n8n workflow automation |
| Infrastructure | Docker, GitLab CI |
| Security | Industry-standard encryption, secrets management |
- Methodological: Reproducible protocol for adaptive honeypot evaluation
- Technical: Multi-agent LLM with double validation (95% approval vs 60-70% baseline)
- Scientific: Empirically validated adaptive engagement (p < 0.001, N=2,221)
- Practical: Demonstrated efficiency at pilot scale (€5.2 for 20K+ IOCs)
@mastersthesis{giovannoni2025scambuster,
  author = {Giovannoni, Laurent},
  title  = {ScamBuster: Adaptive Controlled Engagement via Multi-Armed Bandits
            for Automated Threat Intelligence Extraction},
  school = {E-MSc Cybersecurity},
  year   = {2025}
}
- Documentation: CC BY-NC-SA 4.0
- Code: Private (commercial/research license available)
- Dataset: CC BY-NC-SA 4.0 (anonymized, February 2026)