ScamBuster

A Defensive Engagement & Threat Intelligence Research Laboratory (Email-first)

Last updated: 2026-02-02 | Data period: December 2025 - February 2026

ScamBuster turns inbound scam emails into actionable threat intelligence through controlled, policy-driven engagement.

The project serves defensive security, fraud prevention, and applied research purposes (not offensive use). It extracts IOCs, maps campaigns, measures engagement effectiveness, and exports intelligence in STIX/MISP formats. All workflows are safety-gated, cost-aware, and fully auditable.

This repository is a public preview (documentation only). Operational assets remain private to prevent misuse.

The Problem: Email Scams Are High-Volume, and Mostly "Invisible" to Defenders

Email scams operate at massive scale. Most security programs are forced into a block-and-forget posture: the message is removed, but the attacker infrastructure, financial rails, and campaign signals remain largely unobserved. Industry estimates and sourced figures are documented in Problem Statement.

This creates a structural gap. There is little to no attribution across messages and campaigns, limited visibility into evolving TTPs and infrastructure reuse, and slow feedback loops on what actually works. Most organizations miss opportunities to generate intelligence from real-world interaction with threat actors.

ScamBuster explores this gap by converting scam emails into measurable threat intelligence, safely and at scale.

ScamBuster: From Blocking to Understanding

ScamBuster is a research laboratory that transforms email scams into actionable intelligence through controlled AI engagement.

The Vision: A Scam Observatory

Instead of discarding scam emails, ScamBuster creates an observatory that answers critical questions:

Question	ScamBuster Insight
What scam types are trending?	Real-time classification across 13 categories
Which personas maximize engagement?	Adaptive learning identifies optimal strategies per scam type
What IOCs do scammers reveal?	Automatic extraction of 34 indicator types
How do campaigns evolve?	Clustering and attribution over time
What works against different scammers?	Data-driven optimization, not intuition

Three Research Dimensions

┌─────────────────────────────────────────────────────────────────────────┐
│                    SCAMBUSTER RESEARCH LABORATORY                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐       │
│  │  CONVERSATIONAL  │  │   INTELLIGENCE   │  │    ADAPTIVE      │       │
│  │    LABORATORY    │  │    EXTRACTION    │  │    LEARNING      │       │
│  ├──────────────────┤  ├──────────────────┤  ├──────────────────┤       │
│  │                  │  │                  │  │                  │       │
│  │ Test which       │  │ Analyze how &    │  │ Automatically    │       │
│  │ personas work    │  │ when IOCs are    │  │ optimize         │       │
│  │ best for each    │  │ revealed during  │  │ strategies via   │       │
│  │ scam type        │  │ conversations    │  │ reinforcement    │       │
│  │                  │  │                  │  │ learning         │       │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Pilot Results (February 2026)

Controlled Live Deployment (60 Days)

Metric	Value	Notes
Conversations	+1K	Real scammers engaged
IOCs Extracted	+20K	Emails, phones, IBANs, crypto wallets
IOC Precision	100% on audited sample (N=107)	vs 44% with regex-only baseline
System Uptime	60 days	Zero incidents, fully automated
Operational Cost	€5.2	Total LLM API cost
Cost per IOC	€0.0002	Negligible operational expense

Metrics scope & definitions

Figures come from controlled live deployment (December 2025 - February 2026):

60-day run: Used for stability, scale, and ROI indicators (+1K conversations, +20K IOCs, 60 days uptime)

Controlled validation run: Used for precision analysis and campaign-level attribution

IOC precision (100%) = no false positives in audited sample (precision = TP / (TP + FP), N=107 messages). Sample-based validation details are documented in Evaluation Methodology.

Validation Summary

Adaptive strategy selection was validated on 2,221 synthetic conversations with statistically significant results (p < 0.001). Full methodology and statistical details are available in Evaluation Methodology.

Key Discoveries

Strategy Performance Varies Significantly by Scam Type

The adaptive system discovered that:

Optimal strategy differs significantly across scam categories
Human intuition about "best" approaches is often wrong
Data-driven selection outperforms random assignment

Campaign Attribution

From +1K conversations, identified coordinated operations:

Shared infrastructure (same IBANs across conversations)
Common TTPs (message templates, escalation patterns)
Geographic clustering (phone number prefixes)

How It Works

Multi-Agent LLM Architecture

Five specialized AI agents work in concert:

Agent	Role	Achievement
ScamClassifier	Categorize incoming scams	82% auto-classification, 13 types
IocExtractor	Extract threat indicators	100% precision on audited sample, 34 IOC types
Generator	Create contextual responses	+35% IOCs post-IBAN detection
Validator	Ensure safety & quality	95% approval rate
Orchestrator	Coordinate & optimize costs	<€0.0002/message

Adaptive Strategy Selection (Applied Research)

ScamBuster does not rely on a single fixed "best" conversational approach. Instead, it uses adaptive strategy selection to learn, per scam category, which safe persona/response patterns maximize intelligence yield under strict constraints.

Strategies are selected based on scam type (BEC, lottery, romance, refund, etc.). The system optimizes for defensive signals such as indicators revealed, validated artifacts, and sustained interaction, while controlling cost and safety. Every response is gated by validation rules and policy checks before being sent. Performance is monitored over time, enabling data-driven iteration rather than intuition.

Aspect	Summary
Approach	Contextual bandit / adaptive experimentation
Context	One policy per scam category (extensible)
Strategy space	Persona & response patterns (kept private to prevent misuse)
Objectives	Intelligence yield, safety compliance, and cost efficiency

Value for Stakeholders

For SOC/CERT Teams

Capability	Benefit
Automated IOC feeds	STIX 2.1 / MISP-compatible exports
Campaign attribution	Link individual scams to organized operations
Early warning	Identify emerging threats before they scale
Reduced analyst workload	Automated extraction vs manual review

For MSSPs

Capability	Benefit
Differentiation	Proactive TI service vs reactive blocking
Scalability	One deployment serves multiple clients
ROI demonstration	Quantifiable intelligence value

For Financial Institutions

Capability	Benefit
BEC detection	Early identification of business email compromise
Account protection	Report fraudulent accounts to consortium
Fraud prevention	Intelligence on active money mule networks

For Research

Capability	Benefit
Reproducible methodology	Published protocol for evaluation
Dataset	Anonymized corpus (Feb 2026)
Collaboration	Open platform for strategy experimentation

Documentation

Document	Description
Problem Statement	The €12.5B scam problem in depth
Value Proposition	Technical differentiators and ROI
Architecture	High-level system design
Security & Ethics	Defensive principles, GDPR, safety
Evaluation	Metrics, validation, statistical methods
Roadmap	Timeline and milestones
FAQ	Common questions

What's NOT Included (Operational Security)

To prevent misuse by adversaries, this repository contains documentation only:

No engagement prompts or persona definitions
No automation workflows or scripts
No operational playbooks or tactics
No real conversation data or scammer identifiers
No API keys, secrets, or operational configurations
No information enabling offensive use or replication without governance

Project Status

Phase	Status	Timeline
Phase 1: Multi-agent LLM architecture	✅ Complete	Oct-Nov 2025
Phase 2: Adaptive engagement (ε-greedy)	✅ Complete	Nov-Dec 2025
Phase 3: Thompson Sampling V2	✅ Feature-complete (rollout in progress)	Dec 2025
Phase 4: Scale & Dashboards	🔄 In Progress	Dec 2025
Phase 5: A/B Testing	📅 Planned	Jan 2026
Phase 6: Publication & Dataset Release	📅 Planned	Feb 2026

Status note: "Feature-complete" means core functionality is implemented and tested. "Rollout in progress" means gradual activation in production is ongoing. See Roadmap for week-by-week detail.

Request Access

Private Demo (45 min)

What you'll see:

End-to-end flow (ingestion → engagement → extraction → export)
Live dashboard with convergence visualization
Sanitized sample outputs and IOC examples

What we need from you:

Your role and organization context
Specific use case or evaluation criteria
Any compliance constraints (optional)

Eligibility: Access is granted for defensive security, research, or fraud prevention purposes only. No access for offensive use, scam operations, or purposes that conflict with the project's ethical guidelines.

Operational boundaries: The system only responds to scam emails already received and never initiates contact. There is no impersonation of real organizations, brands, or individuals (personas are synthetic role patterns, non-identifying). There is no unauthorized access, no hack-back, and no exploitation of scammer infrastructure.

Pilot Program

Evaluate in your environment:

Time-boxed deployment (4-8 weeks typical)
Defined scope and success criteria
Security and compliance review available
Integration assessment with existing tools

Partnership Opportunities

SOC/MSSP: SIEM/SOAR integration pilots
Research: Dataset sharing, methodology validation
Commercial: Enterprise licensing discussions

Contact


Project lead	Laurent Giovannoni
LinkedIn	linkedin.com/in/giovannonilaurent
Context	E-MSc Cybersecurity, Master's Thesis
Demo request	Open a GitHub Issue (private requests welcome)
Security	See SECURITY.md for responsible disclosure

Technology Stack

Layer	Technology
Backend	PHP 8.3, Symfony 7, DDD architecture
Database	PostgreSQL, Redis
LLM	OpenAI API (GPT-4o-mini)
Orchestration	n8n workflow automation
Infrastructure	Docker, GitLab CI
Security	Industry-standard encryption, secrets management

Academic Context

Research Contributions

Methodological: Reproducible protocol for adaptive honeypot evaluation
Technical: Multi-agent LLM with double validation (95% approval vs 60-70% baseline)
Scientific: Empirically validated adaptive engagement (p < 0.001, N=2,221)
Practical: Demonstrated efficiency at pilot scale (€5.2 for +20K IOCs)

Citation

@master{giovannoni2025scambuster,
  author = {Giovannoni, Laurent},
  title = {ScamBuster: Adaptive Controlled Engagement via Multi-Armed Bandits
           for Automated Threat Intelligence Extraction},
  school = {E-MSc Cybersecurity},
  year = {2025}
}

License

Documentation: CC BY-NC-SA 4.0
Code: Private (commercial/research license available)
Dataset: CC BY-NC-SA 4.0 (anonymized, February 2026)

Learn More • Request Demo • View Roadmap

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
docs		docs
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ScamBuster

The Problem: Email Scams Are High-Volume, and Mostly "Invisible" to Defenders

ScamBuster: From Blocking to Understanding

The Vision: A Scam Observatory

Three Research Dimensions

Pilot Results (February 2026)

Controlled Live Deployment (60 Days)

Validation Summary

Key Discoveries

How It Works

Multi-Agent LLM Architecture

Adaptive Strategy Selection (Applied Research)

Value for Stakeholders

For SOC/CERT Teams

For MSSPs

For Financial Institutions

For Research

Documentation

What's NOT Included (Operational Security)

Project Status

Request Access

Private Demo (45 min)

Pilot Program

Partnership Opportunities

Contact

Technology Stack

Academic Context

Research Contributions

Citation

License

About

Uh oh!

Releases

Packages

License

laugiov/scambuster-preview

Folders and files

Latest commit

History

Repository files navigation

ScamBuster

The Problem: Email Scams Are High-Volume, and Mostly "Invisible" to Defenders

ScamBuster: From Blocking to Understanding

The Vision: A Scam Observatory

Three Research Dimensions

Pilot Results (February 2026)

Controlled Live Deployment (60 Days)

Validation Summary

Key Discoveries

How It Works

Multi-Agent LLM Architecture

Adaptive Strategy Selection (Applied Research)

Value for Stakeholders

For SOC/CERT Teams

For MSSPs

For Financial Institutions

For Research

Documentation

What's NOT Included (Operational Security)

Project Status

Request Access

Private Demo (45 min)

Pilot Program

Partnership Opportunities

Contact

Technology Stack

Academic Context

Research Contributions

Citation

License

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages