
Vector Log Reduction Benchmark

Reproducible benchmark of in-flight telemetry reduction techniques using Vector on public network security datasets (CIC-IDS2017, UNSW-NB15).

This artifact accompanies the paper:

"Context-Aware Security Telemetry Reduction: A Streaming Architecture for Scalable SOC Pipelines"
IEEE Access, 2026
DOI: TBD (assigned upon acceptance)


Techniques

| Profile | Config | Description |
|---|---|---|
| Baseline | `baseline.toml` | No reduction (reference forwarder) |
| Filter-only | `filter.toml` | Pre-ingest rules dropping explicitly benign events |
| Field Pruning | `prune.toml` | Keeps minimal schema (timestamp, src_ip, dst_ip, event_id, proto) |
| Sampling (10%) | `sample.toml` | 10% stratified Bernoulli sampling preserving class balance |
| Template Hashing | `hash.toml` | xxHash64 fingerprint of structural template + `.count` aggregation (T_flush = 60 s) |
| Hybrid | `combined.toml` | Field Pruning → 10% Sampling (sequential) |
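The stratified Bernoulli sampling used by `sample.toml` can be sketched as follows. This is a minimal illustration of the assumed semantics, not the Vector config itself: each class (attack / benign) is sampled independently at the same rate, so the attack:benign mix of the input is preserved in expectation.

```python
import random

def stratified_bernoulli_sample(events, rate=0.10, seed=42):
    """Bernoulli-sample each class independently at `rate`, so the
    attack:benign ratio of the input is preserved in expectation.
    Events are dicts with an `is_attack` field (0 or 1)."""
    rngs = {}   # one RNG per class keeps the two sampling streams independent
    kept = []
    for ev in events:
        cls = ev["is_attack"]
        if cls not in rngs:
            rngs[cls] = random.Random(seed * 2 + cls)
        if rngs[cls].random() < rate:
            kept.append(ev)
    return kept
```

With 10,000 events of which 10% are attacks, roughly 1,000 events survive and roughly 10% of the survivors are attacks.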

Note on hashing algorithm: the paper uses xxHash64: deterministic, O(1) per event, ~13.8 GB/s throughput, an 8-byte key, and an expected collision only after ~2³² distinct templates (64-bit birthday bound). SHA-1 was an earlier placeholder and does not reproduce the paper's results.
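The template-hashing transform can be sketched as below. This is a dependency-free illustration only: the mask rules are hypothetical, and a truncated SHA-256 stands in for xxHash64 (which the real pipeline computes inside Vector), so fingerprints and reduction numbers will not match the paper.

```python
import re
import hashlib
from collections import defaultdict

# Hypothetical mask rules: variable tokens are replaced so that
# structurally identical messages collapse to one template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def template_fingerprint(message: str) -> int:
    """8-byte structural fingerprint of a log message.
    Stand-in hash: truncated SHA-256 instead of xxHash64, to keep
    the sketch free of third-party dependencies."""
    tpl = message
    for pat, token in _MASKS:
        tpl = pat.sub(token, tpl)
    return int.from_bytes(hashlib.sha256(tpl.encode()).digest()[:8], "big")

def aggregate(messages):
    """Count events per template, as would be flushed every T_flush = 60 s."""
    counts = defaultdict(int)
    for m in messages:
        counts[template_fingerprint(m)] += 1
    return counts
```

Two connection messages differing only in IP and port collapse to one template with `.count = 2`, while an unrelated message keeps its own fingerprint.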


Hardware & Software

| Parameter | Value |
|---|---|
| CPU | Xeon Silver 4310 |
| RAM | 64 GB |
| Storage | NVMe SSD |
| OS | Ubuntu 22.04 |
| Vector | v0.41.0 |
| ClickHouse | v23.12 |
| Python | 3.10+ |
| Replay rate | 5,000 EPS (fixed) |
| Warm-up discard | 30 s |
| Repetitions | n = 3 |
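The fixed 5,000 EPS replay rate with a 30 s warm-up discard can be sketched as a simple pacer. This is illustrative only; `scripts/replay.py` is authoritative.

```python
import time

def paced_replay(events, eps=5000, warmup_s=30):
    """Yield (event, in_warmup) at a fixed rate of `eps` events/second.
    Callers discard measurements taken while in_warmup is True, matching
    the 30 s warm-up discard in the table above."""
    interval = 1.0 / eps
    start = time.monotonic()
    for i, ev in enumerate(events):
        # Pace against absolute targets so per-event sleep jitter
        # does not accumulate into rate drift.
        target = start + i * interval
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        yield ev, (time.monotonic() - start) < warmup_s
```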

Quick Start

1) Prerequisites

  • Docker & Docker Compose
  • Python 3.10+

2) Start services

```shell
docker compose up -d
```

3) Initialize ClickHouse (once)

```shell
docker exec -i clickhouse bash -lc "clickhouse-client --multiquery < /docker-entrypoint-initdb.d/init.sql"
```

4) Prepare dataset

Place a CSV or JSONL file in `data/` with the following columns:

`timestamp, src_ip, dst_ip, event_id, proto, message, is_attack, label`
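To smoke-test the pipeline before downloading CIC-IDS2017 or UNSW-NB15, a tiny synthetic file with this schema can be generated. All values below are fabricated and carry no dataset semantics.

```python
import csv
import random

COLUMNS = ["timestamp", "src_ip", "dst_ip", "event_id", "proto",
           "message", "is_attack", "label"]

def write_sample_csv(path, n=1000, attack_frac=0.1, seed=7):
    """Write n synthetic rows matching the expected replay schema."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=COLUMNS)
        w.writeheader()
        for i in range(n):
            attack = 1 if rng.random() < attack_frac else 0
            w.writerow({
                "timestamp": f"2017-07-07T12:00:{i % 60:02d}Z",
                "src_ip": f"10.0.{rng.randrange(256)}.{rng.randrange(256)}",
                "dst_ip": f"192.168.{rng.randrange(256)}.{rng.randrange(256)}",
                "event_id": i,
                "proto": rng.choice(["tcp", "udp"]),
                "message": f"flow event {i}",
                "is_attack": attack,
                "label": "ATTACK" if attack else "BENIGN",
            })
```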

5) Replay (choose a profile)

```shell
python scripts/replay.py --input data/cicids_sample.csv --format csv --profile baseline
python scripts/replay.py --input data/cicids_sample.csv --format csv --profile filter
python scripts/replay.py --input data/cicids_sample.csv --format csv --profile prune
python scripts/replay.py --input data/cicids_sample.csv --format csv --profile sample
python scripts/replay.py --input data/cicids_sample.csv --format csv --profile template-hash
python scripts/replay.py --input data/cicids_sample.csv --format csv --profile combined
```
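The six invocations can also be generated programmatically, e.g. to feed `subprocess.run` when sweeping all profiles in one go. This is a convenience sketch; it assumes `replay.py`'s CLI matches the commands shown above.

```python
import shlex

PROFILES = ["baseline", "filter", "prune", "sample", "template-hash", "combined"]

def replay_commands(input_path="data/cicids_sample.csv", fmt="csv"):
    """Build one replay command string per reduction profile."""
    return [
        f"python scripts/replay.py --input {shlex.quote(input_path)}"
        f" --format {fmt} --profile {p}"
        for p in PROFILES
    ]
```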

6) Query metrics (4 paper metrics)

```sql
-- Throughput: measured externally via the replay.py timer (kEPS)

-- Byte Reduction (%)
SELECT profile,
       (1 - sum(length(message)) / any(input_bytes)) * 100 AS byte_reduction_pct
FROM logs.reduced GROUP BY profile ORDER BY byte_reduction_pct DESC;

-- Attack Coverage (%)
SELECT profile,
       sum(if(is_attack = 1, 1, 0)) AS retained_attacks,
       retained_attacks / (SELECT count() FROM logs.raw WHERE is_attack = 1) * 100 AS coverage_pct
FROM logs.reduced GROUP BY profile;

-- Latency p95 (ms): reported by replay.py
```

See docs/metrics.md for exact metric definitions used in the paper.
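For quick sanity checks outside ClickHouse, the same metrics can be computed directly in Python. Nearest-rank p95 is used here as one common percentile definition; `replay.py` may use another.

```python
def byte_reduction_pct(input_bytes: int, output_bytes: int) -> float:
    """(1 - output/input) * 100, mirroring the SQL query."""
    return (1 - output_bytes / input_bytes) * 100

def attack_coverage_pct(retained_attacks: int, total_attacks: int) -> float:
    """Share of ground-truth attack events surviving reduction."""
    return retained_attacks / total_attacks * 100

def p95_ms(latencies_ms):
    """Nearest-rank 95th percentile, computed with integer
    arithmetic to avoid floating-point edge cases."""
    s = sorted(latencies_ms)
    k = (95 * len(s) + 99) // 100 - 1   # ceil(0.95 * n) - 1
    return s[k]
```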


Repository Layout

```text
vector-log-reduction-benchmark/
├─ README.md
├─ docker-compose.yml
├─ LICENSE
├─ CITATION.cff
├─ requirements.txt
├─ docs/
│  └─ metrics.md          ← Metric definitions (paper Section IV-C)
├─ .github/workflows/ci.yml
├─ clickhouse/init.sql
├─ vector/configs/
│  ├─ baseline.toml
│  ├─ filter.toml
│  ├─ prune.toml
│  ├─ sample.toml
│  ├─ hash.toml            ← Template Hashing (xxHash64, T_flush = 60 s)
│  └─ combined.toml        ← Hybrid: Pruning + Sampling
└─ scripts/replay.py
```

Reproducibility Statement

All Vector TOML configs, replay scripts, and ClickHouse DDL are provided.
A DOI will be minted upon acceptance (Zenodo/GitHub release).
See CITATION.cff for a citable reference.
