HEPop is a high-performance HEP Capture Server built with DuckDB, Bun and Apache Arrow/Parquet
- High-Performance HEP Server
  - HEPv3/EEP Support (UDP/TCP)
- Apache Parquet Writer
  - Parquet Columnar WAL + Storage
  - Automatic Rotation + Compaction
  - Automatic Metadata Management
- DuckDB Integration
  - Parquet Data Compaction
  - Query Execution
- Search API
  - GET/POST Query API
  - Metadata Table/Range Selection
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#BB2528',
'primaryTextColor': '#fff',
'primaryBorderColor': '#7C0000',
'lineColor': '#F8B229',
'secondaryColor': '#006100',
'tertiaryColor': '#fff'
}
}
}%%
graph TD;
HEP-Client-- UDP/TCP -->HEPop;
HEPop-->ParquetWriter;
ParquetWriter-->Storage;
ParquetWriter-->Metadata;
Storage-->Compactor;
Compactor-->Storage;
Compactor-->Metadata;
Storage-.->LocalFS;
Storage-.->S3;
HTTP-API-- GET/POST --> HEPop;
DuckDB-->Storage;
DuckDB-->Metadata;
subgraph HEPop[HEPop Server]
ParquetWriter
Compactor
Metadata;
DuckDB;
end
Use Bun to install, build and run HEPop:
bun install
bun start
Configure HEPop using environment variables:
- PORT: HEP server port (default: 9069)
- HTTP_PORT: Query API port (default: PORT + 1)
- HOST: Bind address (default: "0.0.0.0")
- PARQUET_DIR: Data directory (default: "./data")
- WRITER_ID: Instance identifier (default: hostname)
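The defaults above chain together: the query API port is derived from the HEP port unless it is set explicitly. The following is a minimal Python sketch of that resolution logic, not HEPop's own code; the function name `resolve_config` and the returned dictionary keys are hypothetical.

```python
import socket


def resolve_config(env):
    """Resolve settings from a mapping of environment variables (hypothetical helper)."""
    port = int(env.get("PORT", "9069"))
    return {
        "port": port,
        # The query API port falls back to PORT + 1 when HTTP_PORT is unset.
        "http_port": int(env.get("HTTP_PORT", port + 1)),
        "host": env.get("HOST", "0.0.0.0"),
        "parquet_dir": env.get("PARQUET_DIR", "./data"),
        # The instance identifier falls back to the machine hostname.
        "writer_id": env.get("WRITER_ID") or socket.gethostname(),
    }


config = resolve_config({"PORT": "9060", "WRITER_ID": "writer1"})
print(config["port"], config["http_port"], config["writer_id"])
```

Passing `os.environ` instead of a literal dictionary would resolve the settings for the current process.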
HEPop organizes data in a time-based directory structure:
data/
└── writer1/
    └── dbs/
        └── hep-0/
            ├── hep_1-0/
            │   └── 2025-02-08/
            │       ├── 19-00/
            │       │   └── c_0000000001.parquet
            │       ├── 19-10/
            │       │   └── 0000000002.parquet
            │       └── metadata.json
            └── hep_100-0/
                └── ...
- Each HEP type gets its own directory structure
- Generated Parquet files are organized by date and time bucket (for example 19-00, 19-10)
- Compacted sets (c_) consolidate files for fast access
- Metadata tracks all files, compaction and statistics
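To make the layout concrete, here is a small Python sketch that splits a Parquet file path into its components. The regular expression is inferred only from the example tree, so the layout it encodes is an assumption; the names `PATH_RE` and `parse_parquet_path` are hypothetical, and the directory segments such as `hep-0` and `hep_1-0` are treated as opaque strings.

```python
import re

# Layout assumed from the example tree:
# <root>/<writer>/dbs/<db>/<table>/<YYYY-MM-DD>/<HH-MM>/[c_]<sequence>.parquet
PATH_RE = re.compile(
    r"^(?P<root>.+)/(?P<writer>[^/]+)/dbs/(?P<db>[^/]+)/(?P<table>[^/]+)/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/(?P<bucket>\d{2}-\d{2})/"
    r"(?P<prefix>c_)?(?P<sequence>\d+)\.parquet$"
)


def parse_parquet_path(path):
    """Split a data file path into writer, table, date, bucket and sequence."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"unrecognized path layout: {path}")
    parts = match.groupdict()
    return {
        "writer": parts["writer"],
        "db": parts["db"],
        "table": parts["table"],
        "date": parts["date"],
        "bucket": parts["bucket"],
        # A "c_" prefix marks a compacted file.
        "compacted": parts["prefix"] is not None,
        "sequence": int(parts["sequence"]),
    }


info = parse_parquet_path(
    "data/writer1/dbs/hep-0/hep_1-0/2025-02-08/19-00/c_0000000001.parquet"
)
print(info)
```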
{
  "type": 1,
  "parquet_size_bytes": 379739,
  "row_count": 359,
  "min_time": 1739043338978000000,
  "max_time": 1739043934193000000,
  "wal_sequence": 32,
  "files": [
    {
      "id": 0,
      "path": "data/writer1/dbs/hep-0/hep_1-0/2025-02-08/19-00/c_0000000032.parquet",
      "size_bytes": 379739,
      "row_count": 359,
      "chunk_time": 1739043000000000000,
      "min_time": 1739043338978000000,
      "max_time": 1739043934193000000,
      "range": "1h",
      "type": "compacted"
    }
  ]
}
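Because each file entry carries `min_time` and `max_time` in nanoseconds, a reader can skip files that cannot contain rows for a requested window. The Python sketch below illustrates that kind of range selection over a metadata document shaped like the example above; it is an illustration rather than HEPop's actual selection logic, and `files_in_range` is a hypothetical name.

```python
def files_in_range(metadata, start_ns, end_ns):
    """Return paths of files whose [min_time, max_time] overlaps the window."""
    return [
        entry["path"]
        for entry in metadata["files"]
        if entry["min_time"] <= end_ns and entry["max_time"] >= start_ns
    ]


# Trimmed copy of the example metadata (only the fields used here).
metadata = {
    "type": 1,
    "files": [
        {
            "path": "data/writer1/dbs/hep-0/hep_1-0/2025-02-08/19-00/c_0000000032.parquet",
            "min_time": 1739043338978000000,
            "max_time": 1739043934193000000,
            "type": "compacted",
        }
    ],
}

print(files_in_range(metadata, 1739043600000000000, 1739043700000000000))
```

Python integers are arbitrary precision, so the nanosecond timestamps compare exactly; languages with 64-bit floating point numbers would need a big-integer type for the same check.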
Query the HEP data using the HTTP API. The server provides both GET and POST endpoints for querying data.
- Time Range: If not specified, defaults to last 10 minutes
- Dynamic Columns: Select specific columns or use * for all
- Filtering: WHERE clause supports standard SQL conditions
- Sorting: ORDER BY supports all columns
- Pagination: Use LIMIT and OFFSET for paging
HEP virtual fields are automatically exploded at query time:
- timestamp/time: Event timestamp
- rcinfo: Raw HEP protocol header (JSON)
- payload: HEP Protocol payload
- src_ip: Source IP (rcinfo)
- dst_ip: Destination IP (rcinfo)
- src_port: Source port (rcinfo)
- dst_port: Destination port (rcinfo)
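The idea of exploding `rcinfo` into flat columns can be sketched in a few lines of Python. This is only an illustration of the concept: the key names inside the `rcinfo` JSON (`srcIp`, `dstIp`, `srcPort`, `dstPort`) are assumptions made for this example and are not confirmed by this document, and `explode_rcinfo` is a hypothetical helper.

```python
import json


def explode_rcinfo(row):
    """Copy address fields out of the rcinfo JSON into top-level columns."""
    rcinfo = row["rcinfo"]
    if isinstance(rcinfo, str):
        rcinfo = json.loads(rcinfo)
    return {
        **row,
        # Key names inside rcinfo are assumptions for this sketch.
        "src_ip": rcinfo.get("srcIp"),
        "dst_ip": rcinfo.get("dstIp"),
        "src_port": rcinfo.get("srcPort"),
        "dst_port": rcinfo.get("dstPort"),
    }


row = {
    "payload": "INVITE sip:bob@example.com SIP/2.0",
    "rcinfo": '{"srcIp": "192.0.2.1", "dstIp": "192.0.2.2", "srcPort": 5060, "dstPort": 5060}',
}
print(explode_rcinfo(row)["src_ip"])
```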
# Query last 10 minutes of SIP messages
curl -G "http://localhost:9070/query" --data-urlencode "q=SELECT time,src_ip,dst_ip,payload FROM hep_1 LIMIT 10"
# Complex query with time range and conditions
curl -X POST http://localhost:9070/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT time, src_ip, dst_ip, payload FROM hep_1 WHERE time >= '\''2025-02-08T19:00:00'\'' AND payload LIKE '\''%INVITE%'\'' ORDER BY time DESC"
}'
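The same two requests can be prepared from a script. The sketch below uses only the Python standard library and builds the requests without sending them; the base URL assumes the default HTTP port on localhost, and the helper names `build_get_query` and `build_post_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9070"  # assumed: default HTTP_PORT on localhost


def build_get_query(sql, base_url=BASE_URL):
    """Build a GET request with the SQL percent-encoded in the q parameter."""
    query_string = urllib.parse.urlencode({"q": sql}, quote_via=urllib.parse.quote)
    return urllib.request.Request(f"{base_url}/query?{query_string}")


def build_post_query(sql, base_url=BASE_URL):
    """Build a POST request carrying the SQL in a JSON body."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_get_query("SELECT time,src_ip,dst_ip,payload FROM hep_1 LIMIT 10")
print(request.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running server; the JSON body avoids the shell quoting needed in the curl example.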
Query HEP data using DuckDB, ClickHouse, Databend or any Parquet-compatible tool:
SELECT count(*) FROM 'data/writer1/dbs/hep-0/hep_1-*/*/*/c_0000000001.parquet';
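Before pointing an external engine at the data directory, it can help to check which files a glob pattern actually matches. The Python sketch below lists compacted files for one table prefix using the standard library; the pattern mirrors the example tree, `compacted_files` is a hypothetical helper, and the temporary directory only stands in for a real data directory.

```python
import tempfile
from pathlib import Path


def compacted_files(data_dir, table_prefix="hep_1"):
    """List compacted (c_*) Parquet files for one table prefix."""
    # Assumed layout: <writer>/dbs/<db>/<table>-<n>/<date>/<bucket>/c_<seq>.parquet
    pattern = f"*/dbs/*/{table_prefix}-*/*/*/c_*.parquet"
    return sorted(str(path) for path in Path(data_dir).glob(pattern))


# Build a throwaway tree shaped like the example layout and scan it.
with tempfile.TemporaryDirectory() as root:
    day = Path(root) / "writer1" / "dbs" / "hep-0" / "hep_1-0" / "2025-02-08"
    (day / "19-00").mkdir(parents=True)
    (day / "19-10").mkdir(parents=True)
    (day / "19-00" / "c_0000000001.parquet").touch()
    (day / "19-10" / "0000000002.parquet").touch()
    found = compacted_files(root)
    print([Path(path).name for path in found])
```

Only the `c_` file is returned; the uncompacted file in `19-10` is skipped by the pattern.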
HEPop also supports InfluxDB Line Protocol ingestion for metrics and events.
Send metrics using the InfluxDB Line Protocol format. Each line represents a single data point with measurement, tags, fields and optional timestamp.
# Single metric
curl -i -XPOST "http://localhost:9070/write" --data-raw 'cpu,host=server01,region=us-west usage_idle=92.6,usage_user=7.4'
# Multiple metrics
curl -i -XPOST "http://localhost:9070/write" --data-raw '
memory,host=server01,region=us-west used_percent=23.43,free=7.82
disk,host=server01,region=us-west used_percent=86.45,free=21.45
network,host=server01,region=us-west rx_bytes=7834,tx_bytes=9843
'
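Clients that generate metrics programmatically usually format the lines themselves. A minimal Python sketch, assuming numeric field values and tag or field strings that need no escaping; `format_point` and `build_write_request` are hypothetical helpers, and the base URL assumes the default HTTP port on localhost.

```python
import urllib.request


def format_point(measurement, tags, fields, timestamp_ns=None):
    """Format one data point as a Line Protocol line (no escaping handled)."""
    head = ",".join([measurement] + [f"{k}={v}" for k, v in sorted(tags.items())])
    body = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def build_write_request(lines, base_url="http://localhost:9070"):
    """Build (but do not send) a POST to /write with one point per line."""
    payload = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write", data=payload, method="POST")


line = format_point(
    "cpu",
    {"host": "server01", "region": "us-west"},
    {"usage_idle": 92.6, "usage_user": 7.4},
)
print(line)
request = build_write_request([line])
```

Passing `request` to `urllib.request.urlopen` would submit the batch to a running server.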
<measurement>[,<tag_key>=<tag_value>] <field_key>=<field_value>[,<field_key>=<field_value>] [timestamp]
- measurement: Name of the metric (required)
- tags: Optional key-value pairs for categorizing data
- fields: One or more key-value pairs of the actual metric values
- timestamp: Optional timestamp in nanoseconds since Unix epoch
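The syntax above can be illustrated with a small parser. This Python sketch handles only the simple case shown in the examples: it assumes no escaped characters and no spaces inside quoted string values, and it is not HEPop's own parser. The names `parse_value` and `parse_line` are hypothetical.

```python
def parse_value(raw):
    """Convert a raw field value to str, bool, int or float."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw in ("t", "T", "true", "True", "TRUE"):
        return True
    if raw in ("f", "F", "false", "False", "FALSE"):
        return False
    if raw.endswith("i"):
        return int(raw[:-1])  # trailing "i" marks an integer field
    return float(raw)


def parse_line(line):
    """Parse one Line Protocol line into measurement, tags, fields, timestamp."""
    parts = line.strip().split(" ")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 space-separated sections: {line!r}")
    measurement, *tag_pairs = parts[0].split(",")
    if not measurement:
        raise ValueError("measurement is required")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in parts[1].split(","):
        key, raw = pair.split("=", 1)
        fields[key] = parse_value(raw)
    timestamp = int(parts[2]) if len(parts) == 3 else None
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": timestamp,
    }


point = parse_line("cpu,host=server01,region=us-west usage_idle=92.6,usage_user=7.4")
print(point["measurement"], point["tags"], point["fields"])
```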
Query metrics using the same SQL interface:
# Query CPU metrics from a given start time
curl -X POST http://localhost:9070/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT time, host, region, usage_idle, usage_user FROM cpu WHERE time >= '\''2025-02-09T16:00:00'\''"
}'
# Aggregate metrics by host
curl -X POST http://localhost:9070/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT host, avg(used_percent) as avg_used FROM memory GROUP BY host ORDER BY avg_used DESC"
}'
The Line Protocol data is stored in Parquet files using the same directory structure and compaction strategy as HEP data, allowing for efficient querying and storage.
©️ QXIP BV - Released under the AGPLv3 Open Source License.