sipcapture/HEPop

HEPop is a high-performance HEP Capture Server built with DuckDB, Bun and Apache Arrow/Parquet

Features
  • High-Performance HEP Server
    • HEPv3/EEP Support (UDP/TCP)
  • Apache Parquet Writer
    • Parquet Columnar WAL + Storage
    • Automatic Rotation + Compaction
    • Automatic Metadata Management
  • DuckDB Integration
    • Parquet Data Compaction
    • Query Execution
  • Search API
    • GET/POST Query API
    • Metadata Table/Range Selection
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#BB2528',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#7C0000',
      'lineColor': '#F8B229',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff'
    }
  }
}%%

  graph TD;
      HEP-Client-- UDP/TCP -->HEPop;
      HEPop-->ParquetWriter;
      ParquetWriter-->Storage;
      ParquetWriter-->Metadata;
      Storage-->Compactor;
      Compactor-->Storage;
      Compactor-->Metadata;
      Storage-.->LocalFS;
      Storage-.->S3;
      HTTP-API-- GET/POST --> HEPop;
      DuckDB-->Storage;
      DuckDB-->Metadata;

      subgraph HEPop[HEPop Server]
        ParquetWriter
        Compactor
        Metadata;
        DuckDB;
      end

Install & Start

Use Bun to install, build and run hepop

bun install
bun start

Configuration

Configure HEPop using environment variables:

  • PORT: HEP server port (default: 9069)
  • HTTP_PORT: Query API port (default: PORT + 1)
  • HOST: Bind address (default: "0.0.0.0")
  • PARQUET_DIR: Data directory (default: "./data")
  • WRITER_ID: Instance identifier (default: hostname)
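
The defaults listed above can be sketched as a small resolver. This is only an illustration of how the documented defaults combine (in particular HTTP_PORT following PORT + 1), not HEPop's own code; the names `HepopConfig` and `load_config` are hypothetical.

```python
import socket
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HepopConfig:
    """Hypothetical container mirroring the documented environment variables."""
    port: int
    http_port: int
    host: str
    parquet_dir: str
    writer_id: str


def load_config(env: Mapping[str, str]) -> HepopConfig:
    """Resolve settings from an environment mapping using the documented defaults."""
    port = int(env.get("PORT", "9069"))
    return HepopConfig(
        port=port,
        # HTTP_PORT falls back to PORT + 1 when unset
        http_port=int(env.get("HTTP_PORT", str(port + 1))),
        host=env.get("HOST", "0.0.0.0"),
        parquet_dir=env.get("PARQUET_DIR", "./data"),
        # WRITER_ID falls back to the machine hostname
        writer_id=env.get("WRITER_ID") or socket.gethostname(),
    )
```

Calling `load_config(os.environ)` would report the values a default installation ends up with, for example the Query API on port 9070 when only the default PORT applies.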

Storage Structure

HEPop organizes data in a time-based directory structure:

data/
└── writer1/
    └── dbs/
        └── hep-0/
            ├── hep_1-0/
            │   └── 2025-02-08/
            │       ├── 19-00/
            │       │   └── c_0000000001.parquet
            │       ├── 19-10/
            │       │   └── 0000000002.parquet
            │       └── metadata.json
            └── hep_100-0/
                └── ...
  • Each HEP type gets its own directory structure
  • Generated Parquet files are organized by date and time bucket (HH-MM)
  • Compacted sets (c_) consolidate files for fast access
  • Metadata tracks all files, compaction and statistics

Example metadata.json:

{
  "type": 1,
  "parquet_size_bytes": 379739,
  "row_count": 359,
  "min_time": 1739043338978000000,
  "max_time": 1739043934193000000,
  "wal_sequence": 32,
  "files": [
    {
      "id": 0,
      "path": "data/writer1/dbs/hep-0/hep_1-0/2025-02-08/19-00/c_0000000032.parquet",
      "size_bytes": 379739,
      "row_count": 359,
      "chunk_time": 1739043000000000000,
      "min_time": 1739043338978000000,
      "max_time": 1739043934193000000,
      "range": "1h",
      "type": "compacted"
    }
  ]
}
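
As a rough illustration of reading such metadata, the sketch below summarizes one entry of the "files" array. It assumes the timestamps are nanoseconds since the Unix epoch and that the last four path components are table directory, date, time bucket and file name, as the sample suggests. The helper names `ns_to_utc` and `describe_file` are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath

NS_PER_SECOND = 1_000_000_000


def ns_to_utc(ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(ns // NS_PER_SECOND, tz=timezone.utc)


def describe_file(entry: dict) -> dict:
    """Summarize one entry of the metadata "files" array."""
    parts = PurePosixPath(entry["path"]).parts
    # Assumed layout, read off the sample path:
    # .../<table dir>/<date>/<time bucket>/<file name>
    table_dir, date, bucket, name = parts[-4:]
    return {
        "table_dir": table_dir,
        "date": date,
        "bucket": bucket,
        "compacted": name.startswith("c_"),  # c_ prefix marks compacted sets
        "start": ns_to_utc(entry["min_time"]),
        "span_ns": entry["max_time"] - entry["min_time"],
    }


# Trimmed copy of the file entry from the sample metadata
sample_entry = json.loads("""
{
  "path": "data/writer1/dbs/hep-0/hep_1-0/2025-02-08/19-00/c_0000000032.parquet",
  "min_time": 1739043338978000000,
  "max_time": 1739043934193000000,
  "type": "compacted"
}
""")
info = describe_file(sample_entry)
```

For the sample entry, `info["span_ns"]` works out to roughly 595 seconds of captured traffic in a single compacted file.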

Query API

Query the HEP data using the HTTP API. The server provides both GET and POST endpoints for querying data.

Query Features

  • Time Range: If not specified, defaults to last 10 minutes
  • Dynamic Columns: Select specific columns or use * for all
  • Filtering: WHERE clause supports standard SQL conditions
  • Sorting: ORDER BY supports all columns
  • Pagination: Use LIMIT and OFFSET for paging

Available HEP Fields:

HEP virtual fields are expanded automatically at query time

  • timestamp/time: Event timestamp
  • rcinfo: Raw HEP protocol header (JSON)
  • payload: HEP Protocol payload
  • src_ip: Source IP (rcinfo)
  • dst_ip: Destination IP (rcinfo)
  • src_port: Source port (rcinfo)
  • dst_port: Destination port (rcinfo)

GET /query

# Query last 10 minutes of SIP messages (-G with --data-urlencode encodes the spaces in the SQL)
curl -G "http://localhost:9070/query" \
  --data-urlencode "q=SELECT time,src_ip,dst_ip,payload FROM hep_1 LIMIT 10"

POST /query

# Complex query with time range and conditions
curl -X POST http://localhost:9070/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT time, src_ip, dst_ip, payload FROM hep_1 WHERE time >= '\''2025-02-08T19:00:00'\'' AND payload LIKE '\''%INVITE%'\'' ORDER BY time DESC"
  }'
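
The same two calls can be prepared from a script. The sketch below uses only the Python standard library and assumes nothing beyond what the curl examples show: a `q` parameter for GET and a JSON body with a `query` key for POST. The helper names are hypothetical, and the response format is not covered here, so the sketch stops at building the request.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_get_url(base_url: str, sql: str) -> str:
    """Return a GET /query URL with the SQL percent-encoded in the q parameter."""
    # quote_via=quote encodes spaces as %20 instead of +
    return f"{base_url}/query?{urlencode({'q': sql}, quote_via=quote)}"


def build_post_request(base_url: str, sql: str) -> Request:
    """Return a POST /query request carrying the SQL in a JSON body."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sql = "SELECT time, src_ip, dst_ip, payload FROM hep_1 WHERE payload LIKE '%INVITE%' ORDER BY time DESC"
url = build_get_url("http://localhost:9070", sql)
req = build_post_request("http://localhost:9070", sql)
```

Passing `req` to `urllib.request.urlopen` would send the query. Building the JSON with `json.dumps` also avoids the shell quoting (`'\''`) that the curl form needs around SQL string literals.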

OLAP Query

Query HEP data using DuckDB, ClickHouse, Databend or any Parquet-compatible tool:

SELECT count() FROM 'data/writer1/dbs/hep-0/hep_1-*/*/*/c_0000000001.parquet' LIMIT 10;


Line Protocol API

HEPop also supports InfluxDB Line Protocol ingestion for metrics and events.

POST /write

Send metrics using the InfluxDB Line Protocol format. Each line represents a single data point with measurement, tags, fields and optional timestamp.

# Single metric
curl -i -XPOST "http://localhost:9070/write" --data-raw 'cpu,host=server01,region=us-west usage_idle=92.6,usage_user=7.4'

# Multiple metrics
curl -i -XPOST "http://localhost:9070/write" --data-raw '
memory,host=server01,region=us-west used_percent=23.43,free=7.82
disk,host=server01,region=us-west used_percent=86.45,free=21.45
network,host=server01,region=us-west rx_bytes=7834,tx_bytes=9843
'

Line Protocol Format

<measurement>[,<tag_key>=<tag_value>] <field_key>=<field_value>[,<field_key>=<field_value>] [timestamp]
  • measurement: Name of the metric (required)
  • tags: Optional key-value pairs for categorizing data
  • fields: One or more key-value pairs of the actual metric values
  • timestamp: Optional timestamp in nanoseconds since Unix epoch
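
A minimal sketch of producing lines in this format is shown below. The escaping and the quoting of string and boolean fields follow the general InfluxDB Line Protocol rules; whether HEPop's parser accepts every such case is an assumption. Numbers are written as plain decimals, as in the curl examples above, without InfluxDB's integer `i` suffix. The function names are hypothetical.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    """Render one field value; strings are double-quoted, numbers stay plain."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_point(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Build one line: measurement[,tags] fields [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key, value in tags.items():
        head += f",{_escape(key, ',= ')}={_escape(value, ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


line = format_point(
    "cpu",
    {"host": "server01", "region": "us-west"},
    {"usage_idle": 92.6, "usage_user": 7.4},
)
```

Here `line` reproduces the single-metric payload from the POST /write example. Several points can be joined with newlines and sent as one request body.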

Query Line Protocol Data

Query metrics using the same SQL interface:

# Query last 10 minutes of CPU metrics
curl -X POST http://localhost:9070/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT time, host, region, usage_idle, usage_user FROM cpu WHERE time >= '\''2025-02-09T16:00:00'\''"
  }'

# Aggregate metrics by host
curl -X POST http://localhost:9070/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT host, avg(used_percent) as avg_used FROM memory GROUP BY host ORDER BY avg_used DESC"
  }'

The Line Protocol data is stored in Parquet files using the same directory structure and compaction strategy as HEP data, allowing for efficient querying and storage.

License

©️ QXIP BV - Released under the AGPLv3 Open Source License.