edm: Edge DNSTAP Minimiser

About

edm reads DNSTAP data and, depending on configuration, produces the following outputs based on the observed messages:

  • DNS queries for names considered well-known will be summarised into histograms which are saved as parquet files. These files will then be submitted to Core.
  • DNS queries for names not considered well-known are collected into separate parquet files for further local analysis. Here the complete message content is saved, but the client and server IP addresses are pseudonymised via Crypto-PAn.
  • DNS queries for names that are not considered well-known and have never been seen before by a given instance of edm result in notifications being sent to Core via MQTT messages.
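
The three cases above can be sketched as a small routing function. This is only an illustration and not edm's actual code: the `router` type, its plain maps (standing in for the DAWG file and the key-value store), the exact-match lookup and the name normalisation are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// output names the three kinds of handling described in the list above.
type output int

const (
	histogram        output = iota // well-known name: counted into a histogram
	session                        // not well-known: full message kept locally
	sessionAndNotify               // not well-known and first sighting: also notify Core
)

var outputNames = [...]string{"histogram", "session", "session+notify"}

func (o output) String() string { return outputNames[o] }

// router is a hypothetical stand-in for edm's state. Plain maps are used
// here in place of the well-known-domains DAWG and the key-value store.
type router struct {
	wellKnown map[string]bool
	seen      map[string]bool
	counts    map[string]int
}

func newRouter(wellKnown ...string) *router {
	r := &router{
		wellKnown: map[string]bool{},
		seen:      map[string]bool{},
		counts:    map[string]int{},
	}
	for _, name := range wellKnown {
		r.wellKnown[normalise(name)] = true
	}
	return r
}

// normalise lower-cases a name and strips the trailing dot. This is an
// assumption of the sketch, not a statement about how edm compares names.
func normalise(name string) string {
	return strings.TrimSuffix(strings.ToLower(name), ".")
}

// route decides which output a query name belongs to and updates state.
func (r *router) route(qname string) output {
	name := normalise(qname)
	if r.wellKnown[name] {
		r.counts[name]++
		return histogram
	}
	if !r.seen[name] {
		r.seen[name] = true
		return sessionAndNotify
	}
	return session
}

func main() {
	r := newRouter("example.com")
	for _, q := range []string{"example.com.", "new.example.net.", "new.example.net."} {
		fmt.Println(q, "->", r.route(q))
	}
}
```

In this sketch the second query for the same unknown name is still stored as session data, but it no longer triggers a notification.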

Usage

Running edm requires a TOML config file that holds the Crypto-PAn secret used for pseudonymisation, as well as a well-known-domains.dawg file, which can be created using tapir-cli (https://github.com/dnstapir/cli).

Steps for a basic local-only setup

The following creates a basic setup where edm listens on a unix socket for DNSTAP data and writes files to a directory structure under /tmp/edm, without sending anything to Core:

echo 'cryptopan-key = "mysecret"' > edm.toml
curl -O https://www.domcop.com/files/top/top10milliondomains.csv.zip
unzip top10milliondomains.csv.zip
tapir-cli dawg --standalone compile --format csv --src top10milliondomains.csv --dawg well-known-domains.dawg
edm run --input-unix /tmp/edm/input.sock --data-dir /tmp/edm/data --config-file edm.toml --well-known-domains-file well-known-domains.dawg --disable-mqtt --disable-histogram-sender

Since all communication with Core is disabled, this setup is useful for creating some local parquet files to explore.

Inspecting the resulting files

To inspect the content you can use, for example, DuckDB:

  • For summarised histogram data
duckdb -c 'select * from "/tmp/edm/data/parquet/histograms/outbox/dns_histogram-2024-09-26T18-14-00Z_2024-09-26T18-15-00Z.parquet"'
  • For pseudonymised session (full message) data
duckdb -c 'select * from "/tmp/edm/data/parquet/sessions/dns_session_block-2024-09-26T18-18-00Z_2024-09-26T18-19-00Z.parquet"'

Next to the parquet directory you will also see a directory called "pebble". This is where edm keeps its key-value store (pebble), which is used to tell whether a query name has been seen before.

Observability

edm exposes Prometheus metrics and Go pprof profiling data via an HTTP listener at 127.0.0.1:2112. To view the current Prometheus metrics:

curl 127.0.0.1:2112/metrics
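
The same endpoint can also be read from a program. The following is a minimal sketch, assuming edm is running locally with the listener address given above. The `sampleLines` helper is hypothetical: it only drops blank lines and the `#` comment lines (HELP and TYPE) of the Prometheus text format, and does not parse the samples.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// sampleLines returns the sample lines of a Prometheus text-format body,
// skipping blank lines and "#" comment lines such as HELP and TYPE.
func sampleLines(body string) []string {
	var out []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		out = append(out, line)
	}
	return out
}

// run fetches the metrics page and prints only the sample lines.
func run(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	for _, line := range sampleLines(string(body)) {
		fmt.Println(line)
	}
	return nil
}

func main() {
	if err := run("http://127.0.0.1:2112/metrics"); err != nil {
		fmt.Fprintln(os.Stderr, "fetching metrics:", err)
		os.Exit(1)
	}
}
```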

Multiple types of profiling data are available. Here is a CPU-centric example:

go tool pprof 'http://127.0.0.1:2112/debug/pprof/profile?seconds=30'

Development

Formatting and linting

When working with this code, at least the following tools are expected to be run in the top-level directory prior to committing:

  • go fmt ./...
  • go vet ./...
  • staticcheck ./... (see staticcheck)
  • gosec ./... (see gosec)
  • golangci-lint run (see golangci-lint)
  • go test -race ./...

Building

Binary

The simplest way to build the binary with a version string based on the current git commit is:

go build -ldflags="-X main.version=$(git log -1 --pretty=%H)"
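
The `-X main.version=...` linker flag works by overwriting a string variable named `version` in package `main` at link time. A minimal sketch of the receiving side, where the default value and the `versionString` helper are assumptions made for illustration rather than edm's actual code:

```go
package main

import "fmt"

// version is replaced at link time by -ldflags="-X main.version=<value>".
// The default used when the flag is absent is an assumption of this sketch.
var version = "unknown"

// versionString is a hypothetical helper that formats the version for display.
func versionString() string {
	return "edm version " + version
}

func main() {
	fmt.Println(versionString())
}
```

Built without the flag, this prints the default value. Built with the command above, it prints the full commit hash instead.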

Container

To create a container image you need to install ko. Once that is done, you can build a container image that is pushed to the local Docker daemon like so:

GITHUB_SHA=$(git log -1 --pretty=%H) ko build -L -B

You now have ko.local/edm:latest available locally.