Local analyzers are identified by the presence of an `_invocation` key in the analyzer metadata. Local analyzers are either normalizers (consuming raw data, with accepted types identified by the `_file_types` analyzer metadata key) or derived analyzers (consuming observations, when no `_file_types` key is present).
Local analyzers are UNIX executables written in any language, though the PTO
provides special support for analyzers written in Go and in Python.
Normalizers (raw data analyzers) take a single raw data file on standard input and produce an observation file on standard output. Metadata for the raw data file, including any metadata inherited from the campaign, is passed in as a JSON object on file descriptor 3.
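A minimal normalizer skeleton in Python might look like the following sketch. The condition name and raw-record field names are hypothetical, and the output shown (one JSON metadata object followed by one JSON array per observation) is a simplification of the Observation File Format:

```python
import json
import os
import sys

def normalize(raw_in, meta_in, obs_out):
    """Read raw ndjson records from raw_in and raw-file metadata from
    meta_in; write a simplified observation file to obs_out."""
    metadata = json.load(meta_in)  # metadata arrives as one JSON object;
                                   # a real normalizer would consult it
    # Hypothetical condition name for this sketch.
    out_meta = {"_conditions": ["example.condition.seen"]}
    obs_out.write(json.dumps(out_meta) + "\n")
    for line in raw_in:
        if not line.strip():
            continue
        rec = json.loads(line)  # field names below are hypothetical
        obs = [rec["start"], rec["end"], rec["path"],
               "example.condition.seen", rec["value"]]
        obs_out.write(json.dumps(obs) + "\n")

def main():
    # Under a local analyzer runtime, raw data arrives on standard
    # input and metadata on file descriptor 3.
    with os.fdopen(3) as meta_in:
        normalize(sys.stdin, meta_in, sys.stdout)
```

The `normalize` function takes explicit streams so the file-descriptor wiring stays isolated in `main`, which only matters when the script runs as a subprocess.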
Derived analyzers take one or more observation sets in observation file format on standard input, and produce a single observation file containing a single observation set on standard output. The observations on standard input are ordered by observation set, with each observation set preceded by its metadata; i.e., the input is formatted as multiple observation set files concatenated together.
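A derived analyzer can recover the set boundaries in this concatenated input by treating each metadata object line as the start of a new observation set. A minimal sketch, assuming each input line is either a JSON object (metadata) or a JSON array (observation):

```python
import json

def split_observation_sets(lines):
    """Split a concatenated observation-file stream into a list of
    (metadata, observations) pairs: a JSON object line starts a new
    observation set, and JSON array lines are observations belonging
    to the most recently started set."""
    sets = []
    for line in lines:
        if not line.strip():
            continue
        item = json.loads(line)
        if isinstance(item, dict):
            sets.append((item, []))   # metadata object: a new set begins
        else:
            sets[-1][1].append(item)  # observation array for the current set
    return sets
```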
A local analyzer may optionally specify a platform, which determines how a local analyzer runtime should set up the analyzer environment before the first time it runs the `_invocation` command.
`golang-1.x`
: repository is a Go repository. The analyzer runtime will create a new `GOPATH` for the `go` tool, `go get` the repository, check out the appropriate tag, scan subdirectories for executables (`*.go` files containing `package main`), and `go install` all such executables, before the first time it runs the `_invocation` command. Subsequent runs will occur in the same `GOPATH`.

`python-3.x`
: repository contains a Python module. The analyzer runtime will create an appropriate Python `virtualenv` and run `setup.py install`, before the first time it runs the `_invocation` command. Subsequent runs will occur in the same `virtualenv`.

`bash`
: repository contains a module in some other language. The analyzer runtime will source the `setup.sh` script using `bash` in the repository root before running the `_invocation` command.
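As an illustration, metadata for a hypothetical Go-based normalizer might combine these keys as follows; the key name `_platform`, the command name, and the file type shown here are assumptions for this sketch, not documented values:

```json
{
    "_invocation": "pto-example-normalize",
    "_file_types": ["example-ndjson"],
    "_platform": "golang-1.x"
}
```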
A local analyzer runtime is not yet available for the PTO; use the command-line tools described below to invoke local analyzers manually.
The PTO comes with a set of command-line tools for running normalizers and analyzers locally (i.e., on the same machine running `ptosrv`, or on a machine with equivalent access to the raw filesystem and the PostgreSQL database). Three tools are provided:

`ptonorm`
: read data and metadata from the raw data store, handling campaign metadata inheritance; run a normalizer, piping the raw data to its standard input and the metadata to file descriptor 3.

`ptocat`
: dump observation sets from the database, with metadata, in Observation File Format to standard output.

`ptoload`
: read files with observation set data and metadata in Observation File Format and insert the resulting observation sets into the database.
These tools can be used for normalization and analysis workflows as described below.
Local normalizers are run by `ptonorm`, which takes the following command-line arguments:
ptonorm -config <path/to/config.json> <normalizer> <campaign> <file>
If `-config` is not given, the file `ptoconfig.json` in the current working directory is used.
`ptonorm` launches the normalizer as a subprocess, giving it access to the raw data file over standard input, and streaming the metadata over a pipe on file descriptor 3. It then takes the normalizer's standard output, coalescing all metadata into a single object, and writes the result to its own standard output. When coalescing metadata, the last write on a given metadata key wins.
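The coalescing rule amounts to a last-write-wins merge of metadata objects, as in this small sketch (the key names are made up):

```python
def coalesce_metadata(*objects):
    """Merge metadata objects in order; the last write to a key wins."""
    merged = {}
    for obj in objects:
        merged.update(obj)  # later objects overwrite earlier keys
    return merged
```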
The resulting observation file can be passed as input to `ptoload`, which takes the following command-line arguments:
ptoload -config <path/to/config.json> <obsfile>...
If `-config` is not given, the file `ptoconfig.json` in the current working directory is used. More than one observation file can be given on a single command line, but each file given will create a new observation set.
For example, to normalize the file `quux.ndjson` with the `bar` normalizer in the `foo` campaign into an observation set, using a local configuration file, and load it directly into the database, deleting the cached observation file:

ptonorm bar foo quux.ndjson > cached.obs && ptoload cached.obs && rm cached.obs
Analyzers are simpler to run, as they take observation files on standard input and generate observation files on standard output. To get observation files, use `ptocat`, which takes the following command-line arguments:
ptocat -config <path/to/config.json> <set-id>...
If `-config` is not given, the file `ptoconfig.json` in the current working directory is used. Set IDs are given in hexadecimal, as in the rest of the PTO. More than one set ID may appear; in this case, the metadata for the first set will be followed by the data for the first set, followed by the metadata for the second set, followed by the data for the second set, and so on.
For example, to analyze sets 3a70 through 3a75 using the analyzer `fizz` and load the result directly into the database, deleting the cached observation file:

ptocat 3a70 3a71 3a72 3a73 3a74 3a75 | fizz > cached.obs && ptoload cached.obs && rm cached.obs
Client analyzers are simply clients of the PTO. A normalizer interacts with raw data through `/raw` resources and creates new observation sets by posting to `/obs/create`. An analyzer retrieves observation sets from `/obs/` and likewise creates new observation sets by posting to `/obs/create`.
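A client analyzer could build a request against these resources as in the following sketch; the base URL, the `APIKEY` authorization scheme, and the metadata key shown are assumptions for illustration, not part of the documented API:

```python
import json
import urllib.request

def build_obset_create_request(base_url, metadata, api_key):
    """Build (but do not send) a POST request to /obs/create with
    observation set metadata as a JSON body. The 'APIKEY'
    authorization scheme here is an assumption."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/obs/create",
        data=json.dumps(metadata).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "APIKEY " + api_key},
        method="POST")
```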
Tools for normalizing and analyzing PATHspider output (originally focused on ECN, with future support for other plugins) are in the pto3-ecn repository.
Tools for dealing with Tracebox output (with future support for other traceroute-like tools and data sources) are in the pto3-trace repository.