Available filters

SPADE includes a set of filters to manipulate provenance metadata before it is committed to storage. They are described below.

AddAnnotation

This filter adds one or more annotations to every vertex and edge that passes through it. If an annotation with the same key already exists, its value is replaced.

The filter can be added using SPADE's controller:

-> add filter AddAnnotation position=1 host=spade-host
Adding filter AddAnnotation... done

The command above adds the filter, which then attaches the annotation key host with the value spade-host to every vertex and edge.
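The replace-on-conflict behavior can be illustrated with a minimal Python sketch. This is not SPADE's own code: the function name add_annotations and the use of a plain dictionary for a vertex or edge are assumptions made purely for illustration.

```python
def add_annotations(element, annotations):
    """Return a copy of a vertex/edge (modeled as a dict, an assumption)
    with the given annotations added; existing keys are overwritten."""
    updated = dict(element)
    updated.update(annotations)
    return updated

# A hypothetical vertex that already carries a 'host' annotation.
vertex = {"type": "Process", "pid": "42", "host": "old-host"}
annotated = add_annotations(vertex, {"host": "spade-host"})
print(annotated["host"])
```

The original dictionary is left untouched here so that the before and after states can be compared; whether SPADE modifies elements in place is not stated on this page.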

Blacklist

This filter is used to exclude files (based on their name) from being committed to persistent storage. The regular expression for matching filenames should be specified in cfg/spade.filter.Blacklist.config.
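As a rough illustration of name-based exclusion, the Python sketch below drops artifacts whose path matches a regular expression. The pattern, the path annotation name, and the choice of matching anywhere in the string are all assumptions; the actual pattern and matching rules come from cfg/spade.filter.Blacklist.config and SPADE's implementation.

```python
import re

# Hypothetical pattern for illustration only; the real one is read from
# cfg/spade.filter.Blacklist.config.
BLACKLIST = re.compile(r"^/(proc|sys|dev)/")

def is_blacklisted(vertex):
    """True if the vertex has a 'path' annotation (assumed name) matching the pattern."""
    path = vertex.get("path")
    return path is not None and BLACKLIST.search(path) is not None

vertices = [
    {"type": "Artifact", "path": "/proc/1/status"},
    {"type": "Artifact", "path": "/home/user/notes.txt"},
    {"type": "Process", "pid": "42"},
]

# Only vertices that do not match the pattern would reach storage.
kept = [v for v in vertices if not is_blacklisted(v)]
print(kept)
```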

ConvertTime

This filter converts the date-time present in a vertex or an edge annotation value to a specified format. The converted date-time is added as a new annotation which can be specified in the configuration file cfg/spade.filter.ConvertTime.config. Please see cfg/spade.filter.ConvertTime.config for more details.

CrossNamespaces

This filter tracks the ipc, mount, net, pid, and user namespaces of every thread that performs a write to the filesystem. When any thread performs a read, its tuple of namespaces is compared to that of each thread that has performed a write to the same path in the past. If the tuples differ, this is logged. Log entries indicate the presence of cross-container or host-container flows.
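The bookkeeping described above can be sketched in Python as follows. This is an illustrative model, not the filter's code: the class name, the dictionary representation of a process, and every annotation key other than "mount namespace" (which appears in the example event later in this section) are assumptions.

```python
from collections import defaultdict

# Assumed annotation keys; only "mount namespace" is confirmed by the example event.
NAMESPACE_KEYS = (
    "ipc namespace", "mount namespace", "net namespace",
    "pid namespace", "user namespace",
)

class CrossNamespaceTracker:
    def __init__(self):
        self.writers_by_path = defaultdict(list)  # path -> processes that wrote to it
        self.events = []                          # logged cross-namespace flows

    @staticmethod
    def namespaces(process):
        """Tuple of namespace identifiers for a process (missing keys become None)."""
        return tuple(process.get(key) for key in NAMESPACE_KEYS)

    def on_write(self, process, path):
        self.writers_by_path[path].append(process)

    def on_read(self, process, path):
        reader_ns = self.namespaces(process)
        # Compare the reader's tuple with that of every earlier writer to this path.
        differing = [w for w in self.writers_by_path.get(path, [])
                     if self.namespaces(w) != reader_ns]
        if differing:
            self.events.append({"path": path, "reader": process, "writers": differing})

tracker = CrossNamespaceTracker()
tracker.on_write({"pid": "25", "mount namespace": "100000002"}, "/etc/passwd")
tracker.on_read({"pid": "30", "mount namespace": "100000002"}, "/etc/passwd")    # same namespaces
tracker.on_read({"pid": "7575", "mount namespace": "100000001"}, "/etc/passwd")  # differs
print(len(tracker.events))
```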

The default configuration is specified in cfg/spade.filter.CrossNamespaces.config. It detects flows between processes in different Linux namespaces through all artifact types. The log file containing the cross-namespace events is created at tmp/cross-namespaces.json. Each line in the log file is a cross-namespace event as a JSON object. Each event contains the following information:

cross-namespace-event-id: The event id generated by the filter.
artifact: The annotations of the artifact through which the cross-namespace flow occurred. Only the matched annotations, as configured in cfg/spade.filter.CrossNamespaces.config, are reported.
artifacts: A list of artifacts with the matched and extra reportable annotations as specified by key artifactAnnotationsToReport in cfg/spade.filter.CrossNamespaces.config. Note: This may include false positives.
reader: The annotations of the reader.
read-edge: The annotations of the edge between the reader and the artifact.
writers: A list of writers. For each writer, only the annotations configured in cfg/spade.filter.CrossNamespaces.config using processAnnotationsToMatch and processAnnotationsToReport are reported. Note: This may include false positives.

An example cross-namespace event (truncated and formatted for simplicity):

{
  "cross-namespace-event-id": "0",
  "artifact": {
    "path": "/etc/passwd"
  },
  "artifacts":[
    {
      "path": "/etc/passwd",
      "inode": "880"
    },
    {
      "path": "/etc/passwd",
      "inode": "881"
    }
  ],
  "reader": {
    "pid": "7575",
    "ppid": "7574",
    "name": "docker_container_process",
    "mount namespace": "100000001"
  },
  "read-edge": {
    "operation": "read",
    "size": "5"
  },
  "writers": [
    {
      "mount namespace": "100000002",
      "pid": "25"
    }
  ]
}

In the example above, the file /etc/passwd was written by a process in mount namespace 100000002 and read by the process docker_container_process in mount namespace 100000001. The flow was logged because the two mount namespace values differ.
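Because each line of the log is a standalone JSON object, it can be processed line by line. The Python sketch below is one way to do so; the helper names are hypothetical, and the sample line is an abbreviated copy of the example event rather than real output.

```python
import json

def read_events(lines):
    """Yield one event dict per non-empty line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def differing_mount_namespaces(event):
    """Mount namespaces of writers that differ from the reader's."""
    reader_ns = event["reader"].get("mount namespace")
    return [w.get("mount namespace") for w in event["writers"]
            if w.get("mount namespace") != reader_ns]

# Abbreviated copy of the example event, on a single line as in the log file.
sample_lines = [
    '{"cross-namespace-event-id": "0", "artifact": {"path": "/etc/passwd"}, '
    '"reader": {"pid": "7575", "mount namespace": "100000001"}, '
    '"writers": [{"mount namespace": "100000002", "pid": "25"}]}'
]

events = list(read_events(sample_lines))
for event in events:
    print(event["artifact"]["path"], differing_mount_namespaces(event))
```

To read the actual log, the list sample_lines could be replaced with an open file handle for tmp/cross-namespaces.json, since iterating a file yields its lines.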

CycleAvoidance

This filter tracks the ancestors of a file and creates a new version each time a new ancestor is encountered. Please see cfg/spade.filter.CycleAvoidance.config for configuration options.

Fusion

The Fusion filter can be used to merge vertices from related provenance streams. The configuration for this filter is stored in cfg/fusion.config and has the following format:

-- BEGIN FILE --
<1st reporter>
<2nd reporter>
<1st reporter>.<annotation>=<2nd reporter>.<annotation>
...
-- END FILE --

To merge the two streams, the names of both reporters must be specified on the first two lines of the config file.

The lines that follow specify the merging rules, each in the form <1st reporter>.<annotation>=<2nd reporter>.<annotation>.

The Fusion filter will check to see if the incoming vertices satisfy the merging rules. If vertices are found that match the criteria, they are fused into a single vertex.
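The configuration format and rule check can be sketched in Python as below. The reporter names ReporterA and ReporterB, the annotation names, the requirement that all rules match, and the way annotations are combined in the fused vertex are assumptions for illustration; the page does not specify these details.

```python
def parse_fusion_config(text):
    """Parse the two reporter names and the list of (key1, key2) merge rules."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    first, second = lines[0], lines[1]
    rules = []
    for ln in lines[2:]:
        left, right = ln.split("=", 1)
        left_reporter, left_key = left.split(".", 1)
        right_reporter, right_key = right.split(".", 1)
        if (left_reporter, right_reporter) != (first, second):
            raise ValueError(f"rule does not match declared reporters: {ln}")
        rules.append((left_key, right_key))
    return first, second, rules

def should_fuse(v1, v2, rules):
    """Assumption: vertices are fused only when every rule is satisfied."""
    return bool(rules) and all(
        k1 in v1 and k2 in v2 and v1[k1] == v2[k2] for k1, k2 in rules
    )

def fuse(v1, v2):
    """Assumption: the fused vertex carries the union of both annotation sets."""
    merged = dict(v2)
    merged.update(v1)
    return merged

# Hypothetical config with made-up reporter and annotation names.
config = """
ReporterA
ReporterB
ReporterA.pid=ReporterB.process_id
"""
first, second, rules = parse_fusion_config(config)
a = {"pid": "7575", "name": "bash"}
b = {"process_id": "7575", "user": "alice"}
fused = fuse(a, b) if should_fuse(a, b, rules) else None
print(fused)
```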

GraphFinesse

This filter tracks the entire lineage graph of a file and creates a new version if a new edge would have created a cycle. By default, an annotation named GFVersion is added to all vertices. The value of the annotation GFVersion is the version assigned by the GraphFinesse filter. Please see cfg/spade.filter.GraphFinesse.config for configuration options.
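The idea of versioning to keep the graph acyclic can be sketched in Python as follows. This is a simplified model under stated assumptions, not the GraphFinesse algorithm itself: the class name, the choice to version the child (dependent) vertex, and the depth-first reachability check are illustrative.

```python
from collections import defaultdict

class VersioningGraph:
    def __init__(self):
        self.edges = defaultdict(set)    # (name, version) -> vertices it depends on
        self.version = defaultdict(int)  # name -> current version number

    def current(self, name):
        return (name, self.version[name])

    def _reachable(self, start, target):
        """Depth-first search: can target be reached from start via dependency edges?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def add_dependency(self, child, parent):
        """Record that child depends on parent, versioning child if a cycle would form."""
        src, dst = self.current(child), self.current(parent)
        if self._reachable(dst, src):
            # The parent already depends on this version of the child,
            # so start a new version of the child instead of closing the loop.
            self.version[child] += 1
            src = self.current(child)
        self.edges[src].add(dst)
        return src

graph = VersioningGraph()
first = graph.add_dependency("process", "file")   # the process read the file
second = graph.add_dependency("file", "process")  # the process then wrote the file
print(first, second)
```

In this model the second call returns a new version of the file, which plays the role that the GFVersion annotation plays in the filter's output.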

IORuns

Reads and writes in an operating system often occur as runs of one type or the other. For example, a single function that reads a file may result in multiple read system calls. This can produce a high volume of provenance metadata, especially when reading or writing large files. The IORuns filter can be used to fuse consecutive edges of the same type of I/O operation (i.e., either read or write) into a single edge.

By default, only the reads and writes for artifacts with the annotation path are merged. To merge reads and writes for other artifacts, specify the artifact annotation(s) as the key argument to the filter, or update the value in the filter's default config file at cfg/spade.filter.IORuns.config.

The filter can be added using SPADE's controller:

-> add filter IORuns position=1 key="path,permissions"
Adding filter IORuns... done

The command above tells the filter to merge reads and writes for artifacts that have both the path and permissions annotations.
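The fusing of consecutive edges can be illustrated with a short Python sketch. It is a simplified model rather than the filter's implementation: the edge representation, the process and path field names, and the run_length annotation on the fused edge are assumptions.

```python
from itertools import groupby

def fuse_io_runs(edges):
    """Collapse each run of consecutive edges that share process, path and operation."""
    def run_key(edge):
        return (edge["process"], edge["path"], edge["operation"])

    fused = []
    for (process, path, operation), run in groupby(edges, key=run_key):
        fused.append({
            "process": process,
            "path": path,
            "operation": operation,
            "run_length": len(list(run)),  # hypothetical annotation, for illustration
        })
    return fused

edges = [
    {"process": "42", "path": "/tmp/data", "operation": "read"},
    {"process": "42", "path": "/tmp/data", "operation": "read"},
    {"process": "42", "path": "/tmp/data", "operation": "read"},
    {"process": "42", "path": "/tmp/data", "operation": "write"},
    {"process": "42", "path": "/tmp/data", "operation": "read"},
]

fused_edges = fuse_io_runs(edges)
for edge in fused_edges:
    print(edge["operation"], edge["run_length"])
```

Only adjacent edges are grouped, so the final read stays separate from the first three because a write occurs between them.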

OPM2Prov

This filter translates OPM vertex and edge elements into corresponding W3C PROV ones.

VersionOnWrite

This filter versions a vertex each time it is encountered as a child vertex in an edge. Please see cfg/spade.filter.VersionOnWrite.config for configuration options.

WindowsFeatures

This filter computes features on provenance collected using the ProcMon reporter. The features are computed according to the approach described in the paper: Mining Data Provenance to Detect Advanced Persistent Threats. This filter can be used as follows:

-> add filter WindowsFeatures position=1 malicious=cmd.exe,Explorer.EXE inceptionTime=10000000 taintedParentWeight=5.0
Adding filter WindowsFeatures... done

The command above specifies the following three arguments:

malicious: A comma-separated list of process names to mark as malicious
inceptionTime: Time window in a process's lifetime to consider as its inception window
taintedParentWeight: The weight to use for parent processes to compute the value of taint on child processes

Additionally, when the filter is removed, it writes the computed features for processes and artifacts to tmp/windows.process.features.csv and tmp/windows.filepath.features.csv, respectively.

All of the above arguments (except position) can also be specified in the configuration file cfg/spade.filter.WindowsFeatures.config.

This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
