
RefineDependencies Query

Hassaan edited this page Oct 1, 2021 · 2 revisions

Overview

SPADE's query client can be used to retrieve stored provenance for subsequent analysis. The complete list of queries that the QuickGrail query surface supports is documented here.

This page documents usage of the query refineDependencies. It illustrates this by using an event dependency map provided by clam-prov to refine a provenance graph.

Causality

When querying provenance graphs, an analyst might be interested in finding out how one event caused another event. The refineDependencies query provides a way of specifying events as edges, along with the dependencies between them, to trace how one event led to another. The query template is:

$dependency_graph = $base.refineDependencies('dependency map file path', 'edge annotation', maxDepth)

In the query above, the dependency map file has to be specified in DOT. A dependency map specifies which values of the edge annotation depend on which other values of that annotation in the provenance graph. The argument maxDepth limits the length of the paths to find from the cause event to the effect event.
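To make the role of a depth bound concrete, the sketch below enumerates paths between two vertices of a toy graph while refusing to follow more than a given number of edges. It is only an illustration of what limiting path length means; it is not SPADE's implementation, and the function name `paths_within_depth` and the vertex names are hypothetical.

```python
from typing import Dict, List


def paths_within_depth(graph: Dict[str, List[str]], start: str,
                       goal: str, max_depth: int) -> List[List[str]]:
    """Return every simple path from start to goal using at most max_depth edges."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == goal:
            found.append(path)
            return
        # len(path) - 1 is the number of edges followed so far.
        if len(path) - 1 >= max_depth:
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated vertices)
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found


# Hypothetical toy graph: a short route (2 edges) and a longer one (3 edges).
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],
    "e": ["d"],
}

print(paths_within_depth(toy_graph, "a", "d", 2))
print(paths_within_depth(toy_graph, "a", "d", 3))
```

With a bound of 2 only the short route is reported; raising the bound to 3 also admits the longer route.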

The dependency map file template is:

digraph dependency_map_graph{
  "edge annotation value 1" -> "edge annotation value 2" [label="WasDependentOn"];
  "edge annotation value 1" -> "edge annotation value 3" [label="WasDependentOn"];
}

The template lists three values of the edge annotation named in the query template above: edge annotation value 1, edge annotation value 2, and edge annotation value 3. It also specifies that edge annotation value 1 was dependent on both edge annotation value 2 and edge annotation value 3.
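If the dependencies are already available in a program, a file in this shape can be produced mechanically. The following is a minimal sketch that renders a Python dictionary as DOT text following the template above. The helper name `render_dependency_map` is hypothetical, and the sketch assumes annotation values contain no double quotes.

```python
from typing import Dict, List


def render_dependency_map(dependencies: Dict[str, List[str]],
                          name: str = "dependency_map_graph") -> str:
    """Render {dependent value: [values it depends on]} as a DOT digraph."""
    lines = [f"digraph {name}{{"]
    for dependent, causes in dependencies.items():
        for cause in causes:
            # One edge per dependency, labelled as in the template.
            lines.append(f'  "{dependent}" -> "{cause}" [label="WasDependentOn"];')
    lines.append("}")
    return "\n".join(lines)


dot_text = render_dependency_map({
    "edge annotation value 1": ["edge annotation value 2",
                                "edge annotation value 3"],
})
print(dot_text)
```

The printed text can be saved to a file and its path passed as the first argument of refineDependencies.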

Clam-Prov Call Site Dependencies

Clam-prov uses static analysis to find dependencies between function call sites in a program. It can also report function call sites in a log file. This information can be combined with the Linux Audit provenance stream in two steps to trace system call dependencies. The two steps are outlined below:

  1. Use the ClamProv filter in SPADE to incorporate clam-prov call site information into the Linux Audit provenance stream being generated by the Audit reporter
  2. Use the refineDependencies query, and the dependency map generated by clam-prov to get a subgraph of only dependent events

Example

A small example C program in which to find dependencies is shown below (error checking omitted):

#include <unistd.h>
#include <fcntl.h>
int main(int argc, char *argv[]){
  int src_fd, dst_fd;
  int data;
  src_fd = open("source_file", O_RDONLY);
  dst_fd = open("destination_file", O_WRONLY);
  read(src_fd, &data, sizeof(int));
  write(dst_fd, &data, sizeof(int));
  return 0;
}

The call site log generated by clam-prov for this program can be ingested by the ClamProv filter to incorporate that information into the Audit reporter's provenance stream. The resulting provenance graph, annotated with clam-prov information, can then be queried. To use refineDependencies, we need the dependency map that clam-prov generated for the program:

digraph dependency_map{
  "0" [label="function name:read\ncall site:0"];
  "1" [label="function name:write\ncall site:1"];
  "1" -> "0" [label="WasDependentOn"];
}

The dependency map says that the read function call was assigned call site 0 and the write function call was assigned call site 1. It also says that call site 1 was dependent on call site 0.
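The same reading can be obtained programmatically. The sketch below parses a map of this shape with regular expressions and prints each dependency in words. It assumes the node label format shown in this example (function name, then a literal `\n`, then call site), uses labels that match the description above (read at call site 0, write at call site 1), and is not a general DOT parser. The names `parse_dependency_map` and `describe` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Raw string so that the \n inside the labels stays a literal backslash-n,
# exactly as it appears in the DOT file.
EXAMPLE_MAP = r'''digraph dependency_map{
  "0" [label="function name:read\ncall site:0"];
  "1" [label="function name:write\ncall site:1"];
  "1" -> "0" [label="WasDependentOn"];
}'''

NODE_RE = re.compile(
    r'"([^"]+)"\s*\[label="function name:(\w+)\\ncall site:(\d+)"\]')
EDGE_RE = re.compile(
    r'"([^"]+)"\s*->\s*"([^"]+)"\s*\[label="WasDependentOn"\]')


def parse_dependency_map(dot_text: str) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return ({node id: function name}, [(dependent id, cause id), ...])."""
    functions = {node_id: fn for node_id, fn, _site in NODE_RE.findall(dot_text)}
    edges = EDGE_RE.findall(dot_text)
    return functions, edges


def describe(dot_text: str) -> List[str]:
    """Render each WasDependentOn edge as a sentence."""
    functions, edges = parse_dependency_map(dot_text)
    return [
        f"{functions.get(dep, '?')} (call site {dep}) was dependent on "
        f"{functions.get(cause, '?')} (call site {cause})"
        for dep, cause in edges
    ]


for sentence in describe(EXAMPLE_MAP):
    print(sentence)
```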

Lastly, the following refineDependencies query can be used:

$result = $base.refineDependencies('/tmp/dependency_map.dot', 'call site id', 2)

The graph $result is shown below:

[Figure: refineDependencies]
