Collecting system-wide provenance on Linux with Audit

The Audit reporter collects provenance from across the operating system using the Linux kernel's audit event stream of system calls. (Note: Activity of the user that SPADE runs as is excluded.)

This reporter is built automatically when SPADE's top-level make command is issued. (Note: The included kernel modules are optional. To build them, use the command make KERNEL_MODULES=true.)

Requirements

Before this reporter can be used, the following commands must be run. These commands only need to be executed once after SPADE is compiled. (Note: This will allow a normal user to configure and access the audit stream.)

The first three commands allow users to configure the audit rules and packet filtering needed to generate the provenance graph, and to load kernel modules (kmod provides insmod and modprobe). The last two commands grant users access to the audit stream:

sudo chmod ug+s `which auditctl`
sudo chmod ug+s `which iptables`
sudo chmod ug+s `which kmod`
sudo chown root bin/spadeAuditBridge
sudo chmod ug+s bin/spadeAuditBridge

To let spadeAuditBridge access the audit stream, edit the file /etc/audisp/plugins.d/af_unix.conf on Ubuntu or /etc/audit/plugins.d/af_unix.conf on Fedora, and activate the plugin by changing the line that says

active = no

to

active = yes
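The same edit can be scripted. The following is a minimal sketch, assuming the line has the "active = no" form shown above (extra spaces around "=" are tolerated). The function activate_plugin and the script name activate_af_unix.sh are hypothetical, and the -i.bak option (which keeps the original file as a .bak copy) assumes GNU or BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: turn "active = no" into "active = yes" in the given
# plugin configuration file, tolerating extra whitespace around "=".
# The original file is kept as FILE.bak.
activate_plugin() {
  sed -i.bak -E \
    's/^([[:space:]]*active[[:space:]]*=[[:space:]]*)no[[:space:]]*$/\1yes/' "$1"
}

# Pass the path for your distribution, for example on Fedora:
#   sudo sh activate_af_unix.sh /etc/audit/plugins.d/af_unix.conf
if [ -n "$1" ]; then
  # Print the resulting "active" line so the change can be confirmed.
  activate_plugin "$1" && grep -E '^[[:space:]]*active[[:space:]]*=' "$1"
fi
```

Only the "active" line is changed; the rest of the plugin configuration is left as it was.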

Restart auditd to activate the dispatcher (audispd):

sudo service auditd restart
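After these steps, the permission changes made by the chmod and chown commands at the start of this section can be verified. The sketch below uses a hypothetical helper, check_setuid_setgid, that is not part of SPADE; it assumes it is run from the SPADE directory so that bin/spadeAuditBridge resolves. auditctl, iptables, and kmod often live in /sbin or /usr/sbin, which may not be on a normal user's PATH; in that case they are reported as not found.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPADE): report whether a file has both the
# setuid and setgid bits that "chmod ug+s" sets.
check_setuid_setgid() {
  f="$1"
  if [ -z "$f" ] || [ ! -e "$f" ]; then
    echo "${f:-<not on PATH>}: not found"
    return 2
  fi
  if [ -u "$f" ] && [ -g "$f" ]; then
    echo "$f: ok"
    return 0
  fi
  echo "$f: missing setuid/setgid bit"
  return 1
}

# Check the four files changed by the commands above.
for f in "$(command -v auditctl)" "$(command -v iptables)" \
         "$(command -v kmod)" bin/spadeAuditBridge; do
  check_setuid_setgid "$f"
done
```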

Real-time collection

The Audit reporter can be started using SPADE's controller:

-> add reporter Audit
Adding reporter Audit... done

The reporter will transform records from the Linux audit dispatcher into an Open Provenance Model representation. The details of the key-value annotations are available here.

Configuring I/O reporting

Filesystem reads and writes, as well as network connection sends and receives, can generate significant log overhead. In many contexts, knowing that a process opened a file or made a network connection suffices for understanding the provenance of data.

By default, this reporter only tracks when files are opened for reading or writing, and when network connections are made or accepted. To report all filesystem reads and writes, the argument fileIO=true should be provided when starting the reporter with the SPADE controller. Similarly, to report all network sends and receives, the argument netIO=true should be used:

-> add reporter Audit fileIO=true netIO=true
Adding reporter Audit... done

Configuring namespace reporting

Linux containers are a user-space construct, created by virtualizing selected kernel resources using namespaces. The included kernel module can report the specific namespaces of each process. To enable this functionality, SPADE must be built with make KERNEL_MODULES=true. Providing the localEndpoints=true argument activates the kernel module (and the reporting of local IP address and port information, which is not present in Audit records).

By default, namespaces are not tracked. Providing the namespaces=true argument activates reporting of values for the user and group identifier (User), process identifier (PID), filesystem mount point (Mount), and network (Network) namespaces. Providing the IPC=true argument activates reporting of the inter-process communication (IPC) namespace, which isolates System V IPC objects and POSIX message queues. Providing the networkAddressTranslation=true argument causes both the host-level and the intra-network-namespace address and port values to be reported.

The above arguments should be provided when starting the Audit reporter in the SPADE controller:

-> add reporter Audit localEndpoints=true namespaces=true IPC=true networkAddressTranslation=true
Adding reporter Audit... done
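To get a feel for the namespace types listed above, the namespaces of a running process can also be inspected from user space through the /proc/<pid>/ns links that Linux provides. The sketch below uses a hypothetical helper, show_namespaces, that is not part of SPADE; it only shows the kernel's own namespace identifiers and makes no claim about the exact form of SPADE's namespace annotations.

```shell
#!/bin/sh
# Hypothetical helper: print the namespace identifiers of a process
# (default: this shell) for the types discussed above:
# user (User), pid (PID), mnt (Mount), net (Network), and ipc (IPC).
show_namespaces() {
  pid="${1:-$$}"
  for ns in user pid mnt net ipc; do
    # Each entry is a symbolic link whose target looks like "net:[4026531992]".
    printf '%s: %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
  done
}

show_namespaces
```

Two processes in the same namespace show the same identifier, so comparing the output for a containerized process with that of a host process shows which namespaces differ.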

Saving the audit records

For debugging purposes, the Linux Audit records that have been processed can be stored in a file using the outputLog argument. For example, the records can be stored in the file /tmp/audit.log by using this command to start the reporter in the SPADE controller:

-> add reporter Audit outputLog=/tmp/audit.log
Adding reporter Audit... done
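When debugging, a quick per-type summary of the saved records can help. This is a minimal sketch, assuming the saved file contains standard textual audit records (one per line, beginning with type=..., optionally preceded by node=...); summarize_audit_log is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count the records in a saved audit log by record type
# (the "type=..." field), most frequent first.
summarize_audit_log() {
  sed -n -E 's/^(node=[^ ]+ )?type=([A-Za-z0-9_]+) .*/\2/p' "$1" \
    | sort | uniq -c | sort -rn
}

# Example:
#   summarize_audit_log /tmp/audit.log
```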

Using a saved log

Instead of collecting Linux Audit records from the running system, a previously saved log can be used by specifying it with the inputLog argument. The hardware architecture of the machine on which the log was collected must be x86-64.

For example, to read records from the file /tmp/audit.log collected on an x86-64 machine, this command can be used to start the reporter in the SPADE controller:

-> add reporter Audit inputLog=/tmp/audit.log
Adding reporter Audit... done
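Before using a log collected elsewhere, its architecture can be checked heuristically. System call numbers differ between architectures, and SYSCALL records from x86-64 processes carry arch=c000003e (AUDIT_ARCH_X86_64). The hypothetical check_x86_64_log below lists any other arch values. Note that 32-bit programs running on an x86-64 machine also produce other values (for example 40000003 for i386), so a mismatch warrants a closer look rather than proving the log came from another machine.

```shell
#!/bin/sh
# Hypothetical heuristic check that a saved audit log came from x86-64.
# x86-64 SYSCALL records carry arch=c000003e (AUDIT_ARCH_X86_64).
check_x86_64_log() {
  archs=$(grep -E '^(node=[^ ]+ )?type=SYSCALL ' "$1" \
    | grep -oE 'arch=[0-9a-fA-F]+' | sort -u)
  if [ -z "$archs" ]; then
    echo "no SYSCALL records found in $1"
    return 2
  fi
  # Any value other than c000003e, e.g. 40000003 for 32-bit i386 calls.
  other=$(printf '%s\n' "$archs" | grep -vix 'arch=c000003e')
  if [ -n "$other" ]; then
    echo "non-x86-64 arch values:" $other
    return 1
  fi
  echo "all SYSCALL records are x86-64"
}

# Example:
#   check_x86_64_log /tmp/audit.log
```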

Logs must be sorted by event identifier. This is done automatically during preprocessing.
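SPADE performs this sorting itself. For illustration only, the idea can be sketched as follows, assuming the event identifier is the serial number in msg=audit(<timestamp>:<serial>) and that the log covers a single boot (serial numbers restart after a reboot). sort_by_event_id is hypothetical and is not SPADE's preprocessing code.

```shell
#!/bin/sh
# Illustrative sketch (not SPADE's preprocessing): order audit records by the
# serial number in msg=audit(<seconds>.<millis>:<serial>).
sort_by_event_id() {
  tab=$(printf '\t')
  # 1. Prefix each record with its serial number and a tab.
  # 2. Sort numerically on that prefix; -s (stable) keeps the records of one
  #    event in their original relative order.
  # 3. Strip the prefix again. Lines without msg=audit(...) get no prefix,
  #    sort first (numeric key 0), and are printed unchanged by cut.
  sed -E "s/^.*msg=audit\([0-9]+\.[0-9]+:([0-9]+)\).*/\1${tab}&/" "$1" \
    | sort -s -n -t "$tab" -k1,1 \
    | cut -f2-
}

# Example:
#   sort_by_event_id /tmp/audit.log > /tmp/audit.sorted.log
```

A stable sort matters here because one event is spread over several records (for example SYSCALL, CWD, and PATH) that share the same serial number.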

The end of Audit log processing is reported in SPADE's log (which is stored in log/SPADE_<date>-<time>.log).
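To watch for this while a saved log is being processed, the newest SPADE log file can be located and followed. This is a minimal sketch; latest_spade_log is a hypothetical helper that assumes the log directory is log, relative to the SPADE directory, and that the newest file is the one for the current run.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified SPADE log file in the
# given directory (default: log).
latest_spade_log() {
  ls -t "${1:-log}"/SPADE_*.log 2>/dev/null | head -n 1
}

# Example: follow the newest log while an inputLog is being processed.
#   tail -f "$(latest_spade_log)"
```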

Overview of generated provenance

This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

