docs(datasources): make datasource docs more complete (#3568)
- add containers data source documentation
- add process tree data source documentation

commit: 2a51125 (main), cherry-pick
rafaeldtinoco committed Oct 18, 2023
1 parent 73e4041 commit 48beb05
Showing 4 changed files with 444 additions and 27 deletions.
99 changes: 99 additions & 0 deletions docs/docs/data-sources/containers.md
@@ -0,0 +1,99 @@
# Containers Data Source

The [container enrichment](../integrating/container-engines.md) feature gives Tracee the ability to extract details about active containers and link this information to the events it captures.

The [data source](./overview.md) feature makes the information gathered from active containers accessible to signatures. When an event is captured and triggers a signature, that signature can retrieve information about the container using its container ID, which is bundled with the event being analyzed by the signature.

## Enabling the Feature

The data source does not need to be enabled, but it requires the `container enrichment` feature to be enabled. To enable the enrichment feature, execute Tracee with `--containers`. For more information, see the [container enrichment](../integrating/container-engines.md) page.

## Internal Data Organization

From the [data-sources documentation](../data-sources/overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).

The `containers data source` operates straightforwardly. Using `string` keys, which represent the container IDs, you can fetch a map of values whose field names and value types are described by the schema below:

```go
schemaMap := map[string]string{
	"container_id":      "string",
	"container_name":    "string",
	"container_image":   "string",
	"k8s_pod_id":        "string",
	"k8s_pod_name":      "string",
	"k8s_pod_namespace": "string",
	"k8s_pod_sandbox":   "bool",
}
```

From the structure above, using the container ID lets you access details like the originating Kubernetes pod name or the image utilized by the container.
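
The schema can also be used to sanity-check a record before reading from it. The following is a minimal sketch, not part of Tracee: `validateRecord` is a hypothetical helper, and the record in `main` is made-up sample data. It assumes records arrive as `map[string]interface{}`, as the type assertions in the signature example suggest.

```go
package main

import (
	"fmt"
	"sort"
)

// containerSchema mirrors the schema listed above: field name -> type name.
var containerSchema = map[string]string{
	"container_id":      "string",
	"container_name":    "string",
	"container_image":   "string",
	"k8s_pod_id":        "string",
	"k8s_pod_name":      "string",
	"k8s_pod_namespace": "string",
	"k8s_pod_sandbox":   "bool",
}

// validateRecord is a hypothetical helper (not a Tracee API). It checks that
// every field named in the schema is present in the record and holds a value
// of the declared type.
func validateRecord(record map[string]interface{}, schema map[string]string) error {
	// Sort the field names so the first reported error is deterministic.
	fields := make([]string, 0, len(schema))
	for name := range schema {
		fields = append(fields, name)
	}
	sort.Strings(fields)

	for _, name := range fields {
		value, present := record[name]
		if !present {
			return fmt.Errorf("field %q is missing", name)
		}
		var ok bool
		switch schema[name] {
		case "string":
			_, ok = value.(string)
		case "bool":
			_, ok = value.(bool)
		}
		if !ok {
			return fmt.Errorf("field %q is not a %s", name, schema[name])
		}
	}
	return nil
}

func main() {
	// Made-up sample record, for illustration only.
	record := map[string]interface{}{
		"container_id":      "abc123",
		"container_name":    "web",
		"container_image":   "nginx:latest",
		"k8s_pod_id":        "pod-1",
		"k8s_pod_name":      "web-pod",
		"k8s_pod_namespace": "default",
		"k8s_pod_sandbox":   false,
	}
	if err := validateRecord(record, containerSchema); err != nil {
		fmt.Println("invalid record:", err)
		return
	}
	fmt.Println("record matches the schema")
}
```

A check like this is optional; it mainly helps in tests, where a hand-built record can drift from the schema unnoticed.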

## Using the Containers Data Source

> Make sure to read [Golang Signatures](../events/custom/golang.md) first.

### Signature Initialization

During the signature initialization, get the containers data source instance:

```go
type e2eContainersDataSource struct {
	cb             detect.SignatureHandler
	containersData detect.DataSource
}

func (sig *e2eContainersDataSource) Init(ctx detect.SignatureContext) error {
	sig.cb = ctx.Callback
	// Look up the built-in containers data source in the "tracee" namespace.
	containersData, ok := ctx.GetDataSource("tracee", "containers")
	if !ok {
		return fmt.Errorf("containers data source not registered")
	}
	// Keep the handle so OnEvent can query it later.
	sig.containersData = containersData
	return nil
}
```

Then, for each event being handled, call `Get()` on the data source to retrieve the information needed.

### On Events

Given the following example:

```go
func (sig *e2eContainersDataSource) OnEvent(event protocol.Event) error {
	eventObj, ok := event.Payload.(trace.Event)
	if !ok {
		return fmt.Errorf("failed to cast event's payload")
	}

	switch eventObj.EventName {
	case "sched_process_exec":
		containerId := eventObj.Container.ID
		if containerId == "" {
			return fmt.Errorf("received non container event")
		}

		// Query the data source using the container ID bundled with the event.
		container, err := sig.containersData.Get(containerId)
		if err != nil {
			return fmt.Errorf("failed to find container in data source: %v", err)
		}

		// Check the type assertion instead of trusting the stored value.
		containerImage, ok := container["container_image"].(string)
		if !ok {
			return fmt.Errorf("failed to obtain the container image name")
		}

		m, _ := sig.GetMetadata()

		sig.cb(detect.Finding{
			SigMetadata: m,
			Event:       event,
			// Attach the image so the looked-up value is actually used.
			Data: map[string]interface{}{"container_image": containerImage},
		})
	}

	return nil
}
```

As the example shows, the container ID carried by the event object lets you query the data source and obtain the container image, the container name, or any other field listed in the schema above.
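
To exercise this lookup logic without a running Tracee, one option is to put it behind a small interface and feed it a stub. The sketch below is an illustration under assumptions: `containerGetter`, `fakeContainers`, and `imageFor` are hypothetical names declared locally, and the `Get` signature is copied from how the example above calls it rather than from Tracee's source.

```go
package main

import (
	"errors"
	"fmt"
)

// containerGetter is a local stand-in for the one method this sketch needs;
// it is declared here so the example compiles without importing Tracee.
type containerGetter interface {
	Get(key interface{}) (map[string]interface{}, error)
}

// fakeContainers is a hypothetical in-memory data source keyed by container ID.
type fakeContainers map[string]map[string]interface{}

func (f fakeContainers) Get(key interface{}) (map[string]interface{}, error) {
	id, ok := key.(string)
	if !ok {
		return nil, errors.New("key is not a string")
	}
	container, found := f[id]
	if !found {
		return nil, fmt.Errorf("container %q not found", id)
	}
	return container, nil
}

// imageFor returns the image of the container with the given ID, checking
// both the lookup error and the type of the stored value.
func imageFor(ds containerGetter, containerID string) (string, error) {
	container, err := ds.Get(containerID)
	if err != nil {
		return "", fmt.Errorf("failed to find container in data source: %w", err)
	}
	image, ok := container["container_image"].(string)
	if !ok {
		return "", errors.New("failed to obtain the container image name")
	}
	return image, nil
}

func main() {
	// Made-up sample data, for illustration only.
	ds := fakeContainers{"abc123": {"container_image": "alpine:3.18"}}

	image, err := imageFor(ds, "abc123")
	fmt.Println(image, err)

	_, err = imageFor(ds, "missing")
	fmt.Println(err)
}
```

Keeping the lookup in a plain function like `imageFor` means the error paths can be covered by ordinary unit tests.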
68 changes: 50 additions & 18 deletions docs/docs/data-sources/overview.md
@@ -1,44 +1,76 @@
# Data Sources (Experimental)

Data sources are a new feature that will serve as the basis for accessing
dynamic data stores when writing signatures (currently only available in Golang).

> Data sources are currently an experimental feature and in active development,
> and usage is opt-in.

## Why use data sources?

Signatures should opt for data sources when they need access to data beyond what
is provided by the events they process.

For instance, a signature may need access to data about the container where the
event being processed was generated. With Tracee's integrated container data
source, this can be achieved without the signature having to separately monitor
container lifecycle events.

## What data sources can I use?

For now, only the built-in data sources from Tracee are at your disposal.
Looking ahead, there are plans to enable integration of data sources into Tracee
either as plugins or extensions.

Currently, two primary data sources exist:

1. Containers: Provides metadata about a container, given its container ID.
1. Process Tree: Provides access to a tree of all processes and threads that have ever existed.

This list will be expanded as other features are developed.

## How to use data sources

In order to use a data source in a signature you must request access to it in
the `Init` stage. This can be done through the `SignatureContext` passed at that
stage as such:

```golang
func (sig *mySig) Init(ctx detect.SignatureContext) error {
	...
	// Request the data source by namespace and data source ID.
	containersData, ok := ctx.GetDataSource("tracee", "containers")
	if !ok {
		return fmt.Errorf("containers data source not registered")
	}
	// Refuse to run against a schema version newer than this signature supports.
	if containersData.Version() > 1 {
		return fmt.Errorf("containers data source version not supported, please update this signature")
	}
	sig.containersData = containersData
	return nil
}
```

As you can see, access to the data source has been requested using two keys: a
namespace and a data source ID. Namespaces are employed to prevent name
conflicts in the future when integrating custom data sources. All built-in data
sources from Tracee will be available under the "tracee" namespace.

After verifying the data source's availability, it's suggested to include a
version check against the data source. This approach ensures that outdated
signatures aren't run with a newer data source schema.
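
The two-key lookup and the version guard can be sketched in isolation. Everything below is a stand-in written for illustration: `registry`, `stubSource`, `acquire`, and `maxSupportedVersion` are hypothetical names, and the real registration mechanism inside Tracee may differ. The unsigned return type of `Version()` is also an assumption.

```go
package main

import "fmt"

// versioned is a local stand-in for the single method this sketch relies on.
type versioned interface {
	Version() uint
}

// stubSource is a hypothetical data source that only reports a version.
type stubSource struct {
	version uint
}

func (s stubSource) Version() uint { return s.version }

// registry keys data sources by namespace first and data source ID second,
// so two namespaces can reuse the same ID without clashing.
type registry map[string]map[string]versioned

func (r registry) register(namespace, id string, ds versioned) {
	if r[namespace] == nil {
		r[namespace] = map[string]versioned{}
	}
	r[namespace][id] = ds
}

func (r registry) get(namespace, id string) (versioned, bool) {
	// Indexing a missing namespace yields a nil map, which is safe to read.
	ds, ok := r[namespace][id]
	return ds, ok
}

// maxSupportedVersion is the newest schema version this signature was
// written against.
const maxSupportedVersion = 1

// acquire combines the availability check with the version check.
func acquire(r registry, namespace, id string) (versioned, error) {
	ds, ok := r.get(namespace, id)
	if !ok {
		return nil, fmt.Errorf("%s/%s data source not registered", namespace, id)
	}
	if ds.Version() > maxSupportedVersion {
		return nil, fmt.Errorf("%s/%s data source version %d not supported", namespace, id, ds.Version())
	}
	return ds, nil
}

func main() {
	r := registry{}
	r.register("tracee", "containers", stubSource{version: 1})
	r.register("custom", "containers", stubSource{version: 2})

	_, err := acquire(r, "tracee", "containers")
	fmt.Println(err)

	_, err = acquire(r, "custom", "containers")
	fmt.Println(err)
}
```

Failing at acquisition time keeps the error in the initialization path, rather than surfacing later as a failed type assertion while events are being processed.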

Now, in the `OnEvent` function, you may use the data source like so:

```golang
container, err := sig.containersData.Get(containerId)
if err != nil {
	return fmt.Errorf("failed to find container in data source: %v", err)
}

// Unchecked type assertion; see the note below on when this is safe.
containerName := container["container_name"].(string)
```

Each data source provides a querying method, `Get(key any) (map[string]any, error)`.
The example above skips validating the type assertion on the value stored under
`container_name`. This is safe as long as the signature follows the schema
(returned by the `Schema()` method as a JSON representation of the returned
map) and has already checked the data source version.
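
When that guarantee is not available, the comma-ok form of the type assertion turns a mismatch into an error instead of a panic. The generic helper below is a hypothetical convenience, not a Tracee API, and it requires Go 1.18 or later; the map in `main` is made-up sample data.

```go
package main

import "fmt"

// field is a hypothetical helper (not part of Tracee). It reads key from a
// data source result and asserts its type with the comma-ok form, so a
// missing or mistyped field is reported as an error rather than a panic.
func field[T any](values map[string]any, key string) (T, error) {
	var zero T
	raw, present := values[key]
	if !present {
		return zero, fmt.Errorf("field %q is missing", key)
	}
	typed, ok := raw.(T)
	if !ok {
		return zero, fmt.Errorf("field %q has unexpected type %T", key, raw)
	}
	return typed, nil
}

func main() {
	// Made-up sample data, for illustration only.
	container := map[string]any{
		"container_name":  "web",
		"k8s_pod_sandbox": false,
	}

	name, err := field[string](container, "container_name")
	fmt.Println(name, err)

	// Asking for the wrong type produces an error instead of a panic.
	_, err = field[bool](container, "container_name")
	fmt.Println(err)
}
```

The unchecked form stays shorter and is reasonable once the version check has passed; the checked form suits signatures that need to tolerate schema drift.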