
Handle leak of process info in hostfs provider for add_session_metadata #42398

Open · wants to merge 7 commits into base: main

Conversation

fearful-symmetry
Contributor

@fearful-symmetry fearful-symmetry commented Jan 22, 2025

Proposed commit message

Fixes #42317

So, it turns out that the processDB used by the procfs provider in add_session_metadata expects events to arrive in order, which won't always be the case under load. If we get an exit event before the exec event, we drop the exit event, and the process entry then remains in the db.processes map indefinitely. On top of this, auditbeat configures netlink to drop events under load, meaning we can lose either the exec or the exit event, potentially leading to a leak if we can never pair up the two for a given process.

This alters the DB so we don't drop orphaned exit events; instead, the DB reaper waits a few iterations of reapProcs() for a matching exec before discarding the orphaned exit. We also optionally reap process exec events. I've tested this under load, and it does prevent the process DB from growing indefinitely.
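A minimal sketch of the retry idea described above (the types, field names, and constant are illustrative stand-ins, not the actual processdb code): an orphaned exit stays in the removal map for a few reapProcs() iterations in case its exec is just late, and is only discarded once the retries run out.

```go
package main

import "fmt"

// exitRemoveAttempts and the types below are hypothetical stand-ins for the
// real processdb structures.
const exitRemoveAttempts = 3

type exitEvent struct {
	removeAttempt int
}

type db struct {
	processes  map[uint32]struct{}  // pids for which we saw an exec
	removalMap map[uint32]exitEvent // orphaned or pending exit events
}

// reapProcs pairs exits with known execs; an unmatched exit survives a few
// iterations (in case the exec arrives late) before being discarded.
func (d *db) reapProcs() {
	for pid, cand := range d.removalMap {
		if _, ok := d.processes[pid]; ok {
			delete(d.processes, pid) // exec finally arrived: pair and drop both
			delete(d.removalMap, pid)
			continue
		}
		if cand.removeAttempt < exitRemoveAttempts {
			cand.removeAttempt++
			d.removalMap[pid] = cand // write the copy back; map values aren't addressable
			continue
		}
		delete(d.removalMap, pid) // gave up waiting for the exec
	}
}

func main() {
	d := &db{
		processes:  map[uint32]struct{}{},
		removalMap: map[uint32]exitEvent{42: {}}, // an exit with no matching exec
	}
	for i := 0; i < exitRemoveAttempts+1; i++ {
		d.reapProcs()
	}
	fmt.Println(len(d.removalMap)) // 0: the orphan is gone once retries run out
}
```

Deleting from a map while ranging over it is safe in Go, which is what lets the reaper prune in a single pass.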

There are a few caveats to this as-is:

  • We're now putting every single exit event into our db.removalMap, which means we'll be using more memory until those exit events are reaped. I can't really think of a good way around this.
  • This processor still uses a lot of resources, and under high-load situations, we may still end up using an unacceptable amount of memory.
  • If we need to reap processes, it can result in data loss if the processes don't exist in /proc.

There are also a few smaller changes to the process DB:

  • The removal list has been changed from a heap to a map. This is less performant, but necessary, since we now look up exit events on every exec.
  • We expose a number of new config vars.
  • This adds metrics to the DB, to further help out with any issues in the future.
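Since exits can now arrive before their execs, every exec insert needs a lookup by pid in the removal map, which is what the heap-to-map change buys (O(1) lookup instead of a scan). A sketch of that lookup, with hypothetical names:

```go
package main

import "fmt"

// Illustrative stand-ins for the real processdb types.
type exitEvent struct{ removeAttempt int }

type db struct {
	processes  map[uint32]struct{}
	removalMap map[uint32]exitEvent
}

// insertExec records an exec event; if the matching exit already arrived out
// of order, the pair cancels out instead of entering the process map.
func (d *db) insertExec(pid uint32) {
	if _, ok := d.removalMap[pid]; ok {
		delete(d.removalMap, pid) // the exit beat the exec; nothing left to track
		return
	}
	d.processes[pid] = struct{}{}
}

func main() {
	d := &db{
		processes:  map[uint32]struct{}{},
		removalMap: map[uint32]exitEvent{99: {}}, // exit already seen for pid 99
	}
	d.insertExec(99)  // pairs with the early exit, never enters processes
	d.insertExec(100) // normal case: tracked until its exit arrives
	fmt.Println(len(d.processes), len(d.removalMap)) // 1 0
}
```

With a heap keyed on removal time, this per-exec lookup would have required scanning the heap; the map trades ordered eviction for constant-time pairing.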

I'm still running performance tests on this, as the behavior is a bit bursty and hard to measure without proper scripts. I'll update when I have results.

How to test

Run auditbeat with the following:

- module: auditd
  # Load audit rules from separate files. Same format as audit.rules(7).
  audit_rule_files: [ '${path.config}/audit.rules.d/*.conf' ]
  audit_rules: |
    -a exit,always -F arch=b64 -S fork
    -a exit,always -F arch=b64 -S vfork
    ## set_sid
    -a exit,always -F arch=b64 -F euid=0 -S execve -k rootact
    -a exit,always -F arch=b32 -F euid=0 -S execve -k rootact
    -a always,exit -F arch=b64 -S connect -F a2=16 -F success=1 -F key=network_connect_4
    -a always,exit -F arch=b64 -F exe=/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -F exe=/usr/bin/bash -F success=1 -S connect -k "remote_shell" 
    -a always,exit -F arch=b64 -S exit_group
    -a exit,always -F arch=b64 -S close
    -a always,exit -F arch=b64 -S exit
    -a exit,always -F arch=b64 -S kill
    -a always,exit -F arch=b64 -S setsid 
    -a always,exit -F arch=b64 -S execve,execveat -k exec

processors:
  - add_session_metadata:
      backend: "procfs"

logging.level: debug

Grep for the REAPER: log line to examine the state of the various DB maps.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@fearful-symmetry fearful-symmetry added the Team:Security-Linux Platform Linux Platform Team in Security Solution label Jan 22, 2025
@fearful-symmetry fearful-symmetry self-assigned this Jan 22, 2025
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner January 22, 2025 15:43
@elasticmachine
Collaborator

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 22, 2025
Contributor

mergify bot commented Jan 22, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @fearful-symmetry? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8.\d is the label to automatically backport to the 8.\d branch, where \d is the digit

Contributor

mergify bot commented Jan 22, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 22, 2025
x-pack/auditbeat/processors/sessionmd/config.go (outdated review thread)
x-pack/auditbeat/processors/sessionmd/processdb/db.go (outdated review threads)
// in this case, give us a few iterations for us to get the exec, since things can arrive out of order.
if cand.removeAttempt < exitRemoveAttempts {
cand.removeAttempt += 1
db.removalMap[pid] = cand
Contributor

Is this needed? I don't see it being removed prior to this point.

Contributor Author

Not sure what you mean?

Contributor

We are iterating over db.removalMap ...

for pid, cand := range db.removalMap {

Seems like db.removalMap[pid] = cand is adding something that is already in the map ...

Oh, we are updating cand.removeAttempt; is that why it needs to be re-added? Why doesn't that update the thing in the map directly?

Contributor Author

Yes, that updates the existing entry in the map. The compiler won't let you do map[key].struct_val = new, if that's what you're thinking.

Contributor

I was thinking that line 75 changes what's in the map.

@haesbaert
Contributor

So, it turns out that the processDB used by the procfs provider in add_session_metadata expects events to arrive in order, which won't always be the case under load. If we get an exit event before the exec event, we drop the exit event, and the process entry then remains in the db.processes map indefinitely.

You mean the auditd events come out of order?

@fearful-symmetry
Contributor Author

fearful-symmetry commented Jan 23, 2025

@haesbaert so, I'm not sure how the ordering happens; my current theory is that because there's so many channels, threads and mutexes between the netlink sockets and this processor, that things will invariably end up out of order, even if we get them in-order from netlink.

@fearful-symmetry
Contributor Author

fearful-symmetry commented Jan 23, 2025

Alright, we're gonna have to hold off on this for a bit: I just discovered that auditbeat configures netlink by default to aggressively drop events:

		if ms.backpressureStrategy&(bsKernel|bsAuto) != 0 {
			// "kernel" backpressure mitigation strategy
			//
			// configure the kernel to drop audit events immediately if the
			// backlog queue is full.
			if status.FeatureBitmap&libaudit.AuditFeatureBitmapBacklogWaitTime != 0 {
				ms.log.Info("Setting kernel backlog wait time to prevent backpressure propagating to the kernel.")
				if err = ms.client.SetBacklogWaitTime(0, libaudit.NoWait); err != nil {
					return fmt.Errorf("failed to set audit backlog wait time in kernel: %w", err)
				}
			} else {
				if ms.backpressureStrategy == bsAuto {
					ms.log.Warn("setting backlog wait time is not supported in this kernel. Enabling workaround.")
					ms.backpressureStrategy |= bsUserSpace
				} else {
					return errors.New("kernel backlog wait time not supported by kernel, but required by backpressure_strategy")
				}
			}
		}

which kind of throws the whole strategy of this out the window, since the processor has no way of knowing how complete our dataset is. Going back to the drawing board...

Labels
backport-8.x Automated backport to the 8.x branch with mergify bug Team:Security-Linux Platform Linux Platform Team in Security Solution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add_session_metadata processs DB can grow to 20k+ entries, OOMing machine
4 participants