Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: add process tree data source #3488

Closed

Conversation

AlonZivony
Copy link
Contributor

1. Explain what the PR does

ce696ac chore(proctree): add process tree e2e tests
7098db0 feat(proctree): create process tree data source

ForkTime time.Time
ExitTime time.Time
Name string
ProcessHash uint32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this hash from the "thread group leader" ? Maybe we should use the same names both sides (process tree and here).

Copy link
Contributor Author

@AlonZivony AlonZivony Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this connects to my previous comment.
There should be a clear distinction between processes and threads.
I think we have too much of a mish-mash between kernel terms and user terms.
This should be the process hash, because the user is interested in the process information. Not necessarily the thread group leader information (the TaskInfo of the group leader is irrelevant for most use cases of the process tree).

** I took a closer look on the current design, and I see that some crucial process information is accessible only through the TaskInfo struct of the process. We need to discuss the design here to think what is the best API here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, just need to decide on the name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in the end that having more user-friendly names here is more appropriate.
Tell me what you think.

@rafaeldtinoco
Copy link
Contributor

Hey Alon, I think we're close. Overall I like the datasource, the way you're accessing the process tree, etc. I have some doubts on why you're not distinguishing between threads and processes in the signature (you can answer in those comments).

I'll get back to this tomorrow as soon as you answer/re-push things. Please check the tests results as well.

Cheers!

pkg/ebpf/signature_engine.go Outdated Show resolved Hide resolved
pkg/proctree/datasource.go Outdated Show resolved Hide resolved
pkg/proctree/datasource.go Outdated Show resolved Hide resolved
types/datasource/proctree.go Outdated Show resolved Hide resolved
@AlonZivony
Copy link
Contributor Author

Answering #3488 (comment)
I agree, but currently the version is uint, so it is not supported to implement it like this.
If you prefer, we can open an issue and do a refactor after this PR changing the datasources version to be a string in the major.minor.patch fashion.

@AlonZivony
Copy link
Contributor Author

AlonZivony commented Sep 24, 2023

Has errors with the e2e tests when trying to find the thread hash in the tree.
Should figure how to solve them for the milestone.

@AlonZivony AlonZivony force-pushed the feature/tree-datasource branch 2 times, most recently from 74bc00f to f55a435 Compare September 27, 2023 14:31
Copy link
Collaborator

@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,
needs types PR to be merged first

@rafaeldtinoco
Copy link
Contributor

#3509 merged, feel free to rebase.

@AlonZivony AlonZivony force-pushed the feature/tree-datasource branch 2 times, most recently from 5358546 to 49d8fd7 Compare September 27, 2023 17:42
@AlonZivony
Copy link
Contributor Author

@yanivagman @rafaeldtinoco needs someone to restart the tests

@AlonZivony
Copy link
Contributor Author

AlonZivony commented Sep 27, 2023

My local e2e test is not working.
I guess at this point that the reason for it is the race condition between the feed of the proctree and the actual events.
Still trying to figure it out.

EDIT:
From my check, we have 3 issues with the e2e test:

  1. Race condition between signal processing and events processing.
  2. The fact that if the sched_process_exec event hook is triggered before the exec signal, the execution time registered in the process tree will be later then the exec event, resulting incorrect information when querying the process tree.
  3. The task name during runtime is the cmdPath instead of the task name, which is not matching the expected behavior (for example, tasks info from the procfs is not in the same format - its the short name of the command that ran).

@AlonZivony AlonZivony force-pushed the feature/tree-datasource branch 2 times, most recently from 88ad33d to f63ea9b Compare September 27, 2023 19:26
@rafaeldtinoco
Copy link
Contributor

My local e2e test is not working.
I guess at this point that the reason for it is the race condition between the feed of the proctree and the actual events.
Still trying to figure it out.

Like we've just discussed, the only way to get rid of this would be to get "both" sources of events enabled (with #3498).

Then the proctree entry would always exist since the fork/exec/exit events would come in the same pipeline (at least for this test that doesn't have a filled pipeline).

I'm pretty sure in regular situations the signal events will happen first, but, still ... its an event oriented architecture, can't guarantee a transaction with unordered events without holding the pipeline.

@AlonZivony
Copy link
Contributor Author

I decided to make this PR only on the data source, and putting the e2e tests on another PR.
The e2e tests doesn't fail because of the datasource, and the tests are mergeable after code freeze.
Will push the datasource commit alone soon.

Expose the process tree to the signatures using a datasource.
This way a signatures can use the process tree to have state, instead
of using multiple events and saving the state internally.
@rafaeldtinoco
Copy link
Contributor

Rebased at #3522 (solving conflicts mostly)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants