-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enrich: fixes post control plane #3285
enrich: fixes post control plane #3285
Conversation
1. Initialize enrichEnabled field 2. Run started a go routine internally, which was redundant.
Locks in the container package were as short as possible. This caused transaction issues once a few actors interacted with it. Changed the locking scheme to be more "transactional" with large locks, instead of combining read locks and write locks in the same transaction.
e3c880b
to
4f11496
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things I could spot:
- Can we call the mutex "cgroupMapMutex" instead of "mtx" only ? We're only protecting the map and I think its worth making it clear.
Update: It protects "deleted" as well... still, its cgroupMapMutex related. Either way, unimportant, up to you.
- RemoveFromBpfMap (which should be named RemoveFromBPFMap) can be called in parallel: There is no logical error, as the map deletion would fail, but there will be a concurrency error (unable to delete the cgroupID) as return from the 2nd (slightly 2nd) concurrent thread calling it.
Questions:
- Is there any plan to "de-duplicate" the repeated code in between
*Tracee.processCgroupMkdir
and*Controller.processCgroupMkdir
?
All the rest LGTM, but do check the possibility of a concurrent error from RemoveFromBpfMap, and, if you change it, please rename that method to have capital BPF.
Thank you!
I will remove the processing code so there's no confusion and do the renames. Then squash and open a backport PR (I will wait for your review so you can confirm the formatting is right). |
1. Rename mutex in cgroups code 2. Capitalize BPF in RemoveFromBpfMap 3. Delete unused processing code for cgroup events.
Locks in the container package were as short as possible. This caused transaction issues once a few actors interacted with it. Change the locking scheme to be more "transactional" with large locks, instead of combining read locks and write locks in the same transaction. Additional chores in the scope of the commit: 1. Initialize enrichEnabled field in control plane. 2. Control Plane's Run started a go routine internally, which was redundant. 3. Rename mutex in cgroups code to cgroupsMutex. 4. Capitalize BPF in RemoveFromBpfMap. 5. Delete the unused processing code for cgroup events. commit: f638b4e (main), cherry-pick
Locks in the container package were as short as possible. This caused transaction issues once a few actors interacted with it. Change the locking scheme to be more "transactional" with large locks, instead of combining read locks and write locks in the same transaction. Additional chores in the scope of the commit: 1. Initialize enrichEnabled field in control plane. 2. Control Plane's Run started a go routine internally, which was redundant. 3. Rename mutex in cgroups code to cgroupsMutex. 4. Capitalize BPF in RemoveFromBpfMap. 5. Delete the unused processing code for cgroup events. commit: f638b4e (main), cherry-pick
So what is the control plane left with ? And about backporting multiple commits into a single commit and merging into v0.16.0, I think its a bad idea for some sustaining reasons:
With those changes you should have waited another review of mine instead I believe. And you also did not wait my take on the cherry-picking into v0.16.0. |
All the logic is in the control plane, there was just leftover (and disconnected) logic in the event processing file. It was already doing all the work, just had to clean up the left overs from the event processing.
I squashed the 3 commits when merging to main, then cherry-picked the squashed commit to v0.16.0. So there is no difference in what you described, it's the same commit in both places (with the addition of the reference in the cherry-picked one).
I assumed Yaniv's +1 was enough, as we had examples of backports already which he would've also read and known about, and reviewed it on his own volition. |
I saw the code removal and I would have said for us not to do in the cherry-pick/backport patch in the release tree. We will eventually agree that the release tree has to have only fixes (and sometimes an ugly fix, not the fix done in MAIN) so we don't introduce bugs (even being as simple as code removal of some code that is not being used). About Yaniv +1 being enough, you are right. It is enough and I spoke to him about this just now. We will coordinate better and discuss with you (offline) about how we want this tree to be maintained. I'm writing to you in private with the rest of this message =). |
The
container_create
argument enrichment was broken due to a race in context switching between the control plane and the ebpf event pipeline.As such the control plane would trigger but a context switch could cause the pipeline continue instead. Then when enrichment for the
cgroup_mkdir
event would trigger, the cgroup info would be missing in theContainers
struct, since now the control plane populates it.The root cause is that the locking scheme in
Containers
is non transactional. Now that we have multiple pipelines writing to theContainers
struct, locking should align with the situation. This PR does that.Fix #3282