-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix (cgroups): already dead edge case #3325
fix (cgroups): already dead edge case #3325
Conversation
Certain kubernetes version make use of short lived cgroups for various tasks (for example log rotation). These cgroups will generate events and a very quick cgroup_rmdir event. As such, many related events to the cgroup will attempt to query its directory through the recursive path and not find it. Commit adds a "Dead" field to the cgroup info to indicate a cgroup which has already been removed. Various logical sections can refer to it if its relevant to them, and more importantly, additional queries will not be attempted. Bonus: optimize away additional Stat syscall in containers by returning the directory ctime in cgroup.GetCgroupPath.
9435e5d
to
b3d043c
Compare
@@ -137,7 +137,7 @@ func (p *Controller) processCgroupMkdir(args []trace.Argument) error { | |||
if err != nil { | |||
return errfmt.WrapError(err) | |||
} | |||
if info.Container.ContainerId == "" { | |||
if info.Container.ContainerId == "" && !info.Dead { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this ever happen?
It means that we get mkdir event after rmdir event, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, and it was the original indicator for the error (although it was on a tracee v0.11.1 installation).
This can happen because it's a race between userspace and ebpf. The entry can be deleted either here OR in the cgroup_rmdir program.
I guess this would also require a misorder in events received, so this doesn't fully resolve the case, but the result here is only an annoying (otherwise harmless) error log.
Fix #3324
@rafaeldtinoco I want to backport this to v0.16.0 branch if that's ok with you.