libct: fix resetting CPU affinity #5025
base: main
how about @askervin
NumCPU() returns the number of CPUs usable by the current process. The purpose of the

Besides, I would avoid adding any magic values (like 1024) to this logic. If, for instance, unix.CPUSet were changed to [64]uint64 instead of the current [16]uint64, the current workaround of calling SchedGetaffinity() would start working on 4096-CPU systems; or if unix.SchedGetaffinity/SchedSetaffinity were updated to work with dynamic (large enough) cpuset sizes, this fix would work as is. With magic numbers, we would only have introduced a new place that needs to be fixed at some point.
I'm not sure this completely fixes #5023 -- yes, it stops the reset issue but still leaves you with the same problem that #4858 was trying to solve. If you are explicitly requesting CPUs >= 1024 then you will still not be able to get them AFAICS because we still use

I think we should just be calling

For what it's worth, I think even the current behaviour of resetting to use the first 1024 CPUs by default is better than regressing #4858.
c92d043 to be8dbc6
@cyphar, I think you're right: why make a quick fix when the proper fix is not really that much harder? Updated. Playing as safe as the latest Go runtime with the size of the CPU mask.
be8dbc6 to 8618a16
kolyshkin left a comment
a single nit, otherwise LGTM
unix.CPUSet is limited to 1024 CPUs. Calling unix.SchedSetaffinity(pid, cpuset) removes all CPUs starting from 1024 from the allowed CPUs of pid, even if cpuset is all ones. The consequence of runc trying to reset CPU affinity by default is that it prevents all containers from using those CPUs.

This change uses a huge CPU mask to play it safe and get all possible CPUs enabled with a single sched_setaffinity call.

Fixes: opencontainers#5023
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
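The idea described in this commit message can be sketched roughly as follows. This is not the code from the PR: `allOnesMask`, `resetAffinity` and the 8192-byte `maskBytes` constant are assumptions made for illustration. The sketch is Linux-only and uses the raw syscall from the standard `syscall` package, so the mask size is not tied to a fixed-size type.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// maskBytes is an assumed mask size (8192 bytes = 65536 bits), chosen
// for illustration only; it is not taken from the PR.
const maskBytes = 8192

// allOnesMask returns an n-byte CPU mask with every bit set.
func allOnesMask(n int) []byte {
	mask := make([]byte, n)
	for i := range mask {
		mask[i] = 0xff
	}
	return mask
}

// resetAffinity asks the kernel to allow pid to run on every CPU the
// mask can describe. pid 0 means the calling thread.
func resetAffinity(pid int) error {
	mask := allOnesMask(maskBytes)
	_, _, errno := syscall.RawSyscall(
		syscall.SYS_SCHED_SETAFFINITY,
		uintptr(pid),
		uintptr(len(mask)),
		uintptr(unsafe.Pointer(&mask[0])),
	)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	if err := resetAffinity(0); err != nil {
		fmt.Fprintln(os.Stderr, "sched_setaffinity:", err)
		return
	}
	fmt.Println("affinity reset requested for the calling thread")
}
```

An oversized all-ones mask is meant to cover every CPU the system could have in a single call, without first querying the CPU count.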
8618a16 to 016fac8
Hmm, should we try to fix unix.CPUSet instead?
unix.CPUSet is limited to 1024 CPUs. Calling unix.SchedSetaffinity(pid, cpuset) removes all CPUs starting from 1024 from the allowed CPUs of pid, even if cpuset is all ones. The consequence of runc trying to reset CPU affinity by default is that it prevents all containers from using those CPUs.
This change is a quick fix that brings runc behavior back to what it was in v1.3.0 on 1024+ CPU systems. The real fix requires calling sched_setaffinity with a cpusetsize that fits all CPUs in the system, which cannot be done with the current unix.SchedSetaffinity.
Fixes: #5023