@askervin
Contributor

unix.CPUSet is limited to 1024 CPUs. Calling unix.SchedSetaffinity(pid, cpuset) removes all CPUs starting from 1024 from the allowed CPUs of pid, even if cpuset is all ones. Because runc tries to reset CPU affinity by default, this prevents all containers from using those CPUs.

This change is a quick fix that restores the v1.3.0 behavior of runc on 1024+ CPU systems. A real fix requires calling sched_setaffinity with a cpusetsize that fits all CPUs in the system, which cannot be done with the current unix.SchedSetaffinity.

Fixes: #5023

@ningmingxiao
Contributor

ningmingxiao commented Nov 18, 2025

how about @askervin

	if runtime.NumCPU() > 1024 {
		return
	}

@askervin
Contributor Author

	if runtime.NumCPU() > 1024 {
		return
	}

NumCPU() returns the number of CPUs usable by the current process. The purpose of tryResetCPUAffinity() is to make that number bigger, in case an external entity has made it smaller than the number of enabled CPUs in the whole system. Now assume that the external entity has set the affinity to cpuset 1023-1122, giving NumCPU()==100: the logic would continue to SchedSetaffinity(pid, cpuset(0-1023)) and allow using only 1 CPU, namely 1023.
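The scenario above can be sketched in plain Go without touching real affinity. This is only an illustration: `guardSkipsReset` and `effectiveCPUs` are hypothetical helpers, and the sketch assumes the external restriction (for example a cgroup cpuset) still applies after the reset, so the result is the intersection of both sets.

```go
package main

import "fmt"

// cpuSetBits is the capacity of the fixed-size mask discussed above.
const cpuSetBits = 1024

// guardSkipsReset models the proposed check: skip the reset when the
// process can currently use more than 1024 CPUs.
func guardSkipsReset(numCPU int) bool {
	return numCPU > cpuSetBits
}

// effectiveCPUs intersects the externally allowed range [lo, hi] with the
// CPUs that a 1024-bit mask can express (0..1023).
func effectiveCPUs(lo, hi int) []int {
	var cpus []int
	for cpu := lo; cpu <= hi; cpu++ {
		if cpu < cpuSetBits {
			cpus = append(cpus, cpu)
		}
	}
	return cpus
}

func main() {
	lo, hi := 1023, 1122
	numCPU := hi - lo + 1 // what runtime.NumCPU() would report here: 100

	fmt.Println(guardSkipsReset(numCPU)) // false: the reset still runs
	fmt.Println(effectiveCPUs(lo, hi))   // [1023]
}
```

The guard does not trigger because it looks at the number of usable CPUs, not at their indices, which is why it cannot protect a process pinned to high-numbered CPUs.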

Besides, I would avoid adding any magic values (like 1024) to this logic. If, for instance, unix.CPUSet were changed to [64]uint64 instead of the current [16]uint64, the current workaround calling SchedGetaffinity() would start working on 4096-CPU systems; or if unix.SchedGetaffinity/SchedSetaffinity were updated to work with dynamic (large enough) cpuset sizes, this fix would work as is. With magic numbers we would only have introduced a new place that needs to be fixed at some point.

@cyphar
Member

cyphar commented Nov 19, 2025

I'm not sure this completely fixes #5023 -- yes, it stops the reset issue but still leaves you with the same problem that #4858 was trying to solve. If you are explicitly requesting CPUs >= 1024 then you will still not be able to get them AFAICS because we still use unix.SchedSetaffinity which doesn't support CPUs >= 1024 (this appears to be what #5023 is talking about, but the reset case could also cause problems if you are forcing the affinity using cgroups with cpuset).

I think we should just be calling sched_setaffinity directly. We can either just call it directly for this one case (i.e., pass an array of 0xFF that is "long enough" -- the current upstream kernel maximum is 8192 CPUs which would be a 128-long uint64 array) or we can copy the code from golang.org/x/sys and make it use slices instead of a fixed-size array so that we can support larger CPU values for the explicit CPU affinity configuration.
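A minimal sketch of the first option, calling the syscall directly with an all-ones mask, might look like the following. It is not the code in this PR: `fullMask` and `resetAffinity` are hypothetical names, `maxCPUs` simply reuses the 8192 figure from the comment rather than querying the kernel, and the sketch assumes Linux and that a mask wider than the system's CPU count is acceptable to the kernel.

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// maxCPUs is an assumed upper bound taken from the discussion (8192);
// it is not read from the running kernel.
const maxCPUs = 8192

// fullMask returns an all-ones bitmask wide enough for nCPUs CPUs,
// using a slice instead of a fixed-size array.
func fullMask(nCPUs int) []uint64 {
	mask := make([]uint64, (nCPUs+63)/64)
	for i := range mask {
		mask[i] = ^uint64(0)
	}
	return mask
}

// resetAffinity (hypothetical helper) asks the kernel to allow pid on
// every CPU the mask covers; pid 0 means the calling thread.
func resetAffinity(pid int) error {
	mask := fullMask(maxCPUs)
	_, _, errno := syscall.RawSyscall(
		syscall.SYS_SCHED_SETAFFINITY,
		uintptr(pid),
		uintptr(len(mask)*8), // cpusetsize is given in bytes
		uintptr(unsafe.Pointer(&mask[0])),
	)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	fmt.Println(len(fullMask(maxCPUs))) // 128 words of 64 bits each

	// Note: this really changes the affinity of the calling thread.
	if err := resetAffinity(0); err != nil {
		fmt.Println("sched_setaffinity failed:", err)
	}
}
```

Building the mask as a slice keeps the size a single parameter, so supporting a different CPU limit means changing one constant rather than a fixed array type.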

For what it's worth, I think even the current behaviour of resetting to use the first 1024 CPUs by default is better than regressing #4858.

@askervin askervin force-pushed the 5eA-workaround-max-1kcpus branch 2 times, most recently from c92d043 to be8dbc6 Compare November 19, 2025 09:04
@askervin
Contributor Author

askervin commented Nov 19, 2025

@cyphar, I think you're right: why make a quick fix when the proper fix is not really that much harder.

Updated. Playing it as safe as the latest Go runtime in the size of the CPU mask.

@askervin askervin force-pushed the 5eA-workaround-max-1kcpus branch from be8dbc6 to 8618a16 Compare November 19, 2025 11:19
@askervin askervin changed the title libct: do not reset CPU affinity if it prevents using cpu >= 1024 libct: fix resetting CPU affinity Nov 19, 2025
@cyphar cyphar added this to the 1.4.1 milestone Nov 20, 2025
Contributor

@kolyshkin kolyshkin left a comment

a single nit, otherwise LGTM

unix.CPUSet is limited to 1024 CPUs. Calling
unix.SchedSetaffinity(pid, cpuset) removes all CPUs starting from 1024
from allowed CPUs of pid, even if cpuset is all ones. The consequence
of runc trying to reset CPU affinity by default is that it prevents
all containers from using those CPUs.

This change uses a huge CPU mask to play safe and get all possible
CPUs enabled with a single sched_setaffinity call.

Fixes: opencontainers#5023

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5eA-workaround-max-1kcpus branch from 8618a16 to 016fac8 Compare December 5, 2025 15:56
@kolyshkin
Contributor

Hmm, should we try to fix unix.CPUSet instead?

Development

Successfully merging this pull request may close these issues.

Resetting CPU affinity does the opposite on 1024+ CPU systems
