Uncore perf #519
Add a property "uncore_max_delta_mhz" to cpu_plugin, so that the uncore maximum frequency can be set to a lower value than the default maximum. Lowering the uncore maximum frequency can have a significant impact on the total power consumption of the SoC.

The uncore can consume a significant amount of power in Intel's Xeon servers, depending on the workload characteristics. To optimize the total power and improve overall performance, SoCs have internal algorithms for scaling the uncore frequency. Refer to the following link for details: https://docs.kernel.org/admin-guide/pm/intel_uncore_frequency_scaling.html

Based on experiments done on an Intel Sapphire Rapids server by running a wide variety of server workloads, the power saving is significant, with some performance loss. For example, a 500 MHz reduction causes a 5% performance loss and reduces total active power by 13%.

This property knob helps users optimize power based on their tolerance for reduced performance. The unit for this property knob is MHz.

Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Based on experiments done on an Intel Sapphire Rapids server by running a wide variety of server workloads from Phoronix, the power saving is 8% with a performance loss of 1.5%. Introduce a profile that inherits "throughput-performance" and reduces the uncore maximum frequency by 200 MHz.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
I just tested this patch on a single-socket Sapphire Rapids system running a 6.3.0-0.rc4.35.eln126.x86_64 kernel. I did see power savings at idle (12% when C1 was not pinned, and 8% when C1 was pinned). While I agree this would be good to get into TuneD, I'm not sure doing it with a new TuneD profile is the way to go. Question back to Jaroslav and the TuneD maintainers:
To clarify my earlier comment, there were no power savings when the CPUs were busy. Those 8% and 12% power savings were only for idle systems.
data = self._cmd.read_file(fpath, err_ret=None, no_error=True)
try:
    value = int(data.strip())
except:
Check notice (Code scanning / CodeQL): Except block handles 'BaseException'
Will fix in the next version, once the general direction regarding the profile is clear.
So I assume from this that the "int" call has its own BaseException that I'm overriding. I'm not sure what the problem is with that, but I can replace this with a simple "if re.match('[0-9]+', data.strip()):". That try/except is just there in case the sysfs file returns garbage (i.e. anything other than a decimal integer). If that happens, I ignore its package and move on as if it doesn't exist.
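For context on the CodeQL notice: a bare `except:` catches `BaseException`, which includes `KeyboardInterrupt` and `SystemExit`, not only the `ValueError` that `int()` raises on malformed input. The following is a minimal sketch of a narrower handler, not the code from this PR; `parse_sysfs_int` is a hypothetical helper name. It also handles the `None` that `read_file(..., err_ret=None)` can return, which a bare `except:` would otherwise be hiding. Note that `re.match('[0-9]+', ...)` anchors only at the start of the string, so it would accept input such as `123abc`.

```python
def parse_sysfs_int(data):
    """Return the integer held in `data`, or None if it is missing or malformed.

    Hypothetical helper for illustration; not part of the TuneD code base.
    """
    if data is None:
        # The read failed (the snippet above passes err_ret=None).
        return None
    try:
        return int(data.strip())
    except ValueError:
        # The sysfs file returned something other than an integer.
        return None
```

With this shape the caller can skip a package whenever the helper returns `None`, which matches the behaviour described in the comment above.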
When CPUs are 100% busy and accessing lots of memory all the time, no savings will be seen. But workloads have idle times and also don't access the uncore constantly enough to keep the uncore frequency high. So overall this is a good knob.
Absolutely. I'd personally like to avoid adding yet another profile. "Knobs" that could make use of the functionality in any TuneD profile would be nice, though.
Do you mean adding a knob to:
Personally, that's not what I had in mind. First, we probably need to agree on what "knobs"/variables/tunables we need, along with their names (just
Hi Jiri, Jaroslav, and Srinivas: For example, we could:
I would defer to Jaroslav on his preferred approach for doing something like this.
Hi Jiri, Jaroslav, Joe,
We can resubmit as suggested by Joe. Please let us know.
Hi Joe,
I created /etc/tuned/power-saving-variables.conf with contents:
# governor=performance
/////////////
[main]
[variables]
[modules]
[cpu]
# energy_perf_bias=normal "This is what balanced defines"
energy_perf_bias=${energy_perf_bias}
There is no way to say "if ${energy_perf_bias} is defined, use the value". So what you are suggesting may not be possible with current TuneD?
Does this rely on the intel_uncore kmod? Will this allow similar functionality in tuned like we've been able to use in the past via the msr-tools package, specifically the binaries
For the configurations we expect users to customize, variables are the way to go. If users are not usually expected to customize the configuration, then a new profile is probably a better solution. In this specific case I would also prefer variables, i.e. the
IMHO this should be possible to implement with the
Regarding the current implementation in this PR:
Hi, I would like to move this PR forward. IIUC we have two separate issues here: the first is adding the uncore frequency knob to the cpu plugin, and the second is incorporating the knob into existing profiles (via variable.conf). For now I would like to concentrate on the first issue and, once that is done, move to the second one.
It's of course reasonable to configure this per CPU. The one problem I see is that all the CPUs in the config might not necessarily be located in the same die. Basically, it will be required that devices='cpu list' is configured per die. That's OK, I think: we can log an error and make verify fail if the configuration is not correct.
Yes, it would obviously be better to have the same units. Regarding the name, it's a delta because we don't know a priori what the maximum frequency is. We read that value from sysfs initial_max_freq_khz, subtract the delta, and write the result to max_freq_khz.
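The read, subtract, and write sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `apply_max_delta` and `apply_to_all` are hypothetical names, the attribute names are taken from the kernel documentation linked earlier, and clamping the result at `min_freq_khz` is an added assumption rather than something the PR states.

```python
import os

# Sysfs root documented by the kernel's intel_uncore_frequency driver.
UNCORE_ROOT = "/sys/devices/system/cpu/intel_uncore_frequency"


def read_khz(path):
    """Read a single integer (kHz) from a sysfs attribute."""
    with open(path) as f:
        return int(f.read().strip())


def apply_max_delta(entry_dir, delta_mhz):
    """Set max_freq_khz to initial_max_freq_khz minus delta_mhz; return kHz written."""
    initial_max = read_khz(os.path.join(entry_dir, "initial_max_freq_khz"))
    min_khz = read_khz(os.path.join(entry_dir, "min_freq_khz"))
    # The knob is in MHz while sysfs uses kHz.
    target = initial_max - delta_mhz * 1000
    # Assumption: never ask for a maximum below the current minimum.
    target = max(target, min_khz)
    with open(os.path.join(entry_dir, "max_freq_khz"), "w") as f:
        f.write(str(target))
    return target


def apply_to_all(delta_mhz, root=UNCORE_ROOT):
    """Apply the same delta to every entry under the uncore sysfs root."""
    return {
        name: apply_max_delta(os.path.join(root, name), delta_mhz)
        for name in sorted(os.listdir(root))
    }
```

Writing to the real sysfs files requires root privileges and the intel_uncore_frequency driver to be loaded; passing a different `root` lets the logic be exercised against a scratch directory.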
Yes.
Yes, I think using sysfs intel_uncore_frequency would be preferred to configure uncore frequency over rdmsr/wrmsr.
According to the kernel documentation (https://docs.kernel.org/admin-guide/pm/intel_uncore_frequency_scaling.html) this is configured per package and die combination - not per CPU. So I don't see how or why we would try to configure this for specific CPUs in tuned.
I believe that in at least some cases, either Intel or the HW vendor is going to require a specific uncore frequency to be set. I think tuned needs to allow the uncore max_freq_khz and min_freq_khz to be set specifically, not using deltas. Or perhaps we should allow either method to be used?
We know about the topology; the die_id can be read from sysfs, i.e. for cpu6 it is: However, there are uncores that might not contain CPUs (there are uncore* entries in addition to the package_die entries in /sys/devices/system/cpu/intel_uncore_frequency). So this is not that simple.
I'm open to suggestions here. It can be a direct value or a percentage of the maximum, for example.
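To make the topology point concrete, here is a rough sketch of mapping a CPU to its uncore sysfs entry and checking that a list of CPUs shares one entry. It assumes the standard `topology/physical_package_id` and `topology/die_id` attributes under each CPU directory and the `package_XX_die_YY` naming from the kernel documentation; the function names are hypothetical, and the sketch deliberately does not cover the `uncore*` entries mentioned above, which may not correspond to any CPU.

```python
import os

CPU_ROOT = "/sys/devices/system/cpu"


def _read_int(path):
    """Read a single integer from a sysfs attribute."""
    with open(path) as f:
        return int(f.read().strip())


def uncore_entry_for_cpu(cpu, cpu_root=CPU_ROOT):
    """Return the package_XX_die_YY entry name that covers the given CPU."""
    topo = os.path.join(cpu_root, "cpu%d" % cpu, "topology")
    package = _read_int(os.path.join(topo, "physical_package_id"))
    die = _read_int(os.path.join(topo, "die_id"))
    return "package_%02d_die_%02d" % (package, die)


def cpus_share_uncore(cpus, cpu_root=CPU_ROOT):
    """True if every CPU in `cpus` maps to the same package/die entry."""
    return len({uncore_entry_for_cpu(c, cpu_root) for c in cpus}) == 1
```

A verify step along the lines discussed earlier could log an error and fail when `cpus_share_uncore` returns `False` for a configured device list.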
I've opened another PR. Please check it out to see if it goes in the right direction. It allows defining the uncore frequency per CPU (the config is checked against the topology). Hope it's something reasonable. Otherwise, what I think could be done is one global option (as in this PR) or a separate intel_uncore plugin where a device is defined as an entry in
Uncore support has been merged. If you want the setting in the
Hello maintainers,
This pull request is for uncore power optimization on Intel servers. Please provide feedback.