Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-383852: Fix output from sar -A, move sar to cap(system-load) #13

Conversation

bernhardkaindl
Copy link
Collaborator

@bernhardkaindl bernhardkaindl commented Oct 10, 2023

CA-383852: Fix a recently added feature:

Fix 'sar -A' from timing out:

The problem it solves was/is that the attempt to add the output of sar -A
(submitted from outside our team, just reviewed and merged by us) didn't work.

It appeared to work at back then, but when we test it now, we see that it timeouts.

This issue was that when the human-readable output of sar -A
was added in April to --entries=xenserver-logs, no timeout
for this capability was added, and the default timeout is -1:

This results in a nearly immediate timeout of any added command,
when the capability does is not defined with a specific max_time= parameter

  • should we fix this?

To fix this, the reporter of the issue suggested that sar (a system load report) could be moved to a new capability.

Also, it should be noted that sar records are binary system load records, not XenServer specific log files:

Therefore, with this PR, I propose to create a new capability (system_load) to collect the sar data files (and the human-readable of output of sar -A) with a timeout of 30 seconds and a max_size of 70 MB.

I've set the personally identifiable information setting of system-load PII_MAYBE because of a private review comment (due to time pattern, maybe someone could make a guess based on activity data. That is definitely safe as the list of network interfaces and similar information is also currently set to MAYBE for other caps.).

# PII -- Personally identifiable information.  Of particular concern are
# things that would identify customers, or their network topology.
# Passwords are never to be included in any bug report, regardless of any PII
# declaration.
#
# NO            -- No PII will be in these entries.
# YES           -- PII will likely or certainly be in these entries.
# MAYBE         -- The user may wish to audit these entries for PII.
# IF_CUSTOMIZED -- If the files are unmodified, then they will contain no PII,
# but since we encourage customers to edit these files, PII may have been
# introduced by the customer.  This is used in particular for the networking
# scripts in dom0.

@bernhardkaindl bernhardkaindl requested review from liulinC, psafont and rosslagerwall and removed request for liulinC October 10, 2023 15:47
@bernhardkaindl bernhardkaindl changed the title Fix getting output from sar -A, moving sar to "system-load" CA-383852: Fix getting output from sar -A, moving sar to "system-load" Oct 12, 2023
@bernhardkaindl bernhardkaindl changed the title CA-383852: Fix getting output from sar -A, moving sar to "system-load" CA-383852: Fix getting output from sar -A, move "sar" to "system-load" Oct 12, 2023
@bernhardkaindl bernhardkaindl changed the title CA-383852: Fix getting output from sar -A, move "sar" to "system-load" CA-383852: Fix getting output from sar -A, move sar to cap(system-load) Oct 12, 2023
@bernhardkaindl bernhardkaindl changed the title CA-383852: Fix getting output from sar -A, move sar to cap(system-load) CA-383852: Fix output from sar -A, move sar to cap(system-load) Oct 12, 2023
rosslagerwall
rosslagerwall previously approved these changes Oct 12, 2023
Fix `sar -A` from timing out:

This issue was that when the human-readable output of `sar -A`
was added in April to `--entries=xenserver-logs`, the timeout
for the capability was not defined.

The capability `xenserver-logs` uses the default timeout of -1
which is racey for commands as they are immediately stopped
due to timeout of -1 running out unless they are very brief.

To fix this, it was suggested that `sar` (a system load report)
could be moved to a new capability. Also, it should be noted that
`sar` records are binary system load records, not XenServer specific
log files.

Therefore, create a new capability (`system_load`), responsible for
collecting the sar data files (and the human-readable of output
of `sar -A`) with a timeout of 30 seconds and a `max_size` of 70 MB.

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
@bernhardkaindl bernhardkaindl force-pushed the private/berhardk/CA-383852-fix-sar-and-move-to-system-load branch from b93ef68 to 78823aa Compare October 12, 2023 13:27
@@ -426,6 +427,7 @@ cap(CAP_PERSISTENT_STATS, PII_NO, max_size=50*MB,
max_time=60, checked=False, hidden=True)
cap(CAP_PROCESS_LIST, PII_YES, max_size=30*KB,
max_time=60)
cap(CAP_SYSTEM_LOAD, PII_MAYBE, max_size=70*MB, max_time=30)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this max_size from the CAP_XENSERVER_LOGS to avoid growing the size of the status report?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For CAP_SYSTEM_LOAD: The files in /var/log/sa which are collected in it now is ~33 MB (it keeps 60 binary in fixed binary format, so the size should not grow beyond that):

du -sch /var/log/sa
33M     /var/log/sa
33M     total

For CAP_XENSERVER_LOGS: Yes, I think not setting a max_size for CAP_XENSERVER_LOGS would simplify the code as it currently recalculates and updates its size, which may not be needed then. But we can keep that as it is now. If I understand this correctly, this can be done later when we clean up the code.

@bernhardkaindl bernhardkaindl merged commit c433647 into master Oct 13, 2023
4 checks passed
@bernhardkaindl bernhardkaindl deleted the private/berhardk/CA-383852-fix-sar-and-move-to-system-load branch February 14, 2024 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants