Add some helpers to report potential problem on scheduler side #182
base: main
Conversation
brenns10 left a comment
I really like this module, it's going to be really useful. The code looks great as well, but I have a suggestion. The docstrings for check_runq_wakelist and check_idle_runq have really useful context, and it's a bit of a shame for that context to be hidden away in the code.
Do you think it would be possible to make each explanation a global variable, and then have the run_queue_check() function print each explanation at the end, only if necessary? Something like this:
```python
EXPLAIN_WAKELIST = """
[TASKS ON WAKE LIST]
A runq's wakelist temporarily holds tasks that are about to be woken
up on that CPU. If this list has multiple tasks, it usually means that
this CPU has missed multiple scheduler IPIs. This can imply issues
like IRQs being disabled for too long, IRQ delivery issues between hypervisor
and VM, or some other issue.
"""

EXPLAIN_IDLE_RUNQ = """
[IDLE RUN QUEUE]
...
"""
```

Then, each function that performs checks could take a parameter, `explanations: Set[str]`, and you could do:

```python
explanations.add(EXPLAIN_WAKELIST)
print(f"cpu: {cpu} has following tasks in its runq wake_list:")
...
print("See TASKS ON WAKE LIST below")
```

And then `run_queue_check()` could handle printing out explanations:

```python
explanations = set()
for cpu in for_each_online_cpu(prog):
    check_runq_wakelist(prog, cpu, explanations)
    ...
if not explanations:
    return
print("Note: found some possible run queue issues. Explanations below:")
for explanation in explanations:
    print(explanation)
```

I think this would be really helpful even for support & sustaining, because we don't all have the context for why a detected issue may be a problem, and what next steps we should look into.
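This collect-then-print pattern can be sketched without drgn at all. In the sketch below, the check function, per-CPU data, and explanation text are hypothetical stand-ins for the real helpers:

```python
from typing import Dict, List, Set

# Hypothetical explanation text, standing in for the real EXPLAIN_* strings.
EXPLAIN_WAKELIST = """
[TASKS ON WAKE LIST]
Tasks should only sit on a runq's wake_list briefly; a backlog here can
indicate missed scheduler IPIs.
"""


def check_runq_wakelist(
    cpu: int, wake_list: List[str], explanations: Set[str]
) -> None:
    # Record the explanation only when the check actually fires.
    if wake_list:
        explanations.add(EXPLAIN_WAKELIST)
        print(f"cpu: {cpu} has following tasks in its runq wake_list:")
        for comm in wake_list:
            print(f"  comm: {comm}")
        print("See TASKS ON WAKE LIST below")


def run_queue_check(per_cpu_wake_lists: Dict[int, List[str]]) -> Set[str]:
    explanations: Set[str] = set()
    for cpu, wake_list in per_cpu_wake_lists.items():
        check_runq_wakelist(cpu, wake_list, explanations)
    if explanations:
        print("Note: found some possible run queue issues. Explanations below:")
        for explanation in explanations:
            print(explanation)
    return explanations


explanations = run_queue_check({0: [], 1: ["migration/1", "ocssd.bin"]})
```

Because `explanations` is a set, each explanation prints at most once even when several CPUs trigger the same check.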
Yes, it's definitely possible, and I agree that this help string can give better context for this data. I have made the suggested change.
Force-pushed from 2415226 to 733f566
@brenns10 thanks a lot for taking a look and sharing your feedback. I have addressed the review comments, but at the moment I see some jobs failing. This looks like a setup issue.
Force-pushed from 733f566 to f00c3e0
Issues like missed IPIs or IRQs staying disabled for too long can
create observable anomalies on the scheduler side. For example, a
scheduler IPI that is missed or absent due to a hypervisor issue can
cause migration/X threads on VMs to get stuck forever, causing a
softlockup.
Knowledge of these anomalies can help in locating the actual problem.
For example, the following snippet shows that the migration/23 thread
was stuck in the runq's wake_list:
cpu: 23 has following tasks in its runq wake_list:
task pid: 273301, comm: ora_ipc0_csadsd
task pid: 281335, comm: ora_lms0_ccawsd
task pid: 391691, comm: ora_ipc0_webprd
task pid: 390722, comm: ora_ipc0_dppd1
task pid: 394144, comm: ora_lmd1_audprd
task pid: 394450, comm: ora_lms1_rbsspr
task pid: 393235, comm: ora_lmd0_etudpr
task pid: 24469, comm: cvfwd
task pid: 357613, comm: tnslsnr
task pid: 351700, comm: ocssd.bin
task pid: 394519, comm: ora_dia0_wspd1
task pid: 394307, comm: ora_lms1_wmsprd
task pid: 394773, comm: ora_lms0_ccadmp
task pid: 351141, comm: ocssd.bin
task pid: 394690, comm: ora_lms0_wspd1
task pid: 351774, comm: ocssd.bin
task pid: 351678, comm: ocssd.bin
task pid: 351692, comm: ocssd.bin
task pid: 351683, comm: ocssd.bin
task pid: 351680, comm: ocssd.bin
task pid: 351686, comm: ocssd.bin
task pid: 351681, comm: ocssd.bin
task pid: 351688, comm: ocssd.bin
task pid: 150, comm: migration/23
This will block migration/X threads running on other CPUs.
The helpers and corelens module added here can detect such issues.
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Thanks for adding the explanations! I think that'll be a great model as we start moving into more proactive checks & anomaly detection.
Since these helpers only execute on non-live systems, I triggered the vmcore tests on GitLab (simply pushing a branch to the internal GitLab will do it, and you can view results under the "Pipelines" tab). There are some errors there -- one of which is from me and which I'm fixing (#194), but the bigger issue is that the wake_list field is gone since UEK7, so we'll need to make sure this works on UEK7 and UEK8.
```python
explanations.add(EXPLAIN_IDLE_RUNQ)
if runq.nr_running.value_() != 0:
    print(f"Idle cpu: {cpu} has non-zero nr_running")

nr_cfs_task = num_tasks_on_cfs_rq(prog, cpu)
nr_rt_task = num_tasks_on_rt_rq(prog, cpu)
nr_dl_task = num_tasks_on_dl_rq(prog, cpu)

if nr_cfs_task != 0:
    print(f"Idle cpu: {cpu} has {nr_cfs_task} tasks on CFS runq")

if nr_rt_task != 0:
    print(f"Idle cpu: {cpu} has {nr_rt_task} tasks on RT runq")

if nr_dl_task != 0:
    print(f"Idle cpu: {cpu} has {nr_dl_task} tasks on DL runq")
print("See IDLE RUN QUEUE below")
```
I was thinking that the explanation should only appear if one of the checks failed. So something more like this:
```diff
-explanations.add(EXPLAIN_IDLE_RUNQ)
-if runq.nr_running.value_() != 0:
-    print(f"Idle cpu: {cpu} has non-zero nr_running")
-nr_cfs_task = num_tasks_on_cfs_rq(prog, cpu)
-nr_rt_task = num_tasks_on_rt_rq(prog, cpu)
-nr_dl_task = num_tasks_on_dl_rq(prog, cpu)
-if nr_cfs_task != 0:
-    print(f"Idle cpu: {cpu} has {nr_cfs_task} tasks on CFS runq")
-if nr_rt_task != 0:
-    print(f"Idle cpu: {cpu} has {nr_rt_task} tasks on RT runq")
-if nr_dl_task != 0:
-    print(f"Idle cpu: {cpu} has {nr_dl_task} tasks on DL runq")
-print("See IDLE RUN QUEUE below")
+if runq.nr_running.value_() != 0:
+    print(f"Idle cpu: {cpu} has non-zero nr_running [IDLE RUN QUEUE]")
+    explanations.add(EXPLAIN_IDLE_RUNQ)
+nr_cfs_task = num_tasks_on_cfs_rq(prog, cpu)
+nr_rt_task = num_tasks_on_rt_rq(prog, cpu)
+nr_dl_task = num_tasks_on_dl_rq(prog, cpu)
+if nr_cfs_task != 0:
+    print(f"Idle cpu: {cpu} has {nr_cfs_task} tasks on CFS runq [IDLE RUN QUEUE]")
+    explanations.add(EXPLAIN_IDLE_RUNQ)
+if nr_rt_task != 0:
+    print(f"Idle cpu: {cpu} has {nr_rt_task} tasks on RT runq [IDLE RUN QUEUE]")
+    explanations.add(EXPLAIN_IDLE_RUNQ)
+if nr_dl_task != 0:
+    print(f"Idle cpu: {cpu} has {nr_dl_task} tasks on DL runq [IDLE RUN QUEUE]")
+    explanations.add(EXPLAIN_IDLE_RUNQ)
```
```python
print("\n")
explanations.add(EXPLAIN_WAKELIST)
print(f"cpu: {cpu} has following tasks in its runq wake_list:")
for task in for_each_task_on_rq_wake_list(prog, cpu):
    print(
        f"task pid: {task.pid.value_()}, comm: {task.comm.string_().decode()}"
    )
print("See TASKS ON WAKE LIST below")
```
- The `print("\n")` prints two blank lines, since the `print()` function always appends a newline to the string you give it. Is that intentional? If you only want one newline, `print()` with no argument would suffice.
- I'm not sure whether I love the idea of printing all these newlines. Whenever helper functions start printing their own newlines, the output can have lots of extra spaces and it can be a bit difficult to read or track down which function printed which one. Instead, could we just indent each task for this CPU? That way, there is still visual grouping for each CPU, but we don't have the newlines contributing to odd spacing.
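The newline behavior is easy to verify in plain Python by capturing what each call writes to stdout:

```python
import io
from contextlib import redirect_stdout


def captured(*args) -> str:
    # Capture exactly what a print() call writes to stdout.
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(*args)
    return buf.getvalue()


# print("\n") emits its "\n" argument plus the implicit trailing newline,
# i.e. two blank lines; a bare print() emits a single newline.
two_blank = captured("\n")
one_blank = captured()
```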
```python
if llist_empty(runq.wake_list):
    return
```
Since torvalds/linux@a148866489fbe ("sched: Replace rq::wake_list"), wake_list doesn't exist, so this function doesn't work on UEK7 and later.
```python
def dump_rt_runq_wait_summary(
    prog: Program, cpu: int, qduration_thresh_ms: int = 1000
):
    """
    Iterate through all tasks, on a CPU's runq and list tasks that have been queued
    for greater than specified threshold in ms (default 1000 ms).

    :param prog: drgn Program
    :param cpu: cpu number
    :param qduration_thresh_ms: threshold for wait duration on runq
    """
    runq = per_cpu(prog["runqueues"], cpu)
    for task in for_each_task_on_rt_rq(prog, cpu):
        try:
            if (
                task.sched_info.last_queued.value_() > 0
                and task.sched_info.last_queued.value_()
                > task.sched_info.last_arrival.value_()
            ):
                qduration = (
                    runq.clock.value_() - task.sched_info.last_queued.value_()
                )
                print(
                    f"cpu: {cpu} pid: {task.pid.value_()} prio: {task.prio.value_()} queued for {qduration} nsecs"
                )
        except AttributeError:
            pass


def dump_cfs_runq_wait_summary(
    prog: Program, cpu: int, qduration_thresh_ms: int = 1000
):
    """
    Iterate through all tasks, on a CPU's runq and list tasks that have been queued
    for greater than specified threshold in ms (default 1000 ms).

    :param prog: drgn Program
    :param cpu: cpu number
    :param qduration_thresh_ms: threshold for wait duration on runq
    """
    runq = per_cpu(prog["runqueues"], cpu)
    for task in for_each_task_on_cfs_rq(prog, cpu):
        try:
            if (
                task.sched_info.last_queued.value_() > 0
                and task.sched_info.last_queued.value_()
                > task.sched_info.last_arrival.value_()
            ):
                qduration = (
                    runq.clock.value_() - task.sched_info.last_queued.value_()
                )
                print(
                    f"cpu: {cpu} pid: {task.pid.value_()} prio: {task.prio.value_()} queued for {qduration} nsecs"
                )
        except AttributeError:
            pass
```
These functions are the same, except for which kind of scheduler runqueue they operate on. Further, there's not an equivalent for the deadline scheduler. I think that makes sense for the deadline scheduler, because the deadline period may be greater than the threshold value -- though correct me if that was not the motivation for omitting it.
Regardless, it seems we should not have two copies of this function hanging around. Could we instead have a function that takes an iterable of tasks, plus a runqueue object (for consulting the runq clock), and prints out these warnings? Something like this:
```diff
-def dump_rt_runq_wait_summary(
-    prog: Program, cpu: int, qduration_thresh_ms: int = 1000
-):
-    """
-    Iterate through all tasks, on a CPU's runq and list tasks that have been queued
-    for greater than specified threshold in ms (default 1000 ms).
-
-    :param prog: drgn Program
-    :param cpu: cpu number
-    :param qduration_thresh_ms: threshold for wait duration on runq
-    """
-    runq = per_cpu(prog["runqueues"], cpu)
-    for task in for_each_task_on_rt_rq(prog, cpu):
-        try:
-            if (
-                task.sched_info.last_queued.value_() > 0
-                and task.sched_info.last_queued.value_()
-                > task.sched_info.last_arrival.value_()
-            ):
-                qduration = (
-                    runq.clock.value_() - task.sched_info.last_queued.value_()
-                )
-                print(
-                    f"cpu: {cpu} pid: {task.pid.value_()} prio: {task.prio.value_()} queued for {qduration} nsecs"
-                )
-        except AttributeError:
-            pass
-
-
-def dump_cfs_runq_wait_summary(
-    prog: Program, cpu: int, qduration_thresh_ms: int = 1000
-):
-    """
-    Iterate through all tasks, on a CPU's runq and list tasks that have been queued
-    for greater than specified threshold in ms (default 1000 ms).
-
-    :param prog: drgn Program
-    :param cpu: cpu number
-    :param qduration_thresh_ms: threshold for wait duration on runq
-    """
-    runq = per_cpu(prog["runqueues"], cpu)
-    for task in for_each_task_on_cfs_rq(prog, cpu):
-        try:
-            if (
-                task.sched_info.last_queued.value_() > 0
-                and task.sched_info.last_queued.value_()
-                > task.sched_info.last_arrival.value_()
-            ):
-                qduration = (
-                    runq.clock.value_() - task.sched_info.last_queued.value_()
-                )
-                print(
-                    f"cpu: {cpu} pid: {task.pid.value_()} prio: {task.prio.value_()} queued for {qduration} nsecs"
-                )
-        except AttributeError:
-            pass
+def dump_waiting_task_summary(
+    cpu: int,
+    runq: Object,
+    tasks: Iterable[Object],
+    qduration_thresh_ms: int = 1000,
+):
+    """
+    Print tasks on a CPU's runq that have been queued
+    for greater than specified threshold in ms (default 1000 ms).
+
+    :param cpu: cpu number
+    :param runq: CPU's ``struct rq *``
+    :param tasks: iterable of ``struct task_struct *`` to check
+    :param qduration_thresh_ms: threshold for wait duration on runq
+    """
+    for task in tasks:
+        try:
+            if (
+                task.sched_info.last_queued.value_() > 0
+                and task.sched_info.last_queued.value_()
+                > task.sched_info.last_arrival.value_()
+            ):
+                qduration = (
+                    runq.clock.value_() - task.sched_info.last_queued.value_()
+                )
+                print(
+                    f"cpu: {cpu} pid: {task.pid.value_()} prio: {task.prio.value_()} queued for {qduration} nsecs"
+                )
+        except AttributeError:
+            pass
```
And we could call this function by:
```python
from itertools import chain

tasks = chain(
    for_each_task_on_rt_rq(prog, cpu),
    for_each_task_on_cfs_rq(prog, cpu),
    # if dl should be included:
    # for_each_task_on_dl_rq(prog, cpu),
)
dump_waiting_task_summary(cpu, per_cpu(prog["runqueues"], cpu), tasks)
```

Finally:
- Is `except AttributeError` necessary? We need this to be compatible with UEK5+, so if we need a helper to correctly compute this (or if we already have one), let's do it and make sure we handle the cases, and then drop the exception handler here. Swallowing errors doesn't prevent them, it just hides them.
- Can we have an explanation for this one too? I know it's a bit obvious, but it's good to follow the pattern.
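Relatedly, `qduration_thresh_ms` is accepted by these functions but never compared against the computed duration, so every queued task is printed regardless of the threshold. A drgn-free sketch of what the comparison could look like (the predicate and its name are hypothetical, not code from the PR; `sched_info` timestamps and the rq clock are in nanoseconds, so the millisecond threshold needs converting):

```python
def queued_longer_than(
    rq_clock_ns: int,
    last_queued_ns: int,
    last_arrival_ns: int,
    qduration_thresh_ms: int = 1000,
) -> bool:
    # A task is still waiting on the runq if it was queued after it last ran.
    if last_queued_ns > 0 and last_queued_ns > last_arrival_ns:
        qduration_ns = rq_clock_ns - last_queued_ns
        # sched_info timestamps are in nanoseconds; convert the ms threshold.
        return qduration_ns > qduration_thresh_ms * 1_000_000
    return False
```

For example, with an rq clock of 5 s and a task queued at the 1 s mark, the task has waited 4 s and exceeds the default 1000 ms threshold.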
```python
def for_each_task_on_rq_wake_list(prog: Program, cpu: int) -> Iterator[Object]:
    """
    Iterate through all tasks, on a CPU's wake_list.

    A CPU's wake_list contains tasks, that are in the process of being woken
    up and have not yet landed on a CPU runq after wakeup.
    Tasks should not reside here for long.

    :param prog: drgn program
    :param cpu: cpu number
    :return: Iterator of ``struct task_struct *`` objects
    """
    runq = per_cpu(prog["runqueues"], cpu)
    for task in llist_for_each_entry(
        "struct task_struct", runq.wake_list.first, "wake_entry"
    ):
        yield task
```
As noted elsewhere, wake_list is gone from UEK7 and later.
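One way to keep a helper like this working across kernel versions is to probe for the member before dereferencing it. With drgn, the member names of `struct rq` could be inspected from the program's type information; the sketch below models that check with plain sets of member names (the function, its name, and the member sets are illustrative stand-ins, not real kernel definitions):

```python
from typing import Optional, Set


def find_wake_list_member(rq_members: Set[str]) -> Optional[str]:
    # Before torvalds/linux@a148866489fbe ("sched: Replace rq::wake_list"),
    # pending wakeups hang off rq->wake_list; afterwards the field is gone,
    # so a helper should bail out (or use the replacement mechanism) rather
    # than raise on a missing member.
    if "wake_list" in rq_members:
        return "wake_list"
    return None


pre_uek7_rq = {"nr_running", "clock", "wake_list"}  # illustrative member names
post_uek7_rq = {"nr_running", "clock"}              # wake_list removed
```

The same guard shape lets `check_runq_wakelist` skip cleanly on UEK7 and later instead of failing in the vmcore tests.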