improve postpone checks outside of check period #494

Open
wants to merge 1 commit into
base: master
from


sni
Contributor

@sni sni commented Feb 20, 2025

With #490, checks found outside their check period are rescheduled to the next time slot in their check period, with a random delay of 60 seconds (or whatever retained_scheduling_randomize_window is set to).

There are two scenarios in which this leads to scheduling issues:

  1. Lots of checks with a medium check interval (e.g. 2h) and an office-hours
    timeperiod. Those checks should be spread evenly over the 2h interval,
    but currently they all start when the office hours begin and from then on
    run all at once every 2 hours, which results in load peaks.
    The solution here is to take the check period into account when postponing
    the next check.
  2. Checks with a long check interval (e.g. 24h) and short timeperiods,
    e.g. only 08:00 to 08:05. In this case we need to take the actual time slot
    into account to find a valid next check time.

While at it, I merged the code into a generic function that is used for both hosts and services.
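
To illustrate the two scenarios, here is a minimal Python sketch of one way a postponed check time could be picked. It is not the code from this PR (naemon itself is written in C); the function name `postponed_check_time`, its parameters, and the idea of capping the random spread at the slot length are all assumptions made for illustration. Times are plain integers in seconds.

```python
import random


def postponed_check_time(slot_start, slot_end, check_interval, rng=random):
    """Pick a next check time inside the time slot [slot_start, slot_end).

    Hypothetical helper, not naemon's API. The random delay is spread over
    the check interval (scenario 1) but never exceeds the length of the
    time slot (scenario 2), so the result always lands inside the slot.
    """
    if slot_end <= slot_start:
        raise ValueError("time slot must have a positive length")
    # Spread window: the check interval, capped at the slot length.
    window = max(1, min(check_interval, slot_end - slot_start))
    return slot_start + rng.randrange(window)


if __name__ == "__main__":
    # Scenario 1: 2h interval, office hours 08:00-17:00.
    office = postponed_check_time(8 * 3600, 17 * 3600, 2 * 3600)
    # Scenario 2: 24h interval, slot 08:00-08:05.
    short = postponed_check_time(8 * 3600, 8 * 3600 + 300, 24 * 3600)
    print(office, short)
```

With a 60-second window, all office-hours checks would land in the first minute of the slot and then repeat together every 2 hours. Spreading over the interval avoids that peak, while the cap keeps a 24h check inside a 5-minute slot.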

@sni sni force-pushed the improve_postpone_checks branch 3 times, most recently from dac3cec to 0c0c871 Compare February 21, 2025 09:50
@sni sni force-pushed the improve_postpone_checks branch from 0c0c871 to f9b2699 Compare February 21, 2025 10:09