Fix broken downtime comment sync #10000

Open
wants to merge 2 commits into
base: master
from

Conversation

yhabteab
Member

@yhabteab yhabteab commented Feb 12, 2024

All objects must be synced in the order of their load dependencies. Otherwise, downtimes and/or comments might get synced before their respective Checkables, and the other endpoint will ignore them because it does not yet know about those Checkables. Since a runtime config update event does not trigger a reload on the remote endpoint, these objects won't be synced again until the next reload.
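
The ordering requirement described above amounts to a topological sort over types. The following is a minimal sketch, not the code in this PR: the `LOAD_AFTER` map and the `sync_order` helper are hypothetical and only illustrate why Checkables have to come before downtimes and comments.

```python
from graphlib import TopologicalSorter

# Hypothetical "load after" map: each type lists the types that must be
# synced before it. Illustrative only, not Icinga 2's actual type registry.
LOAD_AFTER = {
    "Host": set(),
    "Service": {"Host"},
    "Downtime": {"Host", "Service"},
    "Comment": {"Host", "Service"},
}


def sync_order(load_after):
    """Return the types ordered so that every dependency comes first."""
    # TopologicalSorter expects a mapping of node -> predecessors and
    # yields predecessors before the nodes that depend on them.
    return list(TopologicalSorter(load_after).static_order())


order = sync_order(LOAD_AFTER)
print(order)
```

Syncing in this order means the receiving endpoint already knows a Host or Service by the time a Downtime or Comment referring to it arrives.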

~/master2/icinga2 (bundled-cluster-fixes ✗) ls prefix/var/lib/icinga2/api/packages/_api/*/conf.d/downtimes | wc -l
    3501
~/master2/icinga2 (bundled-cluster-fixes ✗) curl -sSku root:icinga 'https://localhost:5666/v1/objects/downtimes?pretty=1' | grep ' "attrs": {' | wc -l
    1501
~/master1/icinga2 (bundled-cluster-fixes ✗) curl -sSku root:icinga 'https://localhost:5665/v1/objects/downtimes?pretty=1' | grep ' "attrs": {' | wc -l
    3501

After master2 reload:

~/master2/icinga2 (bundled-cluster-fixes ✗) curl -sSku root:icinga 'https://localhost:5666/v1/objects/downtimes?pretty=1' | grep ' "attrs": {' | wc -l
    3501
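
The comparison shown above (config files on disk versus objects reported by the API) could also be scripted. This is a rough sketch under assumptions: it expects the API response body to be JSON with a top-level `results` array, and the helper names `count_files`, `count_api_objects` and `missing_objects` are made up for illustration.

```python
import glob
import json
import os


def count_files(pattern):
    """Count regular files matching a glob pattern (like `ls ... | wc -l`)."""
    return sum(1 for path in glob.glob(pattern) if os.path.isfile(path))


def count_api_objects(body):
    """Count the entries in the "results" array of an API response body."""
    return len(json.loads(body).get("results", []))


def missing_objects(on_disk, in_memory):
    """Number of objects that exist on disk but are unknown to the daemon."""
    return max(on_disk - in_memory, 0)


# Numbers taken from the master2 output above, before the reload.
print(missing_objects(3501, 1501))
```

With the counts from master2 before the reload, 2000 downtimes exist as files but are not known to the running daemon.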

closes #7786
closes #9873

TODO

@yhabteab yhabteab added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) ref/IP area/runtime Downtimes, comments, dependencies, events labels Feb 12, 2024
@cla-bot cla-bot bot added the cla/signed label Feb 12, 2024
lib/remote/apilistener-configsync.cpp Outdated
@log1-c
Contributor

log1-c commented Jul 22, 2024

Quick question: would this also fix the following issue:
Downtimes scheduled via API (sometimes) get synced/re-created with delay and are doubled #10078

@Mordecaine

Mordecaine commented Aug 13, 2024

Quick question: would this also fix the following issue: Downtimes scheduled via API (sometimes) get synced/re-created with delay and are doubled #10078

It would be very helpful to get an answer to this.

@yhabteab
Member Author

Quick question: would this also fix the following issue: Downtimes scheduled via API (sometimes) get synced/re-created with delay and are doubled #10078

It would be very helpful to get an answer to this.

Hi, we don't know for sure whether this will fix #10078, as we still haven't identified exactly what is going wrong there, other than that something is not working as expected. It's unlikely that this PR will fix #10078, but we can't say for sure until the cause of #10078 is identified.

@Al2Klimov Al2Klimov removed their assignment Aug 20, 2024
@Al2Klimov Al2Klimov self-requested a review August 20, 2024 11:16
@dgiesselbach

@yhabteab When will this request be completed? Is there a timeline?

@yhabteab yhabteab changed the base branch from master to enhanced-sort-types-by-load-dependencies September 16, 2024 07:23
@yhabteab yhabteab force-pushed the enhanced-sort-types-by-load-dependencies branch 4 times, most recently from cb4fe57 to eb97676 Compare September 20, 2024 14:18
@Al2Klimov Al2Klimov changed the base branch from enhanced-sort-types-by-load-dependencies to master September 20, 2024 15:30
@Al2Klimov Al2Klimov marked this pull request as draft September 20, 2024 15:31
@julianbrost
Contributor

I believe similar problems will still exist for other types where there's no load_after dependency at the moment: many objects can refer to a time period, yet not a single type declares load_after TimePeriod. There are more such examples: Host/Service -> *Command, Notification -> NotificationCommand (not necessarily a complete list).
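
One way to look for such gaps is to compare what each type can reference against what it declares as a load dependency. The sketch below is illustrative only: the `REFERENCES` entries follow the examples named above, while the `LOAD_AFTER` contents and the `uncovered_references` helper are assumptions, not data extracted from the Icinga 2 sources.

```python
# References between types, following the examples in the comment above.
REFERENCES = {
    "Host": {"CheckCommand", "EventCommand", "TimePeriod"},
    "Service": {"Host", "CheckCommand", "EventCommand", "TimePeriod"},
    "Notification": {"NotificationCommand", "TimePeriod"},
}

# Hypothetical declared load dependencies (assumed for this sketch).
LOAD_AFTER = {
    "Service": {"Host"},
}


def uncovered_references(references, load_after):
    """List (type, referenced type) pairs with no matching load dependency."""
    gaps = []
    for type_name in sorted(references):
        declared = load_after.get(type_name, set())
        for target in sorted(references[type_name] - declared):
            gaps.append((type_name, target))
    return gaps


gaps = uncovered_references(REFERENCES, LOAD_AFTER)
for type_name, target in gaps:
    print(f"{type_name} references {target} without a load dependency")
```

Every pair it prints is a reference that a sync ordered purely by declared load dependencies could deliver in the wrong order.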

@yhabteab
Member Author

A complete list of the navigable aka navigation types (attributes):

~/Workspace/icinga2 (broken-downtime-comment-sync ✗) grep -rE '\[.*navigation.*\]' lib
lib/icinga/dependency.ti:       [config, no_user_modify, required, navigation(child_host)] name(Host) child_host_name {
lib/icinga/dependency.ti:       [config, no_user_modify, navigation(child_service)] String child_service_name {
lib/icinga/dependency.ti:       [config, no_user_modify, required, navigation(parent_host)] name(Host) parent_host_name {
lib/icinga/dependency.ti:       [config, no_user_modify, navigation(parent_service)] String parent_service_name {
lib/icinga/dependency.ti:       [config, navigation] name(TimePeriod) period (PeriodRaw) {
lib/icinga/checkable.ti:        [config, required, navigation] name(CheckCommand) check_command (CheckCommandRaw) {
lib/icinga/checkable.ti:        [config, navigation] name(TimePeriod) check_period (CheckPeriodRaw) {
lib/icinga/checkable.ti:        [config, navigation] name(EventCommand) event_command (EventCommandRaw) {
lib/icinga/checkable.ti:        [config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) {
lib/icinga/downtime.ti: [config, no_user_modify, required, navigation(host)] name(Host) host_name {
lib/icinga/downtime.ti: [config, no_user_modify, navigation(service)] String service_name {
lib/icinga/notification.ti:     [config, protected, required, navigation] name(NotificationCommand) command (CommandRaw) {
lib/icinga/notification.ti:     [config, navigation] name(TimePeriod) period (PeriodRaw) {
lib/icinga/notification.ti:     [config, no_user_modify, protected, required, navigation(host)] name(Host) host_name {
lib/icinga/notification.ti:     [config, protected, no_user_modify, navigation(service)] String service_name {
lib/icinga/notification.ti:     [config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) {
lib/icinga/comment.ti:  [config, no_user_modify, protected, required, navigation(host)] name(Host) host_name {
lib/icinga/comment.ti:  [config, no_user_modify, protected, navigation(service)] String service_name {
lib/icinga/scheduleddowntime.ti:        [config, protected, no_user_modify, required, navigation(host)] name(Host) host_name {
lib/icinga/scheduleddowntime.ti:        [config, protected, no_user_modify, navigation(service)] String service_name {
lib/icinga/service.ti:  [no_storage, navigation] Host::Ptr host {
lib/icinga/user.ti:     [config, navigation] name(TimePeriod) period (PeriodRaw) {
lib/remote/zone.ti:     [config, no_user_modify, navigation] name(Zone) parent (ParentRaw) {
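
The grep output above can be turned into (file, attribute, referenced type) triples to feed a dependency check. A minimal sketch, assuming the line layout shown above; the `NAV_LINE` pattern and the `typed_references` helper are made up for illustration and skip attributes declared as plain `String`, since those lines do not name the referenced type.

```python
import re

# Matches "path/file.ti: [flags] name(Type) attribute" as printed by grep.
NAV_LINE = re.compile(
    r"^(?P<file>[^:]+):\s*\[(?P<flags>[^\]]*)\]\s*"
    r"name\((?P<target>\w+)\)\s+(?P<attr>\w+)"
)

# A few lines copied from the output above.
SAMPLE = """\
lib/icinga/checkable.ti:        [config, required, navigation] name(CheckCommand) check_command (CheckCommandRaw) {
lib/icinga/downtime.ti: [config, no_user_modify, required, navigation(host)] name(Host) host_name {
lib/icinga/downtime.ti: [config, no_user_modify, navigation(service)] String service_name {
lib/icinga/user.ti:     [config, navigation] name(TimePeriod) period (PeriodRaw) {
"""


def typed_references(grep_output):
    """Extract (file, attribute, referenced type) from navigation lines."""
    refs = []
    for line in grep_output.splitlines():
        match = NAV_LINE.match(line)
        if match and "navigation" in match["flags"]:
            refs.append((match["file"], match["attr"], match["target"]))
    return refs


refs = typed_references(SAMPLE)
for ref in refs:
    print(ref)
```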

@yhabteab yhabteab marked this pull request as ready for review September 26, 2024 14:03
@yhabteab yhabteab added the consider backporting Should be considered for inclusion in a bugfix release label Sep 26, 2024
@yhabteab
Member Author

A complete list of the navigable aka navigation types (attributes):

Another list of non-navigable dependencies :):

~/Workspace/icinga2 (broken-downtime-comment-sync ✗) grep -rE 'array\(name.*' lib
lib/icinga/host.ti:     [config, no_user_modify, required, signal_with_old_value] array(name(HostGroup)) groups {
lib/icinga/notification.ti:     [config, signal_with_old_value] array(name(User)) users (UsersRaw);
lib/icinga/notification.ti:     [config, signal_with_old_value] array(name(UserGroup)) user_groups (UserGroupsRaw);
lib/icinga/servicegroup.ti:     [config, no_user_modify] array(name(ServiceGroup)) groups;
lib/icinga/hostgroup.ti:        [config, no_user_modify] array(name(HostGroup)) groups;
lib/icinga/usergroup.ti:        [config, no_user_modify] array(name(UserGroup)) groups;
lib/icinga/service.ti:  [config, no_user_modify, required, signal_with_old_value] array(name(ServiceGroup)) groups {
lib/icinga/user.ti:     [config, no_user_modify, required, signal_with_old_value] array(name(UserGroup)) groups {
lib/icinga/timeperiod.ti:       [config, required, signal_with_old_value] array(name(TimePeriod)) excludes {
lib/icinga/timeperiod.ti:       [config, required, signal_with_old_value] array(name(TimePeriod)) includes {
lib/remote/zone.ti:     [config] array(name(Endpoint)) endpoints (EndpointsRaw);
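
Some entries in the list above point back at their own type (for example HostGroup groups and TimePeriod excludes/includes). Ordering by type alone cannot express a dependency between two objects of the same type, so these cases may need separate handling. The sketch below flags them under the assumption that each .ti file is named after the type it defines; `ARRAY_LINE` and `self_references` are hypothetical helpers.

```python
import os
import re

# Matches "path/file.ti: ... array(name(Type)) attribute" as printed by grep.
ARRAY_LINE = re.compile(
    r"^(?P<file>[^:]+):.*array\(name\((?P<target>\w+)\)\)\s+(?P<attr>\w+)"
)

# A few lines copied from the output above.
SAMPLE = """\
lib/icinga/host.ti:     [config, no_user_modify, required, signal_with_old_value] array(name(HostGroup)) groups {
lib/icinga/hostgroup.ti:        [config, no_user_modify] array(name(HostGroup)) groups;
lib/icinga/timeperiod.ti:       [config, required, signal_with_old_value] array(name(TimePeriod)) excludes {
lib/remote/zone.ti:     [config] array(name(Endpoint)) endpoints (EndpointsRaw);
"""


def self_references(grep_output):
    """Return (type, attribute) pairs where a type refers to itself."""
    found = []
    for line in grep_output.splitlines():
        match = ARRAY_LINE.match(line)
        if not match:
            continue
        # Assumption: "hostgroup.ti" defines the type "HostGroup".
        stem = os.path.splitext(os.path.basename(match["file"]))[0]
        if stem.lower() == match["target"].lower():
            found.append((match["target"], match["attr"]))
    return found


selfrefs = self_references(SAMPLE)
for type_name, attr in selfrefs:
    print(f"{type_name}.{attr} refers to other {type_name} objects")
```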

Labels
area/distributed Distributed monitoring (master, satellites, clients) area/runtime Downtimes, comments, dependencies, events bug Something isn't working cla/signed consider backporting Should be considered for inclusion in a bugfix release ref/IP
6 participants