
Change or disable global status errors right after a director deployment #5259

Open
slalomsk8er opened this issue Sep 17, 2024 · 9 comments

@slalomsk8er

Is your feature request related to a problem? Please describe.

Every director deployment results in messages like this:
[Screenshot: global status error message shown in Icinga DB Web]
These messages make Icinga look bad in the eyes of the users.

Describe the solution you'd like

Make the check aware of whether the Director has just deployed and, based on that, increase the timeout or change the message text and color.
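
A rough sketch of the idea, assuming the time of the last Director deployment were available to the status check; the grace period, messages, and function names below are made up:

```go
package main

import (
	"fmt"
	"time"
)

const (
	staleAfter  = 60 * time.Second // current heartbeat threshold
	gracePeriod = 5 * time.Minute  // hypothetical grace period after a deployment
)

// statusMessage softens the error if a Director deployment happened recently.
// lastDeployment would have to come from the Director; it is assumed here.
func statusMessage(heartbeatAge time.Duration, lastDeployment time.Time) string {
	if heartbeatAge <= staleAfter {
		return "OK"
	}
	if time.Since(lastDeployment) < gracePeriod {
		return "Icinga DB is catching up after a Director deployment"
	}
	return "Icinga DB seems not to be running"
}

func main() {
	fmt.Println(statusMessage(90*time.Second, time.Now().Add(-1*time.Minute)))  // right after deploying
	fmt.Println(statusMessage(90*time.Second, time.Now().Add(-30*time.Minute))) // unrelated outage
}
```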

Describe alternatives you've considered

Globally increase the timeout.

Additional context

Add any other context or screenshots about the feature request here.

@nilmerg
Member

nilmerg commented Sep 17, 2024

Hey, how exactly is the deployment performed?

@nilmerg nilmerg added the needs-feedback We'll only proceed once we hear from you again label Sep 17, 2024
@slalomsk8er
Author

We deploy manually ATM.

@nilmerg
Member

nilmerg commented Sep 17, 2024

And what does manually mean? Exactly? 😉

@slalomsk8er
Author

slalomsk8er commented Sep 17, 2024

Clicking one of the "Ausrollen" (deploy) links that are distributed all over the Director. 😉

@slalomsk8er slalomsk8er changed the title Change or disable global status errors right after a director deploymen Change or disable global status errors right after a director deployment Sep 17, 2024
@nilmerg nilmerg removed the needs-feedback We'll only proceed once we hear from you again label Sep 17, 2024
@nilmerg
Member

nilmerg commented Sep 23, 2024

I cannot reproduce this (with a sleep(120) in my director config).

Any idea what could cause this? @yhabteab

@yhabteab
Member

I cannot reproduce this (with a sleep(120) in my director config).

As we discussed last time, a Director deployment should never prevent Icinga DB from updating the icingadb_instance table. Looking at the Icinga DB Web code, though, I see two reasons why this might happen:

  • There are no entries at all in the icingadb_instance table, which is unlikely in this specific situation.
  • The instance heartbeat in the database is older than now() - 60. However, if an Icinga Director deployment interfered with this in any way, Icinga DB Web would actually render "Redis is outdated. Make sure Icinga 2 is running and connected to Redis." instead. Still, this could be caused by the same reasons as in Competing HA takeover results in both instances becoming active (icingadb#787), since @slalomsk8er is affected by that issue.
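
For illustration, here is a minimal sketch of that heartbeat check in Go (the actual check lives in Icinga DB Web's PHP code; the helper and its messages are hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// checkHeartbeat mirrors the two cases above: no icingadb_instance row at all,
// or a heartbeat older than now() - 60. Illustrative only, not the actual
// Icinga DB Web check.
func checkHeartbeat(heartbeat *time.Time) string {
	if heartbeat == nil {
		return "no row in icingadb_instance -> backend looks down"
	}
	if time.Since(*heartbeat) > 60*time.Second {
		return "heartbeat older than 60s -> error is rendered"
	}
	return "heartbeat is fresh -> no error"
}

func main() {
	fresh := time.Now().Add(-10 * time.Second)
	stale := time.Now().Add(-2 * time.Minute)

	fmt.Println(checkHeartbeat(nil))
	fmt.Println(checkHeartbeat(&fresh))
	fmt.Println(checkHeartbeat(&stale))
}
```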

@nilmerg
Member

nilmerg commented Sep 23, 2024

If there are multiple rows in icingadb_instance, Icinga DB Web makes sure that the newest (heartbeat desc) is evaluated. So it shouldn't be affected by this 🤔
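
Roughly that selection, sketched in Go against a MySQL backend and assuming heartbeat is stored as a millisecond unix timestamp; the connection details are placeholders:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; adjust for your environment.
	db, err := sql.Open("mysql", "icingadb:secret@tcp(127.0.0.1:3306)/icingadb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pick the newest heartbeat when several rows exist.
	var heartbeatMs int64
	err = db.QueryRow(
		"SELECT heartbeat FROM icingadb_instance ORDER BY heartbeat DESC LIMIT 1",
	).Scan(&heartbeatMs)
	if err == sql.ErrNoRows {
		fmt.Println("icingadb_instance is empty")
		return
	} else if err != nil {
		log.Fatal(err)
	}

	fmt.Println("newest heartbeat:", time.UnixMilli(heartbeatMs))
}
```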

@yhabteab
Member

If there are multiple rows in icingadb_instance

That is not the problem in the referenced issue! The problem is that the active Icinga DB instance writes an outdated heartbeat to the icingadb_instance table while remaining HA responsible. The passive instance reads this outdated heartbeat, just like Icinga DB Web does, thinks that the other instance is gone, and takes over HA responsibility, resulting in both instances becoming responsible.
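
A toy timeline of that race, with a simplified decision rule and made-up values (not actual Icinga DB logic):

```go
package main

import (
	"fmt"
	"time"
)

const staleAfter = 60 * time.Second

// shouldTakeOver is a simplified version of the passive instance's decision:
// take over if the persisted heartbeat looks older than the staleness limit.
func shouldTakeOver(observedHeartbeat, now time.Time) bool {
	return now.Sub(observedHeartbeat) > staleAfter
}

func main() {
	now := time.Now()

	// The active instance is still HA responsible, but the heartbeat it
	// persisted is already two minutes old (the situation from icingadb#787).
	persistedHeartbeat := now.Add(-2 * time.Minute)

	if shouldTakeOver(persistedHeartbeat, now) {
		fmt.Println("passive instance: heartbeat looks stale -> takes over")
		fmt.Println("active instance: still responsible -> both are now active")
	}
}
```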

@nilmerg
Member

nilmerg commented Sep 25, 2024

thinks that the other instance is gone

and doesn't insert a row in icingadb_instance because of this?

If so, it may be the same reason. But why is this related to a Director deployment? (@slalomsk8er wrote above that the message (competition) is caused by this every time.)
