[NEW] sentinel shouldn't pick slaves that are not synced with master #13533 #1012

Open
ichernev opened this issue Sep 10, 2024 · 0 comments

The problem/use-case that the feature addresses

Currently the sentinel picks slaves that are connected to the master, even though they might not be synced up properly, either because syncing takes time or because some misconfiguration (like client-output-buffer-limit slave) prevents the sync.

Description of the feature

The sentinel should keep track of the last time a slave was fully in sync with the master (not just connected, as is the case now), and when switching it should ignore and/or deprioritize slaves that are not up to date.

Ideally it should have an ignore threshold (say, slaves that were last in sync more than 10 minutes ago are not considered at all), and among all slaves that were in sync within the last 10 minutes, it should choose the one that was in sync most recently. In the ideal case there would be multiple slaves that are currently in sync, so it can pick from those according to the existing criteria.
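
A minimal sketch of the selection rule described above, in Python rather than the sentinel's own code. The `Replica` type, its fields (including `last_in_sync` and `in_sync_now`) and `pick_replica` are hypothetical names made up for illustration. The tie-break used for slaves that are currently in sync (priority, then replication offset, then run ID) is my assumption of what "existing criteria" refers to.

```python
from dataclasses import dataclass
from typing import List, Optional

IGNORE_THRESHOLD_SECONDS = 600  # the "10 mins" from the proposal


@dataclass
class Replica:
    # All fields are hypothetical stand-ins for state the sentinel would track.
    name: str
    priority: int                  # lower is preferred, 0 means "never promote"
    repl_offset: int               # replication offset processed so far
    run_id: str
    last_in_sync: Optional[float]  # timestamp in seconds, None if never in sync
    in_sync_now: bool = False


def pick_replica(replicas: List[Replica], now: float,
                 threshold: float = IGNORE_THRESHOLD_SECONDS) -> Optional[Replica]:
    """Return the slave to promote, or None if no slave is fresh enough."""
    # Step 1: ignore slaves that were last in sync too long ago (or never).
    eligible = [
        r for r in replicas
        if r.priority != 0
        and r.last_in_sync is not None
        and now - r.last_in_sync <= threshold
    ]
    if not eligible:
        return None

    # Step 2: if several slaves are in sync right now, fall back to the
    # existing-style criteria (assumed order: priority, offset, run ID).
    current = [r for r in eligible if r.in_sync_now]
    if current:
        return min(current, key=lambda r: (r.priority, -r.repl_offset, r.run_id))

    # Step 3: otherwise prefer the slave that was in sync most recently.
    return max(eligible, key=lambda r: r.last_in_sync)
```

With this rule, a slave stuck in a failing full sync ages out of the candidate set once it passes the threshold, instead of staying selectable just because its link is up.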

Alternatives you've considered

We've implemented an external system that monitors the last-time-in-sync and adjusts slave priorities, so the sentinel does take that into account via the priority parameter.
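
For illustration only, here is a rough sketch of what such an external monitor could compute. It is not the system we actually run. It assumes the monitor polls `INFO replication` on each slave and gets the result as a dict, treats `master_link_status` being `up` with `master_sync_in_progress` at `0` as an approximation of "in sync", and then pushes the resulting value to the slave's priority setting. The function names and the "add the age in seconds" scheme are hypothetical.

```python
from typing import Mapping, Optional

STALE_AFTER_SECONDS = 600  # assumed threshold, matching the 10 minutes above
NORMAL_PRIORITY = 100      # default slave priority
NEVER_PROMOTE = 0          # priority 0 tells the sentinel never to promote


def is_in_sync(info: Mapping[str, object]) -> bool:
    """Approximate "in sync" from the fields of INFO replication on a slave."""
    link_up = info.get("master_link_status") == "up"
    # Compare as a string so both raw ("0") and parsed (0) values work.
    not_syncing = str(info.get("master_sync_in_progress")) == "0"
    return link_up and not_syncing


def update_last_in_sync(info: Mapping[str, object], now: float,
                        previous: Optional[float]) -> Optional[float]:
    """Advance the last-in-sync timestamp only while the slave looks in sync."""
    return now if is_in_sync(info) else previous


def desired_priority(last_in_sync: Optional[float], now: float,
                     stale_after: float = STALE_AFTER_SECONDS) -> int:
    """Map time since last sync to a priority (lower means preferred)."""
    if last_in_sync is None or now - last_in_sync > stale_after:
        return NEVER_PROMOTE
    # Hypothetical scheme: each second out of sync makes the slave less preferred.
    return NORMAL_PRIORITY + int(now - last_in_sync)
```

The monitor would call `update_last_in_sync` on every poll and write `desired_priority(...)` back to the slave, which is exactly the kind of bookkeeping that would be simpler inside the sentinel.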

However, this feels like something the sentinel itself should handle. It already has logic about the master-slave connection, but just being connected is not nearly enough. That's why my suggestion is to enhance this particular check from "is master-slave connected" into "is master-slave in sync (and if not, when was it last in sync; more recent is better)".

Additional information

To reproduce what I'd consider a failure: start a master, a slave and a sentinel, set a small client-output-buffer-limit, and make the master's dataset bigger. Trigger a switch and make sure the old master gets a few writes after the switch (this can happen in certain scenarios), so that it can't do a partial resync with the new master. The new slave (the old master) then can't complete a full sync either, because of the buffer limit. At this point the sentinel will happily switch back to the stuck slave if it detects odown.
