The problem/use-case that the feature addresses
Currently the sentinel picks slaves that are connected to the master, even though they might not be synced up properly, either because syncing takes time or because some misconfiguration (like client-output-buffer-limit slave) prevents sync.
Description of the feature
The sentinel should keep track of the last time a slave was fully in sync with the master (not just connected, as is the case now), and when switching it should ignore and/or deprioritize slaves that are not up to date.
Ideally it should have an ignore threshold (say, slaves that were last in sync more than 10 minutes ago are not considered at all), and among all slaves that were in sync within the last 10 minutes, it should choose the one that was in sync most recently. In the ideal case there would be multiple slaves that are currently in sync, so it can pick from those according to the existing criteria.
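To make the proposed selection rule concrete, here is a minimal Python sketch. It is not sentinel code: the `ReplicaState` type, the `pick_candidate` function, and the use of a plain priority number as a stand-in for the "existing criteria" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReplicaState:
    """Hypothetical view of one slave as tracked by the sentinel."""
    name: str
    last_in_sync: Optional[float]  # Unix time it was last fully in sync, None if never
    in_sync_now: bool
    priority: int = 100            # stand-in for the existing criteria (lower wins)


def pick_candidate(
    replicas: Sequence[ReplicaState],
    now: float,
    ignore_after_seconds: float = 600.0,  # the "10 mins" threshold from the text
) -> Optional[ReplicaState]:
    # Step 1: drop slaves that were never in sync or were last in sync too long ago.
    eligible = [
        r for r in replicas
        if r.in_sync_now
        or (r.last_in_sync is not None
            and now - r.last_in_sync <= ignore_after_seconds)
    ]
    if not eligible:
        return None

    # Step 2: if some slaves are in sync right now, choose among them
    # using the existing criteria (approximated here by priority, then name).
    current = [r for r in eligible if r.in_sync_now]
    if current:
        return min(current, key=lambda r: (r.priority, r.name))

    # Step 3: otherwise prefer the slave that was in sync most recently.
    return max(eligible, key=lambda r: r.last_in_sync)
```

With this shape, a slave that is merely connected but has not completed a sync within the threshold is never returned, which is the behavior the request asks for.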
Alternatives you've considered
We've implemented an external system that monitors the last-time-in-sync and adjusts slave priorities, so the sentinel does take that into account via the priority parameter.
However, this feels like something the sentinel itself should handle. It already has logic about the master-slave connection, but just being connected is not nearly enough. That's why my suggestion is to enhance this particular check from "is master-slave connected" into "is master-slave in sync (and if not, when was it last in sync; more recent is better)".
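For illustration only, the external workaround could be approximated along these lines. This is a sketch, not our actual system: `is_in_sync` and `priority_for` are hypothetical helpers, the mapping from staleness to priority is an arbitrary choice, and treating `master_link_status` plus `master_sync_in_progress` (fields from the slave's INFO replication output) as "in sync" is only an approximation.

```python
from typing import Mapping, Optional


def is_in_sync(info: Mapping[str, object]) -> bool:
    """Approximate 'fully in sync' from a slave's parsed INFO replication fields.

    Assumption: link up and no sync in progress is treated as in sync.
    A missing field is treated as 'not in sync' to stay on the safe side.
    """
    link_up = info.get("master_link_status") == "up"
    syncing = int(info.get("master_sync_in_progress", 1)) != 0
    return link_up and not syncing


def priority_for(
    age_seconds: Optional[float],
    base: int = 100,
    ignore_after_seconds: float = 600.0,
) -> int:
    """Map 'seconds since last in sync' to a slave priority value.

    Lower values are preferred by the sentinel, and 0 means the slave
    is never promoted, so stale or never-synced slaves get 0.
    """
    if age_seconds is None or age_seconds > ignore_after_seconds:
        return 0
    # The longer since the last sync, the larger (less preferred) the priority.
    return base + int(age_seconds)
```

The monitor would then push the computed value to each slave's priority setting, which is the part that arguably belongs inside the sentinel instead.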
Additional information
To reproduce what I'd consider a failure: start a master, a slave, and a sentinel; set a small client-output-buffer-limit; and make the master's dataset bigger. Then trigger a switch and make sure the old master gets a few writes after the switch (this can happen in certain scenarios). The old master will then be unable to do a partial resync with the new master, and as the new slave it will also be unable to complete a full sync because of the buffer limit. If the sentinel now detects odown on the new master, it will happily switch back to the stuck slave.