[NEW] sentinel shouldn't pick slaves that are not synced with master #13533 #1012

ichernev · 2024-09-10T11:54:29Z

The problem/use-case that the feature addresses

Currently the sentinel can pick slaves that are connected to the master but not properly synced with it, either because syncing takes time or because some misconfiguration (such as the client-output-buffer-limit slave setting) prevents the sync from completing.

Description of the feature

The sentinel should keep track of the last time each slave was fully in sync with the master (not just connected, as is the case now). When switching, it should ignore and/or deprioritize slaves that are not up to date.

Ideally it should have an ignore threshold (say, slaves that were last in sync more than 10 minutes ago are not considered at all), and among the slaves that were in sync within the last 10 minutes it should choose the one that was in sync most recently. In the ideal case there would be multiple slaves that are currently in sync, so it can pick among those according to the existing criteria.
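A minimal sketch of the proposed selection rule, written in Python purely for illustration. This is not Sentinel code: the `Replica` type, the `rank_candidates` function, and the `ignore_after` parameter are hypothetical names, and the 600-second default just mirrors the 10-minute example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    # Timestamp (seconds) when this slave was last fully in sync with the
    # master; None if it has never been observed in sync. Hypothetical field.
    last_in_sync: Optional[float]


def rank_candidates(
    replicas: List[Replica], now: float, ignore_after: float = 600.0
) -> List[Replica]:
    """Drop slaves outside the ignore threshold, then order the rest
    so that the most recently in-sync slave comes first."""
    eligible = [
        r
        for r in replicas
        if r.last_in_sync is not None and now - r.last_in_sync <= ignore_after
    ]
    return sorted(eligible, key=lambda r: r.last_in_sync, reverse=True)


if __name__ == "__main__":
    now = 10_000.0
    replicas = [
        Replica("a", now - 900),  # last in sync 15 minutes ago: ignored
        Replica("b", now - 30),
        Replica("c", now),        # currently in sync
        Replica("d", None),       # never in sync: ignored
    ]
    print([r.name for r in rank_candidates(replicas, now)])  # ['c', 'b']
```

Slaves that are all currently in sync would tie on `last_in_sync`, and the sketch leaves that tie to be broken by whatever criteria the sentinel already applies.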

Alternatives you've considered

We've implemented an external system that monitors each slave's last-time-in-sync and adjusts slave priorities, so the sentinel takes that into account via the priority parameter.
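For illustration only, here is a rough sketch of what the decision logic of such an external monitor could look like. It is not our actual system. It assumes that "in sync" can be approximated from the `role`, `master_link_status` and `master_sync_in_progress` fields of `INFO replication` output, and the function names, the normal priority of 100 and the 600-second threshold are all hypothetical. The resulting value would then be applied to the slave's priority setting (`replica-priority`, or `slave-priority` in older versions), where 0 means the sentinel never promotes that slave.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse the key:value lines of INFO output, skipping section headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def is_in_sync(fields: Dict[str, str]) -> bool:
    # Assumption: link up and no sync in progress is treated as "in sync".
    return (
        fields.get("role") == "slave"
        and fields.get("master_link_status") == "up"
        and fields.get("master_sync_in_progress") == "0"
    )


def desired_priority(
    last_in_sync: Optional[float],
    now: float,
    normal: int = 100,
    ignore_after: float = 600.0,
) -> int:
    """Return 0 (never promote) for slaves that are stale or never synced,
    otherwise the normal priority."""
    if last_in_sync is None or now - last_in_sync > ignore_after:
        return 0
    return normal
```

A monitor built this way would poll each slave, record the time whenever `is_in_sync` returns true, and push `desired_priority` to the slave when the value changes.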

However, this feels like it should be handled by the sentinel itself. It already has logic around the master-slave connection, but just being connected is not nearly enough. That's why my suggestion is to enhance this particular check from "is master-slave connected" into "is master-slave in sync (and if not, when was it last in sync; more recent is better)".

Additional information

To reproduce what I'd consider a failure: start a master, a slave and a sentinel, set a small client-output-buffer-limit, grow the master's dataset, trigger a switch, and make sure the old master receives a few writes after the switch (this can happen in certain scenarios). The old master will then be unable to do a partial resync with the new master, and as the new slave it won't be able to complete a full sync either, because of the buffer limit. The sentinel will nevertheless be happy to switch back to the stuck slave if it detects odown.
