-
Notifications
You must be signed in to change notification settings - Fork 955
Cluster: Fix sub-replica and prevent empty primary when replica is blocked at replication and has inbound/outbound link race #2811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
zhijun42
wants to merge
13
commits into
valkey-io:unstable
Choose a base branch
from
zhijun42:race-empty-primary-with-test
base: unstable
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+361
−259
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Signed-off-by: Zhijun <dszhijun@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2811 +/- ##
============================================
- Coverage 72.45% 72.40% -0.05%
============================================
Files 128 128
Lines 70485 70497 +12
============================================
- Hits 51068 51042 -26
- Misses 19417 19455 +38
🚀 New features to boost your workflow:
|
61bbeab to
6133920
Compare
Signed-off-by: Zhijun <dszhijun@gmail.com>
6133920 to
e3a98ce
Compare
Contributor
Author
|
@PingXie @enjoy-binbin @hpatro Friendly ping 👀 Could you take a look at this when you got some time? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR handles a network race condition that would cause a replica to read stale cluster state and then incorrectly promote itself to an empty primary within an existing shard.
Issues
When multiple replicas attempt to synchronize from the same primary concurrently, the first replica to issue PSYNC triggers an RDB child fork (
RDB_CHILD_TYPE_SOCKET) for replication.Any replicas connecting shortly after the fork miss the join window — their PING commands are accepted but not replied to until the primary server's fork completes. As a result, those replicas remain blocked in
REPL_STATE_RECEIVE_PING_REPLYon their main thread, unable to process any cluster/client traffic, effectively dead to the outside world.If a failover occurs during this period (e.g., the primary goes down and another replica is elected leader), the blocked replica will:
server.repl_syncio_timeoutis 5s by default) and resume its main thread, reconnecting with other nodesThis results in two primaries in the same shard, with one being empty (0 slots) but still considered authoritative by all cluster servers.
For a concrete example, read the test case
test_blocked_replica_stale_state_raceand its comment in filetests/unit/cluster/replica-migration.tcl.Overall there're two issues:
This PR handles the second issue. Notice that this issue is flaky - If event [3] happens before [2] (inbound and outbound links are independent, so we can't guarantee the ordering), the replica won't become sub-replica and we won't have the empty primary issue.
I added guardrail logic in case such race condition does happen, and I also wrote a new test case to test my code, but since this issue can't be reliably reproduced (runs fine on my machine, but not on CI), this test case is disabled for now.
Fix
There could be different approaches to solving the problem. The way I do it is:
File
tests/unit/cluster/replica-migration.tclis mostly meant to test the logic in functionclusterUpdateSlotsConfigWith, so I add my new test case here. And since the test file has some duplicates, I extract them into helper functions to simplify the code.