Introduce quorum-based chunk deduplication strategy #68

Merged: 2 commits into db_main from quorum-chunk-dedup on Aug 7, 2024

Collaborator

@hczhu-db hczhu-db commented Aug 7, 2024

This chunk deduplication strategy is specific to Databricks' setup, where each time series is written to at least 2 out of 3 replicas. Each chunk should have 3 replicas in most cases and 2 in the worst acceptable case. Quorum-based deduplication picks the majority value among the 3 replicas. If a chunk has only 2 identical replicas, a third replica may exist with corrupt data. In that case we want to pass both identical replicas on to the later quorum-based deduplication step so that they outvote any corrupt third replica.
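
To illustrate the voting idea described above, here is a minimal sketch in Go. It is not the code from this PR: `quorumPick`, its signature, and the use of SHA-256 to compare payloads are assumptions made purely for illustration. It shows why both identical replicas need to reach the quorum step: two matching copies outvote a corrupt third, whereas a single surviving copy would only tie with it.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// quorumPick is a hypothetical helper (not part of Thanos). It returns the
// chunk payload that appears on at least `quorum` replicas, or false if no
// payload reaches the quorum.
func quorumPick(replicas [][]byte, quorum int) ([]byte, bool) {
	// Count identical payloads by content hash.
	counts := make(map[[sha256.Size]byte]int, len(replicas))
	for _, data := range replicas {
		sum := sha256.Sum256(data)
		counts[sum]++
		if counts[sum] >= quorum {
			return data, true
		}
	}
	return nil, false
}

func main() {
	good := []byte("chunk-v1")
	corrupt := []byte("chunk-??")

	// Two identical replicas outvote a corrupt third replica.
	got, ok := quorumPick([][]byte{good, corrupt, good}, 2)
	fmt.Printf("%q %v\n", got, ok)

	// If the two identical replicas had been collapsed into one earlier,
	// the vote would be a 1-1 tie and no value would reach the quorum.
	got, ok = quorumPick([][]byte{good, corrupt}, 2)
	fmt.Printf("%q %v\n", got, ok)
}
```

With a quorum of 2 out of 3, the first call finds a winner and the second does not, which is the case the description aims to avoid by forwarding both identical replicas.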

Unit test

$ go test -timeout 30s -run ^TestDedupRespHeap_QuorumChunkDedup$ github.com/thanos-io/thanos/pkg/store

ok  	github.com/thanos-io/thanos/pkg/store	2.364s

Integration test with the new feature

@hczhu-db hczhu-db requested a review from jnyi August 7, 2024 05:04
@hczhu-db hczhu-db changed the title Keep duplicate chunks if there is 3 identical ones Introduce quorum-based chunk deduplication strategy Aug 7, 2024
@hczhu-db hczhu-db requested review from a team, christopherzli and yuchen-db and removed request for a team August 7, 2024 15:20
Collaborator

@jnyi jnyi left a comment

great job HC, awesome debugging!

Collaborator Author

@hczhu-db hczhu-db left a comment

Addressed the comments

@hczhu-db hczhu-db merged commit d3b24cb into db_main Aug 7, 2024
11 of 12 checks passed
@jnyi jnyi deleted the quorum-chunk-dedup branch November 1, 2024 23:12
2 participants