[Feature Request] Make segment replication the default for 3.0 #17162

Open
andrross opened this issue Jan 28, 2025 · 3 comments
Labels: enhancement (Enhancement or improvement to existing feature or request), Indexing:Replication (Issues and PRs related to core replication framework, e.g. segrep), untriaged

Comments

@andrross
Member

Is your feature request related to a problem? Please describe

Segment replication was introduced in the 2.x line as an alternative to document replication. It can offer substantial performance benefits by reducing the amount of CPU-intensive indexing work that has to happen on the cluster.

Describe the solution you'd like

The default replication type for new indexes in 3.0+ versions of OpenSearch should be SEGMENT instead of DOCUMENT.
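
For context, here is a minimal sketch of how an index opts into segment replication explicitly today, assuming the `index.replication.type` index setting from the 2.x line. The helper name `create_index_body` and the replica count are illustrative and not part of this proposal. The resulting dictionary would be sent as the body of an index-creation request.

```python
import json

# Replication types named in this issue.
VALID_REPLICATION_TYPES = {"DOCUMENT", "SEGMENT"}


def create_index_body(replication_type="DOCUMENT", replicas=1):
    """Build an index-creation body that sets the replication type explicitly.

    Hypothetical helper for illustration. DOCUMENT is the fallback here
    because it is the current default that this issue proposes to change.
    """
    if replication_type not in VALID_REPLICATION_TYPES:
        raise ValueError(f"unknown replication type: {replication_type}")
    return {
        "settings": {
            "index": {
                "number_of_replicas": replicas,
                "replication.type": replication_type,
            }
        }
    }


# Body for an index that uses segment replication regardless of the default.
body = create_index_body("SEGMENT")
print(json.dumps(body, indent=2))
```

If the default changed as proposed, an index that needs the old behavior would presumably pass `"DOCUMENT"` in the same way.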

Related component

Indexing:Replication

Describe alternatives you've considered

Do nothing; keep document replication as the default.

Additional context

Segment replication is the only option when using remote store and is being used by remote store-based offerings of Amazon OpenSearch Service. However, the node-to-node variant of segment replication likely has less real-world production usage. Some things to consider before making this change:

  1. Segment copying from the primary can become a bottleneck with large numbers of replicas configured
  2. We'd need a thorough audit to make sure all integration tests are running with segment replication enabled
  3. The semantics of refresh=wait_for are weird today. With document replication, it means your write will be searchable once the index call completes. With segment replication, it means the write will be searchable on the primary shard only, not on replicas.
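
The third point can be illustrated with a toy model. This is a sketch under simplifying assumptions, not OpenSearch code: `Shard`, `write_docrep`, `write_segrep`, and `copy_segments` are hypothetical names, and segment copying is reduced to copying a set of document IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard copy: documents become searchable only after a refresh."""
    buffered: list = field(default_factory=list)
    searchable: set = field(default_factory=set)

    def index(self, doc_id):
        self.buffered.append(doc_id)

    def refresh(self):
        self.searchable.update(self.buffered)
        self.buffered.clear()


def write_docrep(primary, replicas, doc_id):
    """Document replication with a waited refresh: every copy indexes the
    document and refreshes before the call returns."""
    for shard in [primary, *replicas]:
        shard.index(doc_id)
        shard.refresh()


def write_segrep(primary, replicas, doc_id):
    """Segment replication with a waited refresh: only the primary indexes
    and refreshes before the call returns."""
    primary.index(doc_id)
    primary.refresh()


def copy_segments(primary, replicas):
    """Later, asynchronous segment copy, modeled as copying the searchable set."""
    for replica in replicas:
        replica.searchable = set(primary.searchable)


p, r = Shard(), Shard()
write_docrep(p, [r], "a")
print("docrep:", "a" in p.searchable, "a" in r.searchable)

p, r = Shard(), Shard()
write_segrep(p, [r], "a")
print("segrep before copy:", "a" in p.searchable, "a" in r.searchable)
copy_segments(p, [r])
print("segrep after copy:", "a" in r.searchable)
```

In this model the replica only sees the write after `copy_segments` runs, which is the gap a caller cannot currently wait on.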

@andrross added the enhancement and untriaged labels on Jan 28, 2025
@github-actions bot added the Indexing:Replication label on Jan 28, 2025
@Bukhtawar
Collaborator

Thanks @andrross

> Segment copying from the primary can become a bottleneck with large numbers of replicas configured

The default number of replicas is one, so the default configuration should work seamlessly.
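
A back-of-the-envelope sketch of why the replica count matters here. It assumes that, with node-to-node segment replication, the primary itself sends each new segment to every replica, so its outbound copy traffic grows linearly with the number of replicas. The function name and the 512 MiB figure are illustrative only.

```python
def primary_copy_bytes(new_segment_bytes, replicas):
    """Bytes the primary sends out to replicate one batch of new segments.

    Simplified model: one full copy of the new segments per replica.
    """
    if new_segment_bytes < 0 or replicas < 0:
        raise ValueError("sizes and replica counts must be non-negative")
    return new_segment_bytes * replicas


MIB = 1024 * 1024
for replicas in (1, 5, 20):
    sent = primary_copy_bytes(512 * MIB, replicas)
    print(f"{replicas} replica(s): {sent // MIB} MiB sent by the primary")
```

With the default of one replica the primary sends a single copy, which is why the default configuration is unlikely to hit this bottleneck.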

> We'd need a thorough audit to make sure all integration tests are running with segment replication enabled

+1

> The semantics of refresh=wait_for are weird today. With document replication, it means your write will be searchable once the index call completes. With segment replication, it means the write will be searchable on the primary shard only, not on replicas.

refresh=wait_for is an explicit option that users opt into today. Would it be fine to document that its semantics change in 3.0, so that waiting for a refresh (read-after-write consistency) applies only to the primary?

@andrross
Member Author

> refresh=wait_for is an explicit option that users opt into today. Would it be fine to document that its semantics change in 3.0, so that waiting for a refresh (read-after-write consistency) applies only to the primary?

@Bukhtawar We definitely can change the semantics of this option. The question is really whether this is good enough. There are likely use cases where it's necessary to know that all primaries and replicas are up to date with respect to a specific write, and today we don't really have any mechanism to do that with segment replication.

@sachinpkale
Member

@andrross @mch2 @Bukhtawar A bit tangential to this discussion, but are we also thinking of removing mixed-mode (SegRep + DocRep) clusters in 3.0?

3 participants