Allow users to override the user-agent #4015

Open
gregschohn opened this issue Jan 24, 2024 · 2 comments
Labels
enhancement New feature or request


@gregschohn

Is your feature request related to a problem? Please describe.

We need to differentiate requests made by the Data Prepper instance that our solution uses from requests made by the rest of a cluster's clients.

To migrate data, our solution does a bulk move of data from a source cluster to a target cluster. Independently, individual requests to the source cluster are recorded and replayed against the target, both to keep the target cluster in sync and to compare the behavior of the two clusters.

When we capture traffic, depending on the order in which a customer chooses to perform each step, the capture may overlap with Data Prepper's requests to the source. We'd like to be able to mask those requests out of our replay. At the very least, they create noisier data for users, and they could cause confusion, since users would see updates replayed on existing data that was already migrated with Data Prepper. Allowing the customer (or us) to set a unique value that we can easily filter on at capture time would eliminate this problem and be more efficient (much lower costs).
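To make the capture-side filtering concrete, here is a minimal Java sketch. It assumes a hypothetical `CapturedRequest` type and a hypothetical marker value `data-prepper-migration`; it is not code from Data Prepper or from our replayer. It drops captured requests whose User-Agent equals the marker, comparing header names case-insensitively because HTTP header names are case-insensitive.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReplayFilter {
    // Hypothetical marker value that the migration tooling would configure in Data Prepper.
    static final String MIGRATION_USER_AGENT = "data-prepper-migration";

    // Hypothetical stand-in for a captured request; header names keep whatever case was captured.
    record CapturedRequest(String method, String path, Map<String, String> headers) {}

    // HTTP header names are case-insensitive, so match "User-Agent" in any letter case.
    static String userAgent(CapturedRequest request) {
        for (Map.Entry<String, String> header : request.headers().entrySet()) {
            if (header.getKey().equalsIgnoreCase("User-Agent")) {
                return header.getValue();
            }
        }
        return null; // no User-Agent header was captured
    }

    // Keep only the requests that did not come from the marked Data Prepper instance.
    static List<CapturedRequest> excludeMigrationTraffic(List<CapturedRequest> captured) {
        return captured.stream()
                .filter(r -> !MIGRATION_USER_AGENT.equals(userAgent(r)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CapturedRequest> captured = List.of(
                new CapturedRequest("POST", "/_bulk", Map.of("user-agent", MIGRATION_USER_AGENT)),
                new CapturedRequest("PUT", "/logs/_doc/1", Map.of("User-Agent", "some-other-client/1.0")));
        System.out.println(excludeMigrationTraffic(captured).size()); // prints 1
    }
}
```

Exact matching keeps the filter cheap; a prefix match would also work if the marker ever needed a version suffix.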

Describe the solution you'd like
I'd like to have a command-line flag to set the User-Agent HTTP header for all requests that Data Prepper sends. A default value that differs from the ES/OS client user-agent may be beneficial too.
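A minimal sketch of what "one User-Agent on every request" could look like, using the JDK's `java.net.http` client purely for illustration (Data Prepper's OpenSearch plugins may build their HTTP clients differently). The class name `UserAgentRequests` and the default value `data-prepper` are hypothetical. Routing every request through a single builder method is what ensures the header is never forgotten.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class UserAgentRequests {
    // Hypothetical default, chosen to differ from the stock ES/OS client user-agents.
    static final String DEFAULT_USER_AGENT = "data-prepper";

    private final String userAgent;

    UserAgentRequests(String configuredUserAgent) {
        // Fall back to the default when no override is supplied.
        this.userAgent = (configuredUserAgent == null || configuredUserAgent.isBlank())
                ? DEFAULT_USER_AGENT
                : configuredUserAgent;
    }

    // Every request is created here, so every request carries the same User-Agent header.
    HttpRequest.Builder newRequest(URI uri) {
        return HttpRequest.newBuilder(uri).header("User-Agent", userAgent);
    }

    public static void main(String[] args) {
        // The first command-line argument, if any, stands in for the proposed override flag.
        UserAgentRequests requests = new UserAgentRequests(args.length > 0 ? args[0] : null);
        HttpRequest request = requests.newRequest(URI.create("http://localhost:9200/_search")).GET().build();
        System.out.println(request.headers().firstValue("User-Agent").orElse("<none>"));
    }
}
```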

Describe alternatives you've considered (Optional)
Other HTTP header values could work too, but user-agent seems like the most natural one and the easiest to explain. For our broader solution, handling the duplicate data better is possible, but 1) it would take considerable effort to mitigate, and 2) it would still be expensive, since we aren't able to remove the data passively.

Additional context
N/A

@dlvenable
Member

@gregschohn, thanks for this request. Do you want this to be configurable for both the opensearch sink and the opensearch source?

Do you have a proposal on how the user would configure this?

dlvenable added the enhancement (New feature or request) label and removed the untriaged label on Jan 30, 2024
@gregschohn
Author

A command-line argument or a setting in the pipeline file would work (an environment variable would too, but that seems like it wouldn't be the best experience for users in general). We'll want the same user-agent for all requests, so a static value loaded once is fine.
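One way these options could fit together, as a hedged sketch: the flag `--user-agent=`, the pipeline key `user_agent`, and the environment variable `DATA_PREPPER_USER_AGENT` are hypothetical names chosen for illustration only, and the precedence order (command line, then pipeline file, then environment, then default) is an assumption. The value is resolved once at startup, matching the "static value loaded once" requirement.

```java
import java.util.Map;
import java.util.Optional;

public final class UserAgentConfig {
    // Hypothetical names, for illustration only; not documented Data Prepper options.
    static final String CLI_FLAG = "--user-agent=";
    static final String PIPELINE_KEY = "user_agent";
    static final String ENV_VAR = "DATA_PREPPER_USER_AGENT";
    static final String DEFAULT_USER_AGENT = "data-prepper";

    private UserAgentConfig() {}

    // Assumed precedence: CLI flag, then pipeline setting, then environment variable, then default.
    static String resolve(String[] args, Map<String, Object> pipelineSettings, Map<String, String> env) {
        for (String arg : args) {
            if (arg.startsWith(CLI_FLAG) && arg.length() > CLI_FLAG.length()) {
                return arg.substring(CLI_FLAG.length());
            }
        }
        Object fromPipeline = pipelineSettings.get(PIPELINE_KEY);
        if (fromPipeline instanceof String s && !s.isBlank()) {
            return s;
        }
        return Optional.ofNullable(env.get(ENV_VAR))
                .filter(v -> !v.isBlank())
                .orElse(DEFAULT_USER_AGENT);
    }

    public static void main(String[] args) {
        // Resolved once at startup; every client would then share this read-only value.
        final String userAgent = resolve(args, Map.of(), System.getenv());
        System.out.println(userAgent);
    }
}
```

Because the result is a plain immutable string computed once, the sink and source could share it or each receive their own, which leaves the one-setting-or-two question open.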

Our needs at this time are only for the source, so you can use one user-agent configuration for both or separate ones. We don't have an opinion on that detail.
