OpenSearch Bulk API Source #248
Comments
I would like to work on this issue. Could you please assign this to me?
For the first milestone, we are going to support the OpenSearch Bulk API Index action. All other actions like create, update and delete will be available in later milestones.
Thanks @sb2k16 for picking this up. This is of interest to me as I am working on OpenSearch UBI. I don't want to restrict where UBI events and queries are indexed, because there can be valid reasons for storing those items on a different OpenSearch instance (that is, an instance other than the one where the query ran). Allowing the user to specify an OpenSearch API-compatible endpoint to receive that data would let UBI store data in any instance of OpenSearch with minimal overhead. The Bulk API will be helpful because the UBI OpenSearch module can use that endpoint directly to send data to another instance of OpenSearch via Data Prepper. Additionally, using Data Prepper is valuable because of the flexibility it gives the user. I hope that gives some insight into one use case for this feature request. If it would be helpful to chat more about it, please let me know.
Completed in #5024.
This is awesome! Thanks everyone!
Summary
This creates a new Data Prepper source which accepts data in the form of the OpenSearch Bulk API.
Configuration
Operations
The `_bulk` API supports:
- `index`
- `create`
- `update`
- `delete`
This source can do something similar to what the `dynamodb` source does. Specifically, it should include the `opensearch_action` metadata.
Sample
The above request is the simplest case since it is an `index` request. It creates an Event with data such as:
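A minimal sketch of how a `_bulk` request body could be turned into events, written in Python purely for illustration. The sample body, the `parse_bulk_body` name, the event layout, and the `opensearch_index` / `opensearch_id` metadata keys are all hypothetical; only the `opensearch_action` metadata key comes from this proposal.

```python
import json
from typing import Any, Dict, List

# Actions that are followed by a document line in the NDJSON body.
ACTIONS_WITH_BODY = {"index", "create", "update"}
ALL_ACTIONS = ACTIONS_WITH_BODY | {"delete"}

# Hypothetical request body: one action line followed by one document line.
SAMPLE_BODY = (
    '{"index": {"_index": "movies", "_id": "1"}}\n'
    '{"title": "Example"}\n'
)


def parse_bulk_body(body: str) -> List[Dict[str, Any]]:
    """Turn an NDJSON _bulk body into a list of event-like dicts."""
    lines = [line for line in body.splitlines() if line.strip()]
    events: List[Dict[str, Any]] = []
    i = 0
    while i < len(lines):
        action_line = json.loads(lines[i])
        if len(action_line) != 1:
            raise ValueError(f"expected exactly one action on line {i + 1}")
        action, params = next(iter(action_line.items()))
        if action not in ALL_ACTIONS:
            raise ValueError(f"unknown bulk action: {action}")
        i += 1

        data: Dict[str, Any] = {}
        if action in ACTIONS_WITH_BODY:
            if i >= len(lines):
                raise ValueError(f"missing document line for {action}")
            data = json.loads(lines[i])
            i += 1

        # opensearch_action is named in the proposal; the other keys are assumptions.
        metadata = {"opensearch_action": action}
        if "_index" in params:
            metadata["opensearch_index"] = params["_index"]
        if "_id" in params:
            metadata["opensearch_id"] = params["_id"]
        events.append({"data": data, "metadata": metadata})
    return events
```

Calling `parse_bulk_body(SAMPLE_BODY)` yields a single event whose data is the document line and whose metadata records the `index` action. A first milestone that only supports `index` could reject the other actions instead of parsing them.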
Additionally, the event will need metadata that we can use in the `opensearch` sink.
Query parameters
The `_bulk` API supports a few query parameters. The source should also support most of these and provide some of them as metadata.
- `pipeline` -> Sets metadata: `opensearch_pipeline`
- `routing` -> Sets metadata: `opensearch_routing`
- `timeout` -> Configures an alternate timeout for the request in the source. This probably doesn't need to be provided downstream.
Some other parameters that we may wish to support:
- `refresh`
- `require_alias`
- `wait_for_active_shards`
Finally, we should not support these parameters as they are being deprecated.
- `type`
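The parameter handling above could look roughly like the following Python sketch. The `split_query_params` name is hypothetical, and rejecting `type` with an error (rather than silently ignoring it) is an assumption; the proposal only says the parameter should not be supported. The timeout is returned as the raw string so the source can interpret it however it chooses.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs

# Query parameters that are passed downstream as event metadata.
METADATA_PARAMS = {
    "pipeline": "opensearch_pipeline",
    "routing": "opensearch_routing",
}
# Deprecated parameters the source should not support.
UNSUPPORTED_PARAMS = {"type"}


def split_query_params(query: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Return (metadata, timeout) extracted from a _bulk query string."""
    parsed = parse_qs(query, keep_blank_values=True)

    rejected = sorted(UNSUPPORTED_PARAMS.intersection(parsed))
    if rejected:
        # Assumption: fail loudly instead of ignoring the parameter.
        raise ValueError(f"unsupported query parameters: {rejected}")

    # If a parameter is repeated, the last value wins (an arbitrary choice).
    metadata = {
        METADATA_PARAMS[name]: values[-1]
        for name, values in parsed.items()
        if name in METADATA_PARAMS
    }
    # The timeout applies to the request in the source only, so it is not metadata.
    timeout = parsed.get("timeout", [None])[-1]
    return metadata, timeout
```

Parameters such as `refresh`, `require_alias`, and `wait_for_active_shards` are ignored by this sketch; they could be added to `METADATA_PARAMS` if the source ends up supporting them.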
Response
Being able to provide the `_bulk` API response may be more challenging. There are a few reasons:
An initial version could provide responses that either have empty values (where appropriate) or use synthetic values.
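One way a synthetic response could be assembled is sketched below in Python. The top-level `took`, `errors`, and `items` fields follow the general shape of an OpenSearch `_bulk` response; the `synthetic_bulk_response` name, the input layout, and the placeholder values (`took` of 0, `status` of 200, missing IDs left as `None`) are assumptions, not decisions made in this issue.

```python
from typing import Any, Dict, List, Optional


def synthetic_bulk_response(
    actions: List[Dict[str, Optional[str]]]
) -> Dict[str, Any]:
    """Build a _bulk-shaped response without contacting OpenSearch.

    Each entry in `actions` is a hypothetical dict with an "action" key
    and optional "index" and "id" keys taken from the request.
    """
    items = []
    for entry in actions:
        items.append(
            {
                entry["action"]: {
                    # Echo back what the request provided; None when unknown.
                    "_index": entry.get("index"),
                    "_id": entry.get("id"),
                    # Placeholder: the source cannot know the real outcome.
                    "status": 200,
                }
            }
        )
    # took and errors are synthetic because nothing was actually indexed yet.
    return {"took": 0, "errors": False, "items": items}
```

Fields that only OpenSearch can know, such as version or shard details, are left out here rather than invented.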