Describe the bug
My Data Prepper pipeline seems to be sending much larger requests than the bulk_size configured in the pipeline. Here's the relevant snippet of my pipeline configuration:
Running Data Prepper with this pipeline results in an exception that suggests that the size of the bulk request is much larger than what is configured:
...
WARN org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - Bulk Operation Failed. Number of retries 5. Retrying...
org.opensearch.client.ResponseException: method [POST], host [<redacted>], URI [/_bulk], status line [HTTP/1.1 413 Request Entity Too Large]
{"Message":"Request size exceeded 10485760 bytes"}
at org.opensearch.client.RestClient.convertResponse(RestClient.java:375) ~[opensearch-rest-client-2.7.0.jar:?]
at org.opensearch.client.RestClient.performRequest(RestClient.java:345) ~[opensearch-rest-client-2.7.0.jar:?]
at org.opensearch.client.RestClient.performRequest(RestClient.java:320) ~[opensearch-rest-client-2.7.0.jar:?]
at org.opensearch.client.transport.rest_client.RestClientTransport.performRequest(RestClientTransport.java:143) ~[opensearch-java-2.5.0.jar:?]
at org.opensearch.client.opensearch.OpenSearchClient.bulk(OpenSearchClient.java:217) ~[opensearch-java-2.5.0.jar:?]
at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.lambda$doInitializeInternal$1(OpenSearchSink.java:202) ~[opensearch-2.4.0-SNAPSHOT.jar:?]
at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.handleRetry(BulkRetryStrategy.java:267) ~[opensearch-2.4.0-SNAPSHOT.jar:?]
at org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy.execute(BulkRetryStrategy.java:191) ~[opensearch-2.4.0-SNAPSHOT.jar:?]
at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.lambda$flushBatch$6(OpenSearchSink.java:319) ~[opensearch-2.4.0-SNAPSHOT.jar:?]
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:141) ~[micrometer-core-1.10.5.jar:1.10.5]
at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.flushBatch(OpenSearchSink.java:316) ~[opensearch-2.4.0-SNAPSHOT.jar:?]
at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doOutput(OpenSearchSink.java:288) ~[opensearch-2.4.0-SNAPSHOT.jar:?]
at org.opensearch.dataprepper.model.sink.AbstractSink.lambda$output$0(AbstractSink.java:64) ~[data-prepper-api-2.4.0-SNAPSHOT.jar:?]
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:141) ~[micrometer-core-1.10.5.jar:1.10.5]
at org.opensearch.dataprepper.model.sink.AbstractSink.output(AbstractSink.java:64) ~[data-prepper-api-2.4.0-SNAPSHOT.jar:?]
at org.opensearch.dataprepper.pipeline.Pipeline.lambda$publishToSinks$5(Pipeline.java:336) ~[data-prepper-core-2.4.0-SNAPSHOT.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
We recently changed the bulk_size estimation logic to factor in compression to solve #2852. The estimation logic is crude in the current state and needs some iteration to be more accurate. I am planning to pick up this work soon.
I am surprised to see the estimation off by more than a factor of 2.5. I have seen the estimation within 1-1.5x the bulk_size in my testing. Could you provide more details about the sink you were using and the workload?
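To illustrate why a compression-aware estimate can drift from the real request size, here is a minimal sketch that compares a fixed-ratio estimate with the measured gzip size of a payload. It is not Data Prepper's estimation logic, which this thread does not show. The class `BulkSizeSketch`, the method `estimateCompressedSize`, and the 0.25 ratio are hypothetical names and values chosen for illustration. Only the 10485760-byte limit comes from the error message above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class BulkSizeSketch {
    // Limit reported in the 413 response in this issue (10 * 1024 * 1024 bytes).
    static final long REQUEST_LIMIT_BYTES = 10_485_760L;

    // Hypothetical estimator: assumes every payload compresses by a fixed ratio.
    static long estimateCompressedSize(long uncompressedBytes, double assumedRatio) {
        return (long) Math.ceil(uncompressedBytes * assumedRatio);
    }

    // Measures the actual gzip-compressed size of a payload.
    static long gzipSize(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        double assumedRatio = 0.25; // assumption for illustration only

        // Highly repetitive documents compress far better than the assumed ratio.
        byte[] repetitive = "{\"message\":\"hello\"}\n".repeat(50_000)
                .getBytes(StandardCharsets.UTF_8);

        // Random bytes barely compress, so a fixed-ratio estimate undershoots.
        byte[] random = new byte[1_000_000];
        new Random(42).nextBytes(random);

        for (byte[] payload : new byte[][] {repetitive, random}) {
            long estimated = estimateCompressedSize(payload.length, assumedRatio);
            long actual = gzipSize(payload);
            System.out.printf("uncompressed=%d estimated=%d actual=%d overLimit=%b%n",
                    payload.length, estimated, actual, actual > REQUEST_LIMIT_BYTES);
        }
    }
}
```

With a fixed ratio, the estimate is only as good as the assumption that the workload compresses like the data the ratio was tuned on. A workload that compresses poorly would let a batch grow well past the intended size before the estimate triggers a flush.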
This seems to be the opposite of #2852.
To Reproduce
See description above
Expected behavior
The size of the bulk request should be equal to or less than the configured bulk_size value.
Screenshots
N/A
Environment (please complete the following information):
Additional context
N/A