Http chunking fixes #4823

Merged: 10 commits, Aug 14, 2024

Conversation

kkondaka (Collaborator)

Description

HTTP chunking has a couple of issues:

  • The multi-byte fix added in PR #4656 also needs to use the same Unicode-aware length calculation in another place.
  • The code puts at least one message (or chunk) in each of the lists, but the chunk itself can be larger than 1MB. To address this, a new "optimal size" is added to the buffer. In the case of Kafka, the optimal size is set to 1MB and the max size to 4MB (a rough sketch of the intended splitting behavior follows below).
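
A rough sketch of the splitting behavior described above, assuming records arrive as pre-serialized JSON strings; the names (records, optimalSize, maxSize) are illustrative, not the actual implementation:

// Hypothetical illustration (assumes java.util.* and java.nio.charset.StandardCharsets):
// close each chunk once it reaches optimalSize (1MB for Kafka), while a single
// record is still accepted on its own even if it approaches maxSize (4MB).
List<List<String>> chunks = new ArrayList<>();
List<String> current = new ArrayList<>();
int currentBytes = 0;
for (final String record : records) {
    final int recordBytes = record.getBytes(StandardCharsets.UTF_8).length;
    if (!current.isEmpty() && currentBytes + recordBytes > optimalSize) {
        chunks.add(current);          // close the chunk at the optimal size
        current = new ArrayList<>();
        currentBytes = 0;
    }
    current.add(record);              // a lone record may still approach maxSize
    currentBytes += recordBytes;
}
if (!current.isEmpty()) {
    chunks.add(current);
}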

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [x] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kkondaka and others added 4 commits August 11, 2024 01:56
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
@dlvenable (Member) left a comment:

Overall, I think this is the right solution. I have a few comments on some of the details.

@JsonProperty("max_request_size")
private int maxRequestSize = DEFAULT_MAX_REQUEST_SIZE;
private int maxRequestSize = 4*DEFAULT_MAX_REQUEST_SIZE;
Member:

I think we should keep the max_request_size the same. There are a few reasons: 1) This change is going beyond Kafka recommendations (of 1MB). 2) This does change the existing behavior because it affects the topics on Kafka. 3) This may affect users without them clearly understanding the implications.

@@ -96,6 +96,11 @@ public Optional<Integer> getMaxRequestSize() {
return Optional.of(producer.getMaxRequestSize());
}

@Override
public Optional<Integer> getOptimalRequestSize() {
return Optional.of(producer.getMaxRequestSize() / 4);
Member:

Suggested change
return Optional.of(producer.getMaxRequestSize() / 4);
return Optional.of(ONE_MEGABYTE);

Can we just make this value equal to 1MB? I think this is really what we are aiming for.

Collaborator Author:

I changed it to DEFAULT_MAX_REQUEST_SIZE; it no longer divides by 4.

@@ -101,7 +101,7 @@ private HttpResponse processRequest(final AggregatedHttpRequest aggregatedHttpRe
List<List<String>> jsonList;

try {
jsonList = (maxRequestLength == null) ? jsonCodec.parse(content) : jsonCodec.parse(content, maxRequestLength - SERIALIZATION_OVERHEAD);
jsonList = (maxRequestLength == null) ? jsonCodec.parse(content) : jsonCodec.parse(content, buffer.getOptimalRequestSize().get() - SERIALIZATION_OVERHEAD);
Member:

There are two small problems here that you can fix.

buffer.getOptimalRequestSize().get() 
  1. buffer.getOptimalRequestSize() may return null, leading to NPE on .get()
  2. buffer.getOptimalRequestSize() may return empty, leading to a NoSuchElementException on .get().

Member:

I don't think you need an NPE check actually. Just check that the optional is present.
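
A minimal sketch of that guard, assuming the existing maxRequestLength fallback stays in place; the local variable names are illustrative:

// Hypothetical: only use the optimal request size when the buffer reports one,
// otherwise fall back to maxRequestLength as before.
final Optional<Integer> optimalRequestSize = buffer.getOptimalRequestSize();
if (maxRequestLength == null) {
    jsonList = jsonCodec.parse(content);
} else {
    final int splitSize = optimalRequestSize.isPresent()
            ? optimalRequestSize.get() - SERIALIZATION_OVERHEAD
            : maxRequestLength - SERIALIZATION_OVERHEAD;
    jsonList = jsonCodec.parse(content, splitSize);
}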

if (size + nextRecordLength > maxSize) {
// It is possible that the first record is larger than maxSize, then
// innerJsonList size would be zero.
if (innerJsonList.size() > 0) {
Member:

Please add a unit test case for this condition.
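
A hypothetical sketch of such a test, assuming the codec's parse(HttpData, maxSize) signature from the diff above and that an oversized record is meant to be emitted as its own chunk; objectUnderTest and the literals are illustrative:

@Test
public void parse_emits_single_chunk_when_first_record_exceeds_maxSize() throws Exception {
    // Hypothetical: one JSON record whose serialized length is larger than maxSize.
    final String largeRecord = "{\"key\":\"" + "a".repeat(100) + "\"}";
    final HttpData content = HttpData.ofUtf8("[" + largeRecord + "]");
    final int maxSize = 50; // smaller than the single record

    final List<List<String>> chunks = objectUnderTest.parse(content, maxSize);

    // The oversized record should still come back as its own single-element chunk,
    // not as an empty inner list.
    assertThat(chunks.size(), equalTo(1));
    assertThat(chunks.get(0).size(), equalTo(1));
}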

@@ -50,6 +50,12 @@ public Optional<Integer> getMaxRequestSize() {
return maxRequestSize.isPresent() ? Optional.of(maxRequestSize.getAsInt()) : Optional.empty();
}

@Override
public Optional<Integer> getOptimalRequestSize() {
OptionalInt optimalRequestSize = allBuffers.stream().filter(b -> b.getOptimalRequestSize().isPresent()).mapToInt(b -> (Integer)b.getOptimalRequestSize().get()).min();
Member:

Please add some unit test cases. We should be sure to test both when this value is present and not present.
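
A hypothetical sketch of the two cases, assuming the class under test wraps a list of delegate buffers and that Mockito mocks (firstBuffer, secondBuffer) are wired into objectUnderTest; all names are illustrative:

@Test
public void getOptimalRequestSize_returns_smallest_value_when_present() {
    // Two delegates report different optimal sizes; the minimum should win.
    when(firstBuffer.getOptimalRequestSize()).thenReturn(Optional.of(4 * 1024 * 1024));
    when(secondBuffer.getOptimalRequestSize()).thenReturn(Optional.of(1024 * 1024));

    assertThat(objectUnderTest.getOptimalRequestSize(), equalTo(Optional.of(1024 * 1024)));
}

@Test
public void getOptimalRequestSize_returns_empty_when_no_delegate_reports_one() {
    when(firstBuffer.getOptimalRequestSize()).thenReturn(Optional.empty());
    when(secondBuffer.getOptimalRequestSize()).thenReturn(Optional.empty());

    assertThat(objectUnderTest.getOptimalRequestSize().isPresent(), equalTo(false));
}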

Krishna Kondaka added 4 commits August 12, 2024 18:38
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
final int knownSingleBodySize = knownFirstPart.getBytes(Charset.defaultCharset()).length;
final int maxSize = (knownSingleBodySize * 2) + 3;
//final int maxSize = (knownSingleBodySize * 2) + 3;
Member:

Please remove commented code.

requestsReceivedCounter = pluginMetrics.counter(REQUESTS_RECEIVED);
successRequestsCounter = pluginMetrics.counter(SUCCESS_REQUESTS);
requestsOverOptimalSizeCounter = pluginMetrics.counter(REQUESTS_OVER_OPTIMAL_SIZE);
Member:

Rather than having a metric to count these scenarios (and there may be others as time goes on), a distribution summary of the size of each payload would tell us all we need.
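
A minimal sketch of that suggestion, assuming PluginMetrics exposes a summary(...) factory analogous to counter(...); the metric name and field are illustrative:

// Hypothetical: record every payload size in a distribution summary instead of
// counting only the over-optimal-size case.
final DistributionSummary payloadSizeSummary = pluginMetrics.summary(PAYLOAD_SIZE);
// later, when handling a request:
payloadSizeSummary.record(content.length());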

Collaborator Author:

I do not think so. For the exception case we definitely want a separate counter. If you want a distribution summary for all the sizes under 4MB, I am OK with that.

Collaborator Author:

BTW, there is already a DistributionSummary for the payload. These are counters for special cases.


for (int i = 0; i < expectedChunks.size(); i++) {
final String reconstructed = chunkedBodies.get(i).stream().collect(Collectors.joining(",", "[", "]"));
if (exceedsMaxSize.get(i)) {
Member:

Might it be simpler to provide the expected size and then do a single assertThat(..., equalTo(expectedSize))?

Collaborator Author:

I didn't want to be so precise.

for (final Map<String, Object> log: logList) {
final String recordString = mapper.writeValueAsString(log);
int nextRecordLength = recordString.getBytes(Charset.defaultCharset()).length;
String recordString = mapper.writeValueAsString(logList.get(0));
Member:

I'm not sure I follow this change from your previous commit. Why would index 0 be a special case?

Collaborator Author:

I can go back to the previous commit. I thought this way I could avoid the check inside the loop. I am OK either way.

Member:

I found the previous commit to be clearer.

}
buffer.writeBytes(sb.toString().getBytes(), key, bufferWriteTimeoutInMillis);
}

private HttpResponse processRequest(final AggregatedHttpRequest aggregatedHttpRequest) throws Exception {
final HttpData content = aggregatedHttpRequest.content();
List<List<String>> jsonList;
boolean jsonListSplitSuccess = false;
Member:

Is "success" the right word here? I had some difficulty understanding this code at first. I think what you want to ask is "is the JSON list actually split?".

Perhaps: isJsonListSplit?

@@ -103,13 +113,49 @@ public void testParseNonJsonFailure() {
static class JsonArrayWithKnownFirstArgumentsProvider implements ArgumentsProvider {
@Override
public Stream<? extends Arguments> provideArguments(ExtensionContext extensionContext) throws Exception {
// First test, all chunks smaller than maxSize, but output has 3 lists
Member:

This test is getting quite hard to understand now.

I'm not sure I have a great suggestion. The first suggestion I have is that perhaps you can keep the existing tests, but add a new test case as well?

Collaborator Author:

There are only two existing tests, and that test is already kept as the first test here. The existing test doesn't do proper validation. I can add more comments if that helps.

Signed-off-by: Krishna Kondaka <krishkdk@dev-dsk-krishkdk-2c-bd29c437.us-west-2.amazon.com>
@dlvenable (Member) left a comment:

Thanks for working on this!

@kkondaka kkondaka merged commit 1bfed0d into opensearch-project:main Aug 14, 2024
48 of 50 checks passed