Branch 2.8.bkbase.3 #1

Open
wants to merge 519 commits into base: v2.8.1-bkdata.2
Conversation

golden-yang
Collaborator

(If this PR fixes a github issue, please add Fixes #<xyz>.)

Fixes #

(or if this PR is one task of a github issue, please add Master Issue: #<xyz> to link to the master issue.)

Master Issue: #

Motivation

Explain here the context and why you're making this change. What is the problem you're trying to solve?

Modifications

Describe the modifications you've done.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick one of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

Technoboy- and others added 30 commits December 23, 2021 22:37
apache#13237)

### Motivation
The method `NamespacesBase#internalGetPublishRate` should return `null` instead of throwing a `RestException`, because `null` means that the `publish-rate` is not configured.
This matches `internalGetSubscriptionDispatchRate`, shown below:
https://github.com/apache/pulsar/blob/6d9d24d50db5418ddbb845d2c7a2be2b9ac72893/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L1303-L1308
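A minimal, self-contained sketch of the convention (the types below are hypothetical stand-ins, not the actual `NamespacesBase` code): an absent policy entry maps to a `null` return value rather than an exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins used only to illustrate the "not configured => null" convention.
class PublishRate {
    int publishThrottlingRateInMsg;
    long publishThrottlingRateInByte;
}

class NamespacePoliciesSketch {
    // Keyed by cluster name; a missing entry means publish-rate was never configured.
    private final Map<String, PublishRate> publishMaxMessageRate = new ConcurrentHashMap<>();

    // Mirror internalGetSubscriptionDispatchRate: return null instead of throwing
    // a RestException when no publish-rate has been set for the cluster.
    PublishRate getPublishRate(String cluster) {
        return publishMaxMessageRate.get(cluster);
    }
}
```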

(cherry picked from commit 3e55b4f)
…ache#13324)

### Motivation

When I built the C++ tests in my local environment, the following error occurred.

```
tests/VersionTest.cc:19:10: fatal error: 'pulsar/Version.h' file not found
#include <pulsar/Version.h>
```

This is because I specified a different directory as the CMake build directory.

```bash
mkdir _builds && cd _builds && cmake ..
```

After apache#12769, `Version.h` is generated under the `${CMAKE_BINARY_DIR}/include/pulsar` directory, but that directory is not added as an include path in `CMakeLists.txt`. CI works because it builds in the default CMake directory, so `CMAKE_BINARY_DIR` is the same as `CMAKE_SOURCE_DIR`, which is already included.

### Modifications

Add `${CMAKE_BINARY_DIR}/include` to `include_directories`.

(cherry picked from commit ca37e67)
…apache#13332)

### Motivation

When we use a multi-topics reader, the `hasMessageAvailable` method might behave incorrectly. Since the multi-topics consumer receives all messages from the single-topic consumers, a single-topic consumer's `hasMessageAvailable` might always be `false` (its lastDequeuedMessageId has reached the end of its queue because all messages were moved into the multi-topics consumer's `incomingMessages` queue).

We should also check whether the multi-topics consumer's `incomingMessages` size is > 0 when calling `hasMessageAvailable`.

### Modifications

1. Add a check that the `incomingMessages` size is > 0 (see the sketch below).
2. Add a unit test `testHasMessageAvailableAsync` to verify the behavior.
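A rough sketch of the check being described (hypothetical classes, not the actual Pulsar multi-topics consumer code): the shared `incomingMessages` queue is consulted before falling back to the per-partition consumers.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a multi-topics reader; not the actual implementation.
class MultiTopicsReaderSketch {
    private final List<SingleTopicReaderSketch> partitions;
    private final BlockingQueue<Object> incomingMessages = new LinkedBlockingQueue<>();

    MultiTopicsReaderSketch(List<SingleTopicReaderSketch> partitions) {
        this.partitions = partitions;
    }

    boolean hasMessageAvailable() {
        // Messages may already have been drained from the per-partition consumers
        // into this shared queue, so its size must be checked first.
        if (!incomingMessages.isEmpty()) {
            return true;
        }
        return partitions.stream().anyMatch(SingleTopicReaderSketch::hasMessageAvailable);
    }
}

class SingleTopicReaderSketch {
    boolean hasMessageAvailable() {
        return false; // placeholder
    }
}
```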

(cherry picked from commit 6c7dcc0)
…he#13428)

- Closing ServerCnx while producers or consumers are being created can lead
  to a producer or consumer never getting removed from the topic's
  list of producers.

(cherry picked from commit 3316db5)
(cherry picked from commit 76f3566)
---

*Motivation*

The `namespaceEventsSystemTopicFactory` is created lazily, the first time it is used. However, `createSystemTopicFactoryIfNeeded()` may fail, which leaves `namespaceEventsSystemTopicFactory` null and makes the method throw an NPE.

*Modifications*

- Throw the error and fail the method when an exception occurs in
`createSystemTopicFactoryIfNeeded()` (see the sketch below)
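A hedged sketch of the pattern (names and types are illustrative, not the broker's actual code): surface the creation failure to the caller instead of continuing with a null factory.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative only; method and field names are hypothetical.
class SystemTopicClientSketch {
    private volatile Object namespaceEventsSystemTopicFactory;

    CompletableFuture<Void> notifyNamespaceEvent(Object event) {
        try {
            createSystemTopicFactoryIfNeeded();
        } catch (Exception e) {
            // Before the fix the factory could stay null and an NPE was thrown later;
            // now the failure is propagated directly to the caller.
            return CompletableFuture.failedFuture(e);
        }
        return publish(event);
    }

    private void createSystemTopicFactoryIfNeeded() throws Exception {
        if (namespaceEventsSystemTopicFactory == null) {
            namespaceEventsSystemTopicFactory = new Object(); // may throw in the real code
        }
    }

    private CompletableFuture<Void> publish(Object event) {
        return CompletableFuture.completedFuture(null);
    }
}
```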

(cherry picked from commit 4022b28)
* add python3.9 on manylinux2014 build support

* add python3.9 client build CI

* fix python3.9 client build CI

Co-authored-by: livio <livio.bencik@mindsmiths.com>
(cherry picked from commit 825187f)
Co-authored-by: Ali Ahmed <alia@splunk.com>
(cherry picked from commit 7c219b1)
### Motivation
The time unit in this exception message is ns, which is not very readable. We can change it from ns to ms.
```
org.apache.pulsar.client.api.PulsarClientException$TimeoutException:
The producer xxx can not send message to the topic xxx within given timeout : createdAt 461913074 ns ago, firstSentAt 29545553038276935 ns ago, lastSentAt 29545553038276935 ns ago, retryCount 0 at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:916)
at org.apache.pulsar.client.impl.TypedMessageBuilderImpl.send(TypedMessageBuilderImpl.java:93)
at org.apache.pulsar.client.impl.ProducerBase.send(ProducerBase.java:63)
at com.yum.boh.oh.service.impl.StoreOrderPostServiceImpl.generalProcessing(StoreOrderPostServiceImpl.java:272)
at com.yum.boh.oh.service.impl.StoreOrderPostServiceImpl.saveThirdOrder(StoreOrderPostServiceImpl.java:72)
at com.yum.boh.oh.controller.StoreOrderController.postOrderInfo$original$T8425mfx(StoreOrderController.java:39)
at com.yum.boh.oh.controller.StoreOrderController.postOrderInfo$original$T8425mfx$accessor$vJljNzML(StoreOrderController.java)
at com.yum.boh.oh.controller.StoreOrderController$auxiliary$nysalhgy.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
```

### Modifications
Change the time unit from ns to ms in `ProducerImpl#OpSendMsg`.
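A tiny hedged example of reporting the elapsed time in milliseconds (not the exact Pulsar message template, just the usual `TimeUnit` conversion):

```java
import java.util.concurrent.TimeUnit;

// Minimal sketch of formatting elapsed time in ms instead of ns.
class TimeoutMessageSketch {
    static String elapsedSince(long createdAtNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - createdAtNanos);
        return "createdAt " + elapsedMs + " ms ago";
    }
}
```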

(cherry picked from commit 891660e)
…13194) (apache#13249)

Fixes apache#13194

### Motivation
https://github.com/apache/pulsar/blob/38fb839154462fc5c6b0b4293f02762ed4021cd9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BacklogQuotaManager.java#L200-L219
BacklogQuotaManager.dropBacklogForTimeLimit may fall into an infinite loop under some conditions when
`backlogQuotaDefaultLimitSecond` is enabled, e.g.:
1. A producer stops producing after publishing some messages; the current ledger is A.
2. The rollover time elapses, a ledger rollover is triggered, and a new ledger B is created, which is empty (no entries).
3. Now lastConfirmedEntry is `A:last-entry-id`.
4. After `backlogQuotaDefaultLimitSecond` elapses, the cursor is reset to position `A:last-entry-id+1`, the only valid position, so the loop keeps running until the producer resumes producing.

### Modifications

Record the previous slowestReaderPosition; if it is the same as the new slowestReaderPosition after `resetCursor`, exit the loop (see the sketch below).
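A compact sketch of the loop-exit guard (hypothetical types; the real code operates on the broker's cursor positions): stop when `resetCursor` no longer advances the slowest reader position.

```java
// Illustrative loop guard; Position and Cursor are stand-ins for the broker types.
class BacklogTrimSketch {
    void dropBacklogForTimeLimit(Cursor slowestConsumer) {
        Position previous = null;
        while (backlogExceedsTimeLimit(slowestConsumer)) {
            Position current = slowestConsumer.getReadPosition();
            if (current.equals(previous)) {
                // resetCursor made no progress (e.g. a new empty ledger), so bail out
                // instead of spinning forever.
                break;
            }
            previous = current;
            slowestConsumer.resetCursorToNextValidPosition();
        }
    }

    boolean backlogExceedsTimeLimit(Cursor c) {
        return false; // placeholder
    }

    interface Position { }

    interface Cursor {
        Position getReadPosition();
        void resetCursorToNextValidPosition();
    }
}
```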

(cherry picked from commit 021409b)
…3454)

### Motivation

When a large message is sent in chunks, each chunk needs to reserve a permit from the semaphore. However, when this fails, the memory already reserved from the limiter and the permits already acquired from the semaphore are not released.

### Modifications

- Release the semaphore permits and reserved memory when `canEnqueueRequest` returns false for a chunk (see the sketch below).
- Add `testChunksEnqueueFailed` to cover this case. It sends a large message whose number of chunks is greater than `maxPendingMessages`, so the first `canEnqueueRequest` call returns true while the following calls return false.
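A hedged sketch of the release-on-failure path for chunked sends (simplified; field and method names such as `pendingMessages` and `reservedMemory` are stand-ins, not the actual ProducerImpl code):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of releasing already-acquired quota when a later chunk
// cannot be enqueued.
class ChunkedSendSketch {
    private final Semaphore pendingMessages = new Semaphore(10);
    private final AtomicLong reservedMemory = new AtomicLong();

    boolean sendChunks(int totalChunks, long chunkSize) {
        int acquiredPermits = 0;
        long reservedBytes = 0;
        for (int i = 0; i < totalChunks; i++) {
            if (!canEnqueueRequest(chunkSize)) {
                // Roll back what earlier chunks reserved, otherwise permits and
                // memory leak when a multi-chunk message fails part-way.
                pendingMessages.release(acquiredPermits);
                reservedMemory.addAndGet(-reservedBytes);
                return false;
            }
            acquiredPermits++;
            reservedBytes += chunkSize;
        }
        return true;
    }

    private boolean canEnqueueRequest(long size) {
        if (!pendingMessages.tryAcquire()) {
            return false;
        }
        reservedMemory.addAndGet(size);
        return true;
    }
}
```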

(cherry picked from commit 2e2cd57)
### Motivation

Currently, if the last confirmed entry points to an empty ledger, the get-last-message-id operation reads from the compacted ledger; if the compacted ledger is also empty, it encounters an `IncorrectParameterException`.

**Broker error message**
```
[pulsar-io-29-9] ERROR org.apache.bookkeeper.client.LedgerHandle - IncorrectParameterException on ledgerId:617 firstEntry:-1 lastEntry:-1
```

**Client error log**
```
Exception in thread "main" org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: The subscription reader-bf9246cfcb of the topic persistent://public/ns-test/t1 gets the last message id was failed
{"errorMsg":"Failed to read last entry of the compacted Ledger Incorrect parameter input","reqId":79405902881798690, "remote":"localhost/127.0.0.1:6650", "local":"/127.0.0.1:55207"}
	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:1034)
	at org.apache.pulsar.client.impl.ConsumerImpl.hasMessageAvailable(ConsumerImpl.java:2001)
	at org.apache.pulsar.client.impl.ReaderImpl.hasMessageAvailable(ReaderImpl.java:181)
	at org.apache.pulsar.compaction.CompactedTopicTest.main(CompactedTopicTest.java:730)
```

### Modifications

Check the compacted ledger's last entry id before reading an entry from it; if there is no entry, return a null value (see the sketch below).
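A small hedged sketch of the guard (illustrative names; the real code reads from a BookKeeper ledger handle): check whether the compacted ledger actually has entries before issuing a read.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative guard only; CompactedLedgerView is a hypothetical stand-in.
class CompactedTopicSketch {
    CompletableFuture<Object> readLastCompactedEntry(CompactedLedgerView ledger) {
        if (ledger.getLastEntryId() < 0) {
            // Empty compacted ledger: reading entry -1 would trigger an
            // IncorrectParameterException, so return null instead.
            return CompletableFuture.completedFuture(null);
        }
        return ledger.readEntry(ledger.getLastEntryId());
    }

    interface CompactedLedgerView {
        long getLastEntryId();
        CompletableFuture<Object> readEntry(long entryId);
    }
}
```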

(cherry picked from commit 8136762)
Fixes apache#13214

When producer creation fails due to a connection failure, the topic being terminated, or the producer being fenced, some resources are not released in the client.

When producer creation fails (a sketch of the cleanup follows this list):
1. Stop the sendTimeout task
2. Cancel the batchTimerTask
3. Cancel the keyGeneratorTask
4. Cancel the statTimeout task
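A hedged sketch of the cleanup (the timer names follow the list above, but the class and scheduling details are hypothetical, not the actual ProducerImpl code): cancel every scheduled task when producer creation fails.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of releasing client-side timers on a failed producer creation.
class ProducerCleanupSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> sendTimeout;
    private ScheduledFuture<?> batchTimerTask;
    private ScheduledFuture<?> keyGeneratorTask;
    private ScheduledFuture<?> statTimeout;

    void start() {
        sendTimeout = timer.scheduleAtFixedRate(() -> { }, 30, 30, TimeUnit.SECONDS);
        batchTimerTask = timer.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.MILLISECONDS);
        keyGeneratorTask = timer.scheduleAtFixedRate(() -> { }, 4, 4, TimeUnit.HOURS);
        statTimeout = timer.scheduleAtFixedRate(() -> { }, 60, 60, TimeUnit.SECONDS);
    }

    void onCreateFailure() {
        // Cancel everything that start() scheduled so nothing keeps running
        // (or keeps the client alive) after the failure.
        cancel(sendTimeout);
        cancel(batchTimerTask);
        cancel(keyGeneratorTask);
        cancel(statTimeout);
    }

    private void cancel(ScheduledFuture<?> f) {
        if (f != null) {
            f.cancel(false);
        }
    }
}
```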

(cherry picked from commit 57eccf4)
…side (apache#13536)

### Motivation

Currently the Pulsar consumer allocates direct memory via `Unpooled.directBuffer`, which doesn't make use of the pooled allocator widely used elsewhere in Pulsar.

### Modifications

Use `PulsarByteBufAllocator` as the memory allocator for chunks buffer.
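A short hedged sketch of the idea (Netty's default pooled allocator stands in for Pulsar's own allocator here): allocate the chunk reassembly buffer from a pooled allocator instead of `Unpooled.directBuffer`.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;

// Illustrative only; in Pulsar the allocator would be PulsarByteBufAllocator,
// here Netty's pooled allocator is used as a stand-in.
class ChunkBufferSketch {
    private static final ByteBufAllocator ALLOCATOR = PooledByteBufAllocator.DEFAULT;

    ByteBuf allocateChunkedMessageBuffer(int totalMessageSize) {
        // Pooled allocation instead of Unpooled.directBuffer(totalMessageSize).
        return ALLOCATOR.buffer(totalMessageSize, totalMessageSize);
    }
}
```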

(cherry picked from commit a5d3473)
…each the end of the topic (apache#13533)

### Motivation

The problem happens when the compaction cursor reaches the end of the topic but the tail messages
of the topic were removed during topic compaction because the producer wrote null-value messages for their keys.

For example:

- 5 messages in the original topic with keys 0, 1, 2, 3, 4
- the corresponding message IDs are 1:0, 1:1, 1:2, 1:3, 1:4
- the producer sends null-value messages for keys 3 and 4
- the topic compaction task is triggered

After the compaction task completes:

- 5 messages in the original topic: 1:0, 1:1, 1:2, 1:3, 1:4
- 3 messages in the compacted ledger: 1:0, 1:1, 1:2

At this moment, if the reader tries to get the last message ID of the topic,
we should return `1:2`, not `1:4`, because the reader cannot read the messages
with keys `3` and `4` from the compacted topic; otherwise, the `reader.readNext()` method
will block until a new message is written to the topic.

### Modifications

The fix is straightforward: when the broker receives a get-last-message-ID request,
it checks whether the compaction cursor has reached the end of the original topic.
If so, it responds with the last message ID from the compacted ledger (see the sketch below).
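A hedged sketch of that branch (hypothetical interfaces, not the broker's actual topic classes):

```java
// Illustrative decision only; types are stand-ins.
class LastMessageIdSketch {
    MessageIdView getLastMessageId(TopicView topic) {
        if (topic.compactionCursorReachedEndOfTopic()) {
            // The tail of the original topic may have been compacted away
            // (e.g. null-value messages), so answer from the compacted ledger.
            return topic.lastMessageIdInCompactedLedger();
        }
        return topic.lastConfirmedMessageId();
    }

    interface MessageIdView { }

    interface TopicView {
        boolean compactionCursorReachedEndOfTopic();
        MessageIdView lastMessageIdInCompactedLedger();
        MessageIdView lastConfirmedMessageId();
    }
}
```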

### Verifying this change

Added a new test, `testHasMessageAvailableWithNullValueMessage`, which ensures that `hasMessageAvailable()`
returns false when there are no more messages to read from the compacted topic, i.e. the compaction cursor has reached the end of the topic.

(cherry picked from commit d49415e)
…he#13477)

Motivation
Currently, if the roles in the token are empty, the `MultiRolesTokenAuthorizationProvider` has problems processing it: it keeps waiting on an empty list of futures, eventually causing the operation to time out.

Modification
* In `MultiRolesTokenAuthorizationProvider.authorize`, return false immediately when the roles are empty (see the sketch below).
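A hedged sketch of the early return (illustrative class; the real provider checks each role against the underlying authorization logic):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only; not the actual MultiRolesTokenAuthorizationProvider code.
class MultiRolesAuthorizeSketch {
    CompletableFuture<Boolean> authorize(List<String> roles) {
        if (roles == null || roles.isEmpty()) {
            // Previously an empty role list meant waiting on an empty set of
            // futures and eventually timing out; now deny immediately.
            return CompletableFuture.completedFuture(false);
        }
        return anyRoleAuthorized(roles);
    }

    private CompletableFuture<Boolean> anyRoleAuthorized(List<String> roles) {
        return CompletableFuture.completedFuture(true); // placeholder
    }
}
```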

Signed-off-by: Zike Yang <zkyang@streamnative.io>
(cherry picked from commit 4f942d7)
…che#13501)

### Motivation

It's an `AdditionalServlet`-side change, similar to apache#11270

### Modifications

Change the context class loader via `Thread.currentThread().setContextClassLoader(classLoader)` before every additional servlet callback, and change it back to the original class loader afterwards.
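A hedged sketch of the class-loader swap as a generic wrapper (not the actual plugin plumbing):

```java
import java.util.concurrent.Callable;

// Generic illustration of setting the plugin's class loader around a callback
// and always restoring the previous one.
class ContextClassLoaderSwitchSketch {
    static <T> T callWithClassLoader(ClassLoader pluginClassLoader, Callable<T> callback) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(pluginClassLoader);
        try {
            return callback.call();
        } finally {
            // Restore the original loader even if the callback throws.
            current.setContextClassLoader(previous);
        }
    }
}
```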

(cherry picked from commit 40ac793)
* Fix NPE when unloading namespace bundle

(cherry picked from commit ec0a440)
* Upgrade gson version 2.8.6 to 2.8.9

* fix license

(cherry picked from commit 937131a)
* update log content

* update log content

* check style

(cherry picked from commit 825d2fe)
* Fix time field use error

* rebuild

Co-authored-by: liu.changqing <changqing_l@kingdee.com>
(cherry picked from commit 793dd91)
merlimat and others added 30 commits May 12, 2022 21:12
…eation failure (apache#15570)

* Fix deadlock in broker after race condition in topic creation failure

* Fixed checkstyle
…che#15646)

- release notes https://netty.io/news/2022/05/06/2-1-77-Final.html
  - improves Alpine / musl compatibility
    - could help issues such as apache#14534 apache#11415 apache#11224 apache#10798
  - improves shading compatibility
  - fixes a bug related to the native epoll transport and epoll_pwait2

(cherry picked from commit a8045fc)
…io/batch-data-generator and Jcloud offloader (apache#14150)

(cherry picked from commit 25ccf45)
…O, Offloaders and Pulsar SQL - Bump Guice to 5.1.0 (apache#14300)

* Remove --illegal-access errors resulting from Google Guice - Batch Data Generator connector

* and jcloud-shaded

* use dependencyManagement

* fix pulsar-sql

(cherry picked from commit 332eca8)
…ete. (apache#14818)

If the configuration `patternAutoDiscoveryPeriod` is small, there may be some unnecessary subscribe requests.


(cherry picked from commit 0fe921f)
…pache#15161)

Fixes apache#15078

When a partitioned producer is created and some of the partitions
failed to create, `closeAsync` will be called immediately, even if other
partitions are still in the process of creating their associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

Only call `closeAsync` after all single producers have either failed or
succeeded, and only if one of them failed. Also ensure
`partitionedProducerCreatedPromise_` is completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota is exceeded. Then add a
`testBacklogQuotasExceeded` test that creates some backlog by creating a
consumer and sending some messages to one partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` returns the
correct error instead of `ResultUnknownError`.

(cherry picked from commit 0f85596)
…e#14742)

Fixes apache#14459

See the issue

1. Add WebExecutorStats to record web thread pool metrics

(cherry picked from commit 32d7a51)
…ersistence correctly (apache#15206)

In the current implementation, the first time `purgeInactiveProducers` executes, producers are removed directly from the collection (line 464) even though they have not expired. This will result in these producers never being removed.

https://github.com/apache/pulsar/blob/9861dfb1208c4b6b8a1f17ef026e9af71c3e784c/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/MessageDeduplication.java#L454-L472

1. Remove a producer from the collection only when it is actually inactive (a sketch of the corrected purge follows the snippet below).
2. Take a snapshot after each removal of an inactive producer. When `managedLedger.getLastConfirmedEntry` equals `managedCursor.getMarkDeletedPosition()`, the `deduplication-snapshot-monitor` thread does not trigger a snapshot, so these producers would otherwise only be persisted the next time a message is produced, which can be confusing for users.

```
        PositionImpl position = (PositionImpl) managedLedger.getLastConfirmedEntry();
        if (position == null) {
            return;
        }
        PositionImpl markDeletedPosition = (PositionImpl) managedCursor.getMarkDeletedPosition();
        if (markDeletedPosition != null && position.compareTo(markDeletedPosition) <= 0) {
            return;
        }
```
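A hedged sketch of the corrected purge behavior described in points 1 and 2 (hypothetical types; the real logic lives in `MessageDeduplication`):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only; a producer-name -> last-active-timestamp map stands in for
// the broker's inactive-producer bookkeeping.
class DeduplicationPurgeSketch {
    private final Map<String, Long> inactiveProducers = new ConcurrentHashMap<>();
    private final long inactiveTimeoutMs = 6 * 60 * 60 * 1000L;

    void purgeInactiveProducers() {
        long now = System.currentTimeMillis();
        boolean removed = false;
        Iterator<Map.Entry<String, Long>> it = inactiveProducers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            // Only drop producers that really exceeded the inactivity timeout.
            if (now - entry.getValue() >= inactiveTimeoutMs) {
                it.remove();
                removed = true;
            }
        }
        if (removed) {
            // Take a snapshot right away so the removal is persisted without
            // waiting for the next produced message.
            takeSnapshot();
        }
    }

    private void takeSnapshot() {
        // placeholder for persisting the deduplication snapshot
    }
}
```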

(cherry picked from commit 8e1ca48)
…erly since writeAndFlush is asynchronous (apache#15384)

(cherry picked from commit cd3816a)
…pache#15415)

* [Proxy] Remove unnecessary blocking DNS lookup in LookupProxyHandler

* Use existing code pattern for creating address

(cherry picked from commit 7373a51)
…#15340)

* Add config of IO and acceptor threads in proxy

* Update doc

* Update site2/docs/reference-configuration.md

Co-authored-by: Anonymitaet <50226895+Anonymitaet@users.noreply.github.com>

* Update site2/docs/reference-configuration.md

Co-authored-by: Anonymitaet <50226895+Anonymitaet@users.noreply.github.com>
(cherry picked from commit da3f017)
…orizationProvider (apache#14857)

### Motivation

Currently, `MultiRolesTokenAuthorizationProvider` doesn't support handling a single string-type role; it returns an empty role list in that case. This PR adds support for handling the string-type role, and also adds support for handling non-JWT tokens.

### Modifications

* Add support for handling the string-type role (see the sketch below)
* Add support for handling non-JWT tokens
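A hedged sketch of reading a roles claim that may be either a single string or an array (illustrative parsing only, not the actual provider code, which reads the configured token claim):

```java
import java.util.Collections;
import java.util.List;

// Illustrative handling of a claim value that can be a plain string or a list.
class RolesClaimSketch {
    @SuppressWarnings("unchecked")
    static List<String> toRoles(Object claimValue) {
        if (claimValue == null) {
            return Collections.emptyList();
        }
        if (claimValue instanceof String) {
            // Single string-type role, previously treated as "no roles".
            return Collections.singletonList((String) claimValue);
        }
        if (claimValue instanceof List) {
            return (List<String>) claimValue;
        }
        return Collections.emptyList();
    }
}
```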

### Verifying this change

This change is already covered by existing tests, such as *testMultiRolesAuthzWithSingleRole*.

(cherry picked from commit 8bf6785)