Branch 2.8.bkbase.3 #1

Open
wants to merge 519 commits into base: v2.8.1-bkdata.2
Conversation

golden-yang
Collaborator

(If this PR fixes a github issue, please add Fixes #<xyz>.)

Fixes #

(or if this PR is one task of a github issue, please add Master Issue: #<xyz> to link to the master issue.)

Master Issue: #

Motivation

Explain here the context and why you're making this change. What is the problem you're trying to solve?

Modifications

Describe the modifications you've done.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick one of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

Technoboy- and others added 30 commits December 23, 2021 22:37
apache#13237)

### Motivation
The method `NamespacesBase#internalGetPublishRate` should return `null` instead of throwing a `RestException`, because `null` means that the `publish-rate` is not configured.
This matches `internalGetSubscriptionDispatchRate`, shown below:
https://github.com/apache/pulsar/blob/6d9d24d50db5418ddbb845d2c7a2be2b9ac72893/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L1303-L1308
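A minimal, self-contained sketch of the convention (the types below are hypothetical stand-ins, not the actual `NamespacesBase` code): an absent policy entry maps to a `null` return value rather than an exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins used only to illustrate the "not configured => null" convention.
class PublishRate {
    int publishThrottlingRateInMsg;
    long publishThrottlingRateInByte;
}

class NamespacePoliciesSketch {
    // Keyed by cluster name; a missing entry means publish-rate was never configured.
    private final Map<String, PublishRate> publishMaxMessageRate = new ConcurrentHashMap<>();

    // Mirror internalGetSubscriptionDispatchRate: return null instead of throwing
    // a RestException when no publish-rate has been set for the cluster.
    PublishRate getPublishRate(String cluster) {
        return publishMaxMessageRate.get(cluster);
    }
}
```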

(cherry picked from commit 3e55b4f)
…ache#13324)

### Motivation

When I built the C++ tests in my local environment, the following error occurred.

```
tests/VersionTest.cc:19:10: fatal error: 'pulsar/Version.h' file not found
#include <pulsar/Version.h>
```

This is because I specified a different directory as the CMake build directory.

```bash
mkdir _builds && cd _builds && cmake ..
```

After apache#12769, `Version.h` is generated under the `${CMAKE_BINARY_DIR}/include/pulsar` directory, but that directory is not added as an include path in `CMakeLists.txt`. CI works because it builds in the default CMake directory, so `CMAKE_BINARY_DIR` is the same as `CMAKE_SOURCE_DIR`, which is already included.

### Modifications

Add `${CMAKE_BINARY_DIR}/include` to `include_directories`.

(cherry picked from commit ca37e67)
…apache#13332)

### Motivation

When we use a multi-topics reader, the `hasMessageAvailable` method might behave incorrectly. Since the multi-topics consumer receives all messages from the single-topic consumers, a single-topic consumer's `hasMessageAvailable` might always be `false` (its lastDequeuedMessageId has reached the end of its queue because all messages were moved into the multi-topics consumer's `incomingMessages` queue).

We should also check whether the multi-topics consumer's `incomingMessages` size is > 0 when calling `hasMessageAvailable`.

### Modifications

1. Add a check that the `incomingMessages` size is > 0 (see the sketch below).
2. Add a unit test `testHasMessageAvailableAsync` to verify the behavior.
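A rough sketch of the check being described (hypothetical classes, not the actual Pulsar multi-topics consumer code): the shared `incomingMessages` queue is consulted before falling back to the per-partition consumers.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a multi-topics reader; not the actual implementation.
class MultiTopicsReaderSketch {
    private final List<SingleTopicReaderSketch> partitions;
    private final BlockingQueue<Object> incomingMessages = new LinkedBlockingQueue<>();

    MultiTopicsReaderSketch(List<SingleTopicReaderSketch> partitions) {
        this.partitions = partitions;
    }

    boolean hasMessageAvailable() {
        // Messages may already have been drained from the per-partition consumers
        // into this shared queue, so its size must be checked first.
        if (!incomingMessages.isEmpty()) {
            return true;
        }
        return partitions.stream().anyMatch(SingleTopicReaderSketch::hasMessageAvailable);
    }
}

class SingleTopicReaderSketch {
    boolean hasMessageAvailable() {
        return false; // placeholder
    }
}
```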

(cherry picked from commit 6c7dcc0)
…he#13428)

- Closing ServerCnx while producers or consumers are being created can lead
  to a producer or consumer never getting removed from the topic's
  list of producers.

(cherry picked from commit 3316db5)
(cherry picked from commit 76f3566)
---

*Motivation*

The `namespaceEventsSystemTopicFactory` is created lazily, the first time it is used. However, `createSystemTopicFactoryIfNeeded()` may fail, which leaves `namespaceEventsSystemTopicFactory` null and makes the method throw an NPE.

*Modifications*

- Throw the error and fail the method when an exception occurs in
`createSystemTopicFactoryIfNeeded()` (see the sketch below)
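A hedged sketch of the pattern (names and types are illustrative, not the broker's actual code): surface the creation failure to the caller instead of continuing with a null factory.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative only; method and field names are hypothetical.
class SystemTopicClientSketch {
    private volatile Object namespaceEventsSystemTopicFactory;

    CompletableFuture<Void> notifyNamespaceEvent(Object event) {
        try {
            createSystemTopicFactoryIfNeeded();
        } catch (Exception e) {
            // Before the fix the factory could stay null and an NPE was thrown later;
            // now the failure is propagated directly to the caller.
            return CompletableFuture.failedFuture(e);
        }
        return publish(event);
    }

    private void createSystemTopicFactoryIfNeeded() throws Exception {
        if (namespaceEventsSystemTopicFactory == null) {
            namespaceEventsSystemTopicFactory = new Object(); // may throw in the real code
        }
    }

    private CompletableFuture<Void> publish(Object event) {
        return CompletableFuture.completedFuture(null);
    }
}
```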

(cherry picked from commit 4022b28)
* add python3.9 on manylinux2014 build support

* add python3.9 client build CI

* fix python3.9 client build CI

Co-authored-by: livio <livio.bencik@mindsmiths.com>
(cherry picked from commit 825187f)
Co-authored-by: Ali Ahmed <alia@splunk.com>
(cherry picked from commit 7c219b1)
### Motivation
The time unit in this exception message is ns, which is not very readable. We can change it from ns to ms.
```
org.apache.pulsar.client.api.PulsarClientException$TimeoutException:
The producer xxx can not send message to the topic xxx within given timeout : createdAt 461913074 ns ago, firstSentAt 29545553038276935 ns ago, lastSentAt 29545553038276935 ns ago, retryCount 0 at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:916)
at org.apache.pulsar.client.impl.TypedMessageBuilderImpl.send(TypedMessageBuilderImpl.java:93)
at org.apache.pulsar.client.impl.ProducerBase.send(ProducerBase.java:63)
at com.yum.boh.oh.service.impl.StoreOrderPostServiceImpl.generalProcessing(StoreOrderPostServiceImpl.java:272)
at com.yum.boh.oh.service.impl.StoreOrderPostServiceImpl.saveThirdOrder(StoreOrderPostServiceImpl.java:72)
at com.yum.boh.oh.controller.StoreOrderController.postOrderInfo$original$T8425mfx(StoreOrderController.java:39)
at com.yum.boh.oh.controller.StoreOrderController.postOrderInfo$original$T8425mfx$accessor$vJljNzML(StoreOrderController.java)
at com.yum.boh.oh.controller.StoreOrderController$auxiliary$nysalhgy.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
```

### Modifications
Change the time unit from ns to ms in `ProducerImpl#OpSendMsg`.
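A tiny hedged example of reporting the elapsed time in milliseconds (not the exact Pulsar message template, just the usual `TimeUnit` conversion):

```java
import java.util.concurrent.TimeUnit;

// Minimal sketch of formatting elapsed time in ms instead of ns.
class TimeoutMessageSketch {
    static String elapsedSince(long createdAtNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - createdAtNanos);
        return "createdAt " + elapsedMs + " ms ago";
    }
}
```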

(cherry picked from commit 891660e)
…13194) (apache#13249)

Fixes apache#13194

### Motivation
https://github.com/apache/pulsar/blob/38fb839154462fc5c6b0b4293f02762ed4021cd9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BacklogQuotaManager.java#L200-L219
BacklogQuotaManager.dropBacklogForTimeLimit may fall into an infinite loop under some conditions when
`backlogQuotaDefaultLimitSecond` is enabled, e.g.:
1. A producer stops producing after publishing some messages; the current ledger is A.
2. The rollover time elapses, a ledger rollover is triggered, and a new ledger B is created, which is empty (no entries).
3. Now lastConfirmedEntry is `A:last-entry-id`.
4. After `backlogQuotaDefaultLimitSecond` elapses, the cursor is reset to position `A:last-entry-id+1`, the only valid position, so the loop keeps running until the producer resumes producing.

### Modifications

Record the previous slowestReaderPosition; if it is the same as the new slowestReaderPosition after `resetCursor`, exit the loop (see the sketch below).
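A compact sketch of the loop-exit guard (hypothetical types; the real code operates on the broker's cursor positions): stop when `resetCursor` no longer advances the slowest reader position.

```java
// Illustrative loop guard; Position and Cursor are stand-ins for the broker types.
class BacklogTrimSketch {
    void dropBacklogForTimeLimit(Cursor slowestConsumer) {
        Position previous = null;
        while (backlogExceedsTimeLimit(slowestConsumer)) {
            Position current = slowestConsumer.getReadPosition();
            if (current.equals(previous)) {
                // resetCursor made no progress (e.g. a new empty ledger), so bail out
                // instead of spinning forever.
                break;
            }
            previous = current;
            slowestConsumer.resetCursorToNextValidPosition();
        }
    }

    boolean backlogExceedsTimeLimit(Cursor c) {
        return false; // placeholder
    }

    interface Position { }

    interface Cursor {
        Position getReadPosition();
        void resetCursorToNextValidPosition();
    }
}
```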

(cherry picked from commit 021409b)
…3454)

### Motivation

When a large message is sent in chunks, each chunk needs to reserve a permit from the semaphore. However, when this fails, the memory already reserved from the limiter and the permits already acquired from the semaphore are not released.

### Modifications

- Release the semaphore permits and reserved memory when `canEnqueueRequest` returns false for a chunk (see the sketch below).
- Add `testChunksEnqueueFailed` to cover this case. It sends a large message whose number of chunks is greater than `maxPendingMessages`, so the first `canEnqueueRequest` call returns true while the following calls return false.
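A hedged sketch of the release-on-failure path for chunked sends (simplified; field and method names such as `pendingMessages` and `reservedMemory` are stand-ins, not the actual ProducerImpl code):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of releasing already-acquired quota when a later chunk
// cannot be enqueued.
class ChunkedSendSketch {
    private final Semaphore pendingMessages = new Semaphore(10);
    private final AtomicLong reservedMemory = new AtomicLong();

    boolean sendChunks(int totalChunks, long chunkSize) {
        int acquiredPermits = 0;
        long reservedBytes = 0;
        for (int i = 0; i < totalChunks; i++) {
            if (!canEnqueueRequest(chunkSize)) {
                // Roll back what earlier chunks reserved, otherwise permits and
                // memory leak when a multi-chunk message fails part-way.
                pendingMessages.release(acquiredPermits);
                reservedMemory.addAndGet(-reservedBytes);
                return false;
            }
            acquiredPermits++;
            reservedBytes += chunkSize;
        }
        return true;
    }

    private boolean canEnqueueRequest(long size) {
        if (!pendingMessages.tryAcquire()) {
            return false;
        }
        reservedMemory.addAndGet(size);
        return true;
    }
}
```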

(cherry picked from commit 2e2cd57)
### Motivation

Currently, if the last confirmed entry points to an empty ledger, the get-last-message-id operation reads from the compacted ledger; if the compacted ledger is also empty, it encounters an `IncorrectParameterException`.

**Broker error message**
```
[pulsar-io-29-9] ERROR org.apache.bookkeeper.client.LedgerHandle - IncorrectParameterException on ledgerId:617 firstEntry:-1 lastEntry:-1
```

**Client error log**
```
Exception in thread "main" org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: The subscription reader-bf9246cfcb of the topic persistent://public/ns-test/t1 gets the last message id was failed
{"errorMsg":"Failed to read last entry of the compacted Ledger Incorrect parameter input","reqId":79405902881798690, "remote":"localhost/127.0.0.1:6650", "local":"/127.0.0.1:55207"}
	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:1034)
	at org.apache.pulsar.client.impl.ConsumerImpl.hasMessageAvailable(ConsumerImpl.java:2001)
	at org.apache.pulsar.client.impl.ReaderImpl.hasMessageAvailable(ReaderImpl.java:181)
	at org.apache.pulsar.compaction.CompactedTopicTest.main(CompactedTopicTest.java:730)
```

### Modifications

Check the compacted ledger's last entry id before reading an entry from it; if there is no entry, return a null value (see the sketch below).
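A small hedged sketch of the guard (illustrative names; the real code reads from a BookKeeper ledger handle): check whether the compacted ledger actually has entries before issuing a read.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative guard only; CompactedLedgerView is a hypothetical stand-in.
class CompactedTopicSketch {
    CompletableFuture<Object> readLastCompactedEntry(CompactedLedgerView ledger) {
        if (ledger.getLastEntryId() < 0) {
            // Empty compacted ledger: reading entry -1 would trigger an
            // IncorrectParameterException, so return null instead.
            return CompletableFuture.completedFuture(null);
        }
        return ledger.readEntry(ledger.getLastEntryId());
    }

    interface CompactedLedgerView {
        long getLastEntryId();
        CompletableFuture<Object> readEntry(long entryId);
    }
}
```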

(cherry picked from commit 8136762)
Fixes apache#13214

When producer creation fails due to a connection failure, the topic being terminated, or the producer being fenced, some resources are not released in the client.

When producer creation fails (a sketch of the cleanup follows this list):
1. Stop the sendTimeout task
2. Cancel the batchTimerTask
3. Cancel the keyGeneratorTask
4. Cancel the statTimeout task
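A hedged sketch of the cleanup (the timer names follow the list above, but the class and scheduling details are hypothetical, not the actual ProducerImpl code): cancel every scheduled task when producer creation fails.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of releasing client-side timers on a failed producer creation.
class ProducerCleanupSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> sendTimeout;
    private ScheduledFuture<?> batchTimerTask;
    private ScheduledFuture<?> keyGeneratorTask;
    private ScheduledFuture<?> statTimeout;

    void start() {
        sendTimeout = timer.scheduleAtFixedRate(() -> { }, 30, 30, TimeUnit.SECONDS);
        batchTimerTask = timer.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.MILLISECONDS);
        keyGeneratorTask = timer.scheduleAtFixedRate(() -> { }, 4, 4, TimeUnit.HOURS);
        statTimeout = timer.scheduleAtFixedRate(() -> { }, 60, 60, TimeUnit.SECONDS);
    }

    void onCreateFailure() {
        // Cancel everything that start() scheduled so nothing keeps running
        // (or keeps the client alive) after the failure.
        cancel(sendTimeout);
        cancel(batchTimerTask);
        cancel(keyGeneratorTask);
        cancel(statTimeout);
    }

    private void cancel(ScheduledFuture<?> f) {
        if (f != null) {
            f.cancel(false);
        }
    }
}
```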

(cherry picked from commit 57eccf4)
…side (apache#13536)

### Motivation

Currently the Pulsar consumer allocates direct memory via `Unpooled.directBuffer`, which doesn't make use of the pooled allocator widely used elsewhere in Pulsar.

### Modifications

Use `PulsarByteBufAllocator` as the memory allocator for chunks buffer.
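A short hedged sketch of the idea (Netty's default pooled allocator stands in for Pulsar's own allocator here): allocate the chunk reassembly buffer from a pooled allocator instead of `Unpooled.directBuffer`.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;

// Illustrative only; in Pulsar the allocator would be PulsarByteBufAllocator,
// here Netty's pooled allocator is used as a stand-in.
class ChunkBufferSketch {
    private static final ByteBufAllocator ALLOCATOR = PooledByteBufAllocator.DEFAULT;

    ByteBuf allocateChunkedMessageBuffer(int totalMessageSize) {
        // Pooled allocation instead of Unpooled.directBuffer(totalMessageSize).
        return ALLOCATOR.buffer(totalMessageSize, totalMessageSize);
    }
}
```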

(cherry picked from commit a5d3473)
…each the end of the topic (apache#13533)

### Motivation

The problem happens when the compaction cursor reaches the end of the topic but the tail messages
of the topic were removed during topic compaction because the producer wrote null-value messages for their keys.

For example:

- 5 messages in the original topic with keys 0, 1, 2, 3, 4
- the corresponding message IDs are 1:0, 1:1, 1:2, 1:3, 1:4
- the producer sends null-value messages for keys 3 and 4
- the topic compaction task is triggered

After the compaction task completes:

- 5 messages in the original topic: 1:0, 1:1, 1:2, 1:3, 1:4
- 3 messages in the compacted ledger: 1:0, 1:1, 1:2

At this moment, if the reader tries to get the last message ID of the topic,
we should return `1:2`, not `1:4`, because the reader cannot read the messages
with keys `3` and `4` from the compacted topic; otherwise, the `reader.readNext()` method
will block until a new message is written to the topic.

### Modifications

The fix is straightforward: when the broker receives a get-last-message-ID request,
it checks whether the compaction cursor has reached the end of the original topic.
If so, it responds with the last message ID from the compacted ledger (see the sketch below).
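A hedged sketch of that branch (hypothetical interfaces, not the broker's actual topic classes):

```java
// Illustrative decision only; types are stand-ins.
class LastMessageIdSketch {
    MessageIdView getLastMessageId(TopicView topic) {
        if (topic.compactionCursorReachedEndOfTopic()) {
            // The tail of the original topic may have been compacted away
            // (e.g. null-value messages), so answer from the compacted ledger.
            return topic.lastMessageIdInCompactedLedger();
        }
        return topic.lastConfirmedMessageId();
    }

    interface MessageIdView { }

    interface TopicView {
        boolean compactionCursorReachedEndOfTopic();
        MessageIdView lastMessageIdInCompactedLedger();
        MessageIdView lastConfirmedMessageId();
    }
}
```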

### Verifying this change

Added a new test, `testHasMessageAvailableWithNullValueMessage`, which ensures that `hasMessageAvailable()`
returns false when there are no more messages to read from the compacted topic, i.e. the compaction cursor has reached the end of the topic.

(cherry picked from commit d49415e)
…he#13477)

Motivation
Currently, if the roles in the token are empty, the `MultiRolesTokenAuthorizationProvider` has problems processing it: it keeps waiting on an empty list of futures, eventually causing the operation to time out.

Modification
* In `MultiRolesTokenAuthorizationProvider.authorize`, return false immediately when the roles are empty (see the sketch below).
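A hedged sketch of the early return (illustrative class; the real provider checks each role against the underlying authorization logic):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only; not the actual MultiRolesTokenAuthorizationProvider code.
class MultiRolesAuthorizeSketch {
    CompletableFuture<Boolean> authorize(List<String> roles) {
        if (roles == null || roles.isEmpty()) {
            // Previously an empty role list meant waiting on an empty set of
            // futures and eventually timing out; now deny immediately.
            return CompletableFuture.completedFuture(false);
        }
        return anyRoleAuthorized(roles);
    }

    private CompletableFuture<Boolean> anyRoleAuthorized(List<String> roles) {
        return CompletableFuture.completedFuture(true); // placeholder
    }
}
```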

Signed-off-by: Zike Yang <zkyang@streamnative.io>
(cherry picked from commit 4f942d7)
…che#13501)

### Motivation

It's an `AdditionalServlet`-side change, similar to apache#11270

### Modifications

Change the context class loader via `Thread.currentThread().setContextClassLoader(classLoader)` before every additional servlet callback, and change it back to the original class loader afterwards.
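A hedged sketch of the class-loader swap as a generic wrapper (not the actual plugin plumbing):

```java
import java.util.concurrent.Callable;

// Generic illustration of setting the plugin's class loader around a callback
// and always restoring the previous one.
class ContextClassLoaderSwitchSketch {
    static <T> T callWithClassLoader(ClassLoader pluginClassLoader, Callable<T> callback) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(pluginClassLoader);
        try {
            return callback.call();
        } finally {
            // Restore the original loader even if the callback throws.
            current.setContextClassLoader(previous);
        }
    }
}
```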

(cherry picked from commit 40ac793)
* Fix NPE when unloading namespace bundle

(cherry picked from commit ec0a440)
* Upgrade gson version 2.8.6 to 2.8.9

* fix license

(cherry picked from commit 937131a)
* update log content

* update log content

* check style

(cherry picked from commit 825d2fe)
* Fix time field use error

* rebuild

Co-authored-by: liu.changqing <changqing_l@kingdee.com>
(cherry picked from commit 793dd91)
merlimat and others added 30 commits May 12, 2022 21:12
…eation failure (apache#15570)

* Fix deadlock in broker after race condition in topic creation failure

* Fixed checkstyle
…che#15646)

- release notes https://netty.io/news/2022/05/06/2-1-77-Final.html
  - improves Alpine / musl compatibility
    - could help issues such as apache#14534 apache#11415 apache#11224 apache#10798
  - improves shading compatibility
  - fixes a bug related to the native epoll transport and epoll_pwait2

(cherry picked from commit a8045fc)
…io/batch-data-generator and Jcloud offloader (apache#14150)

(cherry picked from commit 25ccf45)
…O, Offloaders and Pulsar SQL - Bump Guice to 5.1.0 (apache#14300)

* Remove --illegal-access errors resulting from Google Guice - Batch Data Generator connector

* and jcloud-shaded

* use dependencyManagement

* fix pulsar-sql

(cherry picked from commit 332eca8)
…ete. (apache#14818)

If the configuration `patternAutoDiscoveryPeriod` is small, there may be some unnecessary subscribe requests.


(cherry picked from commit 0fe921f)
…pache#15161)

Fixes apache#15078

When a partitioned producer is created and some of the partitions
failed to create, `closeAsync` will be called immediately, even if other
partitions are still in the process of creating their associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

Only call `closeAsync` after all single producers have either failed or
succeeded, and only if one of them failed. Also ensure
`partitionedProducerCreatedPromise_` is completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota is exceeded. Then add a
`testBacklogQuotasExceeded` test that creates some backlog by creating a
consumer and sending some messages to one partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` returns the
correct error instead of `ResultUnknownError`.

(cherry picked from commit 0f85596)
…e#14742)

Fixes apache#14459

See the issue

1. Add WebExecutorStats to record web thread pool metrics

(cherry picked from commit 32d7a51)
…ersistence correctly (apache#15206)

In the current implementation, the first time `purgeInactiveProducers` executes, producers are removed directly from the collection (line 464) even though they have not expired. This will result in these producers never being removed.

https://github.com/apache/pulsar/blob/9861dfb1208c4b6b8a1f17ef026e9af71c3e784c/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/MessageDeduplication.java#L454-L472

1. Remove a producer from the collection only when it is actually inactive (a sketch of the corrected purge follows the snippet below).
2. Take a snapshot after each removal of an inactive producer. When `managedLedger.getLastConfirmedEntry` equals `managedCursor.getMarkDeletedPosition()`, the `deduplication-snapshot-monitor` thread does not trigger a snapshot, so these producers would otherwise only be persisted the next time a message is produced, which can be confusing for users.

```
        PositionImpl position = (PositionImpl) managedLedger.getLastConfirmedEntry();
        if (position == null) {
            return;
        }
        PositionImpl markDeletedPosition = (PositionImpl) managedCursor.getMarkDeletedPosition();
        if (markDeletedPosition != null && position.compareTo(markDeletedPosition) <= 0) {
            return;
        }
```
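A hedged sketch of the corrected purge behavior described in points 1 and 2 (hypothetical types; the real logic lives in `MessageDeduplication`):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only; a producer-name -> last-active-timestamp map stands in for
// the broker's inactive-producer bookkeeping.
class DeduplicationPurgeSketch {
    private final Map<String, Long> inactiveProducers = new ConcurrentHashMap<>();
    private final long inactiveTimeoutMs = 6 * 60 * 60 * 1000L;

    void purgeInactiveProducers() {
        long now = System.currentTimeMillis();
        boolean removed = false;
        Iterator<Map.Entry<String, Long>> it = inactiveProducers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            // Only drop producers that really exceeded the inactivity timeout.
            if (now - entry.getValue() >= inactiveTimeoutMs) {
                it.remove();
                removed = true;
            }
        }
        if (removed) {
            // Take a snapshot right away so the removal is persisted without
            // waiting for the next produced message.
            takeSnapshot();
        }
    }

    private void takeSnapshot() {
        // placeholder for persisting the deduplication snapshot
    }
}
```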

(cherry picked from commit 8e1ca48)
…erly since writeAndFlush is asynchronous (apache#15384)

(cherry picked from commit cd3816a)
…pache#15415)

* [Proxy] Remove unnecessary blocking DNS lookup in LookupProxyHandler

* Use existing code pattern for creating address

(cherry picked from commit 7373a51)
…#15340)

* Add config of IO and acceptor threads in proxy

* Update doc

* Update site2/docs/reference-configuration.md

Co-authored-by: Anonymitaet <50226895+Anonymitaet@users.noreply.github.com>

* Update site2/docs/reference-configuration.md

Co-authored-by: Anonymitaet <50226895+Anonymitaet@users.noreply.github.com>
(cherry picked from commit da3f017)
…orizationProvider (apache#14857)

### Motivation

Currently, `MultiRolesTokenAuthorizationProvider` doesn't support handling a single string-type role; it returns an empty role list in that case. This PR adds support for handling the string-type role, and also adds support for handling non-JWT tokens.

### Modifications

* Add support for handling the string-type role (see the sketch below)
* Add support for handling non-JWT tokens
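A hedged sketch of reading a roles claim that may be either a single string or an array (illustrative parsing only, not the actual provider code, which reads the configured token claim):

```java
import java.util.Collections;
import java.util.List;

// Illustrative handling of a claim value that can be a plain string or a list.
class RolesClaimSketch {
    @SuppressWarnings("unchecked")
    static List<String> toRoles(Object claimValue) {
        if (claimValue == null) {
            return Collections.emptyList();
        }
        if (claimValue instanceof String) {
            // Single string-type role, previously treated as "no roles".
            return Collections.singletonList((String) claimValue);
        }
        if (claimValue instanceof List) {
            return (List<String>) claimValue;
        }
        return Collections.emptyList();
    }
}
```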

### Verifying this change

This change is already covered by existing tests, such as *testMultiRolesAuthzWithSingleRole*.

(cherry picked from commit 8bf6785)