Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid accessing a null ClientConnection instance #317

Merged

Conversation

BewareMyPower
Copy link
Contributor

Motivation

We observed server null ClientConnection accesses in test environment. See the this=0x0 outputs in the following two typical stacks.

#8  bytesWritten (this=0xb8, size=371) at lib/SharedBuffer.h:166
#9  pulsar::ClientConnection::handleRead (this=0x0, err=..., bytesTransferred=371, minReadSize=4) at lib/ClientConnection.cc:609
#12 0x00007f33202933d2 in unique_lock (__m=..., this=0x7f3311c82800) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_mutex.h:197
#13 pulsar::ClientConnection::sendPendingCommands (this=0x0) at lib/ClientConnection.cc:1071
#14 0x00007f3320293d2d in pulsar::ClientConnection::handleSendPair (this=0x0, err=...) at lib/ClientConnection.cc:1066

Though shared_from_this() is always passed to the std::bind function, when the method of ClientConnection is called, the pointer is still null.

Modifications

First, replace all std::bind calls with the lambda expression that catches std::weak_ptr<ClientConnection> and perform null checks explicitly on the value returned by the lock() method.

Since now all asio callbacks don't hold a shared_ptr, the owner of the ClientConnection object should be ConnectionPool, i.e. the pool maintains some connections, while all asio callbacks use weak_ptr to test if the connection is present.

Second, make ClientConnection::getConnection return shared_ptr rather than weak_ptr so that the caller side does not need to check if lock() returns null in the callback of this future.

We cannot make ConnectionPool::getConnectionAsync return shared_ptr because it could return the future of connectPromise_, which is hold by ClientConnection itself. We should avoid holding a shared_ptr of ClientConnection because its owner is ConnectionPool.

@BewareMyPower BewareMyPower added the bug Something isn't working label Sep 18, 2023
@BewareMyPower BewareMyPower added this to the 3.4.0 milestone Sep 18, 2023
@BewareMyPower BewareMyPower self-assigned this Sep 18, 2023
@BewareMyPower BewareMyPower requested review from merlimat, shibd and RobertIndie and removed request for merlimat and shibd September 18, 2023 13:45
### Motivation

We observed server null `ClientConnection` accesses in test environment.
See the `this=0x0` outputs in the following two typical stacks.

```
#8  bytesWritten (this=0xb8, size=371) at lib/SharedBuffer.h:166
#9  pulsar::ClientConnection::handleRead (this=0x0, err=..., bytesTransferred=371, minReadSize=4) at lib/ClientConnection.cc:609
```

```
#12 0x00007f33202933d2 in unique_lock (__m=..., this=0x7f3311c82800) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_mutex.h:197
#13 pulsar::ClientConnection::sendPendingCommands (this=0x0) at lib/ClientConnection.cc:1071
#14 0x00007f3320293d2d in pulsar::ClientConnection::handleSendPair (this=0x0, err=...) at lib/ClientConnection.cc:1066
```

Though `shared_from_this()` is always passed to the `std::bind`
function, when the method of `ClientConnection` is called, the pointer
is still `null`.

### Modifications

First, replace all `std::bind` calls with the lambda expression that
catches `std::weak_ptr<ClientConnection>` and perform null checks
explicitly on the value returned by the `lock()` method.

Since now all asio callbacks don't hold a `shared_ptr`, the owner of
the `ClientConnection` object should be `ConnectionPool`, i.e. the pool
maintains some connections, while all asio callbacks use `weak_ptr` to
test if the connection is present.

Second, make `ClientConnection::getConnection` return `shared_ptr`
rather than `weak_ptr` so that the caller side does not need to check if
`lock()` returns null in the callback of this future.

We cannot make `ConnectionPool::getConnectionAsync` return `shared_ptr`
because it could return the future of `connectPromise_`, which is hold
by `ClientConnection` itself. We should avoid holding a `shared_ptr` of
`ClientConnection` because its owner is `ConnectionPool`.
@BewareMyPower BewareMyPower merged commit b35ae1a into apache:main Sep 20, 2023
11 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/refactor-cnx-ref branch September 20, 2023 02:51
BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this pull request Oct 8, 2023
Fixes apache#325

### Motivation

apache#317 introduces a bug
that might cause segmentation fault when sending messages after
receiving an error, see
apache#325 (comment)
for the detailed explanation.

### Modifications

When calling `asyncWrite`, capture the `shared_ptr` instead of the
`weak_ptr` to extend the lifetime of the `socket_` or `tlsSocket_` field
in `ClientConnection`. Since the lifetime is extended, in some
callbacks, check `isClosed()` before other logic.

Add a `ChunkDedupTest` to reproduce this issue based on Pulsar 3.1.0.
Run the test for 10 times to ensure it won't crash after this patch.
BewareMyPower added a commit to streamnative/pulsar-client-cpp that referenced this pull request Oct 8, 2023
…apache#326)

Fixes apache#325

### Motivation

apache#317 introduces a bug
that might cause segmentation fault when sending messages after
receiving an error, see
apache#325 (comment)
for the detailed explanation.

### Modifications

When calling `asyncWrite`, capture the `shared_ptr` instead of the
`weak_ptr` to extend the lifetime of the `socket_` or `tlsSocket_` field
in `ClientConnection`. Since the lifetime is extended, in some
callbacks, check `isClosed()` before other logic.

Add a `ChunkDedupTest` to reproduce this issue based on Pulsar 3.1.0.
Run the test for 10 times to ensure it won't crash after this patch.
BewareMyPower added a commit that referenced this pull request Oct 9, 2023
…#326)

Fixes #325

### Motivation

#317 introduces a bug
that might cause segmentation fault when sending messages after
receiving an error, see
#325 (comment)
for the detailed explanation.

### Modifications

When calling `asyncWrite`, capture the `shared_ptr` instead of the
`weak_ptr` to extend the lifetime of the `socket_` or `tlsSocket_` field
in `ClientConnection`. Since the lifetime is extended, in some
callbacks, check `isClosed()` before other logic.

Add a `ChunkDedupTest` to reproduce this issue based on Pulsar 3.1.0.
Run the test for 10 times to ensure it won't crash after this patch.
BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this pull request Nov 16, 2023
Fixes apache#348

Fixes apache#349

### Motivation

When `close` is called in `ClientConnection`'s constructor,
`shared_from_this()` will be called, which results in a
`std::bad_weak_ptr` error. This error does not happen before
apache#317 because
`shared_from_this()` could only be called when the `producers` or
`consumers` field is not empty.

### Modifications

Modify the 2nd parameter of `ClientConnection::close` to represent if
the construction completes. If not, just set the state to
`Disconnected` and complete the future to the result. Then
`ConnectionPool::getConnectionAsync` will return a future that completes
with the failed result.

In addition, check `authentication_` even for non-TLS URLs. Otherwise,
the null authentication will be used to construct `CommandConnect`.

Add `testInvalidPlugin` and `testTlsConfigError` to verify the changes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this pull request Nov 17, 2023
Fixes apache#348

Fixes apache#349

### Motivation

When `close` is called in `ClientConnection`'s constructor,
`shared_from_this()` will be called, which results in a
`std::bad_weak_ptr` error. This error does not happen before
apache#317 because
`shared_from_this()` could only be called when the `producers` or
`consumers` field is not empty.

### Modifications

Modify the 2nd parameter of `ClientConnection::close` to represent if
the construction completes. If not, just set the state to
`Disconnected` and complete the future to the result. Then
`ConnectionPool::getConnectionAsync` will return a future that completes
with the failed result.

In addition, check `authentication_` even for non-TLS URLs. Otherwise,
the null authentication will be used to construct `CommandConnect`.

Add `testInvalidPlugin` and `testTlsConfigError` to verify the changes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this pull request Nov 17, 2023
Fixes apache#348

Fixes apache#349

### Motivation

When `close` is called in `ClientConnection`'s constructor,
`shared_from_this()` will be called, which results in a
`std::bad_weak_ptr` error. This error does not happen before
apache#317 because
`shared_from_this()` could only be called when the `producers` or
`consumers` field is not empty.

### Modifications

Add a `ResultException` and throw it if there is a failure in
`ClientConnection`'s constructor and catch it in
`ConnectionPool::getConnectionAsync`. Then retrieve the result and
return the failed future.

Since `close` is now always called on a constructed `ClientConnection`
object, remove the 2nd parameter.

In addition, check `authentication_` even for non-TLS URLs. Otherwise,
the null authentication will be used to construct `CommandConnect`.

Add `testInvalidPlugin` and `testTlsConfigError` to verify the changes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this pull request Nov 17, 2023
Fixes apache#348

Fixes apache#349

### Motivation

When `close` is called in `ClientConnection`'s constructor,
`shared_from_this()` will be called, which results in a
`std::bad_weak_ptr` error. This error does not happen before
apache#317 because
`shared_from_this()` could only be called when the `producers` or
`consumers` field is not empty.

### Modifications

Throw `ResultAuthenticationError` when there is anything wrong with
authentication in `ClientConnection`'s constructor. Then catch the
result in `ConnectionPool::getConnectionAsync` and returned the failed
future.

In addition, check `authentication_` even for non-TLS URLs. Otherwise,
the null authentication will be used to construct `CommandConnect`.

Add `testInvalidPlugin` and `testTlsConfigError` to verify the changes.
BewareMyPower added a commit that referenced this pull request Nov 17, 2023
)

Fixes #348

Fixes #349

### Motivation

When `close` is called in `ClientConnection`'s constructor,
`shared_from_this()` will be called, which results in a
`std::bad_weak_ptr` error. This error does not happen before
#317 because
`shared_from_this()` could only be called when the `producers` or
`consumers` field is not empty.

### Modifications

Throw `ResultAuthenticationError` when there is anything wrong with
authentication in `ClientConnection`'s constructor. Then catch the
result in `ConnectionPool::getConnectionAsync` and returned the failed
future.

In addition, check `authentication_` even for non-TLS URLs. Otherwise,
the null authentication will be used to construct `CommandConnect`.

Add `testInvalidPlugin` and `testTlsConfigError` to verify the changes.
BewareMyPower added a commit that referenced this pull request Nov 17, 2023
)

Fixes #348

Fixes #349

### Motivation

When `close` is called in `ClientConnection`'s constructor,
`shared_from_this()` will be called, which results in a
`std::bad_weak_ptr` error. This error does not happen before
#317 because
`shared_from_this()` could only be called when the `producers` or
`consumers` field is not empty.

### Modifications

Throw `ResultAuthenticationError` when there is anything wrong with
authentication in `ClientConnection`'s constructor. Then catch the
result in `ConnectionPool::getConnectionAsync` and returned the failed
future.

In addition, check `authentication_` even for non-TLS URLs. Otherwise,
the null authentication will be used to construct `CommandConnect`.

Add `testInvalidPlugin` and `testTlsConfigError` to verify the changes.

(cherry picked from commit 7bb94f4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants