
By @Ayanda-D: stop QQ replicas when a QQ is forced to shrink to a single replica #12467

Closed
wants to merge 33 commits

Conversation

michaelklishin
Member

This is #12427 by @Ayanda-D.

Shrinking operations did not stop QQ replicas. This was easy to miss because QQs are usually shrunk before a node is removed from the cluster.

However, there is a scenario where this is not the case. If some nodes (replicas) need to be replaced, in particular when a majority of nodes cannot be recovered for any reason, the recovery process will involve shrinking a QQ to just one member so that it has an online quorum (of 1 node out of 1) before new replicas can be added.

For this to succeed, the older replicas must be stopped and deleted from the (QQ) cluster.
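
To illustrate the cleanup this change performs, here is a minimal sketch that stops and deletes the Ra servers of the removed replicas, assuming a replica's Ra server id is `{QueueName, Node}` and that quorum queues run in a Ra system named `quorum_queues`; the function name and error handling are illustrative, not the actual implementation:

    %% Illustrative only: stop and delete the Ra servers of the replicas
    %% that were dropped when the queue was shrunk to a single member.
    delete_stale_replicas(QueueName, Nodes) ->
        lists:foreach(
          fun(Node) ->
                  ServerId = {QueueName, Node},
                  %% force_delete_server/2 stops the server and removes its
                  %% data even when no quorum is available.
                  ok = ra:force_delete_server(quorum_queues, ServerId)
          end, Nodes).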

Ayanda-D and others added 30 commits October 2, 2024 13:47
…r cluster

wide consistency, ensuring only the leader is active/running
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 2.1.5 to 2.1.6.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](google-github-actions/auth@v2.1.5...v2.1.6)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
The `dict:dict()` typing of `rabbit_binding` appears to be a historical
artifact. `dict` has been superseded by `maps`. Switching to a map
makes deletions easier to inspect manually and faster. And if
deletions ever grow large enough for the map representation to matter,
manipulating them is unlikely to be expensive compared to the
operations that produced them, so performance is probably irrelevant.

This commit refactors the bottom section of the `rabbit_binding` module
to switch to a map, change the `deletions()` type to an opaque type
(eliminating a TODO created back on Erlang/OTP 17.1), and turn the
deletion value into a record. We eliminate some historical artifacts and "cruft":

* Deletions no longer needlessly take multiple forms; specifically, the shape
  `{X, deleted | not_deleted, Bindings, none}` is no longer
  handled. `process_deletions/2` was responsible for creating this
  shape. Instead we now use a record to clearly define the fields.
* Clauses to catch `{error, not_found}` are unnecessary after minor
  refactors of the callers. Removing them makes the type specs cleaner.
* `rabbit_binding:process_deletions/1` has no need to update or change
  the deletions. It now uses `maps:foreach/2` and returns `ok` instead
  of the mapped deletions.
* Remove `undefined` from the typespec of deletions. This value is no
  longer possible with a refactor to `maybe_auto_delete_exchange_in_*`
  functions for Mnesia and Khepri. The value was nonsensical since you
  cannot delete bindings for an exchange that does not exist.
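
A minimal sketch of the resulting shape, with a hypothetical `#deletion{}` record and a `notify/3` helper standing in for the module's actual definitions:

    %% Deletions keyed by exchange name; the value is a record rather than
    %% the old {X, deleted | not_deleted, Bindings, none} tuple.
    -record(deletion, {exchange, deleted = false, bindings = []}).

    -opaque deletions() :: #{XName :: term() => #deletion{}}.

    %% process_deletions/1 only performs side effects, so it can walk the
    %% map with maps:foreach/2 and return ok instead of mapped deletions.
    process_deletions(Deletions) ->
        maps:foreach(
          fun(_XName, #deletion{exchange = X, deleted = Deleted, bindings = Bs}) ->
                  notify(X, Deleted, Bs)
          end, Deletions),
        ok.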
[Why]
Before this change, the controller was looping on all feature flags to
enable, then for each:
1. it checked if it was supported
2. it acquired the registry lock
3. it enabled the feature flag
4. it released the registry lock

It was done this way to avoid acquiring the lock if the feature flag
was unsupported in the first place.

However, this put more load on the lock mechanism.

[How]
This commit changes the order. The controller acquires the registry lock
once, then loops on feature flags to enable. The support check is now
under the registry lock.
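
A rough sketch of that reordering, with hypothetical names (`acquire_registry_lock/0`, `enable_one/1`, and so on) standing in for the controller's actual functions:

    %% Before: the lock was acquired and released once per feature flag.
    enable_each(FeatureFlags) ->
        lists:foreach(
          fun(FF) ->
                  case is_supported(FF) of
                      true ->
                          ok = acquire_registry_lock(),
                          try enable_one(FF) after release_registry_lock() end;
                      false ->
                          ok
                  end
          end, FeatureFlags).

    %% After: the lock is acquired once; the support check moves under it.
    enable_all(FeatureFlags) ->
        ok = acquire_registry_lock(),
        try
            lists:foreach(
              fun(FF) ->
                      case is_supported(FF) of
                          true  -> enable_one(FF);
                          false -> ok
                      end
              end, FeatureFlags)
        after
            release_registry_lock()
        end.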
The subsystem didn't exist before 2019. The deprecated features support
was added in 2023.
…nit logged list

[Why]
Showing that required feature flags are enabled over and over is not
useful and only adds noise to the logs.

[How]
Required feature flags and removed deprecated features are no longer
listed explicitly. We just log their respective counts so it is still
clear that they exist.

Before:
    list of feature flags found:
      [x] classic_mirrored_queue_version
      [x] classic_queue_type_delivery_support
      [x] direct_exchange_routing_v2
      [x] feature_flags_v2
      [x] implicit_default_bindings
      [ ] khepri_db
      [x] message_containers_deaths_v2
      [x] quorum_queue_non_voters
      [~] rabbit_exchange_type_local_random
      [x] rabbitmq_4.0.0
      ...
    list of deprecated features found:
      [ ] amqp_address_v1
      [x] classic_queue_mirroring
      [ ] global_qos
      [ ] queue_master_locator
      [ ] ram_node_type
      [ ] transient_nonexcl_queues

After:
    list of feature flags found:
      [ ] khepri_db
      [x] message_containers_deaths_v2
      [x] quorum_queue_non_voters
      [~] rabbit_exchange_type_local_random
      [x] rabbitmq_4.0.0
    list of deprecated features found:
      [ ] amqp_address_v1
      [ ] global_qos
      [ ] queue_master_locator
      [ ] ram_node_type
      [ ] transient_nonexcl_queues
    required feature flags not listed above: 18
    removed deprecated features not listed above: 1
[Why]
The inventory map is huge and difficult to read when it is logged as
is.

[How]
Logging a matrix is much more compact and to the point.

Before:
    Feature flags: inventory of node `rabbit-1@giotto`:
    #{feature_flags =>
          #{rabbit_exchange_type_local_random =>
                #{name => rabbit_exchange_type_local_random,
                  desc => "Local random exchange",stability => stable,
                  provided_by => rabbit},
            message_containers_deaths_v2 =>
                #{name => message_containers_deaths_v2,
                  desc => "Bug fix for dead letter cycle detection",
    ...

After:
    Feature flags: inventory queried from node `rabbit-2@giotto`:
                                           ,-- rabbit-2@giotto
                                           |
                         amqp_address_v1:
          classic_mirrored_queue_version:  x
                 classic_queue_mirroring:  x
     classic_queue_type_delivery_support:  x
    ...
[Why]
They just add noise to the UI and there is nothing the user can do about
them at that point.

Given their number will only increase, let's hide them to let the user
focus on the feature flags they can act on.
Bumps `junit.jupiter.version` from 5.11.1 to 5.11.2.

Updates `org.junit.jupiter:junit-jupiter-engine` from 5.11.1 to 5.11.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](junit-team/junit5@r5.11.1...r5.11.2)

Updates `org.junit.jupiter:junit-jupiter-params` from 5.11.1 to 5.11.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](junit-team/junit5@r5.11.1...r5.11.2)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps `junit.jupiter.version` from 5.11.1 to 5.11.2.

Updates `org.junit.jupiter:junit-jupiter-engine` from 5.11.1 to 5.11.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](junit-team/junit5@r5.11.1...r5.11.2)

Updates `org.junit.jupiter:junit-jupiter-params` from 5.11.1 to 5.11.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](junit-team/junit5@r5.11.1...r5.11.2)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [org.junit.jupiter:junit-jupiter-params](https://github.com/junit-team/junit5) from 5.11.1 to 5.11.2.
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](junit-team/junit5@r5.11.1...r5.11.2)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [org.junit.jupiter:junit-jupiter](https://github.com/junit-team/junit5) from 5.11.1 to 5.11.2.
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](junit-team/junit5@r5.11.1...r5.11.2)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
As suggested by @johanrhodin in #12454.

This keeps the Prometheus plugin part but
marks it as deprecated. We can remove it in
4.1.
For example, during startup after RabbitMQ was upgraded but an
enabled community plugin was not, and the plugin's broker version
requirement is no longer met, RabbitMQ still started the plugin
after logging an error.
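
A minimal sketch of the intended behavior, using a hypothetical `broker_version_supported/2` check rather than the plugin subsystem's actual API:

    %% Illustrative: refuse to start a plugin whose declared broker version
    %% requirement is no longer satisfied, instead of merely logging an error.
    maybe_start_plugin(PluginName, BrokerVersion) ->
        case broker_version_supported(PluginName, BrokerVersion) of
            true ->
                start_plugin(PluginName);
            false ->
                logger:error("Plugin ~ts does not support broker version ~ts, skipping",
                             [PluginName, BrokerVersion]),
                {error, broker_version_mismatch}
        end.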
The problem comes from `ct_master` which doesn't tell us
in the return value whether the tests succeeded. In order
to get that information a CT hook was created. But then
we run into another problem: despite its documentation
claiming otherwise, `ct_master` does not handle `ct_hooks`
instructions in the test spec.

So for the time being we fork `ct_master` into a new
`ct_master_fork` module and insert our hook directly
in the code. Later on we will submit patches to OTP.
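
For context, a bare-bones Common Test hook of the kind described, recording failures so the caller can tell whether the run succeeded; the module name and how the result is persisted are illustrative:

    -module(results_hook).
    %% Callback names follow the ct_hooks behaviour.
    -export([init/2, on_tc_fail/4, terminate/1]).

    init(_Id, _Opts) ->
        {ok, #{failures => []}}.

    %% Remember every failed test case so success/failure can be reported
    %% back to whatever drove the run (here, the ct_master fork).
    on_tc_fail(Suite, TC, Reason, #{failures := Fs} = State) ->
        State#{failures := [{Suite, TC, Reason} | Fs]}.

    terminate(#{failures := []}) ->
        ok;
    terminate(#{failures := Fs}) ->
        %% Illustrative: write the failures somewhere the caller can read them.
        file:write_file("ct_failures.log", io_lib:format("~p~n", [Fs])).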
Fixes a pattern matching bug for discards that come in after a consumer
has been cancelled. Because the rabbit_fifo_client does not keep
the integer consumer key after cancellation, late acks, returns, and
discards use the full {CTag, Pid} consumer id version.

As this is a state machine change the machine version has been
increased to 5.

The same bug is also present for the `modify` command; however, as
AMQP does not allow late settlements, it cannot happen there, so we do
not have to make that fix conditional on the machine version.
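
A rough illustration of the two consumer identifier shapes mentioned above; the function and the `Consumers` map are illustrative, not rabbit_fifo's actual state:

    %% After cancellation the client no longer holds the integer consumer
    %% key, so a late ack, return or discard may carry either shape.
    consumer_key(Key, _Consumers) when is_integer(Key) ->
        Key;
    consumer_key({CTag, Pid} = ConsumerId, Consumers)
      when is_binary(CTag), is_pid(Pid) ->
        %% Illustrative lookup from the full consumer id back to its key.
        maps:get(ConsumerId, Consumers, undefined).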
* Support AMQP filter expressions

 ## What?

This PR implements the following property filter expressions for AMQP clients
consuming from streams as defined in
[AMQP Filter Expressions Version 1.0 Working Draft 09](https://groups.oasis-open.org/higherlogic/ws/public/document?document_id=66227):
* properties filters [section 4.2.4]
* application-properties filters [section 4.2.5]

String prefix and suffix matching is also supported.

This PR also fixes a bug where RabbitMQ would accept wrong filters.
Specifically, prior to this PR the values of the filter-set's map were
allowed to be symbols. However, "every value MUST be either null or of a
described type which provides the archetype filter."

 ## Why?

This feature gives RabbitMQ the ability to have multiple concurrent clients
each consume only a subset of messages while maintaining message order.

This feature also reduces network traffic between RabbitMQ and clients by
only dispatching those messages that the clients are actually interested in.

Note that AMQP filter expressions are more fine grained than the [bloom filter based
stream filtering](https://www.rabbitmq.com/blog/2023/10/16/stream-filtering) because
* they do not suffer from false positives
* the unit of filtering is per-message instead of per-chunk
* matching can be performed on **multiple** values in the properties and
  application-properties sections
* prefix and suffix matching on the actual values is supported.

Both AMQP filter expressions and bloom filters can be used together.

 ## How?

If a filter isn't valid, RabbitMQ ignores it. RabbitMQ replies only
with the filters it actually supports and validated successfully, to
comply with:
"The receiving endpoint sets its desired filter, the sending endpoint
[RabbitMQ] sets the filter actually in place (including any filters defaulted at
the node)."
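
To make the validation concrete, here is an illustrative Erlang shape for a filter-set entry, assuming the `{described, Descriptor, Value}` / `{symbol, ...}` / `{utf8, ...}` term representation of RabbitMQ's AMQP 1.0 codec; the filter name and descriptor below are made up for the example:

    %% Accepted: the value is null or a described type (descriptor + value).
    Valid = #{{symbol, <<"my-filter">>} =>
                  {described, {symbol, <<"amqp:properties-filter">>},
                   {map, [{{symbol, <<"subject">>}, {utf8, <<"orders">>}}]}}},

    %% Rejected (the bug fixed here, later kept behind a deprecated feature
    %% flag): a bare symbol where a described type is required.
    Invalid = #{{symbol, <<"my-filter">>} => {symbol, <<"orders">>}}.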

* Delete streams test case

The test suite constructed a wrong filter-set.
Specifically, the value of the filter-set didn't use a described type as
mandated by the spec.
Using https://azure.github.io/amqpnetlite/api/Amqp.Types.DescribedValue.html
throws errors saying the descriptor can't be encoded. Given that this code
path is already tested via the amqp_filtex_SUITE, this F# test is
therefore deleted.

* Re-introduce the AMQP filter-set bug

Since clients might rely on the wrong filter-set value type, we keep
supporting the bug behind a deprecated feature flag and will gradually
remove support for it.

* Revert "Delete streams test case"

This reverts commit c95cfea.
dumbbell and others added 3 commits October 7, 2024 12:30
…RE_FLAGS

[Why]
Before this patch, the $RABBITMQ_FEATURE_FLAGS environment variable took
an exhaustive list of feature flags to enable. This list overrode the
default of enabling all stable feature flags.

It made it inconvenient when a user wanted to enable an experimental
feature flag like `khepri_db` while still leaving the default behavior.

[How]
$RABBITMQ_FEATURE_FLAGS now accepts the following syntax:

    RABBITMQ_FEATURE_FLAGS=+feature1,-feature2

This will start RabbitMQ with all stable feature flags, plus `feature1`,
but without `feature2`.

For users setting `forced_feature_flags_on_init` in the config, the
corresponding syntax is:

    {forced_feature_flags_on_init, {rel, [feature1], [feature2]}}
[Why]
Its use was removed when the registry was converted from a compiled
module to a persistent_term.
@michaelklishin
Member Author

This will require a manual backport.

@michaelklishin michaelklishin deleted the rabbitmq-server-12427 branch October 7, 2024 18:23
@michaelklishin
Member Author

#12468
