Fix a race condition in CompositeEndpointGroup #6220

Merged (3 commits) on May 28, 2025

@minwoox minwoox commented Apr 25, 2025

Motivation:
CompositeEndpointGroup has a race condition that can occur when multiple child EndpointGroups notify listeners at the same time from different threads. Consider the following sequence of events involving two groups, A and B:

  1. A's listener is invoked by thread A.
  2. Thread A sets dirty = true.
  3. Thread A sets dirty = false.
  4. Thread A builds newEndpointsA.
    final ImmutableList.Builder<Endpoint> newEndpoints = ImmutableList.builder();
    for (EndpointGroup endpointGroup : endpointGroups) {
        newEndpoints.addAll(endpointGroup.endpoints());
    }
  5. B's listener is invoked by thread B.
  6. Thread B sets dirty = true.
  7. Thread B sets dirty = false.
  8. Thread B builds newEndpointsB (which includes latest data from both A and B) and sets it.
  9. Thread A sets the newEndpointsA, which does not contain updates from B.

This results in stale endpoints being set, despite a newer state already being computed and applied.
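
The interleaving above can be replayed deterministically on a single thread. The following is only a minimal sketch: the class, the field names, and the string endpoints are hypothetical stand-ins and not Armeria's code. It shows that an unconditional write in step 9 overwrites the newer snapshot published in step 8.

```java
import java.util.ArrayList;
import java.util.List;

// Single-threaded replay of the nine steps. All names are hypothetical.
class StaleWriteReplay {
    // Stand-ins for the endpoints currently held by child groups A and B.
    static List<String> groupA = List.of("a1");
    static List<String> groupB = List.of("b1");
    // Stand-in for the merged list published by the composite group.
    static volatile List<String> merged = List.of();

    // Mirrors the loop in step 4: concatenate the endpoints of every child group.
    static List<String> snapshot() {
        final List<String> out = new ArrayList<>(groupA);
        out.addAll(groupB);
        return List.copyOf(out);
    }

    static List<String> replay() {
        groupA = List.of("a1");
        groupB = List.of("b1");
        merged = List.of();

        // Steps 1-4: thread A rebuilds after A changed, but has not published yet.
        groupA = List.of("a2");
        final List<String> newEndpointsA = snapshot(); // [a2, b1]

        // Steps 5-8: thread B rebuilds after B changed and publishes first.
        groupB = List.of("b2");
        merged = snapshot(); // [a2, b2]

        // Step 9: thread A publishes its older snapshot unconditionally.
        merged = newEndpointsA; // B's update is lost
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [a2, b1]
    }
}
```

The final value lacks `b2` even though a snapshot containing it had already been published, which is the stale state described above.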

Modifications:

  • Fixed the race condition in CompositeEndpointGroup.

Result:

  • CompositeEndpointGroup now handles concurrent updates correctly.

@minwoox minwoox added the defect label Apr 25, 2025
@minwoox minwoox added this to the 1.33.0 milestone Apr 25, 2025

@jrhee17 jrhee17 left a comment

The current implementation looks good in terms of correctness.

Alternatively, I think a lock-based implementation could help avoid comparing endpoint lists which might be large.
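
For illustration, a lock-based rebuild could look roughly like the sketch below. It is an assumption about what such an alternative might be, not code from this PR: `LockedRebuild`, the `Supplier`-based children, and the string endpoints are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical lock-based variant: rebuild and publish happen under one lock,
// so a slower thread can never overwrite a newer snapshot with an older one.
class LockedRebuild {
    private final List<Supplier<List<String>>> children;
    private final ReentrantLock lock = new ReentrantLock();
    private volatile List<String> merged = List.of();

    LockedRebuild(List<Supplier<List<String>>> children) {
        this.children = List.copyOf(children);
    }

    // Called from any child's listener thread.
    void rebuild() {
        lock.lock();
        try {
            final List<String> out = new ArrayList<>();
            for (Supplier<List<String>> child : children) {
                out.addAll(child.get()); // child code runs while the lock is held
            }
            merged = List.copyOf(out);
        } finally {
            lock.unlock();
        }
    }

    List<String> endpoints() {
        return merged;
    }

    public static void main(String[] args) {
        final Supplier<List<String>> a = () -> List.of("a1");
        final Supplier<List<String>> b = () -> List.of("b1");
        final LockedRebuild group = new LockedRebuild(List.of(a, b));
        group.rebuild();
        System.out.println(group.endpoints()); // prints [a1, b1]
    }
}
```

The trade-off in this sketch is that each child's endpoint lookup executes while the lock is held, so a slow or blocking child delays every other rebuild.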

minwoox commented May 9, 2025

> a lock-based implementation

I wanted to avoid using a lock since we don't know what's inside endpointGroup.endpoints().

> avoid comparing endpoint lists which might be large.

CAS uses the references when comparing, so it's no big deal. 😉
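
A small sketch of that point, using `java.util.concurrent.atomic.AtomicReference` (the class and variable names are hypothetical): `compareAndSet` checks the expected value with `==`, so its cost does not depend on the list size, and an equal but distinct list does not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Demonstrates that AtomicReference.compareAndSet compares references, not contents.
class CasIdentityDemo {
    static boolean[] demo() {
        final List<String> current = new ArrayList<>(List.of("a", "b"));
        final AtomicReference<List<String>> merged = new AtomicReference<>(current);

        // An equal but distinct list: equals() is true, == is false, so the CAS fails.
        final List<String> equalCopy = new ArrayList<>(current);
        final boolean withCopy = merged.compareAndSet(equalCopy, List.of("c"));

        // The very instance that is stored: the CAS succeeds.
        final boolean withSame = merged.compareAndSet(current, List.of("c"));

        return new boolean[] { withCopy, withSame };
    }

    public static void main(String[] args) {
        final boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints false true
    }
}
```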

codecov bot commented May 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (8150425) to head (76f7d4a).
Report is 68 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #6220       +/-   ##
============================================
- Coverage     74.46%       0   -74.47%     
============================================
  Files          1963       0     -1963     
  Lines         82437       0    -82437     
  Branches      10764       0    -10764     
============================================
- Hits          61385       0    -61385     
+ Misses        15918       0    -15918     
+ Partials       5134       0     -5134     

    newEndpointsBuilder.addAll(endpointGroup.endpoints());
}
final List<Endpoint> newEndpoints = newEndpointsBuilder.build();
if (merged.compareAndSet(oldEndpoints, newEndpoints)) {

@ikhoon ikhoon May 26, 2025

Because ImmutableList.of() returns a singleton instance, rebuildEndpoints may set outdated endpoints in rare cases (an ABA problem on the empty list):

  1. Thread A updates endpointA.
    • Thread A prepares to set list(endpointA) in L93.
  2. Thread B removes endpointA.
    • Thread B sets empty -> empty in L94.
  3. Thread A sets empty -> list(endpointA) in L94.
  4. merged is list(endpointA) but the actual endpoints in endpointGroups are empty.
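
The scenario can be replayed on a single thread. This is a sketch under stated assumptions: a shared `EMPTY` constant stands in for the singleton returned by `ImmutableList.of()`, and the class and variable names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Replays the four steps above with a reference-based compare-and-set.
class SharedEmptyAba {
    // One shared empty instance, standing in for the ImmutableList.of() singleton.
    static final List<String> EMPTY = List.of();

    static List<String> replay() {
        final AtomicReference<List<String>> merged = new AtomicReference<>(EMPTY);

        // Step 1: thread A sees endpointA, reads the old value, builds its new list.
        final List<String> oldA = merged.get(); // EMPTY
        final List<String> newA = List.of("endpointA");

        // Step 2: endpointA is removed; thread B rebuilds and swaps EMPTY -> EMPTY.
        final List<String> oldB = merged.get(); // EMPTY
        final boolean bSwapped = merged.compareAndSet(oldB, EMPTY);

        // Step 3: thread A's expected value is still the same instance, so its
        // CAS succeeds although the snapshot it publishes is already stale.
        final boolean aSwapped = merged.compareAndSet(oldA, newA);

        if (!bSwapped || !aSwapped) {
            throw new IllegalStateException("both swaps succeed in this replay");
        }
        // Step 4: merged holds endpointA although the child groups are empty.
        return merged.get();
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [endpointA]
    }
}
```

The reference comparison cannot tell thread B's write apart from the value thread A read earlier, because both are the same object. Making each published snapshot a distinct object, or pairing it with a version counter, are common ways to make such a swap detectable; this thread does not state which approach the update took.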

Contributor Author

Thanks for the explanation! Updated. 😉


@ikhoon ikhoon left a comment

Thanks! 👍 👍

@ikhoon ikhoon merged commit 7f3baaf into line:main May 28, 2025
12 of 14 checks passed
@minwoox minwoox deleted the race_composite branch May 28, 2025 05:26