Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change thread safety annotations in thread local store code #38113

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

krinkinmu
Copy link
Contributor

@krinkinmu krinkinmu commented Jan 20, 2025

Commit Message:

When I tried to build Envoy with Clang-18 I hit an issue that Clang thread safety analizer does not like the fact that we are returning a reference to a protected member (central_cache_) from centralCacheLockHeld method.

While I do think that the code is correct, when looking at the thread safety annotations I think we could do a little bit better.

Currently, centralCacheLockHeld is annotated with
ABLS_ASSERT_EXCLUSIVE_LOCK. My understanding is that this annotation should be used on functions that during runtime check that the right lock is held and fail if it's not the case. centralCacheLockHeld currently does not actually check that the lock is held - this seems somewhat misleading and I don't think that thread safety analysis should derive anything from this annotation TBH, as there is no runtime check present there.

We could add a runtime check to the function, but unfortunately it will not be enough to address Clang's warning (see
llvm/llvm-project#123512). Besides I think that we can do slightly better.

This change replaces ABLS_ASSERT_EXCLUSIVE_LOCK with ABSL_EXCLUSIVE_LOCKS_REQUIRED, to let Clang thread safety analysis know during compilation time that this function should be called under a lock.

That change triggered a few more warnings in various places that call into centralCacheLockHeld. In simple cases just adding ABSL_EXCLUSIVE_LOCKS_REQUIRED to the functions that call centralCacheLockHeld was enough.

There were a couple of more complicated cases that Clang could not figure out because it does not support aliasing (i.e., when the same mutex is known under different names, Clang cannot always figure out that different names refer to the same lock). To deal with those cases I added assertLocked method with ABSL_ASSERT_EXCLUSIVE_LOCK(scope->parent_.lock_) and ABSL_EXCLUSIVE_LOCKS_REQUIRED(lock_). So on the one hand to call this function Clang should be convinced that lock_ is held, and, on the other hand, it lets Clang know that after this function scope->parent_.lock_ is held.

Given that scope->parent_.lock_ and lock_ are two different names of the same lock, we can avoid doing a runtime check inside the assertLocked method, because if lock_ is held (and Clang is conviced of that) then it follows that scope->parent_.lock_ is also held, because it's the same lock.

All-in-all, I think relying on ABSL_EXCLUSIVE_LOCKS_REQUIRED is slightly better and it addresses the warning for me as well.

Additional Description: Related to the work in #37911
Risk Level: Low
Testing: Tested that things build after the change (plus played around with the code making thread safety analysis warnings to trigger)
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

+cc @phlax

@krinkinmu krinkinmu marked this pull request as draft January 21, 2025 10:53
@krinkinmu krinkinmu force-pushed the thread-safety-reference-return branch 2 times, most recently from 985574c to 7d31288 Compare January 21, 2025 12:49
@krinkinmu krinkinmu marked this pull request as ready for review January 21, 2025 12:49
@phlax
Copy link
Member

phlax commented Jan 23, 2025

just realized this is not TLS in the ssl sense - its Thread Local Store - reassiging ...

@phlax phlax assigned jmarantz and unassigned ggreenway Jan 23, 2025
@krinkinmu krinkinmu changed the title Change thread safety annotations in TLS code Change thread safety annotations in thread local store code Jan 23, 2025
@krinkinmu
Copy link
Contributor Author

just realized this is not TLS in the ssl sense - its Thread Local Store - reassiging ...

@phlax a good catch, I didn't think about it, changed the name to make it less confusing.

@jmarantz
Copy link
Contributor

IMO the thread annotations are compile-time declarations so the compiler can enforce the intent at compile-time. I don't think you need to use runtime asserts for the thread annotations to make sense. At least that's the way I've been thinking about them.

But I understand the rejection of returning a reference to a guarded member var in a protected method. That's potentially a thread-safety violation from code outside the compilation unit. I wonder if we can resolve this with annotations somehow rather than adding asserts.

RE runtime asserts in this code, I'm wondering if you can say something about whether that compiles to nothing for release builds; it's possible some of this code is performance sensitive.

I'm happy to review this but I wanted to start this dialog first. Thanks!

/wait

@krinkinmu
Copy link
Contributor Author

krinkinmu commented Jan 23, 2025

@jmarantz I can drop the asserts either completely or in release builds only, however, maybe we can consider a slight variation on top of that (in the spirit of resolving it without asserts), e.g,:

  1. In ThreadLocalStoreImpl add a no-op function, let's call it sameLock - it will take ScopeImpl pointer (or scope->parent_.lock_)
  2. Annotate sameLock function with ABSL_ASSERT_EXCLUSIVE_LOCK(scope->parent_.lock_) and ABSL_EXCLUSIVE_LOCKS_REQUIRED(lock_) at the same time.

ABSL_EXCLUSIVE_LOCKS_REQUIRED annotation will ask compiler to check that lock_ is held before calling it and ABSL_ASSERT_EXCLUSIVE_LOCK will let compiler know that scope->parent_.lock is held after that function (and we can be sure of that because lock_ and scope->parent_.lock_ are actually the same thing).

IOW, I'm trying to create a gadget that will effectively let clang know that if lock_ is held it also implies that scope->parent_.lock_ held as well. This way it will be similar to dropping the runtime check (the function will be no-op) with an addition that clang will check that lock_ is held in compile time.

I hope that makes some sense, I will try this approach and see if it works.

@jmarantz
Copy link
Contributor

Sure. I'm OK with the assert calls as you have them if we convince ourselves that during a release build with optimation they disappear completely.

@krinkinmu krinkinmu force-pushed the thread-safety-reference-return branch from 7d31288 to 6e7303e Compare January 24, 2025 10:03
@krinkinmu
Copy link
Contributor Author

@jmarantz I think I didn't explain well what I wanted to do, anyways, I implemented the idea in #38113 (comment), PTAL.

@krinkinmu krinkinmu force-pushed the thread-safety-reference-return branch from 6e7303e to affca48 Compare January 24, 2025 11:10
When I tried to build Envoy with Clang-18 I hit an issue that Clang
thread safety analizer does not like the fact that we are returning
a reference to a protected member (central_cache_) from
centralCacheLockHeld method.

While I do think that the code is correct, when looking at the thread
safety annotations I think we could do a little bit better.

Currently, centralCacheLockHeld is annotated with
ABLS_ASSERT_EXCLUSIVE_LOCK. My understanding is that this annotation
should be used on functions that during runtime check that the right
lock is held and fail if it's not the case. centralCacheLockHeld
currently does not actually check that the lock is held - this seems
somewhat misleading and I don't think that thread safety analysis
should derive anything from this annotation TBH, as there is no runtime
check present there.

We could add a runtime check to the function, but unfortunately it will
not be enough to address Clang's warning (see
llvm/llvm-project#123512). Besides I think
that we can do slightly better.

This change replaces ABLS_ASSERT_EXCLUSIVE_LOCK with
ABSL_EXCLUSIVE_LOCKS_REQUIRED, to let Clang thread safety analysis know
during compilation time that this function should be called under a
lock.

That change triggered a few more warnings in various places that call
into centralCacheLockHeld. In simple cases just adding
ABSL_EXCLUSIVE_LOCKS_REQUIRED to the functions that call
centralCacheLockHeld was enough.

There were a couple of more complicated cases that Clang could not
figure out because it does not support aliasing (i.e., when the same
mutex is known under different names, Clang cannot always figure out
that different names refer to the same lock). To deal with those
cases I added assertHeld method with ABLS_ASSERT_EXCLUSIVE_LOCK
annotation to the lock implementation and used it to help Clang to
figure out what locks are held.

All-in-all, I think relying on ABSL_EXCLUSIVE_LOCKS_REQUIRED is slightly
better and it addresses the warning for me as well.

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>
Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>
@krinkinmu krinkinmu force-pushed the thread-safety-reference-return branch from affca48 to 4356dfa Compare January 24, 2025 11:22
@krinkinmu
Copy link
Contributor Author

/retest weird bazel permission error

@krinkinmu
Copy link
Contributor Author

/retest IO exception when caching successfully built artifacts

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet. just one minor nit.

source/common/stats/thread_local_store.h Outdated Show resolved Hide resolved
@jmarantz
Copy link
Contributor

lgtm modulo the one nit.

/wait

Signed-off-by: Mikhail Krinkin <krinkin.m.u@gmail.com>
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign-from @envoyproxy/senior-maintainers

Copy link

@envoyproxy/senior-maintainers assignee is @zuercher

🐱

Caused by: a #38113 (review) was submitted by @jmarantz.

see: more, trace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants