Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] deps: Bump build images -> e21298a (llvm -> 18) #38093

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

dependency-envoy[bot]
Copy link
Contributor

Created by Envoy dependency bot

Signed-off-by: dependency-envoy[bot] <148525496+dependency-envoy[bot]@users.noreply.github.com>

@repokitteh-read-only repokitteh-read-only bot added workflows:untested deps Approval required for changes to Envoy's external dependencies labels Jan 17, 2025
Copy link

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @mattklein123

🐱

Caused by: #38093 was synchronize by phlax.

see: more, trace.

@phlax phlax changed the title deps: Bump build images -> e21298a [WIP] deps: Bump build images -> e21298a Jan 17, 2025
@phlax phlax marked this pull request as draft January 17, 2025 18:09
@phlax
Copy link
Member

phlax commented Jan 17, 2025

cc @krinkinmu

@phlax phlax changed the title [WIP] deps: Bump build images -> e21298a [WIP] deps: Bump build images -> e21298a (llvm -> 18) Jan 17, 2025
@phlax
Copy link
Member

phlax commented Jan 17, 2025

 external/com_github_google_tcmalloc/tcmalloc/page_allocator_interface.h:83:46: error: returning variable 'info_' by reference requires holding mutex 'pageheap_lock' [-Werror,-Wthread-safety-reference-return]
   83 |   const PageAllocInfo& info() const { return info_; }
      |                                              ^
1 error generated.

https://github.com/envoyproxy/envoy/actions/runs/12834491248/job/35791937879#step:17:601

@phlax phlax force-pushed the dependency-envoy/build-image/latest branch from 004545d to 8ff9eaf Compare January 20, 2025 13:12
@phlax
Copy link
Member

phlax commented Jan 20, 2025

In file included from source/common/stats/thread_local_store.cc:1:
./source/common/stats/thread_local_store.h:425:14: error: returning variable 'central_cache_' by reference requires holding mutex 'parent_.lock_' [-Werror,-Wthread-safety-reference-return]
  425 |       return central_cache_;
      | 

https://github.com/envoyproxy/envoy/actions/runs/12869653512/job/35878982913#step:17:796

@phlax phlax force-pushed the dependency-envoy/build-image/latest branch 2 times, most recently from 7251f97 to 87d7d85 Compare January 23, 2025 12:40
@@ -19,6 +19,8 @@ licenses(["notice"]) # Apache 2

package(default_visibility = ["//visibility:public"])

CACHE_SILO_KEY = "llvm-18"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this somewhere shared

@phlax phlax force-pushed the dependency-envoy/build-image/latest branch 4 times, most recently from 4a94502 to 2ac2167 Compare January 24, 2025 11:15
dependency-envoy bot and others added 6 commits January 27, 2025 11:07
Signed-off-by: dependency-envoy[bot] <148525496+dependency-envoy[bot]@users.noreply.github.com>

Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Ryan Northey <ryan@synca.io>
When I tried to build Envoy with Clang-18 I hit an issue that Clang
thread safety analizer does not like the fact that we are returning
a reference to a protected member (central_cache_) from
centralCacheLockHeld method.

While I do think that the code is correct, when looking at the thread
safety annotations I think we could do a little bit better.

Currently, centralCacheLockHeld is annotated with
ABLS_ASSERT_EXCLUSIVE_LOCK. My understanding is that this annotation
should be used on functions that during runtime check that the right
lock is held and fail if it's not the case. centralCacheLockHeld
currently does not actually check that the lock is held - this seems
somewhat misleading and I don't think that thread safety analysis
should derive anything from this annotation TBH, as there is no runtime
check present there.

We could add a runtime check to the function, but unfortunately it will
not be enough to address Clang's warning (see
llvm/llvm-project#123512). Besides I think
that we can do slightly better.

This change replaces ABLS_ASSERT_EXCLUSIVE_LOCK with
ABSL_EXCLUSIVE_LOCKS_REQUIRED, to let Clang thread safety analysis know
during compilation time that this function should be called under a
lock.

That change triggered a few more warnings in various places that call
into centralCacheLockHeld. In simple cases just adding
ABSL_EXCLUSIVE_LOCKS_REQUIRED to the functions that call
centralCacheLockHeld was enough.

There were a couple of more complicated cases that Clang could not
figure out because it does not support aliasing (i.e., when the same
mutex is known under different names, Clang cannot always figure out
that different names refer to the same lock). To deal with those
cases I added assertHeld method with ABLS_ASSERT_EXCLUSIVE_LOCK
annotation to the lock implementation and used it to help Clang to
figure out what locks are held.

All-in-all, I think relying on ABSL_EXCLUSIVE_LOCKS_REQUIRED is slightly
better and it addresses the warning for me as well.

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

Signed-off-by: Ryan Northey <ryan@synca.io>
Those return references to internal protected members, so ideally all
the callers should acquire a lock before calling those.

However, all the tests that use FakeStream or one of its derivatives
cannot really acquire the right lock because it's a protected member
of the class.

We got away with this so far for a few reasons:

1. Clang thread safety annotations didn't detect this problematic
   pattern in the clang-14 that we are currently using (potentially
   because those methods actually acquired locks, even though those
   locks didn't actually protect much).
2. The locks are really only needed to synchronize all the waitForX
   methods, accesors methods like body(), headers() and trailers() are
   called in tests after the appropriate waitForX method was called.

Disabling thread safety annotations for these methods does not actually
make anything worse, because the existing implementation aren't thread
safe anyways, however here are a few alternatives to disabling those
that I considered and rejected at the moment:

1. Return copies of body, headers and trailers instead of references,
   create those copies under a lock - that would be the easiest way to
   let compiler know that the code is fine, but all three methods return
   abstract classes and currently there is no easy way to copy them
   (that's not to say, that copying is impossible in principle);
2. Expose the lock and require all the callers acquire it - this was my
   first idea of how to fix the issue, but FakeStream (and it's
   derivatives) is used quite a lot in tests, so this change will get
   quite invasive.

Because it does not seem like we really need to lock those methods in
practice and given that alternatives to disabling thread safety analysis
on those are quite invasive, I figured I can just silence the compiler
in this case.

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

Signed-off-by: Ryan Northey <ryan@synca.io>
This change contains multiple MSAN fixes detected by clang-18. They all
in different places, but all of them are caused by a similar issue.

In C++ members of the class are always created in the same order in
which they are declared in the class. And they are always destroyed in
reverse order. So the first memeber of the class is created first and
destroyed last.

When there are interdependencies between the members of a class the
order of destruction becomes important. We typically don't want to
access data of the already destroyed members in destructors of the
members that are being destroyed.

MSAN on clang-18 detected a bunch of cases where we access some fields
after they were destroyed. In all but one case, issues happens in tests
only and not in the Envoy binary. There is one case where a similar
problem occurs in regular Envoy code, but this issue is benign, because
we are accessing an atomic variable that does not trully get destroyed.

So, all-in-all, these are not serious issues, but we want to fix it to
make clang MSAN tests happy on new LLVM toolchain.

I also sharded a few tests, because it was the only way I could actually
run them through the end within given timeouts on my workstation.
Hopefully that's not a problem.

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

Signed-off-by: Ryan Northey <ryan@synca.io>
@phlax phlax force-pushed the dependency-envoy/build-image/latest branch from 2ac2167 to 35c5d66 Compare January 27, 2025 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps Approval required for changes to Envoy's external dependencies workflows:untested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants