Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destructor S3CrtClient::~S3CrtClient() hangs, as of release 1.11.195 #2969

Closed
hunjmes opened this issue May 23, 2024 · 7 comments
Closed

Destructor S3CrtClient::~S3CrtClient() hangs, as of release 1.11.195 #2969

hunjmes opened this issue May 23, 2024 · 7 comments
Labels
bug This issue is a bug. closed-for-staleness p3 This is a minor priority issue

Comments

@hunjmes
Copy link
Contributor

hunjmes commented May 23, 2024

Describe the bug

Symptoms are superficially similar to #2769 , except that issue involved calling Aws::ShutdownAPI(), while an S3CrtClient was still alive.

This bug involves the S3CrtClient::~S3CrtClient() destructor, itself. We see a hang/deadlock inside:

#0  0x00007fcfffe93377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fcffdc63800 in __gthread_cond_wait (__mutex=<optimized out>, __cond=0x7fcff3f246c8)
    at /local/p4clients/pkgbuild-v3X3O/workspace/build/LibGCC/LibGCC-gcc.238862.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/gcc-src/build/private/src/gcc-11.4.0-build2/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2  std::__condvar::wait (__m=..., this=0x7fcff3f246c8)
    at /local/p4clients/pkgbuild-v3X3O/workspace/build/LibGCC/LibGCC-gcc.238862.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/gcc-src/build/private/src/gcc-11.4.0-build2/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std_mutex.h:155
#3  std::condition_variable::wait (this=this@entry=0x7fcff3f246c8, __lock=...)
    at /local/p4clients/pkgbuild-v3X3O/workspace/build/LibGCC/LibGCC-gcc.238862.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/gcc-src/build/private/src/gcc-11.4.0/libstdc++-v3/src/c++11/condition_variable.cc:41
#4  0x00007fd001f4d79b in std::condition_variable::wait<Aws::Utils::Threading::Semaphore::WaitOne()::<lambda()> > (__p=..., __lock=..., this=0x7fcff3f246c8)
    at /opt/brazil-pkg-cache/packages/GCC/GCC-11.x.1214.0/AL2_x86_64/DEV.STD.PTHREAD/build/gcc-11.4.0/include/c++/11.4.0/condition_variable:103
#5  Aws::Utils::Threading::Semaphore::WaitOne (this=0x7fcff3f24690)
    at /local/p4clients/pkgbuild-3Vovk/workspace/build/Aws-sdk-cpp-core/Aws-sdk-cpp-core-1.11.x_StdMemoryManagement.96405.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/aws-sdk-cpp/src/aws-cpp-sdk-core/source/utils/threading/Semaphore.cpp:21
#6  0x00007fd001a0b76f in Aws::S3Crt::S3CrtClient::~S3CrtClient (this=0x7ffd3bf79cb0, __in_chrg=<optimized out>)
    at /opt/brazil-pkg-cache/packages/GCC/GCC-11.x.1214.0/AL2_x86_64/DEV.STD.PTHREAD/build/gcc-11.4.0/include/c++/11.4.0/bits/shared_ptr_base.h:984

Expected Behavior

The S3CrtClient's destructor should not hang. (This was the behavior, as of release 1.11.175.)

Current Behavior

The S3CrtClient's destructor hangs, as of release 1.11.195.

Reproduction Steps

We don't have a formal repro yet. I am trying to nail down the root cause of the hang -- it might be unexpected/incorrect usage of the AWS API, by our client application -- concurrently with filing this bug.

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.195

Compiler and Version used

gcc 11

Operating System and version

Linux 5.10.216-182.855.amzn2int.x86_64

@hunjmes hunjmes added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 23, 2024
@hunjmes
Copy link
Contributor Author

hunjmes commented May 23, 2024

Also hangs in release 1.11.328 .

@hunjmes
Copy link
Contributor Author

hunjmes commented May 23, 2024

Looks like function CrtClientShutdownCallback() is not being called, to Release() the S3CrtClient's m_clientShutdownSem semaphore field.

@hunjmes
Copy link
Contributor Author

hunjmes commented May 23, 2024

Problem seems to be that if the caller creates and destroys an S3CrtClient, using RAII pattern, then the constructor creates a semaphore field:

  m_clientShutdownSem = Aws::MakeShared<Threading::Semaphore>(ALLOCATION_TAG, 0, 1);

-- and passes that semaphore to the wrapped Crt Client:

  m_wrappedData.clientShutdownSem = m_clientShutdownSem;
  s3CrtConfig.shutdown_callback = CrtClientShutdownCallback;
  s3CrtConfig.shutdown_callback_user_data = static_cast<void*>(&m_wrappedData);
  m_s3CrtClient = aws_s3_client_new(Aws::get_aws_allocator(), &s3CrtConfig);

Then, inside the destructor:

  aws_s3_client_release(m_s3CrtClient);
  if(m_clientShutdownSem)
  {
    m_clientShutdownSem->WaitOne(); // Wait aws_s3_client shutdown
  }

However, the call to aws_s3_client_release(...) doesn't seem to be releasing the Crt client.

I wonder if there's an implicit assumption, at the Crt layer, that there is always only 1 S3 (crt) client alive, at a time? What's supposed to happen, if the caller does:

{
  ClientConfiguration config;
  S3CrtClient client1(config);
  S3CrtClient client2(config);
...
}

?

@hunjmes
Copy link
Contributor Author

hunjmes commented May 23, 2024

My hanging/failing test is a Google-test "crash test," and the hanging process is the process fork()ed by the Google-test framework. My other unit tests, that don't use the "crash test" framework, do not appear to hang.

I wonder if some "Crt" state that used to be tied to a given S3CrtClient object has now been made static?

@DmitriyMusatkin
Copy link
Contributor

A quick note is that forking can lead to somewhat unexpected results currently when used in conjunction with CRT.
CRT does rely on some global state that can end up being mangled during fork.
This is something on the backlog for the team to improve, but we dont have a concrete ETA for that.
As a workaround, you could try to avoid forking while CPP SDK is in initialized state, i.e. explicitly calling ShutdownAPI before forking the process.

@hunjmes
Copy link
Contributor Author

hunjmes commented May 23, 2024

Confirmed that the proposed workaround, to call ShutdownAPI before forking, works. Thanks!

@jmklix
Copy link
Member

jmklix commented May 24, 2024

@hunjmes Did you have any other questions about this sdk?

@jmklix jmklix added closing-soon This issue will automatically close in 4 days unless further comments are made. p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels May 24, 2024
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. closed-for-staleness p3 This is a minor priority issue
Projects
None yet
Development

No branches or pull requests

3 participants