Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown API Segfault and Valgrind Issues: Aws::Utils::Crypto::CleanupCrypto() + CleanupHttp() #1550

Closed
2 tasks done
afowles opened this issue Dec 30, 2020 · 11 comments
Closed
2 tasks done
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@afowles
Copy link

afowles commented Dec 30, 2020

Describe the bug
SDK Shutdown segfaults and has valgrind memory issues, apparently around CleanupHttp / CleanupCrypto. This issue only seems to happen if a static pointer to a class which inits and cleans up the SDK is used. Curious why this would be the case.

SDK version number
1.8.113

Platform/OS/Hardware/Device
Fedora 33, (GCC) 10.2.1 20201125

To Reproduce (observed behavior)

#include <memory>

#include <aws/core/Aws.h>

class AwsLifetime {
 public:
  AwsLifetime() {
    Aws::InitAPI(options);
  }

  ~AwsLifetime() {
    Aws::ShutdownAPI(options);
  }

 private:
  Aws::SDKOptions options;
};

static std::unique_ptr<AwsLifetime> aws;

int main() {
  aws = std::make_unique<AwsLifetime>();
  return 0;
} 
$ g++ -std=c++17 -Wall -Werror -O2 -laws-cpp-sdk-core -o test test.cpp
$ valgrind --tool=memcheck --leak-check=full ./test

Expected behavior
Runs, clean valgrind output. With SDK version 1.7.323, (GCC) 10.1.1 20200507 Fedora 32

==3600== Memcheck, a memory error detector
==3600== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3600== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
==3600== Command: ./test
==3600==
==3600==
==3600== HEAP SUMMARY:
==3600==     in use at exit: 0 bytes in 0 blocks
==3600==   total heap usage: 4,050 allocs, 4,050 frees, 237,867 bytes allocated
==3600==
==3600== All heap blocks were freed -- no leaks are possible
==3600==
==3600== For lists of detected and suppressed errors, rerun with: -s
==3600== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Logs/output
On Fedora 33 with versions described above
First invalid read:

==591656== Memcheck, a memory error detector
==591656== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==591656== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==591656== Command: ./test
==591656==
==591656== Invalid read of size 8
==591656==    at 0x48C8809: Aws::Http::CleanupHttp() (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x490A27C: Aws::ShutdownAPI(Aws::SDKOptions const&) (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x4014A7: std::unique_ptr<AwsLifetime, std::default_delete<AwsLifetime> >::~unique_ptr() (in /home/fedora/test)
==591656==    by 0x4CF6236: __run_exit_handlers (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CF63DF: exit (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CDE1E8: (below main) (in /usr/lib64/libc-2.32.so)
==591656==  Address 0x5a87020 is 16 bytes inside a block of size 24 free'd
==591656==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==591656==    by 0x4CF6236: __run_exit_handlers (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CF63DF: exit (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CDE1E8: (below main) (in /usr/lib64/libc-2.32.so)
==591656==  Block was alloc'd at
==591656==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==591656==    by 0x48C8730: Aws::Http::InitHttp() (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x49304F3: Aws::InitAPI(Aws::SDKOptions const&) (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x4011A7: main (in /home/fedora/test)
==591656==

Additional context
With Logging on simply running the program

[~]$ ./test
[DEBUG] 2020-12-30 16:01:19.044 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Initializing config loader against fileName /home/fedora/.aws/credentials and using profilePrefix = 0
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Initializing config loader against fileName /home/fedora/.aws/config and using profilePrefix = 1
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Unable to open config file /home/fedora/.aws/credentials for reading.
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSProfileConfigLoader [139993724766464] Failed to reload configuration.
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Unable to open config file /home/fedora/.aws/config for reading.
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSProfileConfigLoader [139993724766464] Failed to reload configuration.
[INFO] 2020-12-30 16:01:19.046 CurlHttpClient [139993724766464] Initializing Curl library with version: 7.71.1, ssl version: OpenSSL/1.1.1i-fips
[DEBUG] 2020-12-30 16:01:19.046 ClientConfiguration [139993724766464] ClientConfiguration will use SDK Auto Resolved profile: [default] if not specified by users.
[WARN] 2020-12-30 16:01:19.046 ClientConfiguration [139993724766464] Retry Strategy will use the default max attempts.
[INFO] 2020-12-30 16:01:19.046 EC2MetadataClient [139993724766464] Creating AWSHttpResourceClient with max connections 2 and scheme http
[INFO] 2020-12-30 16:01:19.046 CurlHandleContainer [139993724766464] Initializing CurlHandleContainer with size 2
[INFO] 2020-12-30 16:01:19.046 CurlHandleContainer [139993724766464] Cleaning up CurlHandleContainer.
Segmentation fault (core dumped)

Any insights would be appreciated

@afowles afowles added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 30, 2020
@KaibaLopez
Copy link
Contributor

hi @afowles ,
Thanks for pointing this out to us! As a quick update, I'm able to reproduce this and started investigating on the root cause for these valgrind errors.

@KaibaLopez KaibaLopez removed the needs-triage This issue or PR still needs to be triaged. label Jan 4, 2021
@KaibaLopez KaibaLopez self-assigned this Jan 4, 2021
@xonatius
Copy link

xonatius commented Oct 21, 2021

Any updates? We are hitting the same issue after aws-sdk update.

The issue seems to have appeared between 1.7.311 and 1.8.159 releases

@bipinmathew
Copy link

Hello, I am using 1.7.108 and am getting the same issue. Here is what I am seeing in valgrind.

==426== Invalid read of size 8                                                                                                       
==426==    at 0x694C10E: Aws::Http::CleanupHttp() (HttpClientFactory.cpp:180)                                                        
==426==    by 0x6911A98: Aws::ShutdownAPI(Aws::SDKOptions const&) (Aws.cpp:119)                                                      
==426==    by 0x67122E4: AwsSource::~AwsSource() (IArchiveHandler.hpp:97)                                                            
==426==    by 0x67171AC: void __gnu_cxx::new_allocator<AwsSource>::destroy<AwsSource>(AwsSource*) (new_allocator.h:124)              
==426==    by 0x6717024: void std::allocator_traits<std::allocator<AwsSource> >::destroy<AwsSource>(std::allocator<AwsSource>&, AwsSo
urce*) (alloc_traits.h:542)                                                                                                          

And later on:

==426== Invalid read of size 4
==426==    at 0x66F3702: __gnu_cxx::__exchange_and_add(int volatile*, int) (atomicity.h:49)
==426==    by 0x66F37AB: __gnu_cxx::__exchange_and_add_dispatch(int*, int) (atomicity.h:82)
==426==    by 0x66F4ABD: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:147)
==426==    by 0x66F43F5: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:659)
==426==    by 0x6911C97: std::__shared_ptr<Aws::Http::HttpClientFactory, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_bas
e.h:925)
==426==    by 0x694CE6F: std::__shared_ptr<Aws::Http::HttpClientFactory, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<Aws
::Http::HttpClientFactory, (__gnu_cxx::_Lock_policy)2>&&) (shared_ptr_base.h:1000)
==426==    by 0x694CADD: std::shared_ptr<Aws::Http::HttpClientFactory>::operator=(std::shared_ptr<Aws::Http::HttpClientFactory>&&) (s

Still later:

==426== Invalid read of size 8
==426==    at 0x6975F65: Aws::Utils::Crypto::CleanupCrypto() (Factories.cpp:655)
==426==    by 0x6911A9D: Aws::ShutdownAPI(Aws::SDKOptions const&) (Aws.cpp:120)
==426==    by 0x67122E4: AwsSource::~AwsSource() (IArchiveHandler.hpp:97)
==426==    by 0x67171AC: void __gnu_cxx::new_allocator<AwsSource>::destroy<AwsSource>(AwsSource*) (new_allocator.h:124)
==426==    by 0x6717024: void std::allocator_traits<std::allocator<AwsSource> >::destroy<AwsSource>(std::allocator<AwsSource>&, AwsSo
urce*) (alloc_traits.h:542)
==426==    by 0x6716E21: std::_Sp_counted_ptr_inplace<AwsSource, std::allocator<AwsSource>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
 (shared_ptr_base.h:531)

@yuliy
Copy link

yuliy commented Jul 13, 2022

Hello! We're facing the same problem on CentOS 7. Any update?

@mmacdonald86
Copy link

Same issue in 1.9.220 with CentOS7-- gcc 7.3.1 -- believe issue is do to static shared_ptr in HttpClientFactory potentially being cleaned up before other statiic cleanup that can call ShutdownAPI. Probably should avoid statically initialized shared_ptr to avoid these kinds of errors.

@SergeyRyabinin
Copy link
Contributor

Hello all in this issue,

Please note that this use is discouraged and not recommended.
Please refer to the documentation how to initialize/shutdown the SDK: https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/basic-use.html

The reason you observe memory issues by following this design idea

static std::unique_ptr<AwsLifetime> aws;

is that the order of static object destruction is not determined by the C++ standard, and the SDK and it's dependencies are also using static global objects. After your main returns, there will be static variables destruction time and some of SDK's global static variables may be destructed before ShutdownAPI is called, resulting in an undefined behavior.

We know that the usage of static global variables including std:: smart pointers and data classes is considered a bad design and we are considering different ways of improving on this aspect.
We've already made some improvements to get rid of std:: static objects being destructed prematurely/automatically, such as #2268. I suggest to try to update to the latest version of the SDK available (such as 1.11.6).
However, we are not there yet to declare SDK usage such as mentioned above as fully supported.

Best regards,
Sergey

@mmacdonald86
Copy link

@SergeyRyabinin - Can you add to the documentation for ShutdownAPI that calls to it from within the destructor of a static is specifically unsupported (maybe that's already there?)

@westonpace
Copy link

We know that the usage of static global variables including std:: smart pointers and data classes is considered a bad design and we are considering different ways of improving on this aspect.
We've already made some improvements to get rid of std:: static objects being destructed prematurely/automatically, such as #2268. I suggest to try to update to the latest version of the SDK available (such as 1.11.6).
However, we are not there yet to declare SDK usage such as mentioned above as fully supported.

It is encouraging that you are moving in that direction and I hope you can get there. I work on Apache Arrow which uses S3. We have our own static state challenges 😰 and have been moving singletons from static global variables to static local variables that are initialized the first time they are accessed as this gives us a predictable creation (and thus destruction) order.

If this is fixed then we don't have to require users to call Finalize so it would be a nice feature.

@tsarquis88
Copy link

Same problem here trying to call the InitApi and ShutdownAPI methods within a Singleton class.

Any updates on this? Does someone know any workaround?

facebook-github-bot pushed a commit to facebookincubator/velox that referenced this issue Sep 11, 2023
Summary:
Add finalizeS3FileSystem to explicitly teardown the AWS SDK C++.
Velox users need to manually invoke this before exiting an application.
This is because Velox uses a static object to hold the S3 FileSystem instance.
AWS C++ SDK library also uses static global objects in its code.
The order of static object destruction is not defined by the C++ standard.
This could lead to a segmentation fault during the program exit.
Reference: aws/aws-sdk-cpp#1550 (comment)

Remove intializeClient() as this can be done during S3FileSystem construction.

Clarify AWS SDK log level is set during the S3 initialization. Fix test.

Refactor S3FileSystem tests to separate filesystem tests and registration tests.

Pull Request resolved: #6398

Reviewed By: kgpai

Differential Revision: D49114381

Pulled By: pedroerp

fbshipit-source-id: e124d5d5e74ac60f73df729c2deff2392eef6efe
@jmklix
Copy link
Member

jmklix commented Sep 12, 2023

The basic usage of the sdk should look like this, with all of the sdk calls between InitApi and ShutdownAPI.

#include <aws/core/Aws.h>
int main(int argc, char** argv)
{
   Aws::SDKOptions options;
   Aws::InitAPI(options);
   {
      // make your SDK calls here.
   }
   Aws::ShutdownAPI(options);
   return 0;
}

Please let us know if you run into anymore problems with this sdk.

@jmklix jmklix closed this as completed Sep 12, 2023
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

codyschierbeck pushed a commit to codyschierbeck/velox that referenced this issue Sep 27, 2023
Summary:
Add finalizeS3FileSystem to explicitly teardown the AWS SDK C++.
Velox users need to manually invoke this before exiting an application.
This is because Velox uses a static object to hold the S3 FileSystem instance.
AWS C++ SDK library also uses static global objects in its code.
The order of static object destruction is not defined by the C++ standard.
This could lead to a segmentation fault during the program exit.
Reference: aws/aws-sdk-cpp#1550 (comment)

Remove intializeClient() as this can be done during S3FileSystem construction.

Clarify AWS SDK log level is set during the S3 initialization. Fix test.

Refactor S3FileSystem tests to separate filesystem tests and registration tests.

Pull Request resolved: facebookincubator#6398

Reviewed By: kgpai

Differential Revision: D49114381

Pulled By: pedroerp

fbshipit-source-id: e124d5d5e74ac60f73df729c2deff2392eef6efe
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this issue Sep 27, 2023
Summary:
Add finalizeS3FileSystem to explicitly teardown the AWS SDK C++.
Velox users need to manually invoke this before exiting an application.
This is because Velox uses a static object to hold the S3 FileSystem instance.
AWS C++ SDK library also uses static global objects in its code.
The order of static object destruction is not defined by the C++ standard.
This could lead to a segmentation fault during the program exit.
Reference: aws/aws-sdk-cpp#1550 (comment)

Remove intializeClient() as this can be done during S3FileSystem construction.

Clarify AWS SDK log level is set during the S3 initialization. Fix test.

Refactor S3FileSystem tests to separate filesystem tests and registration tests.

Pull Request resolved: facebookincubator#6398

Reviewed By: kgpai

Differential Revision: D49114381

Pulled By: pedroerp

fbshipit-source-id: e124d5d5e74ac60f73df729c2deff2392eef6efe
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this issue Sep 27, 2023
Summary:
Add finalizeS3FileSystem to explicitly teardown the AWS SDK C++.
Velox users need to manually invoke this before exiting an application.
This is because Velox uses a static object to hold the S3 FileSystem instance.
AWS C++ SDK library also uses static global objects in its code.
The order of static object destruction is not defined by the C++ standard.
This could lead to a segmentation fault during the program exit.
Reference: aws/aws-sdk-cpp#1550 (comment)

Remove intializeClient() as this can be done during S3FileSystem construction.

Clarify AWS SDK log level is set during the S3 initialization. Fix test.

Refactor S3FileSystem tests to separate filesystem tests and registration tests.

Pull Request resolved: facebookincubator#6398

Reviewed By: kgpai

Differential Revision: D49114381

Pulled By: pedroerp

fbshipit-source-id: e124d5d5e74ac60f73df729c2deff2392eef6efe
ericyuliu pushed a commit to ericyuliu/velox that referenced this issue Oct 12, 2023
Summary:
Add finalizeS3FileSystem to explicitly teardown the AWS SDK C++.
Velox users need to manually invoke this before exiting an application.
This is because Velox uses a static object to hold the S3 FileSystem instance.
AWS C++ SDK library also uses static global objects in its code.
The order of static object destruction is not defined by the C++ standard.
This could lead to a segmentation fault during the program exit.
Reference: aws/aws-sdk-cpp#1550 (comment)

Remove intializeClient() as this can be done during S3FileSystem construction.

Clarify AWS SDK log level is set during the S3 initialization. Fix test.

Refactor S3FileSystem tests to separate filesystem tests and registration tests.

Pull Request resolved: facebookincubator#6398

Reviewed By: kgpai

Differential Revision: D49114381

Pulled By: pedroerp

fbshipit-source-id: e124d5d5e74ac60f73df729c2deff2392eef6efe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

10 participants