Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loggers run beyond the usable lifetime of CRT #2409

Closed
jeking3 opened this issue Mar 29, 2023 · 9 comments
Closed

Loggers run beyond the usable lifetime of CRT #2409

jeking3 opened this issue Mar 29, 2023 · 9 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.

Comments

@jeking3
Copy link
Contributor

jeking3 commented Mar 29, 2023

Describe the bug

#1996 a change was made to shutdown the CRT before shutting down both loggers. In general, the shutdown sequence should always be the exact reverse of the init sequence to avoid lifetime issues, otherwise there is an architectural issue to solve. If the CRT logs during shutdown, it needs to be smart enough not to fall over if there is no logger. I believe the fix for the issue observed was performed at the wrong code layer.

If one is using CUSTOM_MEMORY_MANAGEMENT, when the CRT is shutdown, the custom memory allocator is invalidated. At that time, the logging thread is still active and may call for an allocation.

The following assertion from aws-c-common demonstrates this behavior. Placed log message wrappers around the shutdown calls. There is no message indicating logging was shut down. "Shutting Down" is logged immediately before we call ShutdownAPI. The fix is to restore the original ordering.

[INFO] Libas3Test.test[51845]: main(SdkMgr.cpp:89):  AwsSdkCpp Shutting Down
Fatal error condition occurred in /opt/pkg-cache/packages/Aws-c-common/Aws-c-common-0.2.x.141369.0/generic-flavor/src/source/allocator.c:121: allocator != ((void *)0)
Exiting Application
################################################################################
Stack trace:
################################################################################
lib/libaws-c-common.so.1(aws_backtrace_print+0x4f) [0x7f5d3a33169f]
lib/libaws-c-common.so.1(aws_fatal_assert+0x44) [0x7f5d3a31d1d4]
lib/libaws-c-common.so.1(aws_mem_acquire+0x37) [0x7f5d3a31b5b7]
Libas3Test.test(_ZN3Aws3Crt12StlAllocatorIcE8allocateEmPKv+0x33) [0x65bc4d]
Libas3Test.test(_ZNSt16allocator_traitsIN3Aws3Crt12StlAllocatorIcEEE8allocateERS3_m+0x31) [0x65796b]
Libas3Test.test(_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcEN3Aws3Crt12StlAllocatorIcEEE9_M_createERmm+0x110) [0x65bd68]
Libas3Test.test(_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcEN3Aws3Crt12StlAllocatorIcEEE9_M_mutateEmmPKcm+0x95) [0x65c011]
Libas3Test.test(_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcEN3Aws3Crt12StlAllocatorIcEEE10_M_replaceEmmPKcm+0x351) [0x657d67]
lib/libaws-cpp-sdk-core.so(+0xfd4f7) [0x7f5d3a4474f7]
lib/libaws-cpp-sdk-core.so(+0xfdabf) [0x7f5d3a447abf]
lib/libaws-cpp-sdk-core.so(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN3Aws5Utils7Logging16DefaultLogSystem22LogSynchronizationDataERKSt10shared_ptrISoERKNSt7__cxx1112basic_stringIcSt11char_traitsIcENS3_3Crt12StlAllocatorIcEEEEbES8liib/libstdc++.so.6(+0xe71a0) [0x7f5d388bb1a0] 

Expected Behavior

The CRT logger's lifetime cannot exceed the CRT's lifetime.

No assertion.

Current Behavior

The logger thread is running beyond the CRT lifetime and causing an error.

Reproduction Steps

Difficult to reproduce, timing related.

Possible Solution

Revert the change.

Additional Information/Context

Should provide a unit test that does init/shutdown many times in a row, in the same executable, to prove the behavior is correct.

AWS CPP SDK version used

1.11.47

Compiler and Version used

9.5.0

Operating System and version

Linux (AL2)

@jeking3 jeking3 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 29, 2023
@jeking3

This comment was marked as outdated.

@jmklix
Copy link
Member

jmklix commented Mar 30, 2023

Thanks for making a PR to fix this. Reviewing the PR

@jmklix jmklix added pending-release This issue will be fixed by an approved PR that hasn't been released yet. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Mar 30, 2023
@jeking3

This comment was marked as outdated.

@jeking3

This comment was marked as outdated.

@jeking3
Copy link
Contributor Author

jeking3 commented Apr 3, 2023

I canceled the PR, it was insufficient, but the issue stands. Using CUSTOM_MEMORY_MANAGEMENT and the change in #1996 are incompatible.

@jeking3
Copy link
Contributor Author

jeking3 commented Apr 24, 2023

Here's the fix we're using in production for this issue, in case it's helpful to anyone.

diff --git a/src/aws-cpp-sdk-core/source/Aws.cpp b/src/aws-cpp-sdk-core/source/Aws.cpp
index 6428e95951..8b21a7e7e4 100644
--- src/aws-cpp-sdk-core/source/Aws.cpp
+++ src/aws-cpp-sdk-core/source/Aws.cpp
@@ -157,19 +157,27 @@ namespace Aws
         Aws::Monitoring::CleanupMonitoring();
         Aws::Internal::CleanupEC2MetadataClient();
         Aws::Net::CleanupNetwork();
+        cJSON_AS4CPP_InitHooks(nullptr);
         Aws::CleanupEnumOverflowContainer();
         Aws::Http::CleanupHttp();
         Aws::Utils::Crypto::CleanupCrypto();

         Aws::Config::CleanupConfigAndCredentialsCacheManager();

-        Aws::Client::CoreErrorsMapper::CleanupCoreErrorsMapper();
-        Aws::CleanupCrt();
+        // https://github.com/aws/aws-sdk-cpp/issues/2409
+        // undo https://github.com/aws/aws-sdk-cpp/issues/1996
+        // CRT is initialized before loggers, so loggers must be cleaned up before CRT
+        // otherwise the loggers might try to allocate after CleanupCrt() and get
+        // undefined behavior.
         if (options.loggingOptions.logLevel != Aws::Utils::Logging::LogLevel::Off)
         {
             Aws::Utils::Logging::ShutdownCRTLogging();
             Aws::Utils::Logging::ShutdownAWSLogging();
         }
+
+        Aws::Client::CoreErrorsMapper::CleanupCoreErrorsMapper();
+        Aws::CleanupCrt();
+
 #ifdef USE_AWS_MEMORY_MANAGEMENT
         if(options.memoryManagementOptions.memoryManager)
         {
diff --git a/src/aws-cpp-sdk-core/source/utils/logging/CRTLogSystem.cpp b/src/aws-cpp-sdk-core/source/utils/logging/CRTLogSystem.cpp
index 81f94d0d3a..f9e1f80f74 100644
--- src/aws-cpp-sdk-core/source/utils/logging/CRTLogSystem.cpp
+++ src/aws-cpp-sdk-core/source/utils/logging/CRTLogSystem.cpp
@@ -75,6 +75,13 @@ namespace Aws
                 if (aws_logger_get() == &m_logger)
                 {
                     aws_logger_set(NULL);
+                    // https://github.com/aws/aws-sdk-cpp/issues/2409
+                    // give other threads that may have grabbed the logger pointer 25ms to use it and lose it
+                    // this is an extremely poor (but cheap) way to mostly defeat the lack of thread safety
+                    // at the aws-c-common layer for logging, as another thread may have just acquired the
+                    // pointer and may decide to log, so we give it enough time to do that before we destroy
+                    // the pointer
+                    aws_thread_current_sleep(25000000);
                     aws_logger_clean_up(&m_logger);
                 }
             }

@jeking3
Copy link
Contributor Author

jeking3 commented Apr 25, 2023

The fix for shutting down logging before CRT should have been sufficient, however I think there are other lifetime issues at play here that need to be tracked down.

@jeking3
Copy link
Contributor Author

jeking3 commented Dec 18, 2023

#2480 resolved this issue effectively.

@jeking3 jeking3 closed this as completed Dec 18, 2023
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.
Projects
None yet
2 participants