Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue-2409: fix null allocator access in logging during shutdown when using CUSTOM_MEMORY_MANAGEMENT #2462

Closed
wants to merge 1 commit into from

Conversation

jeking3
Copy link
Contributor

@jeking3 jeking3 commented Apr 25, 2023

Issue #, if available:

This fixes #2409.

Description of changes:

In #1996 a change was made to modify the lifetime of the loggers such that they outlive the CRT. When one is using CUSTOM_MEMORY_MANAGEMENT, when the CRT is cleaned up, the allocator is as well. Should there be any threads still using resources, such as the loggers that outlive the CRT, we can hit an assertion in aws-c-common indicating the allocator is null under certain conditions.

Restoring the correct ShutdownAPI order (the exact opposite of InitAPI's sequence) resolves #2409 but re-introduces the root cause that led to #1996 in the first place: #1995. Due to a lack of thread safety in the aws-c-common logging methods, there is no way for us to reliably uninstall the CRT logger and then clean it up, because it might be in use by one of the CRT subsystems like event_loop. For example, see the following output from drd:

==8514== Conflicting load by thread 6 at 0x05664140 size 8
==8514==    at 0x533D930: aws_logger_get (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x52A73CE: aws_event_loop_thread (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x533FA76: thread_fn (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x4C394D4: vgDrd_thread_wrapper (drd_pthread_intercepts.c:491)
==8514==    by 0x617C44A: start_thread (in /usr/lib64/libpthread-2.26.so)
==8514==    by 0x6D5A52E: clone (in /usr/lib64/libc-2.26.so)
==8514== Allocation context: Data section of /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so
==8514== Other segment start (thread 1)
==8514==    at 0x4C43F96: pthread_mutex_unlock_intercept (drd_pthread_intercepts.c:1045)
==8514==    by 0x4C43F96: pthread_mutex_unlock (drd_pthread_intercepts.c:1058)
==8514==    by 0x533EEB8: aws_mutex_unlock (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x598012D: CRYPTO_add_lock (in /usr/lib64/libcrypto.so.1.0.2k)
==8514==    by 0x5A36FF1: ??? (in /usr/lib64/libcrypto.so.1.0.2k)
==8514==    by 0x5A36F50: ??? (in /usr/lib64/libcrypto.so.1.0.2k)
==8514==    by 0x5A377EF: ERR_remove_thread_state (in /usr/lib64/libcrypto.so.1.0.2k)
==8514==    by 0x56DED19: ??? (in /usr/lib64/libcurl.so.4.8.0)
==8514==    by 0x56E44E6: ??? (in /usr/lib64/libcurl.so.4.8.0)
==8514==    by 0x569155E: curl_global_cleanup (in /usr/lib64/libcurl.so.4.8.0)
==8514==    by 0x520D370: Aws::Http::DefaultHttpClientFactory::CleanupStaticState() (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x520BD59: Aws::Http::CleanupHttp() (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x5206DC3: Aws::ShutdownAPI(Aws::SDKOptions const&) (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514== Other segment end (thread 1)
==8514==    at 0x4C43F96: pthread_mutex_unlock_intercept (drd_pthread_intercepts.c:1045)
==8514==    by 0x4C43F96: pthread_mutex_unlock (drd_pthread_intercepts.c:1058)
==8514==    by 0x51A67C7: Aws::Utils::Logging::DefaultLogSystem::~DefaultLogSystem() (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x51A9451: Aws::Utils::Logging::InitializeAWSLogging(std::shared_ptr<Aws::Utils::Logging::LogSystemInterface> const&) (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x51A947D: Aws::Utils::Logging::ShutdownAWSLogging() (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x5206DFA: Aws::ShutdownAPI(Aws::SDKOptions const&) (in /workplace/jaseking/aws-sdk-cpp/build/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so)
==8514==    by 0x456593: InitShutdown_Repeatable_Test::TestBody() (in /workplace/jaseking/aws-sdk-cpp/build/tests/aws-cpp-sdk-core-tests/aws-cpp-sdk-core-tests)
==8514==    by 0x4EA3713: testing::Test::Run() (in /workplace/jaseking/aws-sdk-cpp/build/tests/testing-resources/libtesting-resources.so)
==8514==    by 0x4EA38AF: testing::TestInfo::Run() (in /workplace/jaseking/aws-sdk-cpp/build/tests/testing-resources/libtesting-resources.so)
==8514==    by 0x4EA3C3D: testing::TestSuite::Run() (in /workplace/jaseking/aws-sdk-cpp/build/tests/testing-resources/libtesting-resources.so)
==8514==    by 0x4EA9B53: testing::internal::UnitTestImpl::RunAllTests() (in /workplace/jaseking/aws-sdk-cpp/build/tests/testing-resources/libtesting-resources.so)
==8514==    by 0x4EAA090: testing::UnitTest::Run() (in /workplace/jaseking/aws-sdk-cpp/build/tests/testing-resources/libtesting-resources.so)
==8514==    by 0x4322C5: main (in /workplace/jaseking/aws-sdk-cpp/build/tests/aws-cpp-sdk-core-tests/aws-cpp-sdk-core-tests)

Without underlying support for thread safety, one way to fix this would require an interface change. CRTLogSystemInterface could gain a ShutdownLogging() method, which unhooks the logger from aws_c_common, but does not clean it up. During ShutdownAPI, the loggers would be shutdown, then CRT cleaned up, and now that there are no consumers of the logging subsystem, it would be safe to destroy the global common logger by destructing DefaultCRTLogSystem after CleanupCRT.

Avoiding an interface change, I took another approach. After unhooking DefaultCRTLogSystem from the aws-c-common global logging subsystem, there could be threads that have acquired the logger pointer but have not used it yet. It's ugly, and it's not guaranteed, but a 25ms delay after setting the logger to NULL allows other threads to do their logging business and go on their merry way, making it safe(r) for DefaultCRTLogSystem to clean up m_logger. I'll fully admit that it's a hack, but it's a hack that allowed my team to make forward progress - at least until someone wants to muck with the
CRTLogSystemInterface.

Check all that applies:

  • Did a review by yourself.
  • Added proper tests to cover this PR. (If tests are not applicable, explain.)
  • Checked if this PR is a breaking (APIs have been changed) change. (it is not)
  • Checked if this PR will not introduce cross-platform inconsistent behavior. (it will not)
  • Checked if this PR would require a ReadMe/Wiki update. (it does not)

Check which platforms you have built SDK on to verify the correctness of this PR.

  • Linux
  • Windows
  • Android
  • MacOS
  • IOS
  • Other Platforms

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@SergeyRyabinin
Copy link
Contributor

Hi @jeking3 ,

Thank you for reporting the issue.
I've merged another approach to fix it: #2480 that refactors tests and provides better CRT logging integration/synchronization.
Please let us know if you still experience the problem.

Best regards,
Sergey

@jeking3 jeking3 deleted the issue-2409 branch June 21, 2023 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loggers run beyond the usable lifetime of CRT
2 participants