Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash in Aws::Utils::Threading::PooledThreadExecutor #1776

Closed
2 tasks done
fboranek opened this issue Sep 22, 2021 · 7 comments
Closed
2 tasks done

crash in Aws::Utils::Threading::PooledThreadExecutor #1776

fboranek opened this issue Sep 22, 2021 · 7 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.

Comments

@fboranek
Copy link

fboranek commented Sep 22, 2021

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug
When the application destroy shared pointer to PooledThreadExecutor and the executor has still some running tasks it will lead to crash.

SDK version number
1.8.185

Platform/OS/Hardware/Device
Linux / Debian Buster

To Reproduce (observed behavior)
Consider following code

{
  {
    std::shared_ptr<Aws::Utils::Threading::PooledThreadExecutor> executor;
    // initialize executor
    std::shared_ptr<Aws::S3::S3Client> s3Client;
    // initialize client
    Aws::Transfer::TransferManagerConfiguration transfer_config(executor.get());
    transfer_config.s3Client = s3Client;
    // just to reproduce the problem which rarely happen
    transfer_config.transferStatusUpdatedCallback =
        [&](const Aws::Transfer::TransferManager*,
            const std::shared_ptr<const Aws::Transfer::TransferHandle>& handle) {
            if (handle->GetStatus() == Aws::Transfer::TransferStatus::COMPLETED)
                std::this_thread::sleep_for(std::chrono::seconds(30));
        };
    auto transferManager = Aws::Transfer::TransferManager::Create(transfer_config);
    auto handle = transferManager->UploadFile(/* args */);
    handle->WaitUntilFinished();
  }
  std::this_thread::sleep_for(std::chrono::seconds(60));
}

which can lead to the appended crash.

The main problem is WaitUntilFinished() is notified from lambda in the executor. Particularly from TransferManager::Handle PutObjectResponse, so it stops blocking. All shared pointer are destroyed, but it doesn't mean that also Instances are destroyed. And here is a race condition in PooledThreadExecutor.

void ThreadTask::MainTaskRunner()
{
    while (m_continue)
    {        
        while (m_continue && m_executor.HasTasks())
        {      
            auto fn = m_executor.PopTask();
            if(fn)
            {
                (*fn)();  // the lambda keeps shared pointer of `TransferManager` and indirectly also `PooledThreadExecutor`
                Aws::Delete(fn);               
            }
        }
     
        if(m_continue)
        {
            m_executor.m_sync.WaitOne();
        }
    }
}
  • if a task with lambda has been destroyed in Aws::Delete(fn); then refcount of shared pointer PooledThreadExecutor is 1 and it will be destroyed in main thread - this is correct behavior
  • if main thread leaves block before then refcount of shared pointer PooledThreadExecutor is higher then 1 and it keeps living in a lamda. Later, deleting the lambda triggers also destroying the executor. Destructor of executor blocks calling thread, because it join all his threads. But her the calling thread is from task and tries to join him self. Yes, in coredump the thread number is the same. The exception rised from std::thread::join is std::system_error pthread_join error : Resource deadlock avoided
ThreadTask::~ThreadTask()
{
    StopProcessingWork();
    m_thread.join();
}

Expected behavior

I don't see any fix except a public function in PooledThreadExecutor which will block until the executor is fully drained. E.g.

{
  {
    std::shared_ptr<Aws::Utils::Threading::PooledThreadExecutor> executor;
    // initialize executor
    ...
    handle->WaitUntilFinished();
    executor->WaitUntilStopped();
  }
  std::this_thread::sleep_for(std::chrono::seconds(60));
}

or any other method to safely destroy the executor.

Additional context

backtrace from coredump

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fab7b5d4535 in __GI_abort () at abort.c:79
#2  0x00007fab7b99c983 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fab7b9a28c6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007fab7b9a2901 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007fab7b9a2b34 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fab60988a80, tinfo=tinfo@entry=0x7fab7ba87ae0 <typeinfo for std::system_error>, 
    dest=dest@entry=0x7fab7b9cb8c0 <std::system_error::~system_error()>) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007fab7b99ea55 in std::__throw_system_error (__i=35) at /build/gcc-8-Ev0Tjh/gcc-8-8.3.0/build/x86_64-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:86
#7  0x00007fab7b9cbd6c in std::thread::join (this=this@entry=0x5568078ad650) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:113
#8  0x0000556805b9fff2 in Aws::Utils::Threading::ThreadTask::~ThreadTask (this=0x5568078ad640, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/ThreadTask.cpp:19
#9  0x0000556805ba0079 in Aws::Delete<Aws::Utils::Threading::ThreadTask> (pointerToT=0x5568078ad640)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:94
#10 Aws::Utils::Threading::PooledThreadExecutor::~PooledThreadExecutor (this=0x5568078ab6e0, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/Executor.cpp:96
#11 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078ab6d0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#12 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078ab6d0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#13 0x0000556805adcf93 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5568078aec60, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#14 std::__shared_ptr<Aws::Utils::Threading::Executor, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5568078aec58, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#15 std::shared_ptr<Aws::Utils::Threading::Executor>::~shared_ptr (this=0x5568078aec58, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#16 Aws::S3::S3Client::~S3Client (this=0x5568078aeb00, __in_chrg=<optimized out>) at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-s3/source/S3Client.cpp:161
#17 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078aeaf0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#18 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078aeaf0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#19 0x0000556805a702c5 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5568078f9730, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#20 std::__shared_ptr<Aws::S3::S3Client, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5568078f9728, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#21 std::shared_ptr<Aws::S3::S3Client>::~shared_ptr (this=0x5568078f9728, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#22 Aws::Transfer::TransferManagerConfiguration::~TransferManagerConfiguration (this=0x5568078f9728, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/include/aws/transfer/TransferManager.h:38
#23 Aws::Transfer::TransferManager::~TransferManager (this=0x5568078f96a0, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp:68
#24 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078f9690) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#25 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078f9690) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#26 0x0000556805a66f14 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fab6014c9c8, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#27 std::__shared_ptr<Aws::Transfer::TransferManager, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fab6014c9c0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#28 std::shared_ptr<Aws::Transfer::TransferManager>::~shared_ptr (this=0x7fab6014c9c0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#29 Aws::Transfer::TransferManager::<lambda(const Aws::Http::HttpRequest*, long long int)>::~<lambda> (this=0x7fab6014c9c0, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp:457
#30 std::_Function_base::_Base_manager<Aws::Transfer::TransferManager::DoSinglePartUpload(const std::shared_ptr<std::basic_iostream<char> >&, const std::shared_ptr<Aws::Transfer::TransferHandle>&)::<lambda(const Aws::Http::HttpRequest*, long long int)> >::_M_destroy (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#31 std::_Function_base::_Base_manager<Aws::Transfer::TransferManager::DoSinglePartUpload(const std::shared_ptr<std::basic_iostream<char> >&, const std::shared_ptr<Aws::Transfer::TransferHandle>&)::<lambda(const Aws::Http::HttpRequest*, long long int)> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:212
#32 0x0000556805ba5dd1 in std::_Function_base::~_Function_base (this=0x7fab6098a650, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#33 std::function<void (Aws::Http::HttpRequest const*, long long)>::~function() (this=0x7fab6098a650, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#34 Aws::AmazonWebServiceRequest::~AmazonWebServiceRequest (this=0x7fab6098a608, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/AmazonWebServiceRequest.h:43
#35 Aws::AmazonStreamingWebServiceRequest::~AmazonStreamingWebServiceRequest (this=0x7fab6098a608, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/AmazonStreamingWebServiceRequest.cpp:11
#36 0x0000556805b09b11 in Aws::S3::S3Client::<lambda()>::~<lambda> (this=0x7fab6098a600, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#37 std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()>::~_Bind (this=0x7fab6098a600, __in_chrg=<optimized out>) at /usr/include/c++/8/functional:386
--Type <RET> for more, q to quit, c to continue without paging--
#38 std::_Function_base::_Base_manager<std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()> >::_M_destroy (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#39 std::_Function_base::_Base_manager<std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:212
#40 0x0000556805b9eb23 in std::_Function_base::~_Function_base (this=0x7fab608bbda0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#41 std::function<void ()>::~function() (this=0x7fab608bbda0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#42 Aws::Delete<std::function<void ()> >(std::function<void ()>*) (pointerToT=0x7fab608bbda0)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:101
#43 Aws::Utils::Threading::ThreadTask::MainTaskRunner (this=0x5568078ad640) at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/ThreadTask.cpp:32
#44 0x00007fab7b9cbb2f in std::execute_native_thread_routine (__p=0x5568078b0740) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:80
#45 0x00007fab7c0c9fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#46 0x00007fab7b6ab4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@fboranek fboranek added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 22, 2021
@jmklix
Copy link
Member

jmklix commented Jun 16, 2022

Thanks for the detailed write up and PR. I'll see if we can get it reviewed and merged.

@jmklix jmklix added pending-release This issue will be fixed by an approved PR that hasn't been released yet. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 16, 2022
@meimunchi
Copy link

Hello, is there an update on this bug fix? I am seeing the same issue. Thanks!

@SergeyRyabinin
Copy link
Contributor

Hi,
A new blocking wait for completion method for the TransferManager was merged: #2291

Please call pTransferManager->WaitUntilAllFinished() once you've finished the file transfer, or before leaving the scope of executor, or before the ShutdownAPI() call.

It does not fix the original issue reported here though, we will also check the original/root issue of PooledThreadExecutor cyclic references.

@meimunchi
Copy link

Noted, thank you for the update

@fboranek
Copy link
Author

It does not fix the original issue reported here though, we will also check the original/root issue of PooledThreadExecutor cyclic references.

I think, it cannot be solved otherwise than by suggest MR.

The main issue is passing pool as shared pointer. Then it cannot be guaranteed lifetime of object in main thread.
This example is simplified architecture of the PooledThreadExecutor.

int main() {
    struct Pool {
        std::thread thread;

        ~Pool()
        {
            if (thread.joinable()) thread.join();
            std::cout << "~Pool() finished\n";
        }
    };

    std::cout << "1. simple\n";
    {
        auto pool = std::make_shared<Pool>();
        pool->thread = std::thread{[=]() {
            std::this_thread::sleep_for(std::chrono::seconds{5});
            std::cout << "Task finished\n";
        }};
    }

    std::cout << "2. with copy\n";
    {
        auto pool = std::make_shared<Pool>();
        pool->thread = std::thread{[=]() {
            auto copy = pool;
            std::this_thread::sleep_for(std::chrono::seconds{5});
            std::cout << "Task finished\n";
        }};
    }
    std::cout << std::flush;
}

with output

1. simple
Task finished
~Pool() finished
2. with copy

The second thread was still running when the process ended because i copied shared pointer into task.
It can be seen, that it cannot be safely used. You are not able to prevent similar using. The real owner must ensure call something like poll->join() before leaving scope where he expected the pool will be destroyed.

@jmklix jmklix added the p2 This is a standard priority issue label Mar 8, 2023
@jmklix
Copy link
Member

jmklix commented Feb 12, 2024

This should be fixed with this merged PR: #2787

@jmklix jmklix closed this as completed Feb 12, 2024
Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.
Projects
None yet
Development

No branches or pull requests

4 participants