crash in Aws::Utils::Threading::PooledThreadExecutor #1776

fboranek · 2021-09-22T12:08:38Z

Confirm by changing [ ] to [x] below to ensure that it's a bug:

I've gone though Developer Guide and API reference
I've searched for previous similar issues and didn't find any solution

Describe the bug
When the application destroy shared pointer to PooledThreadExecutor and the executor has still some running tasks it will lead to crash.

SDK version number
1.8.185

Platform/OS/Hardware/Device
Linux / Debian Buster

To Reproduce (observed behavior)
Consider following code

{
  {
    std::shared_ptr<Aws::Utils::Threading::PooledThreadExecutor> executor;
    // initialize executor
    std::shared_ptr<Aws::S3::S3Client> s3Client;
    // initialize client
    Aws::Transfer::TransferManagerConfiguration transfer_config(executor.get());
    transfer_config.s3Client = s3Client;
    // just to reproduce the problem which rarely happen
    transfer_config.transferStatusUpdatedCallback =
        [&](const Aws::Transfer::TransferManager*,
            const std::shared_ptr<const Aws::Transfer::TransferHandle>& handle) {
            if (handle->GetStatus() == Aws::Transfer::TransferStatus::COMPLETED)
                std::this_thread::sleep_for(std::chrono::seconds(30));
        };
    auto transferManager = Aws::Transfer::TransferManager::Create(transfer_config);
    auto handle = transferManager->UploadFile(/* args */);
    handle->WaitUntilFinished();
  }
  std::this_thread::sleep_for(std::chrono::seconds(60));
}

which can lead to the appended crash.

The main problem is WaitUntilFinished() is notified from lambda in the executor. Particularly from TransferManager::Handle PutObjectResponse, so it stops blocking. All shared pointer are destroyed, but it doesn't mean that also Instances are destroyed. And here is a race condition in PooledThreadExecutor.

void ThreadTask::MainTaskRunner()
{
    while (m_continue)
    {        
        while (m_continue && m_executor.HasTasks())
        {      
            auto fn = m_executor.PopTask();
            if(fn)
            {
                (*fn)();  // the lambda keeps shared pointer of `TransferManager` and indirectly also `PooledThreadExecutor`
                Aws::Delete(fn);               
            }
        }
     
        if(m_continue)
        {
            m_executor.m_sync.WaitOne();
        }
    }
}

if a task with lambda has been destroyed in Aws::Delete(fn); then refcount of shared pointer PooledThreadExecutor is 1 and it will be destroyed in main thread - this is correct behavior
if main thread leaves block before then refcount of shared pointer PooledThreadExecutor is higher then 1 and it keeps living in a lamda. Later, deleting the lambda triggers also destroying the executor. Destructor of executor blocks calling thread, because it join all his threads. But her the calling thread is from task and tries to join him self. Yes, in coredump the thread number is the same. The exception rised from std::thread::join is std::system_error pthread_join error : Resource deadlock avoided

ThreadTask::~ThreadTask()
{
    StopProcessingWork();
    m_thread.join();
}

Expected behavior

I don't see any fix except a public function in PooledThreadExecutor which will block until the executor is fully drained. E.g.

{
  {
    std::shared_ptr<Aws::Utils::Threading::PooledThreadExecutor> executor;
    // initialize executor
    ...
    handle->WaitUntilFinished();
    executor->WaitUntilStopped();
  }
  std::this_thread::sleep_for(std::chrono::seconds(60));
}

or any other method to safely destroy the executor.

Additional context

backtrace from coredump

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fab7b5d4535 in __GI_abort () at abort.c:79
#2  0x00007fab7b99c983 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fab7b9a28c6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007fab7b9a2901 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007fab7b9a2b34 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fab60988a80, tinfo=tinfo@entry=0x7fab7ba87ae0 <typeinfo for std::system_error>, 
    dest=dest@entry=0x7fab7b9cb8c0 <std::system_error::~system_error()>) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007fab7b99ea55 in std::__throw_system_error (__i=35) at /build/gcc-8-Ev0Tjh/gcc-8-8.3.0/build/x86_64-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:86
#7  0x00007fab7b9cbd6c in std::thread::join (this=this@entry=0x5568078ad650) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:113
#8  0x0000556805b9fff2 in Aws::Utils::Threading::ThreadTask::~ThreadTask (this=0x5568078ad640, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/ThreadTask.cpp:19
#9  0x0000556805ba0079 in Aws::Delete<Aws::Utils::Threading::ThreadTask> (pointerToT=0x5568078ad640)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:94
#10 Aws::Utils::Threading::PooledThreadExecutor::~PooledThreadExecutor (this=0x5568078ab6e0, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/Executor.cpp:96
#11 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078ab6d0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#12 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078ab6d0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#13 0x0000556805adcf93 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5568078aec60, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#14 std::__shared_ptr<Aws::Utils::Threading::Executor, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5568078aec58, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#15 std::shared_ptr<Aws::Utils::Threading::Executor>::~shared_ptr (this=0x5568078aec58, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#16 Aws::S3::S3Client::~S3Client (this=0x5568078aeb00, __in_chrg=<optimized out>) at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-s3/source/S3Client.cpp:161
#17 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078aeaf0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#18 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078aeaf0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#19 0x0000556805a702c5 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5568078f9730, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#20 std::__shared_ptr<Aws::S3::S3Client, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5568078f9728, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#21 std::shared_ptr<Aws::S3::S3Client>::~shared_ptr (this=0x5568078f9728, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#22 Aws::Transfer::TransferManagerConfiguration::~TransferManagerConfiguration (this=0x5568078f9728, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/include/aws/transfer/TransferManager.h:38
#23 Aws::Transfer::TransferManager::~TransferManager (this=0x5568078f96a0, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp:68
#24 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078f9690) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#25 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078f9690) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#26 0x0000556805a66f14 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fab6014c9c8, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#27 std::__shared_ptr<Aws::Transfer::TransferManager, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fab6014c9c0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#28 std::shared_ptr<Aws::Transfer::TransferManager>::~shared_ptr (this=0x7fab6014c9c0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#29 Aws::Transfer::TransferManager::<lambda(const Aws::Http::HttpRequest*, long long int)>::~<lambda> (this=0x7fab6014c9c0, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp:457
#30 std::_Function_base::_Base_manager<Aws::Transfer::TransferManager::DoSinglePartUpload(const std::shared_ptr<std::basic_iostream<char> >&, const std::shared_ptr<Aws::Transfer::TransferHandle>&)::<lambda(const Aws::Http::HttpRequest*, long long int)> >::_M_destroy (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#31 std::_Function_base::_Base_manager<Aws::Transfer::TransferManager::DoSinglePartUpload(const std::shared_ptr<std::basic_iostream<char> >&, const std::shared_ptr<Aws::Transfer::TransferHandle>&)::<lambda(const Aws::Http::HttpRequest*, long long int)> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:212
#32 0x0000556805ba5dd1 in std::_Function_base::~_Function_base (this=0x7fab6098a650, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#33 std::function<void (Aws::Http::HttpRequest const*, long long)>::~function() (this=0x7fab6098a650, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#34 Aws::AmazonWebServiceRequest::~AmazonWebServiceRequest (this=0x7fab6098a608, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/AmazonWebServiceRequest.h:43
#35 Aws::AmazonStreamingWebServiceRequest::~AmazonStreamingWebServiceRequest (this=0x7fab6098a608, __in_chrg=<optimized out>)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/AmazonStreamingWebServiceRequest.cpp:11
#36 0x0000556805b09b11 in Aws::S3::S3Client::<lambda()>::~<lambda> (this=0x7fab6098a600, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#37 std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()>::~_Bind (this=0x7fab6098a600, __in_chrg=<optimized out>) at /usr/include/c++/8/functional:386
--Type <RET> for more, q to quit, c to continue without paging--
#38 std::_Function_base::_Base_manager<std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()> >::_M_destroy (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#39 std::_Function_base::_Base_manager<std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:212
#40 0x0000556805b9eb23 in std::_Function_base::~_Function_base (this=0x7fab608bbda0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#41 std::function<void ()>::~function() (this=0x7fab608bbda0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#42 Aws::Delete<std::function<void ()> >(std::function<void ()>*) (pointerToT=0x7fab608bbda0)
    at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:101
#43 Aws::Utils::Threading::ThreadTask::MainTaskRunner (this=0x5568078ad640) at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/ThreadTask.cpp:32
#44 0x00007fab7b9cbb2f in std::execute_native_thread_routine (__p=0x5568078b0740) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:80
#45 0x00007fab7c0c9fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#46 0x00007fab7b6ab4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The text was updated successfully, but these errors were encountered:

jmklix · 2022-06-16T22:58:49Z

Thanks for the detailed write up and PR. I'll see if we can get it reviewed and merged.

meimunchi · 2023-01-16T16:29:27Z

Hello, is there an update on this bug fix? I am seeing the same issue. Thanks!

SergeyRyabinin · 2023-01-18T04:46:27Z

Hi,
A new blocking wait for completion method for the TransferManager was merged: #2291

Please call pTransferManager->WaitUntilAllFinished() once you've finished the file transfer, or before leaving the scope of executor, or before the ShutdownAPI() call.

It does not fix the original issue reported here though, we will also check the original/root issue of PooledThreadExecutor cyclic references.

meimunchi · 2023-01-19T18:52:34Z

Noted, thank you for the update

fboranek · 2023-01-19T20:06:33Z

It does not fix the original issue reported here though, we will also check the original/root issue of PooledThreadExecutor cyclic references.

I think, it cannot be solved otherwise than by suggest MR.

The main issue is passing pool as shared pointer. Then it cannot be guaranteed lifetime of object in main thread.
This example is simplified architecture of the PooledThreadExecutor.

int main() {
    struct Pool {
        std::thread thread;

        ~Pool()
        {
            if (thread.joinable()) thread.join();
            std::cout << "~Pool() finished\n";
        }
    };

    std::cout << "1. simple\n";
    {
        auto pool = std::make_shared<Pool>();
        pool->thread = std::thread{[=]() {
            std::this_thread::sleep_for(std::chrono::seconds{5});
            std::cout << "Task finished\n";
        }};
    }

    std::cout << "2. with copy\n";
    {
        auto pool = std::make_shared<Pool>();
        pool->thread = std::thread{[=]() {
            auto copy = pool;
            std::this_thread::sleep_for(std::chrono::seconds{5});
            std::cout << "Task finished\n";
        }};
    }
    std::cout << std::flush;
}

with output

1. simple
Task finished
~Pool() finished
2. with copy

The second thread was still running when the process ended because i copied shared pointer into task.
It can be seen, that it cannot be safely used. You are not able to prevent similar using. The real owner must ensure call something like poll->join() before leaving scope where he expected the pool will be destroyed.

jmklix · 2024-02-12T20:42:48Z

This should be fixed with this merged PR: #2787

github-actions · 2024-02-12T20:43:04Z

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

fboranek added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 22, 2021

fboranek mentioned this issue Sep 22, 2021

feat: added PooledThreadExecutor::WaitUntilStopped function #1778

Closed

11 tasks

jmklix added pending-release This issue will be fixed by an approved PR that hasn't been released yet. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 16, 2022

jmklix added the p2 This is a standard priority issue label Mar 8, 2023

jmklix closed this as completed Feb 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crash in Aws::Utils::Threading::PooledThreadExecutor #1776

crash in Aws::Utils::Threading::PooledThreadExecutor #1776

fboranek commented Sep 22, 2021 •

edited

Loading

jmklix commented Jun 16, 2022

meimunchi commented Jan 16, 2023

SergeyRyabinin commented Jan 18, 2023

meimunchi commented Jan 19, 2023

fboranek commented Jan 19, 2023

jmklix commented Feb 12, 2024

github-actions bot commented Feb 12, 2024

crash in Aws::Utils::Threading::PooledThreadExecutor #1776

crash in Aws::Utils::Threading::PooledThreadExecutor #1776

Comments

fboranek commented Sep 22, 2021 • edited Loading

jmklix commented Jun 16, 2022

meimunchi commented Jan 16, 2023

SergeyRyabinin commented Jan 18, 2023

meimunchi commented Jan 19, 2023

fboranek commented Jan 19, 2023

jmklix commented Feb 12, 2024

github-actions bot commented Feb 12, 2024

fboranek commented Sep 22, 2021 •

edited

Loading