transcribestreaming SIGSEGV of library in CRTHttpClient::MakeRequest -> ostream::write #2730

Closed
sem32 opened this issue Oct 25, 2023 · 4 comments
Labels: bug, investigating, p2


sem32 commented Oct 25, 2023

Describe the bug

We are using the C++ SDK to transcribe a stream in real time, and the SDK library crashes in some cases. The crash reproduces 100% of the time when the AWS_SECRET_ACCESS_KEY environment variable is set to a wrong value.

Why are we using the CRT HTTP client?
We switched to it because we had performance issues with libcurl.

  • With CURL 7.87 the transcription quality was good, but CPU usage was too high (a spike to 100% every 3-5 seconds). That is more or less acceptable for one transcribing process, but not for 30.
  • With CURL 7.88 we had no performance issue, but the transcription quality degraded (it looks like the CURL library does some optimization).
  • With the CRT HTTP client we have no issue with either quality or performance.

GDB output: gdb_dump.txt

I tried to use libsanitizer to catch the issue, and here is the result: libsanitizer_res.txt

logs.txt

Expected Behavior

The library does not crash.

Current Behavior

The library crashes.

Reproduction Steps

The issue reproduces in rare cases without any changes, and 100% of the time when we put a wrong symbol into the value of the AWS_SECRET_ACCESS_KEY environment variable.
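The "wrong symbol" step can also be done from inside a test process rather than from the shell. Below is a minimal sketch, assuming a POSIX system (such as the Debian 11 host mentioned here) where `setenv` is available. The helper name `corruptSecretAccessKey` is hypothetical and not part of the SDK; it only edits the process environment and would presumably need to run before any client is created.

```cpp
#include <stdlib.h>  // getenv, setenv (POSIX)
#include <string>

// Hypothetical helper (not an SDK function): appends one extra character to
// AWS_SECRET_ACCESS_KEY in this process's environment so the key is invalid.
// Returns false if the variable is not set or if setenv fails.
bool corruptSecretAccessKey(char extra = '!') {
    const char* current = getenv("AWS_SECRET_ACCESS_KEY");
    if (current == nullptr) {
        return false;  // nothing to corrupt
    }
    const std::string corrupted = std::string(current) + extra;
    // The third argument (1) tells setenv to overwrite the existing value.
    return setenv("AWS_SECRET_ACCESS_KEY", corrupted.c_str(), 1) == 0;
}
```

Exporting a deliberately wrong value in the shell before starting the application has the same effect; the helper is only convenient when the reproduction is driven from a test binary.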

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.184 (latest master)

Compiler and Version used

gcc (Debian 10.2.1-6) 10.2.1 20210110

Operating System and version

Debian 11

@sem32 sem32 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 25, 2023
@jmklix jmklix self-assigned this Oct 25, 2023
@jmklix jmklix added investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Oct 27, 2023
jmklix (Member) commented Oct 28, 2023

I'm working on trying to reproduce the same error you are getting and I had a few questions:

I just want to make sure we are both trying to solve the same problem. This similar-looking issue was caused by a permission access error in my AWS credentials, and I want to make sure we're not debugging an error added artificially by changing the AWS_SECRET_ACCESS_KEY.

sem32 (Author) commented Oct 28, 2023

Are the logs/sanitizer/dump from when you reproduce the error without any changes? (i.e. with the normal AWS_SECRET_ACCESS_KEY)

The only thing I changed is the default requestTimeoutMs, because the SDK default is too small.

diff --git a/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp b/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp
index 30e4fbabc0..ba73b788b1 100644
--- a/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp
+++ b/src/aws-cpp-sdk-core/source/client/ClientConfiguration.cpp
@@ -122,7 +122,7 @@ void setLegacyClientConfigurationParameters(ClientConfiguration& clientConfig)
     clientConfig.useFIPS = false;
     clientConfig.maxConnections = 25;
     clientConfig.httpRequestTimeoutMs = 0;
-    clientConfig.requestTimeoutMs = 3000;
+    clientConfig.requestTimeoutMs = 30000;
     clientConfig.connectTimeoutMs = 1000;
     clientConfig.enableTcpKeepAlive = true;
     clientConfig.tcpKeepAliveIntervalMs = 30000;

Are you getting CRC Mismatch in both error cases?

Yes, I get the same CRC Mismatch error even with the correct AWS_SECRET_ACCESS_KEY. With a single transcribing session everything is fine, but when I start 10-20 transcribing sessions, after some time (20-30 sec) I get the same error (CRC Mismatch) and the crash.
So changing AWS_SECRET_ACCESS_KEY is just the simplest way to reproduce the issue; it is not the production case. In production I get the same error (and crash) under a small load.
Here are the logs/dumps:
crash2.zip

Under a load of about 30 transcribing sessions I have also seen other crashes:
crash3.txt

and one more:
crash4.txt
crash5.zip

Can you confirm you are using the unmodified sample found here: https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/cpp/example_code/transcribe

Yes, correct. I tried to reproduce the issue with a wrong AWS_SECRET_ACCESS_KEY, and the crash looks the same.

have you tried and reproduced this on any other OS's?

no, we are using Debian 11

I'm developing a multithreaded application for real-time transcription of VoIP calls. When my module loads, I call Aws::InitAPI(options), and for each SIP call that I need to transcribe I start a separate thread in which I create the client and the request:
m_client = Aws::MakeUnique<TranscribeStreamingServiceClient>("TAG", config);
StartStreamTranscriptionRequest m_request;
I then set all the callbacks and call:
m_client->StartStreamTranscriptionAsync(m_request, OnStreamReady, OnResponseCallback, nullptr);
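The per-call threading model described above can be sketched without the SDK roughly as follows. Everything here is an assumption for illustration: `FakeStreamingClient`, `CallSession`, and `runSessions` are hypothetical stand-ins, not SDK types, and the sketch makes no claim about the cause of the crash in this issue. It only shows one way to keep the client and the completion state alive until the asynchronous callback has fired.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a streaming client: runs the "request" on a
// background thread and invokes the callback when it finishes.
class FakeStreamingClient {
public:
    using Callback = std::function<void(const std::string& callId, bool ok)>;

    ~FakeStreamingClient() {
        // Wait for the in-flight operation before the client goes away.
        if (worker_.joinable()) worker_.join();
    }

    // Intended to be called at most once per client in this sketch.
    void StartAsync(const std::string& callId, Callback onDone) {
        worker_ = std::thread([callId, onDone] { onDone(callId, true); });
    }

private:
    std::thread worker_;
};

// One session per call: owns its client for the whole async operation.
class CallSession {
public:
    explicit CallSession(std::string callId)
        : callId_(std::move(callId)),
          client_(std::make_unique<FakeStreamingClient>()) {}

    // Starts the operation and blocks until the completion callback has run.
    bool Run() {
        // The promise is shared with the callback so it outlives this scope
        // if the callback is still executing when Run() returns.
        auto done = std::make_shared<std::promise<bool>>();
        std::future<bool> result = done->get_future();
        client_->StartAsync(callId_, [done](const std::string&, bool ok) {
            done->set_value(ok);
        });
        return result.get();
    }

private:
    std::string callId_;
    std::unique_ptr<FakeStreamingClient> client_;
};

// Runs n sessions concurrently, one thread each, and returns how many
// completed successfully.
int runSessions(int n) {
    std::vector<std::future<bool>> results;
    for (int i = 0; i < n; ++i) {
        results.push_back(std::async(std::launch::async, [i] {
            CallSession session("call-" + std::to_string(i));
            return session.Run();
        }));
    }
    int ok = 0;
    for (auto& r : results) {
        if (r.get()) ++ok;
    }
    return ok;
}
```

Calling `runSessions(20)` mirrors the 10-20 concurrent sessions mentioned in this thread. In the real application the stand-in would be replaced by the SDK client and request objects, which would likewise need to stay alive until the response callback has been invoked.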

With 5 calls to transcribe everything looks fine, but with 10-20 I start to see crashes in the SDK library.

I build the SDK with:
cmake ../aws-sdk-cpp -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/usr/local/ -DCMAKE_INSTALL_PREFIX=/usr/local/ -DBUILD_ONLY="transcribestreaming" -DUSE_CRT_HTTP_CLIENT=1

sem32 (Author) commented Nov 16, 2023

@SergeyRyabinin
The fix is working. Thank you!

@sem32 sem32 closed this as completed Nov 16, 2023