[EXPORTER] Support handling retry-able errors for OTLP/gRPC #3219

Merged: 20 commits, merged Jan 21, 2025
Diff shown: changes from 9 commits

2 changes: 1 addition & 1 deletion exporters/otlp/CMakeLists.txt
@@ -317,7 +317,7 @@ if(BUILD_TESTING)
add_executable(otlp_grpc_exporter_test test/otlp_grpc_exporter_test.cc)
target_link_libraries(
otlp_grpc_exporter_test ${GTEST_BOTH_LIBRARIES} ${CMAKE_THREAD_LIBS_INIT}
-${GMOCK_LIB} opentelemetry_exporter_otlp_grpc)
+${GMOCK_LIB} opentelemetry_exporter_otlp_grpc gRPC::grpc++)
gtest_add_tests(
TARGET otlp_grpc_exporter_test
TEST_PREFIX exporter.otlp.
@@ -152,6 +152,22 @@ std::string GetOtlpDefaultTracesCompression();
std::string GetOtlpDefaultMetricsCompression();
std::string GetOtlpDefaultLogsCompression();

std::uint32_t GetOtlpDefaultTracesRetryMaxAttempts();
std::uint32_t GetOtlpDefaultMetricsRetryMaxAttempts();
std::uint32_t GetOtlpDefaultLogsRetryMaxAttempts();

float GetOtlpDefaultTracesRetryInitialBackoff();
float GetOtlpDefaultMetricsRetryInitialBackoff();
float GetOtlpDefaultLogsRetryInitialBackoff();

float GetOtlpDefaultTracesRetryMaxBackoff();
float GetOtlpDefaultMetricsRetryMaxBackoff();
float GetOtlpDefaultLogsRetryMaxBackoff();

float GetOtlpDefaultTracesRetryBackoffMultiplier();
float GetOtlpDefaultMetricsRetryBackoffMultiplier();
float GetOtlpDefaultLogsRetryBackoffMultiplier();

} // namespace otlp
} // namespace exporter
OPENTELEMETRY_END_NAMESPACE
@@ -62,6 +62,18 @@ struct OtlpGrpcClientOptions
// Concurrent requests
std::size_t max_concurrent_requests;
#endif

/** The maximum number of call attempts, including the original attempt. */
std::uint32_t retry_policy_max_attempts{};

/** The initial backoff delay between retry attempts, random between (0, initial_backoff). */
float retry_policy_initial_backoff{};

Member:

Why not use std::chrono::duration<> here?

Contributor Author (@chusitoo), Jan 3, 2025:

I had that as a chrono duration initially, but it was not really of any use for otlp/grpc, since the value just gets passed down to the service config. So I moved it to otlp/http, where it is needed to compute the backoff.

FYI, the implementation in a previous commit looked like this: cb14857

Member:

As I understand it, exporting over both otlp/http and otlp/grpc costs much more CPU than a type conversion here. I think it's more important to make clear what these parameters mean (we can't tell the meaning or the unit of this variable from just its name and the comments here). Also, floating-point numbers have a limited epsilon and are less precise.
What do you think?

Contributor Author:

IMHO, precision is fairly subjective here: typical use cases only need a single decimal place (which is also how the value is formatted before the config settings are passed to the gRPC library). That seems logical, given that a backoff measured in tens of milliseconds or less is probably a very niche requirement.

I agree that a chrono duration makes the type more descriptive. Part of the reason I went back to float was that I could not find a common place to alias it to a more descriptive name without repeating the alias in at least one more header file (for instance, otlp_environment.h and http_client.h).

For now, I will revert/update this in #3223 until that PR is approved and merged, to avoid duplicating all of these work-in-progress changes to common code.
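For reference, the reviewer's `std::chrono::duration<>` suggestion could look roughly like the sketch below. The alias `SecondsDecimal` and the helper `ToServiceConfigSeconds` are hypothetical names used only for illustration; the sketch assumes a fractional-seconds duration backed by `float`, so the unit lives in the type while the one-decimal `%.1fs` formatting used for the gRPC service config still works.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <ratio>
#include <string>

// Hypothetical alias: a duration in fractional seconds, stored as float, so the
// unit is carried by the type instead of by the field name or comment.
using SecondsDecimal = std::chrono::duration<float, std::ratio<1>>;

// Hypothetical helper: renders a duration the way the service config template
// formats it, with one decimal place and an "s" suffix, e.g. "1.5s".
std::string ToServiceConfigSeconds(SecondsDecimal d)
{
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.1fs", static_cast<double>(d.count()));
  return std::string(buf);
}
```

Because a floating-point `duration` accepts any `std::chrono` duration implicitly, a caller could pass `std::chrono::milliseconds(1500)` and get `"1.5s"` without a manual unit conversion. The open question raised in the thread is where to declare such an alias so that both otlp_environment.h and http_client.h can share it.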


/** The maximum backoff places an upper limit on exponential backoff growth. */
float retry_policy_max_backoff{};

/** The backoff will be multiplied by this value after each retry attempt. */
float retry_policy_backoff_multiplier{};
};

} // namespace otlp
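The four `retry_policy_*` fields together describe an exponential backoff. A minimal sketch, assuming the gRPC retry semantics in which the n-th retry waits a random delay in (0, min(initial_backoff * multiplier^(n-1), max_backoff)) seconds: the hypothetical helper `BackoffUpperBounds` computes that upper bound for each retry. Since `retry_policy_max_attempts` includes the original attempt, 5 attempts allow at most 4 retries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: upper bound (in seconds) of the random delay before each retry.
// max_attempts includes the original call, so there are max_attempts - 1 retries.
std::vector<float> BackoffUpperBounds(std::uint32_t max_attempts, float initial_backoff,
                                      float max_backoff, float multiplier)
{
  std::vector<float> bounds;
  float current = initial_backoff;
  for (std::uint32_t retry = 1; retry < max_attempts; ++retry)
  {
    // The bound grows geometrically by `multiplier` until max_backoff caps it.
    bounds.push_back(std::min(current, max_backoff));
    current *= multiplier;
  }
  return bounds;
}
```

With the defaults used in this PR (5 attempts, 1.0 s initial backoff, 5.0 s max backoff, multiplier 1.5), the bounds are 1.0, 1.5, 2.25 and 3.375 seconds, so `max_backoff` only starts capping the growth from the fifth retry onward.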
112 changes: 112 additions & 0 deletions exporters/otlp/src/otlp_environment.cc
@@ -80,6 +80,34 @@ static bool GetStringDualEnvVar(const char *signal_name,
return exists;
}

static std::uint32_t GetUintEnvVarOrDefault(opentelemetry::nostd::string_view signal_env,
opentelemetry::nostd::string_view generic_env,
std::uint32_t default_value)
{
std::string value;

if (GetStringDualEnvVar(signal_env.data(), generic_env.data(), value))
{
return static_cast<std::uint32_t>(std::strtoul(value.c_str(), nullptr, 10));
}

return default_value;
}

static float GetFloatEnvVarOrDefault(opentelemetry::nostd::string_view signal_env,
opentelemetry::nostd::string_view generic_env,
float default_value)
{
std::string value;

if (GetStringDualEnvVar(signal_env.data(), generic_env.data(), value))
{
return std::strtof(value.c_str(), nullptr);
}

return default_value;
}

std::string GetOtlpDefaultGrpcTracesEndpoint()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT";
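A minimal sketch of the lookup order that `GetUintEnvVarOrDefault` appears to implement, assuming `GetStringDualEnvVar` checks the signal-specific variable before the generic one. To keep it testable without touching the real process environment, a `std::map` stands in for it; `FakeEnv` and `LookupUint` are hypothetical names. Parsing mirrors the PR: `std::strtoul` with no validation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for the process environment.
using FakeEnv = std::map<std::string, std::string>;

// Lookup order: signal-specific name, then generic name, then the default.
std::uint32_t LookupUint(const FakeEnv &env, const std::string &signal_name,
                         const std::string &generic_name, std::uint32_t default_value)
{
  auto it = env.find(signal_name);
  if (it == env.end())
  {
    it = env.find(generic_name);  // fall back to the generic variable
  }
  if (it == env.end())
  {
    return default_value;  // neither variable is set
  }
  // No validation: a non-numeric value such as "abc" parses as 0.
  return static_cast<std::uint32_t>(std::strtoul(it->second.c_str(), nullptr, 10));
}
```

For example, setting only `OTEL_EXPORTER_OTLP_RETRY_MAX_ATTEMPTS=3` yields 3 for traces; additionally setting `OTEL_EXPORTER_OTLP_TRACES_RETRY_MAX_ATTEMPTS=7` yields 7; and a set but non-numeric value yields 0 rather than the default of 5. Since `MakeChannel` only installs the retry service config when every setting is positive, that 0 effectively disables retries.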
@@ -1125,6 +1153,90 @@ std::string GetOtlpDefaultLogsCompression()
return std::string{"none"};
}

std::uint32_t GetOtlpDefaultTracesRetryMaxAttempts()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_TRACES_RETRY_MAX_ATTEMPTS";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_MAX_ATTEMPTS";
return GetUintEnvVarOrDefault(kSignalEnv, kGenericEnv, 5U);
}

std::uint32_t GetOtlpDefaultMetricsRetryMaxAttempts()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_METRICS_RETRY_MAX_ATTEMPTS";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_MAX_ATTEMPTS";
return GetUintEnvVarOrDefault(kSignalEnv, kGenericEnv, 5U);
}

std::uint32_t GetOtlpDefaultLogsRetryMaxAttempts()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_LOGS_RETRY_MAX_ATTEMPTS";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_MAX_ATTEMPTS";
return GetUintEnvVarOrDefault(kSignalEnv, kGenericEnv, 5U);
}

float GetOtlpDefaultTracesRetryInitialBackoff()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_TRACES_RETRY_INITIAL_BACKOFF";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_INITIAL_BACKOFF";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 1.0);
}

float GetOtlpDefaultMetricsRetryInitialBackoff()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_METRICS_RETRY_INITIAL_BACKOFF";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_INITIAL_BACKOFF";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 1.0);
}

float GetOtlpDefaultLogsRetryInitialBackoff()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_LOGS_RETRY_INITIAL_BACKOFF";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_INITIAL_BACKOFF";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 1.0);
}

float GetOtlpDefaultTracesRetryMaxBackoff()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_TRACES_RETRY_MAX_BACKOFF";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_MAX_BACKOFF";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 5.0);
}

float GetOtlpDefaultMetricsRetryMaxBackoff()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_METRICS_RETRY_MAX_BACKOFF";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_MAX_BACKOFF";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 5.0);
}

float GetOtlpDefaultLogsRetryMaxBackoff()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_LOGS_RETRY_MAX_BACKOFF";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_MAX_BACKOFF";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 5.0);
}

float GetOtlpDefaultTracesRetryBackoffMultiplier()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_TRACES_RETRY_BACKOFF_MULTIPLIER";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_BACKOFF_MULTIPLIER";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 1.5f);
}

float GetOtlpDefaultMetricsRetryBackoffMultiplier()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_BACKOFF_MULTIPLIER";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 1.5f);
}

float GetOtlpDefaultLogsRetryBackoffMultiplier()
{
constexpr char kSignalEnv[] = "OTEL_EXPORTER_OTLP_LOGS_RETRY_BACKOFF_MULTIPLIER";
constexpr char kGenericEnv[] = "OTEL_EXPORTER_OTLP_RETRY_BACKOFF_MULTIPLIER";
return GetFloatEnvVarOrDefault(kSignalEnv, kGenericEnv, 1.5f);
}

} // namespace otlp
} // namespace exporter
OPENTELEMETRY_END_NAMESPACE
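All twelve getters above follow one naming scheme: `OTEL_EXPORTER_OTLP_<SIGNAL>_RETRY_<SETTING>` for the signal-specific variable and `OTEL_EXPORTER_OTLP_RETRY_<SETTING>` for the generic fallback. The hypothetical helper `RetryEnvVarNames` below derives both names, purely to make the pattern explicit; the PR itself spells each name out as a `constexpr` literal.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: returns {signal-specific name, generic fallback name}.
// `signal` is "TRACES", "METRICS" or "LOGS"; `setting` is e.g. "MAX_ATTEMPTS".
std::pair<std::string, std::string> RetryEnvVarNames(const std::string &signal,
                                                     const std::string &setting)
{
  const std::string prefix = "OTEL_EXPORTER_OTLP_";
  return {prefix + signal + "_RETRY_" + setting, prefix + "RETRY_" + setting};
}
```

The `<SETTING>` values are `MAX_ATTEMPTS`, `INITIAL_BACKOFF`, `MAX_BACKOFF` and `BACKOFF_MULTIPLIER`; when neither variable is set, the defaults are 5, 1.0, 5.0 and 1.5 respectively.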
39 changes: 39 additions & 0 deletions exporters/otlp/src/otlp_grpc_client.cc
@@ -12,6 +12,7 @@
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
@@ -23,6 +24,7 @@
#include "opentelemetry/common/timestamp.h"
#include "opentelemetry/ext/http/common/url_parser.h"
#include "opentelemetry/nostd/function_ref.h"
#include "opentelemetry/nostd/string_view.h"
#include "opentelemetry/sdk/common/global_log_handler.h"

OPENTELEMETRY_BEGIN_NAMESPACE
@@ -347,6 +349,43 @@ std::shared_ptr<grpc::Channel> OtlpGrpcClient::MakeChannel(const OtlpGrpcClientO
grpc_arguments.SetCompressionAlgorithm(GRPC_COMPRESS_GZIP);
}

if (options.retry_policy_max_attempts > 0U && options.retry_policy_initial_backoff > 0.0f &&
options.retry_policy_max_backoff > 0.0f && options.retry_policy_backoff_multiplier > 0.0f)
{
static const auto kServiceConfigJson = opentelemetry::nostd::string_view{R"(
{
"methodConfig": [
{
"name": [{}],
"retryPolicy": {
"maxAttempts": %0000000000u,
"initialBackoff": "%.1fs",
"maxBackoff": "%.1fs",
"backoffMultiplier": %.1f,
"retryableStatusCodes": [
"CANCELLED",
"DEADLINE_EXCEEDED",
"ABORTED",
"OUT_OF_RANGE",
"DATA_LOSS",
"UNAVAILABLE"
]
}
}
]
})"};

// Allocate string with buffer large enough to hold the formatted json config
auto service_config = std::string(kServiceConfigJson.size(), '\0');
// Prior to C++17, need to explicitly cast away constness from `data()` buffer
std::snprintf(const_cast<decltype(service_config)::value_type *>(service_config.data()),
service_config.size(), kServiceConfigJson.data(),
options.retry_policy_max_attempts, options.retry_policy_initial_backoff,
options.retry_policy_max_backoff, options.retry_policy_backoff_multiplier);

grpc_arguments.SetServiceConfigJSON(service_config);
}

if (options.use_ssl_credentials)
{
grpc::SslCredentialsOptions ssl_opts;
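The fixed-size buffer in `MakeChannel` relies on the JSON template being at least as long as its formatted output (the `%0000000000u` placeholder is spelled with extra `0` characters so it occupies as many characters as the widest `std::uint32_t` value), and `std::snprintf` silently truncates if that assumption ever fails. An alternative sketch, not what the PR does: call `std::snprintf` once with a null buffer to measure the exact length, then format into a string of that size. `BuildRetryServiceConfig` is a hypothetical name, the JSON is compacted onto fewer lines, and the guard mirrors the PR's check that every setting is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a gRPC retryPolicy service config, sizing the buffer
// from a measuring snprintf call. Returns "" when any setting is non-positive.
std::string BuildRetryServiceConfig(std::uint32_t max_attempts, float initial_backoff,
                                    float max_backoff, float backoff_multiplier)
{
  if (max_attempts == 0U || initial_backoff <= 0.0f || max_backoff <= 0.0f ||
      backoff_multiplier <= 0.0f)
  {
    return std::string{};  // retry policy disabled
  }

  static const char kTemplate[] =
      R"({"methodConfig":[{"name":[{}],"retryPolicy":{)"
      R"("maxAttempts":%u,"initialBackoff":"%.1fs","maxBackoff":"%.1fs",)"
      R"("backoffMultiplier":%.1f,"retryableStatusCodes":["CANCELLED","DEADLINE_EXCEEDED",)"
      R"("ABORTED","OUT_OF_RANGE","DATA_LOSS","UNAVAILABLE"]}}]})";

  const unsigned attempts = static_cast<unsigned>(max_attempts);  // match %u exactly
  // First pass: with a null buffer and size 0, snprintf returns the length it would write.
  const int length = std::snprintf(nullptr, 0, kTemplate, attempts, initial_backoff,
                                   max_backoff, backoff_multiplier);
  if (length < 0)
  {
    return std::string{};  // encoding error
  }

  // Second pass: +1 for the terminating NUL that snprintf always writes.
  std::string config(static_cast<std::size_t>(length) + 1, '\0');
  std::snprintf(&config[0], config.size(), kTemplate, attempts, initial_backoff, max_backoff,
                backoff_multiplier);
  config.resize(static_cast<std::size_t>(length));  // drop the trailing NUL
  return config;
}
```

The result could be handed to `grpc::ChannelArguments::SetServiceConfigJSON` just as in the PR; the trade-off is a second formatting pass in exchange for a string that is never truncated and carries no trailing NUL padding.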
2 changes: 1 addition & 1 deletion exporters/otlp/src/otlp_grpc_exporter.cc
@@ -105,7 +105,7 @@ sdk::common::ExportResult OtlpGrpcExporter::Export(
google::protobuf::ArenaOptions arena_options;
// It's easy to allocate datas larger than 1024 when we populate basic resource and attributes
arena_options.initial_block_size = 1024;
-// When in batch mode, it's easy to export a large number of spans at once, we can alloc a lager
+// When in batch mode, it's easy to export a large number of spans at once, we can alloc a larger
// block to reduce memory fragments.
arena_options.max_block_size = 65536;
std::unique_ptr<google::protobuf::Arena> arena{new google::protobuf::Arena{arena_options}};
5 changes: 5 additions & 0 deletions exporters/otlp/src/otlp_grpc_exporter_options.cc
@@ -36,6 +36,11 @@ OtlpGrpcExporterOptions::OtlpGrpcExporterOptions()
#ifdef ENABLE_ASYNC_EXPORT
max_concurrent_requests = 64;
#endif

retry_policy_max_attempts = GetOtlpDefaultTracesRetryMaxAttempts();
retry_policy_initial_backoff = GetOtlpDefaultTracesRetryInitialBackoff();
retry_policy_max_backoff = GetOtlpDefaultTracesRetryMaxBackoff();
retry_policy_backoff_multiplier = GetOtlpDefaultTracesRetryBackoffMultiplier();
}

OtlpGrpcExporterOptions::~OtlpGrpcExporterOptions() {}
5 changes: 5 additions & 0 deletions exporters/otlp/src/otlp_grpc_log_record_exporter_options.cc
@@ -34,6 +34,11 @@ OtlpGrpcLogRecordExporterOptions::OtlpGrpcLogRecordExporterOptions()
#ifdef ENABLE_ASYNC_EXPORT
max_concurrent_requests = 64;
#endif

retry_policy_max_attempts = GetOtlpDefaultLogsRetryMaxAttempts();
retry_policy_initial_backoff = GetOtlpDefaultLogsRetryInitialBackoff();
retry_policy_max_backoff = GetOtlpDefaultLogsRetryMaxBackoff();
retry_policy_backoff_multiplier = GetOtlpDefaultLogsRetryBackoffMultiplier();
}

OtlpGrpcLogRecordExporterOptions::~OtlpGrpcLogRecordExporterOptions() {}
5 changes: 5 additions & 0 deletions exporters/otlp/src/otlp_grpc_metric_exporter_options.cc
@@ -35,6 +35,11 @@ OtlpGrpcMetricExporterOptions::OtlpGrpcMetricExporterOptions()
#ifdef ENABLE_ASYNC_EXPORT
max_concurrent_requests = 64;
#endif

retry_policy_max_attempts = GetOtlpDefaultMetricsRetryMaxAttempts();
retry_policy_initial_backoff = GetOtlpDefaultMetricsRetryInitialBackoff();
retry_policy_max_backoff = GetOtlpDefaultMetricsRetryMaxBackoff();
retry_policy_backoff_multiplier = GetOtlpDefaultMetricsRetryBackoffMultiplier();
}

OtlpGrpcMetricExporterOptions::~OtlpGrpcMetricExporterOptions() {}