Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

review of dmehala/remote-config-impl (#74) #76

1 change: 1 addition & 0 deletions BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ cc_library(
"src/datadog/base64.cpp",
"src/datadog/cerr_logger.cpp",
"src/datadog/clock.cpp",
"src/datadog/config_manager.cpp",
"src/datadog/collector_response.cpp",
# "src/datadog/curl.cpp", no libcurl
"src/datadog/datadog_agent_config.cpp",
Expand Down
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,7 @@ target_sources(dd_trace_cpp-objects PRIVATE
src/datadog/base64.cpp
src/datadog/cerr_logger.cpp
src/datadog/clock.cpp
src/datadog/config_manager.cpp
src/datadog/collector_response.cpp
src/datadog/curl.cpp
src/datadog/datadog_agent_config.cpp
Expand Down
2 changes: 1 addition & 1 deletion bin/cmake-build
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,5 @@ cd "$(dirname "$0")"/..

mkdir -p .build
cd .build
cmake ..
cmake .. "$@"
make -j "$(nproc)"
21 changes: 8 additions & 13 deletions fuzz/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,24 +3,19 @@ Fuzzers
Each subdirectory here contains the source of an executable that [fuzz tests][1]
some part of the library using [LLVM's libfuzzer][2].

There is a toplevel CMake boolean option associated with each fuzzer. The naming
convention is `FUZZ_<SUBDIRECTORY_WITH_UNDERSCORES>`, e.g.
`FUZZ_W3C_PROPAGATION` for the fuzzer defined in
[fuzz/w3c-propagation/](./w3c-propagation/). The resulting binary is called
`fuzz` by convention.
There is a toplevel CMake boolean option that adds all of the fuzzer
executables to the build: `BUILD_FUZZERS`.

When building a fuzzer, the toolchain must be clang-based. For example, this
is how to build the fuzzer in [fuzz/w3c-propagation](./w3c-propagation/) from
the root of the repository:
When building the fuzzers, the toolchain must be clang-based. For example:
```console
$ rm -rf .build && mkdir .build # if toolchain or test setup need clearing
$ cd .build
$ CC=clang CXX=clang++ cmake .. -DFUZZ_W3C_PROPAGATION=ON
$ make -j $(nproc)
$ fuzz/w3c-propagation/fuzz
$ rm -rf .build # if toolchain needs clearing
$ bin/with-toolchain llvm bin/cmake-build -DBUILD_FUZZERS=1
$ .build/fuzz/w3c-propagation/w3c-propagation-fuzz

[... fuzzer output ...]
```

The fuzzer executables are named `.build/fuzz/*/*-fuzz` by convention.

[1]: https://en.wikipedia.org/wiki/Fuzzing
[2]: https://llvm.org/docs/LibFuzzer.html
4 changes: 3 additions & 1 deletion fuzz/base64/main.cpp
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
#include <datadog/base64.h>
#include <datadog/string_view.h>

#include <cstdint>

namespace dd = datadog::tracing;

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, size_t size) {
dd::base64::decode(dd::StringView{(char*)data, size});
dd::base64_decode(dd::StringView{(const char*)data, size});
return 0;
}
133 changes: 62 additions & 71 deletions src/datadog/base64.cpp
Original file line number Diff line number Diff line change
@@ -1,109 +1,100 @@
#include "base64.h"

#include <stdint.h>
#include <cstddef>
#include <cstdint>

namespace datadog {
namespace tracing {
namespace base64 {

#define _ 255
#define SENTINEL_VALUE _
#define EOL 0
constexpr uint8_t k_sentinel = 255;
constexpr uint8_t _ = k_sentinel; // for brevity
constexpr uint8_t k_eol = 0;

/*
* Lookup table mapping the base64 table. Invalid inputs are mapped
* to the value 255.
* `=` map to 0.
*/
// Invalid inputs are mapped to the value 255. '=' maps to 0.
constexpr uint8_t k_base64_table[]{
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, 62, _, _, _, 63, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, _, _, _, EOL, _, _, _, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, _, _, _, _,
_, _, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _};

// TODO: support input without padding?
std::string decode(StringView in) {
const std::size_t in_size = in.size();

std::string out;
out.reserve(in_size);
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, 62, _, _, _, 63, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, _, _, _, k_eol, _, _, _, 0, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, _, _, _, _, _, _, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
_, _, _, _};

std::string base64_decode(StringView input) {
const size_t in_size = input.size();

std::string output;
output.reserve(in_size);

union {
uint32_t buffer;
uint8_t bytes[4];
} decoder;

std::size_t i = 0;
size_t i = 0;

for (; i + 4 < in_size;) {
auto c0 = k_base64_table[static_cast<size_t>(in[i++])];
auto c1 = k_base64_table[static_cast<size_t>(in[i++])];
auto c2 = k_base64_table[static_cast<size_t>(in[i++])];
auto c3 = k_base64_table[static_cast<size_t>(in[i++])];
uint32_t c0 = k_base64_table[static_cast<size_t>(input[i++])];
uint32_t c1 = k_base64_table[static_cast<size_t>(input[i++])];
uint32_t c2 = k_base64_table[static_cast<size_t>(input[i++])];
uint32_t c3 = k_base64_table[static_cast<size_t>(input[i++])];

if (c0 == SENTINEL_VALUE || c1 == SENTINEL_VALUE || c2 == SENTINEL_VALUE ||
c3 == SENTINEL_VALUE) {
if (c0 == k_sentinel || c1 == k_sentinel || c2 == k_sentinel ||
c3 == k_sentinel) {
return "";
}

decoder.buffer =
static_cast<uint32_t>(0) | static_cast<uint32_t>(c0) << 26 |
static_cast<uint32_t>(c1) << 20 | static_cast<uint32_t>(c2) << 14 |
static_cast<uint32_t>(c3) << 8;
decoder.buffer = 0 | c0 << (32 - 6 * 1) | c1 << (32 - 6 * 2) |
c2 << (32 - 6 * 3) | c3 << (32 - 6 * 4);

// NOTE(@dmehala): It might seem confusion to read those bytes in reverse
// NOTE(@dmehala): It might seem confusing to read those bytes input reverse
// order. It is related to the architecture endianess. For now the set of
// architecture we support (x86_64 and arm64) are all little-endian.
out.push_back(decoder.bytes[3]);
out.push_back(decoder.bytes[2]);
out.push_back(decoder.bytes[1]);
// TODO(@dgoffredo): I'd prefer an endian-agnostic implementation.
// nginx-datadog targets x86_64 and arm64 in its binary releases, but
// dd-trace-cpp targets any standard C++17 compiler.
output.push_back(decoder.bytes[3]);
output.push_back(decoder.bytes[2]);
output.push_back(decoder.bytes[1]);
}

if ((in_size - i) < 4) return ""; // not padded input is not supported
// If padding is missing, return the empty string in lieu of an Error.
if ((in_size - i) < 4) return "";

auto c0 = k_base64_table[static_cast<size_t>(in[i++])];
auto c1 = k_base64_table[static_cast<size_t>(in[i++])];
auto c2 = k_base64_table[static_cast<size_t>(in[i++])];
auto c3 = k_base64_table[static_cast<size_t>(in[i++])];
uint32_t c0 = k_base64_table[static_cast<size_t>(input[i++])];
uint32_t c1 = k_base64_table[static_cast<size_t>(input[i++])];
uint32_t c2 = k_base64_table[static_cast<size_t>(input[i++])];
uint32_t c3 = k_base64_table[static_cast<size_t>(input[i++])];

if (c0 == SENTINEL_VALUE || c1 == SENTINEL_VALUE || c2 == SENTINEL_VALUE ||
c3 == SENTINEL_VALUE) {
if (c0 == k_sentinel || c1 == k_sentinel || c2 == k_sentinel ||
c3 == k_sentinel) {
return "";
}

decoder.buffer = static_cast<uint32_t>(0) | static_cast<uint32_t>(c0) << 26 |
static_cast<uint32_t>(c1) << 20 |
static_cast<uint32_t>(c2) << 14 |
static_cast<uint32_t>(c3) << 8;
decoder.buffer = 0 | c0 << (32 - 6 * 1) | c1 << (32 - 6 * 2) |
c2 << (32 - 6 * 3) | c3 << (32 - 6 * 4);
Comment on lines +82 to +83
Copy link
Collaborator

@dmehala dmehala Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the extensive use of arithmetic not helpful, it does the opposite. Ultimately, the reader is required to understand the base64 algorithm to grasp the rationale behind those bit shifting.

Either static_cast<uint32_t>(c0) << 26 or c0 << (32 - 6 * 1), is equivalent but the later require mental calculation to find out how much we are shifting, add on top of it a lack of parenthesis and I guarantee lots of people will fail to interpret it correctly compared to << 26. Can we avoid this confusion and revert to the proposed solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write it however you prefer it, but my preference is expressed in the diff.

More important to me is that we choose an algorithm that does not assume little-endian.


if (c2 == EOL) {
out.push_back(decoder.bytes[3]);
} else if (c3 == EOL) {
out.push_back(decoder.bytes[3]);
out.push_back(decoder.bytes[2]);
if (c2 == k_eol) {
output.push_back(decoder.bytes[3]);
} else if (c3 == k_eol) {
output.push_back(decoder.bytes[3]);
output.push_back(decoder.bytes[2]);
} else {
out.push_back(decoder.bytes[3]);
out.push_back(decoder.bytes[2]);
out.push_back(decoder.bytes[1]);
output.push_back(decoder.bytes[3]);
output.push_back(decoder.bytes[2]);
output.push_back(decoder.bytes[1]);
}

return out;
return output;
}

#undef EOL
#undef SENTINEL_VALUE
#undef _

} // namespace base64
} // namespace tracing
} // namespace datadog
14 changes: 5 additions & 9 deletions src/datadog/base64.h
Original file line number Diff line number Diff line change
@@ -1,19 +1,15 @@
#pragma once

#include <string>

#include "string_view.h"

namespace datadog {
namespace tracing {
namespace base64 {

/*
* Decode a base64-encoded string and returns the decoded data.
*
* It only supported padded base64-encoded string. Providing an unpadded
* input will return an empty string.
*/
std::string decode(StringView in);
// Return the result of decoding the specified padded base64-encoded `input`. If
// `input` is not padded, then return the empty string instead.
std::string base64_decode(StringView input);

} // namespace base64
} // namespace tracing
} // namespace datadog
45 changes: 45 additions & 0 deletions src/datadog/config_manager.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
#include "config_manager.h"

namespace datadog {
namespace tracing {

ConfigManager::ConfigManager(const FinalizedTracerConfig& config)
: clock_(config.clock),
default_trace_sampler_(
std::make_shared<TraceSampler>(config.trace_sampler, clock_)),
current_trace_sampler_(default_trace_sampler_) {}

std::shared_ptr<TraceSampler> ConfigManager::get_trace_sampler() {
std::lock_guard<std::mutex> lock(mutex_);
return current_trace_sampler_;
}

void ConfigManager::update(const ConfigManager::Update& conf) {
std::lock_guard<std::mutex> lock(mutex_);

if (conf.trace_sampler) {
if (auto finalized_trace_sampler_cfg =
finalize_config(*conf.trace_sampler)) {
current_trace_sampler_ =
std::make_shared<TraceSampler>(*finalized_trace_sampler_cfg, clock_);
} else {
// TODO: report error
}
} else {
current_trace_sampler_ = default_trace_sampler_;
}
}

void ConfigManager::reset() {
std::lock_guard<std::mutex> lock(mutex_);
current_trace_sampler_ = default_trace_sampler_;
}

nlohmann::json ConfigManager::config_json() const {
std::lock_guard<std::mutex> lock(mutex_);
return nlohmann::json{
{"trace_sampler", current_trace_sampler_->config_json()}};
}

} // namespace tracing
} // namespace datadog
52 changes: 16 additions & 36 deletions src/datadog/config_manager.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#pragma once

// TODO: Document

#include <mutex>

#include "clock.h"
Expand All @@ -10,54 +12,32 @@
namespace datadog {
namespace tracing {

struct ConfigUpdate {
Optional<TraceSamplerConfig> trace_sampler;
};

class ConfigManager {
mutable std::mutex mutex_;
Clock clock_;
std::shared_ptr<TraceSampler> default_trace_sampler_;
std::shared_ptr<TraceSampler> current_trace_sampler_;

public:
ConfigManager(const FinalizedTracerConfig& config)
: clock_(config.clock),
default_trace_sampler_(
std::make_shared<TraceSampler>(config.trace_sampler, clock_)),
current_trace_sampler_(default_trace_sampler_) {}
struct Update {
Optional<TraceSamplerConfig> trace_sampler;
};
Comment on lines +22 to +24
Copy link
Collaborator

@dmehala dmehala Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An Update is capable of being consumed by the ConfigManager, even though it is likely and perhaps exclusively intended to be consumed by the ConfigManager, that does not means there is a compelling reason to tighly couple both concept. It can also be interpreted as an Update of the ConfigManager, which is not the intended meaning.

I suggest to keep it outside of the ConfigManager, it offers room for flexibility, probably future usage and semantically is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw three options:

  1. Leave it as you had it.
  2. Put struct ConfigUpdate in its own component (its own .h). Good place for documentation!
  3. Consider ConfigUpdate as part of ConfigManager's public interface.

(1) and (3) are compatible, but so too are (3) and my proposed change.

My reasoning was "if I see ConfigUpdate used in RemoteConfigManager, where do I go looking for its definition?" I might look for a config_update.h, but there is none. ConfigUpdate is a parameter to ConfigManager, so I might look there, but the connection is even stronger if it's spelled ConfigManager::Update.

It can also be interpreted as an Update of the ConfigManager, which is not the intended meaning.

That's a good point. I prefer (2) to (1) if you're looking for a compromise. I'm also fine with (1). This PR is illustrative only.


std::shared_ptr<TraceSampler> get_trace_sampler() {
std::lock_guard<std::mutex> lock(mutex_);
return current_trace_sampler_;
}
ConfigManager(const FinalizedTracerConfig& config);

void update(const ConfigUpdate& conf) {
std::lock_guard<std::mutex> lock(mutex_);
// Return the `TraceSampler` consistent with the most recent configuration.
std::shared_ptr<TraceSampler> get_trace_sampler();

if (conf.trace_sampler) {
if (auto finalized_trace_sampler_cfg =
finalize_config(*conf.trace_sampler)) {
current_trace_sampler_ = std::make_shared<TraceSampler>(
*finalized_trace_sampler_cfg, clock_);
} else {
// TODO: report error
}
} else {
current_trace_sampler_ = default_trace_sampler_;
}
}
// Apply the specified `conf` update.
void update(const Update& conf);

void reset() {
std::lock_guard<std::mutex> lock(mutex_);
current_trace_sampler_ = default_trace_sampler_;
}
// Restore the configuration that was passed to this object's constructor,
// overriding any previous calls to `update`.
void reset();

nlohmann::json config_json() const {
std::lock_guard<std::mutex> lock(mutex_);
return nlohmann::json{
{"trace_sampler", current_trace_sampler_->config_json()}};
}
// Return a JSON representation of the current configuration managed by this
// object.
nlohmann::json config_json() const;
};

} // namespace tracing
Expand Down
Loading