Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core configuration add env var and casting functionality #2970

Conversation

GustavoCaso
Copy link
Member

@GustavoCaso GustavoCaso commented Jul 14, 2023

What does this PR do?

The PR adds env_var and deprecated_env_var options to our internal configuration DSL, using the type attribute to coerce ENV variables and validate the value set is validity.

There are a lot of files changes so I split the PR into smaller, more digestible commits:

  • Add the logic for env_var, deprecated_env_var and type coercion 6e2f679
  • Update SpanAttributeSchema to not look for ENV variables e25ab82
  • Update configuration to use the new options 424431d
  • Update specs 57b0aa7

Supported types

For validation

:int
:float
:array
:string
:block
:hash
:bool
:symbol

For coercion. I used the same coercion types as we currently supported within Core::Environment::VariableHelpers

:int
:float
:array
:bool

Examples of configuration possibilities.

Example 1

option :enabled do |o|
  o.type :bool
  o.env_var 'DD_APPSEC_ENABLED'
  o.default DEFAULT_APPSEC_ENABLED
end

We would try to extract the value from the DD_APPSEC_ENABLED ENV var, if the env var is present we would coerce it to a boolean value. In the case, the customer wants to configure using the configure block c.appsec.enabled = true. We would validate that the value provided is a boolean value. If the value doesn't match a boolean value, we would raise an exception with a useful message.

Example 2

option :enabled do |o|
  o.env_var 'DD_APPSEC_AUTOMATED_USER_EVENTS_TRACKING'
  o.default DEFAULT_APPSEC_AUTOMATED_TRACK_USER_EVENTS_ENABLED
  o.setter do |value|
    if value.is_a?(String)
      value != 'disabled'
    else
      !!value # rubocop:disable Style/DoubleNegation
    end
  end
end

This example does not use the type attribute, because the expected values are complex and could be multiple types. In this case, rather than pushing the complexity into the internal of the DSL, we do not set type, which would not try coercing ENV var or validate the provided value. We push the complexity into the setter block in this case. That way, those settings options deserve special attention; they could do so using the setter block.

Example 3

option :default_rate do |o|
    o.type :float, nil: true
    o.env_var Tracing::Configuration::Ext::Sampling::ENV_SAMPLE_RATE
  end

This example signal uses the type option nil: true that signals to the option code that the value could be a Float or nil.

Example 4

 option :tags do |o|
  o.type :hash, nilable: true
  o.env Core::Environment::Ext::ENV_TAGS
  o.env_parser do |env_value|
    values = if env_value.include?(',')
               env_value.split(',')
             else
               env_value.split(' ') # rubocop:disable Style/RedundantArgument
             end

    values.map! do |v|
      v.gsub!(/\A[\s,]*|[\s,]*\Z/, '')

      v.empty? ? nil : v
    end

    values.compact!
    values.each_with_object({}) do |tag, tags|
      key, value = tag.split(':', 2)
      tags[key] = value if value && !value.empty?
    end
  end
  o.setter do |new_value, old_value|
    raw_tags = new_value || {}

    env_value = env
    version_value = version
    service_name = service_without_fallback

    # Override tags if defined
    raw_tags[Core::Environment::Ext::TAG_ENV] = env_value unless env_value.nil?
    raw_tags[Core::Environment::Ext::TAG_VERSION] = version_value unless version_value.nil?

    # Coerce keys to strings
    string_tags = raw_tags.collect { |k, v| [k.to_s, v] }.to_h

    # Cross-populate tag values with other settings
    if env.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
      self.env = string_tags[Core::Environment::Ext::TAG_ENV]
    end

    if version_value.nil? && string_tags.key?(Core::Environment::Ext::TAG_VERSION)
      self.version = string_tags[Core::Environment::Ext::TAG_VERSION]
    end

    if service_name.nil? && string_tags.key?(Core::Environment::Ext::TAG_SERVICE)
      self.service = string_tags[Core::Environment::Ext::TAG_SERVICE]
    end

    # Merge with previous tags
    (old_value || {}).merge(string_tags)
  end
end

This example uses env_parser block. In cases where the parsing of env variables is more complex and can not be achieved using the default types, we can push the complexity to the env_parser block.

In this example the tags ENV variable stores the information on a string with the format tag1:value1,tag2:value2, which gets converted into a hash {"tag1" => "value1", "tag2" => "value2"}

Breaking changes

Also, I want to emphasize that all these changes do not introduce any breaking change 🎉 , since we haven't altered any existing code previously used for configuration.

Motivation

Additional Notes

The logic for coercion inside of option.rb is very similar that the one of Core::Environment::VariableHelpers. Why did I not use that one?

As of today, the code from Core::Environment::VariableHelpers is public, so any alteration to that code would have been considered a breaking change. Keeping internal of our configuration DSL private would allow us to iterate faster without worrying about breaking changes.

How to test the change?

CI

@github-actions github-actions bot added appsec Application Security monitoring product ci-app CI product for test suite instrumentation core Involves Datadog core libraries integrations Involves tracing integrations tracing labels Jul 14, 2023
@GustavoCaso GustavoCaso marked this pull request as ready for review July 14, 2023 11:56
@GustavoCaso GustavoCaso requested a review from a team July 14, 2023 11:56
tags[key] = value if value && !value.empty?
end
o.env_var Core::Environment::Ext::ENV_TAGS
o.setter do |new_value, old_value|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does :tags need its own custom setter now? Before it was happily using env_to_list..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tags is a complex setting. It accepts an ENV var that gets parsed as a list, and for customers configuring it directly though Datadog.configured block, it accepts a hash with the different tags.

To keep the DSL simple and avoid extra complexity to support all cases. The settings that accept multiple value types do not have to define any type and push all the logic into the setter block.

I'd like to explain this setting change in detail since it is the most complex one from the coed base.

Old code:

option :tags do |o|
  o.default do
    tags = {}

    # Parse tags from environment
    env_to_list(Core::Environment::Ext::ENV_TAGS, comma_separated_only: false).each do |tag|
      key, value = tag.split(':', 2)
      tags[key] = value if value && !value.empty?
    end

    # Override tags if defined
    tags[Core::Environment::Ext::TAG_ENV] = env unless env.nil?
    tags[Core::Environment::Ext::TAG_VERSION] = version unless version.nil?

    tags
  end

  o.setter do |new_value, old_value|
    # Coerce keys to strings
    string_tags = new_value.collect { |k, v| [k.to_s, v] }.to_h

    # Cross-populate tag values with other settings
    if env.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
      self.env = string_tags[Core::Environment::Ext::TAG_ENV]
    end

    if version.nil? && string_tags.key?(Core::Environment::Ext::TAG_VERSION)
      self.version = string_tags[Core::Environment::Ext::TAG_VERSION]
    end

    if service_without_fallback.nil? && string_tags.key?(Core::Environment::Ext::TAG_SERVICE)
      self.service = string_tags[Core::Environment::Ext::TAG_SERVICE]
    end

    # Merge with previous tags
    (old_value || {}).merge(string_tags)
  end
end

We can see the default block, parse the ENV to a list and transform its values into a hash, plus some overrides. This block is only executed if the value is not provided by Datadog.configure block.
The setter, in this case, always assumes that it receives a hash either because the ENV has been parsed in the default block or the customer has provided over Datadog.configure block.

New code:

option :tags do |o|
  o.env_var Core::Environment::Ext::ENV_TAGS
  o.setter do |new_value, old_value|
    tag_list = case new_value
               when String
                 values = if new_value.include?(',')
                            new_value.split(',')
                          else
                            new_value.split(' ') # rubocop:disable Style/RedundantArgument
                          end

                 values.map! do |v|
                   v.gsub!(/\A[\s,]*|[\s,]*\Z/, '')

                   v.empty? ? nil : v
                 end

                 values.compact!
                 values.each_with_object({}) do |tag, tags|
                   key, value = tag.split(':', 2)
                   tags[key] = value if value && !value.empty?
                 end
               when Hash
                 new_value
               else
                 {}
               end

    env_value = env
    version_value = version
    service_name = service_without_fallback

    # Override tags if defined
    tag_list[Core::Environment::Ext::TAG_ENV] = env_value unless env_value.nil?
    tag_list[Core::Environment::Ext::TAG_VERSION] = version_value unless version_value.nil?

    # Coerce keys to strings
    string_tags = tag_list.collect { |k, v| [k.to_s, v] }.to_h

    # Cross-populate tag values with other settings
    if env.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
      self.env = string_tags[Core::Environment::Ext::TAG_ENV]
    end

    if version_value.nil? && string_tags.key?(Core::Environment::Ext::TAG_VERSION)
      self.version = string_tags[Core::Environment::Ext::TAG_VERSION]
    end

    if service_name.nil? && string_tags.key?(Core::Environment::Ext::TAG_SERVICE)
      self.service = string_tags[Core::Environment::Ext::TAG_SERVICE]
    end

    # Merge with previous tags
    (old_value || {}).merge(string_tags)
  end
end

We do not specify any type on the setting. The setter block is responsible for accounting for the case the value is a string coming from reading the ENV variable, and a Hash from Datadog.configure block. In the case of any of those, we default to an empty hash.
Everything below that checking of type of value we are dealing with is identical as the old code 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!
I suggest leaving a comment saying we should deprecate allowing this to be with anything that does not reflect the environment variable String format (e.g. c.tags = Hash), as this will complicate long-term maintenance of this setting quite a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I don't like about this new version of this setting, is that like in the profiling.no_signals_workaround_enabled setting, previously the parsing of what I'll call the "weird/custom format for the env" was something that was done only for the environment variable.

In this new version, we're actually exposing the "weird/custom format for the env" parsers to customers configuring via code, which I'd claim adds a bit more complexity by increasing the number of ways in which you can do the exact same thing.

Before:

[8] pry(main)> Datadog.configure { |c| c.tags = "potato:cannon" };
NoMethodError: undefined method `collect' for "potato:cannon":String
from lib/datadog/core/configuration/settings.rb:452:in `block (2 levels) in <class:Settings>'

Now:

[5] pry(main)> Datadog.configure { |c| c.tags = "potato:cannon" };
[6] pry(main)> Datadog.configuration.tags
=> {"potato"=>"cannon"}

I wonder if we could avoid this by having the options code treat the value from the environment in a special way.

For instance, as a parameter to the setter:

option :tags do |o|
  o.env_var Core::Environment::Ext::ENV_TAGS
  o.setter do |new_value, old_value, from_environment:| # <-- Added here
    tag_list = case new_value
               when String && from_environment # <--- Only activate parsing of strings if from env
                 values = if new_value.include?(',')
                            new_value.split(',')
                          else
     # ...

or even as a separate proc:

option :tags do |o|
  o.env_var Core::Environment::Ext::ENV_TAGS
  o.env_parser do |env_value|
    # Put the string parsing code here only
  end
  o.setter do |...|
    # The setter doesn't care about parsing the env anymore
  end

This does add a bit more complexity to the DSL to support these weird use-cases, but as an upside for customers it limits the number of valid input values for the configuration, which I think is an overall win.

Copy link
Member Author

@GustavoCaso GustavoCaso Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding from_environment or env_parser might be a solution.

The other solution I suggested would be to be able to fully parse the string value from the env variable into a hash.

option :tags do |o|
  o.type :hash
  o.env_var Core::Environment::Ext::ENV_TAGS
  o.setter do |new_value, old_value|
    env_value = env
    version_value = version
    service_name = service_without_fallback

    # Override tags if defined
    new_value[Core::Environment::Ext::TAG_ENV] = env_value unless env_value.nil?
    new_value[Core::Environment::Ext::TAG_VERSION] = version_value unless version_value.nil?

    # Coerce keys to strings
    string_tags = new_value.collect { |k, v| [k.to_s, v] }.to_h

    # Cross-populate tag values with other settings
    if env.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
      self.env = string_tags[Core::Environment::Ext::TAG_ENV]
    end

    if version_value.nil? && string_tags.key?(Core::Environment::Ext::TAG_VERSION)
      self.version = string_tags[Core::Environment::Ext::TAG_VERSION]
    end

    if service_name.nil? && string_tags.key?(Core::Environment::Ext::TAG_SERVICE)
      self.service = string_tags[Core::Environment::Ext::TAG_SERVICE]
    end

    # Merge with previous tags
    (old_value || {}).merge(string_tags)
  end
end
# option.rb

def coerce_env_variable(value)
  case @definition.type
  when :int
    value.to_i
  when :float
    value.to_f
  when :array
    values = if value.include?(',')
                value.split(',')
              else
                value.split(' ') # rubocop:disable Style/RedundantArgument
              end

    values.map! do |v|
      v.gsub!(/\A[\s,]*|[\s,]*\Z/, '')

      v.empty? ? nil : v
    end

    values.compact!
    values
  when :hash
    values = if new_value.include?(',')
                new_value.split(',')
              else
                new_value.split(' ') # rubocop:disable Style/RedundantArgument
              end

    values.map! do |v|
      v.gsub!(/\A[\s,]*|[\s,]*\Z/, '')

      v.empty? ? nil : v
    end

    values.compact!
    values.each_with_object({}) do |tag, tags|
      key, value = tag.split(':', 2)
      tags[key] = value if value && !value.empty?
    end
  when :bool
    string_value = value
    string_value = string_value.downcase
    string_value == 'true' || string_value == '1' # rubocop:disable Style/MultipleComparison
  when :string, NilClass
    value
  else
    raise ArgumentError, "The option #{@definition.name} is using an unknown type option `#{@definition.type}`"
  end
end

For that we should agree on the format of for converting strings to hash. The current and only pace we do this is for this tags settings. So if we are ok with having the format be tag1:value1,tag2:value2 for this setting and future settings, we can reduce the complexity of this setting

Copy link
Member Author

@GustavoCaso GustavoCaso Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with some of you and understand the different tradeoffs. I added an extra option env_parser that allow us to define complex env parsing login for those edge case options.

d6ab986

Copy link
Contributor

@lloeki lloeki Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having env_parser on o.setter looks like a strange way to factor env parsing in a stage that is unrelated to env parsing.

What about having it as a block to o.env?

o.env Core::Environment::Ext::ENV_TAGS do |str|
  # ... coerce the env string value to ruby
end

o.setter do |...|
  # ... receive ruby value, validate, and set
end

Then o.default, o.setter, and o.env would be internally evaluated in order of precedence.

If needed (which should be rare or transitional), overriding global precedence could be done per option via:

# in order of increasing precedence
o.precedence :default, :deprecated_env, :env, :setter
o.precedence :default, :setter, :deprecated_env, :env

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should allow overriding precedence -- it's more complexity on the DSL, and as I mentioned in my other comments, weird options should be complex, rather than the DSL being complex to make weird options simple.

(As for having the parser as a block for env, no strong feelings, it's more a stylistic choice in the DSL)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for having the parser as a block for env, no strong feelings

I think the decision of having the parser as an optional block argument for env is an interesting option, since it allows for better organisation, but could lead to repetition if we have a env and deprecated_env option that has a custom parsing logic.

option :test do |o|
  o.env HELLO_WORLD do |value|
    custom logic 
  end
  o.deprecated_env HELLO_WORLD_OLD do |value|
    custom logic 
  end
end

Another point of view, is that it allow for individual parsing logic f the deprecated env var has a different format than new one for example.

Another thing we can do (@ivoanjo might love it 😉) is remove deprecated_env and add support for defining multiple env + return the env_var to the env block. That would allow to move the complexity into the env block including the deprecation warning

option :foo do |o|
  o.env ['NEW_ENV_VAR', 'OLD_ENV_VAR'] do |env_var, value|
    if env_var == 'OLD_ENV_VAR'
      # log deprecation
      # custom parsing logic if needed 
    else 
      # parse new env var
    end
  end
end

What do you guys think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I'm still on the side of "no doing weird things in the core to support weird use cases". Which translated means -- no proc on deprecated_env, only on env.

If someone wants to get funky, they can implement all of their custom logic on the o.env block to read as many env vars as needed.

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation is always something we lacked, and could bite ddtrace users easily.

Thank you so much for this work, @GustavoCaso!

I'd like to hear others opinions on this one, and I think @delner is probably a "mandatory" reviewer here too :)

Copy link
Contributor

@lloeki lloeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, noted some possible improvements

@@ -74,6 +82,14 @@ def depends_on(*values)
@depends_on = values.flatten
end

def env_var(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<nitpick level=420% bikeshed=yes>

I feel like environment (mimicking Python's os.environment) or env (mimicking Ruby's ENV) look cleaner in the DSL?

Examples:

option :foo_bar do |o|
  o.type :string
  o.environment 'DD_FOO_BAR'
end
option :foo_bar do |o|
  o.type :string
  o.env 'DD_FOO_BAR'
end

</nitpick>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to change it to whatever the team decides. I like the short version env. A little conflicted since just env do not express that is an ENV variable.

Does those someone have thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more vote for env.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like env_var but I will defer to the team. It's not something I feel particularly strongly about.

valid_type = validate(@definition.type, value)

unless valid_type
raise_error = if @definition.type_options[:nil]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might prefer :nilable (or :allow_nil) here.

Example:

o.type :float, nilable: true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with whatever the team decides.
nilable: true sounds good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for nilable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 345454a

lib/datadog/core/configuration/option.rb Show resolved Hide resolved
lib/datadog/core/configuration/option.rb Show resolved Hide resolved
Comment on lines 33 to 34
@env_var = meta[:env_var]
@deprecated_env_var = meta[:deprecated_env_var]
Copy link
Contributor

@lloeki lloeki Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have approached it a bit differently:

  • @env_var being an array of env var names, ordered by decreasing precedence, from which the value would be fetched.
  • @deprecated_env_var being an array of env var names: when an env var with such a name is found, a warning is emitted: it does not act as a fetcher.

A few examples:

# one or more old env vars are being phased out

o.environment = ['DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS', 'DD_REMOTE_CONFIG_POLL_INTERVAL']
o.deprecated_environment = ['DD_REMOTE_CONFIG_POLL_INTERVAL'] # sadly there could be a couple or three of those for a single option.

# two env vars exist with the same semantic, we accept both because one is more consistent with our DSL or with other environment variables that we have, and the other is what some other libs or the agent use, but none of them are deprecated, we're just being friendly.

o.environment = [
  'DD_REMOTE_CONFIGURATION_ENABLED',
  'DD_REMOTE_CONFIG_ENABLED',
]
o.deprecated_environment = []

o.environment = [
  'DD_REMOTE_CONFIGURATION_POLL_INTERVAL_SECONDS',
  'DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS',
]
o.deprecated_environment = []

WDYT?

Copy link
Member Author

@GustavoCaso GustavoCaso Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this suggestion great 🎉

Supporting multiple env variables gives us a lot of flexibility.

@deprecated_env_var being an array of env var names: when an env var with such a name is found, a warning is emitted: it does not act as a fetcher.

That would be a breaking change for the portion that does not act as a fetcher. Would be a breaking change. The actual code that deal with deprecation ENV vars, logs the deprecation warning and uses the value from the ENV variables.

var.find.with_index do |env_var, i|
found = ENV.key?(env_var)
# Check if we are using a non-preferred environment variable
if deprecation_warning && found && i != 0
Datadog::Core.log_deprecation { "#{env_var} environment variable is deprecated, use #{var.first} instead." }
end
found
end

Also, since the deprecated_environment will allow a customer to use the new environment without breaking their application I think fetching the value until we fully deprecate the old ENV variable makes sense

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both being lists is a great suggestion, actually. I think we should do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecation should emit a warning and use the value when needed (only after all @env_vars have been checked first).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcotc @lloeki I support this.

To keep the PR smaller (if that even possible 🤣 ), I will add that extra functionality on a separate PR or once we find the use case for it

@GustavoCaso GustavoCaso force-pushed the core-configuration-add-env-var-and-casting-functionality branch 2 times, most recently from 971b700 to c676bbb Compare July 18, 2023 07:04
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, although I wish the PR diff was waaaay smaller lol.

There's a handful of files where I ran out of time and didn't go as deep as I wanted today, but I'm posting my current notes as will do a second pass on those files tomorrow.

Comment on lines 54 to 57
option :ruleset do |o|
o.default { ENV.fetch('DD_APPSEC_RULES', DEFAULT_APPSEC_RULESET) }
o.env_var 'DD_APPSEC_RULES'
o.default DEFAULT_APPSEC_RULESET
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: More as documentation for ourselves, it may be worth leaving here a note on why this setting defines no type.

end

option :waf_timeout do |o|
o.default { ENV.fetch('DD_APPSEC_WAF_TIMEOUT', DEFAULT_APPSEC_WAF_TIMEOUT) } # us
o.env_var 'DD_APPSEC_WAF_TIMEOUT' # us
o.default DEFAULT_APPSEC_WAF_TIMEOUT
o.setter do |v|
Datadog::Core::Utils::Duration.call(v.to_s, base: :us)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already like this (so I'd say this comment is not for this PR), but wow damn this helper is extremely complex for something that is only parsed as integer-microseconds in our other libraries (and this helper is only used once!).

Consider maybe going back to o.type :int in the future?

Copy link
Contributor

@lloeki lloeki Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our other libraries

Except in Go, coz it's parsed as a time.Duration using time.ParseDuration, which doesn't accept unitless strings, and the env var got used with a suffix in system tests coz Go was the first one there, and then it got in the wild, so I had to implement this parser otherwise both system tests and Go customers having already set env vars in some global way that would reach Ruby apps would crash.

But they fixed since by appending us when it's missing and IIRC system tests have been adjusted to not use the suffix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider maybe going back to o.type :int in the future?

I'm wondering if an alternative would be to have a o.type :duration shorthand and apply that wherever we have durations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra context!

On the topic of :duration, to be honest, I'd favor keeping it simple -- it's not that common to see environment variables including units, so I'd avoid it unless we really need to.

o.default { env_to_int('DD_APPSEC_TRACE_RATE_LIMIT', DEFAULT_APPSEC_TRACE_RATE_LIMIT) } # trace/s
o.type :int
o.env_var 'DD_APPSEC_TRACE_RATE_LIMIT' # trace/s
o.default DEFAULT_APPSEC_TRACE_RATE_LIMIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already like this, so no need to change in this PR, but is it just me that finds weird this structure of splitting the definition of the defaults at the top of the file with the definition of the setting:

              DEFAULT_APPSEC_WAF_DEBUG = false

              # ...

              option :trace_rate_limit do |o|
                o.type :int
                o.env_var 'DD_APPSEC_TRACE_RATE_LIMIT' # trace/s
                o.default DEFAULT_APPSEC_TRACE_RATE_LIMIT
              end

I kinda prefer how we do it in other places:

            option :force_enable_gc_profiling do |o|
              o.env_var 'DD_PROFILING_FORCE_ENABLE_GC'
              o.type :bool
              o.default false
            end

...as it avoids the redundancy + avoids any other parts of the code accessing the default constants where they should always go through the settings object.

I claim the settings should be the source of truth for the defaults.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will cleanup in a separate PR

lib/datadog/appsec/configuration/settings.rb Show resolved Hide resolved
lib/datadog/ci/configuration/settings.rb Show resolved Hide resolved
spec/datadog/core/telemetry/collector_spec.rb Show resolved Hide resolved
Comment on lines 163 to 166
when :int
value.is_a?(Integer)
when :float
value.is_a?(Float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think strict :float validation is a bit problematic -- in a lot of cases we're using floats to allow for higher definition of values, but an integer is as valid as a float in those situations when you don't need the decimal part.

A good example is the number of defaults and specs that needed to change to pass this validation -- I think we'll be causing undue annoyance to customers here.

My suggestion is -- either accept either an int or a float when a :float is specified OR rename :float to :numeric and only check for is_a?(Numeric).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks for the suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like :numeric in this case, it captures that both integers and floats are accepted in one type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave the distinction of :int and :float as for ENV variable parsing and coercion, we still have the distinction between int and floats, but I will change the validation portion to:

when :int, :float
  value.is_a?(Numeric)
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 765a287

Copy link
Contributor

@lloeki lloeki Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave the distinction of :int and :float as for ENV variable parsing and coercion, we still have the distinction between int and floats, but I will change the validation portion to:

I think it is a good and simple example highlighting how coercion is a thing specific to ENV (which can contain only strings which contain some form of serialized values) whereas Ruby type checking is a thing specific to the setting itself. Coercion of ENV resulting in a type accepted by the setting is only incidental (we should make it so still but that's only coz we're friendly folks ^^)

@@ -94,6 +96,91 @@ def default_precedence?

private

def coerce_env_variable(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the intention here is to replace Datadog::Core::Environment::VariableHelpers. Should we go ahead and mark all those methods as deprecated for removal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that. There is still one setting that still uses Datadog::Core::Environment::VariableHelpers

settings :client_ip do
  # Whether client IP collection is enabled. When enabled client IPs from HTTP requests will
  #   be reported in traces.
  #
  # Usage of the DD_TRACE_CLIENT_IP_HEADER_DISABLED environment variable is deprecated.
  #
  # @see https://docs.datadoghq.com/tracing/configure_data_security#configuring-a-client-ip-header
  #
  # @default `DD_TRACE_CLIENT_IP_ENABLED` environment variable, otherwise `false`.
  # @return [Boolean]
  option :enabled do |o|
    o.type :bool
    o.default do
      disabled = env_to_bool(Tracing::Configuration::Ext::ClientIp::ENV_DISABLED)

      enabled = if disabled.nil?
                  false
                else
                  Datadog::Core.log_deprecation do
                    "#{Tracing::Configuration::Ext::ClientIp::ENV_DISABLED} environment variable is deprecated, use #{Tracing::Configuration::Ext::ClientIp::ENV_ENABLED} instead."
                  end

                  !disabled
                end

      # ENABLED env var takes precedence over deprecated DISABLED
      env_to_bool(Tracing::Configuration::Ext::ClientIp::ENV_ENABLED, enabled)
    end
  end
end

I guess it is ok to mark the methods from Datadog::Core::Environment::VariableHelpers as deprecated.

I would like to that on separate PR to avoid touching more parts of the codebase on this single PR

@@ -1,5 +1,7 @@
# frozen_string_literal: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta note on this PR: This is the kind of PR that has the potential to have undetected conflicts with other ongoing PRs -- e.g. any PRs that touch settings that branched out before this PR have the potential to cause issues.

Here's my suggestion -- before merging this PR to master:

  1. Diff master and the base of the core-configuration-add-env-var-and-casting-functionality branch, and see if there's any changes that haven't been picked up
  2. (if needed) Merge master into this branch, include any updates
  3. Do a quick round of open PRs and see if any touch the settings; If they do, ask them to hold off until this gets merged, and after that, ask them to rebase on top of master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta note on this PR

Meta-question: this has nothing to do with the # frozen_string_literal: true comment, and only uses that first line as anchor for a comment, correct?

Meta-suggestion: maybe it should go as a review comment in "Finish your review"? but then it's not a "threaded" discussion I guess (subsequent discussion would happen in the global PR discussion tab).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta-suggestion: maybe it should go as a review comment in "Finish your review"? but then it's not a "threaded" discussion I guess (subsequent discussion would happen in the global PR discussion tab).

Yes, it's for the whole review, and I only picked elsewhere because I find people sometimes miss the comment posted with the review itself, and I thought this comment was important enough not to be missed.

Plus, as you pointed out, it's really annoying to not have threading for those comments.

So I'm just working around GitHub UX limitations >_>

lib/datadog/core/configuration/option.rb Show resolved Hide resolved
@GustavoCaso GustavoCaso force-pushed the core-configuration-add-env-var-and-casting-functionality branch from 3c9d018 to b3af7d3 Compare July 19, 2023 10:14
@GustavoCaso GustavoCaso force-pushed the core-configuration-add-env-var-and-casting-functionality branch from b3af7d3 to 345454a Compare July 19, 2023 10:53
end

option :ip_denylist do |o|
o.default { [] }
o.type :array
o.default []
Copy link
Contributor

@lloeki lloeki Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a difference between block and value here:

  1. [] would be eval'd when that line is evaluated, so ever produces a single array instance
  2. { [] } would be eval'd when the block is evaluated, so may or may not produce a single array instance

This led me to wonder if we should freeze that default in the 1. case to ensure it does not get mutated.

This then led me to wonder if we should in fact (transparently) freeze all configuration setting values, because once they are set they should not be mutated.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Option#default_value we wrap the default with a call to Core::Utils::SafeDup.frozen_or_dup(...) so we dup it, rather than using this exact object.

(I had wondered the same thing you did and ended up spotting it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to wrap it on Core::Utils::SafeDup.frozen_or_dup(...) because a test validates that when modifying the default value doesn't modify it by reference.

describe '#writer_options' do
  subject(:writer_options) { settings.tracing.test_mode.writer_options }

  it { is_expected.to eq({}) }

  context 'when modified' do
    it 'does not modify the default by reference' do
      settings.tracing.test_mode.writer_options[:foo] = :bar
      expect(settings.tracing.test_mode.writer_options).to_not be_empty
      expect(settings.tracing.test_mode.options[:writer_options].default_value).to be_empty
    end
  end
end

The options were like this before our changes:

option :writer_options do |o|
  o.default { {} }
end

This is how they are after our changes:

option :writer_options do |o|
  o.type :hash
  o.default({})
end

So I believe this was an oversight on my part in which using a default value as a block also meant that calling default_value would return a hash every time. To fix that particular spec, I had to use Core::Utils::SafeDup.frozen_or_dup

@lloeki suggested that we should be freezing the configuration value it gets set, so any modification to the configuration value should have to use dup first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think freezing instead of dup'ing seems reasonable as well, although I wonder if somewhere we may be doing something weird that doesn't like it (but perhaps in that situation it'd be worth fixing that code instead of not freezing)

end

option :user_id_denylist do |o|
o.default { [] }
o.type :array
o.default []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should have logic that would have o.type :array imply o.default [].

Then, o.type :array, nilable: true would probably imply o.default nil (debatable but it seems to make the most sense)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a great suggestion and will certainly reduce the amount of boilerplate.

I would leave that exercise for a subsequent PR

end
o.type :bool
o.env 'DD_APPSEC_ENABLED'
o.default DEFAULT_APPSEC_ENABLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird, seems like it should read o.default true.

Indeed whatever else depends on DEFAULT_APPSEC_ENABLED should instead read the configuration setting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is legacy from when the appsec setting was not using the core configuration code.

We can remove all those constants on a separate PR 😄


# There's a few cases where we don't want to use the fallback service name, so this helper allows us to get a
# nil instead so that one can do
# nice_service_name = Datadog.configuration.service_without_fallback || nice_service_name_default
o.helper(:service_without_fallback) do
service_name = service
service_name unless service_name.equal?(Core::Environment::Ext::FALLBACK_SERVICE_NAME)
service_name unless service_name === Core::Environment::Ext::FALLBACK_SERVICE_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that === (a.k.a match operator) intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was an issue with some tests. I had to change it to make them pass.

I reverted to using equal? 87d739d

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments, still going through everything!

end

option :ip_denylist do |o|
o.default { [] }
o.type :array
o.default []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Option#default_value we wrap the default with a call to Core::Utils::SafeDup.frozen_or_dup(...) so we dup it, rather than using this exact object.

(I had wondered the same thing you did and ended up spotting it)

Comment on lines +179 to +180
when NilClass
true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Maybe worth adding a comment here stating that this is a fallback for settings that don't define a type.

Comment on lines +200 to +205
precedence = Precedence::PROGRAMMATIC
end

if value.nil? && definition.deprecated_env && ENV[definition.deprecated_env]
value = coerce_env_variable(ENV[definition.deprecated_env])
precedence = Precedence::PROGRAMMATIC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Precedence::DEFAULT? According to the comments at the top of this file, DEFAULT is

          # Configuration that comes either from environment variables,
          # or fallback values.
          DEFAULT = [0, :default].freeze

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment should be changed.

If a customer has modified the configuration value via ENV variable, it has been configured programmatically since the customer is not using the default value.

For example:

option :enabled do |o|
  o.type :bool
  o.env 'DD_APPSEC_ENABLED'
  o.default false
end

If the customer changes the value using Datadog.configuration.appsec.enabled = true or via DD_APPSEC_ENABLED, we would want the helper function using_default? to return false Datadog.configuration.appsec.using_default?(:enabled) #=> false, since the intent of DEFAULT = [0, :default].freeze is to signal the customer hasn't set the value in anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Seems reasonable -- either updating the comment or perhaps introducing a new environment variable level seems OK.

Comment on lines +110 to +114
values = if value.include?(',')
value.split(',')
else
value.split(' ') # rubocop:disable Style/RedundantArgument
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this become value.split(',') only? Since the default is to not support the spaces-variant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the default is not to support the spaces-variant?

Where is that defined?

The code for env_to_list did not define any default value for def env_to_list(var, default = [], comma_separated_only:, deprecation_warning: true)

Doing a search on comma_separated_only shows three instances of true, and four instances of false.

I'm happy to remove the if condition, I just to make sure to understand if we have to

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the default for the DSL should be to not support it, because in my view the "spaces separator" is the weird use-case, so it should require extra work on the part of the settings that want it.

(As I mention below, accepting spaces on top of commas has the potential of creating more inconsistency across datadog libraries)

Comment on lines +130 to +132
else
raise ArgumentError, "The option #{@definition.name} is using an unknown type option `#{@definition.type}`"
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if this should be omitted, since validate_type will do the same work anyway.

@@ -112,23 +135,28 @@ def setter(&block)
@setter = block
end

def type(value = nil)
def type(value, type_options = {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this strict:

Suggested change
def type(value, type_options = {})
def type(value, nilable: false)
type_options = {nilable: nilable}

this way, it's clear from the code which type options are accepted, and we get a nice error message if we mistype them (rather than swallowing everything and then ignoring it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion 😄

lib/datadog/core/configuration/option.rb Show resolved Hide resolved
Comment on lines +384 to +394
o.setter do |value|
if ['true', true, '1', 'false', false, :auto].include?(value)
if value == :auto
value
else
['true', true, '1'].include?(value)
end
else
value
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this can now use the env_parser:

Suggested change
o.setter do |value|
if ['true', true, '1', 'false', false, :auto].include?(value)
if value == :auto
value
else
['true', true, '1'].include?(value)
end
else
value
end
end
o.env_parser do |value|
if value
value = value.strip.downcase!
['true', '1'].include?(value)
end
end

tags[key] = value if value && !value.empty?
end
o.env_var Core::Environment::Ext::ENV_TAGS
o.setter do |new_value, old_value|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should allow overriding precedence -- it's more complexity on the DSL, and as I mentioned in my other comments, weird options should be complex, rather than the DSL being complex to make weird options simple.

(As for having the parser as a block for env, no strong feelings, it's more a stylistic choice in the DSL)

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments...

Comment on lines 507 to 508
if env.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
self.env = string_tags[Core::Environment::Ext::TAG_ENV]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should we also change this for consistency with the others below?

Suggested change
if env.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
self.env = string_tags[Core::Environment::Ext::TAG_ENV]
if env_value.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
self.env = string_tags[Core::Environment::Ext::TAG_ENV]

lib/datadog/tracing/configuration/settings.rb Show resolved Hide resolved
Comment on lines 33 to 41
option :service_name do |o|
o.default do
o.type :string, nil: true
o.env_var Ext::ENV_SERVICE_NAME
o.setter do |value|
Contrib::SpanAttributeSchema.fetch_service_name(
Ext::ENV_SERVICE_NAME,
value,
Ext::DEFAULT_PEER_SERVICE_NAME
)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the settings that are still using o.default for the option :service_name (such as tracing/contrib/active_record/configuration/settings) are doing so because they don't yet specify a type, is that it?

let(:configuration_options) { { error_status_codes: 500..502 } }
let(:configuration_options) { { error_status_codes: (500..502).to_a } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful -- Is this a breaking change? It looks like it was extremely intentional that a range could be used here, and if customers are providing a range, we may be breaking their configuration.

Comment on lines -163 to +167
it { is_expected.to be(default_value) }
it do
# mock .dup lib/datadog/core/configuration/option.rbL87
expect(default_value).to receive(:dup).and_return(default_value)
is_expected.to be(default_value)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be extremely brittle. I would suggest either testing the outcome (value is dup'd when not frozen), or testing the interface (SafeDup.frozen_or_dup gets called, and whatever it returns is the output).

Also I think since this is non-trivial, it would be nice if the it had a description.

(Same note applies to reset_option below)

lib/datadog/core/configuration/option.rb Show resolved Hide resolved
@GustavoCaso
Copy link
Member Author

This PR has been split into smaller ones to facilitate the review process.

#2983
#2987
#2988

@GustavoCaso GustavoCaso marked this pull request as draft July 24, 2023 10:55
@marcotc
Copy link
Member

marcotc commented Jul 26, 2023

@GustavoCaso, because this PR was split, can this specific one be closed then?

@GustavoCaso GustavoCaso deleted the core-configuration-add-env-var-and-casting-functionality branch July 27, 2023 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product ci-app CI product for test suite instrumentation core Involves Datadog core libraries integrations Involves tracing integrations tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants