Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuring agent.logging.level through agent policy do not work for managed agents #2851

Closed
4 tasks
nchaulet opened this issue Jun 13, 2023 · 13 comments · Fixed by #3090
Closed
4 tasks

Configuring agent.logging.level through agent policy do not work for managed agents #2851

nchaulet opened this issue Jun 13, 2023 · 13 comments · Fixed by #3090
Assignees
Labels
enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@nchaulet
Copy link
Member

nchaulet commented Jun 13, 2023

Issue

We now have a feature that allow to add arbitrary key to the agent policy sent to the agent, elastic/kibana#159414 (comment)

I tried to set agent log level this way to Fleet managed agents, and it seems the agent is still logging at the default log level, when I run elastic-agent inspect I saw that my agent.logging.level: debug config is here, but it seems not used by the agent

Definition of done

  • Logging level is taking into account the one in the policy
  • Test confirming the logging level change is working as expected
  • Test covering the logging level change from settings action too
  • Test covering an unexisting logging level target
@nchaulet nchaulet added the bug Something isn't working label Jun 13, 2023
@cmacknz
Copy link
Member

cmacknz commented Jun 13, 2023

Seems like the logging level in the policy is ignored in the policy change handler, we always persist the log level we started with.

reader, err := fleetToReader(h.agentInfo, h.config)

We only support changing the log level through the SETTINGS action but I have no idea why we did it this way.

@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Jun 13, 2023
@jlind23
Copy link
Contributor

jlind23 commented Jun 13, 2023

But agent debug logs can still be turned out with the Ui change and not the api call Nicolas mentioned an I correct?

@cmacknz
Copy link
Member

cmacknz commented Jun 13, 2023

Yes, this is unintuitive but you can still change the log level from the UI with the SETTINGS action.

@jlind23
Copy link
Contributor

jlind23 commented Jun 13, 2023

So at least we have a workaround for this..

@pchila
Copy link
Member

pchila commented Jul 3, 2023

@cmacknz I had a look at the code and I guess that we ignore the log level in the handler_policy_change_action due to an issue with deserialization:
the log level is defined as an int8 within agent where 0 corresponds to info level and we use a map to convert the string we receive in the policy: when unmarshaling a policy that does not contain the log level (all of them unless we use the new fleet entry point), we are probably going to get and use the zero value (an empty string) which will likely throw an error (to be tested).

To solve this issue we have to optionally unpack the log level only when the key is specified and fall back on the configured value if it's not. Another possible solution would be to transform the Level into a *Level and then add the current log level value if the specified Level is nil (probably a bit cleaner) but we can't change it easily as it's part of elastic-agent-libs... 😞

@cmacknz
Copy link
Member

cmacknz commented Jul 4, 2023

but we can't change it easily as it's part of elastic-agent-libs...

The two largest users of elastic-agent-libs are Agent and Beats (and things that implement a Beat outside of the Beats repository). As long as the change wouldn't be breaking for Beats we could do this if it makes more sense, and if it is breaking we could just make it an option or variant of the existing behavior.

@michalpristas
Copy link
Contributor

michalpristas commented Jul 17, 2023

bringing a bit of a history into this.
the proposed change should not work in a way how agent is designed. i'm seeing this as a feature request not a bug.

the idea behind POLICY and SETTING was this
POLICY contains information about Inputs, Output and Fleet connection info
SETTINGS is agent specific setting

when we start thinking about supporting setting Log Level using settings I'd like us to stop for a while and think about how this should behave first.

At this time it' simple, default log level is info, we persist it as a setting. Once there's a need to switch to debug you will switch to debug and we persist it as a configuration option, this overrides default one. Once you're done you'll go back to Info.

Once you introduce setting log level using POLICY at the start we end up with a pair info:nil meaning default using settings is info, and it is not set using policy

then you change policy to use warn ending up with info:warn the question is which one has priority, we dont have information about the source of the first one.

Even if we consider not setting info at start, ending up with nil:warn. We will use warn.

Now setting from Debug SETTINGS will come, we have debug:warn using debug with a higher priority as it is per agent config.

The question is, how we reset per agent config. if we set whatever, we will end up with whatever:warn with whatever always taking priority even with future Policy changes.

When I take a look at diagnostics I should be deterministically say what's the log level based on Policy and Settings without worrying about order.

From my point of view we should either

  • prevent agent config being set in POLICY (as it was until now)
  • add Inherit log level or Reset function on Fleet UI which would result in pair nil:whatever

Getting rid of SETTINGs does not make sense. In the need of debug, changing log level to Debug for potentially 100k agent would create a spike in processing requirement, memory and cost eventually.

@cmacknz offline this week, pinging @joshdover: do we have thought through use cases for this?

@joshdover
Copy link
Contributor

Thanks for the history, @michalpristas. Makes sense why this is behaving this way.

I think this is somewhat clear already, but just to be concise the use cases we have are:

  1. Have a sensible default log level (info)
  2. Set the log level for a group of agents (policy)
    • Example use case: trying to find a infrequent bug that is happening in a pool of agents pulling from an SQS queue
  3. Increase the log verbosity on a single agent as a one-off to debug an issue, without affecting other agents on the policy

IMO the precedence order should be:

  • Default to 1
  • 2 always overrides 1
  • 3 always overrides 2

In general we should track which agents have a one-off setting applied via the SETTINGS action and should flag those to the user in the UI. This is essentially "drift detection" for agents that are deviating from their policy, and the user should be encouraged to reset those agents back to the policy.

add Inherit log level or Reset function on Fleet UI which would result in pair nil:whatever

Yeah I agree with this, there needs to be some way to reset (3) and start using (2) again.

So in conclusion, I agree this is not a bug, but a new feature and use case: controlling the log level via the agent policy. We've seen this request before and it should be prioritized, but as an enhancement and not a bug.

@joshdover
Copy link
Contributor

@pierrehilbert If you agree, let's deprioritize this one for now.

@pierrehilbert pierrehilbert added enhancement New feature or request and removed bug Something isn't working labels Jul 17, 2023
@pierrehilbert
Copy link
Contributor

Thanks everyone for your insights and I agree with the precedence order mentioned by @joshdover.
I changed the issue type and priority according to this new agreement.
Still keeping this in the current sprint for now but with a lower priority compare to other items and will see if we need some other adjustments.

@michalpristas
Copy link
Contributor

keep in mind that in order for this to work properly we need reset fucntionality or Policy log level in fleet in place. Without this once you set log level using settings all log levels coming from Policy will be ignored.

from planning perspective it does not make sense to work on this until fleet part is at least scheduled for development

@joshdover
Copy link
Contributor

Agreed, we will need support for this in Fleet. It's likely a small enhancement, but needs to be planned for.

@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants