5k+ download attempts within few seconds in agent logs while upgrading 8.11.3 agent to 8.12.0 with invalid agent binary. #3914

Closed
amolnater-qasource opened this issue Dec 14, 2023 · 20 comments · Fixed by #3930
Labels: bug, impact:high, QA:Validated, Team:Elastic-Agent-Control-Plane

Comments

@amolnater-qasource

Kibana Build details:

VERSION: 8.12.0 BC1
BUILD: 69840
COMMIT: 93cff0aacd70bd0835cd244348f742ba3cafc4aa

Host OS: SLES15

Preconditions:

  1. An 8.12.0 BC1 Kibana cloud environment should be available.
  2. An 8.11.3 agent should be installed using an agent policy.
  3. An invalid agent binary download URL must be set under Fleet Settings.

Steps to reproduce:

  1. Trigger an upgrade of the 8.11.3 agent with the invalid agent binary URL in place.
  2. Navigate to the agent logs tab.
  3. Observe 5k+ download attempts within a few seconds.

NOTE:

  • The issue is consistently reproducible for 8.11.3.
  • We haven't observed this issue with the 8.11.1 Linux agent.

Screenshot:
image

Expected Result:
Download attempts should not be this frequent when upgrading an 8.11.3 agent to 8.12.0 with an invalid agent binary URL.

Logs:
8.11.3[Issue]:
elastic-agent-diagnostics-2023-12-14T08-19-39Z-00.zip

8.11.1[Not an issue]:
elastic-agent-diagnostics-2023-12-14T07-57-23Z-00.zip

@amolnater-qasource amolnater-qasource added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team impact:high Short-term priority; add to current release, or definitely next. labels Dec 14, 2023
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@amolnater-qasource
Author

@manishgupta-qasource Please review.

@manishgupta-qasource

Secondary review for this ticket is done.

@juliaElastic
Contributor

@ycombinator Do you think this is the same issue as #3915?

@ycombinator
Contributor

Yes, it is. I will close #3915 as a duplicate of this one.

@cmacknz
Member

cmacknz commented Dec 14, 2023

This seems to reproduce only for agents enrolled in Fleet. Looking at the changes between 8.11.1 and 8.11.3, #3803 is the only one that jumped out as related.

Reverting #3803 fixes the problem, so it is somehow causing this. Possibly the settings updates are causing the backoff interval to reset continuously. FYI @AndersonQ
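
To make the hypothesis above concrete, here is a minimal Python sketch of how a retry loop could produce thousands of attempts in seconds if something keeps interrupting its wait. It is not Elastic Agent code and not the confirmed root cause: the `Backoff` class, the `simulate` function, the 30 s initial delay, the factor of 2, the jitter range, and the idea that each settings update cuts the pending wait short are all assumptions made for illustration. Time is simulated, and the duration of each failed download is ignored.

```python
import random


class Backoff:
    """Exponential backoff with jitter. Illustrative only: the constants are
    assumptions, not values taken from Elastic Agent."""

    def __init__(self, initial_ms=30_000, factor=2, max_ms=600_000, seed=0):
        self.initial_ms = initial_ms
        self.factor = factor
        self.max_ms = max_ms
        self.attempt = 0
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    def next_delay_ms(self):
        base = min(self.initial_ms * self.factor ** self.attempt, self.max_ms)
        self.attempt += 1
        # Jitter: spread the delay over [0.5 * base, 1.5 * base].
        return int(base * self._rng.uniform(0.5, 1.5))

    def reset(self):
        self.attempt = 0


def simulate(duration_ms, update_interval_ms=None):
    """Count download attempts started within duration_ms of simulated time.

    If update_interval_ms is set, a hypothetical settings update arrives at
    that interval, cuts the pending wait short, and resets the backoff.
    """
    backoff = Backoff()
    now_ms = 0
    attempts = 0
    while now_ms < duration_ms:
        attempts += 1  # the download is attempted and fails
        delay_ms = backoff.next_delay_ms()
        if update_interval_ms is not None and update_interval_ms < delay_ms:
            # Hypothetical bug: the update interrupts the wait and resets state.
            backoff.reset()
            delay_ms = update_interval_ms
        now_ms += delay_ms
    return attempts
```

Under these assumptions, `simulate(5_000, update_interval_ms=1)` returns 5000 attempts in five simulated seconds, while `simulate(300_000)` with no interruptions makes only a handful of attempts in five minutes.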

@AndersonQ
Member

I'm on it. I have already found the issue.

@AndersonQ
Member

The fix is here: #3930

@amolnater-qasource amolnater-qasource added the QA:Ready For Testing Code is merged and ready for QA to validate label Jan 2, 2024
@amolnater-qasource
Author

amolnater-qasource commented Jan 9, 2024

Hi @AndersonQ

We have revalidated this issue on the latest 8.12.0 BC5 Kibana cloud environment and found it still reproducible.

Observations:

  • 5k+ download attempts within a few seconds in agent logs while upgrading an 8.11.3 agent to 8.12.0 with an invalid agent binary.

Build details:
VERSION: 8.12.0
BUILD: 70053
COMMIT: db9b8921b37139cbb1e11d23f6381f655edeb72b
Artifact Link: https://staging.elastic.co/8.12.0-9f05a310/downloads/beats/elastic-agent/elastic-agent-8.12.0-windows-x86_64.zip

Screenshot:
image

Agent Logs:
elastic-agent-diagnostics-2024-01-08T13-28-33Z-00.zip

Hence, we are reopening this issue.

Could you please confirm whether this issue will be fixed when upgrading from 8.12.0 onwards?

Please let us know if anything else is required from our end.

Thanks!

@pierrehilbert
Contributor

Hey @amolnater-qasource, just to understand: what is this invalid agent binary?
From what I can understand, the issue seems not to be related to the Agent, right?

@AndersonQ
Member

I believe he meant an invalid agent binary download URL.

@AndersonQ
Member

@amolnater-qasource, I cannot reproduce it.

root@elastic-agent:~# wget https://staging.elastic.co/8.12.0-9f05a310/downloads/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz
--2024-01-09 11:37:27--  https://staging.elastic.co/8.12.0-9f05a310/downloads/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz
Resolving staging.elastic.co (staging.elastic.co)... 34.120.127.130, 2600:1901:0:1d7::
Connecting to staging.elastic.co (staging.elastic.co)|34.120.127.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579251938 (552M) [application/x-gzip]
Saving to: ‘elastic-agent-8.12.0-linux-x86_64.tar.gz’

elastic-agent-8.12.0-linux-x86_64.tar.gz       100%[====================================================================================================>] 552.42M  18.1MB/s    in 33s     

2024-01-09 11:38:00 (16.7 MB/s) - ‘elastic-agent-8.12.0-linux-x86_64.tar.gz’ saved [579251938/579251938]

root@elastic-agent:~# tar -xf elastic-agent-8.12.0-linux-x86_64.tar.gz 
root@elastic-agent:~# ./elastic-agent-8.12.0-linux-x86_64/elastic-agent install -nf --url=https://elastic.stack:443 --enrollment-token=abeautifultoken==
Installing in non-interactive mode.
[=== ] Service Started  [50s] Elastic Agent successfully installed, starting enrollment.
[====] Waiting For Enroll...  [50s] {"log.level":"info","@timestamp":"2024-01-09T11:39:19.210+0100","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":496},"message":"Starting enrollment to URL: https://b4145ded66b44c35b78aa01b0c2b9d3e.fleet.us-west2.gcp.elastic-cloud.com:443/","ecs.version":"1.6.0"}
[====] Waiting For Enroll...  [52s] {"log.level":"info","@timestamp":"2024-01-09T11:39:20.784+0100","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":461},"message":"Restarting agent daemon, attempt 0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T11:39:20.785+0100","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":285},"message":"Successfully triggered restart on running Elastic Agent.","ecs.version":"1.6.0"}
Successfully enrolled the Elastic Agent.
[====] Done  [52s]                                          
Elastic Agent has been successfully installed.

## force update
root@elastic-agent:~# curl --request POST \
--url 'https://kibana.elastic.stack:9243/api/fleet/agents/fea9d0be-d2a1-4edd-bf60-846d67010d5b/upgrade' \
--user "elastic:securePassword" \
--header 'Content-Type: application/json' \
--header 'kbn-xsrf: as' \
--data '{"version": "8.12.0", "force": true}'

The agent logs:

root@elastic-agent:~# cat /opt/Elastic/Agent/data/elastic-agent-8744ca/logs/elastic-agent-20240109-1.ndjson | grep -e 'download attempt'
{"log.level":"info","@timestamp":"2024-01-09T10:50:53.979Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 1","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:51:03.984Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 1 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:48679->127.0.0.53:53: i/o timeout\n\n; retrying in 44.513486453s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T10:51:48.502Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 2","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:51:58.506Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 2 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:40726->127.0.0.53:53: i/o timeout\n\n; retrying in 40.568024097s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T10:52:39.080Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 3","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:52:49.087Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 3 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:43029->127.0.0.53:53: i/o timeout\n\n; retrying in 1m26.234918773s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T10:54:15.325Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 4","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:54:25.331Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 4 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:49502->127.0.0.53:53: i/o timeout\n\n; retrying in 56.392391053s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T10:55:21.727Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 5","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:55:31.735Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 5 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:39399->127.0.0.53:53: i/o timeout\n\n; retrying in 58.524151809s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T10:56:30.259Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 6","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:56:40.268Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 6 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:52196->127.0.0.53:53: i/o timeout\n\n; retrying in 55.572481496s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-09T10:57:35.844Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":253},"message":"download attempt 7","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-01-09T10:57:45.854Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":263},"message":"download attempt 7 failed: unable to download package: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-8744ca/downloads/elastic-agent-8.12.0-linux-x86_64.tar.gz: no such file or directory\n\t* fetching package failed: Get \"https://broken.no/way.it.is.a.domain/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz\": lookup broken.no on 127.0.0.53:53: read udp 127.0.0.1:54603->127.0.0.53:53: i/o timeout\n\n; retrying in 1m25.615818166s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
root@elastic-agent:~# 

I know I'm using Linux, but this part of the code isn't platform-specific. Did you try on Linux? Is it happening only on Windows, or on any platform?
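
For comparing runs like the one above against the reported 5k+ attempts, a small Python sketch can count `download attempt N` entries per second and collect the `retrying in ...` delays from an agent `.ndjson` log. The message formats are taken from the log excerpt above; the function names `parse_go_duration` and `summarize` are hypothetical helpers, not part of any Elastic tooling, and the script assumes one JSON object per line with `@timestamp` and `message` fields.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Seconds per unit for Go-style durations such as "44.513486453s" or "1m26.2s".
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
_ATTEMPT = re.compile(r"download attempt (\d+)")
_RETRY = re.compile(r"retrying in (.+)\.$")


def parse_go_duration(text):
    """Convert a Go-style duration string to seconds."""
    parts = _DURATION_PART.findall(text)
    if not parts:
        raise ValueError(f"not a Go-style duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def summarize(lines):
    """Return (attempts per one-second bucket, list of retry delays in seconds)."""
    attempts_per_second = Counter()
    retry_delays = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        message = entry.get("message", "")
        if _ATTEMPT.fullmatch(message) and "@timestamp" in entry:
            # Normalise the trailing "Z" so older Python versions can parse it.
            ts = datetime.fromisoformat(entry["@timestamp"].replace("Z", "+00:00"))
            attempts_per_second[ts.replace(microsecond=0)] += 1
            continue
        retry = _RETRY.search(message)
        if retry:
            retry_delays.append(parse_go_duration(retry.group(1)))
    return attempts_per_second, retry_delays
```

Passing an open log file, for example `summarize(open(path, encoding="utf-8"))` with `path` pointing at the agent's `.ndjson` log, gives a per-second attempt count. A healthy run like the one above should show at most one attempt in any one-second bucket, whereas the reported behaviour would show hundreds or thousands.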

@amolnater-qasource
Author

Hi @AndersonQ

> I know I'm using Linux, but this part of the code isn't platform-specific. Did you try on Linux? Is it happening only on Windows, or on any platform?

Yes, we have tested this on Linux (SLES) only.

Just to confirm: we are validating this while upgrading from 8.11.3 to 8.12.0, and I assume the merged fix won't be available in this version. Should we re-test once the 8.12.1 patch is available?

> Hey @amolnater-qasource, just to understand: what is this invalid agent binary?

@pierrehilbert We mean the agent binary URL that we update under Fleet Settings.

Screenshot:

image

Please let us know if we are missing anything here.
Thanks!

@pierrehilbert
Contributor

The fix from @AndersonQ is not in 8.11.3, so I think it is normal to still see the problem for now. As you mentioned, we should test this again when upgrading from 8.12.0 to 8.12.1, once it is available.

@pierrehilbert
Contributor

@amolnater-qasource should we close this one again then?

@amolnater-qasource
Author

Sure, @pierrehilbert

We will revalidate this once the 8.12.1 patch is available and will re-open if required.

Thanks!

@cmacknz
Member

cmacknz commented Jan 15, 2024

You should be able to test this by upgrading from 8.12.0 to 8.13.0-SNAPSHOT without waiting for 8.12.1.

@amolnater-qasource
Author

Hi Team,

We have revalidated this issue on the latest 8.13.0-SNAPSHOT Kibana cloud environment and found it fixed.

Observations:

  • The 5k+ download attempts no longer appear in the agent logs while upgrading an 8.12.0 agent to 8.13.0-SNAPSHOT with an invalid agent binary.

Build details:
VERSION: 8.13.0-SNAPSHOT
BUILD: 70705
COMMIT: de79d5db8bd88f75dbd88d097a100c390eac77a1
Artifact Link: https://staging.elastic.co/8.12.0-3eba7f46/summary-8.12.0.html

Screenshot:
image

Agent Logs:
elastic-agent-diagnostics-2024-01-16T10-02-47Z-00.zip

Hence, we are marking this issue as QA:Validated.

Thanks!

@amolnater-qasource amolnater-qasource removed the QA:Ready For Testing Code is merged and ready for QA to validate label Jan 16, 2024
@amolnater-qasource amolnater-qasource added the QA:Validated Validated by the QA Team label Jan 16, 2024
@pierrehilbert
Contributor

Super, thx @amolnater-qasource

@harshitgupta-qasource

Bug Conversion

  • A test case is not required, as this checkpoint is already covered in exploratory testing.

Thanks!
