[Fleet] Evaluate Fleet page load performance and steps to improve #118751
Pinging @elastic/fleet (Team:Fleet)
@mostlyjason just to keep track, which version did you do this testing on?
We have plans to optimize this in 8.0 as part of #111858
Details about why this is slow can be found in #110500. tl;dr we need a way to write Elasticsearch assets in bulk to reduce the number of cluster state updates needed. The ES team has preferred that instead of a bulk API we spend effort towards a generic "package install" API as part of making packages a first-class concept across the Stack.
Thanks Josh! I tested on 7.16-snapshot in a cluster that was created about a week ago. It looks like the APM package installs 41 ingest node pipelines, so that might explain why it's slow.
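[Editor's note: for readers unfamiliar with the install pattern being described, here is a minimal sketch of what per-asset installation looks like with the 7.x @elastic/elasticsearch client. The pipeline list is hypothetical; this is not Fleet's actual installer code.]

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Each putPipeline call is a separate request, and each pipeline change is
// persisted in cluster state, so a package shipping 41 pipelines pays the
// cluster-state-update coordination cost 41 times. A bulk or "package
// install" API would amortize this into far fewer updates.
async function installPipelinesOneByOne(
  pipelines: Array<{ id: string; body: Record<string, unknown> }>
) {
  for (const { id, body } of pipelines) {
    await client.ingest.putPipeline({ id, body });
  }
}
```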
I started to take a look at this, testing Fleet performance with some agents and policies, and it looks like we have a few issues. I will edit this comment as I find more things, and I am going to create or link to existing issues for each problem, adding a trace/profile where I can.

n+1 problems:
- During Fleet setup (each time)
- During Fleet setup (one time)
- Bulk actions on agents
- Installing ES assets

Web perf issues:
- Loading the package list is slow (I had a CPU profile with 5s of JS to render the package grid; we should investigate more on how to fix that, maybe by virtualizing that list).
@nchaulet some good finds here 🎉
After 8.0 ships (which moves setup to Kibana boot) and we've gotten some feedback from users/support, I think we should consider removing this setup API call from the UI. It definitely shouldn't be necessary once #120616 is in, since Kibana won't even start up if there's a setup issue. We only left it in for now as a hacky/cheap "retry" option, but once we block Kibana boot we can be sure that setup already completed before the UI is ever served up. Also related is #121639.

Curious if there's any improvement we can make to the Integration details page load. I've noticed in the past that this can be quite slow, especially on Cloud for some reason. For example, would it be advantageous to avoid loading the entire package to show this page and instead only load the manifest and screenshots?
Yes, I think we can make some optimizations on the details and integration list pages. I need to dig more into the details page, but on the integration list page we are spending a lot of time rendering the grid (I have a CPU profile where it took 5s of blocking JS to render that list). We definitely need to optimize this; maybe we can virtualize the list and render only the visible items, or maybe there are other obvious things here that are not performant. I need to dig more into it.
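[Editor's note: a sketch of the virtualization idea, assuming a react-window dependency and a hypothetical IntegrationCard component rather than Fleet's actual grid.]

```tsx
import React from 'react';
import { FixedSizeGrid } from 'react-window';

// Hypothetical card component and package list; Fleet's real grid differs.
declare const IntegrationCard: React.FC<{ pkg: { name: string } }>;
declare const packages: Array<{ name: string }>;

const COLUMNS = 4;

// Only the cells currently scrolled into view are mounted, so rendering
// cost stays flat as the package list grows instead of scaling with it.
export const VirtualizedIntegrationGrid = () => (
  <FixedSizeGrid
    columnCount={COLUMNS}
    rowCount={Math.ceil(packages.length / COLUMNS)}
    columnWidth={280}
    rowHeight={160}
    width={1200}
    height={600}
  >
    {({ columnIndex, rowIndex, style }) => {
      const pkg = packages[rowIndex * COLUMNS + columnIndex];
      return <div style={style}>{pkg ? <IntegrationCard pkg={pkg} /> : null}</div>;
    }}
  </FixedSizeGrid>
);
```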
After investigating more, I created follow-up issues for the integration listing page and for the details page.
Thanks for the investigation @nchaulet, good stuff here. WRT the integration list and details pages, I've thought a few times that maybe we could cache package info on the client side via React Context (or similar) so that we can instantly load the information if it's already been fetched before. This would be in addition to any optimizations we do on the actual package info endpoints you've identified in #122560. WDYT?
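[Editor's note: a minimal sketch of what that Context-based cache could look like. All names are hypothetical, including the fetchPackageInfo helper standing in for whatever client calls Fleet's package info endpoint.]

```tsx
import React, { createContext, useContext, useRef } from 'react';

type PackageInfo = Record<string, unknown>;
type Fetcher = (name: string, version: string) => Promise<PackageInfo>;

// Hypothetical helper wrapping the package info endpoint.
declare const fetchPackageInfo: Fetcher;

const PackageCacheContext = createContext<Fetcher>(fetchPackageInfo);

export const PackageCacheProvider: React.FC<{ children: React.ReactNode }> = ({ children }) => {
  // Keyed by name@version; entries survive route changes within the SPA session.
  const cache = useRef(new Map<string, Promise<PackageInfo>>());
  const cachedFetch: Fetcher = (name, version) => {
    const key = `${name}@${version}`;
    if (!cache.current.has(key)) {
      cache.current.set(key, fetchPackageInfo(name, version));
    }
    return cache.current.get(key)!;
  };
  return (
    <PackageCacheContext.Provider value={cachedFetch}>{children}</PackageCacheContext.Provider>
  );
};

export const usePackageInfo = () => useContext(PackageCacheContext);
```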
Actually, this call is only slow the first time; after that, the package info is already cached server side (in memory, without expiration, so this could be problematic at some point), so I do not think caching client side will make a huge difference here.
@jen-huang should we pass any of these to the journey team, since they own unified integrations UI now? |
IMO, for any client-side caching we should be leveraging the built-in features of the browser, i.e. HTTP cache headers. Though as @nchaulet pointed out in #122560, it seems the main issue is that we download the whole package contents rather than just the manifest + screenshots.
100% agree that we should lean on browser cache controls and the fact that the EPR is served via a CDN, rather than doing our own custom caching here. We're duplicating a lot of effort in terms of caching right now, and I think we're also needlessly relying on the "archive" endpoints (e.g. https://epr.elastic.co/epr/nginx/nginx-1.2.1.zip) when we could be relying on the "plain JSON" endpoint (e.g. https://epr.elastic.co/package/nginx/1.2.1/) for each package instead. The production EPR currently responds with caching headers on these endpoints.

Relying on HTTP caching directives would mean moving our interactions with EPR into the client and off of the server, though, as server-side requests won't honor HTTP cache headers the way the browser does. It would almost certainly be preferable to our current workflow, which downloads, unpacks, and caches each package's zip archive.

We could probably flesh out #122560 to capture this larger refactor in pursuit of performance gains, or we could try to separate a few of these points out into distinct issues. Eager to hear others' thoughts here. @ruflin brought up some concerns about package size in this context in an offline email chain earlier, so I'll loop him in here as well.
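[Editor's note: to make the browser-cache point concrete, a sketch of what a client-side call to the plain JSON endpoint might look like. The endpoint URL comes from the comment above; the caching behavior shown is standard fetch semantics, not an existing Fleet code path.]

```ts
// When this runs in the browser, fetch participates in the HTTP cache:
// if the CDN served Cache-Control/ETag headers, a repeat call can be
// answered from cache or revalidated cheaply, with no Kibana-side cache.
async function loadPackageManifest(name: string, version: string) {
  const res = await fetch(`https://epr.elastic.co/package/${name}/${version}/`, {
    cache: 'default', // honor HTTP caching directives from the CDN
  });
  if (!res.ok) {
    throw new Error(`EPR responded with ${res.status}`);
  }
  return res.json();
}
```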
Yes, that would be a lot better. Also, the cache we have in memory does not have any expiration; it's just a plain object hash map, so if the number of packages grows too much in the future it could become an issue.
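[Editor's note: for illustration, a minimal TTL-bounded alternative to an unexpiring hash map. A sketch only; the key and value types are placeholders, not Fleet's actual cache.]

```ts
// Entries expire after a fixed TTL, so stale data gets dropped and the
// cache cannot grow without bound as the number of packages increases.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazily evict stale entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: cache fetched package archives for an hour.
const packageCache = new TtlCache<Uint8Array>(60 * 60 * 1000);
```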
This would be a super simple and quick change. |
@nchaulet Can we close this issue now and use the remaining tickets you opened? |
Yes I am going to close that issue 👍 |
Problem
Several people have noticed slow page load performance within the Fleet and Integrations apps. When users start a trial in Elastic Cloud, they expect good performance as part of a good user experience. Needing to wait 5+ seconds for a page to load makes the application feel sluggish, especially in the absence of UI affordances like loading indicators. The getting started working group sees this as a high-priority area to investigate and improve in order to lower our trial churn rate.
Evaluation
I'd like us to evaluate the end-to-end performance of Fleet as the user starts a cloud trial, views the integration browse page, adds an integration, and adds an agent. Trying it out in my own browser, I found these results:
The slowness seems to happen inconsistently, but three places that stand out are the Fleet setup call, adding a package policy, and the initial data load from EPR.
Questions