Add load tester for YAMCS to simulate Open MCT traffic #388

Closed
scottbell opened this issue Nov 2, 2023 · 11 comments
Labels: type:maintenance chore, tests, build, ci

Comments

@scottbell
Collaborator

Summary

Using K6, or by writing a simple Node script, simulate 300 WebSocket clients subscribing to 10Hz telemetry. What is the impact on YAMCS? What happens if you make the clients slow to service the WebSocket messages, simulating a browser under heavy load? Does the CPU utilization scale up linearly, or is there a threshold at which it suddenly jumps up?

@scottbell scottbell added the type:maintenance chore, tests, build, ci label Nov 2, 2023
@scottbell scottbell self-assigned this Nov 2, 2023
@scottbell
Collaborator Author

scottbell commented Nov 2, 2023

@akhenry @unlikelyzero It looks like the YAMCS Quickstart only fires at 1Hz. Do you guys have another lead on 10Hz data?

@unlikelyzero had a smart idea of just running the simulator.py 10 times, which gives us 10Hz. It works!

@scottbell
Collaborator Author

@unlikelyzero says to run for an hour before shutting down.

@akhenry
Owner

akhenry commented Nov 2, 2023

> @akhenry @unlikelyzero It looks like the YAMCS Quickstart only fires at 1Hz. Do you guys have another lead on 10Hz data?
>
> @unlikelyzero had a smart idea of just running the simulator.py 10 times, which gives us 10Hz. It works!

I've for sure had luck simulating 10Hz data by modifying this line to be sleep(0.1)

@scottbell
Collaborator Author

>> @akhenry @unlikelyzero It looks like the YAMCS Quickstart only fires at 1Hz. Do you guys have another lead on 10Hz data?
>> @unlikelyzero had a smart idea of just running the simulator.py 10 times, which gives us 10Hz. It works!
>
> I've for sure had luck simulating 10Hz data by modifying this line to be sleep(0.1)

Ah, duh. I've done this before too and had forgotten there's a separate sleep above. Thanks for the info!

@scottbell
Collaborator Author

scottbell commented Nov 3, 2023

Adding a 2s "digestion delay" to the WebSocket callback absolutely kills YAMCS:

[Screenshot: pegged]
[Screenshot: memory]

Memory also stays pretty high even after the clients disconnect, though restarting YAMCS resolves it.
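
To put rough numbers on why a digestion delay hurts so much, here is a minimal back-of-the-envelope sketch. It assumes 300 clients (the figure from the summary, not necessarily what this run used), one 10Hz subscription per client, and that each client handles exactly one message per digestion delay. The function and parameter names are made up for illustration.

```javascript
// Rough estimate of how fast undelivered messages pile up on the server
// when clients digest messages slower than they are produced.
// All names here are illustrative; nothing below is a YAMCS or K6 API.
function backlogGrowth({ rateHz, digestionMs, clients }) {
  // A client that needs digestionMs per message consumes this many per second.
  const consumedPerSec = 1000 / digestionMs;
  // Anything produced beyond that has to wait somewhere (here: server buffers).
  const perClient = Math.max(0, rateHz - consumedPerSec);
  return { perClient, total: perClient * clients };
}

// Assumed scenario: 10Hz telemetry, 2s digestion delay, 300 clients.
const estimate = backlogGrowth({ rateHz: 10, digestionMs: 2000, clients: 300 });
console.log(`backlog per client: ${estimate.perClient} msg/s`);
console.log(`backlog across all clients: ${estimate.total} msg/s`);
```

Under these assumptions each client only keeps up with 0.5 messages per second, so nearly all of the 10Hz stream ends up queued on the server side.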

@scottbell
Collaborator Author

scottbell commented Nov 3, 2023

If one comments out this:

      webSocket:
        writeBufferWaterMark: { low: 32768, high: 160000000 }

it causes a great deal of these messages:

16:05:21.647 _global [45] WebSocketServerMessageHandler Channel full, cannot write message with priority=NORMAL (slow network?). Closing connection.

but the CPU/memory consumption of YAMCS remains constant. So perhaps writeBufferWaterMark is tuned too high for the YAMCS server?

The defaults are:

{ low: 32768, high: 131072 }

Water marks for the write buffer of each WebSocket connection. When the buffer is full, messages are dropped. High values lead to increased memory use, but connections will be more resilient against unstable networks (i.e. high jitter). Increasing the values also help if a large number of messages are generated in bursts. The map requires keys low and high indicating the low/high water mark in bytes.
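
To illustrate the trade-off, here is a toy model of a single connection's write buffer. It is only a sketch, not how YAMCS or Netty actually implement water marks: it assumes 1024-byte messages (a made-up size), a 10Hz producer, a client with a 2s digestion delay, and that the connection is closed as soon as one more message would push the buffer past the high water mark, as the log message above suggests.

```javascript
// Toy model of one WebSocket connection's server-side write buffer.
// Assumptions (not taken from YAMCS): fixed message size, steady producer,
// steady consumer, and "close on overflow" behaviour.
function simulateConnection({ high, messageBytes, rateHz, digestionMs, maxSeconds }) {
  let buffered = 0;
  let peak = 0;
  const totalTicks = maxSeconds * rateHz; // one tick per produced message
  // Bytes the slow client drains between two produced messages.
  const drainPerTick = ((1000 / digestionMs) / rateHz) * messageBytes;

  for (let tick = 1; tick <= totalTicks; tick++) {
    buffered = Math.max(0, buffered - drainPerTick);
    if (buffered + messageBytes > high) {
      // Buffer would overflow: the slow client gets dropped.
      return { closedAtSeconds: tick / rateHz, peakBytes: peak };
    }
    buffered += messageBytes;
    peak = Math.max(peak, buffered);
  }
  return { closedAtSeconds: null, peakBytes: peak };
}

const scenario = { messageBytes: 1024, rateHz: 10, digestionMs: 2000, maxSeconds: 3600 };
const withDefaults = simulateConnection({ ...scenario, high: 131072 });
const withTuned = simulateConnection({ ...scenario, high: 160000000 });

console.log("default high water mark:", withDefaults);
console.log("tuned high water mark:", withTuned);
```

In this model the default setting sheds the slow client within seconds and caps memory per connection, while the large setting never closes the connection during an hour-long run and simply keeps buffering.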

@akhenry
Owner

akhenry commented Nov 3, 2023

@scottbell Beautiful! These are great findings. The error message is a symptom of a self-defense mechanism against slow clients, which we are effectively disabling by using arbitrarily large write buffers. Open MCT can handle being dropped; it will just reconnect. Also, if we get dropped for being too slow, that's useful feedback that lets us direct our optimization efforts. There's stuff we can do in Open MCT to make it process WebSocket messages quicker so the buffer doesn't back up, such as taking WebSocket handling off the UI thread.

@akhenry
Owner

akhenry commented Nov 3, 2023

@scottbell @unlikelyzero Can we use K6 to load real Open MCT clients?

I think the next step is to reproduce this with real Open MCT clients so that we have a test bed for measuring Open MCT changes.

We will need a sufficiently complex Open MCT display with a bunch of plots, LAD Tables, alphanumerics, and condition sets / widgets. @charlesh88 has some scripting to automate building these I believe.

I think it's worth trying to build a real repro of this in Quickstart for a few reasons:

  1. We can build regression tests that run on our commercial CI environment and don't require NASA resources.
  2. We can provide reproductions to the Space Applications team if we identify Yamcs bottlenecks.
  3. We do not interrupt other development work on shared resources.
  4. Folks outside of NASA can potentially contribute to our performance optimization efforts.

@scottbell
Collaborator Author

@unlikelyzero @akhenry

There's a K6 browser I think we could use to do this. From what I can tell, we'd need to:

  • Spin up Open MCT for YAMCS pointed at the server we want to load test
  • Spin up K6 to create X browser clients pointed at various displays using YAMCS data
  • Measure YAMCS performance

Playwright looks like it also has something similar we could do with Artillery, but I'm not familiar with it.

@scottbell
Collaborator Author

@unlikelyzero @akhenry
Fiddling with rather modest parameters for K6:

const maxClients = 40;
const workersPerClient = 5;
const digestionTimeInMs = 500;

on their own create a rather slow build-up in memory consumption. But if I stop the K6 process after 10 minutes and restart it, YAMCS never really gives up the fat WebSocket buffers from the previous run (or at least not quickly enough) and quickly runs out of memory.
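
For a sense of scale, here is a rough upper bound on what those buffers could hold. This is a sketch only: it assumes every worker holds its own WebSocket connection (a guess about what workersPerClient means) and that each connection's buffer can fill all the way to the high water mark. The helper name is made up.

```javascript
// Worst-case memory held in per-connection write buffers.
// Assumption: one WebSocket connection per worker; nothing here is a K6 or YAMCS API.
const maxClients = 40;
const workersPerClient = 5;

function worstCaseBufferBytes(connections, highWaterMark) {
  return connections * highWaterMark;
}

const connections = maxClients * workersPerClient;
const tunedBound = worstCaseBufferBytes(connections, 160000000); // configured high water mark
const defaultBound = worstCaseBufferBytes(connections, 131072);  // default high water mark

console.log(`connections: ${connections}`);
console.log(`worst case with tuned water mark: ${(tunedBound / 1e9).toFixed(1)} GB`);
console.log(`worst case with default water mark: ${(defaultBound / 1e6).toFixed(1)} MB`);
```

If buffers from a previous run really are not released before the next run starts, the two runs' bounds would stack, which would be consistent with running out of memory shortly after a restart.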

@scottbell
Collaborator Author

scottbell commented Nov 9, 2023

@akhenry @unlikelyzero I've added a browser test too. I'll let you know what I find testing it out on Open MCT Quickstart.

2 participants