@franz1981 franz1981 commented Oct 22, 2025

IDK @holly-cummins if this defeats the purpose of simplicity, but it should be invisible to users who run with the default values.

If you like this type of change, I can do the same for the other scripts (that's why it's in DRAFT).

@edeandrea
Collaborator

LGTM!

@holly-cummins
Collaborator

holly-cummins commented Oct 23, 2025

Hmm, I feel conflicted about this. It clearly makes the scripts more powerful, but I wonder if it makes them less useful for the intended purposes:

  1. A sceptical user can read the scripts, and quickly see it's just a simple wrk2/hyperfoil invocation
  2. A user who has 62 other things to do can just invoke-and-go without thinking

I know the defaults mean users can still just invoke-and-go, but the extra capability does reduce the readability of the scripts. It means that if we're doing a live demo and we run more on the script to show what we just did, it's a bit overwhelming for the audience.

If we wanted to do things 'properly', wouldn't we use the 'medium-complexity' scripts that Eric is working on? If the crappy scripts are actually ok, it leaves less of a gap for the medium scripts to fit into. :)

Did you happen to spot how much of a difference waiting for the first request makes to the throughput? I guess not doing so would penalise the runtime with the slower start time and thus be 'unfair'. I can't decide if the extra complexity is worth it for the fairness, or not. I think on that one it probably is, but for the parameters, I'd almost just want to say people should edit the script if they don't like the defaults, because it's only a simple shell script. We could maybe give variable names to the arguments, though. That does make it more obvious what's going on when the script is shown with more.

So my initial take is

  • Yes to the || true because that reduces noise from the output and I should have done it anyway
  • No to the multiple iterations because in a live demo/user in a hurry, more output on screen is worse, and slower is worse
  • No to the arguments because it makes the script text so long
  • Yes to using variables in the hyperfoil invocation, rather than magic numbers (improves clarity)
  • Tentative yes to waiting for first request to improve the result fairness
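As a rough illustration of the "variables rather than magic numbers" point, here is a minimal sketch. It is not the script from this PR: the variable names, default values, and URL are all hypothetical, and it assumes a wrk2-style command line (`-t`, `-c`, `-d`, `-R`, `--latency`) with the binary installed as `wrk`. The command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical defaults; every name and value here is illustrative only.
THREADS="${THREADS:-2}"            # load-generator threads
CONNECTIONS="${CONNECTIONS:-100}"  # open HTTP connections
DURATION="${DURATION:-30s}"        # length of the measurement
RATE="${RATE:-5000}"               # target requests per second
URL="${URL:-http://localhost:8080/}"

# Build the wrk2-style invocation from named values so that a reader
# paging through the script can see what each number means.
load_cmd="wrk -t ${THREADS} -c ${CONNECTIONS} -d ${DURATION} -R ${RATE} --latency ${URL}"

# Print the command instead of running it (dry run).
echo "${load_cmd}"
```

A speaker who wants a different rate could then run something like `RATE=10000 ./stress.sh` (script name assumed) without editing the file, while the default path stays a plain invoke-and-go.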

@franz1981
Author

I have mixed feelings as well.
What exactly is the purpose of the script? What is it intended to demonstrate (from our POV)?

@holly-cummins
Collaborator

holly-cummins commented Oct 23, 2025

> I have mixed feelings as well. What exactly is the purpose of the script? What is it intended to demonstrate (from our POV)?

Two purposes:

  • When we're doing live talks, we know we run scripts exactly like this, and every time we do so, we re-invent them, and probably get things wrong. Eric's done it, Clement's got a repo he uses, Julien's got a set of scripts, I've done it ... So rather than every member of the Quarkus team continually reinventing comparative benchmarks and writing mini-performance harnesses, we want a shared resource that uses at least some best practices (and is easy for you to keep an eye on, because it's in one place).
  • Everyone else comparing Quarkus to Spring, either for their own talks, or just for their own internal research, also does a similar process. Seeing an application run on their own machine is always going to be more compelling than numbers we publish, even if the numbers are more methodologically sound. So we want to make it easy and accessible.

In order to feel trustworthy, either to an audience or to someone exploring on their laptop, our scripts have to be easy to understand and digest, which means they'd ideally be one or two lines, and not use any unfamiliar tools. We know they'll actually be less valid if they're that simple, but that's why we have the 'ok, now do it properly' version, and the surrounding discussion about 'here's what's wrong with the numbers you just got'. But there's no point in having two 'do it properly' sets of scripts. :)

@franz1981
Author

franz1981 commented Oct 23, 2025

> 'here's what's wrong with the numbers you just got'. But there's no point in having two 'do it properly' sets of scripts

Thanks, got it!
In this regard I think that, even before my changes, this script was "too good".
It sets the number of cores and the memory, it uses wrk2, and it properly starts and stops a DBMS via Docker...
And still, due to some missing configuration options, it is likely unable to deliver a reliable comparison or reliable data.

Now, let's say we are at a conference, using this script, and it reports bad numbers (which is possible): the speaker then has to "fix" it live in order to obtain something better.
And maybe that won't be enough, and more changes will need to be made, still live and step by step, to show users why each one is required, until the numbers become "good enough".
Another option is to just say "yeah, it was expected not to be good enough" and show the much bigger script made by Eric, which would overwhelm users and fail to explain what the original was missing, because it is too different.

A third option, which is the purpose of this PR, is to have a slightly more complex script which doesn't need to be fixed live, but just configured, to obtain "good enough" numbers, making it

  • less error-prone for the speaker
  • easier for users to grasp

At the same time, configuring it to be broken by default makes it easier to show how the numbers can become more reliable by changing some parameter values.

That said, I could remove:

  • parsing the args: it creates too much visual noise
  • the measurement iteration

and see how it looks.
I've still left in the curl command that silently waits for the server to be up and running; otherwise wrk could fail because the server is missing (e.g. with Spring, which can be quite slow to start...).
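For readers unfamiliar with the "wait for the server with curl" idea, a minimal sketch follows. It is not the code from this PR: the function name `wait_for_server`, its arguments, and the defaults are hypothetical. It relies only on standard curl options (`-s`, `-f`, `-o`, `--max-time`).

```shell
#!/bin/sh
# wait_for_server URL [MAX_ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL until curl gets a successful HTTP
# response, or gives up after MAX_ATTEMPTS tries.
wait_for_server() {
  url="$1"
  max_attempts="${2:-60}"
  delay="${3:-1}"
  attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    # -s: no progress output, -f: HTTP errors count as failure,
    # -o /dev/null: discard the body, --max-time: cap each attempt.
    if curl -s -f -o /dev/null --max-time 2 "${url}"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "${delay}"
  done
  echo "server at ${url} not ready after ${max_attempts} attempts" >&2
  return 1
}
```

A script could call `wait_for_server "http://localhost:8080/"` right after starting the application and only launch the load generator once it returns successfully, so that a slow-starting runtime is not measured while it is still booting.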

@franz1981 franz1981 force-pushed the stresstest_params branch 2 times, most recently from 0061dcd to c8d5d8e, October 27, 2025 05:28
@franz1981 franz1981 marked this pull request as ready for review October 27, 2025 05:28
@franz1981
Author

I've tried to reduce the visual noise while still allowing a speaker to tune the script more easily,
e.g. by having more explicit, named params.

@franz1981
Author

FYI Hyperfoil/Hyperfoil#626

This is why the timeout has been added here ^^
We will work on a fix on the Hyperfoil side, although it happens only under a specific condition, i.e. a requested throughput immensely higher than what the system under test can sustain.

@franz1981
Author

franz1981 commented Oct 28, 2025

Last but not least: having a way to parametrize the number of cores is good for showing people how performance can be affected in some unexpected ways, e.g. a runtime with a single core can silently switch the GC algorithm (as well as scale the number of compiler threads) ^^
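One way to see this effect outside the benchmark is sketched below. It is an assumption-laden illustration, not part of this PR: the `CORES` variable is hypothetical, and it uses the HotSpot flag `-XX:ActiveProcessorCount` as a stand-in for whatever mechanism the script uses to limit cores (container CPU limits, for example). The probe command is printed, and only run if a `java` binary is on the PATH.

```shell
#!/bin/sh
# Hypothetical knob: the number of CPUs the JVM should believe it has.
CORES="${CORES:-1}"

# -XX:ActiveProcessorCount overrides the CPU count seen by HotSpot;
# -XX:+PrintFlagsFinal dumps the final (ergonomically chosen) flag values.
probe_cmd="java -XX:ActiveProcessorCount=${CORES} -XX:+PrintFlagsFinal -version"
echo "${probe_cmd}"

# Only run the probe when a JVM is available; show the GC selection
# and the JIT compiler thread count chosen for this CPU count.
if command -v java >/dev/null 2>&1; then
  ${probe_cmd} 2>/dev/null | grep -E ' (UseSerialGC|UseG1GC|CICompilerCount) ' || true
fi
```

Comparing the output for `CORES=1` and, say, `CORES=4` on the same machine should show whether the selected collector and compiler thread count change; the exact values depend on the JDK version and available memory.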
