JMH performance benchmarks for Javalin versions, with automated GitHub Actions + GitHub Pages reporting.
https://javalin.github.io/javalin-performance-tests-testing/
By default, benchmark results are written under `results/`.
One-command local run (recommended):

```bash
mise install
./run-local-benchmarks.sh
```

This uses production benchmark settings by default and appends a new run to local history before regenerating the site. The scripts use project-local Gradle caches to reduce lock conflicts with other Gradle builds running on your machine.
Windows PowerShell:

```powershell
mise install
Set-ExecutionPolicy -Scope Process Bypass
./run-local-benchmarks.ps1
```

Alias wrapper names are also available: `run-local-benchmark.sh` and `run-local-benchmark.ps1`.
Run with defaults:

```bash
./gradlew clean benchmark -PjavalinVersion=4.6.4
```

Run with explicit tuning and JSON output:

```bash
./gradlew clean benchmark \
  -PjavalinVersion=4.6.4 \
  -Piterations=10 \
  -PiterationTime=2000 \
  -Pthreads=32 \
  -Pforks=2 \
  -PresultFormat=json
```

- `javalinVersion`: dependency version to benchmark.
- `iterations`: warmup and measurement iterations.
- `iterationTime`: warmup and measurement time in milliseconds.
- `threads`: JMH worker threads.
- `forks`: JMH forks.
- `resultFormat`: JMH machine-readable format (`csv`, `json`, `scsv`, `latex`, `text`).
- `benchmark.http.connectTimeoutMs`: HTTP client connect timeout for benchmark traffic (default `15000`).
- `benchmark.http.readTimeoutMs`: HTTP client read timeout for benchmark traffic (default `120000`).
- `benchmark.http.writeTimeoutMs`: HTTP client write timeout for benchmark traffic (default `120000`).
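These knobs give a rough lower bound on wall-clock time per benchmark. A sketch of the arithmetic, assuming warmup and measurement each run `iterations` rounds of `iterationTime` per fork (which is how the tuning parameters above are described; JVM startup and setup overhead is ignored):

```python
def estimated_seconds_per_benchmark(iterations: int, iteration_time_ms: int, forks: int) -> float:
    """Rough lower bound: each fork runs `iterations` warmup rounds
    plus `iterations` measurement rounds of `iteration_time_ms` each."""
    return forks * (2 * iterations) * iteration_time_ms / 1000.0

# Example tuning above: 10 iterations x 2000 ms x 2 forks
print(estimated_seconds_per_benchmark(10, 2000, 2))  # 80.0 seconds per benchmark
```

Multiply by the number of benchmarks and versions to estimate a full run's duration.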
Wrapper selection order:

- exact version wrapper in `src/main/external/<version>/` (if present),
- major-line wrapper in `src/main/external/<major>/` (if present),
- otherwise `src/main/external/default/`.

Use `clean benchmark` when switching versions to avoid stale compiled classes between runs.
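The selection order can be sketched as a small lookup. This is a hypothetical helper, not the actual build logic; only the directory layout under `src/main/external/` is taken from the repo:

```python
def select_wrapper(version: str, available: set[str]) -> str:
    """Pick the wrapper dir for a Javalin version:
    exact version -> major line -> default."""
    major = version.split(".")[0]
    if version in available:
        return f"src/main/external/{version}/"
    if major in available:
        return f"src/main/external/{major}/"
    return "src/main/external/default/"

print(select_wrapper("4.6.4", {"4", "5.6.3"}))  # src/main/external/4/
```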
Current suite includes:

- `hello`: hello/lifecycle/exception baseline flow.
- `payloadEmpty`: empty text payload.
- `payload100kb`, `payload1mb`: plain text payload sizes.
- `jsonSerializationSmall`, `jsonSerialization100kb`, `jsonSerialization1mb`: JSON serialization sizes.
- `staticFile100kb`, `staticFile1mb`: static-like raw byte responses.
- `routes10`, `routes100`, `routes1000`, `routes10000`: route table size scenarios.

Note: the route groups currently live in one benchmark app instance, so use them for relative trend tracking.
Compare a version against a baseline:

```bash
./gradlew compare -Pbaseline=1.0.0 -PjavalinVersion=3.0.0
```

Keep local history by storing each run under `runs/<run-id>/`:
```bash
RUN_ID="local-$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "local-history/runs/$RUN_ID/results"
python3 scripts/collect_runner_info.py "local-history/runs/$RUN_ID/runner-info.json"
python3 scripts/write_run_metadata.py \
  --output "local-history/runs/$RUN_ID/run-metadata.json" \
  --run-id "$RUN_ID" \
  --run-timestamp-utc "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --versions-json '["4.6.4","5.6.3"]' \
  --iterations 3 \
  --iteration-time-ms 500 \
  --forks 2 \
  --threads 4
```

Run benchmark versions and copy results:
```bash
./gradlew --no-daemon clean benchmark -PjavalinVersion=4.6.4 -Piterations=3 -PiterationTime=500 -Pforks=2 -Pthreads=4 -PresultFormat=json
cp results/4.6.4.json "local-history/runs/$RUN_ID/results/4.6.4.json"
./gradlew --no-daemon clean benchmark -PjavalinVersion=5.6.3 -Piterations=3 -PiterationTime=500 -Pforks=2 -Pthreads=4 -PresultFormat=json
cp results/5.6.3.json "local-history/runs/$RUN_ID/results/5.6.3.json"
```

Generate report with trend charts:
```bash
python3 scripts/generate_pages.py --history-root local-history/runs --output-dir local-history/site --repository javalin/javalin-performance-tests-testing
python3 -m http.server 8000 --directory local-history/site
```

Then open http://localhost:8000.
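The copied `results/<version>.json` files are standard JMH JSON output: a top-level array of result records, each with a `benchmark` name and a `primaryMetric` object. A minimal sketch for pulling scores out, assuming that layout:

```python
import json

def scores_from_jmh_json(text: str) -> dict[str, float]:
    """Map benchmark name -> primary score (ops/ms in throughput mode)."""
    return {r["benchmark"]: r["primaryMetric"]["score"] for r in json.loads(text)}

# Trimmed-down example of a JMH JSON result record
sample = '[{"benchmark": "hello", "mode": "thrpt", "primaryMetric": {"score": 12.5, "scoreUnit": "ops/ms"}}]'
print(scores_from_jmh_json(sample))  # {'hello': 12.5}
```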
This now generates:

- `index.html`: latest cumulative report.
- `runs/<run-id>.html`: weekly snapshot pages (history up to that run).
- `summary.json`: machine-readable summary for automation.
- JMH mode is throughput (`ops/ms`), so higher score is better.
- Compare versions on the same benchmark row (e.g. `payload1mb` for `4.6.4` vs `5.6.3`).
- `Delta vs Prev %` compares the latest run against the previous run of the same version+benchmark.
- `Winner` marks the highest-scoring version in each benchmark row.
- `Mean`/`Stdev`/`CV` show historical stability:
  - lower `CV%` means more stable measurements,
  - high `CV%` means a noisy benchmark or unstable environment.
- Trend charts show each benchmark over time (one line per version).
- Sidebar links let you open older weekly snapshot pages directly.
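The stability columns reduce to simple statistics over each version+benchmark score history. A sketch of the assumed arithmetic (percent delta against the previous run; CV% as stdev over mean):

```python
from statistics import mean, stdev

def delta_vs_prev_pct(scores: list[float]) -> float:
    """Percent change of the latest score vs the previous run."""
    return (scores[-1] - scores[-2]) / scores[-2] * 100.0

def cv_pct(scores: list[float]) -> float:
    """Coefficient of variation: lower means a more stable history."""
    return stdev(scores) / mean(scores) * 100.0

history = [100.0, 102.0, 99.0, 101.0]  # ops/ms over four runs
print(round(delta_vs_prev_pct(history), 2))  # 2.02
```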
Workflow: .github/workflows/benchmark-pages.yml
GitHub Pages: https://javalin.github.io/javalin-performance-tests-testing/
Triggers:
- Nightly schedule (daily at 03:17 UTC).
- Manual `workflow_dispatch` with optional inputs: `versions` (comma/space-separated list), `includePrereleaseLatestMajor` (include all latest-major alpha/beta/rc in auto version selection), `iterations`, `iterationTimeMs`, `forks`, `threads`.
Manual dispatch from local machine:

```bash
./run-github-weekly-benchmark.sh --watch
```

Windows PowerShell:

```powershell
./run-github-weekly-benchmark.ps1 -Watch
```

Equivalent direct gh command:
```bash
gh workflow run .github/workflows/benchmark-pages.yml \
  --repo javalin/javalin-performance-tests-testing \
  --ref main \
  -f iterations=10 \
  -f iterationTimeMs=1000 \
  -f forks=2 \
  -f threads=4
```

Default workflow values:

- `iterations=10`
- `iterationTimeMs=1000`
- `forks=2`
- `threads=4`
These are production-oriented defaults for statistical stability on weekly runs.
Design notes:
- Versions run sequentially in the same job/runner per workflow run (reduces cross-runner noise for comparisons).
- Runner metadata is captured each run (`runner-info.json`).
- Raw benchmark history is stored on branch `benchmark-data` under `runs/<run-id>/`.
- Static report page is generated from history and deployed to GitHub Pages.
- History is append-only: each run gets a unique run id and is added under a new `runs/<run-id>/` folder.
Default scheduled versions are auto-resolved from Maven Central on every run:
- include the latest patch from the latest 3 minors in each of the latest 2 major lines,
- include the latest 2 prereleases from the latest major (RC-preferred),
- include the latest snapshot from https://maven.reposilite.com/snapshots,
- do not include older major lines by default,
- apply a minimum stable cutoff of `>= 1.0.0`,
- include all latest-major prereleases only when `includePrereleaseLatestMajor=true`.
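The stable-release part of that policy can be sketched as a pure selection function. This is a simplified model of what `scripts/update_versions_from_maven.py` does, not its actual code; prerelease and snapshot handling are omitted:

```python
def pick_stable_versions(all_versions: list[str], majors: int = 2, minors_per_major: int = 3) -> list[str]:
    """Latest patch from the latest `minors_per_major` minors
    in each of the latest `majors` major lines (stable releases only)."""
    # Keep only plain x.y.z versions (drops alpha/beta/rc/snapshot tags)
    parsed = sorted(tuple(map(int, v.split("."))) for v in all_versions if v.replace(".", "").isdigit())
    latest_patch = {}  # (major, minor) -> (major, minor, patch); sorted order makes the highest patch win
    for t in parsed:
        latest_patch[t[:2]] = t
    major_lines = sorted({maj for maj, _ in latest_patch})[-majors:]
    picked = []
    for m in major_lines:
        minors = sorted(mm for (maj, mm) in latest_patch if maj == m)[-minors_per_major:]
        picked += [latest_patch[(m, mm)] for mm in minors]
    return [".".join(map(str, t)) for t in picked]

print(pick_stable_versions(["4.5.0", "4.6.0", "4.6.4", "5.5.0", "5.6.3", "3.9.1"]))
# → ['4.5.0', '4.6.4', '5.5.0', '5.6.3']
```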
The fallback static list is `config/versions.txt`. You can refresh the fallback files with:

```bash
python3 scripts/update_versions_from_maven.py --output config/versions.txt --minimum 1.0.0 --include-all-latest-majors 2 --latest-minors-per-major 3 --no-include-latest-per-major --latest-prerelease-count 2 --include-latest-snapshot
python3 scripts/update_versions_from_maven.py --output config/versions-prerelease.txt --minimum 1.0.0 --include-all-latest-majors 2 --latest-minors-per-major 3 --no-include-latest-per-major --include-prerelease-latest-major --include-latest-snapshot
```

Workflow: .github/workflows/benchmark-pr.yml
Triggers:
- On every pull request.
- Manual `workflow_dispatch` with optional versions and tuning overrides.
Defaults:
- versions from `config/pr-versions.txt`,
- `iterations=10`, `iterationTimeMs=1000`, `forks=2`, `threads=4`.
Output:
- uploads raw benchmark JSON + generated trend report as workflow artifact,
- adds a markdown benchmark summary table to the job summary.
The generated website also includes a plain-language “How To Read This” section.