Replace O(n²) operation ID lookup with O(1) HashSet in OpenAPI genera… by thll · Pull Request #25249 · Graylog2/graylog2-server

thll · 2026-03-06T15:31:57Z

Summary

Replace the linear scan that checks operation ID uniqueness during OpenAPI spec generation (O(n) per operation, O(n²) overall) with an O(1) HashSet lookup per operation, saving ~550ms on the enterprise spec.

Problem

Swagger's Reader.getOperationId() checks for duplicate operation IDs by calling existOperationId(), which iterates over every path in the spec and extracts all operation IDs from all HTTP method slots (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS) via extractOperationIdFromPathItem(). The resulting set of IDs is rebuilt from scratch for every single operation.

For the enterprise spec with 1165 operations and 926 paths, this means:

~1165 × 926 × 7 ≈ 7.5 million isNotBlank checks
The cost scales quadratically — adding more endpoints makes it disproportionately worse

JFR profiling confirmed this accounted for 5.2% of total generation time on the server spec (33 profiler samples), and proportionally more on the larger enterprise spec.
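
To make the estimate above concrete, here is a minimal sketch of the scan pattern as described. It is a simplified model, not Swagger's code: LinearScanModel, its fields, and the seven-slot array are hypothetical stand-ins for the path items in the spec. If paths are added incrementally during a read cycle, the real check count is lower than the 1165 × 926 × 7 figure, which uses the final path count for every lookup and so reads as an upper bound.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the per-operation duplicate scan (not Swagger's implementation). */
final class LinearScanModel {
    /** GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS. */
    static final int SLOTS = 7;

    private final List<String[]> paths = new ArrayList<>();
    long slotChecks = 0;

    /** Visits every slot of every path on each call, as the description states. */
    boolean exists(String operationId) {
        boolean found = false;
        for (String[] slots : paths) {
            for (String id : slots) {
                slotChecks++;
                if (id != null && !id.isBlank() && id.equals(operationId)) {
                    found = true; // no early exit: all IDs are extracted before the lookup
                }
            }
        }
        return found;
    }

    /** Registers a new path whose first slot holds the given operation ID. */
    void addPath(String operationId) {
        String[] slots = new String[SLOTS];
        slots[0] = operationId;
        paths.add(slots);
    }

    /** Upper bound on slot checks when every lookup scans the final path count. */
    static long upperBound(int operations, int pathCount) {
        return (long) operations * pathCount * SLOTS;
    }
}
```

With the enterprise numbers, upperBound(1165, 926) evaluates to 7,551,530, which matches the ~7.5 million figure.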

Fix

Override getOperationId() in CustomReader to maintain a HashSet<String> of already-used operation IDs. Each lookup becomes O(1) instead of scanning all paths. The set is cleared at the start of each read() cycle.

This is safe because the parent Reader only ever adds paths and operations during a read cycle and never removes them. existOperationId() reads live state from the openAPI object, and since that state is purely additive, the cached set stays consistent with it.
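
The idea can be sketched in isolation as follows. This is a standalone illustration under stated assumptions, not the code in this PR: OperationIdAllocator is a hypothetical name, and the numeric "_1", "_2" suffix used on collisions is an assumption about how duplicates are resolved. The real change lives in an override of getOperationId() in CustomReader.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical standalone sketch of the cached-ID approach. */
final class OperationIdAllocator {
    private final Set<String> usedIds = new HashSet<>();

    /** Call at the start of each read cycle so IDs from a previous run do not leak. */
    void reset() {
        usedIds.clear();
    }

    /** Returns the candidate if unused, otherwise the first free suffixed variant (assumed scheme). */
    String allocate(String candidate) {
        String id = candidate;
        int counter = 0;
        // Set.add returns false when the value is already present,
        // so each attempt costs one hash probe instead of a scan over all paths.
        while (!usedIds.add(id)) {
            id = candidate + "_" + (++counter);
        }
        return id;
    }
}
```

Calling allocate("getUser") three times in a row yields getUser, getUser_1, getUser_2; after reset() the plain name is available again. The sketch is only equivalent to the scan if every ID that ends up in the spec passes through allocate(), which mirrors the additive-only argument above.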

Benchmarks

Measured inside a Docker container (linuxkit) on Apple Silicon (aarch64), 8 cores, 12GB RAM, JDK Temurin 21.0.9+10. Absolute times will vary on other hardware. Validation was skipped for all runs.

Server spec (621 operations, 493 paths, 750KB YAML)

Run	Before	After
1	2830ms	2715ms
2	2834ms	2825ms
3	3750ms	2867ms
4	2895ms	2798ms
5	2879ms	2986ms
Median	2879ms	2825ms

Small improvement (~50ms by median), as expected: the estimated O(n²) cost at 621 operations was only ~155ms, which is of the same order as the run-to-run variance in the table above.

Enterprise spec (1165 operations, 926 paths, 1.4MB YAML)

Run	Before	After
1	5495ms	4609ms
2	4614ms	4489ms
3	5045ms	4486ms
Median	5045ms	4489ms

~550ms improvement on enterprise, consistent with the quadratic scaling prediction.
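
For anyone re-checking the numbers, the medians and deltas can be re-derived from the run tables with a few lines. BenchmarkStats is a hypothetical helper written for this note and is not part of the change.

```java
import java.util.Arrays;

/** Hypothetical helper for re-deriving the medians reported in the tables. */
final class BenchmarkStats {
    /** Median of the samples; for an even count, the mean of the two middle values. */
    static long median(long... samplesMs) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }
}
```

With the enterprise runs, median(5495, 4614, 5045) is 5045 and median(4609, 4489, 4486) is 4489, a difference of 556ms. The server runs give 2879 and 2825, a difference of 54ms.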

JFR confirmation

Metric	Before	After
`existOperationId` / `extractOperationIdFromPathItem` profiler samples	33	0
`getOperationId` profiler samples	n/a	0 (too fast to register)

Scaling

The improvement grows quadratically with operation count:

Operations	Estimated old cost	Estimated new cost
621 (server)	~155ms	~0ms
1165 (enterprise)	~550ms	~0ms
2000 (future)	~1.6s	~0ms
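
The estimates in this table are consistent with scaling the server measurement by the square of the operation-count ratio. A small sketch of that extrapolation, assuming a purely quadratic cost model (QuadraticEstimate is a hypothetical name, and real cost also depends on path count and JIT behaviour):

```java
/** Hypothetical helper: extrapolates cost under a pure quadratic model. */
final class QuadraticEstimate {
    /** cost(n) = refCost * (n / refN)^2 */
    static double estimateMs(int refOperations, double refCostMs, int operations) {
        double ratio = (double) operations / refOperations;
        return refCostMs * ratio * ratio;
    }
}
```

Starting from ~155ms at 621 operations, estimateMs(621, 155, 1165) gives roughly 545ms and estimateMs(621, 155, 2000) gives roughly 1608ms, in line with the ~550ms and ~1.6s rows.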

Correctness

Server and enterprise specs are byte-identical before and after the change
All existing OpenAPIContextFactoryTest tests pass (5/5)
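
One way to reproduce the byte-identical check is Files.mismatch (available since Java 12), which returns -1 when two files have the same content. This is a sketch, not necessarily how the comparison was done for this PR; SpecComparison and the file names in the usage note are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for comparing two generated spec files byte by byte. */
final class SpecComparison {
    /** True when both files contain exactly the same bytes. */
    static boolean byteIdentical(Path before, Path after) throws IOException {
        // Files.mismatch returns the index of the first differing byte, or -1 if there is none.
        return Files.mismatch(before, after) == -1L;
    }
}
```

For example, SpecComparison.byteIdentical(Path.of("before.yaml"), Path.of("after.yaml")) would be expected to return true for specs generated before and after the change.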

Remaining generation time

The remaining ~4.5s (enterprise) breaks down as follows. These are inherent to the scale of the spec and not optimizable without architectural changes (e.g. build-time generation or incremental caching):

Category	% of generation	What it does
Class loading	~30%	JVM loading ~3200 classes from 562 JARs on first access
Jackson introspection	~19%	Reflecting on ~700 schema types to discover POJO properties
Swagger Reader	~16%	Per-method annotation processing for 1165 operations
YAML serialization	~6%	Emitting 1.4MB output via snakeyaml
Other (JDK internals, reflection, proxy generation)	~29%	Unavoidable JVM overhead

/nocl

…tion

The Swagger Reader's default getOperationId scans all existing paths and operations for every new operation ID, resulting in O(n²) behavior. With 1165 operations in the enterprise spec, this cost ~550ms.

Override getOperationId in CustomReader to maintain a HashSet of used IDs, reducing each lookup to O(1). The set is cleared at the start of each read cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

thll wants to merge 1 commit into master from optimize-openapi-operation-id-lookup
