Replace O(n²) operation ID lookup with O(1) HashSet in OpenAPI genera…#25249

Draft: thll wants to merge 1 commit into master from optimize-openapi-operation-id-lookup

@thll thll commented Mar 6, 2026

Summary

Replace the operation ID uniqueness check in OpenAPI spec generation, which rescans every path for each new operation (O(n²) overall), with an O(1) HashSet lookup per operation. This saves ~550ms on the enterprise spec.

Problem

Swagger's Reader.getOperationId() checks for duplicate operation IDs by calling existOperationId(), which iterates over every path in the spec and extracts all operation IDs from all HTTP method slots (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS) via extractOperationIdFromPathItem(). This set is rebuilt from scratch for every single operation.

For the enterprise spec with 1165 operations and 926 paths, this means:

  • ~1165 × 926 × 7 ≈ 7.5 million isNotBlank checks
  • The cost scales quadratically — adding more endpoints makes it disproportionately worse
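
The arithmetic above can be reproduced with a small standalone sketch. It does not use Swagger's classes: `QuadraticLookupSketch`, `worstCaseChecks`, and `simulateRescans` are hypothetical names, and the simulation assumes one operation per path for simplicity. It only illustrates why rebuilding the ID set for every operation grows quadratically.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Swagger Reader implementation.
final class QuadraticLookupSketch {
    // GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS
    static final int METHOD_SLOTS = 7;

    /** Upper bound from the estimate: every operation scans every path and slot. */
    static long worstCaseChecks(long operations, long paths) {
        return operations * paths * METHOD_SLOTS;
    }

    /**
     * Counts slot checks when each new operation rescans all paths registered
     * so far. Assumes one operation per path, stored in the first slot.
     */
    static long simulateRescans(int operations) {
        List<String[]> paths = new ArrayList<>();
        long checks = 0;
        for (int i = 0; i < operations; i++) {
            // The set of known IDs is rebuilt from scratch for every operation.
            Set<String> seen = new HashSet<>();
            for (String[] slots : paths) {
                for (String id : slots) {
                    checks++;
                    if (id != null && !id.isBlank()) {
                        seen.add(id);
                    }
                }
            }
            String newId = "op" + i;
            if (!seen.add(newId)) {
                throw new IllegalStateException("duplicate id: " + newId);
            }
            String[] slots = new String[METHOD_SLOTS];
            slots[0] = newId;
            paths.add(slots);
        }
        return checks;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseChecks(1165, 926));
        System.out.println(simulateRescans(1000));
        System.out.println(simulateRescans(2000));
    }
}
```

In this simplified model, doubling the operation count roughly quadruples the number of checks, which matches the scaling behaviour described above.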

JFR profiling confirmed this accounted for 5.2% of total generation time on the server spec (33 profiler samples), and proportionally more on the larger enterprise spec.

Fix

Override getOperationId() in CustomReader to maintain a HashSet<String> of already-used operation IDs. Each lookup becomes O(1) instead of scanning all paths. The set is cleared at the start of each read() cycle.

This is safe because the parent Reader only ever adds paths/operations during a read cycle — it never removes them. The existOperationId() method reads live state from the openAPI object, which is purely additive, so the cached set stays consistent.
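
A minimal sketch of the caching idea, kept independent of Swagger so it runs on its own. `OperationIdRegistry`, `reset`, and `uniqueId` are hypothetical names, and the `_1`, `_2` suffix scheme for resolving collisions is an assumption made for illustration; the actual override lives in `CustomReader.getOperationId()` and may differ in detail.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the state kept by the override; not the PR's code.
final class OperationIdRegistry {
    private final Set<String> usedIds = new HashSet<>();

    /** Clears the cache; intended to be called at the start of each read cycle. */
    void reset() {
        usedIds.clear();
    }

    /**
     * Returns the candidate if it is unused, otherwise candidate_1, candidate_2, ...
     * (assumed suffix scheme). Each probe is a single hash lookup.
     */
    String uniqueId(String candidate) {
        String id = candidate;
        int counter = 0;
        // Set.add returns false when the ID is already present.
        while (!usedIds.add(id)) {
            id = candidate + "_" + (++counter);
        }
        return id;
    }

    public static void main(String[] args) {
        OperationIdRegistry registry = new OperationIdRegistry();
        System.out.println(registry.uniqueId("getUser"));
        System.out.println(registry.uniqueId("getUser"));
        registry.reset();
        System.out.println(registry.uniqueId("getUser"));
    }
}
```

Because IDs are only ever added during a read cycle, the set never needs invalidation beyond the `reset()` at the start of the next cycle.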

Benchmarks

Measured inside a Docker container (linuxkit) on Apple Silicon (aarch64), 8 cores, 12GB RAM, JDK Temurin 21.0.9+10. Absolute times will vary on other hardware. Validation was skipped for all runs.

Server spec (621 operations, 493 paths, 750KB YAML)

| Run | Before | After |
|---|---|---|
| 1 | 2830ms | 2715ms |
| 2 | 2834ms | 2825ms |
| 3 | 3750ms | 2867ms |
| 4 | 2895ms | 2798ms |
| 5 | 2879ms | 2986ms |
| Median | 2879ms | 2825ms |

Small improvement (~50ms), as expected — the O(n²) cost at 621 operations was only ~155ms.

Enterprise spec (1165 operations, 926 paths, 1.4MB YAML)

| Run | Before | After |
|---|---|---|
| 1 | 5495ms | 4609ms |
| 2 | 4614ms | 4489ms |
| 3 | 5045ms | 4486ms |
| Median | 5045ms | 4489ms |

~550ms improvement on enterprise, consistent with the quadratic scaling prediction.

JFR confirmation

| Metric | Before | After |
|---|---|---|
| existOperationId / extractOperationIdFromPathItem profiler samples | 33 | 0 |
| getOperationId profiler samples | n/a | 0 (too fast to register) |

Scaling

The improvement grows quadratically with operation count:

| Operations | Estimated old cost | Estimated new cost |
|---|---|---|
| 621 (server) | ~155ms | ~0ms |
| 1165 (enterprise) | ~550ms | ~0ms |
| 2000 (future) | ~1.6s | ~0ms |
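
The estimates in this table follow from scaling the measured ~155ms at 621 operations by the square of the operation-count ratio. A small sketch of that extrapolation, assuming a purely quadratic cost model (class and method names are hypothetical):

```java
// Hypothetical helper: extrapolates the old lookup cost under a pure n^2 model.
final class QuadraticScalingEstimate {
    // Baseline taken from the server spec figures in this description.
    static final int BASE_OPERATIONS = 621;
    static final double BASE_COST_MS = 155.0;

    /** cost(n) = baseCost * (n / baseOperations)^2 */
    static double estimatedOldCostMs(int operations) {
        double ratio = (double) operations / BASE_OPERATIONS;
        return BASE_COST_MS * ratio * ratio;
    }

    public static void main(String[] args) {
        for (int n : new int[] {621, 1165, 2000}) {
            System.out.printf("%d operations: ~%.0f ms%n", n, estimatedOldCostMs(n));
        }
    }
}
```

The model ignores constant factors and cache effects, so the figures are order-of-magnitude estimates rather than predictions.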

Correctness

  • Server and enterprise specs are byte-identical before and after the change
  • All existing OpenAPIContextFactoryTest tests pass (5/5)
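
One way to check byte-identity of two generated spec files is `java.nio.file.Files.mismatch` (available since Java 12), which returns `-1` when the contents are equal. This is only a sketch of such a check, not necessarily how the comparison was performed for this PR; `SpecDiffCheck` and the file paths passed to it are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for comparing generated specs before and after a change.
final class SpecDiffCheck {
    /** Returns true when both files contain exactly the same bytes. */
    static boolean byteIdentical(Path before, Path after) throws IOException {
        // Files.mismatch returns the first differing byte position, or -1 if none.
        return Files.mismatch(before, after) == -1L;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SpecDiffCheck <before.yaml> <after.yaml>");
            return;
        }
        boolean same = byteIdentical(Path.of(args[0]), Path.of(args[1]));
        System.out.println(same ? "identical" : "different");
    }
}
```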

Remaining generation time

The remaining ~4.5s (enterprise) breaks down as follows. These are inherent to the scale of the spec and not optimizable without architectural changes (e.g. build-time generation or incremental caching):

| Category | % of generation | What it does |
|---|---|---|
| Class loading | ~30% | JVM loading ~3200 classes from 562 JARs on first access |
| Jackson introspection | ~19% | Reflecting on ~700 schema types to discover POJO properties |
| Swagger Reader | ~16% | Per-method annotation processing for 1165 operations |
| YAML serialization | ~6% | Emitting 1.4MB output via snakeyaml |
| Other (JDK internals, reflection, proxy generation) | ~29% | Unavoidable JVM overhead |

/nocl

…tion

The Swagger Reader's default getOperationId scans all existing paths and
operations for every new operation ID, resulting in O(n²) behavior.
With 1165 operations in the enterprise spec, this cost ~550ms.

Override getOperationId in CustomReader to maintain a HashSet of used IDs,
reducing each lookup to O(1). The set is cleared at the start of each
read cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>