Replace O(n²) operation ID lookup with O(1) HashSet in OpenAPI genera…#25249
Draft
…tion The Swagger Reader's default getOperationId scans all existing paths and operations for every new operation ID, resulting in O(n²) behavior. With 1165 operations in the enterprise spec, this cost ~550ms. Override getOperationId in CustomReader to maintain a HashSet of used IDs, reducing each lookup to O(1). The set is cleared at the start of each read cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Replace the O(n²) operation ID uniqueness check in OpenAPI spec generation with an O(1) HashSet lookup, saving ~550ms on the enterprise spec.
Problem
Swagger's Reader.getOperationId() checks for duplicate operation IDs by calling existOperationId(), which iterates over every path in the spec and extracts all operation IDs from all HTTP method slots (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS) via extractOperationIdFromPathItem(). This set is rebuilt from scratch for every single operation.
For the enterprise spec with 1165 operations and 926 paths, this means a full pass of isNotBlank checks over every method slot of every existing path, repeated for each operation.
JFR profiling confirmed this accounted for 5.2% of total generation time on the server spec (33 profiler samples), and proportionally more on the larger enterprise spec.
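To make the cost difference concrete, here is a minimal sketch of the two lookup strategies. It is a simplified model, not Swagger's actual code: paths are represented as a plain map from path to a list of operation IDs, the class and method names are hypothetical, and the numeric-suffix scheme for resolving collisions is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model: each path maps to the operation IDs in its method slots. */
class OperationIdSketch {

    /** Scan-based check: walks every slot of every registered path for one candidate ID. */
    static boolean existsByScan(Map<String, List<String>> paths, String id) {
        for (List<String> slots : paths.values()) {
            for (String existing : slots) {
                if (existing != null && !existing.isBlank() && existing.equals(id)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Picks a free ID by rescanning all paths per candidate (quadratic over a full run). */
    static String uniqueIdByScan(Map<String, List<String>> paths, String base) {
        String candidate = base;
        int counter = 0;
        while (existsByScan(paths, candidate)) {
            candidate = base + "_" + (++counter); // assumed suffix scheme
        }
        return candidate;
    }

    /** Picks a free ID using a set of used IDs; Set.add returns false on a collision. */
    static String uniqueIdBySet(Set<String> used, String base) {
        String candidate = base;
        int counter = 0;
        while (!used.add(candidate)) {
            candidate = base + "_" + (++counter); // same assumed suffix scheme
        }
        return candidate;
    }

    /** Assigns IDs for all base names with either strategy; each operation gets its own path. */
    static List<String> assignAll(List<String> bases, boolean useSet) {
        Map<String, List<String>> paths = new LinkedHashMap<>();
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < bases.size(); i++) {
            String id = useSet
                    ? uniqueIdBySet(used, bases.get(i))
                    : uniqueIdByScan(paths, bases.get(i));
            paths.put("/path" + i, List.of(id)); // hypothetical path name
            result.add(id);
        }
        return result;
    }
}
```

Both strategies yield the same IDs as long as operations are only ever added. The set-based variant simply avoids walking all paths for each candidate.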
Fix
Override getOperationId() in CustomReader to maintain a HashSet<String> of already-used operation IDs. Each lookup becomes O(1) instead of scanning all paths. The set is cleared at the start of each read() cycle.
This is safe because the parent Reader only ever adds paths/operations during a read cycle; it never removes them. The existOperationId() method reads live state from the openAPI object, which is purely additive, so the cached set stays consistent.
Benchmarks
Measured inside a Docker container (linuxkit) on Apple Silicon (aarch64), 8 cores, 12GB RAM, JDK Temurin 21.0.9+10. Absolute times will vary on other hardware. Validation was skipped for all runs.
Server spec (621 operations, 493 paths, 750KB YAML)
Small improvement (~50ms), as expected — the O(n²) cost at 621 operations was only ~155ms.
Enterprise spec (1165 operations, 926 paths, 1.4MB YAML)
~550ms improvement on enterprise, consistent with the quadratic scaling prediction.
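The quadratic prediction can be checked with a rough extrapolation from the server spec figure. This assumes the lookup cost scales purely with the square of the operation count and ignores constant overheads, so it is an estimate rather than a measurement:

```latex
\[
t_{\text{enterprise}}
\approx t_{\text{server}} \left(\frac{n_{\text{enterprise}}}{n_{\text{server}}}\right)^{2}
= 155\,\text{ms} \times \left(\frac{1165}{621}\right)^{2}
\approx 155\,\text{ms} \times 3.52
\approx 545\,\text{ms}
\]
```

The estimate of roughly 545ms is in line with the ~550ms observed on the enterprise spec.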
JFR confirmation
existOperationId / extractOperationIdFromPathItem profiler samples
getOperationId profiler samples
Scaling
The improvement grows quadratically with operation count:
Correctness
OpenAPIContextFactoryTest tests pass (5/5)
Remaining generation time
The remaining ~4.5s (enterprise) breaks down as follows. These are inherent to the scale of the spec and not optimizable without architectural changes (e.g. build-time generation or incremental caching):
/nocl