
Conversation


@mkleczek mkleczek commented Oct 10, 2025

Fixes #4360
Fixes #3704

  • Update changelog

@mkleczek (Contributor Author)

Wow, it looks like it is too fast now :) - the test_big_schema assertion that schema reloading takes more than 10s now fails quite dramatically.

@wolfgangwalther (Member)

Haha, that's nice. We should do some kind of consistency check as well, that the schema cache still returns all the same things. But... looks great so far. It's exactly the kind of problem I assumed to exist with these relationships...

"plan"
]
assert plan_dur > 10000.0
assert plan_dur < 10000.0
Member

hrhr, nice change :D

Member

While the change is nice, it does break the test expectation. The plan duration itself is not the primary test outcome - we are testing whether other requests wait on the schema cache reload here. So a high plan duration is a requirement to make the test effective.

We will either need to throw the test away, rewrite it, or... I don't know.

Contributor Author

Got it - I've taken a look at the test and understand its purpose now.

Testing this by measuring response time is brittle, to put it politely.

Without deeper changes to the way things work, I don't see a way to keep this test.

One way would be to keep the timestamp of the last schema load and provide it in a response header. But that wouldn't really let us test that concurrent requests wait for the schema cache to load.
I tend to think this kind of property should be tested with some kind of chaos testing or fuzzing that only provides statistical guarantees about the system's behavior.

I'm not sure what to do in this PR, though. The sensible thing would be to simply comment out this test for now.

WDYT?

Member

I'd certainly not comment it out. Either we can fix it or we should just remove it - we can always add it back. But commented-out code is just waste.

I'll defer to @steve-chavez for this, though.

Member

Testing this by measuring response time is brittle, to put it politely.

Yeah, it was the easiest way to prove that at the time. It's more desirable to have requests wait than to flood the server logs with errors or reply quickly with a failure (users just see PostgREST as unreliable in those cases).

One way would be to keep the timestamp of the last schema load and provide it in a response header. But that wouldn't really let us test that concurrent requests wait for the schema cache to load.
I tend to think this kind of property should be tested with some kind of chaos testing or fuzzing that only provides statistical guarantees about the system's behavior.

Another option would be to inject a sleep like we do here:

_ <-
let sleepCall = SQL.Statement "select pg_sleep($1 / 1000.0)" (param HE.int4) HD.noResult prepared in
whenJust configInternalSCSleep (`SQL.statement` sleepCall) -- only used for testing

But only for the relationships-loading part - currently the above sleep applies to the whole schema cache load.

Then this test could be removed from test/io/test_big_schema.py too.

Contributor Author

@steve-chavez - yeah, that makes the test more robust (it no longer depends on sluggish schema loading performance).

Done.

Introduced internal-schema-cache-relationship-load-sleep and implemented delayed loading of relationships (see ec31bdd)

I left the tests in test_big_schema.py - moving them to the IO tests is probably a good idea, but I wouldn't do that as part of this PR; it is getting out of hand anyway.

WDYT?

@mkleczek (Contributor Author) Oct 16, 2025

This was trickier than I thought. In the end I've introduced three internal configuration properties that add delays in various phases of schema cache loading.
See f8ff7c8
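
As a rough illustration of the approach (a minimal sketch with made-up names, using a plain threadDelay instead of the pg_sleep statement pattern quoted earlier; the real property names and wiring are in f8ff7c8):

import Control.Concurrent (threadDelay)
import Data.Foldable (for_)

-- hypothetical grouping of the three internal, testing-only delay settings
data SchemaCacheDelays = SchemaCacheDelays
  { delayBeforeRelsLoad :: Maybe Int -- milliseconds, before building relationships
  , delayBeforeViewKeys :: Maybe Int -- milliseconds, before resolving view key dependencies
  , delayBeforeSwap     :: Maybe Int -- milliseconds, before publishing the new cache
  }

-- only used for testing: pause if the corresponding internal setting is present
sleepIfConfigured :: Maybe Int -> IO ()
sleepIfConfigured ms = for_ ms (threadDelay . (* 1000))

main :: IO ()
main = sleepIfConfigured (delayBeforeRelsLoad (SchemaCacheDelays (Just 50) Nothing Nothing))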

Member

@mkleczek I think those are great! 🔥

How about separating them into another PR? Looks like they can be merged independently.

I would also suggest prefixing these commits with test: instead of refactor:


@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch from 6f03a38 to 9c9e641 Compare October 10, 2025 20:03
@mkleczek (Contributor Author)

Haha, that's nice. We should do some kind of consistency check as well, that the schema cache still returns all the same things. But... looks great so far. It's exactly the kind of problem I assumed to exist with these relationships...

I am wondering what test we could add... This is a kind of refactoring that does not change any behavior (except performance) and should be covered by existing tests. WDYT?


wolfgangwalther commented Oct 10, 2025

I wouldn't add an automated test for it, but it should be simple to load the big schema fixtures from that IO test and then run --dump-schema on it before and after this change, then look at the diff. It might need to be piped through jq to sort keys or so, and it might still end up with a diff based on some ordering, not sure. But that should at least tell us whether... just all relationships are missing or so (which is extremely unlikely).

@steve-chavez (Member)

I think @MHC2000 will be happy about this

I can confirm the schema privately shared on #3704 (comment)

Goes from ~7 minutes:

06/Oct/2025:16:06:50 -0500: Schema cache queried in 349.6 milliseconds
06/Oct/2025:16:06:50 -0500: Schema cache loaded 919 Relations, 3312 Relationships, 249 Functions, 0 Domain Representations, 4 Media Type Handlers, 1196 Timezones
06/Oct/2025:16:13:43 -0500: Schema cache loaded in 413123.5 milliseconds

To ~14 seconds:

10/Oct/2025:16:02:15 -0500: Schema cache queried in 416.6 milliseconds
10/Oct/2025:16:02:15 -0500: Schema cache loaded 919 Relations, 3312 Relationships, 249 Functions, 0 Domain Representations, 4 Media Type Handlers, 1196 Timezones
10/Oct/2025:16:02:29 -0500: Schema cache loaded in 14115.5 milliseconds

@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch 2 times, most recently from 9c9e641 to deeae82 Compare October 11, 2025 05:55
@mkleczek mkleczek changed the title from "perf: Index relations in addM2MRels to change complexity from O(n*n) to O(n)" to "perf: Index various lists in SchemaCache to change complexity from O(n*n) to O(n)" Oct 11, 2025
@mkleczek (Contributor Author)

I've updated the patch to cover more cases:

  • addViewM2OAndO2ORels
  • addViewPrimaryKeys

@mkleczek (Contributor Author)

I think @MHC2000 will be happy about this

I can confirm the schema privately shared on #3704 (comment)

Goes from ~7 minutes:
[...]
To ~14 seconds:

@steve-chavez can you check again with the latest version of the patch? It should improve the times even more.

@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch from be211a9 to e9fcffe Compare October 11, 2025 11:43
@steve-chavez (Member)

@mkleczek It's now down to ~2 seconds 🚀

11/Oct/2025:07:02:14 -0500: Schema cache queried in 448.0 milliseconds
11/Oct/2025:07:02:14 -0500: Schema cache loaded 919 Relations, 3312 Relationships, 249 Functions, 0 Domain Representations, 4 Media Type Handlers, 1196 Timezones
11/Oct/2025:07:02:16 -0500: Schema cache loaded in 2447.1 milliseconds


mkleczek commented Oct 11, 2025

@mkleczek It's now down to ~2 seconds 🚀

Amazing, thanks for checking.

I guess it is in a mergeable state - patch code coverage is somewhat low, but I think the changes touched code that was not covered by tests originally. That should probably be fixed, but I am not sure this PR is the right place for it.
(Interestingly, overall code coverage is better than before.)

@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch from e9fcffe to f22151f Compare October 12, 2025 06:09
Comment on lines 557 to 560
-filter (\(ViewKeyDependency _ viewQi _ dep _) -> dep == PKDep && viewQi == QualifiedIdentifier sch vw) keyDeps
+fold $ HM.lookup (PKDep, QualifiedIdentifier sch vw) indexedDeps
Member

The only diff in the schema-cache I get is here. The big schema has many of these:

@@ -858435,8 +858435,8 @@
         "tableIsView": true,
         "tableName": "v_pop_ohnekoord",
         "tablePKCols": [
-          "ap_id",
-          "id"
+          "id",
+          "ap_id"
         ],
         "tableSchema": "apflora",
         "tableUpdatable": false

Aka, the order of tablePKCols is changed.

I'm not exactly sure whether we rely on this order anywhere.

Contributor Author

I looked at the source of HM.fromListWith op and indeed: for duplicate keys it calls newValue `op` existingValue, so it reverses the list order.

Not sure if we depend on this order anyway, but I will add the list reversal back just in case.
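
To make that concrete, a standalone toy example (not PostgREST code) showing why the extra reverse restores the original order:

import qualified Data.HashMap.Strict as HM

main :: IO ()
main = do
  -- two key dependencies for the same (dep-type, view) key, in source order
  let deps    = [(("pk", "v_pop"), ["ap_id"]), (("pk", "v_pop"), ["id"])]
      grouped = HM.fromListWith (++) deps
  print (HM.lookup ("pk", "v_pop") grouped)                -- Just ["id","ap_id"]: order reversed
  print (HM.lookup ("pk", "v_pop") (fmap reverse grouped)) -- Just ["ap_id","id"]: original order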

Contributor Author

@wolfgangwalther I added fmap reverse on line 565 - that should fix this issue.

Can you re-check? (Or you could show me a quick way to generate this diff myself.)

Member

The command I used was:

PGRST_DB_SCHEMAS=apflora postgrest-with-postgresql-17 -f test/io/big_schema.sql postgrest-run --dump-schema | jq

Pipe this to a file on main, then do the same on your branch, then diff the two.

@mkleczek (Contributor Author) Oct 13, 2025

Thanks, @wolfgangwalther - I ran it after 39b404e and it fixes the issue: no more differences between main and this branch.

Comment on lines -480 to +483
-viewRels Relationship{relTable,relForeignTable,relCardinality=card} =
-  if isM2O card || isO2O card then
+viewRels Relationship{relTable,relForeignTable,relCardinality=card} | isM2O card || isO2O card =
Member

This seems like an unrelated refactor.

@mkleczek (Contributor Author) Oct 13, 2025

Indeed.
The previous version required returning an empty list in multiple places (one for the else branch of the if and another for the fall-through pattern match case). Fixing it was too tempting to resist :)

Do you think it is worth doing it in a separate PR?

Member

No need for a separate PR, but a separate commit would be good - that will make things much easier to understand whenever we look at this again in months or so.


else Nothing
| Relationship jt1 t _ (M2O cons1 cols) _ tblIsView <- rels
, Relationship jt2 ft _ (M2O cons2 fcols) _ fTblisView <- rels
, jt1 == jt2
Member

I don't understand the removal of jt1 == jt2 yet. Is this another unrelated refactor, or related to the change here?

@mkleczek (Contributor Author) Oct 13, 2025

It is not needed because we look up in the hash map using jt1 as the key (i.e. we changed filtering by equality to a hash-map lookup) - that's the crux of this PR.
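
To spell the idea out with a self-contained toy example (illustrative names, not the actual addM2MRels code): instead of re-filtering the whole relationship list for every element, index it once by junction table and look matches up by key.

import qualified Data.HashMap.Strict as HM

data Rel = Rel { junction :: String, target :: String } deriving Show

-- O(n*n): for every rel, scan the whole list again, filtering by junction-table equality
pairsByFilter :: [Rel] -> [(Rel, Rel)]
pairsByFilter rels =
  [ (r1, r2) | r1 <- rels, r2 <- filter (\r -> junction r == junction r1) rels ]

-- O(n): group the rels by junction table once, then each rel does an O(1) lookup
pairsByLookup :: [Rel] -> [(Rel, Rel)]
pairsByLookup rels =
  [ (r1, r2) | r1 <- rels, r2 <- HM.lookupDefault [] (junction r1) indexed ]
  where indexed = HM.fromListWith (++) [ (junction r, [r]) | r <- rels ]

main :: IO ()
main = do
  let rels = [Rel "jt1" "a", Rel "jt1" "b", Rel "jt2" "c"]
  -- both produce the same pairs (up to ordering), but the second avoids the quadratic scan
  print (length (pairsByFilter rels), length (pairsByLookup rels))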

@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch 2 times, most recently from cbcd7b8 to bb8423b Compare October 15, 2025 13:57
-- so we don't need to know about the other references.
-- * We need to choose a single reference for each column, otherwise we'd output too many columns in location headers etc.
takeFirstPK = mapMaybe (head . snd)
indexedDeps = fmap reverse $ HM.fromListWith (++) $ fmap ((keyDepType &&& keyDepView) &&& pure) keyDeps
Member

The reverse is quite mysterious, maybe add a comment that it was done to preserve backwards compat?

@mkleczek (Contributor Author) Oct 16, 2025

Good idea. Done.

Member

Before we decide to keep it, can we first figure out whether we actually need it? Do we rely on the order of these or not?

Member

can we first figure out whether we actually need it?

I don't think we can figure that out. Overall, users have been sensitive to OpenAPI changes, so IMO if we can easily maintain backwards compat we should do it.

@wolfgangwalther If you recall #1701, I tried to remove this legacy hack we have (IIRC, it's incorrect since it doesn't consider composite keys):

n = catMaybes
[ Just "Note:"
, if pk then Just "This is a Primary Key.<pk/>" else Nothing
, fk
]

But that broke vue-postgrest so we kept it.

Member

I'm pretty sure it does consider composite keys.

But this brings up an interesting question: does this cause any observable OpenAPI output differences? I have not tested that; I have only tested the schema dump, which is just an internal representation. So my question was not really about whether this would change anything in the OpenAPI output, but whether it would change anything internal.

But yes, testing the OpenAPI output for differences would be a great idea as well.

Contributor Author

I would suggest we leave it like this in this PR. Let's not make perfect the enemy of the good. This PR brings significant performance gains and I think it is worth merging it, and possibly taking care of removing fmap reverse here and updating the tests appropriately in the future.

Member

But yes, testing the OpenAPI output for differences would be a great idea as well.

I compared it against the main branch with both follow-privileges and ignore-privileges settings. It returns the same OpenAPI output in both branches. What's funny is that it even returns the same output when the fmap reverse is removed.

and possibly taking care of removing fmap reverse here and updating the tests appropriately in the future.

So I agree here. Having the same internal schema should be enough, and no queries were modified either, so it's not likely that anything else would change (maybe add a TODO so we don't forget to check in the future?).

@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch 3 times, most recently from 138eba9 to f8ff7c8 Compare October 16, 2025 16:54
…n*n) to O(n)

* rels in addM2MRels
* keyDeps in addViewPrimaryKeys
* keyDeps in addViewM2OAndO2ORels
Also changed pattern matching to point-free usage of record function in findViewPKCols so that variable names match between code and comment.
@mkleczek mkleczek force-pushed the index-rels-in-addm2mrels branch from f8ff7c8 to 45ef1a4 Compare October 18, 2025 05:12

mkleczek commented Oct 18, 2025

Hmm... I've rebased the PR on top of main and one test in postgrest-test-memory now fails. I've got no idea what's going on.

Locally:

CI reported all green on the pre-rebase version of this PR.

Looks like there is some flakiness in postgrest-test-memory. @steve-chavez are you able to provide any insights?

@steve-chavez (Member)

Looks like there is some flakiness in postgrest-test-memory. @steve-chavez are you able to provide any insights?

I restarted the CI job and it passed; it's flakiness. Although I've never had all the memory tests failing locally.


Successfully merging this pull request may close these issues:

  • Schema cache reload with large relationship count causes 100+ second API blocking
  • Slow schema cache loading and double caching schema
