-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
perf: Index various lists in SchemaCache to change complexity from O(n*n) to O(n) #4396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Wow, it looks like it is too fast now :) - test_big_schema assertion that schema reloading takes 10s fails quite dramatically. |
Haha, that's nice. We should do some kind of consistency check as well, that the schema cache still returns all the same things. But... looks great so far. It's exactly the kind of problem I assumed to exist with these relationships... |
test/io/test_big_schema.py
Outdated
"plan" | ||
] | ||
assert plan_dur > 10000.0 | ||
assert plan_dur < 10000.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrhr, nice change :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the change is nice, it does break the test expectation. This is not the primary test outcome - we are testing whether other stuff waits on the schema cache reload here. So a high plan duration is a requirement to make the test effective.
We will either need to throw the test away or rewrite it or... I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - I've taken a look at the test and understand its purpose now.
Testing this by measuring response time is brittle, to say politely.
Without deeper changes into the way things work, I don't see a way to keep this test.
One way would be to keep the timestamp of last schema load, and provide it in a response header. But it wouldn't really allow us to test waiting of concurrent requests for schema loading.
I tend to think this kind of properties should be tested using some kind of chaos testing or fuzzing that would only provide statistical guarantees of the system behavior.
Not sure what to do in this PR though. Sensible thing would be to simply comment out this test for now.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd certainly not comment it out. Either we can fix it or we should just remove it - we can always add it back. But commented out code is just waste.
I'll defer to @steve-chavez for this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing this by measuring response time is brittle, to say politely.
Yeah, it was the easiest way to prove that at the time. It's most desirable to have requests waiting than to flood server logs with errors or reply quickly with failure (users just see postgREST as unreliable in these cases).
One way would be to keep the timestamp of last schema load, and provide it in a response header. But it wouldn't really allow us to test waiting of concurrent requests for schema loading.
I tend to think this kind of properties should be tested using some kind of chaos testing or fuzzing that would only provide statistical guarantees of the system behavior.
Another option would be to inject a sleep like we do here:
postgrest/src/PostgREST/SchemaCache.hs
Lines 153 to 155 in c1d9728
_ <- | |
let sleepCall = SQL.Statement "select pg_sleep($1 / 1000.0)" (param HE.int4) HD.noResult prepared in | |
whenJust configInternalSCSleep (`SQL.statement` sleepCall) -- only used for testing |
But only for the relationships loading part. Currently the above works for the whole schema cache load.
Then this test could be removed from test/io/test_big_schema.py too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steve-chavez - yeah, that makes the test more robust (no longer dependent on sluggish schema loading performance)
Done.
Introduced internal-schema-cache-relationship-load-sleep
and implemented delayed loading of relationships (see ec31bdd)
I left the tests in test_big_schema.py
- moving it to IO tests is probably a good idea but I wouldn't do that as part of this PR - it is getting out of hand anyway.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was trickier than I thought. Finally I've ended up with three internal configuration properties to introduce delays in various phases of schema cache loading.
See f8ff7c8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steve-chavez done: #4411
6f03a38
to
9c9e641
Compare
I am wondering what test we could add... This is a kind of refactoring that does not change any behavior (except performance) and should be covered by existing tests. WDYT? |
I wouldn't add an automated test for it, but it should be simple to load the big schema fixtures from that IO test and then run a |
I think @MHC2000 will be happy about this I can confirm the schema privately shared on #3704 (comment) Goes from ~7 minutes:
To ~14 seconds:
|
9c9e641
to
deeae82
Compare
I've updated the patch to cover more cases:
|
@steve-chavez can you check again with the latest version of the patch? It should improve the times even more. |
be211a9
to
e9fcffe
Compare
@mkleczek It's now down to ~2 seconds 🚀
|
Amazing, thanks for checking. I guess it is in mergeable state - patch code coverage is somewhat low but I think the changes touched code not covered by tests originally. It should be probably fixed but I am not sure if this PR is the right one though. |
e9fcffe
to
f22151f
Compare
filter (\(ViewKeyDependency _ viewQi _ dep _) -> dep == PKDep && viewQi == QualifiedIdentifier sch vw) keyDeps | ||
fold $ HM.lookup (PKDep, QualifiedIdentifier sch vw) indexedDeps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only diff in the schema-cache I get is here. The big schema has many of these:
@@ -858435,8 +858435,8 @@
"tableIsView": true,
"tableName": "v_pop_ohnekoord",
"tablePKCols": [
- "ap_id",
- "id"
+ "id",
+ "ap_id"
],
"tableSchema": "apflora",
"tableUpdatable": false
Aka, the order of tablePKCols
is changed.
I'm not exactly sure whether we rely on this order anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked at the source of HM.fromListWith op
and indeed: it seems like it calls newValue op existingValue
so it reverses the list order.
Not sure if we depend on this order anyway but will add reversing the list back just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wolfgangwalther added fmap reverse
in line 565 - that should fix this issue.
Can you re-check? (or you could provide me with a quick way to generate this diff)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command I used was:
PGRST_DB_SCHEMAS=apflora postgrest-with-postgresql-17 -f test/io/big_schema.sql postgrest-run --dump-schema | jq
Pipe this to a file on main, then do the same on your branch, then diff the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @wolfgangwalther - run it after 39b404e and it fixes the issue - no more differences between main
and this branch.
viewRels Relationship{relTable,relForeignTable,relCardinality=card} = | ||
if isM2O card || isO2O card then | ||
viewRels Relationship{relTable,relForeignTable,relCardinality=card} | isM2O card || isO2O card = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an unrelated refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
The previous version caused ugly multiple empty list returns (one for the else
branch in if
and another for the pattern match case). Fixing it was too tempting to resist :)
Do you think it is worth doing it in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a separate PR, but a separate commit would be good - that will make things much easier to understand whenever we look at this again in months or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wolfgangwalther done - see cbcd7b8
else Nothing | ||
| Relationship jt1 t _ (M2O cons1 cols) _ tblIsView <- rels | ||
, Relationship jt2 ft _ (M2O cons2 fcols) _ fTblisView <- rels | ||
, jt1 == jt2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the removal of jt1 == jt2
, yet. Is this another unrelated refactor? Or related to the change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed as we lookup in the hash-map using jt1
as the key (ie. we changed filtering by equality to hash-map lookup) - that's the crux of this PR.
cbcd7b8
to
bb8423b
Compare
-- so we don't need to know about the other references. | ||
-- * We need to choose a single reference for each column, otherwise we'd output too many columns in location headers etc. | ||
takeFirstPK = mapMaybe (head . snd) | ||
indexedDeps = fmap reverse $ HM.fromListWith (++) $ fmap ((keyDepType &&& keyDepView) &&& pure) keyDeps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reverse
is quite mysterious, maybe add a comment that it was done to preserve backwards compat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we decide to keep it, can we first figure out whether we actually need it? Do we rely on the order of these or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we first figure out whether we actually need it?
I don't think we can figure that out. Overall users have been sensitive to OpenAPI changes, so IMO if we can easily maintain backwards compat we should do it.
@wolfgangwalther If you recall #1701, I tried to remove this legacy hack we have (IIRC, it's incorrect since it doesn't consider composite keys)
postgrest/src/PostgREST/Response/OpenAPI.hs
Lines 129 to 133 in c1d9728
n = catMaybes | |
[ Just "Note:" | |
, if pk then Just "This is a Primary Key.<pk/>" else Nothing | |
, fk | |
] |
But that broke vue-postgrest
so we kept it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it does consider composite keys.
But this brings up an interesting question: Does this cause any observable OpenAPI output differences? I have not tested that. I have only tested the schema dump, which is just an internal representation. So my question was not really targeting whether this would change anything in the openapi output, but whether it would change anything internal.
But yes, testing the openapi output for differences would be a great idea as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest we leave it like this in this PR. Let's not make perfect the enemy of the good. This PR brings significant performance gains and I think it is worth merging it and possibly taking care of removing fmap reverse
here and updating the tests appropriately in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yes, testing the openapi output for differences would be a great idea as well.
I compared it against the main
branch for both follow-privileges
and ignore-privileges
settings. It returns the same OpenAPI output in both branches. What's funny it's that it even returns the same output when the fmap reverse
is removed.
and possibly taking care of removing
fmap reverse
here and updating the tests appropriately in the future.
So I agree here. Having the same internal schema should be enough and no queries were modified either, so it's not likely that anything else would change (maybe add a TODO so we don't forget to check in the future?).
138eba9
to
f8ff7c8
Compare
…n*n) to O(n) * rels in addM2MRels * keyDeps in addViewPrimaryKeys * keyDeps in addViewM2OAndO2ORels
Also changed pattern matching to point-free usage of record function in findViewPKCols so that variable names match between code and comment.
f8ff7c8
to
45ef1a4
Compare
Hmm... I've rebased the PR on top of Locally:
CI reported all green on pre-rebase version of this PR. Looks like there is some flakiness in |
I restarted the CI job and it passed, it's flakiness. Although I've never had all memory tests failing locally. |
Fixes #4360
Fixes #3704