[jdbc] Refactor make-cached-row-num->i->thunk #179
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #179 +/- ##
==========================================
- Coverage 83.23% 83.12% -0.12%
==========================================
Files 37 37
Lines 2511 2506 -5
Branches 215 215
==========================================
- Hits 2090 2083 -7
- Misses 206 208 +2
Partials 215 215
View full report in Codecov by Sentry.
da6c1d9 to bf301c2
Yeah, it's only used for clearing the cache. I guess
The idea is that if you get the value of say |
(defn- read-column-value [conn model ^ResultSet rset i]
  (let [i      (int i)
        rsmeta (.getMetaData rset)
        thunk  (read-column-thunk conn model rset rsmeta i)
So it is actually somewhat important here to calculate these thunks just once for the entire result set instead of once per value. The overhead of calling the read-column-thunk multimethod on every single value in the results can get pretty high.
This whole read-column-thunk idea came about when I did something similar in Metabase. Some of these read-column-thunk methods have to look at ResultSetMetaData to determine how to read a column out, or make similar decisions that only need to happen once per ResultSet rather than once per row. Here's an example with Postgres in Metabase:
Both timestamp and timestamp with time zone come back as java.sql.Types/TIMESTAMP, so you need to look at the ResultSetMetaData to determine what the actual column type is.
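A minimal sketch of the once-per-ResultSet idea described above. This is not the toucan2 or Metabase code: the metadata is stubbed with reify, the names column-reader, read-all-rows, stub-rsmeta and thunk-builds are hypothetical, and the type names "timestamp" and "timestamptz" are assumed to be what the Postgres driver reports from getColumnTypeName.

```clojure
(import '(java.sql ResultSetMetaData))

;; Stub metadata for illustration only; a real ResultSetMetaData would come
;; from (.getMetaData rset). Column 1 is "timestamp", column 2 is "timestamptz".
(def stub-rsmeta
  (reify ResultSetMetaData
    (getColumnCount [_] 2)
    (getColumnTypeName [_ i] (nth ["timestamp" "timestamptz"] (dec i)))))

;; Counts how many times the per-column decision is made.
(def thunk-builds (atom 0))

(defn column-reader
  "Hypothetical analogue of a read-column-thunk method: inspects the metadata
  once and returns a function that tags a raw value with the chosen class."
  [^ResultSetMetaData rsmeta i]
  (swap! thunk-builds inc)
  (let [target (if (= (.getColumnTypeName rsmeta (int i)) "timestamptz")
                 java.time.OffsetDateTime
                 java.time.LocalDateTime)]
    (fn [raw] [target raw])))

(defn read-all-rows
  "Builds one reader per column up front, then reuses the readers for every row."
  [^ResultSetMetaData rsmeta rows]
  (let [readers (mapv #(column-reader rsmeta %)
                      (range 1 (inc (.getColumnCount rsmeta))))]
    (mapv (fn [row] (mapv (fn [reader raw] (reader raw)) readers row))
          rows)))
```

With three rows of two columns, column-reader runs twice in total rather than six times, because the metadata lookup sits outside the per-row loop.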
I understand your intent and agree that it would be beneficial. The problem is, if I'm reading the code correctly, the process that you describe is not actually cached right now. Obtaining metadata and calling read-column-thunk happens here https://github.com/camsaul/toucan2/blob/master/src/toucan2/jdbc/read.clj#L134, inside the cached lambda, and so it happens for each row.
I can try to rewrite this code so that it is properly cached and reused.
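The difference between the two placements can be sketched in plain Clojure. All names here (build-thunk, reader-built-per-row, reader-built-once, builds) are hypothetical stand-ins, not the functions in read.clj; build-thunk only simulates the metadata lookup and multimethod dispatch with a counter.

```clojure
;; Counts how many times the per-column setup work runs.
(def builds (atom 0))

(defn build-thunk
  "Stand-in for the per-column setup (metadata lookup plus dispatch).
  Returns a function that reads column i out of a row vector."
  [i]
  (swap! builds inc)
  (fn [row] (nth row i)))

(defn reader-built-per-row
  "Setup happens inside the returned function, so it repeats for every value read."
  []
  (fn [row i] ((build-thunk i) row)))

(defn reader-built-once
  "Setup is hoisted out of the returned function, so it runs once per column."
  [column-count]
  (let [thunks (mapv build-thunk (range column-count))]
    (fn [row i] ((thunks i) row))))

(defn read-everything
  "Reads every column of every row with the given reader function."
  [reader rows column-count]
  (mapv (fn [row] (mapv #(reader row %) (range column-count))) rows))
```

Both readers return the same values; only the number of build-thunk calls differs, which is the cost the comment above says is currently paid per row.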
Calculating result-set-thunk once per ResultSet instead of once per row is actually an important optimization
After benchmarking various Metabase endpoints that interact with the database, I strongly suspect that the caching overhead outweighs the benefit. ResultSet getters like
I will make a separate PR that removes the value caching completely. Let's see how it goes.
I made two changes to make-cached-row-num->i->thunk:
I also have a few questions for some future potential changes:
current-row-num if it is in fact only used for clearing the cache? Can (.getRow rset) be used instead, like in other functions?
read-column-thunk (which eventually produces the actual values that are then put into the cached-values atom) doesn't seem to perform anything worth caching. This is probably the reason why nobody has noticed yet that the cache wasn't actually working.
If the cache is not needed here, the function and approach can be simplified much more. If backward compatibility of these particular functions is not a concern (they look like part of the internal API), then even more improvements are possible.
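To make the (.getRow rset) question concrete, here is a hedged sketch of a value cache that is cleared whenever the row reported by the ResultSet changes, with no separately tracked row counter. It is not how toucan2 implements it: stub-result-set and row-scoped-cache are hypothetical names, and the ResultSet is a reify stub that only supports next and getRow.

```clojure
(import '(java.sql ResultSet))

(defn stub-result-set
  "Illustration-only ResultSet that supports just next and getRow."
  [row-count]
  (let [position (atom 0)]
    (reify ResultSet
      (next [_] (<= (swap! position inc) row-count))
      (getRow [_] @position))))

(defn row-scoped-cache
  "Returns (fn [i compute]) that caches the result of (compute) per column
  index i and drops all cached values when (.getRow rset) changes."
  [^ResultSet rset]
  (let [state (atom {:row nil :values {}})]
    (fn [i compute]
      (let [row (.getRow rset)]
        ;; A new row invalidates everything cached for the previous row.
        (when (not= row (:row @state))
          (reset! state {:row row :values {}}))
        (if-let [entry (find (:values @state) i)]
          (val entry)
          (let [v (compute)]
            (swap! state assoc-in [:values i] v)
            v))))))
```

Reading the same column twice on one row calls compute once; after next advances the stub, the following read computes again. Whether such a cache pays for itself is a separate matter, as the benchmarking comment earlier suggests.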