[jdbc] Refactor make-cached-row-num->i->thunk #179

alexander-yakushev · 2024-09-13T20:54:06Z

I made two changes to make-cached-row-num->i->thunk:

Apparently, the cache implemented in this function was not working because the cached value was just discarded, and the new value was always recomputed and re-cached.
I removed one level of closures/thunking in the implementation while keeping the interface the same.
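
As an illustration of the first change, here is a minimal sketch of this class of bug. It is not the actual toucan2 code: it only shows a cache lookup whose result is thrown away, next to a version that returns the hit. All names here (fetch-count, expensive-fetch, broken-cached-fetch, fixed-cached-fetch) are hypothetical.

```clojure
;; Hypothetical illustration; none of these names come from toucan2.
(def fetch-count (atom 0))

(defn expensive-fetch
  "Stand-in for reading a column value; counts how often it runs."
  [i]
  (swap! fetch-count inc)
  (* i 10))

(defn broken-cached-fetch
  "Looks the value up but ignores the result, so it always recomputes."
  [cache i]
  (get @cache i)                      ; lookup result is discarded
  (let [v (expensive-fetch i)]
    (swap! cache assoc i v)           ; re-cached on every call
    v))

(defn fixed-cached-fetch
  "Returns the cached value when present; a sentinel keeps nil values cacheable."
  [cache i]
  (let [cached (get @cache i ::not-found)]
    (if (= cached ::not-found)
      (let [v (expensive-fetch i)]
        (swap! cache assoc i v)
        v)
      cached)))
```

Both functions return the same values, which may be why a bug like this can go unnoticed: only the number of underlying fetches differs.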

I also have a few questions for some future potential changes:

Why does the internal function take current-row-num if it is in fact only used for clearing the cache? Can (.getRow rset) be used instead, like in other functions?
What is it caching anyway? It looks to me like read-column-thunk (which eventually produces the actual values that are then put into the cached-values atom) doesn't perform anything worth caching. This is probably why nobody has noticed yet that the cache wasn't actually working.

If the cache is not needed here, the function and the approach can be simplified much further. If backward compatibility of these particular functions is not a concern (they look like part of the internal API), then even more improvements are possible.

codecov · 2024-09-13T21:00:24Z

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.12%. Comparing base (e14a45e) to head (bf301c2).
Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
src/toucan2/jdbc/read.clj	84.61%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #179      +/-   ##
==========================================
- Coverage   83.23%   83.12%   -0.12%     
==========================================
  Files          37       37              
  Lines        2511     2506       -5     
  Branches      215      215              
==========================================
- Hits         2090     2083       -7     
- Misses        206      208       +2     
  Partials      215      215


camsaul · 2024-10-04T16:54:46Z

Why does the internal function take current-row-num if it is in fact only used for clearing the cache? Can (.getRow rset) be used instead, like in other functions?

Yeah, it's only used for clearing the cache. I guess (.getRow rset) would be fine. I think my reasoning was that calling it once and passing it around was better than calling it 20 times for 20 columns or something like that, but maybe I was imagining stuff.

What is it caching anyway? It looks to me like read-column-thunk (which eventually produces the actual values that are then put into the cached-values atom) doesn't perform anything worth caching. This is probably why nobody has noticed yet that the cache wasn't actually working.

The idea is that if you get the value of, say, id twice, then the first time does (.getInt rset 1) or whatever and caches it, and the second time returns the cached value instead of fetching that value from the ResultSet again. But maybe this is not really important, or the overhead of the caching stuff simply is not worth it.
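
The row-scoped value cache described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the toucan2 implementation: make-row-cached-reader and read-fn are hypothetical names, and read-fn stands in for a real ResultSet getter. The row number is used only to decide when to clear the cache.

```clojure
;; Hypothetical sketch; `make-row-cached-reader` and `read-fn` are not toucan2 names.
(defn make-row-cached-reader
  "Wraps `read-fn` (column index -> value) with a cache that is valid for one
  row. When `row-num` changes, the cached values are thrown away."
  [read-fn]
  (let [state (atom {:row nil :values {}})]
    (fn [row-num i]
      ;; The row number is consulted only to invalidate the cache.
      (when (not= row-num (:row @state))
        (reset! state {:row row-num :values {}}))
      (let [cached (get-in @state [:values i] ::miss)]
        (if (= cached ::miss)
          (let [v (read-fn i)]
            (swap! state assoc-in [:values i] v)
            v)
          cached)))))
```

Every read pays for a deref, a comparison, and a nested map lookup, which is the kind of overhead that has to be weighed against the cost of the getter itself.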

camsaul · 2024-10-04T17:00:34Z

src/toucan2/jdbc/read.clj

+(defn- read-column-value [conn model ^ResultSet rset i]
+  (let [i (int i)
+        rsmeta (.getMetaData rset)
+        thunk  (read-column-thunk conn model rset rsmeta i)


So it is actually somewhat important here to calculate these thunks just once for the entire result set instead of once per value. The overhead of calling the read-column-thunk multimethod on every single value in the results can get pretty high.

This whole read-column-thunk idea came about when I did something similar in Metabase. Some of these read-column-thunk methods have to look at ResultSetMetaData to determine how to read a column, or make similar decisions that only need to happen once per ResultSet rather than once per row. Here's an example with Postgres in Metabase:

https://github.com/metabase/metabase/blob/f49407fcbdf18f40e247437f41b3bbefd1e227bf/src/metabase/driver/postgres.clj#L795-L803

Both timestamp and timestamp with time zone come back as java.sql.Types/TIMESTAMP so you need to look at the ResultSetMetaData to determine what the actual column type is.
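
A rough sketch of the once-per-ResultSet idea, using only standard java.sql calls. The helper names make-column-thunk and make-row-reader are hypothetical, and the check for the type name "timestamptz" is an assumption about what the Postgres driver reports for timestamp with time zone; this is not the toucan2 or Metabase implementation.

```clojure
(import '(java.sql ResultSet ResultSetMetaData Types)
        '(java.time LocalDateTime OffsetDateTime))

(defn make-column-thunk
  "Hypothetical helper. Inspects the metadata once and returns a zero-arg fn
  that reads column `i` from the current row of `rset`."
  [^ResultSet rset ^ResultSetMetaData rsmeta i]
  (let [i (int i)]
    (if (= (.getColumnType rsmeta i) Types/TIMESTAMP)
      ;; Both timestamp flavors report Types/TIMESTAMP, so the type name decides.
      ;; Assumption: the driver names the zoned type "timestamptz".
      (let [^Class klass (if (.equalsIgnoreCase "timestamptz" (.getColumnTypeName rsmeta i))
                           OffsetDateTime
                           LocalDateTime)]
        (fn [] (.getObject rset i klass)))
      (fn [] (.getObject rset i)))))

(defn make-row-reader
  "Hypothetical helper. Builds every column thunk once per ResultSet and
  returns a fn that reads the current row as a vector."
  [^ResultSet rset]
  (let [rsmeta (.getMetaData rset)
        thunks (mapv #(make-column-thunk rset rsmeta %)
                     (range 1 (inc (.getColumnCount rsmeta))))]
    (fn [] (mapv #(%) thunks))))
```

Calling make-row-reader once before iterating means the metadata checks run once per ResultSet; inside the row loop only the precomputed thunks are invoked.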

alexander-yakushev · 2024-10-05

I understand your intent and agree that it would be beneficial. The problem is, if I'm reading the code correctly, the process that you describe is not actually cached right now. Obtaining metadata and calling read-column-thunk happens here https://github.com/camsaul/toucan2/blob/master/src/toucan2/jdbc/read.clj#L134, inside the cached lambda, and so it happens for each row.

I can try to rewrite this code so that it is properly cached and reused.

camsaul

Calculating result-set-thunk once per ResultSet instead of once per row is actually an important optimization

alexander-yakushev · 2024-10-05T08:57:29Z

The idea is that if you get the value of, say, id twice, then the first time does (.getInt rset 1) or whatever and caches it, and the second time returns the cached value instead of fetching that value from the ResultSet again. But maybe this is not really important, or the overhead of the caching stuff simply is not worth it.

After benchmarking various Metabase endpoints that interact with the database, I strongly suspect that the caching overhead outweighs the benefit. ResultSet getters like .getInt and .getString are lightweight enough that looking up a cached value in the hashmap is no faster than accessing the ResultSet directly. But the caching machinery that surrounds it is quite expensive to run.

I will make a separate PR that removes the value caching completely. Let's see how it goes.

alexander-yakushev · 2024-10-08T16:04:07Z

Closing this in favor of #183 and #189.

alexander-yakushev requested a review from camsaul as a code owner September 13, 2024 20:54

alexander-yakushev added 2 commits September 21, 2024 01:06

[jdbc] Fix cached values being ignored in make-cached-row-num->i->thunk

8295b6a

[jdbc] Get rid of extra thunking

bf301c2

alexander-yakushev force-pushed the jdbc-fixes branch from da6c1d9 to bf301c2 Compare September 20, 2024 22:07

camsaul reviewed Oct 4, 2024

camsaul requested changes Oct 4, 2024

alexander-yakushev mentioned this pull request Oct 5, 2024

[jdbc.read] Remove value caching from JDBC reading machinery #183

Merged

alexander-yakushev marked this pull request as draft October 5, 2024 09:58

alexander-yakushev mentioned this pull request Oct 8, 2024

[jdbc.read] Properly pre-compute column fetching thunk #189

Merged

alexander-yakushev closed this Oct 8, 2024

alexander-yakushev deleted the jdbc-fixes branch October 8, 2024 16:04
