…#40) I want to use psycopg3 features (the sql module and the copy write_record) but don't want to break backwards compatibility just yet. This introduces psycopg3 while leaving psycopg2 in place, switching it to the binary package to make the library easier for users to install. While working on the connection methods I've isolated the connection object that is returned from ldlite from the one that ldlite uses internally. This makes sure that the connections aren't broken or closed unexpectedly.
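A minimal sketch of the connection isolation described above, assuming a postgres DSN; the class shape and attribute names here are illustrative, not LDLite's actual internals:

```python
import psycopg

class LDLite:
    def connect_db_postgres(self, dsn: str) -> psycopg.Connection:
        # One connection for ldlite's internal work...
        self._conn = psycopg.connect(dsn)
        # ...and a separate one handed back to the caller, so closing
        # or breaking theirs can't interrupt an in-progress download.
        return psycopg.connect(dsn)
```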
LDLite has been playing fast and loose with concatenating SQL strings. It's probably fine, but psycopg's sql module is intended to make this safe. While refactoring the existing code to use the sql module, I've abstracted it into a database class intended to be overridden. In this first implementation we're still using the DBType enum and a single implementation; this will change in the future into separate postgres and duckdb implementations without the DBType enum.
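For reference, this is the kind of composition the sql module enables; the table and column names here are hypothetical:

```python
from psycopg import sql

table = "user_tables"  # would come from user input in ldlite
query = sql.SQL("SELECT {column} FROM {table} WHERE {column} = %s").format(
    column=sql.Identifier("id"),
    table=sql.Identifier(table),
)
# Identifiers are rendered safely quoted when executed, e.g.
# cur.execute(query, ("some-id",)), instead of concatenating strings.
```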
The drop tables logic was always a little weird and required autocommit transaction hacks for postgres. This refactors the existing drop tables logic into the database abstraction, so every place that drops tables now uses the same logic where it was duplicated before. I don't know when the jtable was last used, but it doesn't hurt to leave it in, I guess? This also lets the abstraction manage its own connection, as is intended by the abstraction. There is possibly a behavior change here: when a table in the catalog doesn't actually exist in the database, that previously wouldn't cause an error but now it will. I'm OK with this edge case since the behavior wasn't really specified before. Lastly, the on_progress function was doing multiple checks per call, which adds up over a million invocations. Maybe Python optimizes these code paths at runtime, but it doesn't hurt to do a little less work by default.
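A hedged sketch of what the consolidated drop-tables logic might look like inside the abstraction; the Database class and method names are hypothetical, not LDLite's actual API:

```python
import psycopg
from psycopg import sql

class Database:
    def __init__(self, dsn: str) -> None:
        # The abstraction manages its own connection rather than
        # borrowing the one handed back to the user.
        self._conn = psycopg.connect(dsn)

    def drop_tables(self, tables: list[str]) -> None:
        with self._conn.cursor() as cur:
            for t in tables:
                # No IF EXISTS: a catalog entry whose table is missing
                # from the database now raises, per the behavior change
                # described above.
                cur.execute(sql.SQL("DROP TABLE {}").format(sql.Identifier(t)))
        self._conn.commit()
```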
Using COPY FROM is an order of magnitude faster for bulk insertions into postgres. This is the last low-hanging fruit for download performance until async/await is supported and optimized for in a future release. DuckDB can kind of sort of do bulk operations, but it isn't a priority right now, especially with sqlite still supported in this release. Getting bytes to go directly into the database was a bit of a struggle because postgres expects a "version" byte at the beginning of each binary jsonb value. By doing the byte munging ourselves we skip a whole lot of conversion in ldlite and conversion/logic in psycopg, which would have added points of failure and slowness. I'm going to be testing to make sure it works across 5C FOLIO data before releasing the change.
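A minimal sketch of the binary COPY path, assuming a hypothetical records(data jsonb) table; this is not LDLite's actual code:

```python
import orjson
import psycopg

records = [{"id": "1"}, {"id": "2"}]

with psycopg.connect("dbname=ldlite") as conn:
    with conn.cursor() as cur:
        with cur.copy("COPY records (data) FROM STDIN (FORMAT BINARY)") as copy:
            for rec in records:
                # Binary-format jsonb is a one-byte version header
                # (currently 1) followed by the JSON text. Prepending it
                # ourselves means psycopg sees plain bytes and passes
                # them through verbatim, skipping its jsonb adaptation.
                copy.write_row((b"\x01" + orjson.dumps(rec),))
```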
I wasn't super happy with the weird type shenanigans in the last MR #43, but thought that doing a loads/dumps round trip on all the source records to keep them consistent would be too slow. I discovered orjson.Fragment, which means we can just use dumps and treat both SRS and non-SRS records as bytes. This simplified the signatures and type handling quite a bit. Streaming support for SRS was rushed because non-streaming became even more unstable under Ramsons. There was a chance (though probably small) that someone was loading source-storage endpoints other than the source-records one, and those would have broken. This adds more consistency around which endpoints we support with streaming and fixes any accidental breaks.
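A short illustration of orjson.Fragment: pre-serialized JSON bytes (like an SRS payload fetched from FOLIO) can be embedded verbatim in a dumps call, so SRS and non-SRS records share one bytes-based path. The payload here is made up:

```python
import orjson

raw_srs = b'{"recordType":"MARC_BIB"}'  # hypothetical pre-serialized payload
row = {"id": "abc", "content": orjson.Fragment(raw_srs)}
# The fragment is embedded as-is, with no loads/dumps round trip.
assert orjson.dumps(row) == b'{"id":"abc","content":{"recordType":"MARC_BIB"}}'
```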
Contains a refactor of the download portion of the database code to allow using COPY FROM for inserting data into postgres. There are a couple of other minor bug fixes and upgrades in the CHANGELOG.