Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move some common database import subroutines to Script::Utils #3200

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mwiencek
Copy link
Member

@mwiencek mwiencek commented Mar 8, 2024

Problem

  • We have some identical or very similar-looking subroutines duplicated between admin/MBImport.pl and admin/replication/ImportReplicationChanges: empty (to tell if a table is empty) and ImportTable (to import a table from a file). The two implementations of the latter function differ slightly.

  • For the incremental dumps (MusicBrainz::Script::Role::IncrementalDump), I'd like to reuse the ImportTable function to import a dbmirror2 packet into some temporary tables (to parse it).

    We currently parse the old packets by hand; I'm surprised it works at all (it probably doesn't in some cases) because I don't believe it fully or properly implement's parsing of PostgreSQL's COPY text format. dbmirror2 packets contain JSON, which may contain JSON escape sequences underneath COPY's text format escape sequences, and the mixing of these can be tricky to parse. It's much, much easier to let PostgreSQL do the parsing!

Solution

This just adds two shared subroutines, is_table_empty and copy_table_from_file to Script::Utils, and modifies admin/MBImport.pl and admin/replication/ImportReplicationChanges to use them.

Testing

We have existing automated tests that make heavy use of these scripts.

I did not test the --fix-broken-utf8 or --ignore-errors flags specifically.

Copy link
Member

@reosarevok reosarevok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sensible to me, but I'd prefer if we had a test for the extra flags if they wouldn't be too complex. Although it does kinda feel maybe we can just drop the utf8 flag, but maybe we should check with @mayhem first if he remembers the reasons. The comment mentioning automodsaccepted which I have never even seen suggests that this is ancient and the extra options might be unneeded.

@yvanzo yvanzo marked this pull request as draft May 28, 2024 14:48
@mwiencek
Copy link
Member Author

This seems sensible to me, but I'd prefer if we had a test for the extra flags if they wouldn't be too complex. Although it does kinda feel maybe we can just drop the utf8 flag, but maybe we should check with @mayhem first if he remembers the reasons. The comment mentioning automodsaccepted which I have never even seen suggests that this is ancient and the extra options might be unneeded.

Well, I'm not sure how to test those easily (I guess I'd have to construct, by hand, some broken dumps with artificial errors?), but I didn't change how they work either. If I somehow broke them, it'd only be by passing the options incorrectly (via ignore_errors => $fIgnoreErrors, etc.).

I've never used these flags in all my time at MB, as far as I can remember. At the same time, I left them alone in the PR to avoid having to make a decision about removing them. :) Since it's unrelated to what I wanted to achieve here, which is making ImportTable reusable.

(Removing the draft status since this was finished when I submitted it.)

@mwiencek mwiencek marked this pull request as ready for review December 19, 2024 02:19
@mwiencek mwiencek force-pushed the script-utils-dump-methods branch from 2b06d5d to 168972d Compare December 19, 2024 02:23
admin/MBImport.pl and admin/replication/ImportReplicationChanges contained very
similar implementations of `ImportTable`, so it would be ideal to share them.
I'd also like to use the same functionality in a future commit (to load
dbmirror2 packets into temporary tables).

The implementations in these two files did diverge slightly. For one,
MBImport.pl's allowed fixing broken UTF-8 byte sequences. I'm not sure how
necessary that is in 2024, or what the historical reasons for adding it were,
but I kept the functionality behind a flag in `copy_table_from_file`.

MBImport.pl's also supported the flags `$delete_first` (to empty the table
before importing) and `$fProgress` (to control whether progress is shown).
I've basically kept all of MBImport.pl's code, with these features behind
`%opts` flags.

I kept the definitions of `ImportTable`, but they now call
`copy_table_from_file` internally. I couldn't replace all of the `ImportTable`
calls with direct calls to `copy_table_from_file`, because `ImportTable` also
updates statistics local to each file and has a different return value.
@mwiencek mwiencek force-pushed the script-utils-dump-methods branch from 168972d to b810c1a Compare December 19, 2024 02:24
Copy link
Contributor

@yvanzo yvanzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mwiencek for this refactoring. It’s a shame we can’t find the reason for this code fixing UTF-8. Did you test your changes with correct dumps still?

Would it be reasonable to add support for setting these unused options such as show_progress from the command line?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants