Skip to content

Conversation

@jdavcs
Copy link
Member

@jdavcs jdavcs commented May 15, 2023

Solves issue uncovered via work on #16003.

This adds an automated naming convention for database indexes and constraints added by SQLAlchemy and Alembic. This ensures that indexes and constraints will always have deterministic names regardless of the database (PostgreSQL, SQLite, etc.) and of whether they were explicitly named or not (e.g. added inline via Column parameters). This is needed for correct handling of downgrade revision scripts (as well as consistency of constraint and index naming in the database overall).

The issue surfaced thanks to a failed test, which broke on a downgrade migration after a new index had been added to the model definition. The index was unnamed (added via index=True), and the name that was assigned to it by the database was different from the name used in the revision's downgrade script - hence, the test failed when the script tried to drop a nonexistent index. This bug will go unnoticed if the new migration is applied to an existing database (since the index is added via the revision script where it is explicitly named); however, if a new database is initialized, migrations are bypassed, which will trigger the bug on a subsequent downgrade.

Setting the object's name to match whatever is generated by the database by default is easy for postgres (e.g. [table-name]_[column-name]_fkey). However, SQLite does things differently: to quote Alembic's docs, "SQLite, unlike any other database, allows constraints to exist in the database that have no identifying name." [ref]. The solution is to either (a) always give objects explicit names (which is tedious and error-prone: no more inline specification of indexes and foreign keys!); or (b) to use an automated naming convention, recommended by SQLAlchemy and Alembic (which has a whole section on The Importance of Naming Constraints).

This has been on my to-do list for a long while. It finally bit me (thanks, @davelopez 😄)

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. Run ./manage_db.sh upgrade; ./manage_db.sh downgrade base; ./manage_db.sh upgrade: this verifies that the helper methods in migrations.utils.DbObjectNames used in existing revision scripts do not affect the state of an existing database.
    2. Add a new revision that creates a NEW index via the index=True Column attribute (i.e., index is not explicitly named). Repeat step 1. This verifies that the naming convention specified in galaxy.model is automatically applied in a revision script. Remove the revision after testing.
    3. Repeat steps 1,2 on postgresql or sqlite (so both databases are tested).
    4. Run pytest test/unit/app/test_migrate_database.py. This verifies that the naming convention is automatically applied when initializing a new database (and bypassing migrations), and that the names generated by the model are identical to those generated by the migration utility DbObjectNames.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/enhancement area/database Galaxy's database or data access layer labels May 15, 2023
@jdavcs jdavcs added this to the 23.1 milestone May 15, 2023
@jdavcs jdavcs requested review from davelopez and mvdbeek May 15, 2023 03:51
@jdavcs
Copy link
Member Author

jdavcs commented May 15, 2023

I've added composite handling for indexes, but also for foreign keys and unique constraints - seemed silly not to, given the simplicity.

I did not refactor the model itself to utilize the naming helpers: there are 17 named indexes in the model definition, but 6 of them do not follow convention. If we choose to make that naming uniform across the database, we can add a migration to rename existing indexes (but I don't think it's worth it or really matters).

@jdavcs jdavcs force-pushed the dev_db_obj_naming branch from 5f4c826 to 2bbd339 Compare May 15, 2023 15:20
@jdavcs jdavcs force-pushed the dev_db_obj_naming branch from 2bbd339 to 22218b5 Compare May 15, 2023 15:23
Copy link
Contributor

@davelopez davelopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks @jdavcs!

@jdavcs jdavcs force-pushed the dev_db_obj_naming branch from d94fb54 to f532069 Compare May 15, 2023 17:44
@jdavcs jdavcs merged commit ae7eff4 into galaxyproject:dev May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/database Galaxy's database or data access layer kind/enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants