
Add unique constraints to the email and username fields in the galaxy_user table #18493

Open · wants to merge 13 commits into dev from dev_user_db_fields

Conversation

jdavcs (Member) commented Jul 4, 2024

Ref #18487


BEFORE MERGING:


Migrations that add unique constraints to the email and username fields in the galaxy_user table + model definition update.
Unique constraints are implemented with indexes in both postgres and sqlite, so the two existing indexes are no longer needed - which is why the migration drops them.

The model definitions for both columns contain both index=True and unique=True: that's not a problem. Having unique=True alone would be sufficient, but keeping both, I think, makes the intent a little easier to understand. I've tested this manually: in either case a single unique b-tree index is created.
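
For illustration, here's a minimal, hypothetical sketch of the column definitions with both flags set (Galaxy's actual model code uses its own base classes and column types):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "galaxy_user"
    id = Column(Integer, primary_key=True)
    # unique=True alone would suffice; index=True is kept for readability.
    # Either way, a single unique b-tree index backs the constraint.
    email = Column(String(255), index=True, unique=True)
    username = Column(String(255), index=True, unique=True)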

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/enhancement and area/database labels Jul 4, 2024
@jdavcs jdavcs added this to the 24.2 milestone Jul 4, 2024
@jdavcs jdavcs marked this pull request as draft July 4, 2024 01:59
jdavcs (Member Author) commented Jul 4, 2024

Failures are relevant: sqlite and postgres create different db objects from the column specification index=True, unique=True, which is why our migration tests fail. One more cross-database gotcha...


def upgrade():
    with transaction():
        create_unique_constraint(constraint_name, table_name, [column_name])
mvdbeek (Member) commented Jul 4, 2024

This is going to fail without #18492. I would pull that into the migration. Once the constraint is present we'll never need the script, so I don't see why it should live on as a separate script.

jdavcs (Member Author) replied:

My initial thinking was that we could run that script on the current release, which would deduplicate the usernames and, with that, stop the errors. However, the more I thought about it, the more convinced I became that it shouldn't be merged into a stable release: the script is a feature and belongs in dev. So yes, I'll move it into the same PR.

That said, I'm not sure we should be combining database schema revisions with data migrations in the same revision script. I think it may be better to have a script and reference it in the db migration module as an "upgrade note" (like I did here) + in the release admin notes.

mvdbeek (Member) replied:

I think we should be pragmatic here and not provide a migration that can fail, which would be a major pain for admins. This is not a huge amount of data to alter: we're talking about a bit more than 200 users on main.

jdavcs (Member Author) replied:

OK. I suppose we can distinguish between (a) a data migration that addresses business logic needs and (b) a data migration that fixes inconsistent data to enable the db schema migration; and combine the latter with the db schema migration into one revision script. We should then mention such cases in the release admin notes to alert admins that some data is being changed (adding a label now).

@jdavcs jdavcs added the highlight/admin label Jul 5, 2024
@jdavcs jdavcs force-pushed the dev_user_db_fields branch 4 times, most recently from c57cb7d to 48bf21f on September 19, 2024 02:49
jdavcs (Member Author) commented Sep 19, 2024

Implementation note on indexes and unique constraints
In both postgresql and sqlite, a unique constraint is implemented as an index. When the model's column definition contains both index=True and unique=True, only one index will be created. Therefore, attempting to create or drop both the index and the unique constraint in the migration script will result in an error. To handle this, we simply use create_index, passing unique=True as a keyword argument.

If we are adding a unique constraint to a column that had an index defined on it, we must drop that index, because it did not enforce uniqueness. However, only existing databases will contain that index; new databases will not. So we must check for its existence in the migration script before issuing the drop_index command, as in the sketch below.
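
To make that concrete, here's a minimal sketch using plain Alembic/SQLAlchemy APIs (Galaxy's migration scripts use their own helpers; the index name here is illustrative):

import sqlalchemy as sa
from alembic import op

table_name = "galaxy_user"
index_name = "ix_galaxy_user_email"  # illustrative name


def _index_exists(name: str) -> bool:
    # Inspect the live database: only databases created before this
    # migration will contain the old, non-unique index.
    bind = op.get_bind()
    return any(ix["name"] == name for ix in sa.inspect(bind).get_indexes(table_name))


def upgrade():
    if _index_exists(index_name):
        op.drop_index(index_name, table_name)
    # A single unique index serves as both the index and the unique
    # constraint in postgres and sqlite.
    op.create_index(index_name, table_name, ["email"], unique=True)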

jdavcs (Member Author) commented Sep 19, 2024

Unit test failures are relevant: the tests don't respect the new constraints. I'll fix this.
UPDATE: fixed.

@jdavcs jdavcs marked this pull request as ready for review September 21, 2024 04:18
jdavcs (Member Author) commented Sep 21, 2024

So this has a new kind of test: an individual migration test. Its purpose is to test how existing data (that would prevent the execution of a particular migration) is fixed in the database during the migration process. For example, let's say we are adding a unique constraint to a field that currently contains duplicate values. We must clean up the data before we can run this migration, so we have a script that handles this cleanup (by deleting or deduplicating the offending rows, etc.). This test verifies such a script.

Here's what the test does (a rough sketch in code follows the list):

  1. Load and initialize a new database and migration environment.
  2. Downgrade to the revision BEFORE the migration under test.
  3. Load and verify bad data (that would have prevented running the next revision script).
  4. Run the migration (via alembic).
  5. Verify that the data has been cleaned up as expected.
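
A hypothetical sketch of that flow using the plain Alembic command API (the config path, revision id, and table columns are placeholders, not Galaxy's actual test code):

import sqlalchemy as sa
from alembic import command
from alembic.config import Config


def test_username_dedup_migration(tmp_path):
    url = f"sqlite:///{tmp_path}/test.sqlite"
    cfg = Config("alembic.ini")  # assumed config location
    cfg.set_main_option("sqlalchemy.url", url)
    engine = sa.create_engine(url)

    # 1. Initialize a fresh database at the latest revision.
    command.upgrade(cfg, "head")

    # 2. Downgrade to the revision just before the one under test
    #    ("abc123" is a placeholder revision id).
    command.downgrade(cfg, "abc123")

    # 3. Insert duplicate usernames that would violate the new unique
    #    constraint (columns simplified for the sketch).
    with engine.begin() as conn:
        conn.execute(sa.text("INSERT INTO galaxy_user (email, username) VALUES ('a@x.org', 'alice')"))
        conn.execute(sa.text("INSERT INTO galaxy_user (email, username) VALUES ('b@x.org', 'alice')"))

    # 4. Run the migration under test.
    command.upgrade(cfg, "+1")

    # 5. Verify the data-fix step removed the duplicates.
    with engine.begin() as conn:
        usernames = conn.execute(sa.text("SELECT username FROM galaxy_user")).scalars().all()
    assert len(usernames) == len(set(usernames))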
