Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow comparison of String_UUID and Text #51

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

snordhausen
Copy link

After switching from data-diff to reladiff, one of our tables started failing with

TypeError: Incompatible types for column 'the_column_name': String_UUID() <-> Text()

As far as I can tell, the root cause is that this code in sqeleton/databases/base.py uses sampling to determine if a column is String_UUID. But when such a column is nullable, it depends on chance if the sample contains NULL values. If it does, sqeleton will use the Text type, otherwise it will use String_UUID.

This change ensures that such columns can be compared reliably. I'm not sure if the is the best way to do it, but in our internal testing it looks good.

The code at
https://github.com/erezsh/sqeleton/blob/8655be43096dd6610c4ed8b5f9713f9a97670e7e/sqeleton/databases/base.py#L523-L534
uses sampling to determine if a column is String_UUID.
But when such a column is nullable, it depends on chance if the sample
contains NULL values or not. If it does, sqeleton will use the Text
type, otherwise it will use String_UUID.
Thise change ensures that such columns can be compared reliably.
@erezsh
Copy link
Owner

erezsh commented Oct 21, 2024

Thanks for reporting this and suggesting a solution.

I think a better solution would be to do better sampling, for example by adding WHERE pk_col IS NOT NULL. What do you think?

@snordhausen
Copy link
Author

That sounds promising, I just don't know exactly how the sampling mechanism works. If the sampling happens column-by-column, it should work.

But if we apply the IS NOT NULL to all columns at the same time, wouldn't we risk not getting any samples at all? E.g when one of the columns is completely NULL because it is used for a new software feature that is not yet live.

@erezsh
Copy link
Owner

erezsh commented Oct 21, 2024

That's a good point. If a column is entirely null, there is no way to verify if it's UUID or not.

At the same time, if we use a column that is arbitrary text as a key, the diff might return incorrect results.

But it shouldn't be too hard to mark a column that is all null (or empty) as such. Then when comparing the column types, it could safely adopt whatever is the type in the other table.

Copy link
Owner

@erezsh erezsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution will allow keys to have arbitrary text values, thus potentially returning an incorrect diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants