Allow comparison of String_UUID and Text #51

snordhausen · 2024-10-21T09:42:14Z

After switching from data-diff to reladiff, one of our tables started failing with

TypeError: Incompatible types for column 'the_column_name': String_UUID() <-> Text()

As far as I can tell, the root cause is that the code at https://github.com/erezsh/sqeleton/blob/8655be43096dd6610c4ed8b5f9713f9a97670e7e/sqeleton/databases/base.py#L523-L534 (sqeleton/databases/base.py) uses sampling to determine whether a column is String_UUID. When such a column is nullable, it is a matter of chance whether the sample contains NULL values. If it does, sqeleton uses the Text type; otherwise it uses String_UUID.

This change ensures that such columns can be compared reliably. I'm not sure if this is the best way to do it, but in our internal testing it looks good.
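
To make the failure mode concrete, here is a minimal sketch of sample-based type inference. It is not the sqeleton implementation: `is_uuid`, `infer_text_subtype`, and the string type names are hypothetical stand-ins, and the sketch assumes the rule is "every sampled value must parse as a UUID".

```python
import uuid
from typing import Optional, Sequence


def is_uuid(value: Optional[str]) -> bool:
    """Return True only for strings that parse as a UUID."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def infer_text_subtype(samples: Sequence[Optional[str]]) -> str:
    """Assumed rule: a column counts as String_UUID only if every sample is a UUID."""
    if samples and all(is_uuid(s) for s in samples):
        return "String_UUID"
    return "Text"


ID_A = "123e4567-e89b-12d3-a456-426614174000"
ID_B = "00000000-0000-0000-0000-000000000001"

# Same column, two different samples: the result depends on whether a NULL was drawn.
sample_without_null = [ID_A, ID_B]
sample_with_null = [ID_A, None]

print(infer_text_subtype(sample_without_null))
print(infer_text_subtype(sample_with_null))
```

Under this assumed rule, two tables holding the same nullable UUID column can end up with different inferred types, which is what produces the `String_UUID() <-> Text()` mismatch.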

erezsh · 2024-10-21T12:44:46Z

Thanks for reporting this and suggesting a solution.

I think a better solution would be to do better sampling, for example by adding WHERE pk_col IS NOT NULL. What do you think?

snordhausen · 2024-10-21T13:23:30Z

That sounds promising, although I don't know exactly how the sampling mechanism works. If the sampling happens column-by-column, it should work.

But if we apply the IS NOT NULL filter to all columns at the same time, wouldn't we risk not getting any samples at all? For example, when one of the columns is completely NULL because it is used for a new software feature that is not yet live.
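
The concern can be reproduced with a small sketch using Python's built-in `sqlite3` module. The table `t`, its columns, and the two helper functions are hypothetical and only illustrate the two filtering strategies; they say nothing about how sqeleton actually builds its sampling query. Identifiers are interpolated directly into the SQL for brevity, which is acceptable only in a toy example.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def sample_all_at_once(conn: sqlite3.Connection, table: str,
                       columns: Sequence[str], limit: int = 64) -> List[Tuple]:
    """One query: every column must be non-NULL in the same row."""
    where = " AND ".join(f"{c} IS NOT NULL" for c in columns)
    cols = ", ".join(columns)
    sql = f"SELECT {cols} FROM {table} WHERE {where} LIMIT {limit}"
    return conn.execute(sql).fetchall()


def sample_per_column(conn: sqlite3.Connection, table: str,
                      columns: Sequence[str], limit: int = 64) -> Dict[str, List]:
    """One query per column: each column is filtered independently."""
    result = {}
    for c in columns:
        sql = f"SELECT {c} FROM {table} WHERE {c} IS NOT NULL LIMIT {limit}"
        result[c] = [row[0] for row in conn.execute(sql)]
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, new_feature TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("123e4567-e89b-12d3-a456-426614174000", None),
        ("00000000-0000-0000-0000-000000000001", None),
    ],
)

columns = ["id", "new_feature"]
combined = sample_all_at_once(conn, "t", columns)    # no row satisfies both filters
per_column = sample_per_column(conn, "t", columns)   # "id" still yields samples

print(combined)
print(per_column)
```

With the combined filter, the entirely-NULL `new_feature` column removes every row from the sample, so `id` cannot be classified either. Filtering per column avoids that at the cost of one query per column.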

erezsh · 2024-10-21T14:27:09Z

That's a good point. If a column is entirely null, there is no way to verify if it's UUID or not.

At the same time, if we use a column that is arbitrary text as a key, the diff might return incorrect results.

But it shouldn't be too hard to mark a column that is all null (or empty) as such. Then when comparing the column types, it could safely adopt whatever is the type in the other table.
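
A rough sketch of that idea follows. The `ALL_NULL` marker, the string type names, and the `classify` and `reconcile` helpers are all hypothetical; this only illustrates the proposal and is not existing sqeleton behavior.

```python
import uuid
from typing import Optional, Sequence

ALL_NULL = "AllNull"  # hypothetical marker for a column with no usable samples


def is_uuid(value: str) -> bool:
    """Return True if the string parses as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def classify(samples: Sequence[Optional[str]]) -> str:
    """Ignore NULL/empty samples; mark the column if nothing is left."""
    values = [s for s in samples if s]
    if not values:
        return ALL_NULL
    return "String_UUID" if all(is_uuid(v) for v in values) else "Text"


def reconcile(type_a: str, type_b: str) -> str:
    """An all-null column adopts the other table's type; otherwise types must match."""
    if type_a == ALL_NULL:
        return type_b
    if type_b == ALL_NULL:
        return type_a
    if type_a != type_b:
        raise TypeError(f"Incompatible types: {type_a} <-> {type_b}")
    return type_a


ID_A = "123e4567-e89b-12d3-a456-426614174000"

left = classify([ID_A, None])    # NULLs are skipped, so this stays String_UUID
right = classify([None, None])   # nothing to inspect, so it is marked ALL_NULL

print(reconcile(left, right))
```

Arbitrary text is still rejected as a match for a UUID column here, so the concern about incorrect diffs on text keys is preserved: `reconcile("String_UUID", "Text")` raises `TypeError`.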

erezsh

This solution will allow keys to have arbitrary text values, thus potentially returning an incorrect diff.

erezsh requested changes Oct 21, 2024
