Skip to content

Conversation

@holdenk
Copy link
Contributor

@holdenk holdenk commented Nov 8, 2025

What changes were proposed in this pull request?

We plumb through non-determinism information, as with nullability, from projection to union.
WIP: I need to double check the other set operations.

Why are the changes needed?

We need to be careful pushing through non-deterministic components & other changes.

Does this PR introduce any user-facing change?

Yes, the attribute reference now has a determinism field. For backwards compat, unpack does not unpack that field, and it has a default value.

How was this patch tested?

Added a test in the filter pushdown logic.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 8, 2025
@holdenk holdenk marked this pull request as draft November 8, 2025 00:24
@holdenk
Copy link
Contributor Author

holdenk commented Nov 8, 2025

Open question: is not pushing through the union the right thing to do OR is it ok? I'm... fuzzy on this. I think we should mark the attribute ref as non-deterministic though regardless since non-determinism has broader implications but filter pushdown is where I found this. WDYT @cloud-fan ?

@holdenk holdenk requested a review from cloud-fan November 12, 2025 19:05
@holdenk
Copy link
Contributor Author

holdenk commented Nov 12, 2025

oh nvm test failures were not what I thought, sorry for the ping.

sfc-gh-hkarau and others added 2 commits November 13, 2025 11:21
…ons as we do with nullability

Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
@holdenk holdenk force-pushed the SPARK-47031-Union-determinism-fix branch from 5f4be55 to 00f3972 Compare November 13, 2025 19:21
@holdenk
Copy link
Contributor Author

holdenk commented Nov 13, 2025

Ok did more digging and thinking about the test failures. The problem with the current attempted solution is that we have expression which start out non-deterministic and then become deterministic once a seed assigned but the toAttribute is not updated on that and similarily the unions logic on toAttribute would not be updated on that either. One option I considered was passing in a callback (which I don't love of course) but given that we resolve the attrs in the union once that still would break (although if we made it not cached that could be ok). A "partial" fix could be to only propegate the non-deterministic into attr ref IFF it's something where a seed being set would not resolve the non-determinism but I don't see a super clean way to do that in the code right now. I'm going to give this more thought. Maybe a new trait for "SortOfNonDeterministic" and keep the existing NonDeterministic trait alive?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants