-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): support iceberg predicate pushdown #19228
base: main
Are you sure you want to change the base?
Conversation
f14d851
to
8572e75
Compare
This comment was marked as resolved.
This comment was marked as resolved.
|
||
if splits.is_empty() { | ||
bail!("No splits found for the iceberg table"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking if it's reasonable to have 0 splits found, e.g. when querying empty iceberg table.
pub struct BatchIcebergScan { | ||
pub base: PlanBase<Batch>, | ||
pub core: generic::Source, | ||
#[educe(Hash(ignore))] | ||
pub predicate: IcebergPredicate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash and Eq are only required for streaming share plan. But down the road we may support batch share plan. In that case every single batch plan node using Educe
needs to be audited.
fn predicate_pushdown( | ||
&self, | ||
predicate: Condition, | ||
_ctx: &mut PredicatePushdownContext, | ||
) -> PlanRef { | ||
fn rw_literal_to_iceberg_datum(literal: &Literal) -> Option<IcebergDatum> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking if it's better to add a separate optimization pass instead of implementing it inside the predicate_pushdown
, at the end of batch, to pattern match:
BatchFilter -> BatchIcebergScan
And rewrite both of them, to push down the predicate inside BatchFilter
to BatchIcebergScan
.
wdyt @chenzl25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mainly for future-proofing, in case we support batch share plan. Then the predicate inside BatchIcebergScan
will always be empty during the share plan optimization pass, so we won't need to compare against it.
And the predicate will be held all the way by BatchFilter
, as a rw predicate. That way only after any share plan optimization pass in batch, then we push down the rw predicate to the iceberg one, in a single step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it: 115e2ef. I think it makes more sense.
Besides preserving the semantics of Eq
and Hash
for Batch Share Plan
(if any in the future), we also avoid multiple pushdown passes just for iceberg. We can just do it in one shot at the end, and benefit from other optimization pass like constant folding of RW.
e4811b5
to
0d2f661
Compare
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.