
feat(frontend): support iceberg predicate pushdown #19228

Open: wants to merge 39 commits into main


kwannoel (Contributor) commented Nov 1, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

kwannoel: This comment was marked as resolved.


```rust
if splits.is_empty() {
    bail!("No splits found for the iceberg table");
}
```
kwannoel (Contributor Author):

I'm wondering whether it's reasonable to find 0 splits, e.g. when querying an empty Iceberg table.
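
To make the question concrete, here is a minimal std-only sketch contrasting the check above with simply iterating over whatever splits exist. `Split`, `scan_strict`, and `scan_lenient` are hypothetical names, a `String` error stands in for `anyhow`'s `bail!`, and it assumes an empty table should return zero rows rather than an error, which the thread leaves open.

```rust
// Hypothetical stand-in for an Iceberg split; not RisingWave's split type.
struct Split {
    rows: Vec<i64>,
}

/// Mirrors the check in the excerpt: zero splits is treated as an error.
/// (A plain `String` error stands in for `anyhow`'s `bail!`.)
fn scan_strict(splits: &[Split]) -> Result<Vec<i64>, String> {
    if splits.is_empty() {
        return Err("No splits found for the iceberg table".to_string());
    }
    Ok(scan_lenient(splits))
}

/// Alternative: read whatever splits exist. With zero splits the iterator is
/// empty, so an empty table simply yields no rows.
fn scan_lenient(splits: &[Split]) -> Vec<i64> {
    splits.iter().flat_map(|s| s.rows.iter().copied()).collect()
}

fn main() {
    let empty: Vec<Split> = Vec::new();
    println!("{:?}", scan_strict(&empty));
    println!("{:?}", scan_lenient(&empty));
}
```

With the lenient form, no special case is needed: an empty scan falls out of the iteration naturally.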

```rust
pub struct BatchIcebergScan {
    pub base: PlanBase<Batch>,
    pub core: generic::Source,
    #[educe(Hash(ignore))]
    pub predicate: IcebergPredicate,
    // ... (rest of the diff not shown in this excerpt)
```
kwannoel (Contributor Author):

`Hash` and `Eq` are only required for the streaming share plan, but down the road we may support a batch share plan. In that case, every batch plan node that customizes these derives with Educe would need to be audited.
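
To make the audit concern concrete, a minimal std-only sketch with hand-written impls (the `ScanNode`, `LooseScanNode`, and `distinct_count` names are hypothetical, not RisingWave types). Skipping a field in `Hash` alone, which is what `#[educe(Hash(ignore))]` does, only causes hash collisions. If `Eq` also skipped it, a hash-based dedup such as a share-plan pass could merge scans with different predicates. The excerpt above only shows `Hash(ignore)`, so whether `Eq` is affected is not confirmed here.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical scan node: `predicate` is excluded from Hash (like
// `#[educe(Hash(ignore))]`) but still compared by the derived PartialEq/Eq.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ScanNode {
    table: String,
    predicate: Option<String>,
}

impl Hash for ScanNode {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.table.hash(state); // `predicate` deliberately skipped
    }
}

// Hypothetical variant where `predicate` is ignored by Eq as well.
#[derive(Debug, Clone)]
#[allow(dead_code)]
struct LooseScanNode {
    table: String,
    predicate: Option<String>,
}

impl PartialEq for LooseScanNode {
    fn eq(&self, other: &Self) -> bool {
        self.table == other.table // `predicate` deliberately skipped
    }
}
impl Eq for LooseScanNode {}

impl Hash for LooseScanNode {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.table.hash(state);
    }
}

/// Number of distinct subplans after hash-based deduplication, roughly what
/// a share-plan pass does when it merges identical subtrees.
fn distinct_count<T: Hash + Eq>(nodes: Vec<T>) -> usize {
    nodes.into_iter().collect::<HashSet<T>>().len()
}

fn main() {
    let strict = vec![
        ScanNode { table: "t".into(), predicate: Some("a > 1".into()) },
        ScanNode { table: "t".into(), predicate: Some("a < 0".into()) },
    ];
    let loose = vec![
        LooseScanNode { table: "t".into(), predicate: Some("a > 1".into()) },
        LooseScanNode { table: "t".into(), predicate: Some("a < 0".into()) },
    ];
    // Hash-only ignore: same hash bucket, but Eq keeps the scans apart.
    println!("{}", distinct_count(strict)); // 2
    // Hash + Eq ignore: two different scans collapse into one shared subplan.
    println!("{}", distinct_count(loose)); // 1
}
```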

```rust
fn predicate_pushdown(
    &self,
    predicate: Condition,
    _ctx: &mut PredicatePushdownContext,
) -> PlanRef {
    fn rw_literal_to_iceberg_datum(literal: &Literal) -> Option<IcebergDatum> {
        // ... (rest of the diff not shown in this excerpt)
```
kwannoel (Contributor Author) commented Nov 8, 2024:

I'm wondering whether it's better to add a separate optimization pass at the end of batch optimization, instead of implementing this inside `predicate_pushdown`, to pattern-match:

BatchFilter -> BatchIcebergScan

Then rewrite both nodes, pushing the predicate held by `BatchFilter` down into `BatchIcebergScan`.

WDYT, @chenzl25?
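
A rough sketch of such a `BatchFilter -> BatchIcebergScan` rewrite pass, using hypothetical, heavily simplified plan and expression types rather than RisingWave's `PlanRef`/`Condition` API. Conjuncts whose literals convert (an `Option`-returning `rw_literal_to_iceberg_datum`, modeled on the signature in the excerpt) move into the scan; the rest stay in the filter. It assumes the scan applies pushed predicates exactly; if they were only used for file or row-group pruning, every conjunct would also have to stay in the filter.

```rust
// Hypothetical, simplified types; not RisingWave's PlanRef/Condition API.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i64),
    Interval, // payload omitted; assumed to have no Iceberg counterpart
}

#[derive(Debug, Clone, PartialEq)]
enum IcebergDatum {
    Long(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum CmpOp {
    Lt,
    Gt,
}

/// One RisingWave-side conjunct of the filter condition.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Cmp { column: String, op: CmpOp, value: Literal },
    Other(String), // e.g. a UDF call that cannot be pushed down
}

#[derive(Debug, Clone, PartialEq)]
struct IcebergPredicate {
    column: String,
    op: CmpOp,
    value: IcebergDatum,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Filter { conjuncts: Vec<Expr>, input: Box<Plan> },
    IcebergScan { table: String, predicates: Vec<IcebergPredicate> },
}

fn rw_literal_to_iceberg_datum(literal: &Literal) -> Option<IcebergDatum> {
    match literal {
        Literal::Int(v) => Some(IcebergDatum::Long(*v)),
        Literal::Interval => None,
    }
}

fn to_iceberg(expr: &Expr) -> Option<IcebergPredicate> {
    match expr {
        Expr::Cmp { column, op, value } => Some(IcebergPredicate {
            column: column.clone(),
            op: op.clone(),
            value: rw_literal_to_iceberg_datum(value)?,
        }),
        Expr::Other(_) => None,
    }
}

/// Matches `Filter -> IcebergScan`: convertible conjuncts move into the scan
/// (assuming exact filtering there), the rest stay in the filter, and an
/// empty filter is dropped.
fn push_filter_into_iceberg_scan(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { conjuncts, input } => match *input {
            Plan::IcebergScan { table, mut predicates } => {
                let mut remaining = Vec::new();
                for conjunct in conjuncts {
                    match to_iceberg(&conjunct) {
                        Some(p) => predicates.push(p),
                        None => remaining.push(conjunct),
                    }
                }
                let scan = Plan::IcebergScan { table, predicates };
                if remaining.is_empty() {
                    scan
                } else {
                    Plan::Filter { conjuncts: remaining, input: Box::new(scan) }
                }
            }
            // Not directly over a scan: recurse into the child.
            other => Plan::Filter {
                conjuncts,
                input: Box::new(push_filter_into_iceberg_scan(other)),
            },
        },
        scan => scan,
    }
}

fn main() {
    let plan = Plan::Filter {
        conjuncts: vec![
            Expr::Cmp { column: "a".into(), op: CmpOp::Gt, value: Literal::Int(3) },
            Expr::Cmp { column: "b".into(), op: CmpOp::Lt, value: Literal::Interval },
            Expr::Other("my_udf(c)".into()),
        ],
        input: Box::new(Plan::IcebergScan { table: "t".into(), predicates: vec![] }),
    };
    // `a > 3` moves into the scan; the other two conjuncts stay in the filter.
    println!("{:#?}", push_filter_into_iceberg_scan(plan));
}
```

Because the pass runs once, after the rest of batch optimization, the scan's predicate stays empty until this point, which is what keeps `Hash`/`Eq` comparisons on the scan node simple.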

kwannoel (Contributor Author):

It's mainly for future-proofing, in case we support a batch share plan. The predicate inside `BatchIcebergScan` would then always be empty during the share-plan optimization pass, so we wouldn't need to compare it.

Instead, the predicate would be held by `BatchFilter` as a RisingWave predicate the whole way through. Only after any batch share-plan optimization pass would we convert the RisingWave predicate into an Iceberg one and push it down, in a single step.

kwannoel (Contributor Author):

Changed it in 115e2ef; I think this approach makes more sense.

Besides preserving the semantics of `Eq` and `Hash` for a (future) batch share plan, this also avoids running multiple pushdown passes just for Iceberg. We can do it in one shot at the end and benefit from RisingWave's other optimization passes, such as constant folding.
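
The constant-folding benefit can be illustrated with a small std-only sketch. The `Expr` type, `fold`, and the rule that only `column > integer literal` is convertible are hypothetical simplifications, not RisingWave's expression framework; the point is only that a conjunct like `a > 1 + 2` becomes pushable once folding has rewritten it to `a > 3`.

```rust
// Hypothetical expression type; not RisingWave's ExprImpl.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

/// Bottom-up constant folding: `Int + Int` collapses into a single `Int`.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Int(a), Expr::Int(b)) => match a.checked_add(b) {
                Some(sum) => Expr::Int(sum),
                // Leave overflowing additions unfolded.
                None => Expr::Add(Box::new(Expr::Int(a)), Box::new(Expr::Int(b))),
            },
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Gt(l, r) => Expr::Gt(Box::new(fold(*l)), Box::new(fold(*r))),
        leaf => leaf,
    }
}

/// Hypothetical conversion rule: only `column > integer literal` maps to an
/// Iceberg predicate, returned here as (column, bound).
fn to_iceberg(expr: &Expr) -> Option<(String, i64)> {
    match expr {
        Expr::Gt(l, r) => match (l.as_ref(), r.as_ref()) {
            (Expr::Column(c), Expr::Int(v)) => Some((c.clone(), *v)),
            _ => None,
        },
        _ => None,
    }
}

fn main() {
    // a > 1 + 2
    let pred = Expr::Gt(
        Box::new(Expr::Column("a".into())),
        Box::new(Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Int(2)))),
    );
    println!("{:?}", to_iceberg(&pred)); // None: right side is not a literal yet
    println!("{:?}", to_iceberg(&fold(pred))); // Some(("a", 3))
}
```

Running the Iceberg conversion after folding means the converter only has to recognize simple `column op literal` shapes.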

kwannoel marked this pull request as ready for review on November 8, 2024, 05:46.
kwannoel force-pushed the kwannoel/iceberg-predicate-pushdown branch from e4811b5 to 0d2f661 on November 8, 2024, 05:49.
graphite-app (bot) requested a review from a team on November 8, 2024, 08:55.