feat(frontend): support iceberg predicate pushdown #19228

kwannoel · 2024-11-01T07:38:16Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added test labels as necessary. See details.
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
All checks passed in ./risedev check (or alias, ./risedev c)
My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

kwannoel · 2024-11-08T05:35:48Z

src/connector/src/source/iceberg/mod.rs

+
+        if splits.is_empty() {
+            bail!("No splits found for the iceberg table");
+        }


I'm wondering whether it's reasonable for 0 splits to be found, e.g. when querying an empty Iceberg table.
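To make the question concrete, here is a minimal sketch of the alternative, where an empty table yields zero splits instead of an error. The `Split` type and `plan_splits` function are hypothetical stand-ins for illustration, not the connector's real types, and the round-robin assignment is an assumption.

```rust
/// Hypothetical stand-in for an iceberg split; not the connector's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Split {
    pub files: Vec<String>,
}

/// Distribute data files round-robin over at most `parallelism` splits.
/// An empty file list returns an empty Vec instead of an error,
/// so the caller decides whether "no splits" is a failure.
pub fn plan_splits(files: &[String], parallelism: usize) -> Vec<Split> {
    if files.is_empty() || parallelism == 0 {
        return Vec::new();
    }
    // Never create more splits than there are files.
    let n = parallelism.min(files.len());
    let mut splits = vec![Split { files: Vec::new() }; n];
    for (i, f) in files.iter().enumerate() {
        splits[i % n].files.push(f.clone());
    }
    splits
}
```

With this shape, a scan over an empty table would simply produce no rows. Whether that is preferable to the `bail!` in the diff is the open question here.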

kwannoel · 2024-11-08T05:37:21Z

src/frontend/src/optimizer/plan_node/batch_iceberg_scan.rs

 pub struct BatchIcebergScan {
    pub base: PlanBase<Batch>,
    pub core: generic::Source,
+    #[educe(Hash(ignore))]
+    pub predicate: IcebergPredicate,


Hash and Eq are only required for the streaming share plan. But down the road we may support a batch share plan, in which case every batch plan node that uses Educe would need to be audited.
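For readers unfamiliar with the attribute, the sketch below shows by hand, using only std, roughly what leaving a field out of Hash amounts to. `ToyScan` and `hash_of` are hypothetical names, and keeping the field in Eq is an assumption (the diff only shows the Hash attribute).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a scan node; `predicate` is a plain string here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyScan {
    pub table: String,
    pub predicate: String,
}

// Hand-written equivalent of skipping one field when hashing.
impl Hash for ToyScan {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.table.hash(state); // `predicate` is deliberately left out
    }
}

/// Hash a value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two `ToyScan` values that differ only in `predicate` hash identically but still compare unequal. That keeps the Hash/Eq contract intact (equal values always hash the same), but any pass that deduplicates nodes by hash and equality needs to be aware of the ignored field, which is why an audit would be needed.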

kwannoel · 2024-11-08T05:46:22Z

src/frontend/src/optimizer/plan_node/logical_iceberg_scan.rs

    fn predicate_pushdown(
        &self,
        predicate: Condition,
        _ctx: &mut PredicatePushdownContext,
    ) -> PlanRef {
+        fn rw_literal_to_iceberg_datum(literal: &Literal) -> Option<IcebergDatum> {


I'm wondering whether it would be better to add a separate optimization pass at the end of batch optimization, instead of implementing this inside predicate_pushdown. The pass would pattern match:

BatchFilter -> BatchIcebergScan

and rewrite both nodes, pushing the predicate inside BatchFilter down into BatchIcebergScan.

wdyt @chenzl25

It's mainly for future-proofing, in case we support a batch share plan. The predicate inside BatchIcebergScan would then always be empty during the share plan optimization pass, so we wouldn't need to compare against it.

The predicate would be held by BatchFilter as a RW predicate the whole way. Only after any share plan optimization pass in batch would we push the RW predicate down to the Iceberg one, in a single step.

Changed it: 115e2ef. I think it makes more sense.

Besides preserving the semantics of Eq and Hash for a batch share plan (if we add one in the future), we also avoid multiple pushdown passes just for Iceberg. We can do it in one shot at the end and benefit from other RW optimization passes, such as constant folding.
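The rule described above can be sketched with toy types. Everything here (`Expr`, `ToyIcebergPredicate`, `Plan`, `to_iceberg`, `push_down`) is hypothetical and much simpler than the real plan nodes. In particular, dropping the pushed conjuncts from the filter and removing the filter when nothing remains are assumptions based on the commit titles, not a copy of the actual rule.

```rust
/// Toy RW expression: only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// `column > literal` on an integer column.
    GreaterThan(String, i64),
    /// `column = literal` on an integer column.
    Equal(String, i64),
    /// Anything the toy converter cannot translate.
    Opaque(String),
}

/// Toy pushed-down predicate.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyIcebergPredicate {
    AlwaysTrue,
    GreaterThan(String, i64),
    Equal(String, i64),
    And(Box<ToyIcebergPredicate>, Box<ToyIcebergPredicate>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { predicate: ToyIcebergPredicate },
    Filter { conjunctions: Vec<Expr>, input: Box<Plan> },
}

/// Convert one conjunct; `None` means it must stay in the filter.
fn to_iceberg(expr: &Expr) -> Option<ToyIcebergPredicate> {
    match expr {
        Expr::GreaterThan(c, v) => Some(ToyIcebergPredicate::GreaterThan(c.clone(), *v)),
        Expr::Equal(c, v) => Some(ToyIcebergPredicate::Equal(c.clone(), *v)),
        Expr::Opaque(_) => None,
    }
}

/// Rewrite `Filter -> Scan` in one step; leave every other shape untouched.
pub fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { conjunctions, input } => match *input {
            Plan::Scan { predicate } => {
                let mut pushed = predicate;
                let mut kept = Vec::new();
                for expr in conjunctions {
                    match to_iceberg(&expr) {
                        // AND the converted conjunct onto the scan predicate.
                        Some(p) => {
                            pushed = match pushed {
                                ToyIcebergPredicate::AlwaysTrue => p,
                                other => ToyIcebergPredicate::And(Box::new(other), Box::new(p)),
                            }
                        }
                        None => kept.push(expr),
                    }
                }
                let scan = Plan::Scan { predicate: pushed };
                if kept.is_empty() {
                    scan // the remaining filter is always true, so prune it
                } else {
                    Plan::Filter { conjunctions: kept, input: Box::new(scan) }
                }
            }
            other => Plan::Filter { conjunctions, input: Box::new(other) },
        },
        other => other,
    }
}
```

Because the scan's predicate stays `AlwaysTrue` until this final rewrite, earlier passes only ever see the RW predicate held by the filter.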

github-actions bot added the type/feature label Nov 1, 2024

kwannoel added ci/run-e2e-iceberg-sink-tests ci/main-cron/run-selected ci/pr/run-selected labels Nov 7, 2024

kwannoel force-pushed the kwannoel/iceberg-predicate-pushdown branch from f14d851 to 8572e75 Compare November 7, 2024 03:35

kwannoel marked this pull request as ready for review November 8, 2024 05:46

kwannoel removed the ci/pr/run-selected label Nov 8, 2024

kwannoel added 19 commits November 8, 2024 13:49

match rw predicates

a0656ff

mark place to add filter

a105859

pass in schema fields as a parameter

7016979

convert input ref to reference, datum to iceberg datum

b0c00c0

convert rw expressions into iceberg predicates

f07e235

add support for more literals

aa7e899

add predicate proto

9b9bac0

interim commit: use iceberg proto

31dff2f

change predicate_pushdown return

11d5930

derive eq, hash for iceberg predicate

f648af3

interim commit: add iceberg_predicate to batch

eed2ed2

add fetch_parameters

ed6f73d

Revert "derive eq, hash for iceberg predicate"

5fb182b

This reverts commit 706b9d8.

Revert "change predicate_pushdown return"

6dea2bc

This reverts commit d55dbc3.

Revert "interim commit: use iceberg proto"

33b6103

This reverts commit 0bef41b.

Revert "add predicate proto"

b53d205

This reverts commit eb49717.

Revert "interim commit: add iceberg_predicate to batch"

287b032

This reverts commit 4859705.

use iceberg predicate in logical_iceberg_scan fields

060b6f1

add to batch

1e23059

kwannoel added 15 commits November 8, 2024 13:49

build with predicate

34392c8

clean

1dc7fd5

implement distill

28389ee

fix warn

57f0c33

add tests

6f41c2e

no verbose for wget

0c9da00

more tests

693b608

check results

e76d65d

fix bugs

5672443

explain source plan, maybe schema malformed

d742365

fix tests

519c115

fmt

f4e9633

increase timeout

32823b5

no need double assert

665b46f

docs

0d2f661

kwannoel force-pushed the kwannoel/iceberg-predicate-pushdown branch from e4811b5 to 0d2f661 Compare November 8, 2024 05:49

kwannoel added 2 commits November 8, 2024 15:44

fmt

a7c98a2

increase timeout

f394302

graphite-app bot requested a review from a team November 8, 2024 08:55

kwannoel added 3 commits November 8, 2024 19:38

use rule based predicate push down

115e2ef

prune BatchFilter if predicate always true

e60b4e4

test mix filter and predicate

326cd17
