[HUDI-9480] Add Expression Index Support #25
voonhous wants to merge 3 commits into onehouseinc:master
Could the user also explicitly disable the expression index if there is an issue or regression? If so, let's revise the config description.
Yes, all configurations that are added can be explicitly disabled.
Use Preconditions#checkArgument
nit: pull arguments.size() == 1 out of the condition check and put it into Preconditions#checkArgument; similar for other else if branches.
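A minimal sketch of what this nit could look like. The local `checkArgument` helper only mirrors the behaviour of Guava's `Preconditions.checkArgument(boolean, Object)` so that the example compiles without dependencies; the class name, the `day` function and both `convert*` methods are hypothetical and not taken from the PR.

```java
import java.util.List;

final class CheckArgumentSketch {
    // Stand-in for Guava's Preconditions.checkArgument(boolean, Object).
    static void checkArgument(boolean condition, Object message) {
        if (!condition) {
            throw new IllegalArgumentException(String.valueOf(message));
        }
    }

    // Before: the arity check is part of the branch condition, so a call
    // with the wrong number of arguments silently falls through.
    static String convertBefore(String functionName, List<String> arguments) {
        if ("day".equalsIgnoreCase(functionName) && arguments.size() == 1) {
            return "day(" + arguments.get(0) + ")";
        }
        return "unsupported";
    }

    // After: the branch matches on the name only and the arity is asserted.
    static String convertAfter(String functionName, List<String> arguments) {
        if ("day".equalsIgnoreCase(functionName)) {
            checkArgument(arguments.size() == 1, "day expects exactly one argument");
            return "day(" + arguments.get(0) + ")";
        }
        return "unsupported";
    }

    public static void main(String[] args) {
        System.out.println(convertAfter("day", List.of("ts")));
    }
}
```

Note that this changes behaviour: a wrong argument count now fails loudly instead of falling through to a later branch, which may or may not be what the converter wants.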
Could unit tests be added on this class given this is critical?
Is there any inefficiency in having two levels of expression during predicate evaluation, instead of something like a Predicates.neq?
This should be negligible as it will only be evaluated at most once. This is written the way it is as Predicates belongs to Hudi project scope: org.apache.hudi.expression.Predicates and does not have the neq function.
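As a rough illustration of the "two levels" being discussed, the sketch below composes a not-equal check from `not` and `eq`. These are toy stand-ins built on `java.util.function.Predicate`, not Hudi's `org.apache.hudi.expression.Predicates` API, and null handling is deliberately ignored.

```java
import java.util.Map;
import java.util.function.Predicate;

final class NotEqualSketch {
    // Toy equality predicate over a row represented as column -> value.
    static Predicate<Map<String, Integer>> eq(String column, int value) {
        return row -> Integer.valueOf(value).equals(row.get(column));
    }

    // Toy negation wrapper.
    static <T> Predicate<T> not(Predicate<T> inner) {
        return inner.negate();
    }

    // With no dedicated neq building block, "column <> value" becomes not(eq(...)).
    static Predicate<Map<String, Integer>> neq(String column, int value) {
        return not(eq(column, value));
    }

    public static void main(String[] args) {
        System.out.println(neq("d", 3).test(Map.of("d", 4)));
    }
}
```

The extra wrapper costs one additional call per evaluation, which is consistent with the reply that the overhead should be negligible.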
In the case of "IN"(column, ARRAY[val1, val2...]), does the second argument only contain ARRAY[val1, val2...], and does Hudi's Predicates support it?
Or should ARRAY[val1, val2...] be unfolded to adapt to Hudi's Predicates?
Addressed this in the new commit. However, IN operators are not used for expression index file skipping for now, i.e. not used in io.trino.plugin.hudi.expression.HudiColumnStatsIndexEvaluator
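For readers following the thread, one way the unfolding could work is sketched below. It assumes the IN call arrives as two arguments, a column and a single array of values, and flattens them into one list; the class and method names are hypothetical and the actual commit may handle this differently.

```java
import java.util.ArrayList;
import java.util.List;

final class InUnfoldSketch {
    // Assumed input shape: [column, ARRAY[val1, val2, ...]].
    // Output shape: [column, val1, val2, ...].
    static List<Object> unfoldInArguments(List<Object> callArguments) {
        if (callArguments.size() != 2 || !(callArguments.get(1) instanceof List)) {
            throw new IllegalArgumentException("IN expects a column and an array of values");
        }
        List<Object> flattened = new ArrayList<>();
        flattened.add(callArguments.get(0));
        flattened.addAll((List<?>) callArguments.get(1));
        return flattened;
    }

    public static void main(String[] args) {
        List<Object> call = List.of("col", List.of(1, 2, 3));
        System.out.println(unfoldInArguments(call));
    }
}
```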
String name can be removed as it is not used.
Use getName for matching? A static import of TRINO_FN_NAME can easily make it confusing.
else if (HudiTrinoDayFunctionExpression.TRINO_FN_NAME.equalsIgnoreCase(functionName.getName())) {
else if (HudiTrinoDayFunctionExpression.getName().equalsIgnoreCase(functionName.getName())) {
This is called before initializing an instance, so, a static function is required regardless. Will change TRINO_FN_NAME to getTrinoFnName().
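A small sketch of the pattern described in this reply: the function name is exposed through a static accessor so it can be matched before any instance exists. `DayFunctionExpression` is a hypothetical stand-in for a class such as `HudiTrinoDayFunctionExpression`, and `convert` is an illustrative dispatcher, not the PR's code.

```java
import java.util.Optional;

final class StaticNameMatchSketch {
    // Hypothetical expression class; only the static accessor matters here.
    static final class DayFunctionExpression {
        private static final String TRINO_FN_NAME = "day";
        private final String argument;

        DayFunctionExpression(String argument) {
            this.argument = argument;
        }

        // Static, because callers match on the name before constructing anything.
        static String getTrinoFnName() {
            return TRINO_FN_NAME;
        }

        @Override
        public String toString() {
            return TRINO_FN_NAME + "(" + argument + ")";
        }
    }

    // The class-qualified call makes it obvious whose name is being compared.
    static Optional<Object> convert(String functionName, String argument) {
        if (DayFunctionExpression.getTrinoFnName().equalsIgnoreCase(functionName)) {
            return Optional.of(new DayFunctionExpression(argument));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(convert("DAY", "ts"));
    }
}
```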
Does this general expression get evaluated without problem?
This is supposed to be a catch-all, similar to the default function, which will be evaluated but will not be used downstream.
I am currently trying to get nested types to be evaluated into this branch.
TLDR: There are some issues that will cause this general expression to not be evaluated/reached.
For the specific functions for expression index, should they be put into a separate method for handling so it's more clear?
I was thinking of this, but I'm not really sure how to do it, as initialisation of the concrete implementation is currently a one-liner. It is concise and readable, and I can't think of a better way to make it more readable.
Does this collection only contain one or no entry based on the logic right now?
tryAddAsCandidate will attempt to add candidates, so the collection will contain zero or more entries based on the logic.
The smoke tests that were added are examples of cases where there is more than one entry.
Could the ExpressionConverter be a singleton and expose a method to convert the expression, and let the upper layer handle whether they should be used with expression index?
So basically, all the predicates or expressions are still kept and re-evaluated by Trino, correct?
Yes, what we have added is outside the scope of what Trino uses, so it will not affect Trino's optimizations.
If the predicate is in the form of day(ts) > 1 and day(ts) < 10, it seems like it is not a candidate for the expression index. Only day(ts) > 1 is. Is that the case, i.e., complex or nested predicate with expression wrapped might not be eligible for expression index pruning?
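To make the question concrete, the sketch below shows why it matters which conjuncts reach the index. It assumes per-file min/max statistics on `day(ts)`; the class, method names and numbers are illustrative only and say nothing about how the PR actually behaves.

```java
final class DayRangePruningSketch {
    // A file may contain rows with day(ts) > lower only if its max exceeds lower.
    static boolean mayMatchGreaterThan(int fileMin, int fileMax, int lower) {
        return fileMax > lower;
    }

    // A file may contain rows with day(ts) < upper only if its min is below upper.
    static boolean mayMatchLessThan(int fileMin, int fileMax, int upper) {
        return fileMin < upper;
    }

    // Both conjuncts of "day(ts) > lower AND day(ts) < upper" must be satisfiable.
    static boolean mayMatchRange(int fileMin, int fileMax, int lower, int upper) {
        return mayMatchGreaterThan(fileMin, fileMax, lower)
                && mayMatchLessThan(fileMin, fileMax, upper);
    }

    public static void main(String[] args) {
        // File whose day(ts) values span [12, 20].
        System.out.println(mayMatchGreaterThan(12, 20, 1));
        System.out.println(mayMatchRange(12, 20, 1, 10));
    }
}
```

For a file whose `day(ts)` values span 12 to 20, the single conjunct `day(ts) > 1` keeps the file, while the full range `day(ts) > 1 and day(ts) < 10` would allow it to be skipped, so dropping the second conjunct costs pruning opportunities but not correctness.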
Resolved merge conflicts, going through the comments incrementally.
- Fix rebase errors
- Added unit tests for ExpressionConverter
- Fix checkstyle
- Address comments