-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38258][model] support model args in ptf and add ml_predict builtin ptf #26924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
2eeadb8
to
ed213d6
Compare
@@ -147,7 +159,9 @@ private static void checkScalarArgsOnly(List<StaticArgument> defaultArgs) { | |||
checkPassThroughColumns(declaredArgs); | |||
|
|||
final List<StaticArgument> newStaticArgs = new ArrayList<>(declaredArgs); | |||
newStaticArgs.addAll(PROCESS_TABLE_FUNCTION_SYSTEM_ARGS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious what the system arguments mean. Is this something that the user needs to be aware of? I do not see this phrase in the Flip and there is no more information in the Jira. I suggest including a description and motivation behind this piece. It appears to be a type of static arg that will be added if the boolean flag is on, but I am not sure when this would/should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to control whether uid
, ontime
field will be added to ptf input. This is currently used by ml_predict
because it doesn't need uid
and ontime
field. It's not exposed to PTF function user can define. Yes. I can add more description if this approach makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in not entirely correct. A user-defined PTF can implement a TypeInference and avoid system args, but this is kind of second-level API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @lihaosky! I added some comments to improve the contribution a bit but nothing major.
.../main/java/org/apache/flink/table/expressions/resolver/rules/ResolveCallByArgumentsRule.java
Outdated
Show resolved
Hide resolved
@@ -298,6 +298,11 @@ public Builder staticArguments(StaticArgument... staticArguments) { | |||
return this; | |||
} | |||
|
|||
public Builder allowSystemArguments(boolean allowSystemArguments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's call this disableSystemArguments
and in TypeInference. By default, this then can be false.
...-table/flink-table-common/src/main/java/org/apache/flink/table/functions/ModelSemantics.java
Outdated
Show resolved
Hide resolved
DataType inputDataType(); | ||
|
||
/** | ||
* Output data type produced by the passed model. Extracting type from PTF class definition is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Output data type produced by the passed model. Extracting type from PTF class definition is | |
* Output data type produced by the passed model. |
if (tableSemantics == null) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"First argument must be a table for ML_PREDICT function."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An input type strategy is optional if static arguments have been declared. You can assume that this check has been done already. An input type strategy might only be useful if you want to do additional validation, like validateTableAndDescriptorArguments below
import org.apache.flink.table.types.DataType; | ||
|
||
/** Mock implementation of {@link ModelSemantics} for testing purposes. */ | ||
public class ModelSemanticsMock implements ModelSemantics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move next to CallContextMock into utils package
@@ -62,6 +65,7 @@ public final class CallBindingCallContext extends AbstractSqlCallContext { | |||
private final List<DataType> argumentDataTypes; | |||
private final @Nullable DataType outputType; | |||
private final @Nullable List<StaticArgument> staticArguments; | |||
private final SqlValidator validator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validator does not really fit in here, can we avoid it? SqlModelCall should have resolved types already. I think TableArgCall has the same? we should synchronize the two if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored
@@ -1327,7 +1326,7 @@ public List<SqlGroupedWindowFunction> getAuxiliaryFunctions() { | |||
public static final SqlFunction SESSION = new SqlSessionTableFunction(); | |||
|
|||
// MODEL TABLE FUNCTIONS | |||
public static final SqlFunction ML_PREDICT = new SqlMLPredictTableFunction(); | |||
// public static final SqlFunction ML_PREDICT = new SqlMLPredictTableFunction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
...table-planner/src/main/java/org/apache/flink/table/planner/plan/QueryOperationConverter.java
Outdated
Show resolved
Hide resolved
util.addTemporarySystemFunction("f", NoSystemArgsTableFunction.class); | ||
assertThatThrownBy(() -> util.verifyRelPlan("SELECT * FROM f(r => TABLE t, i => 1);")) | ||
.satisfies( | ||
anyCauseMatches("Disabling uid/time attributes is not supported for PTF.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyCauseMatches("Disabling uid/time attributes is not supported for PTF.")); | |
anyCauseMatches("Disabling system arguments is not supported for user-defined PTF yet.")); |
if (tableSemantics == null) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"First argument must be a table for ML_PREDICT function."); | ||
} else { | ||
return Optional.empty(); | ||
} | ||
} | ||
|
||
// Check that second argument is a model | ||
ModelSemantics modelSemantics = callContext.getModelSemantics(1).orElse(null); | ||
if (modelSemantics == null) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"Second argument must be a model for ML_PREDICT function."); | ||
} else { | ||
return Optional.empty(); | ||
} | ||
} | ||
|
||
// Check that third argument is a descriptor with column names | ||
Optional<ColumnList> descriptorColumns = callContext.getArgumentValue(2, ColumnList.class); | ||
if (descriptorColumns.isEmpty()) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"Third argument must be a descriptor with simple column names for ML_PREDICT function."); | ||
} else { | ||
return Optional.empty(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twalthr , I added these checks back since I have tests in MLPredictInputTypeStrategyTest
testing invalid argument as well. Also tableSemantics
etc are needed below. Maybe doesn't hurt to do extra check and give meaningful error message.
@flinkbot run azure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @lihaosky. I just added some final minor comments. The PR should be ready to merge in the next iteration.
@@ -117,6 +133,35 @@ public static InputTypeStrategy windowTimeIndicator() { | |||
and(logical(LogicalTypeFamily.CHARACTER_STRING), LITERAL), | |||
JSON_ARGUMENT)); | |||
|
|||
/** Input strategy for {@link BuiltInFunctionDefinitions#ML_PREDICT}. */ | |||
public static final InputTypeStrategy ML_PREDICT_INPUT_TYPE_STRATEGY = | |||
new InputTypeStrategy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually we put this strategy into a separate class in same package but with default scope. take ArrayComparableElementArgumentTypeStrategy as an example
@@ -194,6 +202,44 @@ public final class SpecificTypeStrategies { | |||
/** Type strategy specific for {@link BuiltInFunctionDefinitions#OBJECT_UPDATE}. */ | |||
public static final TypeStrategy OBJECT_UPDATE = new ObjectUpdateTypeStrategy(); | |||
|
|||
/** Type strategy specific for {@link BuiltInFunctionDefinitions#ML_PREDICT}. */ | |||
public static final TypeStrategy ML_PREDICT_OUTPUT_TYPE_STRATEGY = | |||
callContext -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually we put this strategy into a separate class in same package but with default scope. take ObjectUpdateTypeStrategy as an example.
} | ||
|
||
@Override | ||
public @Nullable RelNode convert(RelNode rel) { | ||
final FlinkLogicalTableFunctionScan scan = (FlinkLogicalTableFunctionScan) rel; | ||
final RexCall rexCall = (RexCall) scan.getCall(); | ||
validateAllowSystemArgs(rexCall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validating one time should be enough. So we can drop the call in StreamPhysicalProcessTableFunction.
void testNoSystemArgsAllowedForScalarPtf() { | ||
util.addTemporarySystemFunction("f", NoSystemArgsScalarFunction.class); | ||
assertThatThrownBy(() -> util.verifyRelPlan("SELECT * FROM f(i => 1);")) | ||
.satisfies(anyCauseMatches("Disabling system arguments is not supported for PTF.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.satisfies(anyCauseMatches("Disabling system arguments is not supported for PTF.")); | |
.satisfies(anyCauseMatches("Disabling system arguments is not supported for user-defined PTF.")); |
"SQL validation failed. Unsupported expression -1 is in runtime config at position line 2, column 109. Currently, runtime config should be be a MAP of string literals."); | ||
.hasCauseInstanceOf(ValidationException.class) | ||
.hasStackTraceContaining( | ||
"Config param of ML_PREDICT function should be a MAP of String literals."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Config param of ML_PREDICT function should be a MAP of String literals."); | |
"Config paramameter of ML_PREDICT function should be a MAP data type consisting string literals."); |
What is the purpose of the change
ml_predict
builtin ptf definitionBrief change log
ModelSemantics
inCallContext
model
inStaticArgument
disableSystemArguments
to control whether adduid
androwtime
in ptf input/outputml_predict
builtin ptfVerifying this change
ml_predict
type inferenceDoes this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation