Fix regexp filter query #2797
Signed-off-by: panguixin <panguixin@bytedance.com>
@@ -48,6 +48,15 @@ private ExprValue evaluateExpression(
return ExprBooleanValue.of(false);
}

// refer to https://github.com/opensearch-project/sql/issues/2796
Could you add some unit tests in ExpressionFilterScriptTest?
if (result.integerValue() == 0) {
result = ExprBooleanValue.of(false);
} else if (result.integerValue() == 1) {
result = ExprBooleanValue.of(true);
This check will cause an additional issue: select * from %s where 1 * 1 will be treated as where true, and select * from %s where 1 * 0 will be treated as where false. Both should throw exceptions.
How about changing the return value type to ExprBooleanValue in sql/core/src/main/java/org/opensearch/sql/utils/OperatorUtils.java (lines 37 to 40 in 4326396)?
public static ExprIntegerValue matchesRegexp(ExprValue text, ExprValue pattern) {
return new ExprIntegerValue(
Pattern.compile(pattern.stringValue()).matcher(text.stringValue()).matches() ? 1 : 0);
}
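To make the concern about a blanket 0/1 check concrete, here is a minimal sketch. It uses plain Java objects as hypothetical stand-ins for the project's ExprValue types, and the names coerceAnyInteger and evaluateFilter are invented for illustration; it is not the project's actual filter code.

```java
// Hypothetical stand-ins: Integer and Boolean replace ExprIntegerValue and ExprBooleanValue.
final class FilterCoercionSketch {

  // Mirrors a blanket check: any integer result equal to 0 or 1 becomes a boolean.
  static Object coerceAnyInteger(Object result) {
    if (result instanceof Integer) {
      int value = (Integer) result;
      if (value == 0) {
        return Boolean.FALSE;
      }
      if (value == 1) {
        return Boolean.TRUE;
      }
    }
    return result;
  }

  // A filter must evaluate to a boolean; anything else is rejected.
  static boolean evaluateFilter(Object result) {
    if (result instanceof Boolean) {
      return (Boolean) result;
    }
    throw new IllegalStateException("Filter expression must be boolean, got: " + result);
  }

  public static void main(String[] args) {
    Object arithmetic = 1 * 1; // what "where 1 * 1" evaluates to

    // With the blanket coercion, the arithmetic result silently passes as a boolean.
    System.out.println(evaluateFilter(coerceAnyInteger(arithmetic)));

    // Without the coercion it is rejected, which is what the reviewer expects.
    try {
      evaluateFilter(arithmetic);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of the sketch is that the coercion cannot tell an arithmetic result from a REGEXP result once both are plain integers.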
If we modify the return type of OperatorUtils#matchesRegexp, it will change the return type of the REGEXP function, which could be a breaking change. For instance, the query SELECT 'Hello!' REGEXP '.*', 'a' REGEXP 'b' currently yields 1,0 as the output. After modifying the return type, the output would be true,false.
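A minimal sketch of the difference described in this comment, assuming plain Java types in place of the project's ExprIntegerValue and ExprBooleanValue. The method names matchesRegexpAsInt and matchesRegexpAsBoolean are hypothetical; only the int-returning logic follows the OperatorUtils code quoted in this thread.

```java
import java.util.regex.Pattern;

final class RegexpReturnTypeSketch {

  // Current behaviour per the quoted code: 1 for a full match, 0 otherwise.
  static int matchesRegexpAsInt(String text, String pattern) {
    return Pattern.compile(pattern).matcher(text).matches() ? 1 : 0;
  }

  // Proposed alternative (hypothetical): return the boolean match result directly.
  static boolean matchesRegexpAsBoolean(String text, String pattern) {
    return Pattern.compile(pattern).matcher(text).matches();
  }

  public static void main(String[] args) {
    // The two expressions from the example query.
    System.out.println(
        matchesRegexpAsInt("Hello!", ".*") + "," + matchesRegexpAsInt("a", "b"));
    System.out.println(
        matchesRegexpAsBoolean("Hello!", ".*") + "," + matchesRegexpAsBoolean("a", "b"));
  }
}
```

Any client that parses the SELECT output as a number would see a different value type after such a change, which is why it counts as breaking.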
@LantaoJin only when the expression is a REGEXP expression will the conversion be executed. What do you think?
Sounds good. The conversion scope should be limited to REGEXP expressions.
I have fixed it. Please take another look at your convenience.
Could you explain why REGEXP should return an integer? Judging from the Javadoc, the implementation seems wrong. I think we should avoid adding such special logic on the critical path if possible.
/**
* Checks if text matches regular expression pattern.
*
* @param pattern string pattern to match.
* @return if text matches pattern returns true; else return false.
*/
public static ExprIntegerValue matchesRegexp(ExprValue text, ExprValue pattern) {
return new ExprIntegerValue(
Pattern.compile(pattern.stringValue()).matcher(text.stringValue()).matches() ? 1 : 0);
}
Could you explain why REGEXP should return integer?
@dai-chen I guess the reason is the behaviour of MySQL (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp). The original request was opened in opendistro-for-elasticsearch/sql#710 and its implementation was introduced by opendistro-for-elasticsearch/sql#750. Changing its return value type would be a breaking change. Besides updating the user doc (https://github.com/opensearch-project/sql/blob/main/docs/user/dql/expressions.rst#regexp-value-test), we would need a changelog file to record the breaking change. Any thoughts? cc @penghuo, @chloe-zh
I propose two options:
- Correct the return type of matchesRegexp() to ExprBooleanValue in this PR, including user doc updates. We would then need a new PR to introduce a changelog that records any breaking changes.
- Close the current PR because it is not an issue. SELECT field1 FROM test WHERE 1 = (field1 REGEXP 'test.*') should work. The current fix not only adds special logic but also introduces new, confusing behaviour (a REGEXP expression returns an integer in the SELECT clause but a boolean in the WHERE clause).
In my opinion, option 2 lacks user-friendliness.
return expression instanceof FunctionImplementation
&& ((FunctionExpression) expression)
minor: instanceof FunctionExpression
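A minimal sketch of why the type test should name the type that is cast to. All types below are hypothetical stand-ins nested in one class, not the project's real Expression, FunctionExpression, or FunctionImplementation, and getFunctionName is an assumed accessor, since the diff excerpt is cut off after the cast.

```java
final class RegexpScopeCheckSketch {

  // Hypothetical stand-ins; shapes and names are assumptions for illustration only.
  interface Expression {}

  interface FunctionImplementation {}

  static final class FunctionExpression implements Expression, FunctionImplementation {
    private final String functionName;

    FunctionExpression(String functionName) {
      this.functionName = functionName;
    }

    String getFunctionName() {
      return functionName;
    }
  }

  // Implements FunctionImplementation without being a FunctionExpression.
  static final class OtherImplementation implements Expression, FunctionImplementation {}

  // Tests for the exact type that is cast to, so the cast cannot fail.
  static boolean isRegexpExpression(Expression expression) {
    return expression instanceof FunctionExpression
        && "regexp".equalsIgnoreCase(((FunctionExpression) expression).getFunctionName());
  }

  // Looser test: passes instanceof for OtherImplementation, then fails on the cast.
  static boolean isRegexpExpressionLoose(Expression expression) {
    return expression instanceof FunctionImplementation
        && "regexp".equalsIgnoreCase(((FunctionExpression) expression).getFunctionName());
  }

  public static void main(String[] args) {
    System.out.println(isRegexpExpression(new FunctionExpression("regexp")));
    System.out.println(isRegexpExpression(new OtherImplementation()));
    try {
      isRegexpExpressionLoose(new OtherImplementation());
    } catch (ClassCastException e) {
      System.out.println("loose check failed on the cast");
    }
  }
}
```

Whether the looser test can actually fail in the project depends on which classes implement FunctionImplementation there; the sketch only shows the general hazard.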
fixed
Signed-off-by: panguixin <panguixin@bytedance.com>
CI was blocked. Triggered its run now.
This PR is stalled because it has been open for 30 days with no activity.
Description
Fixes #2796.
Issues Resolved
#2796
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.