Fix regexp filter query #2797

Open: wants to merge 3 commits into base: main

bugmakerrrrrr
Contributor

@bugmakerrrrrr bugmakerrrrrr commented Jul 1, 2024

Description

Fixes #2796.

Issues Resolved

#2796

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: panguixin <panguixin@bytedance.com>
@@ -48,6 +48,15 @@ private ExprValue evaluateExpression(
return ExprBooleanValue.of(false);
}

// refer to https://github.com/opensearch-project/sql/issues/2796
Member

Could you add some unit tests in ExpressionFilterScriptTest?

Comment on lines +53 to +56
if (result.integerValue() == 0) {
  result = ExprBooleanValue.of(false);
} else if (result.integerValue() == 1) {
  result = ExprBooleanValue.of(true);
Member

This check will cause an additional issue.
select * from %s where 1 * 1 would be treated as where true, and select * from %s where 1 * 0 would be treated as where false. Both should throw exceptions.
How about changing the return type to ExprBooleanValue in:

public static ExprIntegerValue matchesRegexp(ExprValue text, ExprValue pattern) {
  return new ExprIntegerValue(
      Pattern.compile(pattern.stringValue()).matcher(text.stringValue()).matches() ? 1 : 0);
}

Contributor Author

If we modify the return type of OperatorUtils#matchesRegexp, it will change the return type of the REGEXP function, which could be a breaking change. For instance, the query SELECT 'Hello!' REGEXP '.*', 'a' REGEXP 'b' currently yields 1,0 as the output. After modifying the return type, the output would be true,false.
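
To make the difference concrete, here is a minimal standalone sketch. It uses plain java.util.regex and does not touch the plugin's ExprValue types; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.regex.Pattern;

public class RegexpReturnTypeSketch {

  // Mirrors the behaviour quoted in this thread: 1 for a full match, 0 otherwise.
  static int matchesRegexpAsInt(String text, String pattern) {
    return Pattern.compile(pattern).matcher(text).matches() ? 1 : 0;
  }

  // Hypothetical variant: the same check, but returning a boolean.
  static boolean matchesRegexpAsBoolean(String text, String pattern) {
    return Pattern.compile(pattern).matcher(text).matches();
  }

  public static void main(String[] args) {
    // prints "1,0"
    System.out.println(
        matchesRegexpAsInt("Hello!", ".*") + "," + matchesRegexpAsInt("a", "b"));
    // prints "true,false"
    System.out.println(
        matchesRegexpAsBoolean("Hello!", ".*") + "," + matchesRegexpAsBoolean("a", "b"));
  }
}
```

Any client that parses the REGEXP column as a number would see the second form as a change in output.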

Contributor Author

@LantaoJin What if the conversion is executed only when the expression is a REGEXP expression? What do you think?

Member

Sounds good. The conversion scope should be limited to REGEXP expressions.
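
A rough sketch of what limiting the conversion to REGEXP could look like. The Evaluated holder, isRegexp, toFilterResult, and the exception message are all hypothetical names made up for illustration; this is not the PR's actual code or the plugin's API.

```java
import java.util.Locale;

public class RegexpFilterConversionSketch {

  /** Hypothetical stand-in for an evaluated expression: a function name plus its result. */
  static final class Evaluated {
    final String functionName;
    final Object value;

    Evaluated(String functionName, Object value) {
      this.functionName = functionName;
      this.value = value;
    }
  }

  static boolean isRegexp(Evaluated e) {
    return "regexp".equals(e.functionName.toLowerCase(Locale.ROOT));
  }

  // Booleans pass through; 0/1 is converted only for REGEXP; everything else is rejected.
  static boolean toFilterResult(Evaluated e) {
    if (e.value instanceof Boolean) {
      return (Boolean) e.value;
    }
    if (isRegexp(e) && e.value instanceof Integer) {
      int i = (Integer) e.value;
      if (i == 0 || i == 1) {
        return i == 1;
      }
    }
    throw new IllegalStateException("Filter expression must evaluate to boolean, got: " + e.value);
  }

  public static void main(String[] args) {
    System.out.println(toFilterResult(new Evaluated("regexp", 1)));
    System.out.println(toFilterResult(new Evaluated("regexp", 0)));
    try {
      // An integer from a non-REGEXP expression such as 1 * 1 is still rejected.
      toFilterResult(new Evaluated("multiply", 1));
    } catch (IllegalStateException ex) {
      System.out.println("rejected: " + ex.getMessage());
    }
  }
}
```

With this scoping, where 1 * 1 and where 1 * 0 keep failing, while a REGEXP result of 0 or 1 is accepted as a filter.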

Contributor Author

I have fixed it. Please take another look at your convenience.

Collaborator

Could you explain why REGEXP should return an integer? Judging from the Javadoc, the implementation seems wrong. I think we should avoid adding such special logic on the critical path if possible.

  /**
   * Checks if text matches regular expression pattern.
   *
   * @param pattern string pattern to match.
   * @return if text matches pattern returns true; else return false.
   */
  public static ExprIntegerValue matchesRegexp(ExprValue text, ExprValue pattern) {
    return new ExprIntegerValue(
        Pattern.compile(pattern.stringValue()).matcher(text.stringValue()).matches() ? 1 : 0);
  }

Member

@LantaoJin LantaoJin Jul 11, 2024

Could you explain why REGEXP should return integer?

@dai-chen I guess the reason is MySQL's behaviour (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp). The original request was opened in opendistro-for-elasticsearch/sql#710 and its implementation was introduced by opendistro-for-elasticsearch/sql#750. Changing its return type would be a breaking change. Besides updating the user doc (https://github.com/opensearch-project/sql/blob/main/docs/user/dql/expressions.rst#regexp-value-test), we would need a changelog file to record the breaking change. Any thoughts? cc @penghuo, @chloe-zh

Member

I propose two options:

  1. Correct the return type of matchesRegexp() to ExprBooleanValue in this PR, including user doc updates. We would then need a new PR to introduce a changelog that records any breaking changes.
  2. Close the current PR because this is not an issue. SELECT field1 FROM test WHERE 1 = (field1 REGEXP 'test.*') should work. The current fix not only adds special logic but also introduces confusing new behaviour (a REGEXP expression returns an integer in the SELECT clause but a boolean in the WHERE clause).

Contributor Author

In my opinion, option 2 lacks user-friendliness.

Signed-off-by: panguixin <panguixin@bytedance.com>
Comment on lines 75 to 76
return expression instanceof FunctionImplementation
&& ((FunctionExpression) expression)
Member

minor: instanceof FunctionExpression
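
For context, a small sketch of why the guard and the cast should name the same type. All types below are hypothetical stand-ins declared inside the example, not the plugin's real classes; it assumes some class could implement FunctionImplementation without being a FunctionExpression.

```java
public class InstanceofGuardSketch {

  // Hypothetical stand-ins, for illustration only.
  interface Expression {}

  interface FunctionImplementation {}

  static class FunctionExpression implements Expression, FunctionImplementation {
    private final String name;

    FunctionExpression(String name) {
      this.name = name;
    }

    String getFunctionName() {
      return name;
    }
  }

  // Implements the interface without being a FunctionExpression.
  static class OtherImplementation implements Expression, FunctionImplementation {}

  // Guard and cast disagree: the guard can pass while the cast fails.
  static boolean isRegexpMismatched(Expression e) {
    return e instanceof FunctionImplementation
        && "regexp".equals(((FunctionExpression) e).getFunctionName());
  }

  // Guard and cast agree: the cast can never fail once the guard passes.
  static boolean isRegexpMatched(Expression e) {
    return e instanceof FunctionExpression
        && "regexp".equals(((FunctionExpression) e).getFunctionName());
  }

  public static void main(String[] args) {
    System.out.println(isRegexpMatched(new FunctionExpression("regexp")));
    System.out.println(isRegexpMatched(new OtherImplementation()));
    try {
      isRegexpMismatched(new OtherImplementation());
    } catch (ClassCastException ex) {
      System.out.println("mismatched guard threw ClassCastException");
    }
  }
}
```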

Contributor Author

fixed

Signed-off-by: panguixin <panguixin@bytedance.com>
@dai-chen
Collaborator

dai-chen commented Aug 6, 2024

CI was blocked. Triggered its run now.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.


Successfully merging this pull request may close these issues.

[BUG] REGEXP filter failed due to IllegalStateException
3 participants