Fix regexp filter query #2797

Open
wants to merge 3 commits into
base: main
@@ -5,6 +5,7 @@

package org.opensearch.sql.sql;

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_STRINGS;
import static org.opensearch.sql.legacy.plugin.RestSqlAction.QUERY_API_ENDPOINT;
import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.schema;
@@ -25,7 +26,7 @@ public class TextFunctionIT extends SQLIntegTestCase {

  @Override
  public void init() throws Exception {
    super.init();
    loadIndex(Index.BANK_WITH_STRING_VALUES);
  }

  void verifyQuery(String query, String type, String output) throws IOException {
@@ -52,6 +53,15 @@ public void testRegexp() throws IOException {
    verifyQuery("'a' regexp '.*'", "integer", 1);
  }

  @Test
  public void testRegexpAgainstIndex() throws IOException {
    JSONObject result =
        executeQuery(
            String.format("select * from %s where name regexp 'hel.*'", TEST_INDEX_STRINGS));
    verifySchema(result, schema("name", "text"));
    verifyDataRows(result, rows("hello"), rows("helloworld"));
  }

  @Test
  public void testReverse() throws IOException {
    verifyQuery("reverse('hello')", "keyword", "olleh");
@@ -48,6 +48,15 @@ private ExprValue evaluateExpression(
      return ExprBooleanValue.of(false);
    }

    // refer to https://github.com/opensearch-project/sql/issues/2796
Member

Could you add some unit tests in ExpressionFilterScriptTest?

    if (result.type() == ExprCoreType.INTEGER) {
      if (result.integerValue() == 0) {
        result = ExprBooleanValue.of(false);
      } else if (result.integerValue() == 1) {
        result = ExprBooleanValue.of(true);
Comment on lines +56 to +59
Member

This check will cause an additional issue.
select * from %s where 1 * 1 will be treated as where true, and select * from %s where 1 * 0 will be treated as where false. Both should throw exceptions.
How about changing the return type to ExprBooleanValue in

  public static ExprIntegerValue matchesRegexp(ExprValue text, ExprValue pattern) {
    return new ExprIntegerValue(
        Pattern.compile(pattern.stringValue()).matcher(text.stringValue()).matches() ? 1 : 0);
  }
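
For illustration only, here is a minimal sketch of the suggestion in this comment. It assumes a hypothetical stand-in class that works on plain String values instead of the plugin's ExprValue types; `RegexpBooleanSketch` and both method names are made up for this example and are not part of the codebase.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in: the real method operates on ExprValue, not String.
final class RegexpBooleanSketch {

  // Mirrors the quoted behaviour: the match result is encoded as 1 or 0.
  static int matchesRegexpAsInt(String text, String pattern) {
    return Pattern.compile(pattern).matcher(text).matches() ? 1 : 0;
  }

  // Suggested alternative: return the boolean directly, so a WHERE filter
  // would not need any integer-to-boolean conversion.
  static boolean matchesRegexpAsBoolean(String text, String pattern) {
    return Pattern.compile(pattern).matcher(text).matches();
  }

  public static void main(String[] args) {
    System.out.println(matchesRegexpAsInt("hello", "hel.*"));     // prints 1
    System.out.println(matchesRegexpAsBoolean("hello", "hel.*")); // prints true
    // Matcher.matches() requires the whole input to match the pattern.
    System.out.println(matchesRegexpAsBoolean("hello", "hel"));   // prints false
  }
}
```

With the boolean variant the filter could keep rejecting every non-boolean result, so expressions such as 1 * 1 would still fail.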

Contributor Author

If we modify the return type of OperatorUtils#matchesRegexp, it will change the return type of the REGEXP function, which could be a breaking change. For instance, the query SELECT 'Hello!' REGEXP '.*', 'a' REGEXP 'b' currently yields 1,0 as the output. After modifying the return type, the output would be true,false.

Contributor Author

@LantaoJin how about executing the conversion only when the expression is a REGEXP expression? What do you think?

Member

Sounds good. The conversion scope should be limited to REGEXP expressions.
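
As a rough sketch of what "limited to REGEXP expressions" could look like, the example below models an evaluated filter as a function name plus a raw result. Everything in it is an assumption made for illustration: `ScopedConversionSketch`, `toFilterResult`, the string-based function name, and the error message are hypothetical and do not reflect the plugin's actual types or wording.

```java
import java.util.Locale;

// Hypothetical model: the top-level function name and its raw result
// (Integer or Boolean) stand in for the plugin's expression and ExprValue.
final class ScopedConversionSketch {

  static boolean toFilterResult(String functionName, Object result) {
    if (result instanceof Boolean) {
      return (Boolean) result;
    }
    // Only a REGEXP result of 0 or 1 is coerced to a boolean.
    boolean isRegexp = "regexp".equals(functionName.toLowerCase(Locale.ROOT));
    if (isRegexp && result instanceof Integer) {
      int value = (Integer) result;
      if (value == 0 || value == 1) {
        return value == 1;
      }
    }
    // Any other non-boolean result (for example 1 * 1) is still rejected.
    throw new IllegalStateException(
        String.format("Filter '%s' did not produce a boolean result", functionName));
  }

  public static void main(String[] args) {
    System.out.println(toFilterResult("regexp", 1)); // prints true
    System.out.println(toFilterResult("regexp", 0)); // prints false
    try {
      toFilterResult("multiply", 1);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Scoping the check this way keeps the existing integer output of REGEXP in a SELECT clause while leaving other integer-valued filters as errors.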

Contributor Author

I have fixed it. Please take another look at your convenience.

Collaborator

Could you explain why REGEXP should return an integer? Judging from the Javadoc, the implementation seems wrong. I think we should avoid adding such special logic on the critical path if possible.

  /**
   * Checks if text matches regular expression pattern.
   *
   * @param pattern string pattern to match.
   * @return if text matches pattern returns true; else return false.
   */
  public static ExprIntegerValue matchesRegexp(ExprValue text, ExprValue pattern) {
    return new ExprIntegerValue(
        Pattern.compile(pattern.stringValue()).matcher(text.stringValue()).matches() ? 1 : 0);
  }

Member

@LantaoJin, Jul 11, 2024

Could you explain why REGEXP should return integer?

@dai-chen I guess the reason is the behaviour of MySQL (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp). The original request was opened in opendistro-for-elasticsearch/sql#710 and its implementation was introduced by opendistro-for-elasticsearch/sql#750. It would be a breaking change if we changed its return type. Besides updating the user doc (https://github.com/opensearch-project/sql/blob/main/docs/user/dql/expressions.rst#regexp-value-test), we would need a changelog file to record those breaking changes. Any thoughts? cc @penghuo, @chloe-zh

Member

I propose two options:

  1. Correct the return type of matchesRegexp() to ExprBooleanValue in this PR, including user doc updates; we would then need a new PR to introduce a changelog that records any breaking changes.
  2. Close the current PR because this is not an issue. SELECT field1 FROM test WHERE 1 = (field1 REGEXP 'test.*') should work. The current fix not only adds special logic but also introduces new, confusing behaviour (a REGEXP expression returns an integer in the SELECT clause but a boolean in the WHERE clause).

Contributor Author

In my opinion, option 2 lacks user-friendliness.

      }
    }

    if (result.type() != ExprCoreType.BOOLEAN) {
      throw new IllegalStateException(
          String.format(