mango: add $beginsWith operator #4810

Merged
merged 11 commits into from
Oct 30, 2023
136 changes: 72 additions & 64 deletions src/docs/src/api/database/find.rst
@@ -673,68 +673,74 @@ In addition, some 'meta' condition operators are available. Some condition
operators accept any valid JSON content as the argument. Other condition
operators require the argument to be in a specific JSON format.

+---------------+-------------+------------+-----------------------------------+
| Operator type | Operator | Argument | Purpose |
+===============+=============+============+===================================+
| (In)equality | ``$lt`` | Any JSON | The field is less than the |
| | | | argument. |
+---------------+-------------+------------+-----------------------------------+
| | ``$lte`` | Any JSON | The field is less than or equal to|
| | | | the argument. |
+---------------+-------------+------------+-----------------------------------+
| | ``$eq`` | Any JSON | The field is equal to the argument|
+---------------+-------------+------------+-----------------------------------+
| | ``$ne`` | Any JSON | The field is not equal to the |
| | | | argument. |
+---------------+-------------+------------+-----------------------------------+
| | ``$gte`` | Any JSON | The field is greater than or equal|
| | | | to the argument. |
+---------------+-------------+------------+-----------------------------------+
| | ``$gt`` | Any JSON | The field is greater than the |
| | | | argument. |
+---------------+-------------+------------+-----------------------------------+
| Object | ``$exists`` | Boolean | Check whether the field exists or |
| | | | not, regardless of its value. |
+---------------+-------------+------------+-----------------------------------+
| | ``$type`` | String | Check the document field's type. |
| | | | Valid values are ``"null"``, |
| | | | ``"boolean"``, ``"number"``, |
| | | | ``"string"``, ``"array"``, and |
| | | | ``"object"``. |
+---------------+-------------+------------+-----------------------------------+
| Array | ``$in`` | Array of | The document field must exist in |
| | | JSON values| the list provided. |
+---------------+-------------+------------+-----------------------------------+
| | ``$nin`` | Array of | The document field must not exist |
| | | JSON values| in the list provided. |
+---------------+-------------+------------+-----------------------------------+
| | ``$size`` | Integer | Special condition to match the |
| | | | length of an array field in a |
| | | | document. Non-array fields cannot |
| | | | match this condition. |
+---------------+-------------+------------+-----------------------------------+
| Miscellaneous | ``$mod`` | [Divisor, | Divisor is a non-zero integer, |
| | | Remainder] | Remainder is any integer. |
| | | | Non-integer values result in a |
| | | | 404. Matches documents where |
| | | | ``field % Divisor == Remainder`` |
| | | | is true, and only when the |
| | | | document field is an integer. |
+---------------+-------------+------------+-----------------------------------+
| | ``$regex`` | String | A regular expression pattern to |
| | | | match against the document field. |
| | | | Only matches when the field is a |
| | | | string value and matches the |
| | | | supplied regular expression. The |
| | | | matching algorithms are based on |
| | | | the Perl Compatible Regular |
| | | | Expression (PCRE) library. For |
| | | | more information about what is |
| | | | implemented, see the |
| | | | `Erlang Regular Expression |
| | | | <http://erlang.org/doc |
| | | | /man/re.html>`_. |
+---------------+-------------+------------+-----------------------------------+
+---------------+-----------------+-------------+------------------------------------+
| Operator type | Operator | Argument | Purpose |
+===============+=================+=============+====================================+
| (In)equality | ``$lt`` | Any JSON | The field is less than the |
| | | | argument. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$lte`` | Any JSON | The field is less than or equal to |
| | | | the argument. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$eq`` | Any JSON | The field is equal to the argument |
+---------------+-----------------+-------------+------------------------------------+
| | ``$ne`` | Any JSON | The field is not equal to the |
| | | | argument. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$gte`` | Any JSON | The field is greater than or equal |
| | | | to the argument. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$gt`` | Any JSON | The field is greater than the |
| | | | argument. |
+---------------+-----------------+-------------+------------------------------------+
| Object | ``$exists`` | Boolean | Check whether the field exists or |
| | | | not, regardless of its value. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$type`` | String | Check the document field's type. |
| | | | Valid values are ``"null"``, |
| | | | ``"boolean"``, ``"number"``, |
| | | | ``"string"``, ``"array"``, and |
| | | | ``"object"``. |
+---------------+-----------------+-------------+------------------------------------+
| Array | ``$in`` | Array of | The document field must exist in |
| | | JSON values | the list provided. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$nin`` | Array of | The document field must not exist |
| | | JSON values | in the list provided. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$size`` | Integer | Special condition to match the |
| | | | length of an array field in a |
| | | | document. Non-array fields cannot |
| | | | match this condition. |
+---------------+-----------------+-------------+------------------------------------+
| Miscellaneous | ``$mod`` | [Divisor, | Divisor is a non-zero integer, |
| | | Remainder] | Remainder is any integer. |
| | | | Non-integer values result in a |
| | | | 404. Matches documents where |
| | | | ``field % Divisor == Remainder`` |
| | | | is true, and only when the |
| | | | document field is an integer. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$regex`` | String | A regular expression pattern to |
| | | | match against the document field. |
| | | | Only matches when the field is a |
| | | | string value and matches the |
| | | | supplied regular expression. The |
| | | | matching algorithms are based on |
| | | | the Perl Compatible Regular |
| | | | Expression (PCRE) library. For |
| | | | more information about what is |
| | | | implemented, see the |
| | | | `Erlang Regular Expression |
| | | | <http://erlang.org/doc |
| | | | /man/re.html>`_. |
+---------------+-----------------+-------------+------------------------------------+
| | ``$beginsWith`` | String | Matches where the document field |
| | | | begins with the specified prefix |
| | | | (case-sensitive). If the document |
| | | | field contains a non-string value, |
| | | | the document is not matched. |
+---------------+-----------------+-------------+------------------------------------+

.. warning::
Regular expressions do not work with indexes, so they should not be used to
@@ -754,8 +760,10 @@ can itself be another operator with arguments of its own. This enables us to
build up more complex selector expressions.

However, only equality operators such as ``$eq``, ``$gt``, ``$gte``, ``$lt``,
and ``$lte`` (but not ``$ne``) can be used as the basis of a query. You should
include at least one of these in a selector.
``$lte`` and ``$beginsWith`` (but not ``$ne``) can be used as the basis
of a query that can make efficient use of a ``json`` index. You should
include at least one of these in a selector, or consider using
a ``text`` index if more flexibility is required.
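
As a rough illustration of why ``$beginsWith`` can make use of a ``json`` index, a prefix match can be treated as a closed key range that starts at the prefix and ends at the prefix followed by a very high sentinel. The Python sketch below is only an analogy under stated assumptions: ``prefix_range``, ``in_prefix_range`` and ``HIGH_SENTINEL`` are hypothetical names, and the comparison is by Unicode code point, whereas CouchDB views order keys with their own collation rules.

```python
# Assumption: the highest Unicode code point stands in for "a very high key".
HIGH_SENTINEL = "\U0010FFFF"


def prefix_range(prefix):
    """Return hypothetical (low, high) bounds covering strings that start with prefix."""
    if not isinstance(prefix, str):
        raise TypeError("prefix must be a string")
    return prefix, prefix + HIGH_SENTINEL


def in_prefix_range(value, prefix):
    """Check whether value falls inside the range derived from prefix."""
    low, high = prefix_range(prefix)
    # Python compares str values code point by code point.
    return low <= value <= high


if __name__ == "__main__":
    for state in ["New York", "Nevada", "New Jersey", "Ohio"]:
        print(state, in_prefix_range(state, "New"))
```

Expressing the operator as a lower and an upper bound is what lets an ordered index be scanned over a narrow slice instead of testing every document.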

For example, if you try to perform a query that attempts to match all documents
that have a field called `afieldname` containing a value that begins with the
6 changes: 6 additions & 0 deletions src/mango/src/mango_idx_view.erl
@@ -306,6 +306,8 @@ indexable({[{<<"$gt">>, _}]}) ->
true;
indexable({[{<<"$gte">>, _}]}) ->
true;
indexable({[{<<"$beginsWith">>, _}]}) ->
true;
% This is required to improve index selection for covering indexes.
% Making `$exists` indexable should not cause problems in other cases.
indexable({[{<<"$exists">>, _}]}) ->
@@ -412,6 +414,10 @@ range(_, _, LCmp, Low, HCmp, High) ->
% operators but it's all straightforward once you figure out how
% we're basically just narrowing our logical ranges.

% beginsWith requires both a high and low bound
range({[{<<"$beginsWith">>, Arg}]}, LCmp, Low, HCmp, High) ->
{LCmp0, Low0, HCmp0, High0} = range({[{<<"$gte">>, Arg}]}, LCmp, Low, HCmp, High),
range({[{<<"$lte">>, <<Arg/binary, 16#10FFFF>>}]}, LCmp0, Low0, HCmp0, High0);
range({[{<<"$lt">>, Arg}]}, LCmp, Low, HCmp, High) ->
case range_pos(Low, Arg, High) of
min ->
70 changes: 47 additions & 23 deletions src/mango/src/mango_selector.erl
@@ -135,6 +135,8 @@ norm_ops({[{<<"$text">>, Arg}]}) when
{[{<<"$default">>, {[{<<"$text">>, Arg}]}}]};
norm_ops({[{<<"$text">>, Arg}]}) ->
?MANGO_ERROR({bad_arg, '$text', Arg});
norm_ops({[{<<"$beginsWith">>, Arg}]} = Cond) when is_binary(Arg) ->
Cond;
% Not technically an operator but we pass it through here
% so that this function accepts its own output. This exists
% so that $text can have a field name value which simplifies
@@ -514,6 +516,11 @@ match({[{<<"$mod">>, [D, R]}]}, Value, _Cmp) when is_integer(Value) ->
Value rem D == R;
match({[{<<"$mod">>, _}]}, _Value, _Cmp) ->
false;
match({[{<<"$beginsWith">>, Prefix}]}, Value, _Cmp) when is_binary(Prefix), is_binary(Value) ->
string:prefix(Value, Prefix) /= nomatch;
% When Value is not a string, do not match
match({[{<<"$beginsWith">>, Prefix}]}, _, _Cmp) when is_binary(Prefix) ->
false;
match({[{<<"$regex">>, Regex}]}, Value, _Cmp) when is_binary(Value) ->
try
match == re:run(Value, Regex, [{capture, none}])
@@ -652,6 +659,14 @@ fields({[]}) ->
-ifdef(TEST).
-include_lib("eunit/include/eunit.hrl").

-define(TEST_DOC,
{[
{<<"_id">>, <<"foo">>},
{<<"_rev">>, <<"bar">>},
{<<"user_id">>, 11}
]}
).

is_constant_field_basic_test() ->
Selector = normalize({[{<<"A">>, <<"foo">>}]}),
Field = <<"A">>,
@@ -991,30 +1006,22 @@ has_required_fields_or_nested_or_false_test() ->
Normalized = normalize(Selector),
?assertEqual(false, has_required_fields(Normalized, RequiredFields)).

check_match(Selector) ->
% Call match_int/2 to avoid ERROR for missing metric; this is confusing
% in the middle of test output.
match_int(mango_selector:normalize(Selector), ?TEST_DOC).

%% This test shows the shape match/2 expects for its arguments.
match_demo_test_() ->
Doc =
{[
{<<"_id">>, <<"foo">>},
{<<"_rev">>, <<"bar">>},
{<<"user_id">>, 11}
]},
Check = fun(Selector) ->
% Call match_int/2 to avoid ERROR for missing metric; this is confusing
% in the middle of test output.
match_int(mango_selector:normalize(Selector), Doc)
end,
[
% matching
?_assertEqual(true, Check({[{<<"user_id">>, 11}]})),
?_assertEqual(true, Check({[{<<"_id">>, <<"foo">>}]})),
?_assertEqual(true, Check({[{<<"_id">>, <<"foo">>}, {<<"_rev">>, <<"bar">>}]})),
% non-matching
?_assertEqual(false, Check({[{<<"user_id">>, 1234}]})),
% string 11 doesn't match number 11
?_assertEqual(false, Check({[{<<"user_id">>, <<"11">>}]})),
?_assertEqual(false, Check({[{<<"_id">>, <<"foo">>}, {<<"_rev">>, <<"quux">>}]}))
].
match_demo_test() ->
% matching
?assertEqual(true, check_match({[{<<"user_id">>, 11}]})),
?assertEqual(true, check_match({[{<<"_id">>, <<"foo">>}]})),
?assertEqual(true, check_match({[{<<"_id">>, <<"foo">>}, {<<"_rev">>, <<"bar">>}]})),
% non-matching
?assertEqual(false, check_match({[{<<"user_id">>, 1234}]})),
% string 11 doesn't match number 11
?assertEqual(false, check_match({[{<<"user_id">>, <<"11">>}]})),
?assertEqual(false, check_match({[{<<"_id">>, <<"foo">>}, {<<"_rev">>, <<"quux">>}]})).

fields_of(Selector) ->
fields(test_util:as_selector(Selector)).
@@ -1054,4 +1061,21 @@ fields_nor_test() ->
},
?assertEqual([<<"field1">>, <<"field2">>], fields_of(Selector2)).

check_beginswith(Field, Prefix) ->
Selector = {[{Field, {[{<<"$beginsWith">>, Prefix}]}}]},
% Call match_int/2 to avoid ERROR for missing metric; this is confusing
% in the middle of test output.
match_int(mango_selector:normalize(Selector), ?TEST_DOC).

match_beginswith_test() ->
% matching
?assertEqual(true, check_beginswith(<<"_id">>, <<"f">>)),
% no match (user_id is not a binary string)
?assertEqual(false, check_beginswith(<<"user_id">>, <<"f">>)),
% invalid (prefix is not a binary string)
?assertThrow(
{mango_error, mango_selector, {invalid_operator, <<"$beginsWith">>}},
check_beginswith(<<"user_id">>, 1)
).

-endif.
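
For readers less familiar with Erlang, the matching rules exercised by `match_beginswith_test` can be restated in Python. This is a minimal sketch, not the mango implementation: `begins_with` and `TEST_DOC` are hypothetical names, and a plain `ValueError` stands in for the `invalid_operator` error that normalization throws for a non-string prefix.

```python
# Hypothetical mirror of the ?TEST_DOC fixture used in the Erlang tests.
TEST_DOC = {"_id": "foo", "_rev": "bar", "user_id": 11}


def begins_with(value, prefix):
    """Return True when value is a string that starts with prefix (case-sensitive)."""
    if not isinstance(prefix, str):
        # Assumption: ValueError stands in for mango's invalid_operator error.
        raise ValueError("$beginsWith requires a string prefix")
    if not isinstance(value, str):
        # Non-string field values never match; they are not an error.
        return False
    return value.startswith(prefix)


if __name__ == "__main__":
    print(begins_with(TEST_DOC["_id"], "f"))      # string field with matching prefix
    print(begins_with(TEST_DOC["user_id"], "f"))  # integer field, so no match
```

The asymmetry is deliberate: a bad prefix is a problem with the query and is rejected, while a non-string field is just a document that does not match.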
11 changes: 11 additions & 0 deletions src/mango/src/mango_selector_text.erl
@@ -142,6 +142,11 @@ convert(Path, {[{<<"$exists">>, ShouldExist}]}) ->
true -> FieldExists;
false -> {op_not, {FieldExists, false}}
end;
convert(Path, {[{<<"$beginsWith">>, Arg}]}) when is_binary(Arg) ->
Prefix = mango_util:lucene_escape_query_value(Arg),
Suffix = <<"*">>,
PrefixSearch = <<Prefix/binary, Suffix/binary>>,
{op_field, {make_field(Path, Arg), PrefixSearch}};
% We're not checking the actual type here, just looking for
% anything that has a possibility of matching by checking
% for the field name. We use the same logic for $exists on
@@ -821,6 +826,12 @@ convert_nor_test() ->
})
).

convert_beginswith_test() ->
?assertEqual(
{op_field, {[[<<"field">>], <<":">>, <<"string">>], <<"foo*">>}},
convert_selector(#{<<"field">> => #{<<"$beginsWith">> => <<"foo">>}})
).
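
The text-index conversion above amounts to escaping the prefix and appending a `*` wildcard. The Python sketch below shows that idea only. It assumes the set of special characters from the Lucene classic query parser syntax; which characters `mango_util:lucene_escape_query_value` actually escapes is not shown in this diff, and `escape_query_value` and `prefix_query` are hypothetical names.

```python
import re

# Assumption: single-character specials of the Lucene classic query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/])')


def escape_query_value(value):
    """Backslash-escape characters that have a meaning in a Lucene query."""
    return _LUCENE_SPECIAL.sub(r"\\\1", value)


def prefix_query(prefix):
    """Build a prefix search term: the escaped prefix followed by a wildcard."""
    return escape_query_value(prefix) + "*"


if __name__ == "__main__":
    for prefix in ["foo", ":Bar", '"Foo']:
        print(prefix_query(prefix))
```

Escaping has to happen before the wildcard is appended, otherwise the trailing `*` would be escaped as well and the term would stop behaving as a prefix search.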

to_query_test() ->
F = fun(S) -> iolist_to_binary(to_query(S)) end,
Input = {<<"name">>, <<"value">>},
35 changes: 34 additions & 1 deletion src/mango/test/03-operator-test.py
@@ -10,12 +10,13 @@
# License for the specific language governing permissions and limitations under
# the License.

from requests.exceptions import HTTPError
import mango
import unittest


class BaseOperatorTests:
class Common(object):
class Common(unittest.TestCase):
def assertUserIds(self, user_ids, docs):
user_ids_returned = list(d["user_id"] for d in docs)
user_ids.sort()
@@ -141,6 +142,38 @@ def test_exists_false_returns_missing_but_not_null(self):
for d in docs:
self.assertNotIn("twitter", d)

        def test_beginswith(self):
            self.db.save_docs(
                [
                    {"user_id": 99, "location": {"state": ":Bar"}},
                ]
            )

            cases = [
                {"prefix": "New", "user_ids": [2, 10]},
                # test characters that require escaping
                {"prefix": "New ", "user_ids": [2, 10]},
                {"prefix": ":", "user_ids": [99]},
                {"prefix": "Foo", "user_ids": []},
                {"prefix": '"Foo', "user_ids": []},
                {"prefix": " New", "user_ids": []},
            ]

            for case in cases:
                with self.subTest(prefix=case["prefix"]):
                    selector = {"location.state": {"$beginsWith": case["prefix"]}}
                    docs = self.db.find(selector)
                    self.assertEqual(len(docs), len(case["user_ids"]))
                    self.assertUserIds(case["user_ids"], docs)

        # non-string prefixes should return an error
        def test_beginswith_invalid_prefix(self):
            cases = [123, True, [], {}]
            for prefix in cases:
                with self.subTest(prefix=prefix):
                    with self.assertRaises(HTTPError):
                        self.db.find({"location.state": {"$beginsWith": prefix}})
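
Outside the test harness, the same selector would be sent as the JSON body of a `_find` request. The sketch below only builds that body and does not contact a server; `beginswith_selector` and `find_body` are hypothetical helpers, and the client-side type check is an assumption that mirrors the server rejecting non-string prefixes.

```python
import json


def beginswith_selector(field, prefix):
    """Build a Mango selector matching documents whose field starts with prefix."""
    if not isinstance(prefix, str):
        # Assumption: fail early on the client, since the server rejects these too.
        raise TypeError("$beginsWith prefix must be a string")
    return {field: {"$beginsWith": prefix}}


def find_body(field, prefix):
    """Serialize the selector as the JSON body of a _find request."""
    return json.dumps({"selector": beginswith_selector(field, prefix)})


if __name__ == "__main__":
    print(find_body("location.state", "New"))
```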


class OperatorJSONTests(mango.UserDocsTests, BaseOperatorTests.Common):
# START: text indexes do not support range queries across type boundaries so only