Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1904191:[API Coverage] functions coverage #2964

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

sfc-gh-yuwang
Copy link
Collaborator

@sfc-gh-yuwang sfc-gh-yuwang commented Jan 30, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1904191

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

@sfc-gh-yuwang sfc-gh-yuwang marked this pull request as ready for review January 31, 2025 18:16
@sfc-gh-yuwang sfc-gh-yuwang requested review from a team as code owners January 31, 2025 18:16
src/snowflake/snowpark/functions.py Outdated Show resolved Hide resolved
src/snowflake/snowpark/functions.py Outdated Show resolved Hide resolved
Comment on lines 1417 to 1421
elif isinstance(e, (list, tuple, set)):
for sub_e in e:
names.append(sub_e._named())
if _emit_ast and _ast_stmt is None:
ast_cols.append(sub_e._ast)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this change affect this new funcs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json_tuple will have to return a list of columns, while our select function currently does not support this form of call:
df.select(col1, [col2,col3]), this is meant to support it

CHANGELOG.md Outdated
Comment on lines 46 to 47
- `array_slice`
- `try_to_binary`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this change get removed?

Copy link
Contributor

@sfc-gh-aalam sfc-gh-aalam Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ need to update functions.rst?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out that we already support it, so I removed it from this PR

Copy link
Contributor

@sfc-gh-aalam sfc-gh-aalam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor fixes about _emit_ast. Please also take a review from IR team.

Comment on lines 707 to 709
max_bit = bitshiftleft(lit(1), 64)
unsigned_c = iff(c < 0, bitshiftright(c + max_bit, n), bitshiftright(c, n))
return call_builtin("bitand", unsigned_c, max_bit - 1, _emit_ast=_emit_ast)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
max_bit = bitshiftleft(lit(1), 64)
unsigned_c = iff(c < 0, bitshiftright(c + max_bit, n), bitshiftright(c, n))
return call_builtin("bitand", unsigned_c, max_bit - 1, _emit_ast=_emit_ast)
max_bit = bitshiftleft(lit(1, _emit_ast=False), 64, _emit_ast=False)
unsigned_c = iff(c < 0, bitshiftright(c + max_bit, n, _emit_ast=False), bitshiftright(c, n, _emit_ast=False), _emit_ast=False)
return call_builtin("bitand", unsigned_c, max_bit - 1, _emit_ast=_emit_ast)

let's make sure that for all internal calls to our functions, we set _emit_ast=False

"""
c = _to_col_if_str(col, "json_tuple")
return [
json_extract_path_text(parse_json(c), lit(field)).as_(f"c{i}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, _emit_ast=False

Copy link
Contributor

@sfc-gh-heshah sfc-gh-heshah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add expectation tests in the mentioned files, use the suggested helper functions from ast.utils, and see notes about json_tuple.

@@ -1414,6 +1418,11 @@ def select(
names.append(e._named())
if _emit_ast and _ast_stmt is None:
ast_cols.append(e._ast)
elif isinstance(e, (list, tuple)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding the column ASTs manually for this case, please use build_expr_from_python_val(ast_cols.add(), e) (from ast.utils) to capture when an input provided is a list or tuple.

It would also be very helpful if you could add an expectation test for this in tests/ast/data/select.test

bitshiftright(c, n, _emit_ast=False),
_emit_ast=False,
)
return call_builtin("bitand", unsigned_c, max_bit - 1, _emit_ast=_emit_ast)
Copy link
Contributor

@sfc-gh-heshah sfc-gh-heshah Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the arguments to the public API and the internal call_builtin method are different the AST needs to be constructed manually via build_builtin_fn_apply and assigned to the _ast field of the Column instance returned by call_builtin before returning. See functions.bround for a relevant example.

An expectation test in tests/ast/data/functions2.test would also be very helpful.

_emit_ast=False,
).as_(f"c{i}", _emit_ast=False)
for i, field in enumerate(fields)
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this API would emit any AST currently.

Modeling this API with the AST would still be more involved though, as this is the first Snowpark function that returns a List[Column] AFAIK. Usually we rely on the returned Column object storing its own AST in the ._ast field.

Is there any way the return type here can be a Column object? This would allow the AST to be constructed via build_builtin_fn_apply and assigned to the Column instance to be returned.

ast = None
if _emit_ast:
ast = proto.Expr()
build_builtin_fn_apply(ast, "bround", to_shift_column, n)
Copy link
Contributor

@sfc-gh-heshah sfc-gh-heshah Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Change builtin_name arg to "bitshiftright_unsigned" instead of "bround"

bitshiftright(c, n, _emit_ast=False),
_emit_ast=False,
)
col = call_builtin("bitand", unsigned_c, max_bit - 1, _emit_ast=_emit_ast)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more nit: _emit_ast should be False here as well now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants