feat: SparkLikeNamespace methods #1779
base: main
sorted_indices = sorted(
    range(len(sort_list)), key=lambda i: (sort_list[i] is None, sort_list[i])
)
Otherwise a list containing `None`s would just fail to sort.
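To illustrate the comment above, here is a minimal standalone sketch of the same key trick. The helper name `argsort_nulls_last` and the sample data are hypothetical and are not taken from the PR; only the `(value is None, value)` key mirrors the diff.

```python
def argsort_nulls_last(values):
    """Return the indices that sort `values` ascending, with None entries last."""
    # The first tuple element is False for real values and True for None,
    # so every None sorts after every real value. Real values are then
    # ordered by the second element. Two None entries produce equal keys,
    # so None is never ordered against an int.
    return sorted(
        range(len(values)),
        key=lambda i: (values[i] is None, values[i]),
    )


data = [3, None, 1, None, 2]  # hypothetical sample data

try:
    sorted(data)  # a plain sort tries to order None against an int
except TypeError:
    plain_sort_failed = True
else:
    plain_sort_failed = False

indices = argsort_nulls_last(data)
```

Because `sorted` is stable, the `None` entries keep their original relative order at the end of the result.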
from pyspark.sql.types import IntegerType


def _n_unique(_input: Column) -> Column:
    return F.count_distinct(_input) + F.max(F.isnull(_input).cast(IntegerType()))
Highly inspired by the DuckDB implementation.
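As a rough plain-Python model of what the expression above appears to compute: the distinct count ignores nulls, and one is added when at least one null is present, so null counts as a single extra unique value. The function name `n_unique_model` and the sample inputs are hypothetical. Edge cases such as empty input are not meant to match Spark's behavior here.

```python
def n_unique_model(values):
    """Count distinct values, treating None as one additional distinct value."""
    # Counterpart of F.count_distinct: nulls are not counted.
    distinct_non_null = len({v for v in values if v is not None})
    # Counterpart of F.max(F.isnull(...).cast(IntegerType())):
    # 1 if any entry is null, otherwise 0.
    has_null = int(any(v is None for v in values))
    return distinct_non_null + has_null


with_nulls = n_unique_model([1, 1, None, 2, None])
without_nulls = n_unique_model([1, 2, 2])
```

Here `with_nulls` counts the values 1 and 2 plus one slot for null, while `without_nulls` counts only 1 and 2.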
    expr._function_name, expr._function_name
)
agg_func = get_spark_function(function_name, **expr._kwargs)
agg_func = get_spark_function(expr._function_name, **expr._kwargs)
I could not manage to make `nw.len()` work in the `group_by` context.
@MarcoGorelli the scikit-lego issue is definitely unrelated.
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Introduces:
Expr.is_null
Expr.n_unique
Namespace.len
Namespace.any_horizontal
Namespace.mean_horizontal
Namespace.min_horizontal
Namespace.max_horizontal
Namespace.concat
Namespace.concat_str