-
Notifications
You must be signed in to change notification settings - Fork 2.5k
fix(python): Check that fractions are between 0 and 1 for pl.Expr.sample
#25366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(python): Check that fractions are between 0 and 1 for pl.Expr.sample
#25366
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #25366 +/- ##
=======================================
Coverage 82.16% 82.16%
=======================================
Files 1728 1728
Lines 240153 240153
Branches 3028 3028
=======================================
+ Hits 197310 197322 +12
+ Misses 42063 42051 -12
Partials 780 780 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I dont know the expected behaviour if fraction is Here is a set of tests I wrote for myself to check its behaviour: |
|
@vrajat, thanks for suggesting those unit tests. After speaking with reviewers on Discord, I've changed the code so it no longer raises a However, your tests cover edge cases that I have not considered so it would be useful to get the opinion of a reviewer on this. In particular, @vrajat, you have written tests for negative fractions and >>> import polars as pl
>>> df = pl.DataFrame(pl.Series("a", [[1, 2, 3], [4, 5]]))
>>> pl.col.a.list.sample(fraction=pl.Series([1.0, 0.0]), with_replacement=False)
<Expr ['col("a").list.sample_fraction(…'] at 0x27DC34C7650>
>>> df.select(pl.col.a.list.sample(fraction=pl.Series([1.0, 0.0]), with_replacement=False))
shape: (2, 1)
┌───────────┐
│ a │
│ --- │
│ list[i64] │
╞═══════════╡
│ [1, 2, 3] │
│ [] │
└───────────┘
>>> >>> df.select(pl.col.a.list.sample(fraction=pl.Series([1.1, 0.0]), with_replacement=False))
Traceback (most recent call last):
File "<python-input-4>", line 1, in <module>
df.select(pl.col.a.list.sample(fraction=pl.Series([1.1, 0.0]), with_replacement=False))
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\dataframe\frame.py", line 10124, in select
.collect(optimizations=QueryOptFlags._eager())
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\_utils\deprecation.py", line 97, in wrapper
return function(*args, **kwargs)
File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\lazyframe\opt_flags.py", line 328, in wrapper
return function(*args, **kwargs)
File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\lazyframe\frame.py", line 2425, in collect
return wrap_df(ldf.collect(engine, callback))
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
polars.exceptions.ShapeError: cannot take a larger sample than the total population when `with_replacement=false`
>>> df.select(pl.col.a.list.sample(fraction=pl.Series([-5, -1]), with_replacement=False))
shape: (2, 1)
┌───────────┐
│ a │
│ --- │
│ list[i64] │
╞═══════════╡
│ [] │
│ [] │
└───────────┘
>>> >>> df.select(pl.col.a.list.sample(fraction=pl.Series([None, 1.0]), with_replacement=False))
shape: (2, 1)
┌───────────┐
│ a │
│ --- │
│ list[i64] │
╞═══════════╡
│ null │
│ [4, 5] │
└───────────┘The case where the fraction is |
|
I vote for an error. The intent of -ve fractions is unclear to me. |
Closes #22024.
This adds a check to ensure each element of the
fractionargument given topl.Expr.sampleis between 0 and 1. If it isn't, aComputeErroris raised.Note for reviewers: I'm not sure if this is the right error type to use and I'm not sure whether this should have a unit test in rust, python or both.