Conversation

@alex-gregory-ds

Closes #22024.

This adds a check to ensure each element of the fraction argument given to pl.Expr.sample is between 0 and 1. If it isn't, a ComputeError is raised.
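For illustration, here is a minimal plain-Python sketch of the element-wise check described above. It is not the actual change in this PR (which lives in the Polars codebase); `check_fractions` and `FractionOutOfRangeError` are hypothetical names, with the exception class standing in for `ComputeError`.

```python
class FractionOutOfRangeError(ValueError):
    """Hypothetical stand-in for polars.exceptions.ComputeError."""


def check_fractions(fractions):
    """Raise if any element lies outside the closed interval [0, 1]."""
    for i, f in enumerate(fractions):
        # The chained comparison is also False for NaN, so NaN is rejected.
        if not 0.0 <= f <= 1.0:
            raise FractionOutOfRangeError(
                f"fraction at index {i} must be between 0 and 1, got {f}"
            )


check_fractions([1.0, 0.0])  # every element is in range, so nothing is raised
```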

Note for reviewers: I'm not sure whether ComputeError is the right error type to use, or whether this should have a unit test in Rust, Python, or both.

@codecov

codecov bot commented Nov 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.16%. Comparing base (e1d6f29) to head (e28d680).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #25366   +/-   ##
=======================================
  Coverage   82.16%   82.16%           
=======================================
  Files        1728     1728           
  Lines      240153   240153           
  Branches     3028     3028           
=======================================
+ Hits       197310   197322   +12     
+ Misses      42063    42051   -12     
  Partials      780      780           

@vrajat

vrajat commented Nov 16, 2025

I don't know the expected behaviour if fraction is None and n is not set. See #22024 (comment).

Here is a set of tests I wrote for myself to check its behaviour:

diff --git a/py-polars/tests/unit/operations/namespaces/list/test_list.py b/py-polars/tests/unit/operations/namespaces/list/test_list.py
index 3c9e5fe2f7..ae762fe6d0 100644
--- a/py-polars/tests/unit/operations/namespaces/list/test_list.py
+++ b/py-polars/tests/unit/operations/namespaces/list/test_list.py
@@ -424,6 +424,23 @@ def test_list_sample() -> None:
    )
    assert_frame_equal(df, expected_df)

+def test_list_sample_fraction_out_of_bounds():
+    s = pl.Series("values", [[1, 2, 3], [4, 5]])
+    # Fraction > 1.0
+    with pytest.raises(ComputeError, match="fraction.*must be between 0 and 1"):
+        s.list.sample(fraction=1.5)
+    # Fraction < 0.0
+    with pytest.raises(ComputeError, match="fraction.*must be between 0 and 1"):
+        s.list.sample(fraction=-0.1)
+    with pytest.raises(ComputeError, match="fraction.*must be between 0 and 1"):
+        s.list.sample(fraction=None)
+
+def test_list_sample_fraction_null():
+    s = pl.Series("values", [[1, 2, 3], [4, 5]])
+    # Fraction is null for the first row, valid for the second
+    fraction = pl.Series("fraction", [None, 1.5])
+    with pytest.raises(ComputeError, match="fraction.*must be between 0 and 1"):
+        s.list.sample(fraction=fraction)

@alex-gregory-ds
Author

@vrajat, thanks for suggesting those unit tests. After speaking with reviewers on Discord, I've changed the code so it no longer raises a ComputeError and therefore have not added your unit tests as you have written them.

However, your tests cover edge cases that I had not considered, so it would be useful to get a reviewer's opinion on them. In particular, @vrajat, you have written tests for negative fractions and None fractions. This is how the code currently behaves for valid, too-large, negative, and None fractions:

>>> import polars as pl
>>> df = pl.DataFrame(pl.Series("a", [[1, 2, 3], [4, 5]]))
>>> pl.col.a.list.sample(fraction=pl.Series([1.0, 0.0]), with_replacement=False)
<Expr ['col("a").list.sample_fraction(…'] at 0x27DC34C7650>
>>> df.select(pl.col.a.list.sample(fraction=pl.Series([1.0, 0.0]), with_replacement=False))
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2, 3] │
│ []        │
└───────────┘
>>> df.select(pl.col.a.list.sample(fraction=pl.Series([1.1, 0.0]), with_replacement=False))
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    df.select(pl.col.a.list.sample(fraction=pl.Series([1.1, 0.0]), with_replacement=False))
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\dataframe\frame.py", line 10124, in select
    .collect(optimizations=QueryOptFlags._eager())
     ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\_utils\deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\lazyframe\opt_flags.py", line 328, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\AlexG\Documents\polars\py-polars\src\polars\lazyframe\frame.py", line 2425, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
polars.exceptions.ShapeError: cannot take a larger sample than the total population when `with_replacement=false`
>>> df.select(pl.col.a.list.sample(fraction=pl.Series([-5, -1]), with_replacement=False))
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ []        │
│ []        │
└───────────┘
>>> df.select(pl.col.a.list.sample(fraction=pl.Series([None, 1.0]), with_replacement=False))
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ null      │
│ [4, 5]    │
└───────────┘

Returning null when the fraction is None makes sense to me, but silently returning an empty list when the fraction is negative makes less sense. Should we raise an error when a negative fraction is provided?

@vrajat

vrajat commented Nov 20, 2025

I vote for an error. The intent of negative fractions is unclear to me.
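To make the proposal concrete, here is a rough plain-Python model of per-row sampling without replacement under that rule. It is a sketch, not the Polars implementation: `sample_rows` is a hypothetical helper, `ValueError` stands in for whichever Polars exception is chosen, and truncating `len(row) * fraction` to get the sample size is an assumption.

```python
import random


def sample_rows(rows, fractions, seed=None):
    """Sample each row by its own fraction, without replacement."""
    rng = random.Random(seed)
    out = []
    for row, frac in zip(rows, fractions):
        if frac is None:
            # A null fraction yields a null row, matching the output shown above.
            out.append(None)
            continue
        if frac < 0:
            # Proposed behaviour: reject instead of returning an empty list.
            raise ValueError(f"fraction must be non-negative, got {frac}")
        if frac > 1:
            raise ValueError("cannot take a larger sample than the total population")
        n = int(len(row) * frac)  # assumption: the size is truncated toward zero
        out.append(rng.sample(row, n))
    return out
```

With this model, `sample_rows([[1, 2, 3], [4, 5]], [None, 1.0])` keeps the null row, while `sample_rows([[1, 2, 3], [4, 5]], [-5, -1])` raises instead of returning two empty lists.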

Labels

fix (Bug fix), python (Related to Python Polars)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

list.sample fraction allows floats between 1.0 and 2.0

2 participants