Commit 76b35ad

Ignore NaN correctly in .quantile (rapidsai#17593)
From an offline conversation, fixes the following discrepancy between cudf and pandas:

```python
In [1]: import cudf

In [2]: import numpy as np

In [3]: ser = cudf.Series([np.nan, np.nan, 0.9], nan_as_null=False)

In [4]: ser
Out[4]:
0    NaN
1    NaN
2    0.9
dtype: float64

In [5]: ser.quantile(0.9)
Out[5]: np.float64(nan)

In [6]: import pandas as pd

In [7]: ser = pd.Series([np.nan, np.nan, 0.9])

In [8]: ser.quantile(0.9)
Out[8]: np.float64(0.9)
```

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17593
1 parent 34e2045 commit 76b35ad

2 files changed: +20 -3 lines changed

python/cudf/cudf/core/column/numerical_base.py

Lines changed: 4 additions & 3 deletions
@@ -143,13 +143,14 @@ def quantile(
                 ),
             )
         else:
+            no_nans = self.nans_to_nulls()
             # get sorted indices and exclude nulls
             indices = sorting.order_by(
-                [self], [True], "first", stable=True
-            ).slice(self.null_count, len(self))
+                [no_nans], [True], "first", stable=True
+            ).slice(no_nans.null_count, len(no_nans))
             with acquire_spill_lock():
                 plc_column = plc.quantiles.quantile(
-                    self.to_pylibcudf(mode="read"),
+                    no_nans.to_pylibcudf(mode="read"),
                     q,
                     plc.types.Interpolation[interpolation.upper()],
                     indices.to_pylibcudf(mode="read"),
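
The idea behind this hunk can be sketched without a GPU. The following is a minimal pure-Python illustration, not the cudf implementation: `None` stands in for a cudf null, the helper names `nans_to_nulls` and `quantile_ignoring_nans` are hypothetical, only linear interpolation is shown, and returning NaN when no valid values remain is an assumption chosen to match the pandas behavior exercised by the new test.

```python
import math


def nans_to_nulls(values):
    """Replace float NaN with None (a stand-in for a null in this sketch)."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


def quantile_ignoring_nans(values, q):
    """Linear-interpolation quantile that skips NaN and None entries."""
    no_nans = nans_to_nulls(values)
    null_count = sum(v is None for v in no_nans)

    # Stable ascending argsort with nulls placed first, loosely mirroring
    # order_by(..., "first", stable=True) in the diff.
    order = sorted(
        range(len(no_nans)),
        key=lambda i: (
            no_nans[i] is not None,
            no_nans[i] if no_nans[i] is not None else 0.0,
        ),
    )

    # Drop the leading null positions, like .slice(null_count, len(column)).
    valid = order[null_count:]
    if not valid:
        return math.nan  # assumption: nothing left to rank yields NaN

    # Interpolate linearly between the two nearest ranks.
    pos = q * (len(valid) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    lo_val, hi_val = no_nans[valid[lo]], no_nans[valid[hi]]
    return lo_val + (hi_val - lo_val) * (pos - lo)


if __name__ == "__main__":
    nan = float("nan")
    print(quantile_ignoring_nans([nan, nan, 0.9], 0.9))
    print(quantile_ignoring_nans([nan, nan, nan], 0.9))
```

Before the fix, the NaN values were not counted as nulls, so `null_count` was 0 and nothing was sliced off the sorted indices. Converting NaN to null first makes the slice exclude them.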

python/cudf/cudf/tests/test_quantiles.py

Lines changed: 16 additions & 0 deletions
@@ -91,3 +91,19 @@ def test_quantile_type_int_float(interpolation):
 
     assert expected == actual
     assert type(expected) is type(actual)
+
+
+@pytest.mark.parametrize(
+    "data",
+    [
+        [float("nan"), float("nan"), 0.9],
+        [float("nan"), float("nan"), float("nan")],
+    ],
+)
+def test_ignore_nans(data):
+    psr = pd.Series(data)
+    gsr = cudf.Series(data, nan_as_null=False)
+
+    expected = gsr.quantile(0.9)
+    result = psr.quantile(0.9)
+    assert_eq(result, expected)
