BUG: Inconsistent behavior of Groupby with None values with filter (#… #63178

koskampt · 2025-11-23T15:13:32Z

…62501)

closes BUG: Inconsistent behavior of Groupby with None values with filter #62501
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/v2.3.4.rst file if fixing a bug or adding a new feature.

rhshadrach

Thanks for the PR! Please always add tests. Does this also handle the tuple case on L667?

rhshadrach · 2025-11-23T17:52:15Z

pandas/core/groupby/groupby.py

                for name in names
            )

+        elif any(isna(k) for k in self.indices.keys()):


This check is expensive - this function is currently only ever called with names being a list of length 1, and the rest of the method is O(1) in terms of self.indices. It's called from the inner loop of DataFrameGroupBy.filter as we iterate over each group. This seems avoidable.

I believe we could change this function to just accept a single name (rather than a list) and then have a special case:

if isna(name): return self.indices.get(np.nan, [])

I think self.indices.get(np.nan, []) won't work, as the NaN value in self.indices cannot be accessed reliably before changing the keys from NaN to np.nan. I think I have a working solution though. Will supply the updated version of the PR tomorrow.

I think self.indices.get(np.nan, []) won't work, as the NaN value in self.indices cannot be accessed reliably before changing the keys from NaN to np.nan.

Isn't this what I suggested to do in #63178 (comment)?

I think I misread your first comment from two days ago. To make sure we are on the same page: we can change the function _get_indices(self, names) to _get_indices(self, name), replacing the list with a single name?

Yes! It is only ever used with a single name today.
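For readers following the thread, here is a minimal sketch of why a NaN key may not be reachable through a plain dict lookup, and how normalizing the keys makes the single-name special case work. It uses only the standard library: `is_missing`, `normalize_keys` and `get_index` are hypothetical stand-ins (not pandas functions), and `math.nan` plays the role of `np.nan`.

```python
import math

NAN = math.nan  # canonical missing key; stands in for np.nan in this sketch


def is_missing(key):
    """Hypothetical scalar stand-in for pandas' isna: None or a float NaN."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def normalize_keys(indices):
    """Map every missing-like key to the single canonical NAN object."""
    return {NAN if is_missing(k) else k: v for k, v in indices.items()}


def get_index(indices, name):
    """Single-name lookup with a special case for missing names."""
    if is_missing(name):
        return indices.get(NAN, [])
    return indices.get(name, [])


# A NaN created elsewhere is a different object from NAN, and NaN != NaN,
# so neither the identity check nor the equality check of the dict matches.
other_nan = float("nan")
raw = {"a": [0, 2], other_nan: [1]}
print(get_index(raw, None))  # []

normalized = normalize_keys(raw)
print(get_index(normalized, None))  # [1]
print(get_index(normalized, "a"))   # [0, 2]
```

The lookup on the normalized dict succeeds only because the stored key and the probe are the same object; that is the reason the keys have to be rewritten before `get` can be relied on.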

rhshadrach · 2025-11-23T17:58:08Z

pandas/core/groupby/groupby.py

            names = (converter(name) for name in names)

-        return [self.indices.get(name, []) for name in names]
+        indices = {np.nan if isna(k) else k: v for k, v in self.indices.items()}


It seems better to do this on the indices cached property directly, and only in the case where there is a NaN value, guarded with if not self.dropna and self.result_index.hasnans.

Good point, will adjust.
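A rough sketch of the idea discussed above, under stated assumptions: the normalization happens once inside a cached `indices` property and only when missing keys can actually be present, so each per-group lookup stays O(1). `ToyGrouper`, `is_missing` and `get_index` are illustrative names, not the pandas classes or methods, and `hasnans` here is a simplified analogue of `self.result_index.hasnans`.

```python
import math
from functools import cached_property

NAN = math.nan  # canonical missing key, standing in for np.nan


def is_missing(key):
    # Hypothetical scalar stand-in for pandas' isna.
    return key is None or (isinstance(key, float) and math.isnan(key))


class ToyGrouper:
    """Illustrative only; not the pandas grouper class."""

    def __init__(self, keys, dropna=True):
        self.keys = list(keys)
        self.dropna = dropna

    @cached_property
    def hasnans(self):
        # One O(n) scan, cached for the lifetime of the object.
        return any(is_missing(k) for k in self.keys)

    @cached_property
    def indices(self):
        raw = {}
        for pos, key in enumerate(self.keys):
            if self.dropna and is_missing(key):
                continue
            raw.setdefault(key, []).append(pos)
        # Pay for key normalization only when missing keys can be present.
        if not self.dropna and self.hasnans:
            normalized = {}
            for k, v in raw.items():
                normalized.setdefault(NAN if is_missing(k) else k, []).extend(v)
            raw = normalized
        return raw

    def get_index(self, name):
        # O(1) per call: no scan over self.indices.
        return self.indices.get(NAN if is_missing(name) else name, [])


grouper = ToyGrouper(["a", None, "a", None], dropna=False)
print(grouper.get_index("a"))   # [0, 2]
print(grouper.get_index(None))  # [1, 3]
```

With `dropna=True` the guard is never entered and the dict is returned as built, which is the case the existing code paths already rely on.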

rhshadrach · 2025-11-25T22:42:50Z

@koskampt - I opened #63202 to give some idea of what I'm thinking. If you like that, can incorporate it here. But still open to alternative solutions that do not iterate through indices within _get_indices for the reasons provided.

Even with such a solution, will still want to see the result of running the groupby ASVs to evaluate performance impact. I can also help assist here if desired.

koskampt · 2025-11-26T22:16:47Z

@rhshadrach I had a look at your pull request and incorporated your suggestions in mine. I also made the change _get_indices(self, names) to _get_indices(self, name).

I am not familiar with the (groupby) ASVs, but I guess it is referring to this: https://pandas.pydata.org/community/benchmarks.html. Help would be greatly appreciated, although I will go through the docs myself first.

rhshadrach

I am not familiar with the (groupby) ASVs, but I guess it is referring to this: https://pandas.pydata.org/community/benchmarks.html. Help would be greatly appreciated, although I will go through the docs myself first.

Correct - if you're using conda for your virtual environment, then this should be sufficient:

asv continuous -f 1.1 upstream/main HEAD -b ^groupby

rhshadrach · 2025-11-29T13:19:54Z

pandas/core/groupby/groupby.py


    @final
-    def _get_indices(self, names):
+    def _get_indices(self, name):


Can you rename this to _get_index and remove the other _get_index entirely?

rhshadrach · 2025-11-29T13:23:04Z

pandas/core/groupby/ops.py

+            elif has_mi:
+                # MultiIndex has no efficient way to tell if there are NAs
+                result = {
+                    tuple(np.nan if isna(comp) else comp for comp in key): value


Suggested change

tuple(np.nan if isna(comp) else comp for comp in key): value

# error: "Hashable" has no attribute "__iter__" (not iterable)

tuple(np.nan if isna(comp) else comp for comp in key): value # type: ignore[attr-defined]

Added the change.
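To illustrate the tuple case from this suggestion, a small standard-library sketch of normalizing each component of a tuple key. It assumes keys are either scalars or tuples of scalars; `is_missing` and `normalize_key` are hypothetical helpers, and `math.nan` stands in for `np.nan`.

```python
import math

NAN = math.nan  # canonical missing value, standing in for np.nan


def is_missing(value):
    # Hypothetical scalar stand-in for pandas' isna.
    return value is None or (isinstance(value, float) and math.isnan(value))


def normalize_key(key):
    """Normalize a scalar key, or each component of a tuple key."""
    if isinstance(key, tuple):
        return tuple(NAN if is_missing(comp) else comp for comp in key)
    return NAN if is_missing(key) else key


raw = {("x", None): [0], ("x", 1): [1], (float("nan"), 2): [2]}
normalized = {normalize_key(k): v for k, v in raw.items()}

print(normalized[("x", NAN)])  # [0]
print(normalized[(NAN, 2)])    # [2]
print(normalized[("x", 1)])    # [1]
```

The `isinstance(key, tuple)` check in the sketch plays the role that the `type: ignore[attr-defined]` comment plays in the suggested change: the type checker only knows the key is `Hashable`, so iterating over it has to be justified one way or the other.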


koskampt · 2025-11-29T16:10:02Z

I was able to get the asv up and running (a couple of days ago). I will run the benchmark with the below command and report back the results.

asv continuous -f 1.1 upstream/main HEAD -b ^groupby

koskampt · 2025-11-29T16:33:46Z

Just checking: I also went through the asv_bench/benchmarks/groupby.py file but couldn't see a specific benchmark that covers the case where dropna = False, there are no values and indices is called. Am I missing something, or should we add a new benchmark in order to test the performance impact of the change in this pull request?

koskampt requested a review from rhshadrach as a code owner November 23, 2025 15:13

rhshadrach reviewed Nov 23, 2025


koskampt force-pushed the bug-fix-grouby-with-none-values-with-filter branch from edd8a1f to 8d2126a Compare November 25, 2025 19:57

rhshadrach requested changes Nov 29, 2025


T. Koskamp added 4 commits November 29, 2025 17:04

BUG: Inconsistent behavior of Groupby with None values with filter (p…

b5b447e

…andas-dev#62501)

BUG: Inconsistent behavior of Groupby with None values with filter (p…

d2046e9

…andas-dev#62501) - Add test cases - Add tuple support - Incorporate feedback

Update indices property from groupby

74057eb

Incorporate review suggestion for issue pandas-dev#63178

f7c5e23

BUG: Inconsistent behavior of Groupby with None values with filter

koskampt force-pushed the bug-fix-grouby-with-none-values-with-filter branch from 2ef342b to f7c5e23 Compare November 29, 2025 16:04
