fix: Support pandas ExtensionArray ordering #6481

Merged
merged 4 commits into from
Jan 17, 2025


flying-sheep
Contributor

Fixes #6452

@flying-sheep flying-sheep changed the title Support pandas ExtensionArray ordering fix: Support pandas ExtensionArray ordering Jan 7, 2025

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.76%. Comparing base (b9243ba) to head (a2cb75f).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
holoviews/element/util.py 87.50% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6481      +/-   ##
==========================================
- Coverage   88.76%   88.76%   -0.01%     
==========================================
  Files         323      323              
  Lines       68681    68698      +17     
==========================================
+ Hits        60963    60978      +15     
- Misses       7718     7720       +2     


Member

@hoxbro hoxbro left a comment


I left one comment, but this PR looks good otherwise.

if TYPE_CHECKING:
    from typing import TypeVar

    Array = TypeVar("Array", np.ndarray, pd.api.extensions.ExtensionArray)
Member


This type annotation does not seem right to me. The function where it is used can take more than these two types, e.g., pd.Series or a cupy.ndarray. I think we should use something like np.typing.ArrayLike, but I haven't done any testing.

Contributor Author

@flying-sheep flying-sheep Jan 17, 2025


Dataset.dimension_values() is documented as:

Returns:

NumPy array of values along the requested dimension

This documentation isn’t quite correct: my code, for example, relies on it returning a pd.api.extensions.ExtensionArray. Still, a lot of your code relies on the output of dimension_values(), e.g.

  • having a .dtype attribute
  • having the methods .astype, .min, .max, …
  • being iterable
  • being sliceable

I’m pretty sure it can’t be a pd.Series: for those, your code passes on the inner np.ndarray | pd.api.extensions.ExtensionArray, which is what I’m relying on here!

np.typing.ArrayLike also includes types like list or int, which I’m mostly sure isn’t correct. I mean, some parts of your code base check isinstance(vals, list), but others access the methods above without checking.

So can we figure out which types you actually support and actually fix the typing?
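One way to write down the requirements listed above is a structural type. The following is only a minimal sketch, not what the PR does: the names DimensionValues and FakeArray are hypothetical, and FakeArray is a toy stand-in used so that the example needs nothing beyond the standard library. It shows that list and int, both of which np.typing.ArrayLike admits, do not provide the listed members.

```python
from __future__ import annotations

from typing import Any, Iterator, Protocol, runtime_checkable


@runtime_checkable
class DimensionValues(Protocol):
    """Hypothetical protocol covering the members listed in the comment."""

    dtype: Any

    def astype(self, dtype: Any) -> Any: ...
    def min(self) -> Any: ...
    def max(self) -> Any: ...
    def __iter__(self) -> Iterator[Any]: ...
    def __getitem__(self, key: Any) -> Any: ...


class FakeArray:
    """Toy stand-in for an array type; only exists for this illustration."""

    dtype = "int64"

    def __init__(self, values):
        self._values = list(values)

    def astype(self, dtype):
        # A real array would convert its elements; the toy just copies.
        return FakeArray(self._values)

    def min(self):
        return min(self._values)

    def max(self):
        return max(self._values)

    def __iter__(self):
        return iter(self._values)

    def __getitem__(self, key):
        return self._values[key]


# isinstance() on a runtime-checkable protocol only checks that the
# members exist, not their signatures or return types.
is_fake_ok = isinstance(FakeArray([3, 1, 2]), DimensionValues)  # True
is_list_ok = isinstance([3, 1, 2], DimensionValues)  # False: no .dtype, .astype, ...
is_int_ok = isinstance(5, DimensionValues)  # False
```

A protocol like this would describe what the calling code actually uses, whereas the constrained TypeVar in the PR names the two concrete types that were observed in practice.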

Member


As mentioned in the original issue, much of the code was written before the invention of typing, so fixing the annotations is a huge lift.

It is also worth mentioning that when the code was written, most of the underlying data structures were NumPy arrays. This is no longer the case: the data can be something else, such as the pd.api.extensions.ExtensionArray here.

I tried to see whether I could get any type other than an array type out of it, and couldn't.
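To illustrate why the concrete array type matters for ordering, here is a small sketch using only public pandas and NumPy calls. It does not reproduce the HoloViews code path; the variable names are made up for the example. A pd.Categorical carries its own category order, which is lost once the values are converted to a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Category order is deliberately not alphabetical.
cat = pd.Categorical(["b", "a", "c"], categories=["c", "b", "a"], ordered=True)

# Converting to NumPy yields an object array of plain strings,
# so sorting falls back to alphabetical order.
as_numpy = np.asarray(cat)
sorted_numpy = sorted(as_numpy.tolist())  # ['a', 'b', 'c']

# Sorting the extension array itself respects the category order.
sorted_cat = cat.sort_values().tolist()  # ['c', 'b', 'a']
```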

Your changes LGTM. Thank you for the PR!

@hoxbro hoxbro merged commit becb5f3 into holoviz:main Jan 17, 2025
14 checks passed

Successfully merging this pull request may close these issues.

Categorical order in hierarchical axis not respected
2 participants